Reverse Diff

results.tex [383e159:20ffcf3]

doc/proposals/concurrency/text/results.tex (modified) (10 diffs)

doc/proposals/concurrency/text/results.tex

-              r383e159
+              r20ffcf3
 % ======================================================================
 \section{Machine setup}
 Table \ref{tab:machine} shows the characteristics of the machine used to run the benchmarks. All tests were run on this machine.
 \begin{table}[H]
+Table \ref{tab:machine} shows the characteristiques of the machine used to run the benchmarks. All tests where made on this machine.
+\begin{figure}[H]
 \begin{center}
 \begin{tabular}{| l | r | l | r |}
 …
 \hline
 \hline
+Operating system                & Ubuntu 16.04.3 LTS    & Kernel                & Linux 4.4-97-generic \\
+\hline
+Compiler                        & GCC 6.3               & Translator    & CFA 1 \\
+\hline
+Java version            & OpenJDK-9             & Go version    & 1.9.2 \\
+Operating system                & Ubuntu 16.04.3 LTS    & Kernel                & Linux 4.4.0-97-generic \\
+\hline
+Compiler                        & gcc 6.3.0             & Translator    & CFA 1.0.0 \\
 \hline
 \end{tabular}
 …
 \caption{Machine setup used for the tests}
 \label{tab:machine}
 \end{table}
+\end{figure}
 \section{Micro benchmarks}
 …
 \begin{pseudo}
 #define BENCH(run, result)
         before = gettime();
+        gettime();
         run;
         after  = gettime();
+        gettime();
         result = (after - before) / N;
 \end{pseudo}
 The method used to get time is \code{clock_gettime(CLOCK_THREAD_CPUTIME_ID);}. Each benchmark uses many iterations of a simple call to measure the cost of the call. The number of iterations depends on the specific benchmark.
+The method used to get time is \code{clock_gettime(CLOCK_THREAD_CPUTIME_ID);}. Each benchmark is using many interations of a simple call to measure the cost of the call. The specific number of interation dependes on the specific benchmark.
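The timing harness described above can be sketched in plain C. This is only an illustrative sketch, not the benchmark source: the iteration count \code{n} is passed explicitly instead of using a global \code{N}, and \code{do_call} and \code{bench_call} are hypothetical names standing in for the operation under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read the calling thread's CPU-time clock, in nanoseconds. */
static inline uint64_t gettime(void) {
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time `run` once and store the average cost per iteration in `result`.
   `n` is the number of iterations that `run` performs (assumed parameter). */
#define BENCH(run, n, result) do {            \
        uint64_t before_ = gettime();         \
        run;                                  \
        uint64_t after_ = gettime();          \
        (result) = (after_ - before_) / (n);  \
    } while (0)

/* Hypothetical operation under test: a call the compiler cannot inline
   or optimize away. */
static void __attribute__((noinline)) do_call(void) { __asm__ volatile(""); }

/* Average cost of one do_call(), in nanoseconds, over n iterations. */
static uint64_t bench_call(uint64_t n) {
    assert(n > 0);  /* the macro divides by n */
    uint64_t result;
    BENCH(for (uint64_t i = 0; i < n; i++) { do_call(); }, n, result);
    return result;
}
```

Because the loop overhead is included in the measured interval, the result is an upper bound on the cost of the call itself; a large \code{n} amortizes the cost of the two clock reads.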
 \subsection{Context-switching}
 The first interesting benchmark is to measure how long context-switches take. The simplest approach is to yield on a thread, which executes a 2-step context switch. To make the comparison fair, coroutines also execute a 2-step context-switch (\gls{uthread} to \gls{kthread} then \gls{kthread} to \gls{uthread}), which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads, with the results in table \ref{tab:ctx-switch}. All omitted tests are functionally identical to one of these tests.
+The first interesting benchmark is to measure how long context-switches take. The simplest approach to do this is to yield on a thread, which executes a 2-step context switch. In order to make the comparison fair, coroutines also execute a 2-step context-switch, which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads. All omitted tests are functionally identical to one of these tests. The results can be shown in table \ref{tab:ctx-switch}.
 \begin{figure}
 \begin{multicols}{2}
 …
 \end{cfacode}
 \end{multicols}
+\begin{cfacode}[caption={\CFA benchmark code used to measure context-switches for coroutines and threads.},label={lst:ctx-switch}]
+\end{cfacode}
+\end{figure}
+\begin{table}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+Kernel Thread   & 241.5 & 243.86        & 5.08 \\
+\CFA Coroutine  & 38            & 38            & 0    \\
+\CFA Thread             & 103           & 102.96        & 2.96 \\
+\uC Coroutine   & 46            & 45.86 & 0.35 \\
+\uC Thread              & 98            & 99.11 & 1.42 \\
+Goroutine               & 150           & 149.96        & 3.16 \\
+Java Thread             & 289           & 290.68        & 8.72 \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Context Switch comparison. All numbers are in nanoseconds (\si{\nano\second})}
+\caption{\CFA benchmark code used to measure context-switches for coroutines and threads.}
+\label{lst:ctx-switch}
+\end{figure}
+\begin{figure}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+Kernel Threads          & 239           & 242.57        & 5.54 \\
+\CFA Coroutines         & 38            & 38            & 0    \\
+\CFA Threads            & 102           & 102.39        & 1.57 \\
+\uC Coroutines          & 46            & 46.68 & 0.47 \\
+\uC Threads                     & 98            & 99.39 & 1.52 \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Context Switch comparaison. All numbers are in nanoseconds(\si{\nano\second})}
 \label{tab:ctx-switch}
 \end{table}
+\end{figure}
 \subsection{Mutual-exclusion}
 The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest approach is to measure how long it takes to enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock are also measured. The results are shown in table \ref{tab:mutex}.
 \begin{figure}
 \begin{cfacode}[caption={\CFA benchmark code used to measure mutex routines.},label={lst:mutex}]
+The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest appraoch is to measure how long it takes enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock are also mesured. The results can be shown in table \ref{tab:mutex}.
+\begin{figure}
+\begin{cfacode}
 monitor M {};
 void __attribute__((noinline)) call( M & mutex m /*, m2, m3, m4*/ ) {}
 …
+}
 \end{cfacode}
 \end{figure}
 \begin{table}
+\begin{center}
 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
 \cline{2-4}
 \multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
 \hline
 C routine                                               & 2             & 2             & 0    \\
+FetchAdd + FetchSub                             & 26            & 26            & 0    \\
 Pthreads Mutex Lock                             & 31            & 31.86 & 0.99 \\
 \uC \code{monitor} member routine               & 30            & 30            & 0    \\
 \CFA \code{mutex} routine, 1 argument   & 41            & 41.57 & 0.9  \\
 \CFA \code{mutex} routine, 2 arguments  & 76            & 76.96 & 1.57 \\
 \CFA \code{mutex} routine, 4 arguments  & 145           & 146.68        & 3.85 \\
 Java synchronized routine                       & 27            & 28.57 & 2.6  \\
 \hline
 \end{tabular}
 \end{center}
 \caption{Mutex routine comparison. All numbers are in nanoseconds (\si{\nano\second})}
+\caption{\CFA benchmark code used to measure mutex routines.}
+\label{lst:mutex}
+\end{figure}
+\begin{figure}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+C routine                                               & 2             & 2             & 0      \\
+Pthreads Mutex Lock                             & 31            & 31.86 & 0.99   \\
+\uC \code{monitor} member routine               & 30            & 30            & 0      \\
+\CFA \code{mutex} routine, 1 argument   & 46            & 46.14 & 0.74   \\
+\CFA \code{mutex} routine, 2 argument   & 82            & 83            & 1.93   \\
+\CFA \code{mutex} routine, 4 argument   & 165           & 161.15        & 54.04  \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Mutex routine comparaison. All numbers are in nanoseconds(\si{\nano\second})}
 \label{tab:mutex}
 \end{table}
+\end{figure}
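For reference, the pthread baseline in the table above could be measured along the following lines. This is a hedged sketch rather than the benchmark actually used: \code{thread_cpu_ns} and \code{bench_pthread_mutex} are hypothetical names, and the lock is uncontended because only one thread ever touches it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Read the calling thread's CPU-time clock, in nanoseconds. */
static uint64_t thread_cpu_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Statically initialized mutex with default attributes. */
static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;

/* Average cost, in nanoseconds, of one uncontended lock/unlock pair
   measured over n iterations. */
static uint64_t bench_pthread_mutex(uint64_t n) {
    assert(n > 0);  /* avoid dividing by zero below */
    uint64_t before = thread_cpu_ns();
    for (uint64_t i = 0; i < n; i++) {
        pthread_mutex_lock(&bench_lock);
        pthread_mutex_unlock(&bench_lock);
    }
    uint64_t after = thread_cpu_ns();
    return (after - before) / n;
}
```

Measuring only the uncontended path keeps the comparison with a \code{mutex} routine call fair, since the monitor benchmark is also entered by a single thread.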
 \subsection{Internal scheduling}
 The internal-scheduling benchmark measures the cost of waiting on and signalling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA, with results in table \ref{tab:int-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
 \begin{figure}
 \begin{cfacode}[caption={Benchmark code for internal scheduling},label={lst:int-sched}]
+The Internal scheduling benchmark measures the cost of waiting on and signaling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA. The results can be shown in table \ref{tab:int-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
+\begin{figure}
+\begin{cfacode}
 volatile int go = 0;
 condition c;
 …
+}
 \end{cfacode}
+\end{figure}
+\begin{table}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+\uC \code{signal}                                       & 322           & 323   & 3.36   \\
+\CFA \code{signal}, 1 \code{monitor}    & 352.5 & 353.11        & 3.66   \\
+\CFA \code{signal}, 2 \code{monitor}    & 430           & 430.29        & 8.97   \\
+\CFA \code{signal}, 4 \code{monitor}    & 594.5 & 606.57        & 18.33  \\
+Java \code{notify}                              & 13831.5       & 15698.21      & 4782.3 \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Internal scheduling comparison. All numbers are in nanoseconds (\si{\nano\second})}
+\caption{Benchmark code for internal scheduling}
+\label{lst:int-sched}
+\end{figure}
+\begin{figure}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+\uC \code{signal}                                       & 322           & 322.57        & 2.77  \\
+\CFA \code{signal}, 1 \code{monitor}    & 1145  & 1163.64       & 27.52 \\
+\CFA \code{signal}, 2 \code{monitor}    & 1531  & 1550.75       & 32.77 \\
+\CFA \code{signal}, 4 \code{monitor}    & 2288.5        & 2326.86       & 54.73 \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Internal scheduling comparaison. All numbers are in nanoseconds(\si{\nano\second})}
 \label{tab:int-sched}
 \end{table}
+\end{figure}
 \subsection{External scheduling}
 The external-scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA, with results in table \ref{tab:ext-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
 \begin{figure}
 \begin{cfacode}[caption={Benchmark code for external scheduling},label={lst:ext-sched}]
+The Internal scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA. The results can be shown in table \ref{tab:ext-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
+\begin{figure}
+\begin{cfacode}
 volatile int go = 0;
 monitor M {};
 …
+}
 \end{cfacode}
+\end{figure}
+\begin{table}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+\uC \code{Accept}                                       & 350           & 350.61        & 3.11  \\
+\CFA \code{waitfor}, 1 \code{monitor}   & 358.5 & 358.36        & 3.82  \\
+\CFA \code{waitfor}, 2 \code{monitor}   & 422           & 426.79        & 7.95  \\
+\CFA \code{waitfor}, 4 \code{monitor}   & 579.5 & 585.46        & 11.25 \\
+\hline
+\end{tabular}
+\end{center}
+\caption{External scheduling comparison. All numbers are in nanoseconds (\si{\nano\second})}
+\caption{Benchmark code for external scheduling}
+\label{lst:ext-sched}
+\end{figure}
+\begin{figure}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+\uC \code{Accept}                                       & 349           & 339.32        & 3.14  \\
+\CFA \code{waitfor}, 1 \code{monitor}   & 1155.5        & 1142.04       & 25.23 \\
+\CFA \code{waitfor}, 2 \code{monitor}   & 1361  & 1376.75       & 28.81 \\
+\CFA \code{waitfor}, 4 \code{monitor}   & 1941.5        & 1957.07       & 34.7  \\
+\hline
+\end{tabular}
+\end{center}
+\caption{External scheduling comparaison. All numbers are in nanoseconds(\si{\nano\second})}
 \label{tab:ext-sched}
 \end{table}
+\end{figure}
 \subsection{Object creation}
 Finally, the last benchmark measures the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads, with results shown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the call-stacks of \CFA coroutines are lazily created; therefore, without priming the coroutine, the creation cost is very low.
 \begin{figure}
 \begin{center}
+Finaly, the last benchmark measured is the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads. The results can be shown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the callstacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low.
+\begin{figure}
+\begin{multicols}{2}
 pthread
 \begin{ccode}
+\begin{cfacode}
 int main() {
         BENCH(
                 for(size_t i=0; i<n; i++) {
                         pthread_t thread;
+                        if(pthread_create(&thread,NULL,foo,NULL)<0) {
+                        if(pthread_create(
+                                &thread,
+                                NULL,
+                                foo,
+                                NULL
+                        ) < 0) {
                                 perror( "failure" );
                                 return 1;
+                        }
+                        if(pthread_join(thread, NULL)<0) {
+                        if(pthread_join(
+                                thread,
+                                NULL
+                        ) < 0) {
                                 perror( "failure" );
                                 return 1;
 …
         printf("%llu\n", result);
+}
+\end{ccode}
+\end{cfacode}
+\columnbreak
 \CFA Threads
 \begin{cfacode}
 …
                 result
+        )
+        printf("%llu\n", result);
+}
+\end{cfacode}
+\end{center}
+\begin{cfacode}[caption={Benchmark code for pthreads and \CFA to measure object creation},label={lst:creation}]
+\end{cfacode}
+\end{figure}
+\begin{table}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+Pthreads                        & 26996 & 26984.71      & 156.6  \\
+\CFA Coroutine Lazy     & 6             & 5.71  & 0.45   \\
+\CFA Coroutine Eager    & 708           & 706.68        & 4.82   \\
+\CFA Thread                     & 1173.5        & 1176.18       & 15.18  \\
+\uC Coroutine           & 109           & 107.46        & 1.74   \\
+\uC Thread                      & 526           & 530.89        & 9.73   \\
+Goroutine                       & 2520.5        & 2530.93       & 61.56  \\
+Java Thread                     & 91114.5       & 92272.79      & 961.58 \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Creation comparison. All numbers are in nanoseconds (\si{\nano\second})}
+        printf("%llu\n", result);
+}
+\end{cfacode}
+\end{multicols}
+\caption{Bechmark code for pthreads and \CFA to measure object creation}
+\label{lst:creation}
+\end{figure}
+\begin{figure}
+\begin{center}
+\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
+\cline{2-4}
+\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
+\hline
+Pthreads                        & 26974.5       & 26977 & 124.12 \\
+\CFA Coroutines Lazy    & 5             & 5             & 0      \\
+\CFA Coroutines Eager   & 335.0 & 357.67        & 34.2   \\
+\CFA Threads            & 1122.5        & 1109.86       & 36.54  \\
+\uC Coroutines          & 106           & 107.04        & 1.61   \\
+\uC Threads                     & 525.5 & 533.04        & 11.14  \\
+\hline
+\end{tabular}
+\end{center}
+\caption{Creation comparaison. All numbers are in nanoseconds(\si{\nano\second})}
 \label{tab:creation}
 \end{table}
+\end{figure}
