Ignore:
Timestamp:
Nov 28, 2017, 3:52:01 PM (6 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
6c2ba38
Parents:
f7a4f89
Message:

Results need to be updated but otherwise, tentative final draft

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/concurrency/text/results.tex

    rf7a4f89 rcf966b5  
    66\section{Machine setup}
    77Table \ref{tab:machine} shows the characteristics of the machine used to run the benchmarks. All tests where made on this machine.
    8 \begin{figure}[H]
     8\begin{table}[H]
    99\begin{center}
    1010\begin{tabular}{| l | r | l | r |}
     
    2828\hline
    2929Compiler                        & GCC 6.3.0             & Translator    & CFA 1.0.0 \\
     30\hline
     31Java version            & OpenJDK-9             & Go version    & 1.9.2 \\
    3032\hline
    3133\end{tabular}
     
    3335\caption{Machine setup used for the tests}
    3436\label{tab:machine}
    35 \end{figure}
     37\end{table}
    3638
    3739\section{Micro benchmarks}
     
    3941\begin{pseudo}
    4042#define BENCH(run, result)
    41         gettime();
     43        before = gettime();
    4244        run;
    43         gettime();
     45        after  = gettime();
    4446        result = (after - before) / N;
    4547\end{pseudo}
     
    4749
    4850\subsection{Context-switching}
    49 The first interesting benchmark is to measure how long context-switches take. The simplest approach to do this is to yield on a thread, which executes a 2-step context switch. In order to make the comparison fair, coroutines also execute a 2-step context-switch, which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads. All omitted tests are functionally identical to one of these tests. The results can be shown in table \ref{tab:ctx-switch}.
     51The first interesting benchmark is to measure how long context-switches take. The simplest approach to do this is to yield on a thread, which executes a 2-step context switch. In order to make the comparison fair, coroutines also execute a 2-step context-switch (\gls{uthread} to \gls{kthread} then \gls{kthread} to \gls{uthread}), which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads whith the results in table \ref{tab:ctx-switch}. All omitted tests are functionally identical to one of these tests.
    5052\begin{figure}
    5153\begin{multicols}{2}
     
    8890\end{cfacode}
    8991\end{multicols}
    90 \caption{\CFA benchmark code used to measure context-switches for coroutines and threads.}
    91 \label{lst:ctx-switch}
    92 \end{figure}
    93 
    94 \begin{figure}
    95 \begin{center}
    96 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
    97 \cline{2-4}
    98 \multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
    99 \hline
    100 Kernel Threads          & 239           & 242.57        & 5.54 \\
    101 \CFA Coroutines         & 38            & 38            & 0    \\
    102 \CFA Threads            & 102           & 102.39        & 1.57 \\
    103 \uC Coroutines          & 46            & 46.68 & 0.47 \\
    104 \uC Threads                     & 98            & 99.39 & 1.52 \\
     92\begin{cfacode}[caption={\CFA benchmark code used to measure context-switches for coroutines and threads.},label={lst:ctx-switch}]
     93\end{cfacode}
     94\end{figure}
     95
     96\begin{table}
     97\begin{center}
     98\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
     99\cline{2-4}
     100\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
     101\hline
     102Kernel Thread   & 239           & 242.57        & 5.54 \\
     103\CFA Coroutine  & 38            & 38            & 0    \\
     104\CFA Thread             & 102           & 102.39        & 1.57 \\
     105\uC Coroutine   & 46            & 46.68 & 0.47 \\
     106\uC Thread              & 98            & 99.39 & 1.52 \\
     107Goroutine               & 148           & 148.0 & 0 \\
     108Java Thread             & 271           & 271.0 & 0 \\
    105109\hline
    106110\end{tabular}
     
    108112\caption{Context Switch comparison. All numbers are in nanoseconds(\si{\nano\second})}
    109113\label{tab:ctx-switch}
    110 \end{figure}
     114\end{table}
    111115
    112116\subsection{Mutual-exclusion}
    113 The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest approach is to measure how long it takes enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock are also measured. The results can be shown in table \ref{tab:mutex}.
    114 
    115 \begin{figure}
    116 \begin{cfacode}
     117The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest approach is to measure how long it takes to enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock are also measured. The results can be shown in table \ref{tab:mutex}.
     118
     119\begin{figure}
     120\begin{cfacode}[caption={\CFA benchmark code used to measure mutex routines.},label={lst:mutex}]
    117121monitor M {};
    118122void __attribute__((noinline)) call( M & mutex m /*, m2, m3, m4*/ ) {}
     
    129133}
    130134\end{cfacode}
    131 \caption{\CFA benchmark code used to measure mutex routines.}
    132 \label{lst:mutex}
    133 \end{figure}
    134 
    135 \begin{figure}
     135\end{figure}
     136
     137\begin{table}
    136138\begin{center}
    137139\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
     
    140142\hline
    141143C routine                                               & 2             & 2             & 0      \\
     144FetchAdd + FetchSub                             & 2             & 2             & 0      \\
    142145Pthreads Mutex Lock                             & 31            & 31.86 & 0.99   \\
    143146\uC \code{monitor} member routine               & 30            & 30            & 0      \\
     
    145148\CFA \code{mutex} routine, 2 argument   & 82            & 83            & 1.93   \\
    146149\CFA \code{mutex} routine, 4 argument   & 165           & 161.15        & 54.04  \\
     150Java synchronized routine                       & 165           & 161.15        & 54.04  \\
    147151\hline
    148152\end{tabular}
     
    150154\caption{Mutex routine comparison. All numbers are in nanoseconds(\si{\nano\second})}
    151155\label{tab:mutex}
    152 \end{figure}
     156\end{table}
    153157
    154158\subsection{Internal scheduling}
    155 The Internal scheduling benchmark measures the cost of waiting on and signalling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA. The results can be shown in table \ref{tab:int-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
    156 
    157 \begin{figure}
    158 \begin{cfacode}
     159The internal-scheduling benchmark measures the cost of waiting on and signalling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA, with results table \ref{tab:int-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
     160
     161\begin{figure}
     162\begin{cfacode}[caption={Benchmark code for internal scheduling},label={lst:int-sched}]
    159163volatile int go = 0;
    160164condition c;
     
    187191}
    188192\end{cfacode}
    189 \caption{Benchmark code for internal scheduling}
    190 \label{lst:int-sched}
    191 \end{figure}
    192 
    193 \begin{figure}
     193\end{figure}
     194
     195\begin{table}
    194196\begin{center}
    195197\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
     
    201203\CFA \code{signal}, 2 \code{monitor}    & 1531  & 1550.75       & 32.77 \\
    202204\CFA \code{signal}, 4 \code{monitor}    & 2288.5        & 2326.86       & 54.73 \\
     205Java \code{notify}                              & 2288.5        & 2326.86       & 54.73 \\
    203206\hline
    204207\end{tabular}
     
    206209\caption{Internal scheduling comparison. All numbers are in nanoseconds(\si{\nano\second})}
    207210\label{tab:int-sched}
    208 \end{figure}
     211\end{table}
    209212
    210213\subsection{External scheduling}
    211 The Internal scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA. The results can be shown in table \ref{tab:ext-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
    212 
    213 \begin{figure}
    214 \begin{cfacode}
     214The Internal scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA, with results in table \ref{tab:ext-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
     215
     216\begin{figure}
     217\begin{cfacode}[caption={Benchmark code for external scheduling},label={lst:ext-sched}]
    215218volatile int go = 0;
    216219monitor M {};
     
    242245}
    243246\end{cfacode}
    244 \caption{Benchmark code for external scheduling}
    245 \label{lst:ext-sched}
    246 \end{figure}
    247 
    248 \begin{figure}
     247\end{figure}
     248
     249\begin{table}
    249250\begin{center}
    250251\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
     
    261262\caption{External scheduling comparison. All numbers are in nanoseconds(\si{\nano\second})}
    262263\label{tab:ext-sched}
    263 \end{figure}
     264\end{table}
    264265
    265266\subsection{Object creation}
    266 Finally, the last benchmark measured is the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads. The results can be shown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the call-stacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low.
    267 
    268 \begin{figure}
    269 \begin{multicols}{2}
     267Finally, the last benchmark measurs the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads, with results shown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the call-stacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low.
     268
     269\begin{figure}
     270\begin{center}
    270271pthread
    271 \begin{cfacode}
     272\begin{ccode}
    272273int main() {
    273274        BENCH(
    274275                for(size_t i=0; i<n; i++) {
    275276                        pthread_t thread;
    276                         if(pthread_create(
    277                                 &thread,
    278                                 NULL,
    279                                 foo,
    280                                 NULL
    281                         ) < 0) {
     277                        if(pthread_create(&thread,NULL,foo,NULL)<0) {
    282278                                perror( "failure" );
    283279                                return 1;
    284280                        }
    285281
    286                         if(pthread_join(
    287                                 thread,
    288                                 NULL
    289                         ) < 0) {
     282                        if(pthread_join(thread, NULL)<0) {
    290283                                perror( "failure" );
    291284                                return 1;
     
    296289        printf("%llu\n", result);
    297290}
    298 \end{cfacode}
    299 \columnbreak
     291\end{ccode}
     292
     293
     294
    300295\CFA Threads
    301296\begin{cfacode}
     
    307302                result
    308303        )
    309 
    310         printf("%llu\n", result);
    311 }
    312 \end{cfacode}
    313 \end{multicols}
    314 \caption{Benchmark code for pthreads and \CFA to measure object creation}
    315 \label{lst:creation}
    316 \end{figure}
    317 
    318 \begin{figure}
     304        printf("%llu\n", result);
     305}
     306\end{cfacode}
     307\end{center}
     308\begin{cfacode}[caption={Benchmark code for pthreads and \CFA to measure object creation},label={lst:creation}]
     309\end{cfacode}
     310\end{figure}
     311
     312\begin{table}
    319313\begin{center}
    320314\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
     
    323317\hline
    324318Pthreads                        & 26974.5       & 26977 & 124.12 \\
    325 \CFA Coroutines Lazy    & 5             & 5             & 0      \\
    326 \CFA Coroutines Eager   & 335.0 & 357.67        & 34.2   \\
    327 \CFA Threads            & 1122.5        & 1109.86       & 36.54  \\
    328 \uC Coroutines          & 106           & 107.04        & 1.61   \\
    329 \uC Threads                     & 525.5 & 533.04        & 11.14  \\
     319\CFA Coroutine Lazy     & 5             & 5             & 0      \\
     320\CFA Coroutine Eager    & 335.0 & 357.67        & 34.2   \\
     321\CFA Thread                     & 1122.5        & 1109.86       & 36.54  \\
     322\uC Coroutine           & 106           & 107.04        & 1.61   \\
     323\uC Thread                      & 525.5 & 533.04        & 11.14  \\
     324Goroutine                       & 525.5 & 533.04        & 11.14  \\
     325Java Thread                     & 525.5 & 533.04        & 11.14  \\
    330326\hline
    331327\end{tabular}
     
    333329\caption{Creation comparison. All numbers are in nanoseconds(\si{\nano\second})}
    334330\label{tab:creation}
    335 \end{figure}
     331\end{table}
Note: See TracChangeset for help on using the changeset viewer.