\section{Machine setup}
Table \ref{tab:machine} shows the characteristics of the machine used to run the benchmarks. All tests were run on this machine.
\begin{table}[H]
\begin{center}
\begin{tabular}{| l | r | l | r |}
…
\hline
Compiler                        & GCC 6.3.0             & Translator    & CFA 1.0.0 \\
\hline
Java version                    & OpenJDK-9             & Go version    & 1.9.2 \\
\hline
\end{tabular}
…
\caption{Machine setup used for the tests}
\label{tab:machine}
\end{table}
\section{Micro benchmarks}
…
\begin{pseudo}
#define BENCH(run, result)
        before = gettime();
        run;
        after  = gettime();
        result = (after - before) / N;
\end{pseudo}
 …
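The \code{BENCH} macro above is pseudocode. Below is a minimal sketch of how it might be written as a real C macro. It assumes three things:
\begin{itemize}
\item \code{gettime()} reads \code{CLOCK_MONOTONIC} through \code{clock_gettime};
\item \code{N} is a compile-time iteration count;
\item \code{run} performs its own loop of \code{N} iterations.
\end{itemize}
The helper \code{gettime_ns} and the value given to \code{N} are illustrative assumptions, not part of the benchmark suite.

\begin{ccode}
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Iteration count; 1000000 is an arbitrary illustrative value. */
#ifndef N
#define N 1000000ULL
#endif

/* Stand-in for gettime(): monotonic clock reading in nanoseconds. */
static inline uint64_t gettime_ns(void) {
        struct timespec ts;
        int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
        assert(rc == 0);
        (void)rc;
        return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Times `run` (expected to loop N times itself) and stores the average
   cost of one iteration, in nanoseconds, into `result`. */
#define BENCH(run, result)                              \
        do {                                            \
                uint64_t before_ = gettime_ns();        \
                run;                                    \
                uint64_t after_ = gettime_ns();         \
                (result) = (after_ - before_) / N;      \
        } while (0)
\end{ccode}

A benchmark would then read \code{BENCH( for(uint64_t i = 0; i < N; i++) op(), result )}, where \code{op} is a placeholder for the operation under test. Because \code{run} is a macro argument, a loop body that contains a comma outside parentheses has to be moved into a helper function.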
\subsection{Context-switching}
The first benchmark measures how long context switches take. The simplest approach is to yield on a thread, which executes a 2-step context switch. To make the comparison fair, coroutines also execute a 2-step context switch (\gls{uthread} to \gls{kthread} then \gls{kthread} to \gls{uthread}), which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads, with the results in Table \ref{tab:ctx-switch}. All omitted tests are functionally identical to one of these tests.
\begin{figure}
\begin{multicols}{2}
…
\end{cfacode}
\end{multicols}
\begin{cfacode}[caption={\CFA benchmark code used to measure context switches for coroutines and threads.},label={lst:ctx-switch}]
\end{cfacode}
\end{figure}
\begin{table}
\begin{center}
\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
\cline{2-4}
\multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\
\hline
Kernel Thread           & 239           & 242.57        & 5.54 \\
\CFA Coroutine          & 38            & 38            & 0    \\
\CFA Thread             & 102           & 102.39        & 1.57 \\
\uC Coroutine           & 46            & 46.68         & 0.47 \\
\uC Thread              & 98            & 99.39         & 1.52 \\
Goroutine               & 148           & 148.0         & 0    \\
Java Thread             & 271           & 271.0         & 0    \\
\hline
\end{tabular}
…
\caption{Context-switch comparison. All numbers are in nanoseconds (\si{\nano\second}).}
\label{tab:ctx-switch}
\end{table}
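To make the 2-step coroutine context switch concrete, here is a minimal sketch of a resume/suspend cycle. It is written with the POSIX \code{ucontext} routines (\code{getcontext}, \code{makecontext}, \code{swapcontext}). It is not how \CFA or \uC implement coroutines. The names \code{coroutine_t}, \code{resume}, \code{suspend} and \code{ctx_switch_cycles} are hypothetical. To keep the sketch short, it assumes a single global coroutine with a fixed 64 KiB stack. Each \code{resume} switches into the coroutine and the matching \code{suspend} switches back, so \code{n} cycles perform \code{2n} context switches.

\begin{ccode}
#include <assert.h>
#include <ucontext.h>

/* Hypothetical stackful coroutine built on ucontext (illustration only). */
typedef struct {
        ucontext_t caller;      /* where suspend() returns to */
        ucontext_t self;        /* the coroutine's saved context */
        unsigned long count;    /* number of times the body has run */
        char stack[64 * 1024];  /* fixed-size coroutine stack */
} coroutine_t;

static coroutine_t co;          /* one global coroutine keeps the sketch short */

/* Step 2: switch from the coroutine back to whoever resumed it. */
static void suspend(void) {
        int rc = swapcontext(&co.self, &co.caller);
        assert(rc == 0);
        (void)rc;
}

/* Step 1: switch from the caller into the coroutine. */
static void resume(void) {
        int rc = swapcontext(&co.caller, &co.self);
        assert(rc == 0);
        (void)rc;
}

/* Coroutine body: never returns; each iteration ends in one suspend. */
static void body(void) {
        for (;;) {
                co.count++;
                suspend();
        }
}

static void coroutine_init(void) {
        int rc = getcontext(&co.self);
        assert(rc == 0);
        (void)rc;
        co.self.uc_stack.ss_sp = co.stack;
        co.self.uc_stack.ss_size = sizeof co.stack;
        co.self.uc_link = NULL; /* body never returns, so no successor */
        co.count = 0;
        makecontext(&co.self, body, 0);
}

/* Performs n resume/suspend cycles (2n context switches) and returns
   the total number of cycles completed so far. */
static unsigned long ctx_switch_cycles(unsigned long n) {
        for (unsigned long i = 0; i < n; i++)
                resume();
        return co.count;
}
\end{ccode}

On glibc, \code{swapcontext} also saves and restores the signal mask with a system call. A sketch like this should therefore be expected to be noticeably slower than a dedicated user-level switch. It illustrates the control flow only and is not meant to reproduce the numbers in Table \ref{tab:ctx-switch}.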
\subsection{Mutual-exclusion}
The next benchmark measures the overhead of entering and leaving a critical section. For monitors, the simplest approach is to measure how long it takes to enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, three baselines are also measured:
\begin{itemize}
\item the cost of calling a non-inline function;
\item the cost of an atomic fetch-and-add followed by a fetch-and-sub;
\item the cost of acquiring and releasing a pthread mutex lock.
\end{itemize}
The results are shown in Table \ref{tab:mutex}.
\begin{figure}
\begin{cfacode}[caption={\CFA benchmark code used to measure mutex routines.},label={lst:mutex}]
monitor M {};
void __attribute__((noinline)) call( M & mutex m /*, m2, m3, m4*/ ) {}
…
}
\end{cfacode}
\end{figure}
\begin{table}
\begin{center}
\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
…
\hline
C routine                                       & 2             & 2             & 0      \\
FetchAdd + FetchSub                             & 2             & 2             & 0      \\
Pthreads Mutex Lock                             & 31            & 31.86         & 0.99   \\
\uC \code{monitor} member routine               & 30            & 30            & 0      \\
…
\CFA \code{mutex} routine, 2 arguments          & 82            & 83            & 1.93   \\
\CFA \code{mutex} routine, 4 arguments          & 165           & 161.15        & 54.04  \\
Java synchronized routine                       & 165           & 161.15        & 54.04  \\
\hline
\end{tabular}
…
\caption{Mutex routine comparison. All numbers are in nanoseconds (\si{\nano\second}).}
\label{tab:mutex}
\end{table}
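The two baselines in Table \ref{tab:mutex} that do not involve monitors can be sketched in plain C. This is a minimal illustration, not the code used for the table:
\begin{itemize}
\item the ``Pthreads Mutex Lock'' row is modelled as an uncontended \code{pthread_mutex_t};
\item the ``FetchAdd + FetchSub'' row is modelled with C11 \code{atomic_fetch_add} and \code{atomic_fetch_sub}.
\end{itemize}
The loop helpers \code{mutex_loop} and \code{fetch_loop} are hypothetical names. Each is meant to be passed as the \code{run} argument of \code{BENCH} to obtain a per-operation cost.

\begin{ccode}
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_long counter;     /* static storage: starts at 0 */

/* n uncontended lock/unlock pairs (cf. "Pthreads Mutex Lock"). */
static void mutex_loop(unsigned long n) {
        for (unsigned long i = 0; i < n; i++) {
                int rc = pthread_mutex_lock(&lock);
                assert(rc == 0);
                rc = pthread_mutex_unlock(&lock);
                assert(rc == 0);
                (void)rc;
        }
}

/* n atomic increment/decrement pairs (cf. "FetchAdd + FetchSub"). */
static void fetch_loop(unsigned long n) {
        for (unsigned long i = 0; i < n; i++) {
                atomic_fetch_add(&counter, 1);
                atomic_fetch_sub(&counter, 1);
        }
}
\end{ccode}

Both loops leave the shared state as they found it: the lock is free and \code{counter} is back to zero. Repeated runs therefore measure the same uncontended case.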
\subsection{Internal scheduling}
The internal-scheduling benchmark measures the cost of waiting on and signalling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA, with results in Table \ref{tab:int-sched}. As with the other benchmarks, all omitted tests are functionally identical to one of these tests.
\begin{figure}
\begin{cfacode}[caption={Benchmark code for internal scheduling},label={lst:int-sched}]
volatile int go = 0;
condition c;
…
}
\end{cfacode}
\end{figure}
\begin{table}
\begin{center}
\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
…
\CFA \code{signal}, 2 \code{monitor}    & 1531          & 1550.75       & 32.77 \\
\CFA \code{signal}, 4 \code{monitor}    & 2288.5        & 2326.86       & 54.73 \\
Java \code{notify}                      & 2288.5        & 2326.86       & 54.73 \\
\hline
\end{tabular}
…
\caption{Internal scheduling comparison. All numbers are in nanoseconds (\si{\nano\second}).}
\label{tab:int-sched}
\end{table}
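For readers more familiar with pthreads, the following is a minimal sketch of a comparable measurement, not the benchmark behind Table \ref{tab:int-sched}. It assumes two threads that take turns through a shared \code{turn} flag guarded by one mutex and one \code{pthread_cond_t}. The names \code{turn}, \code{partner} and \code{ping_pong} are hypothetical. Each round trip contains two signal/wait hand-offs, so the elapsed time divided by \code{2n} approximates the cost of one hand-off.

\begin{ccode}
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int turn = 0;              /* 0: caller's turn, 1: partner's turn */
static unsigned long rounds = 0;  /* round trips completed by the partner */

static void *partner(void *arg) {
        unsigned long n = *(unsigned long *)arg;
        pthread_mutex_lock(&m);
        for (unsigned long i = 0; i < n; i++) {
                while (turn != 1)                 /* wait for our turn */
                        pthread_cond_wait(&c, &m);
                rounds++;
                turn = 0;
                pthread_cond_signal(&c);          /* hand the turn back */
        }
        pthread_mutex_unlock(&m);
        return NULL;
}

/* Runs n wait/signal round trips with a partner thread; returns the
   number of round trips the partner observed. */
static unsigned long ping_pong(unsigned long n) {
        pthread_t t;
        int rc = pthread_create(&t, NULL, partner, &n);
        assert(rc == 0);
        pthread_mutex_lock(&m);
        for (unsigned long i = 0; i < n; i++) {
                turn = 1;
                pthread_cond_signal(&c);          /* hand the turn over */
                while (turn != 0)                 /* wait for it to return */
                        pthread_cond_wait(&c, &m);
        }
        pthread_mutex_unlock(&m);
        rc = pthread_join(t, NULL);
        assert(rc == 0);
        (void)rc;
        return rounds;
}
\end{ccode}

Both sides re-check \code{turn} in a loop. This handles spurious wake-ups and the case where a signal arrives before the other thread has started waiting.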
\subsection{External scheduling}
The external-scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA, with results in Table \ref{tab:ext-sched}. As with the other benchmarks, all omitted tests are functionally identical to one of these tests.
\begin{figure}
\begin{cfacode}[caption={Benchmark code for external scheduling},label={lst:ext-sched}]
volatile int go = 0;
monitor M {};
…
}
\end{cfacode}
\end{figure}
\begin{table}
\begin{center}
\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
…
\caption{External scheduling comparison. All numbers are in nanoseconds (\si{\nano\second}).}
\label{tab:ext-sched}
\end{table}
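Plain C has no equivalent of \code{waitfor}, but its blocking behaviour can be approximated with a rendezvous. The sketch below is a hypothetical emulation built on two POSIX unnamed semaphores. \code{sem_init} is not supported on every platform; macOS, for example, does not provide unnamed semaphores. The two sides work as follows:
\begin{itemize}
\item \code{waitfor_call} plays the role of \code{waitfor(call, m)}: it admits exactly one call and blocks until that call's body has finished;
\item \code{call} blocks until it is accepted, then runs its body.
\end{itemize}
The sketch deliberately ignores what makes \code{waitfor} a monitor feature, such as mutual exclusion with other routines and choosing among several accepted routines.

\begin{ccode}
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t entry;             /* posted by the acceptor: one call may run */
static sem_t done;              /* posted by the caller: its body finished */
static unsigned long accepted;  /* number of call bodies executed */

/* Caller side: block until accepted, then run the routine body. */
static void call(void) {
        sem_wait(&entry);
        accepted++;             /* stands in for the routine body */
        sem_post(&done);
}

/* Acceptor side: analogue of one waitfor(call, m); returns only after
   exactly one accepted call has completed. */
static void waitfor_call(void) {
        sem_post(&entry);
        sem_wait(&done);
}

static void *caller_thread(void *arg) {
        unsigned long n = *(unsigned long *)arg;
        for (unsigned long i = 0; i < n; i++)
                call();
        return NULL;
}

/* Accepts n calls made by a second thread; returns how many ran. */
static unsigned long run_accepts(unsigned long n) {
        pthread_t t;
        int rc = sem_init(&entry, 0, 0);
        assert(rc == 0);
        rc = sem_init(&done, 0, 0);
        assert(rc == 0);
        accepted = 0;
        rc = pthread_create(&t, NULL, caller_thread, &n);
        assert(rc == 0);
        for (unsigned long i = 0; i < n; i++)
                waitfor_call();
        rc = pthread_join(t, NULL);
        assert(rc == 0);
        (void)rc;
        sem_destroy(&entry);
        sem_destroy(&done);
        return accepted;
}
\end{ccode}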
\subsection{Object creation}
The last benchmark measures the cost of creating concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads, with results in Table \ref{tab:creation}. As with the other benchmarks, all omitted tests are functionally identical to one of these tests. Note that the call stacks of \CFA coroutines are created lazily; therefore, without priming the coroutine, the creation cost is very low.
\begin{figure}
\begin{center}
pthread
\begin{ccode}
int main() {
        BENCH(
                for(size_t i=0; i<n; i++) {
                        pthread_t thread;
                        if(pthread_create(&thread,NULL,foo,NULL)!=0) {
                                perror( "failure" );
                                return 1;
                        }
                        if(pthread_join(thread, NULL)!=0) {
                                perror( "failure" );
                                return 1;
…
        printf("%llu\n", result);
}
\end{ccode}
\CFA Threads
\begin{cfacode}
…
                result
        )
        printf("%llu\n", result);
}
\end{cfacode}
\end{center}
\begin{cfacode}[caption={Benchmark code for pthreads and \CFA to measure object creation},label={lst:creation}]
\end{cfacode}
\end{figure}
\begin{table}
\begin{center}
\begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |}
…
\hline
Pthreads                        & 26974.5       & 26977         & 124.12 \\
\CFA Coroutine Lazy             & 5             & 5             & 0      \\
\CFA Coroutine Eager            & 335.0         & 357.67        & 34.2   \\
\CFA Thread                     & 1122.5        & 1109.86       & 36.54  \\
\uC Coroutine                   & 106           & 107.04        & 1.61   \\
\uC Thread                      & 525.5         & 533.04        & 11.14  \\
Goroutine                       & 525.5         & 533.04        & 11.14  \\
Java Thread                     & 525.5         & 533.04        & 11.14  \\
\hline
\end{tabular}
…
\caption{Creation comparison. All numbers are in nanoseconds (\si{\nano\second}).}
\label{tab:creation}
\end{table}
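The text notes that \CFA coroutine call stacks are created lazily, which makes an unprimed coroutine cheap to create. The following hypothetical C sketch illustrates that idea under simple assumptions:
\begin{itemize}
\item creating a \code{lazy_coroutine} only records the body and the requested stack size;
\item the stack is allocated by \code{co_prime} the first time the coroutine would be resumed.
\end{itemize}
The type and function names are illustrative and do not correspond to the \CFA runtime.

\begin{ccode}
#include <assert.h>
#include <stdlib.h>

/* Hypothetical coroutine descriptor illustrating lazy stack creation. */
typedef struct {
        void (*body)(void);     /* coroutine main, to run once resumed (not done here) */
        size_t stack_size;      /* requested stack size in bytes */
        void *stack;            /* stays NULL until the coroutine is primed */
} lazy_coroutine;

/* Creation only fills in fields: no allocation happens here. */
static void co_create(lazy_coroutine *co, void (*body)(void), size_t stack_size) {
        assert(stack_size > 0);
        co->body = body;
        co->stack_size = stack_size;
        co->stack = NULL;
}

/* Priming allocates the stack once; later calls find it already there.
   Returns 0 on success, -1 if the allocation fails. */
static int co_prime(lazy_coroutine *co) {
        if (co->stack == NULL) {
                co->stack = malloc(co->stack_size);
                if (co->stack == NULL)
                        return -1;
        }
        return 0;
}

static void co_destroy(lazy_coroutine *co) {
        free(co->stack);        /* free(NULL) is a no-op for an unprimed coroutine */
        co->stack = NULL;
}
\end{ccode}

An eager variant would simply call \code{co_prime} at the end of \code{co_create}, moving the allocation cost into creation.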

% Changeset cf966b5 for doc/proposals/concurrency/text/results.tex
