Changeset cf966b5 for doc/proposals/concurrency/text/results.tex
- Timestamp:
- Nov 28, 2017, 3:52:01 PM (6 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 6c2ba38
- Parents:
- f7a4f89
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/proposals/concurrency/text/results.tex
rf7a4f89 rcf966b5 6 6 \section{Machine setup} 7 7 Table \ref{tab:machine} shows the characteristics of the machine used to run the benchmarks. All tests where made on this machine. 8 \begin{ figure}[H]8 \begin{table}[H] 9 9 \begin{center} 10 10 \begin{tabular}{| l | r | l | r |} … … 28 28 \hline 29 29 Compiler & GCC 6.3.0 & Translator & CFA 1.0.0 \\ 30 \hline 31 Java version & OpenJDK-9 & Go version & 1.9.2 \\ 30 32 \hline 31 33 \end{tabular} … … 33 35 \caption{Machine setup used for the tests} 34 36 \label{tab:machine} 35 \end{ figure}37 \end{table} 36 38 37 39 \section{Micro benchmarks} … … 39 41 \begin{pseudo} 40 42 #define BENCH(run, result) 41 gettime();43 before = gettime(); 42 44 run; 43 gettime();45 after = gettime(); 44 46 result = (after - before) / N; 45 47 \end{pseudo} … … 47 49 48 50 \subsection{Context-switching} 49 The first interesting benchmark is to measure how long context-switches take. The simplest approach to do this is to yield on a thread, which executes a 2-step context switch. In order to make the comparison fair, coroutines also execute a 2-step context-switch , which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads. All omitted tests are functionally identical to one of these tests. The results can be shown in table \ref{tab:ctx-switch}.51 The first interesting benchmark is to measure how long context-switches take. The simplest approach to do this is to yield on a thread, which executes a 2-step context switch. In order to make the comparison fair, coroutines also execute a 2-step context-switch (\gls{uthread} to \gls{kthread} then \gls{kthread} to \gls{uthread}), which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads whith the results in table \ref{tab:ctx-switch}. All omitted tests are functionally identical to one of these tests. 50 52 \begin{figure} 51 53 \begin{multicols}{2} … … 88 90 \end{cfacode} 89 91 \end{multicols} 90 \caption{\CFA benchmark code used to measure context-switches for coroutines and threads.} 91 \label{lst:ctx-switch} 92 \end{figure} 93 94 \begin{figure} 95 \begin{center} 96 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |} 97 \cline{2-4} 98 \multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\ 99 \hline 100 Kernel Threads & 239 & 242.57 & 5.54 \\ 101 \CFA Coroutines & 38 & 38 & 0 \\ 102 \CFA Threads & 102 & 102.39 & 1.57 \\ 103 \uC Coroutines & 46 & 46.68 & 0.47 \\ 104 \uC Threads & 98 & 99.39 & 1.52 \\ 92 \begin{cfacode}[caption={\CFA benchmark code used to measure context-switches for coroutines and threads.},label={lst:ctx-switch}] 93 \end{cfacode} 94 \end{figure} 95 96 \begin{table} 97 \begin{center} 98 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |} 99 \cline{2-4} 100 \multicolumn{1}{c |}{} & \multicolumn{1}{c |}{ Median } &\multicolumn{1}{c |}{ Average } & \multicolumn{1}{c |}{ Standard Deviation} \\ 101 \hline 102 Kernel Thread & 239 & 242.57 & 5.54 \\ 103 \CFA Coroutine & 38 & 38 & 0 \\ 104 \CFA Thread & 102 & 102.39 & 1.57 \\ 105 \uC Coroutine & 46 & 46.68 & 0.47 \\ 106 \uC Thread & 98 & 99.39 & 1.52 \\ 107 Goroutine & 148 & 148.0 & 0 \\ 108 Java Thread & 271 & 271.0 & 0 \\ 105 109 \hline 106 110 \end{tabular} … … 108 112 \caption{Context Switch comparison. All numbers are in nanoseconds(\si{\nano\second})} 109 113 \label{tab:ctx-switch} 110 \end{ figure}114 \end{table} 111 115 112 116 \subsection{Mutual-exclusion} 113 The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest approach is to measure how long it takes enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock are also measured. The results can be shown in table \ref{tab:mutex}.114 115 \begin{figure} 116 \begin{cfacode} 117 The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest approach is to measure how long it takes to enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock are also measured. The results can be shown in table \ref{tab:mutex}. 118 119 \begin{figure} 120 \begin{cfacode}[caption={\CFA benchmark code used to measure mutex routines.},label={lst:mutex}] 117 121 monitor M {}; 118 122 void __attribute__((noinline)) call( M & mutex m /*, m2, m3, m4*/ ) {} … … 129 133 } 130 134 \end{cfacode} 131 \caption{\CFA benchmark code used to measure mutex routines.} 132 \label{lst:mutex} 133 \end{figure} 134 135 \begin{figure} 135 \end{figure} 136 137 \begin{table} 136 138 \begin{center} 137 139 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |} … … 140 142 \hline 141 143 C routine & 2 & 2 & 0 \\ 144 FetchAdd + FetchSub & 2 & 2 & 0 \\ 142 145 Pthreads Mutex Lock & 31 & 31.86 & 0.99 \\ 143 146 \uC \code{monitor} member routine & 30 & 30 & 0 \\ … … 145 148 \CFA \code{mutex} routine, 2 argument & 82 & 83 & 1.93 \\ 146 149 \CFA \code{mutex} routine, 4 argument & 165 & 161.15 & 54.04 \\ 150 Java synchronized routine & 165 & 161.15 & 54.04 \\ 147 151 \hline 148 152 \end{tabular} … … 150 154 \caption{Mutex routine comparison. All numbers are in nanoseconds(\si{\nano\second})} 151 155 \label{tab:mutex} 152 \end{ figure}156 \end{table} 153 157 154 158 \subsection{Internal scheduling} 155 The Internal scheduling benchmark measures the cost of waiting on and signalling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA. The results can be shown intable \ref{tab:int-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.156 157 \begin{figure} 158 \begin{cfacode} 159 The internal-scheduling benchmark measures the cost of waiting on and signalling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA, with results table \ref{tab:int-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. 160 161 \begin{figure} 162 \begin{cfacode}[caption={Benchmark code for internal scheduling},label={lst:int-sched}] 159 163 volatile int go = 0; 160 164 condition c; … … 187 191 } 188 192 \end{cfacode} 189 \caption{Benchmark code for internal scheduling} 190 \label{lst:int-sched} 191 \end{figure} 192 193 \begin{figure} 193 \end{figure} 194 195 \begin{table} 194 196 \begin{center} 195 197 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |} … … 201 203 \CFA \code{signal}, 2 \code{monitor} & 1531 & 1550.75 & 32.77 \\ 202 204 \CFA \code{signal}, 4 \code{monitor} & 2288.5 & 2326.86 & 54.73 \\ 205 Java \code{notify} & 2288.5 & 2326.86 & 54.73 \\ 203 206 \hline 204 207 \end{tabular} … … 206 209 \caption{Internal scheduling comparison. All numbers are in nanoseconds(\si{\nano\second})} 207 210 \label{tab:int-sched} 208 \end{ figure}211 \end{table} 209 212 210 213 \subsection{External scheduling} 211 The Internal scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA . The results can be shownin table \ref{tab:ext-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.212 213 \begin{figure} 214 \begin{cfacode} 214 The Internal scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA, with results in table \ref{tab:ext-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. 215 216 \begin{figure} 217 \begin{cfacode}[caption={Benchmark code for external scheduling},label={lst:ext-sched}] 215 218 volatile int go = 0; 216 219 monitor M {}; … … 242 245 } 243 246 \end{cfacode} 244 \caption{Benchmark code for external scheduling} 245 \label{lst:ext-sched} 246 \end{figure} 247 248 \begin{figure} 247 \end{figure} 248 249 \begin{table} 249 250 \begin{center} 250 251 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |} … … 261 262 \caption{External scheduling comparison. All numbers are in nanoseconds(\si{\nano\second})} 262 263 \label{tab:ext-sched} 263 \end{ figure}264 \end{table} 264 265 265 266 \subsection{Object creation} 266 Finally, the last benchmark measur ed is the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads. The results can beshown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the call-stacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low.267 268 \begin{figure} 269 \begin{ multicols}{2}267 Finally, the last benchmark measurs the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads, with results shown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the call-stacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low. 268 269 \begin{figure} 270 \begin{center} 270 271 pthread 271 \begin{c facode}272 \begin{ccode} 272 273 int main() { 273 274 BENCH( 274 275 for(size_t i=0; i<n; i++) { 275 276 pthread_t thread; 276 if(pthread_create( 277 &thread, 278 NULL, 279 foo, 280 NULL 281 ) < 0) { 277 if(pthread_create(&thread,NULL,foo,NULL)<0) { 282 278 perror( "failure" ); 283 279 return 1; 284 280 } 285 281 286 if(pthread_join( 287 thread, 288 NULL 289 ) < 0) { 282 if(pthread_join(thread, NULL)<0) { 290 283 perror( "failure" ); 291 284 return 1; … … 296 289 printf("%llu\n", result); 297 290 } 298 \end{cfacode} 299 \columnbreak 291 \end{ccode} 292 293 294 300 295 \CFA Threads 301 296 \begin{cfacode} … … 307 302 result 308 303 ) 309 310 printf("%llu\n", result); 311 } 312 \end{cfacode} 313 \end{multicols} 314 \caption{Benchmark code for pthreads and \CFA to measure object creation} 315 \label{lst:creation} 316 \end{figure} 317 318 \begin{figure} 304 printf("%llu\n", result); 305 } 306 \end{cfacode} 307 \end{center} 308 \begin{cfacode}[caption={Benchmark code for pthreads and \CFA to measure object creation},label={lst:creation}] 309 \end{cfacode} 310 \end{figure} 311 312 \begin{table} 319 313 \begin{center} 320 314 \begin{tabular}{| l | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] | S[table-format=5.2,table-number-alignment=right] |} … … 323 317 \hline 324 318 Pthreads & 26974.5 & 26977 & 124.12 \\ 325 \CFA Coroutines Lazy & 5 & 5 & 0 \\ 326 \CFA Coroutines Eager & 335.0 & 357.67 & 34.2 \\ 327 \CFA Threads & 1122.5 & 1109.86 & 36.54 \\ 328 \uC Coroutines & 106 & 107.04 & 1.61 \\ 329 \uC Threads & 525.5 & 533.04 & 11.14 \\ 319 \CFA Coroutine Lazy & 5 & 5 & 0 \\ 320 \CFA Coroutine Eager & 335.0 & 357.67 & 34.2 \\ 321 \CFA Thread & 1122.5 & 1109.86 & 36.54 \\ 322 \uC Coroutine & 106 & 107.04 & 1.61 \\ 323 \uC Thread & 525.5 & 533.04 & 11.14 \\ 324 Goroutine & 525.5 & 533.04 & 11.14 \\ 325 Java Thread & 525.5 & 533.04 & 11.14 \\ 330 326 \hline 331 327 \end{tabular} … … 333 329 \caption{Creation comparison. All numbers are in nanoseconds(\si{\nano\second})} 334 330 \label{tab:creation} 335 \end{ figure}331 \end{table}
Note: See TracChangeset
for help on using the changeset viewer.