Changeset 6090518 for doc/proposals/concurrency/text/results.tex
- Timestamp:
- Nov 29, 2017, 4:33:46 PM
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- f0743a7
- Parents:
- 9d48a17
- File:
- 1 edited
--- doc/proposals/concurrency/text/results.tex (r9d48a17)
+++ doc/proposals/concurrency/text/results.tex (r6090518)
@@ -1,9 +1,9 @@
 % ======================================================================
 % ======================================================================
-\chapter{Performance results} \label{results}
+\chapter{Performance Results} \label{results}
 % ======================================================================
 % ======================================================================
-\section{Machine setup}
-Table \ref{tab:machine} shows the characteristics of the machine used to run the benchmarks. All tests where made on this machine.
+\section{Machine Setup}
+Table \ref{tab:machine} shows the characteristics of the machine used to run the benchmarks. All tests were made on this machine.
 \begin{table}[H]
 \begin{center}
@@ -37,5 +37,5 @@
 \end{table}
 
-\section{Micro benchmarks}
+\section{Micro Benchmarks}
 All benchmarks are run using the same harness to produce the results, seen as the \code{BENCH()} macro in the following examples. This macro uses the following logic to benchmark the code :
 \begin{pseudo}
@@ -46,8 +46,8 @@
 result = (after - before) / N;
 \end{pseudo}
-The method used to get time is \code{clock_gettime(CLOCK_THREAD_CPUTIME_ID);}. Each benchmark is using many iterations of a simple call to measure the cost of the call. The specific number of iteration depends on the specific benchmark.
-
-\subsection{Context-switching}
-The first interesting benchmark is to measure how long context-switches take. The simplest approach to do this is to yield on a thread, which executes a 2-step context switch. In order to make the comparison fair, coroutines also execute a 2-step context-switch (\gls{uthread} to \gls{kthread} then \gls{kthread} to \gls{uthread}), which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads whith the results in table \ref{tab:ctx-switch}. All omitted tests are functionally identical to one of these tests.
+The method used to get time is \code{clock_gettime(CLOCK_THREAD_CPUTIME_ID);}. Each benchmark is using many iterations of a simple call to measure the cost of the call. The specific number of iterations depends on the specific benchmark.
+
+\subsection{Context-Switching}
+The first interesting benchmark is to measure how long context-switches take. The simplest approach to do this is to yield on a thread, which executes a 2-step context switch. In order to make the comparison fair, coroutines also execute a 2-step context-switch (\gls{uthread} to \gls{kthread} then \gls{kthread} to \gls{uthread}), which is a resume/suspend cycle instead of a yield. Listing \ref{lst:ctx-switch} shows the code for coroutines and threads with the results in table \ref{tab:ctx-switch}. All omitted tests are functionally identical to one of these tests.
 \begin{figure}
 \begin{multicols}{2}
@@ -114,6 +114,6 @@
 \end{table}
 
-\subsection{Mutual-exclusion}
-The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest approach is to measure how long it takes to enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock are also measured. The results can be shown in table \ref{tab:mutex}.
+\subsection{Mutual-Exclusion}
+The next interesting benchmark is to measure the overhead to enter/leave a critical-section. For monitors, the simplest approach is to measure how long it takes to enter and leave a monitor routine. Listing \ref{lst:mutex} shows the code for \CFA. To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a pthread mutex lock is also measured. The results can be shown in table \ref{tab:mutex}.
 
 \begin{figure}
@@ -156,5 +156,5 @@
 \end{table}
 
-\subsection{Internal scheduling}
+\subsection{Internal Scheduling}
 The internal-scheduling benchmark measures the cost of waiting on and signalling a condition variable. Listing \ref{lst:int-sched} shows the code for \CFA, with results table \ref{tab:int-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
 
@@ -211,5 +211,5 @@
 \end{table}
 
-\subsection{External scheduling}
+\subsection{External Scheduling}
 The Internal scheduling benchmark measures the cost of the \code{waitfor} statement (\code{_Accept} in \uC). Listing \ref{lst:ext-sched} shows the code for \CFA, with results in table \ref{tab:ext-sched}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests.
 
@@ -264,6 +264,6 @@
 \end{table}
 
-\subsection{Object creation}
-Finally, the last benchmark measurs the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads, with results shown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the call-stacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low.
+\subsection{Object Creation}
+Finally, the last benchmark measures the cost of creation for concurrent objects. Listing \ref{lst:creation} shows the code for pthreads and \CFA threads, with results shown in table \ref{tab:creation}. As with all other benchmarks, all omitted tests are functionally identical to one of these tests. The only note here is that the call stacks of \CFA coroutines are lazily created, therefore without priming the coroutine, the creation cost is very low.
 
 \begin{figure}
@@ -327,5 +327,5 @@
 \end{tabular}
 \end{center}
-\caption{Creation comparison. All numbers are in nanoseconds(\si{\nano\second})}
+\caption{Creation comparison. All numbers are in nanoseconds(\si{\nano\second}).}
 \label{tab:creation}
 \end{table}
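
The harness logic quoted in the diff (read the clock, run N iterations, read the clock again, divide the difference by N) can be sketched in plain C for the pthread mutex baseline that the Mutual-Exclusion subsection mentions. This is only an illustrative sketch, not the project's \code{BENCH()} macro: the function names thread_cpu_ns and bench_mutex_ns are hypothetical, and the only details taken from the text are the use of clock_gettime(CLOCK_THREAD_CPUTIME_ID) and the (after - before) / N computation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed by the calling thread, in ns. */
static long long thread_cpu_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (long long)ts.tv_sec * 1000000000LL + (long long)ts.tv_nsec;
}

/* Hypothetical benchmark: average cost in ns of one uncontended
 * pthread mutex lock/unlock pair, measured over n iterations. */
double bench_mutex_ns(size_t n) {
    if (n == 0) {
        return 0.0;                     /* nothing to measure */
    }

    pthread_mutex_t lock;
    pthread_mutex_init(&lock, NULL);

    long long before = thread_cpu_ns();
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_lock(&lock);      /* enter the critical section */
        pthread_mutex_unlock(&lock);    /* leave the critical section */
    }
    long long after = thread_cpu_ns();

    pthread_mutex_destroy(&lock);

    /* Same formula as the harness: result = (after - before) / N */
    return (double)(after - before) / (double)n;
}
```

Calling bench_mutex_ns with a large iteration count (for example ten million) from a main function and printing the result gives a per-call figure in nanoseconds. The value includes loop overhead and depends on the machine, so it is not directly comparable to the numbers in the tables of results.tex.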