Changeset fe9cf9e
- Timestamp: Jun 4, 2020, 9:17:02 AM
- Branches: ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children: aabb846
- Parents: 04b4a71
- Files: 1 edited
doc/papers/concurrency/Paper.tex
--- r04b4a71
+++ rfe9cf9e
@@ -269 +269 @@
 
 \abstract[Summary]{
-\CFA is a polymorphic, non-object-oriented, concurrent, backwards-compatible extension of the C programming language.
+\CFA is a polymorphic, non-object-oriented, concurrent, backwards compatible extension of the C programming language.
 This paper discusses the design philosophy and implementation of its advanced control-flow and concurrent/parallel features, along with the supporting runtime written in \CFA.
 These features are created from scratch as ISO C has only low-level and/or unimplemented concurrency, so C programmers continue to rely on library approaches like pthreads.
@@ -301 +301 @@
 The TIOBE index~\cite{TIOBE} for May 2020 ranks the top five \emph{popular} programming languages as C 17\%, Java 16\%, Python 9\%, \CC 6\%, and \Csharp 4\% = 52\%, and over the past 30 years, C has always ranked either first or second in popularity.}
 allowing immediate dissemination.
-This paper discusses the design philosophy and implementation of advanced language-level control-flow and concurrent/parallel features in \CFA and its runtime, which is written entirely in \CFA.
-The \CFA control-flow framework extends ISO \Celeven~\cite{C11} with new call/return and concurrent/parallel control-flow.
+This paper discusses the design philosophy and implementation of \CFA's advanced control-flow and concurrent/parallel features, along with the supporting runtime written in \CFA.
 
 % The call/return extensions retain state between callee and caller versus losing the callee's state on return;
 % the concurrency extensions allow high-level management of threads.
 
+The \CFA control-flow framework extends ISO \Celeven~\cite{C11} with new call/return and concurrent/parallel control-flow.
 Call/return control-flow with argument and parameter passing appeared in the first programming languages.
-Over the past 50 years, call/return has been augmented with features like static/dynamic call, exceptions (multi-level return) and generators/coroutines (retain state between calls).
+Over the past 50 years, call/return has been augmented with features like static and dynamic call, exceptions (multi-level return) and generators/coroutines (see Section~\ref{s:StatefulFunction}).
 While \CFA has mechanisms for dynamic call (algebraic effects~\cite{Zhang19}) and exceptions\footnote{
 \CFA exception handling will be presented in a separate paper.
@@ -314 +314 @@
 \newterm{Coroutining} was introduced by Conway~\cite{Conway63} (1963), discussed by Knuth~\cite[\S~1.4.2]{Knuth73V1}, implemented in Simula67~\cite{Simula67}, formalized by Marlin~\cite{Marlin80}, and is now popular and appears in old and new programming languages: CLU~\cite{CLU}, \Csharp~\cite{Csharp}, Ruby~\cite{Ruby}, Python~\cite{Python}, JavaScript~\cite{JavaScript}, Lua~\cite{Lua}, \CCtwenty~\cite{C++20Coroutine19}.
 Coroutining is sequential execution requiring direct handoff among coroutines, \ie only the programmer is controlling execution order.
-If coroutines transfer to an internal event-engine for scheduling the next coroutines, the program transitions into the realm of concurrency~\cite[\S~3]{Buhr05a}.
+If coroutines transfer to an internal event-engine for scheduling the next coroutines (as in async-await), the program transitions into the realm of concurrency~\cite[\S~3]{Buhr05a}.
 Coroutines are only a stepping stone towards concurrency where the commonality is that coroutines and threads retain state between calls.
@@ -324 +324 @@
 Kernel threading was chosen, largely because of its simplicity and fit with the simpler operating systems and hardware architectures at the time, which gave it a performance advantage~\cite{Drepper03}.
 Libraries like pthreads were developed for C, and the Solaris operating-system switched from user (JDK 1.1~\cite{JDK1.1}) to kernel threads.
-As a result, many current languages implementations adopt the 1:1 kernel-threading model, like Java (Scala), Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, C\#~\cite{Csharp} and Rust~\cite{Rust}, with a variety of presentation mechanisms.
+As a result, many languages adopt the 1:1 kernel-threading model, like Java (Scala), Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, C\#~\cite{Csharp} and Rust~\cite{Rust}, with a variety of presentation mechanisms.
 From 2000 onwards, several language implementations have championed the M:N user-threading model, like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book}, including putting green threads back into Java~\cite{Quasar}, and many user-threading libraries have appeared~\cite{Qthreads,MPC,Marcel}.
 The main argument for user-level threading is that it is lighter weight than kernel threading because locking and context switching do not cross the kernel boundary, so there is less restriction on programming styles that encourages large numbers of threads performing medium-sized work to facilitate load balancing by the runtime~\cite{Verch12}.
 As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{Adya02,vonBehren03}.
-Finally, performant user-threading implementations, both time and space, meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
+Finally, performant user-threading implementations, both in time and space, meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
 
 A further effort over the past two decades is the development of language memory models to deal with the conflict between language features and compiler/hardware optimizations, \eg some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.
@@ -338 +338 @@
 
 Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary.
-Two concurrency violations of this philosophy are \emph{spurious or random wakeup}~\cite[\S~9]{Buhr05a}) and \emph{barging}\footnote{
+Two concurrency violations of this philosophy are \emph{spurious} or \emph{random wakeup}~\cite[\S~9]{Buhr05a}, and \emph{barging}\footnote{
 Barging is competitive succession instead of direct handoff, \ie after a lock is released both arriving and preexisting waiter threads compete to acquire the lock.
 Hence, an arriving thread can temporally \emph{barge} ahead of threads already waiting for an event, which can repeat indefinitely leading to starvation of waiter threads.
@@ -386 +386 @@
 Section~\ref{s:Monitor} shows how both mutual exclusion and synchronization are safely embedded in the @monitor@ and @thread@ constructs.
 Section~\ref{s:CFARuntimeStructure} describes the large-scale mechanism to structure threads and virtual processors (kernel threads).
-Section~\ref{s:Performance} uses a series of microbenchmarks to compare \CFA threading with pthreads, Java 11.0.6, Go 1.12.6, Rust 1.37.0, Python 3.7.6, Node.js 12.14.1, and \uC 7.0.0.
+Section~\ref{s:Performance} uses microbenchmarks to compare \CFA threading with pthreads, Java 11.0.6, Go 1.12.6, Rust 1.37.0, Python 3.7.6, Node.js 12.14.1, and \uC 7.0.0.
 
@@ -395 +395 @@
 To this end, the control-flow features created for \CFA are based on the fundamental properties of any language with function-stack control-flow (see also \uC~\cite[pp.~140-142]{uC++}).
 The fundamental properties are execution state, thread, and mutual-exclusion/synchronization (MES).
-These independent properties can be used to compose different language features, forming a compositional hierarchy, where the combination of all three is the most advanced feature, called a thread/task/process.
+These independent properties can be used to compose different language features, forming a compositional hierarchy, where the combination of all three is the most advanced feature, called a thread.
 While it is possible for a language to only provide threads for composing programs~\cite{Hermes90}, this unnecessarily complicates and makes inefficient solutions to certain classes of problems.
 As is shown, each of the non-rejected composed language features solves a particular set of problems, and hence, has a defensible position in a programming language.
@@ -405 +405 @@
 is the state information needed by a control-flow feature to initialize, manage compute data and execution location(s), and de-initialize, \eg calling a function initializes a stack frame including contained objects with constructors, manages local data in blocks and return locations during calls, and de-initializes the frame by running any object destructors and management operations.
 State is retained in fixed-sized aggregate structures (objects) and dynamic-sized stack(s), often allocated in the heap(s) managed by the runtime system.
-The lifetime of the state varies with the control-flow feature, where longer life-time and dynamic size provide greater power but also increase usage complexity and cost.
+The lifetime of state varies with the control-flow feature, where longer life-time and dynamic size provide greater power but also increase usage complexity and cost.
 Control-flow transfers among execution states in multiple ways, such as function call, context switch, asynchronous await, etc.
 Because the programming language determines what constitutes an execution state, implicitly manages this state, and defines movement mechanisms among states, execution state is an elementary property of the semantics of a programming language.
@@ -411 +411 @@
 
 \item[\newterm{threading}:]
-is execution of code that occurs independently of other execution, \ie the execution resulting from a thread is sequential.
+is execution of code that occurs independently of other execution, where an individual thread's execution is sequential.
 Multiple threads provide \emph{concurrent execution};
 concurrent execution becomes parallel when run on multiple processing units, \eg hyper-threading, cores, or sockets.
-There must be language mechanisms to create, block and unblock, and join with a thread.
+There must be language mechanisms to create, block and unblock, and join with a thread, even if the mechanism is indirect.
 
 \item[\newterm{MES}:]
@@ -421 +421 @@
 Limiting MES, \eg no access to shared data, results in contrived solutions and inefficiency on multi-core von Neumann computers where shared memory is a foundational aspect of its design.
 \end{description}
-These properties are fundamental because they cannot be built from existing language features, \eg a basic programming language like C99~\cite{C99} cannot create new control-flow features, concurrency, or provide MES using atomic hardware mechanisms.
+These properties are fundamental because they cannot be built from existing language features, \eg a basic programming language like C99~\cite{C99} cannot create new control-flow features, concurrency, or provide MES without atomic hardware mechanisms.
@@ -432 +432 @@
 Table~\ref{t:ExecutionPropertyComposition} shows all combinations of the three fundamental execution properties available to language designers.
 (When doing combination case-analysis, not all combinations are meaningful.)
-The combinations of state, thread, and mutual exclusion compose a hierarchy of control-flow features all of which have appeared in prior programming languages, where each of these languages have found the feature useful.
+The combinations of state, thread, and MES compose a hierarchy of control-flow features all of which have appeared in prior programming languages, where each of these languages have found the feature useful.
 To understand the table, it is important to review the basic von Neumann execution requirement of at least one thread and execution state providing some form of call stack.
 For table entries missing these minimal components, the property is borrowed from the invoker (caller).
@@ -468 +468 @@
 A @mutex@ structure, often called a \newterm{monitor}, provides a high-level interface for race-free access of shared data in concurrent programming-languages.
 Case 3 is case 1 where the structure can implicitly retain execution state and access functions use this execution state to resume/suspend across \emph{callers}, but resume/suspend does not retain a function's local state.
-A stackless structure, often called a \newterm{generator} or \emph{iterator}, is \newterm{stackless} because it still borrow the caller's stack and thread, where the stack is used only to preserve state across its callees not callers.
+A stackless structure, often called a \newterm{generator} or \emph{iterator}, is \newterm{stackless} because it still borrow the caller's stack and thread, but the stack is used only to preserve state across its callees not callers.
 Generators provide the first step toward directly solving problems like finite-state machines that retain data and execution state between calls, whereas normal functions restart on each call.
 Case 4 is cases 2 and 3 with thread safety during execution of the generator's access functions.
@@ -480 +480 @@
 Cases 11 and 12 are a stackful thread with and without safe access to shared state.
 A thread is the language mechanism to start another thread of control in a program with growable execution state for call/return execution.
-In general, more execution properties increase the cost of creation and execution along with complexity of usage.
+In general, language constructs with more execution properties increase the cost of creation and execution along with complexity of usage.
 
 Given the execution-properties taxonomy, programmers now ask three basic questions: is state necessary across callers and how much, is a separate thread necessary, is thread safety necessary.
@@ -594 +594 @@
 &
 \begin{cfa}
-void * rtn( void * arg ) { ... }
+void * `rtn`( void * arg ) { ... }
 int i = 3, rc;
 pthread_t t; $\C{// thread id}$
@@ -3037 +3037 @@
 \end{multicols}
 
+\vspace*{-10pt}
+\paragraph{Internal Scheduling}
+
+Internal scheduling is measured using a cycle of two threads signalling and waiting.
+Figure~\ref{f:schedint} shows the code for \CFA, with results in Table~\ref{t:schedint}.
+Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
+Java scheduling is significantly greater because the benchmark explicitly creates multiple thread in order to prevent the JIT from making the program sequential, \ie removing all locking.
+
+\begin{multicols}{2}
+\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
+\begin{cfa}
+volatile int go = 0;
+@condition c;@
+@monitor@ M {} m1/*, m2, m3, m4*/;
+void call( M & @mutex p1/*, p2, p3, p4*/@ ) {
+	@signal( c );@
+}
+void wait( M & @mutex p1/*, p2, p3, p4*/@ ) {
+	go = 1; // continue other thread
+	for ( N ) { @wait( c );@ } );
+}
+thread T {};
+void main( T & ) {
+	while ( go == 0 ) { yield(); } // waiter must start first
+	BENCH( for ( N ) { call( m1/*, m2, m3, m4*/ ); } )
+	sout | result;
+}
+int main() {
+	T t;
+	wait( m1/*, m2, m3, m4*/ );
+}
+\end{cfa}
+\vspace*{-8pt}
+\captionof{figure}{\CFA Internal-scheduling benchmark}
+\label{f:schedint}
+
+\columnbreak
+
+\vspace*{-16pt}
+\captionof{table}{Internal-scheduling comparison (nanoseconds)}
+\label{t:schedint}
+\bigskip
+
+\begin{tabular}{@{}r*{3}{D{.}{.}{5.2}}@{}}
+\multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
+\CFA @signal@, 1 monitor & 364.4 & 364.2 & 4.4 \\
+\CFA @signal@, 2 monitor & 484.4 & 483.9 & 8.8 \\
+\CFA @signal@, 4 monitor & 709.1 & 707.7 & 15.0 \\
+\uC @signal@ monitor & 328.3 & 327.4 & 2.4 \\
+Rust cond. variable & 7514.0 & 7437.4 & 397.2 \\
+Java @notify@ monitor & 9623.0 & 9654.6 & 236.2 \\
+Pthreads cond. variable & 5553.7 & 5576.1 & 345.6
+\end{tabular}
+\end{multicols}
+
+
+\paragraph{External Scheduling}
+
+External scheduling is measured using a cycle of two threads calling and accepting the call using the @waitfor@ statement.
+Figure~\ref{f:schedext} shows the code for \CFA with results in Table~\ref{t:schedext}.
+Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
+
+\begin{multicols}{2}
+\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
+\vspace*{-16pt}
+\begin{cfa}
+@monitor@ M {} m1/*, m2, m3, m4*/;
+void call( M & @mutex p1/*, p2, p3, p4*/@ ) {}
+void wait( M & @mutex p1/*, p2, p3, p4*/@ ) {
+	for ( N ) { @waitfor( call : p1/*, p2, p3, p4*/ );@ }
+}
+thread T {};
+void main( T & ) {
+	BENCH( for ( N ) { call( m1/*, m2, m3, m4*/ ); } )
+	sout | result;
+}
+int main() {
+	T t;
+	wait( m1/*, m2, m3, m4*/ );
+}
+\end{cfa}
+\captionof{figure}{\CFA external-scheduling benchmark}
+\label{f:schedext}
+
+\columnbreak
+
+\vspace*{-16pt}
+\captionof{table}{External-scheduling comparison (nanoseconds)}
+\label{t:schedext}
+\begin{tabular}{@{}r*{3}{D{.}{.}{3.2}}@{}}
+\multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
+\CFA @waitfor@, 1 monitor & 367.1 & 365.3 & 5.0 \\
+\CFA @waitfor@, 2 monitor & 463.0 & 464.6 & 7.1 \\
+\CFA @waitfor@, 4 monitor & 689.6 & 696.2 & 21.5 \\
+\uC \lstinline[language=uC++]|_Accept| monitor & 328.2 & 329.1 & 3.4 \\
+Go \lstinline[language=Golang]|select| channel & 365.0 & 365.5 & 1.2
+\end{tabular}
+\end{multicols}
+
+\paragraph{Mutual-Exclusion}
+
+Uncontented mutual exclusion, which frequently occurs, is measured by entering and leaving a critical section.
+For monitors, entering and leaving a mutex function is measured, otherwise the language-appropriate mutex-lock is measured.
+For comparison, a spinning (versus blocking) test-and-test-set lock is presented.
+Figure~\ref{f:mutex} shows the code for \CFA with results in Table~\ref{t:mutex}.
+Note the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
+
+\begin{multicols}{2}
+\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
+\begin{cfa}
+@monitor@ M {} m1/*, m2, m3, m4*/;
+call( M & @mutex p1/*, p2, p3, p4*/@ ) {}
+int main() {
+	BENCH( for( N ) call( m1/*, m2, m3, m4*/ ); )
+	sout | result;
+}
+\end{cfa}
+\captionof{figure}{\CFA acquire/release mutex benchmark}
+\label{f:mutex}
+
+\columnbreak
+
+\vspace*{-16pt}
+\captionof{table}{Mutex comparison (nanoseconds)}
+\label{t:mutex}
+\begin{tabular}{@{}r*{3}{D{.}{.}{3.2}}@{}}
+\multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
+test-and-test-set lock & 19.1 & 18.9 & 0.4 \\
+\CFA @mutex@ function, 1 arg. & 48.3 & 47.8 & 0.9 \\
+\CFA @mutex@ function, 2 arg. & 86.7 & 87.6 & 1.9 \\
+\CFA @mutex@ function, 4 arg. & 173.4 & 169.4 & 5.9 \\
+\uC @monitor@ member rtn. & 54.8 & 54.8 & 0.1 \\
+Goroutine mutex lock & 34.0 & 34.0 & 0.0 \\
+Rust mutex lock & 33.0 & 33.2 & 0.8 \\
+Java synchronized method & 31.0 & 31.0 & 0.0 \\
+Pthreads mutex Lock & 31.0 & 31.1 & 0.4
+\end{tabular}
+\end{multicols}
+
 \paragraph{Context Switching}
@@ -3104 +3243 @@
 \end{multicols}
 
-\vspace*{-10pt}
-\paragraph{Internal Scheduling}
-
-Internal scheduling is measured using a cycle of two threads signalling and waiting.
-Figure~\ref{f:schedint} shows the code for \CFA, with results in Table~\ref{t:schedint}.
-Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
-Java scheduling is significantly greater because the benchmark explicitly creates multiple thread in order to prevent the JIT from making the program sequential, \ie removing all locking.
-
-\begin{multicols}{2}
-\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
-\begin{cfa}
-volatile int go = 0;
-@condition c;@
-@monitor@ M {} m1/*, m2, m3, m4*/;
-void call( M & @mutex p1/*, p2, p3, p4*/@ ) {
-	@signal( c );@
-}
-void wait( M & @mutex p1/*, p2, p3, p4*/@ ) {
-	go = 1; // continue other thread
-	for ( N ) { @wait( c );@ } );
-}
-thread T {};
-void main( T & ) {
-	while ( go == 0 ) { yield(); } // waiter must start first
-	BENCH( for ( N ) { call( m1/*, m2, m3, m4*/ ); } )
-	sout | result;
-}
-int main() {
-	T t;
-	wait( m1/*, m2, m3, m4*/ );
-}
-\end{cfa}
-\vspace*{-8pt}
-\captionof{figure}{\CFA Internal-scheduling benchmark}
-\label{f:schedint}
-
-\columnbreak
-
-\vspace*{-16pt}
-\captionof{table}{Internal-scheduling comparison (nanoseconds)}
-\label{t:schedint}
-\bigskip
-
-\begin{tabular}{@{}r*{3}{D{.}{.}{5.2}}@{}}
-\multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
-\CFA @signal@, 1 monitor & 364.4 & 364.2 & 4.4 \\
-\CFA @signal@, 2 monitor & 484.4 & 483.9 & 8.8 \\
-\CFA @signal@, 4 monitor & 709.1 & 707.7 & 15.0 \\
-\uC @signal@ monitor & 328.3 & 327.4 & 2.4 \\
-Rust cond. variable & 7514.0 & 7437.4 & 397.2 \\
-Java @notify@ monitor & 9623.0 & 9654.6 & 236.2 \\
-Pthreads cond. variable & 5553.7 & 5576.1 & 345.6
-\end{tabular}
-\end{multicols}
-
-
-\paragraph{External Scheduling}
-
-External scheduling is measured using a cycle of two threads calling and accepting the call using the @waitfor@ statement.
-Figure~\ref{f:schedext} shows the code for \CFA with results in Table~\ref{t:schedext}.
-Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
-
-\begin{multicols}{2}
-\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
-\vspace*{-16pt}
-\begin{cfa}
-@monitor@ M {} m1/*, m2, m3, m4*/;
-void call( M & @mutex p1/*, p2, p3, p4*/@ ) {}
-void wait( M & @mutex p1/*, p2, p3, p4*/@ ) {
-	for ( N ) { @waitfor( call : p1/*, p2, p3, p4*/ );@ }
-}
-thread T {};
-void main( T & ) {
-	BENCH( for ( N ) { call( m1/*, m2, m3, m4*/ ); } )
-	sout | result;
-}
-int main() {
-	T t;
-	wait( m1/*, m2, m3, m4*/ );
-}
-\end{cfa}
-\captionof{figure}{\CFA external-scheduling benchmark}
-\label{f:schedext}
-
-\columnbreak
-
-\vspace*{-16pt}
-\captionof{table}{External-scheduling comparison (nanoseconds)}
-\label{t:schedext}
-\begin{tabular}{@{}r*{3}{D{.}{.}{3.2}}@{}}
-\multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
-\CFA @waitfor@, 1 monitor & 367.1 & 365.3 & 5.0 \\
-\CFA @waitfor@, 2 monitor & 463.0 & 464.6 & 7.1 \\
-\CFA @waitfor@, 4 monitor & 689.6 & 696.2 & 21.5 \\
-\uC \lstinline[language=uC++]|_Accept| monitor & 328.2 & 329.1 & 3.4 \\
-Go \lstinline[language=Golang]|select| channel & 365.0 & 365.5 & 1.2
-\end{tabular}
-\end{multicols}
-
-\paragraph{Mutual-Exclusion}
-
-Uncontented mutual exclusion, which frequently occurs, is measured by entering and leaving a critical section.
-For monitors, entering and leaving a mutex function is measured, otherwise the language-appropriate mutex-lock is measured.
-For comparison, a spinning (versus blocking) test-and-test-set lock is presented.
-Figure~\ref{f:mutex} shows the code for \CFA with results in Table~\ref{t:mutex}.
-Note the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
-
-\begin{multicols}{2}
-\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
-\begin{cfa}
-@monitor@ M {} m1/*, m2, m3, m4*/;
-call( M & @mutex p1/*, p2, p3, p4*/@ ) {}
-int main() {
-	BENCH( for( N ) call( m1/*, m2, m3, m4*/ ); )
-	sout | result;
-}
-\end{cfa}
-\captionof{figure}{\CFA acquire/release mutex benchmark}
-\label{f:mutex}
-
-\columnbreak
-
-\vspace*{-16pt}
-\captionof{table}{Mutex comparison (nanoseconds)}
-\label{t:mutex}
-\begin{tabular}{@{}r*{3}{D{.}{.}{3.2}}@{}}
-\multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
-test-and-test-set lock & 19.1 & 18.9 & 0.4 \\
-\CFA @mutex@ function, 1 arg. & 48.3 & 47.8 & 0.9 \\
-\CFA @mutex@ function, 2 arg. & 86.7 & 87.6 & 1.9 \\
-\CFA @mutex@ function, 4 arg. & 173.4 & 169.4 & 5.9 \\
-\uC @monitor@ member rtn. & 54.8 & 54.8 & 0.1 \\
-Goroutine mutex lock & 34.0 & 34.0 & 0.0 \\
-Rust mutex lock & 33.0 & 33.2 & 0.8 \\
-Java synchronized method & 31.0 & 31.0 & 0.0 \\
-Pthreads mutex Lock & 31.0 & 31.1 & 0.4
-\end{tabular}
-\end{multicols}
-
 
 \subsection{Discussion}