Changeset e98c7ab for doc/papers
- Timestamp:
- Jun 24, 2019, 3:05:10 PM (5 years ago)
- Branches:
- ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children:
- df9317bd
- Parents:
- 093a5d7
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/concurrency/Paper.tex
r093a5d7 re98c7ab 280 280 The runtime also ensures multiple monitors can be safely acquired \emph{simultaneously} (deadlock free), and this feature is fully integrated with all monitor synchronization mechanisms. 281 281 All control-flow features integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers. 282 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming -languages.282 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming languages. 283 283 }% 284 284 … … 301 301 In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a \emph{research vehicle} for new typing and control-flow capabilities on top of a highly popular programming language allowing immediate dissemination. 302 302 Within the \CFA framework, new control-flow features are created from scratch because ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}. 303 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error 303 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error-prone; 304 304 no high-level language concurrency features are defined. 305 305 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-9 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach. … … 312 312 As a result, languages like Java, Scala, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopt the 1:1 kernel-threading model, with a variety of presentation mechanisms. 313 313 From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,Marcel}, including putting green threads back into Java~\cite{Quasar}. 314 The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work -units to facilitate load balancing by the runtime~\cite{Verch12}.314 The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work units to facilitate load balancing by the runtime~\cite{Verch12}. 315 315 As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{Adya02,vonBehren03}. 316 316 Finally, performant user-threading implementations (both time and space) meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety. 317 317 318 A further effort over the past two decades is the development of language memory -models to deal with the conflict between language features and compiler/hardware optimizations, i.e., some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.318 A further effort over the past two decades is the development of language memory models to deal with the conflict between language features and compiler/hardware optimizations, \ie, some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}. 319 319 The consequence is that a language must provide sufficient tools to program around safety issues, as inline and library code is all sequential to the compiler. 320 One solution is low-level qualifiers and functions ( e.g., @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs.320 One solution is low-level qualifiers and functions (\eg, @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs. 321 321 A safer solution is high-level language constructs so the \emph{compiler} knows the optimization boundaries, and hence, provides implicit safety. 322 This problem is best know with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{322 This problem is best known with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{ 323 323 \CFA exception handling will be presented in a separate paper. 324 The key feature that dovetails with this paper is non -local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}324 The key feature that dovetails with this paper is nonlocal exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++} 325 325 } and coroutines. 326 Finally, language solutions allow matching constructs with language paradigm, i.e., imperative and functional languages often have different presentations of the same concept to fit their programming model.326 Finally, language solutions allow matching constructs with language paradigm, \ie, imperative and functional languages often have different presentations of the same concept to fit their programming model. 327 327 328 328 Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary. 329 Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, i.e., once there is spurious wakeup, signals-as-hints follows.329 Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, \ie, once there is spurious wakeup, signals-as-hints follow. 330 330 However, spurious wakeup is \emph{not} a foundational concurrency property~\cite[\S~8]{Buhr05a}, it is a performance design choice. 331 Similarly, signals-as-hints isoften a performance decision.332 We argue removing spurious wakeup and signals-as-hints make sconcurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation.331 Similarly, signals-as-hints are often a performance decision. 332 We argue removing spurious wakeup and signals-as-hints make concurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation. 333 333 (Author experience teaching concurrency is that students are highly confused by these semantics.) 334 334 Clawing back performance, when local non-determinism is unimportant, should be an option not the default. … … 337 337 Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory. 338 338 As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten. 339 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set ( e.g., objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.339 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (\eg, objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC. 340 340 Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base. 341 341 \CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems. … … 343 343 344 344 \CFA embraces user-level threading, language extensions for advanced control-flow, and safety as the default. 345 We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming -languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.345 We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms. 346 346 The main contributions of this work are: 347 347 \begin{itemize} … … 349 349 language-level generators, coroutines and user-level threading, which respect the expectations of C programmers. 350 350 \item 351 monitor synchronization without barging, and the ability to safely acquiring multiple monitors \emph{simultaneously} (deadlock free), while seamlessly integrating these capabilit ywith all monitor synchronization mechanisms.351 monitor synchronization without barging, and the ability to safely acquiring multiple monitors \emph{simultaneously} (deadlock free), while seamlessly integrating these capabilities with all monitor synchronization mechanisms. 352 352 \item 353 353 providing statically type-safe interfaces that integrate with the \CFA polymorphic type-system and other language features. … … 367 367 \section{Stateful Function} 368 368 369 The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, e.g., plugin, device driver, finite-state machine.369 The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, \eg, plugin, device driver, finite-state machine. 370 370 Hence, a stateful function may not end when it returns to its caller, allowing it to be restarted with the data and execution location present at the point of suspension. 371 371 This capability is accomplished by retaining a data/execution \emph{closure} between invocations. 372 If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, e.g., suspending outside the generator is prohibited.373 If the closure is variabl esized, we call it a \emph{coroutine} (or \emph{stackful}), and as the names implies, often implemented with a separate stack with no programming restrictions.372 If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, \eg, suspending outside the generator is prohibited. 373 If the closure is variably sized, we call it a \emph{coroutine} (or \emph{stackful}), and as the names implies, often implemented with a separate stack with no programming restrictions. 374 374 Hence, refactoring a stackless coroutine may require changing it to stackful. 375 A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, i.e., resume/suspend operations are remembered through the closure not the stack.375 A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, \ie, resume/suspend operations are remembered through the closure not the stack. 376 376 As well, activating a stateful function is \emph{asymmetric} or \emph{symmetric}, identified by resume/suspend (no cycles) and resume/resume (cycles). 377 377 A fixed closure activated by modified call/return is faster than a variable closure activated by context switching. 378 Additionally, any storage management for the closure (especially in unmanaged languages, i.e., no garbage collection) must also be factored into design and performance.378 Additionally, any storage management for the closure (especially in unmanaged languages, \ie, no garbage collection) must also be factored into design and performance. 379 379 Therefore, selecting between stackless and stackful semantics is a tradeoff between programming requirements and performance, where stackless is faster and stackful is more general. 380 380 Note, creation cost is amortized across usage, so activation cost is usually the dominant factor. … … 603 603 the top initialization state appears at the start and the middle execution state is denoted by statement @suspend@. 604 604 Any local variables in @main@ \emph{are not retained} between calls; 605 hence local variable are only for temporary computations \emph{between} suspends.605 hence local variables are only for temporary computations \emph{between} suspends. 606 606 All retained state \emph{must} appear in the generator's type. 607 607 As well, generator code containing a @suspend@ cannot be refactored into a helper function called by the generator, because @suspend@ is implemented via @return@, so a return from the helper function goes back to the current generator not the resumer. … … 618 618 sout | (int)f1() | (double)f1() | f2( 2 ); // alternative interface, cast selects call based on return type, step 2 values 619 619 \end{cfa} 620 Now, the generator can be a separately -compiled opaque-type only accessed through its interface functions.620 Now, the generator can be a separately compiled opaque-type only accessed through its interface functions. 621 621 For contrast, Figure~\ref{f:PythonFibonacci} shows the equivalent Python Fibonacci generator, which does not use a generator type, and hence only has a single interface, but an implicit closure. 622 622 … … 624 624 (This restriction is removed by the coroutine in Section~\ref{s:Coroutine}.) 625 625 This requirement follows from the generality of variable-size local-state, \eg local state with a variable-length array requires dynamic allocation because the array size is unknown at compile time. 626 However, dynamic allocation significantly increases the cost of generator creation/destruction and is a show -stopper for embedded real-time programming.626 However, dynamic allocation significantly increases the cost of generator creation/destruction and is a showstopper for embedded real-time programming. 627 627 But more importantly, the size of the generator type is tied to the local state in the generator main, which precludes separate compilation of the generator main, \ie a generator must be inlined or local state must be dynamically allocated. 628 628 With respect to safety, we believe static analysis can discriminate local state from temporary variables in a generator, \ie variable usage spanning @suspend@, and generate a compile-time error. … … 648 648 \end{center} 649 649 The example takes advantage of resuming a generator in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops. 650 The destructor provides a newline ,if formatted text ends with a full line.650 The destructor provides a newline if formatted text ends with a full line. 651 651 Figure~\ref{f:CFormatSim} shows the C implementation of the \CFA input generator with one additional field and the computed @goto@. 652 652 For contrast, Figure~\ref{f:PythonFormatter} shows the equivalent Python format generator with the same properties as the Fibonacci generator. … … 669 669 In contrast, the execution state is large, with one @resume@ and seven @suspend@s. 670 670 Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states are transcribed directly into the programming language rather than using a table-driven approach. 671 Because FSMs can be complex and occur frequently in important domains, direct support of the generator is crucial in a systems programming-language.671 Because FSMs can be complex and frequently occur in important domains, direct support of the generator is crucial in a system programming language. 672 672 673 673 \begin{figure} … … 782 782 The steps for symmetric control-flow are creating, executing, and terminating the cycle. 783 783 Constructing the cycle must deal with definition-before-use to close the cycle, \ie, the first generator must know about the last generator, which is not within scope. 784 (This issue s occurs for any cyclic data-structure.)784 (This issue occurs for any cyclic data structure.) 785 785 % The example creates all the generators and then assigns the partners that form the cycle. 786 786 % Alternatively, the constructor can assign the partners as they are declared, except the first, and the first-generator partner is set after the last generator declaration to close the cycle. … … 792 792 793 793 Figure~\ref{f:CPingPongSim} shows the implementation of the symmetric generator, where the complexity is the @resume@, which needs an extension to the calling convention to perform a forward rather than backward jump. 794 This jump 794 This jump-starts at the top of the next generator main to re-execute the normal calling convention to make space on the stack for its local variables. 795 795 However, before the jump, the caller must reset its stack (and any registers) equivalent to a @return@, but subsequently jump forward. 796 796 This semantics is basically a tail-call optimization, which compilers already perform. … … 862 862 863 863 Finally, part of this generator work was inspired by the recent \CCtwenty generator proposal~\cite{C++20Coroutine19} (which they call coroutines). 864 Our work provides the same high-performance asymmetric -generators as \CCtwenty, and extends their work with symmetric generators.865 An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound -statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax:864 Our work provides the same high-performance asymmetric generators as \CCtwenty, and extends their work with symmetric generators. 865 An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax: 866 866 \begin{cfa} 867 867 ... suspend`{ ... }`; … … 879 879 A coroutine is specified by replacing @generator@ with @coroutine@ for the type. 880 880 Coroutine generality results in higher cost for creation, due to dynamic stack allocation, execution, due to context switching among stacks, and terminating, due to possible stack unwinding and dynamic stack deallocation. 881 A series of different kinds of coroutines and their implementation demonstrate how coroutines extend generators.881 A series of different kinds of coroutines and their implementations demonstrate how coroutines extend generators. 882 882 883 883 First, the previous generator examples are converted to their coroutine counterparts, allowing local-state variables to be moved from the generator type into the coroutine main. … … 1164 1164 \end{cfa} 1165 1165 \end{tabular} 1166 \caption{Producer / consumer: resume-resume cycle, bi -directional communication}1166 \caption{Producer / consumer: resume-resume cycle, bidirectional communication} 1167 1167 \label{f:ProdCons} 1168 1168 \end{figure} … … 1208 1208 Furthermore, each deallocated coroutine must guarantee all destructors are run for object allocated in the coroutine type \emph{and} allocated on the coroutine's stack at the point of suspension, which can be arbitrarily deep. 1209 1209 When a coroutine's main ends, its stack is already unwound so any stack allocated objects with destructors have been finalized. 1210 The na\"{i}ve semantics for coroutine-cycle termination is context switch to the last resumer, like executing a @suspend@/@return@ in a generator.1210 The na\"{i}ve semantics for coroutine-cycle termination is to context switch to the last resumer, like executing a @suspend@/@return@ in a generator. 1211 1211 However, for coroutines, the last resumer is \emph{not} implicitly below the current stack frame, as for generators, because each coroutine's stack is independent. 1212 1212 Unfortunately, it is impossible to determine statically if a coroutine is in a cycle and unrealistic to check dynamically (graph-cycle problem). … … 1214 1214 1215 1215 Our solution is to context switch back to the first resumer (starter) once the coroutine ends. 1216 This semantics works well for the most common asymmetric and symmetric coroutine usage -patterns.1216 This semantics works well for the most common asymmetric and symmetric coroutine usage patterns. 1217 1217 For asymmetric coroutines, it is common for the first resumer (starter) coroutine to be the only resumer. 1218 1218 All previous generators converted to coroutines have this property. 1219 For symmetric coroutines, it is common for the cycle creator to persist for the life -time of the cycle.1219 For symmetric coroutines, it is common for the cycle creator to persist for the lifetime of the cycle. 1220 1220 Hence, the starter coroutine is remembered on the first resume and ending the coroutine resumes the starter. 1221 1221 Figure~\ref{f:ProdConsRuntimeStacks} shows this semantic by the dashed lines from the end of the coroutine mains: @prod@ starts @cons@ so @cons@ resumes @prod@ at the end, and the program main starts @prod@ so @prod@ resumes the program main at the end. … … 1285 1285 \end{cfa} 1286 1286 Note, copying generators/coroutines/threads is not meaningful. 1287 For example, both the resumer and suspender descriptors can have bi -directional pointers;1287 For example, both the resumer and suspender descriptors can have bidirectional pointers; 1288 1288 copying these coroutines does not update the internal pointers so behaviour of both copies would be difficult to understand. 1289 1289 Furthermore, two coroutines cannot logically execute on the same stack. 1290 1290 A deep coroutine copy, which copies the stack, is also meaningless in an unmanaged language (no garbage collection), like C, because the stack may contain pointers to object within it that require updating for the copy. 1291 1291 The \CFA @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference (pointer). 1292 The function definitions ensure s there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the coroutine descriptor from its handle.1292 The function definitions ensure there is a statically typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the coroutine descriptor from its handle. 1293 1293 The @main@ function has no return value or additional parameters because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values versus fixed ones. 1294 1294 The advantage of this approach is that users can easily create different types of coroutines, \eg changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ function, and possibly redefining \textsf{suspend} and @resume@. … … 1342 1342 For a VLS stack allocation/deallocation is an inexpensive adjustment of the stack pointer, modulo any stack constructor costs (\eg initial frame setup). 1343 1343 For heap stack allocation, allocation/deallocation is an expensive heap allocation (where the heap can be a shared resource), modulo any stack constructor costs. 1344 With heap stack allocation, it is also possible to use a split (segmented) stack calling -convention, available with gcc and clang, so the stack is variable sized.1344 With heap stack allocation, it is also possible to use a split (segmented) stack calling convention, available with gcc and clang, so the stack is variable sized. 1345 1345 Currently, \CFA supports stack/heap allocated descriptors but only fixed-sized heap allocated stacks. 1346 1346 In \CFA debug-mode, the fixed-sized stack is terminated with a write-only page, which catches most stack overflows. … … 1359 1359 \label{s:Concurrency} 1360 1360 1361 Concurrency is nondeterministic scheduling of independent sequential execution -paths (threads), where each thread has its own stack.1361 Concurrency is nondeterministic scheduling of independent sequential execution paths (threads), where each thread has its own stack. 1362 1362 A single thread with multiple call stacks, \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}. 1363 1363 In coroutining, coroutines self-schedule the thread across stacks so execution is deterministic. … … 1367 1367 The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3,]{Buhr05a}. 1368 1368 Therefore, a minimal concurrency system requires coroutines \emph{in conjunction with a nondeterministic scheduler}. 1369 The resulting execution system now follows a cooperative threading -model~\cite{Adya02,libdill}, called \newterm{non-preemptive scheduling}.1369 The resulting execution system now follows a cooperative threading model~\cite{Adya02,libdill}, called \newterm{non-preemptive scheduling}. 1370 1370 Adding \newterm{preemption} introduces non-cooperative scheduling, where context switching occurs randomly between any two instructions often based on a timer interrupt, called \newterm{preemptive scheduling}. 1371 1371 While a scheduler introduces uncertain execution among explicit context switches, preemption introduces uncertainty by introducing implicit context switches. … … 1497 1497 \end{cquote} 1498 1498 Like coroutines, the @dtype@ property prevents \emph{implicit} copy operations and the @is_thread@ trait provides no \emph{explicit} copy operations, so threads must be passed by reference (pointer). 1499 Similarly, the function definitions ensure s there is a statically-typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the thread descriptor from its handle, and a special destructor to prevent deallocation while the thread is executing.1499 Similarly, the function definitions ensure there is a statically typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the thread descriptor from its handle, and a special destructor to prevent deallocation while the thread is executing. 1500 1500 (The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitor}.) 1501 1501 The difference between the coroutine and thread is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates the coroutine's stack and starts running the coroutine main on the stack; … … 1512 1512 Hence, a programmer must learn and manipulate two sets of design/programming patterns. 1513 1513 While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account. 1514 In contrast, approaches based on stateful models more closely resemble the standard call/return programming -model, resulting in a single programming paradigm.1514 In contrast, approaches based on stateful models more closely resemble the standard call/return programming model, resulting in a single programming paradigm. 1515 1515 1516 1516 At the lowest level, concurrent control is implemented by atomic operations, upon which different kinds of locking mechanisms are constructed, \eg semaphores~\cite{Dijkstra68b}, barriers, and path expressions~\cite{Campbell74}. … … 1520 1520 1521 1521 One of the most natural, elegant, and efficient mechanisms for mutual exclusion and synchronization for shared-memory systems is the \emph{monitor}. 1522 First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}, many concurrent programming -languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}.1522 First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}, many concurrent programming languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}. 1523 1523 In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as mutex locks or semaphores to simulate monitors. 1524 For these reasons, \CFA selected monitors as the core high-level concurrency -construct, upon which higher-level approaches can be easily constructed.1524 For these reasons, \CFA selected monitors as the core high-level concurrency construct, upon which higher-level approaches can be easily constructed. 1525 1525 1526 1526 … … 1571 1571 % (While a constructor may publish its address into a global variable, doing so generates a race-condition.) 1572 1572 The prefix increment operation, @++?@, is normally @mutex@, indicating mutual exclusion is necessary during function execution, to protect the incrementing from race conditions, unless there is an atomic increment instruction for the implementation type. 1573 The assignment operators provide bi -directional conversion between an atomic and normal integer without accessing field @cnt@;1573 The assignment operators provide bidirectional conversion between an atomic and normal integer without accessing field @cnt@; 1574 1574 these operations only need @mutex@, if reading/writing the implementation type is not atomic. 1575 1575 The atomic counter is used without any explicit mutual-exclusion and provides thread-safe semantics, which is similar to the \CC template @std::atomic@. … … 1593 1593 Similar safety is offered by \emph{explicit} mechanisms like \CC RAII; 1594 1594 monitor \emph{implicit} safety ensures no programmer usage errors. 1595 Furthermore, RAII mechan sims cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread;1595 Furthermore, RAII mechanisms cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread; 1596 1596 RAII is purely a mutual-exclusion mechanism (see Section~\ref{s:Scheduling}). 1597 1597 … … 1632 1632 1633 1633 The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameters is redundant. 1634 Instead, the semantics ha veone qualifier as the default and the other required.1634 Instead, the semantics has one qualifier as the default and the other required. 1635 1635 For example, make the safe @mutex@ qualifier the default because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors. 1636 1636 Alternatively, make the unsafe \lstinline[morekeywords=nomutex]@nomutex@ qualifier the default because it is the \emph{normal} parameter semantics while @mutex@ parameters are rare. … … 1809 1809 Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion, so either the signaller or signallee can proceed. 1810 1810 For internal scheduling, threads are unblocked from condition queues using @signal@, where the signallee is moved to urgent and the signaller continues (solid line). 1811 Multiple signals move multiple signallees to urgent ,until the condition is empty.1811 Multiple signals move multiple signallees to urgent until the condition is empty. 1812 1812 When the signaller exits or waits, a thread blocked on urgent is processed before calling threads to prevent barging. 1813 1813 (Java conceptually moves the signalled thread to the calling queue, and hence, allows barging.) … … 1819 1819 The @waitfor@ has the same semantics as @signal_block@, where the signalled thread executes before the signallee, which waits on urgent. 1820 1820 Executing multiple @waitfor@s from different signalled functions causes the calling threads to move to urgent. 1821 External scheduling requires urgent to be a stack, because the signaller ex cepts to execute immediately after the specified monitor call has exited or waited.1821 External scheduling requires urgent to be a stack, because the signaller expects to execute immediately after the specified monitor call has exited or waited. 1822 1822 Internal scheduling behaves the same for an urgent stack or queue, except for multiple signalling, where the threads unblock from urgent in reverse order from signalling. 1823 1823 If the restart order is important, multiple signalling by a signal thread can be transformed into daisy-chain signalling among threads, where each thread signals the next thread. … … 2143 2143 \end{figure} 2144 2144 2145 Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.:2145 Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, \eg: 2146 2146 \begin{cfa} 2147 2147 if ( C1 ) waitfor( mem1 ); when ( C1 ) waitfor( mem1 ); … … 2256 2256 \label{s:LooseObjectDefinitions} 2257 2257 2258 In an object-oriented programming -language, a class includes an exhaustive list of operations.2258 In an object-oriented programming language, a class includes an exhaustive list of operations. 2259 2259 A new class can add members via static inheritance but the subclass still has an exhaustive list of operations. 2260 2260 (Dynamic member adding, \eg JavaScript~\cite{JavaScript}, is not considered.) … … 2671 2671 \subsection{Preemption} 2672 2672 2673 Nondeterministic preemption provides fairness from long 2673 Nondeterministic preemption provides fairness from long-running threads, and forces concurrent programmers to write more robust programs, rather than relying on code between cooperative scheduling to be atomic. 2674 2674 This atomic reliance can fail on multi-core machines, because execution across cores is nondeterministic. 2675 2675 A different reason for not supporting preemption is that it significantly complicates the runtime system, \eg Microsoft runtime does not support interrupts and on Linux systems, interrupts are complex (see below). 2676 Preemption is normally handled by setting a count -down timer on each virtual processor.2677 When the timer expires, an interrupt is delivered, and the interrupt handler resets the count -down timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime-kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code.2676 Preemption is normally handled by setting a countdown timer on each virtual processor. 2677 When the timer expires, an interrupt is delivered, and the interrupt handler resets the countdown timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code. 2678 2678 Multiple signal handlers may be pending. 2679 2679 When control eventually switches back to the signal handler, it returns normally, and execution continues in the interrupted user thread, even though the return from the signal handler may be on a different kernel thread than the one where the signal is delivered. … … 2685 2685 \begin{cquote} 2686 2686 A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked. 2687 If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which todeliver the signal.2687 If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which it will deliver the signal. 2688 2688 SIGNAL(7) - Linux Programmer's Manual 2689 2689 \end{cquote} … … 2691 2691 To ensure each virtual processor receives a preemption signal, a discrete-event simulation is run on a special virtual processor, and only it sets and receives timer events. 2692 2692 Virtual processors register an expiration time with the discrete-event simulator, which is inserted in sorted order. 2693 The simulation sets the count -down timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed.2693 The simulation sets the countdown timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed. 2694 2694 Processing a preemption event sends an \emph{internal} @SIGUSR1@ signal to the registered virtual processor, which is always delivered to that processor. 2695 2695 … … 2913 2913 \paragraph{Mutual-Exclusion} 2914 2914 2915 Uncontented mutual exclusion, which occurs frequently, is measured by entering/leaving a critical section.2915 Uncontented mutual exclusion, which frequently occurs, is measured by entering/leaving a critical section. 2916 2916 For monitors, entering and leaving a monitor function is measured. 2917 2917 To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a @pthread_mutex@ lock is also measured.
Note: See TracChangeset
for help on using the changeset viewer.