Changeset e98c7ab for doc/papers


Timestamp:
Jun 24, 2019, 3:05:10 PM
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
df9317bd
Parents:
093a5d7
Message:

Passed spell checker on the paper, it had a hard time with latex so it's not perfect

File:
1 edited

  • doc/papers/concurrency/Paper.tex

    r093a5d7 re98c7ab  
    280280The runtime also ensures multiple monitors can be safely acquired \emph{simultaneously} (deadlock free), and this feature is fully integrated with all monitor synchronization mechanisms.
    281281All control-flow features integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers.
    282 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
     282Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming languages.
    283283}%
    284284
     
    301301In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a \emph{research vehicle} for new typing and control-flow capabilities on top of a highly popular programming language allowing immediate dissemination.
    302302Within the \CFA framework, new control-flow features are created from scratch because ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
    303 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
     303However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error-prone;
    304304no high-level language concurrency features are defined.
    305305Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-9 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach.
     
    312312As a result, languages like Java, Scala, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopt the 1:1 kernel-threading model, with a variety of presentation mechanisms.
    313313From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,Marcel}, including putting green threads back into Java~\cite{Quasar}.
    314 The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work-units to facilitate load balancing by the runtime~\cite{Verch12}.
     314The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work units to facilitate load balancing by the runtime~\cite{Verch12}.
    315315As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{Adya02,vonBehren03}.
    316316Finally, performant user-threading implementations (both time and space) meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
    317317
    318 A further effort over the past two decades is the development of language memory-models to deal with the conflict between language features and compiler/hardware optimizations, i.e., some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.
     318A further effort over the past two decades is the development of language memory models to deal with the conflict between language features and compiler/hardware optimizations, \ie, some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.
    319319The consequence is that a language must provide sufficient tools to program around safety issues, as inline and library code is all sequential to the compiler.
    320 One solution is low-level qualifiers and functions (e.g., @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs.
     320One solution is low-level qualifiers and functions (\eg, @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs.
    321321A safer solution is high-level language constructs so the \emph{compiler} knows the optimization boundaries, and hence, provides implicit safety.
    322 This problem is best know with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{
     322This problem is best known with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{
    323323\CFA exception handling will be presented in a separate paper.
    324 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
     324The key feature that dovetails with this paper is nonlocal exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
    325325} and coroutines.
    326 Finally, language solutions allow matching constructs with language paradigm, i.e., imperative and functional languages often have different presentations of the same concept to fit their programming model.
     326Finally, language solutions allow matching constructs with language paradigm, \ie, imperative and functional languages often have different presentations of the same concept to fit their programming model.
    327327
    328328Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary.
    329 Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, i.e., once there is spurious wakeup, signals-as-hints follows.
     329Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, \ie, once there is spurious wakeup, signals-as-hints follow.
    330330However, spurious wakeup is \emph{not} a foundational concurrency property~\cite[\S~8]{Buhr05a}, it is a performance design choice.
    331 Similarly, signals-as-hints is often a performance decision.
    332 We argue removing spurious wakeup and signals-as-hints makes concurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation.
     331Similarly, signals-as-hints are often a performance decision.
     332We argue removing spurious wakeup and signals-as-hints make concurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation.
    333333(Author experience teaching concurrency is that students are highly confused by these semantics.)
    334334Clawing back performance, when local non-determinism is unimportant, should be an option not the default.
     
    337337Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory.
    338338As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten.
    339 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (e.g., objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
     339While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (\eg, objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
    340340Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base.
    341341\CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems.
     
    343343
    344344\CFA embraces user-level threading, language extensions for advanced control-flow, and safety as the default.
    345 We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming-languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
     345We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
    346346The main contributions of this work are:
    347347\begin{itemize}
     
    349349language-level generators, coroutines and user-level threading, which respect the expectations of C programmers.
    350350\item
    351 monitor synchronization without barging, and the ability to safely acquiring multiple monitors \emph{simultaneously} (deadlock free), while seamlessly integrating these capability with all monitor synchronization mechanisms.
     351monitor synchronization without barging, and the ability to safely acquiring multiple monitors \emph{simultaneously} (deadlock free), while seamlessly integrating these capabilities with all monitor synchronization mechanisms.
    352352\item
    353353providing statically type-safe interfaces that integrate with the \CFA polymorphic type-system and other language features.
     
    367367\section{Stateful Function}
    368368
    369 The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, e.g., plugin, device driver, finite-state machine.
     369The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, \eg, plugin, device driver, finite-state machine.
    370370Hence, a stateful function may not end when it returns to its caller, allowing it to be restarted with the data and execution location present at the point of suspension.
    371371This capability is accomplished by retaining a data/execution \emph{closure} between invocations.
    372 If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, e.g., suspending outside the generator is prohibited.
    373 If the closure is variable sized, we call it a \emph{coroutine} (or \emph{stackful}), and as the names implies, often implemented with a separate stack with no programming restrictions.
     372If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, \eg, suspending outside the generator is prohibited.
     373If the closure is variably sized, we call it a \emph{coroutine} (or \emph{stackful}), and as the names implies, often implemented with a separate stack with no programming restrictions.
    374374Hence, refactoring a stackless coroutine may require changing it to stackful.
    375 A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, i.e., resume/suspend operations are remembered through the closure not the stack.
     375A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, \ie, resume/suspend operations are remembered through the closure not the stack.
    376376As well, activating a stateful function is \emph{asymmetric} or \emph{symmetric}, identified by resume/suspend (no cycles) and resume/resume (cycles).
    377377A fixed closure activated by modified call/return is faster than a variable closure activated by context switching.
    378 Additionally, any storage management for the closure (especially in unmanaged languages, i.e., no garbage collection) must also be factored into design and performance.
     378Additionally, any storage management for the closure (especially in unmanaged languages, \ie, no garbage collection) must also be factored into design and performance.
    379379Therefore, selecting between stackless and stackful semantics is a tradeoff between programming requirements and performance, where stackless is faster and stackful is more general.
    380380Note, creation cost is amortized across usage, so activation cost is usually the dominant factor.
     
    603603the top initialization state appears at the start and the middle execution state is denoted by statement @suspend@.
    604604Any local variables in @main@ \emph{are not retained} between calls;
    605 hence local variable are only for temporary computations \emph{between} suspends.
     605hence local variables are only for temporary computations \emph{between} suspends.
    606606All retained state \emph{must} appear in the generator's type.
    607607As well, generator code containing a @suspend@ cannot be refactored into a helper function called by the generator, because @suspend@ is implemented via @return@, so a return from the helper function goes back to the current generator not the resumer.
     
    618618sout | (int)f1() | (double)f1() | f2( 2 ); // alternative interface, cast selects call based on return type, step 2 values
    619619\end{cfa}
    620 Now, the generator can be a separately-compiled opaque-type only accessed through its interface functions.
     620Now, the generator can be a separately compiled opaque-type only accessed through its interface functions.
    621621For contrast, Figure~\ref{f:PythonFibonacci} shows the equivalent Python Fibonacci generator, which does not use a generator type, and hence only has a single interface, but an implicit closure.
    622622
     
    624624(This restriction is removed by the coroutine in Section~\ref{s:Coroutine}.)
    625625This requirement follows from the generality of variable-size local-state, \eg local state with a variable-length array requires dynamic allocation because the array size is unknown at compile time.
    626 However, dynamic allocation significantly increases the cost of generator creation/destruction and is a show-stopper for embedded real-time programming.
     626However, dynamic allocation significantly increases the cost of generator creation/destruction and is a showstopper for embedded real-time programming.
    627627But more importantly, the size of the generator type is tied to the local state in the generator main, which precludes separate compilation of the generator main, \ie a generator must be inlined or local state must be dynamically allocated.
    628628With respect to safety, we believe static analysis can discriminate local state from temporary variables in a generator, \ie variable usage spanning @suspend@, and generate a compile-time error.
     
    648648\end{center}
    649649The example takes advantage of resuming a generator in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops.
    650 The destructor provides a newline, if formatted text ends with a full line.
     650The destructor provides a newline if formatted text ends with a full line.
    651651Figure~\ref{f:CFormatSim} shows the C implementation of the \CFA input generator with one additional field and the computed @goto@.
    652652For contrast, Figure~\ref{f:PythonFormatter} shows the equivalent Python format generator with the same properties as the Fibonacci generator.
     
    669669In contrast, the execution state is large, with one @resume@ and seven @suspend@s.
    670670Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states are transcribed directly into the programming language rather than using a table-driven approach.
    671 Because FSMs can be complex and occur frequently in important domains, direct support of the generator is crucial in a systems programming-language.
     671Because FSMs can be complex and frequently occur in important domains, direct support of the generator is crucial in a system programming language.
    672672
    673673\begin{figure}
     
    782782The steps for symmetric control-flow are creating, executing, and terminating the cycle.
    783783Constructing the cycle must deal with definition-before-use to close the cycle, \ie, the first generator must know about the last generator, which is not within scope.
    784 (This issues occurs for any cyclic data-structure.)
     784(This issue occurs for any cyclic data structure.)
    785785% The example creates all the generators and then assigns the partners that form the cycle.
    786786% Alternatively, the constructor can assign the partners as they are declared, except the first, and the first-generator partner is set after the last generator declaration to close the cycle.
     
    792792
    793793Figure~\ref{f:CPingPongSim} shows the implementation of the symmetric generator, where the complexity is the @resume@, which needs an extension to the calling convention to perform a forward rather than backward jump.
    794 This jump starts at the top of the next generator main to re-execute the normal calling convention to make space on the stack for its local variables.
     794This jump-starts at the top of the next generator main to re-execute the normal calling convention to make space on the stack for its local variables.
    795795However, before the jump, the caller must reset its stack (and any registers) equivalent to a @return@, but subsequently jump forward.
    796796This semantics is basically a tail-call optimization, which compilers already perform.
     
    862862
    863863Finally, part of this generator work was inspired by the recent \CCtwenty generator proposal~\cite{C++20Coroutine19} (which they call coroutines).
    864 Our work provides the same high-performance asymmetric-generators as \CCtwenty, and extends their work with symmetric generators.
    865 An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound-statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax:
     864Our work provides the same high-performance asymmetric generators as \CCtwenty, and extends their work with symmetric generators.
     865An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax:
    866866\begin{cfa}
    867867... suspend`{ ... }`;
     
    879879A coroutine is specified by replacing @generator@ with @coroutine@ for the type.
    880880Coroutine generality results in higher cost for creation, due to dynamic stack allocation, execution, due to context switching among stacks, and terminating, due to possible stack unwinding and dynamic stack deallocation.
    881 A series of different kinds of coroutines and their implementation demonstrate how coroutines extend generators.
     881A series of different kinds of coroutines and their implementations demonstrate how coroutines extend generators.
    882882
    883883First, the previous generator examples are converted to their coroutine counterparts, allowing local-state variables to be moved from the generator type into the coroutine main.
     
    11641164\end{cfa}
    11651165\end{tabular}
    1166 \caption{Producer / consumer: resume-resume cycle, bi-directional communication}
     1166\caption{Producer / consumer: resume-resume cycle, bidirectional communication}
    11671167\label{f:ProdCons}
    11681168\end{figure}
     
    12081208Furthermore, each deallocated coroutine must guarantee all destructors are run for object allocated in the coroutine type \emph{and} allocated on the coroutine's stack at the point of suspension, which can be arbitrarily deep.
    12091209When a coroutine's main ends, its stack is already unwound so any stack allocated objects with destructors have been finalized.
    1210 The na\"{i}ve semantics for coroutine-cycle termination is context switch to the last resumer, like executing a @suspend@/@return@ in a generator.
     1210The na\"{i}ve semantics for coroutine-cycle termination is to context switch to the last resumer, like executing a @suspend@/@return@ in a generator.
    12111211However, for coroutines, the last resumer is \emph{not} implicitly below the current stack frame, as for generators, because each coroutine's stack is independent.
    12121212Unfortunately, it is impossible to determine statically if a coroutine is in a cycle and unrealistic to check dynamically (graph-cycle problem).
     
    12141214
    12151215Our solution is to context switch back to the first resumer (starter) once the coroutine ends.
    1216 This semantics works well for the most common asymmetric and symmetric coroutine usage-patterns.
     1216This semantics works well for the most common asymmetric and symmetric coroutine usage patterns.
    12171217For asymmetric coroutines, it is common for the first resumer (starter) coroutine to be the only resumer.
    12181218All previous generators converted to coroutines have this property.
    1219 For symmetric coroutines, it is common for the cycle creator to persist for the life-time of the cycle.
     1219For symmetric coroutines, it is common for the cycle creator to persist for the lifetime of the cycle.
    12201220Hence, the starter coroutine is remembered on the first resume and ending the coroutine resumes the starter.
    12211221Figure~\ref{f:ProdConsRuntimeStacks} shows this semantic by the dashed lines from the end of the coroutine mains: @prod@ starts @cons@ so @cons@ resumes @prod@ at the end, and the program main starts @prod@ so @prod@ resumes the program main at the end.
     
    12851285\end{cfa}
    12861286Note, copying generators/coroutines/threads is not meaningful.
    1287 For example, both the resumer and suspender descriptors can have bi-directional pointers;
     1287For example, both the resumer and suspender descriptors can have bidirectional pointers;
    12881288copying these coroutines does not update the internal pointers so behaviour of both copies would be difficult to understand.
    12891289Furthermore, two coroutines cannot logically execute on the same stack.
    12901290A deep coroutine copy, which copies the stack, is also meaningless in an unmanaged language (no garbage collection), like C, because the stack may contain pointers to object within it that require updating for the copy.
    12911291The \CFA @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference (pointer).
    1292 The function definitions ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the coroutine descriptor from its handle.
     1292The function definitions ensure there is a statically typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the coroutine descriptor from its handle.
    12931293The @main@ function has no return value or additional parameters because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values versus fixed ones.
    12941294The advantage of this approach is that users can easily create different types of coroutines, \eg changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ function, and possibly redefining \textsf{suspend} and @resume@.
     
    13421342For a VLS stack allocation/deallocation is an inexpensive adjustment of the stack pointer, modulo any stack constructor costs (\eg initial frame setup).
    13431343For heap stack allocation, allocation/deallocation is an expensive heap allocation (where the heap can be a shared resource), modulo any stack constructor costs.
    1344 With heap stack allocation, it is also possible to use a split (segmented) stack calling-convention, available with gcc and clang, so the stack is variable sized.
     1344With heap stack allocation, it is also possible to use a split (segmented) stack calling convention, available with gcc and clang, so the stack is variable sized.
    13451345Currently, \CFA supports stack/heap allocated descriptors but only fixed-sized heap allocated stacks.
    13461346In \CFA debug-mode, the fixed-sized stack is terminated with a write-only page, which catches most stack overflows.
     
    13591359\label{s:Concurrency}
    13601360
    1361 Concurrency is nondeterministic scheduling of independent sequential execution-paths (threads), where each thread has its own stack.
     1361Concurrency is nondeterministic scheduling of independent sequential execution paths (threads), where each thread has its own stack.
    13621362A single thread with multiple call stacks, \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}.
    13631363In coroutining, coroutines self-schedule the thread across stacks so execution is deterministic.
     
    13671367The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3,]{Buhr05a}.
    13681368Therefore, a minimal concurrency system requires coroutines \emph{in conjunction with a nondeterministic scheduler}.
    1369 The resulting execution system now follows a cooperative threading-model~\cite{Adya02,libdill}, called \newterm{non-preemptive scheduling}.
     1369The resulting execution system now follows a cooperative threading model~\cite{Adya02,libdill}, called \newterm{non-preemptive scheduling}.
    13701370Adding \newterm{preemption} introduces non-cooperative scheduling, where context switching occurs randomly between any two instructions often based on a timer interrupt, called \newterm{preemptive scheduling}.
    13711371While a scheduler introduces uncertain execution among explicit context switches, preemption introduces uncertainty by introducing implicit context switches.
     
    14971497\end{cquote}
    14981498Like coroutines, the @dtype@ property prevents \emph{implicit} copy operations and the @is_thread@ trait provides no \emph{explicit} copy operations, so threads must be passed by reference (pointer).
    1499 Similarly, the function definitions ensures there is a statically-typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the thread descriptor from its handle, and a special destructor to prevent deallocation while the thread is executing.
     1499Similarly, the function definitions ensure there is a statically typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the thread descriptor from its handle, and a special destructor to prevent deallocation while the thread is executing.
    15001500(The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitor}.)
    15011501The difference between the coroutine and thread is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates the coroutine's stack and starts running the coroutine main on the stack;
     
    15121512Hence, a programmer must learn and manipulate two sets of design/programming patterns.
    15131513While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
    1514 In contrast, approaches based on stateful models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm.
     1514In contrast, approaches based on stateful models more closely resemble the standard call/return programming model, resulting in a single programming paradigm.
    15151515
    15161516At the lowest level, concurrent control is implemented by atomic operations, upon which different kinds of locking mechanisms are constructed, \eg semaphores~\cite{Dijkstra68b}, barriers, and path expressions~\cite{Campbell74}.
     
    15201520
    15211521One of the most natural, elegant, and efficient mechanisms for mutual exclusion and synchronization for shared-memory systems is the \emph{monitor}.
    1522 First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}, many concurrent programming-languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}.
     1522First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}, many concurrent programming languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}.
    15231523In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as mutex locks or semaphores to simulate monitors.
    1524 For these reasons, \CFA selected monitors as the core high-level concurrency-construct, upon which higher-level approaches can be easily constructed.
     1524For these reasons, \CFA selected monitors as the core high-level concurrency construct, upon which higher-level approaches can be easily constructed.
    15251525
    15261526
     
    15711571% (While a constructor may publish its address into a global variable, doing so generates a race-condition.)
    15721572The prefix increment operation, @++?@, is normally @mutex@, indicating mutual exclusion is necessary during function execution, to protect the incrementing from race conditions, unless there is an atomic increment instruction for the implementation type.
    1573 The assignment operators provide bi-directional conversion between an atomic and normal integer without accessing field @cnt@;
     1573The assignment operators provide bidirectional conversion between an atomic and normal integer without accessing field @cnt@;
    15741574these operations only need @mutex@, if reading/writing the implementation type is not atomic.
    15751575The atomic counter is used without any explicit mutual-exclusion and provides thread-safe semantics, which is similar to the \CC template @std::atomic@.
     
    15931593Similar safety is offered by \emph{explicit} mechanisms like \CC RAII;
    15941594monitor \emph{implicit} safety ensures no programmer usage errors.
    1595 Furthermore, RAII mechansims cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread;
     1595Furthermore, RAII mechanisms cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread;
    15961596RAII is purely a mutual-exclusion mechanism (see Section~\ref{s:Scheduling}).
    15971597
     
    16321632
    16331633The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameters is redundant.
    1634 Instead, the semantics have one qualifier as the default and the other required.
     1634Instead, the semantics has one qualifier as the default and the other required.
    16351635For example, make the safe @mutex@ qualifier the default because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors.
    16361636Alternatively, make the unsafe \lstinline[morekeywords=nomutex]@nomutex@ qualifier the default because it is the \emph{normal} parameter semantics while @mutex@ parameters are rare.
     
    18091809Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion, so either the signaller or signallee can proceed.
    18101810For internal scheduling, threads are unblocked from condition queues using @signal@, where the signallee is moved to urgent and the signaller continues (solid line).
    1811 Multiple signals move multiple signallees to urgent, until the condition is empty.
     1811Multiple signals move multiple signallees to urgent until the condition is empty.
    18121812When the signaller exits or waits, a thread blocked on urgent is processed before calling threads to prevent barging.
    18131813(Java conceptually moves the signalled thread to the calling queue, and hence, allows barging.)
     
    18191819The @waitfor@ has the same semantics as @signal_block@, where the signalled thread executes before the signallee, which waits on urgent.
    18201820Executing multiple @waitfor@s from different signalled functions causes the calling threads to move to urgent.
    1821 External scheduling requires urgent to be a stack, because the signaller excepts to execute immediately after the specified monitor call has exited or waited.
     1821External scheduling requires urgent to be a stack, because the signaller expects to execute immediately after the specified monitor call has exited or waited.
    18221822Internal scheduling behaves the same for an urgent stack or queue, except for multiple signalling, where the threads unblock from urgent in reverse order from signalling.
    18231823If the restart order is important, multiple signalling by a signal thread can be transformed into daisy-chain signalling among threads, where each thread signals the next thread.
     
    21432143\end{figure}
    21442144
    2145 Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.:
     2145Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, \eg:
    21462146\begin{cfa}
    21472147if ( C1 ) waitfor( mem1 );                       when ( C1 ) waitfor( mem1 );
     
    22562256\label{s:LooseObjectDefinitions}
    22572257
    2258 In an object-oriented programming-language, a class includes an exhaustive list of operations.
     2258In an object-oriented programming language, a class includes an exhaustive list of operations.
    22592259A new class can add members via static inheritance but the subclass still has an exhaustive list of operations.
    22602260(Dynamic member adding, \eg JavaScript~\cite{JavaScript}, is not considered.)
     
    26712671\subsection{Preemption}
    26722672
    2673 Nondeterministic preemption provides fairness from long running threads, and forces concurrent programmers to write more robust programs, rather than relying on code between cooperative scheduling to be atomic.
     2673Nondeterministic preemption provides fairness from long-running threads, and forces concurrent programmers to write more robust programs, rather than relying on code between cooperative scheduling to be atomic.
    26742674This atomic reliance can fail on multi-core machines, because execution across cores is nondeterministic.
    26752675A different reason for not supporting preemption is that it significantly complicates the runtime system, \eg Microsoft runtime does not support interrupts and on Linux systems, interrupts are complex (see below).
    2676 Preemption is normally handled by setting a count-down timer on each virtual processor.
    2677 When the timer expires, an interrupt is delivered, and the interrupt handler resets the count-down timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime-kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code.
     2676Preemption is normally handled by setting a countdown timer on each virtual processor.
     2677When the timer expires, an interrupt is delivered, and the interrupt handler resets the countdown timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code.
    26782678Multiple signal handlers may be pending.
    26792679When control eventually switches back to the signal handler, it returns normally, and execution continues in the interrupted user thread, even though the return from the signal handler may be on a different kernel thread than the one where the signal is delivered.
     
    26852685\begin{cquote}
    26862686A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked.
    2687 If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which to deliver the signal.
     2687If more than one of the threads has the signal unblocked, then the kernel chooses an arbitrary thread to which it will deliver the signal.
    26882688SIGNAL(7) - Linux Programmer's Manual
    26892689\end{cquote}
     
    26912691To ensure each virtual processor receives a preemption signal, a discrete-event simulation is run on a special virtual processor, and only it sets and receives timer events.
    26922692Virtual processors register an expiration time with the discrete-event simulator, which is inserted in sorted order.
    2693 The simulation sets the count-down timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed.
     2693The simulation sets the countdown timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed.
    26942694Processing a preemption event sends an \emph{internal} @SIGUSR1@ signal to the registered virtual processor, which is always delivered to that processor.
    26952695
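The event-list handling described in the hunk above (expiration times inserted in sorted order, the timer armed from the head of the list, and every event at or before the current time processed on expiry) can be sketched in plain C as follows. This is a minimal illustration under stated assumptions, not the \CFA runtime code: the names `event`, `register_event`, `next_expiry`, `process_expired`, and `deliver_stub` are hypothetical, times are abstract ticks, and the delivery of the internal signal to a virtual processor is replaced by a callback that only records the processor id.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event record: one pending preemption for one virtual processor. */
typedef struct event {
    unsigned long expiry;   /* absolute expiration time, in abstract ticks */
    int vproc;              /* id of the registering virtual processor */
    struct event *next;
} event;

/* Register an expiration time: insert in sorted order by expiry.
   Events with equal expiry keep their registration order. */
static void register_event(event **head, unsigned long expiry, int vproc) {
    assert(head != NULL);
    event *e = malloc(sizeof *e);
    if (e == NULL) abort();
    e->expiry = expiry;
    e->vproc = vproc;
    while (*head != NULL && (*head)->expiry <= expiry) head = &(*head)->next;
    e->next = *head;
    *head = e;
}

/* Value the countdown timer would be armed with: the expiry at the head
   of the list, or 0 when no event is pending (assumption of this sketch). */
static unsigned long next_expiry(const event *head) {
    return head != NULL ? head->expiry : 0;
}

/* On timer expiry: process every event whose time is <= now, calling
   deliver for each registered processor. Returns the number processed. */
static int process_expired(event **head, unsigned long now,
                           void (*deliver)(int vproc)) {
    int n = 0;
    while (*head != NULL && (*head)->expiry <= now) {
        event *e = *head;
        *head = e->next;
        deliver(e->vproc);  /* stand-in for signalling that processor */
        free(e);
        n++;
    }
    return n;
}

/* Stand-in delivery routine: records processor ids instead of signalling. */
static int delivered[16];
static int delivered_count = 0;
static void deliver_stub(int vproc) {
    if (delivered_count < 16) delivered[delivered_count++] = vproc;
}
```

Registering expirations 30, 10, and 20 leaves 10 at the head of the list; calling `process_expired` with a current time of 20 delivers to the two processors registered for 10 and 20, in that order, and leaves 30 as the next value for the timer.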
     
    29132913\paragraph{Mutual-Exclusion}
    29142914
    2915 Uncontented mutual exclusion, which occurs frequently, is measured by entering/leaving a critical section.
     2915Uncontented mutual exclusion, which frequently occurs, is measured by entering/leaving a critical section.
    29162916For monitors, entering and leaving a monitor function is measured.
    29172917To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a @pthread_mutex@ lock is also measured.