Changeset bd12159 for doc


Ignore:
Timestamp:
Jun 17, 2019, 3:35:22 PM (2 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
arm-eh, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr
Children:
89faa82
Parents:
b4d34fa
Message:

complete draft for second version of concurrency paper

Location:
doc/papers/concurrency
Files:
3 added
3 edited

Legend:

Unmodified
Added
Removed
  • doc/papers/concurrency/Makefile

    rb4d34fa rbd12159  
    2727FullCoroutinePhases \
    2828corlayout \
     29CondSigWait \
    2930monitor \
    3031ext_monitor \
  • doc/papers/concurrency/Paper.tex

    rb4d34fa rbd12159  
    1515\usepackage{epic,eepic}
    1616\usepackage{xspace}
     17\usepackage{enumitem}
    1718\usepackage{comment}
    1819\usepackage{upquote}                                            % switch curled `'" to straight
     
    2324\usepackage{dcolumn}                                            % align decimal points in tables
    2425\usepackage{capt-of}
     26\setlength{\multicolsep}{6.0pt plus 2.0pt minus 1.5pt}
    2527
    2628\hypersetup{breaklinks=true}
     
    271273\abstract[Summary]{
    272274\CFA is a polymorphic, non-object-oriented, concurrent, backwards-compatible extension of the C programming language.
    273 This paper discusses the design philosophy and implementation of its advanced control-flow and concurrent/parallel features, along with the supporting runtime.
    274 These features are created from scratch as ISO C has only low-level and/or unimplemented concurrency, so C programmers continue to rely on library features like C pthreads.
     275This paper discusses the design philosophy and implementation of its advanced control-flow and concurrent/parallel features, along with the supporting runtime written in \CFA.
     276These features are created from scratch as ISO C has only low-level and/or unimplemented concurrency, so C programmers continue to rely on library features like pthreads.
    275277\CFA introduces modern language-level control-flow mechanisms, like generators, coroutines, user-level threading, and monitors for mutual exclusion and synchronization.
    276278% Library extension for executors, futures, and actors are built on these basic mechanisms.
    277 The runtime provides significant programmer simplification and safety by eliminating spurious wakeup and optional monitor barging.
     279The runtime provides significant programmer simplification and safety by eliminating spurious wakeup and monitor barging.
    278280The runtime also ensures multiple monitors can be safely acquired \emph{simultaneously} (deadlock free), and this feature is fully integrated with all monitor synchronization mechanisms.
    279281All control-flow features integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers.
     
    292294\section{Introduction}
    293295
    294 This paper discusses the design philosophy and implementation of advanced language-level control-flow and concurrent/parallel features in \CFA~\cite{Moss18} and its runtime.
     296This paper discusses the design philosophy and implementation of advanced language-level control-flow and concurrent/parallel features in \CFA~\cite{Moss18,Cforall} and its runtime, which is written entirely in \CFA.
    295297\CFA is a modern, polymorphic, non-object-oriented\footnote{
    296298\CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance.
    297299However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.},
    298300backwards-compatible extension of the C programming language.
    299 In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a \emph{research vehicle} for new typing and control flow capabilities on top of a highly popular programming language allowing immediate dissemination.
    300 Within the \CFA framework, new control-flow features are created from scratch.
    301 ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
    302 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
    303 Furthermore, \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
     301In many ways, \CFA is to C as Scala~\cite{Scala} is to Java, providing a \emph{research vehicle} for new typing and control-flow capabilities on top of a highly popular programming language allowing immediate dissemination.
     302Within the \CFA framework, new control-flow features are created from scratch because ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
     303However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}, and \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
    304304no high-level language concurrency features are defined.
    305305Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach.
     
    314314The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work-units to facilitate load balancing by the runtime~\cite{Verch12}.
    315315As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{vonBehren03}.
    316 Finally, performant user-threading implementations (both time and space) are largely competitive with direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
     316Finally, performant user-threading implementations (both time and space) meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
    317317
    318318A further effort over the past two decades is the development of language memory-models to deal with the conflict between language features and compiler/hardware optimizations, i.e., some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.
     
    324324The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
    325325} and coroutines.
    326 Finally, solutions in the language allows matching constructs with language paradigm, i.e., imperative and functional languages have different presentations of the same concept.
     326Finally, language solutions allow matching constructs with language paradigm, i.e., imperative and functional languages often have different presentations of the same concept to fit their programming model.
    327327
    328328Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary.
    329329Two concurrency violations of this philosophy are \emph{spurious wakeup} and \emph{barging}, i.e., random wakeup~\cite[\S~8]{Buhr05a} and signals-as-hints~\cite[\S~8]{Buhr05a}, where one is a consequence of the other, i.e., once there is spurious wakeup, signals-as-hints follows.
    330330However, spurious wakeup is \emph{not} a foundational concurrency property~\cite[\S~8]{Buhr05a}, it is a performance design choice.
    331 Similarly, signals-as-hints is also a performance decision.
     331Similarly, signals-as-hints is often a performance decision.
    332332We argue removing spurious wakeup and signals-as-hints makes concurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation.
    333 (Experience by the authors teaching concurrency is that students are highly confused by these semantics.)
     333(Authors experience teaching concurrency is that students are highly confused by these semantics.)
    334334Clawing back performance, when local non-determinism is unimportant, should be an option not the default.
    335335
    336336\begin{comment}
    337 For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}.
    338 It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing.
    339 As well, user threading is often a complementary feature, allowing light-weight threading to match with low-cost objects, while hiding the application/kernel boundary.
    340 User threading also allows layering of implicit concurrency models (no explicit thread creation), such executors, data-flow, actors, into a single language, so programmers can chose the model that best fits an algorithm.\footnote{
    341 All implicit concurrency models have explicit threading in their implementation, and hence, can be build from explicit threading;
    342 however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking thread objects.}
    343 Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
    344 \CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{
    345 \CFA exception handling will be presented in a separate paper.
    346 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
    347 } and coroutines) and concurrency.
    348 
    349337Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory.
    350338As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten.
     
    355343
    356344\CFA embraces user-level threading, language extensions for advanced control-flow, and safety as the default.
    357 We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in or proposed for \Celeven, \CC, and other concurrent, imperative programming-languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
     345We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in other concurrent, imperative programming-languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
    358346The main contributions of this work are:
    359347\begin{itemize}
     
    609597The name \lstinline|main| has special meaning in C, specifically the function where a program starts execution.
    610598Hence, overloading this name for other starting points (generator/coroutine/thread) is a logical extension.}
    611 which takes as its only parameter a reference to the generator type, called a \emph{generator main}.
     599called a \emph{generator main},which takes as its only parameter a reference to the generator type.
    612600The generator main contains @suspend@ statements that suspend execution without ending the generator versus @return@.
    613601For the Fibonacci generator-main,\footnote{
     
    635623
    636624Having to manually create the generator closure by moving local-state variables into the generator type is an additional programmer burden.
    637 (This restriction is removed by the coroutine, see Section~\ref{s:Coroutine}.)
     625(This restriction is removed by the coroutine in Section~\ref{s:Coroutine}.)
    638626This requirement follows from the generality of variable-size local-state, \eg local state with a variable-length array requires dynamic allocation because the array size is unknown at compile time.
    639627However, dynamic allocation significantly increases the cost of generator creation/destruction and is a show-stopper for embedded real-time programming.
     
    678666
    679667Note, the cost of creating and resuming the device-driver generator, @Driver@, is virtually identical to call/return, so performance in an operating-system kernel is excellent.
    680 As well, the data state is small, where variables @byte@ and @msg@ are communication variables for passing in message bytes and returning the message, and variables @lnth@, @crc@, and @sum@ are local variable that must be retained between calls and are hoisted into the generator type.
    681 Manually, detecting and hoisting local-state variables is easy when the number is small.
     668As well, the data state is small, where variables @byte@ and @msg@ are communication variables for passing in message bytes and returning the message, and variables @lnth@, @crc@, and @sum@ are local variable that must be retained between calls and are manually hoisted into the generator type.
     669% Manually, detecting and hoisting local-state variables is easy when the number is small.
    682670Finally, the execution state is large, with one @resume@ and seven @suspend@s.
    683 Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states of the FSM are transcribed directly into the programming language rather than using a table-driven approach.
     671Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states are transcribed directly into the programming language rather than using a table-driven approach.
    684672Because FSMs can be complex and occur frequently in important domains, direct support of the generator is crucial in a systems programming-language.
    685673
     
    806794Figure~\ref{f:CPingPongSim} shows the implementation of the symmetric generator, where the complexity is the @resume@, which needs an extension to the calling convention to perform a forward rather than backward jump.
    807795This jump starts at the top of the next generator main to re-execute the normal calling convention to make space on the stack for its local variables.
    808 However, before the jump, the caller must reset its stack (and any registers) equivalent to a @return@, but subsequently jump forward not backwards.
     796However, before the jump, the caller must reset its stack (and any registers) equivalent to a @return@, but subsequently jump forward.
    809797This semantics is basically a tail-call optimization, which compilers already perform.
    810 Hence, assembly code is manually insert in the example to undo the generator's entry code before the direct jump.
    811 This assembly code is fragile as it depends on what entry code is generated, specifically if there are local variables and the level of optimization.
     798The example shows the assembly code to undo the generator's entry code before the direct jump.
     799This assembly code depends on what entry code is generated, specifically if there are local variables and the level of optimization.
    812800To provide this new calling convention requires a mechanism built into the compiler, which is beyond the scope of \CFA at this time.
    813801Nevertheless, it is possible to hand generate any symmetric generators for proof of concept and performance testing.
     
    876864Finally, part of this generator work was inspired by the recent \CCtwenty generator proposal~\cite{C++20Coroutine19} (which they call coroutines).
    877865Our work provides the same high-performance asymmetric-generators as \CCtwenty, and extends their work with symmetric generators.
    878 An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound-statement that is executed after the current generator has reset its stack but before calling the next generator, specified with the following \CFA syntax.
     866An additional \CCtwenty generator feature allows @suspend@ and @resume@ to be followed by a restricted compound-statement that is executed after the current generator has reset its stack but before calling the next generator, specified with \CFA syntax:
    879867\begin{cfa}
    880868... suspend`{ ... }`;
     
    891879Stackful coroutines extend generator semantics, \ie there is an implicit closure and @suspend@ may appear in a helper function called from the coroutine main.
    892880A coroutine is specified by replacing @generator@ with @coroutine@ for the type.
    893 This generality results in higher cost for creation, due to dynamic stack allocation, execution, due to context switching among stacks, and terminating, due to possible stack unwinding and dynamic stack deallocation.
     881Coroutine generality results in higher cost for creation, due to dynamic stack allocation, execution, due to context switching among stacks, and terminating, due to possible stack unwinding and dynamic stack deallocation.
    894882A series of different kinds of coroutines and their implementation demonstrate how coroutines extend generators.
    895883
     
    12491237Hence, if a coroutine's destructor detects the coroutine is not ended, it implicitly raises a cancellation exception (uncatchable exception) at the coroutine and resumes it so the cancellation exception can propagate to the root of the coroutine's stack destroying all local variable on the stack.
    12501238So the \CFA semantics for the generator and coroutine, ensure both can be safely deallocated at any time, regardless of their current state, like any other aggregate object.
    1251 Explicitly raising normal exceptions at another coroutine can replace flag variables, like @stop@, \eg @prod@ raises a @stop@ exception at @cons@ after it finishes generating values and resumes @cons@, which catches the @stop@exception to terminate its loop.
     1239Explicitly raising normal exceptions at another coroutine can replace flag variables, like @stop@, \eg @prod@ raises a @stop@ exception at @cons@ after it finishes generating values and resumes @cons@, which catches the @stop@ exception to terminate its loop.
    12521240
    12531241Finally, there is an interesting effect for @suspend@ with symmetric coroutines.
     
    12601248\subsection{(Generator) Coroutine Implementation}
    12611249
    1262 A significant implementation challenge for generators/coroutines (and threads, see Section~\ref{s:threads}) is adding extra fields to the custom types and related functions, \eg inserting code after/before the coroutine constructor/destructor and @main@ to create/initialize/de-initialize/destroy any extra fields, \eg stack.
    1263 There are several solutions to these problem, which follow from the object-oriented-flavoured choice to adopt custom types.
     1250A significant implementation challenge for generators/coroutines (and threads in Section~\ref{s:threads}) is adding extra fields to the custom types and related functions, \eg inserting code after/before the coroutine constructor/destructor and @main@ to create/initialize/de-initialize/destroy any extra fields, \eg stack.
     1251There are several solutions to these problem, which follow from the object-oriented flavour of adopting custom types.
    12641252
    12651253For object-oriented languages, inheritance is used to provide extra fields and code via explicit inheritance:
     
    12701258As well, some special properties are not handled by existing language semantics, \eg the execution of constructors/destructors is in the wrong order to implicitly start threads because the thread must start \emph{after} all constructors as it relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived.
    12711259Alternatives, such as explicitly starting threads as in Java, are repetitive and forgetting to call start is a common source of errors.
    1272 
    12731260An alternative is composition:
    12741261\begin{cfa}
     
    12841271The downside of this approach is that it makes custom types a special case in the language.
    12851272Users wanting to extend custom types or build their own can only do so in ways offered by the language.
    1286 Furthermore, implementing custom types without language supports may display the power of a programming language.
    1287 \CFA blends the two approaches, providing custom type for idiomatic \CFA code, extending and building new custom types is still possible, similar to Java approach with builtin and library concurrency.
    1288 
    1289 Part of the mechanism to generalize custom types is the \CFA trait, \eg the definition for custom-type @coroutine@ is anything satisfying the trait @is_coroutine@, and this trait both enforces and restricts the coroutine-interface functions.
     1273Furthermore, implementing custom types without language support may display the power of a programming language.
     1274\CFA blends the two approaches, providing custom type for idiomatic \CFA code, while extending and building new custom types is still possible, similar to Java concurrency with builtin and library.
     1275
     1276Part of the mechanism to generalize custom types is the \CFA trait~\cite[\S~2.3]{Moss18}, \eg the definition for custom-type @coroutine@ is anything satisfying the trait @is_coroutine@, and this trait both enforces and restricts the coroutine-interface functions.
    12901277\begin{cfa}
    12911278trait is_coroutine( `dtype` T ) {
     
    12931280        coroutine_desc * get_coroutine( T & );
    12941281};
    1295 forall( `dtype` T | is_coroutine(T) ) void $suspend$( T & );
    1296 forall( `dtype` T | is_coroutine(T) ) void resume( T & );
     1282forall( `dtype` T | is_coroutine(T) ) void $suspend$( T & ), resume( T & );
    12971283\end{cfa}
    12981284Note, copying generators/coroutines/threads is not meaningful.
     
    13011287Furthermore, two coroutines cannot logically execute on the same stack.
    13021288A deep coroutine copy, which copies the stack, is also meaningless in an unmanaged language (no garbage collection), like C, because the stack may contain pointers to object within it that require updating for the copy.
    1303 Now, the @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference (pointer).
     1289The \CFA @dtype@ property provides no \emph{implicit} copying operations and the @is_coroutine@ trait provides no \emph{explicit} copying operations, so all coroutines must be passed by reference (pointer).
    13041290The function definitions ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the currently executing coroutine handle.
    13051291The @main@ function has no return value or additional parameters because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values versus fixed ones.
    13061292The advantage of this approach is that users can easily create different types of coroutines, \eg changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ function, and possibly redefining @suspend@ and @resume@.
    13071293
    1308 The \CFA custom-type @coroutine@ implicitly implements the getter and forward declarations required for implementing the coroutine main:
     1294The \CFA custom-type @coroutine@ implicitly implements the getter and forward declarations for the coroutine main.
    13091295\begin{cquote}
    13101296\begin{tabular}{@{}ccc@{}}
     
    13421328\end{tabular}
    13431329\end{cquote}
    1344 The combination of custom types and fundamental @trait@ description of these types allows an concise specification for programmers and tools, while more advanced programmers can have tighter control over memory layout and initialization.
     1330The combination of custom types and fundamental @trait@ description of these types allows a concise specification for programmers and tools, while more advanced programmers can have tighter control over memory layout and initialization.
    13451331
    13461332Figure~\ref{f:CoroutineMemoryLayout} shows different memory-layout options for a coroutine (where a task is similar).
    1347 The coroutine handle is the @coroutine@ instance containing programmer specified global/communication variables across interface functions.
     1333The coroutine handle is the @coroutine@ instance containing programmer specified type global/communication variables across interface functions.
    13481334The coroutine descriptor contains all implicit declarations needed by the runtime, \eg @suspend@/@resume@, and can be part of the coroutine handle or separate.
    1349 The coroutine stack can appear in a number of locations and fixed or variable sized.
     1335The coroutine stack can appear in a number of locations and be fixed or variable sized.
    13501336Hence, the coroutine's stack could be a VLS\footnote{
    13511337We are examining variable-sized structures (VLS), where fields can be variable-sized structures or arrays.
     
    13551341For heap stack allocation, allocation/deallocation is an expensive heap allocation (where the heap can be a shared resource), modulo any stack constructor costs.
    13561342With heap stack allocation, it is also possible to use a split (segmented) stack calling-convention, available with gcc and clang, so the stack is variable sized.
    1357 Currently, \CFA supports stack/heap allocated descriptors but only fixed-sized heap allocated stacks;
     1343Currently, \CFA supports stack/heap allocated descriptors but only fixed-sized heap allocated stacks.
    13581344In \CFA debug-mode, the fixed-sized stack is terminated with a write-only page, which catches most stack overflows.
    13591345Experience teaching concurrency with \uC~\cite{CS343} shows fixed-sized stacks are rarely an issue for students.
    1360 Split-stack allocation is under development but requires recompilation of existing code, which may be impossible.
     1346Split-stack allocation is under development but requires recompilation of legacy code, which may be impossible.
    13611347
    13621348\begin{figure}
     
    13771363However, coroutines are a stepping stone towards concurrency.
    13781364
    1379 The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}.
     1365The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3,]{Buhr05a}\cite{Adya02}.
    13801366Therefore, a minimal concurrency system requires coroutines \emph{in conjunction with a nondeterministic scheduler}.
    13811367The resulting execution system now follows a cooperative threading-model, called \newterm{non-preemptive scheduling}.
     
    13911377For stackless, the scheduler performs scheduling on the stack of the current coroutine and switches directly to the next coroutine, so there is one context switch.
    13921378For stackful, the current coroutine switches to the scheduler, which performs scheduling, and it then switches to the next coroutine, so there are two context switches.
    1393 A stackful scheduler is often used for simplicity and security.
     1379The \CFA runtime uses a stackful scheduler for uniformity and security.
    13941380
    13951381
     
    14021388\begin{tabular}{@{}lll@{}}
    14031389\multicolumn{1}{c}{\textbf{Java}} & \multicolumn{1}{c}{\textbf{\Celeven}} & \multicolumn{1}{c}{\textbf{pthreads}} \\
    1404 \begin{cfa}[aboveskip=0pt,belowskip=0pt]
     1390\begin{cfa}
    14051391class MyTask extends Thread {...}
    14061392mytask t = new MyTask(...);
     
    14101396\end{cfa}
    14111397&
    1412 \begin{cfa}[aboveskip=0pt,belowskip=0pt]
     1398\begin{cfa}
    14131399class MyTask { ... } // functor
    14141400MyTask mytask;
     
    14181404\end{cfa}
    14191405&
    1420 \begin{cfa}[aboveskip=0pt,belowskip=0pt]
     1406\begin{cfa}
    14211407void * rtn( void * arg ) {...}
    14221408pthread_t t;  int i = 3;
     
    14871473\subsection{Thread Implementation}
    14881474
    1489 Threads in \CFA are user level run by runtime kernel threads (see Section~\ref{s:Parallelism}), where user threads provide concurrency and kernel threads provide parallelism.
     1475Threads in \CFA are user level run by runtime kernel threads (see Section~\ref{s:CFARuntimeStructure}), where user threads provide concurrency and kernel threads provide parallelism.
    14901476Like coroutines, and for the same design reasons, \CFA provides a custom @thread@ type and a @trait@ to enforce and restrict the task-interface functions.
    14911477\begin{cquote}
     
    15111497Similarly, the function definitions ensures there is a statically-typed @main@ function that is the thread starting point (first stack frame), a mechanism to get (read) the currently executing thread handle, and a special destructor to prevent deallocation while the thread is executing.
    15121498(The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitor}.)
    1513 The difference between the coroutine and thread is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates the coroutine's stack starts running the coroutine main on the stack;
     1499The difference between the coroutine and thread is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates the coroutine's stack and starts running the coroutine main on the stack;
    15141500whereas, a thread is scheduling for execution in @main@ immediately after its constructor is run.
    15151501No return value or additional parameters are necessary for this function because the @thread@ type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values.
    15161502
    1517 \begin{comment} % put in appendix with coroutine version ???
    1518 As such the @main@ function of a thread can be defined as
    1519 \begin{cfa}
    1520 thread foo {};
    1521 
    1522 void main(foo & this) {
    1523         sout | "Hello World!";
    1524 }
    1525 \end{cfa}
    1526 
    1527 In this example, threads of type @foo@ start execution in the @void main(foo &)@ function, which prints @"Hello World!".@ While this paper encourages this approach to enforce strongly typed programming, users may prefer to use the function-based thread semantics for the sake of simplicity.
    1528 With the static semantics it is trivial to write a thread type that takes a function pointer as a parameter and executes it on its stack asynchronously.
    1529 \begin{cfa}
    1530 typedef void (*voidRtn)(int);
    1531 
    1532 thread RtnRunner {
    1533         voidRtn func;
    1534         int arg;
    1535 };
    1536 
    1537 void ?{}(RtnRunner & this, voidRtn inRtn, int arg) {
    1538         this.func = inRtn;
    1539         this.arg  = arg;
    1540 }
    1541 
    1542 void main(RtnRunner & this) {
    1543         // thread starts here and runs the function
    1544         this.func( this.arg );
    1545 }
    1546 
    1547 void hello(/*unused*/ int) {
    1548         sout | "Hello World!";
    1549 }
    1550 
    1551 int main() {
    1552         RtnRunner f = {hello, 42};
    1553         return 0?
    1554 }
    1555 \end{cfa}
    1556 A consequence of the strongly typed approach to main is that memory layout of parameters and return values to/from a thread are now explicitly specified in the \textbf{API}.
    1557 \end{comment}
    1558 
    15591503
    15601504\section{Mutual Exclusion / Synchronization}
    15611505
    1562 Uncontrolled nondeterministic execution produces meaningless results.
    1563 To produce meaningful execution requires clawing back some determinism using mutual exclusion and synchronization, where mutual exclusion provides access-control for threads using shared data, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}.
     1506Unrestricted nondeterminism is meaningless as there is no way to know when the result is completed without synchronization.
     1507To produce meaningful execution requires clawing back some determinism using mutual exclusion and synchronization, where mutual exclusion provides access control for threads using shared data, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}.
    15641508Some concurrent systems eliminate mutable shared-state by switching to stateless communication like message passing~\cite{Thoth,Harmony,V-Kernel,MPI} (Erlang, MPI), channels~\cite{CSP} (CSP,Go), actors~\cite{Akka} (Akka, Scala), or functional techniques (Haskell).
    1565 However, in call/return-based languages, these approaches force a clear distinction, \ie introduce a new programming paradigm between regular and concurrent computation, \eg function call versus message passing.
     1509However, these approaches introduce a new communication mechanism for concurrency different from the standard communication using function call/return.
    15661510Hence, a programmer must learn and manipulate two sets of design/programming patterns.
    15671511While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
     
    15841528The generalization is called a \newterm{group critical-section}~\cite{Joung00}, where multiple tasks with the same session may use the resource simultaneously, but different sessions may not use the resource simultaneously.
    15851529The readers/writer problem~\cite{Courtois71} is an instance of a group critical-section, where readers share a session but writers have a unique session.
    1586 \newterm{Mutual exclusion} enforces the correct kind and number of threads use a critical section.
     1530\newterm{Mutual exclusion} enforces the correct kind and number of threads using a critical section.
    15871531
    15881532However, many solutions exist for mutual exclusion, which vary in terms of performance, flexibility and ease of use.
     
    16121556
    16131557A \textbf{monitor} is a set of functions that ensure mutual exclusion when accessing shared state.
    1614 More precisely, a monitor is a programming technique that binds mutual exclusion to function scope, as opposed to locks, where mutual-exclusion is defined by acquire/release calls, independent of lexical context (analogous to block and heap storage allocation).
    1615 The strong association with the call/return paradigm eases programming, comprehension, and maintenance, at a slight cost in flexibility and efficiency.
     1558More precisely, a monitor is a programming technique that implicitly binds mutual exclusion to static function scope, as opposed to locks, where mutual-exclusion is defined by acquire/release calls, independent of lexical context (analogous to block and heap storage allocation).
     1559Restricting acquire/release points eases programming, comprehension, and maintenance, at a slight cost in flexibility and efficiency.
    16161560\CFA uses a custom @monitor@ type and leverages declaration semantics (deallocation) to protect active or waiting threads in a monitor.
    16171561
    16181562The following is a \CFA monitor implementation of an atomic counter.
    1619 \newpage
    16201563\begin{cfa}[morekeywords=nomutex]
    16211564`monitor` Aint { int cnt; }; $\C[4.25in]{// atomic integer counter}$
    16221565int ++?( Aint & `mutex`$\(_{opt}\)$ this ) with( this ) { return ++cnt; } $\C{// increment}$
    1623 int ?=?( Aint & `mutex`$\(_{opt}\)$ lhs, int rhs ) with( lhs ) { cnt = rhs; } $\C{// conversions}\CRT$
     1566int ?=?( Aint & `mutex`$\(_{opt}\)$ lhs, int rhs ) with( lhs ) { cnt = rhs; } $\C{// conversions with int}\CRT$
    16241567int ?=?( int & lhs, Aint & `mutex`$\(_{opt}\)$ rhs ) with( rhs ) { lhs = cnt; }
    16251568\end{cfa}
     
    16461589}
    16471590\end{cfa}
    1648 \CFA monitors also ensure the monitor lock is always released regardless of how an acquiring function ends (normal or exceptional), and returning a shared variable is safe via copying before the lock is released.
    1649 Similar safety is offered by explicit mechanisms like \CC RAII;
    1650 monitor implicit safety ensures no programmer error.
    1651 Furthermore, RAII mechansims cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit but passed to an unblocking thread.
     1591\CFA monitors also ensure the monitor lock is released regardless of how an acquiring function ends (normal or exceptional), and returning a shared variable is safe via copying before the lock is released.
     1592Similar safety is offered by \emph{explicit} mechanisms like \CC RAII;
     1593monitor \emph{implicit} safety ensures no programmer usage errors.
     1594Furthermore, RAII mechansims cannot handle complex synchronization within a monitor, where the monitor lock may not be released on function exit because it is passed to an unblocking thread;
    16521595RAII is purely a mutual-exclusion mechanism (see Section~\ref{s:Scheduling}).
    16531596
     
    16731616\end{tabular}
    16741617\end{cquote}
    1675 Like before, the @dtype@ property prevents \emph{implicit} copy operations and the @is_monitor@ trait provides no \emph{explicit} copy operations, so monitors must be passed by reference (pointer).
     1618The @dtype@ property prevents \emph{implicit} copy operations and the @is_monitor@ trait provides no \emph{explicit} copy operations, so monitors must be passed by reference (pointer).
    16761619% Copying a lock is insecure because it is possible to copy an open lock and then use the open copy when the original lock is closed to simultaneously access the shared data.
    16771620% Copying a monitor is secure because both the lock and shared data are copies, but copying the shared data is meaningless because it no longer represents a unique entity.
    16781621Similarly, the function definitions ensures there is a mechanism to get (read) the currently executing monitor handle, and a special destructor to prevent deallocation if a thread using the shared data.
    1679 The custom monitor type also inserts any locking vehicles needed to implement the monitor locking semnatics.
     1622The custom monitor type also inserts any locks needed to implement the mutual exclusion semantics.
    16801623
    16811624
     
    16881631
    16891632The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameters is redundant.
    1690 Instead, the semantics have one qualifier as the default, and the other required.
     1633Instead, the semantics have one qualifier as the default and the other required.
    16911634For example, make the safe @mutex@ qualifier the default because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors.
    16921635Alternatively, make the unsafe \lstinline[morekeywords=nomutex]@nomutex@ qualifier the default because it is the \emph{normal} parameter semantics while @mutex@ parameters are rare.
    16931636Providing a default qualifier implies knowing whether a parameter is a monitor.
    1694 Since \CFA relies heavily on traits as an abstraction mechanism, the distinction between a type that is a monitor and a type that looks like a monitor can become blurred.
     1637Since \CFA relies heavily on traits as an abstraction mechanism, types can coincidentally match the monitor trait but not be a monitor, similar to inheritance where a shape and playing card can both be drawable.
    16951638For this reason, \CFA requires programmers to identify the kind of parameter with the @mutex@ keyword and uses no keyword to mean \lstinline[morekeywords=nomutex]@nomutex@.
    16961639
     1640\newpage
    16971641The next semantic decision is establishing which parameter \emph{types} may be qualified with @mutex@.
    16981642The following has monitor parameter types that are composed of multiple objects.
     
    17131657A positive consequence of this design decision is the ability to support multi-monitor functions,\footnote{
    17141658While object-oriented monitors can be extended with a mutex qualifier for multiple-monitor members, no prior example of this feature could be found.}
    1715 called \newterm{bulk acquire} (see Section~\ref{s:Implementation} for implementation details).
     1659called \newterm{bulk acquire}.
    17161660\CFA guarantees acquisition order is consistent across calls to @mutex@ functions using the same monitors as arguments, so acquiring multiple monitors is safe from deadlock.
    1717 Figure~\ref{f:BankTransfer} shows a trivial solution to the bank-account transfer problem using \CFA monitors with implicit locking and \CC with explicit locking.
     1661Figure~\ref{f:BankTransfer} shows a trivial solution to the bank transfer problem, where two resources must be locked simultaneously, using \CFA monitors with implicit locking and \CC with explicit locking.
    17181662A \CFA programmer only has to manage when to acquire mutual exclusion;
    17191663a \CC programmer must select the correct lock and acquisition mechanism from a panoply of locking options.
     
    17921736\end{figure}
    17931737
    1794 Users can force the acquiring order, by using @mutex@/\lstinline[morekeywords=nomutex]@nomutex@.
     1738Users can still force the acquiring order by using @mutex@/\lstinline[morekeywords=nomutex]@nomutex@.
     1739\newpage
    17951740\begin{cfa}
    17961741void foo( M & mutex m1, M & mutex m2 ); $\C{// acquire m1 and m2}$
     
    18031748\end{cfa}
    18041749The bulk-acquire semantics allow @bar@ or @baz@ to acquire a monitor lock and reacquire it in @foo@.
    1805 In the calls to @bar@ and @baz@, the monitors are acquired in opposite order, resulting in deadlock.
     1750In the calls to @bar@ and @baz@, the monitors are acquired in opposite order, possibly resulting in deadlock.
    18061751However, this case is the simplest instance of the \emph{nested-monitor problem}~\cite{Lister77}, where monitors are acquired in sequence versus bulk.
    18071752Detecting the nested-monitor problem requires dynamic tracking of monitor calls, and dealing with it requires rollback semantics~\cite{Dice10}.
    18081753\CFA does not deal with this fundamental problem.
    18091754
    1810 
    1811 \subsection{\protect\lstinline@mutex@ statement}
    1812 \label{mutex-stmt}
    1813 
    1814 The monitor call-semantics associate all locking semantics with functions.
    1815 Like Java, \CFA offers an alternative @mutex@ statement to reduce refactoring and naming.
     1755Finally, like Java, \CFA offers an alternative @mutex@ statement to reduce refactoring and naming.
    18161756\begin{cquote}
     1757\renewcommand{\arraystretch}{0.0}
    18171758\begin{tabular}{@{}l@{\hspace{3\parindentlnth}}l@{}}
    18181759\multicolumn{1}{c}{\textbf{\lstinline@mutex@ call}} & \multicolumn{1}{c}{\lstinline@mutex@ \textbf{statement}} \\
    1819 \begin{cfa}
     1760\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    18201761monitor M { ... };
    18211762void foo( M & mutex m1, M & mutex m2 ) {
     
    18271768\end{cfa}
    18281769&
    1829 \begin{cfa}
     1770\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    18301771
    18311772void bar( M & m1, M & m2 ) {
     
    18431784\label{s:Scheduling}
    18441785
     1786% There are many aspects of scheduling in a concurrency system, all related to resource utilization by waiting threads, \ie which thread gets the resource next.
     1787% Different forms of scheduling include access to processors by threads (see Section~\ref{s:RuntimeStructureCluster}), another is access to a shared resource by a lock or monitor.
     1788This section discusses monitor scheduling for waiting threads eligible for entry, \ie which thread gets the shared resource next. (See Section~\ref{s:RuntimeStructureCluster} for scheduling threads on virtual processors.)
    18451789While monitor mutual-exclusion provides safe access to shared data, the monitor data may indicate that a thread accessing it cannot proceed, \eg a bounded buffer may be full/empty so produce/consumer threads must block.
    18461790Leaving the monitor and trying again (busy waiting) is impractical for high-level programming.
    18471791Monitors eliminate busy waiting by providing synchronization to schedule threads needing access to the shared data, where threads block versus spinning.
    1848 Synchronization is generally achieved with internal~\cite{Hoare74} or external~\cite[\S~2.9.2]{uC++} scheduling, where \newterm{scheduling} defines which thread acquires the critical section next.
     1792Synchronization is generally achieved with internal~\cite{Hoare74} or external~\cite[\S~2.9.2]{uC++} scheduling.
    18491793\newterm{Internal scheduling} is characterized by each thread entering the monitor and making an individual decision about proceeding or blocking, while \newterm{external scheduling} is characterized by an entering thread making a decision about proceeding for itself and on behalf of other threads attempting entry.
    18501794Finally, \CFA monitors do not allow calling threads to barge ahead of signalled threads, which simplifies synchronization among threads in the monitor and increases correctness.
    18511795If barging is allowed, synchronization between a signaller and signallee is difficult, often requiring additional flags and multiple unblock/block cycles.
    1852 In fact, signals-as-hints is completely opposite from that proposed by Hoare in the seminal paper on monitors:
    1853 \begin{cquote}
    1854 However, we decree that a signal operation be followed immediately by resumption of a waiting program, without possibility of an intervening procedure call from yet a third program.
    1855 It is only in this way that a waiting program has an absolute guarantee that it can acquire the resource just released by the signalling program without any danger that a third program will interpose a monitor entry and seize the resource instead.~\cite[p.~550]{Hoare74}
    1856 \end{cquote}
     1796In fact, signals-as-hints is completely opposite from that proposed by Hoare in the seminal paper on monitors~\cite[p.~550]{Hoare74}.
     1797% \begin{cquote}
     1798% However, we decree that a signal operation be followed immediately by resumption of a waiting program, without possibility of an intervening procedure call from yet a third program.
     1799% It is only in this way that a waiting program has an absolute guarantee that it can acquire the resource just released by the signalling program without any danger that a third program will interpose a monitor entry and seize the resource instead.~\cite[p.~550]{Hoare74}
     1800% \end{cquote}
    18571801Furthermore, \CFA concurrency has no spurious wakeup~\cite[\S~9]{Buhr05a}, which eliminates an implicit form of barging.
    1858 Hence, a \CFA @wait@ statement is not enclosed in a @while@ loop that can cause thread starvation.
    1859 
    1860 Figure~\ref{f:BBInt} shows a \CFA generic bounded-buffer with internal scheduling, where producers/consumers enter the monitor, see the buffer is full/empty, and block on an appropriate condition lock, @full@/@empty@.
     1802Hence, a \CFA @wait@ statement is not enclosed in a @while@ loop retesting a blocking predicate, which can cause thread starvation due to barging.
     1803
     1804Figure~\ref{f:MonitorScheduling} shows internal/external scheduling (for the bounded-buffer example in Figure~\ref{f:InternalExternalScheduling}).
     1805External calling threads block on the calling queue, if the monitor is occupied, otherwise they enter in FIFO order.
     1806Internal threads block on condition queues via @wait@ and they reenter from the condition in FIFO order.
     1807
     1808There are three signalling mechanisms to unblock waiting threads to enter the monitor.
     1809Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion so only can proceed.
     1810For internal scheduling, threads are unblocked from condition queues using @signal@, where the signallee is moved to urgent and the signaller continues (solid line).
     1811Multiple signals move multiple signallees to urgent, until the condition is empty.
     1812When the signaller exits or waits, a thread blocked on urgent is processed before calling threads to prevent barging.
     1813(Java conceptually moves the signalled thread to the calling queue, and hence, allows barging.)
     1814The alternative unblock is in the opposite order using @signal_block@, where the signaller is moved to urgent and the signallee continues (dashed line), and is implicitly unblocked from urgent when the signallee exits or waits.
     1815
     1816For external scheduling, the condition queues are not used;
     1817instead threads are unblocked directly from the calling queue using @waitfor@ based on function names requesting mutual exclusion.
     1818(The linear search through the calling queue to locate a particular call can be reduced to $O(1)$.)
     1819The @waitfor@ has the same semantics as @signal_block@, where the signalled thread executes before the signallee, which waits on urgent.
     1820Executing multiple @waitfor@s from different signalled functions causes the calling threads to move to urgent.
     1821External scheduling requires urgent to be a stack, because the signaller excepts to execute immediately after the specified monitor call has exited or waited.
     1822Internal scheduling behaves the same for an urgent stack or queue, except for signalling multiple threads, where the threads unblock from urgent in reverse order from signalling.
     1823If the restart order is important, multiple signalling by a signal thread can be transformed into shared signalling among threads, where each thread signals the next thread.
     1824Hence, \CFA uses an urgent stack.
     1825
     1826\begin{figure}
     1827\centering
     1828% \subfloat[Scheduling Statements] {
     1829% \label{fig:SchedulingStatements}
     1830% {\resizebox{0.45\textwidth}{!}{\input{CondSigWait.pstex_t}}}
     1831\input{CondSigWait.pstex_t}
     1832% }% subfloat
     1833% \quad
     1834% \subfloat[Bulk acquire monitor] {
     1835% \label{fig:BulkMonitor}
     1836% {\resizebox{0.45\textwidth}{!}{\input{ext_monitor.pstex_t}}}
     1837% }% subfloat
     1838\caption{Monitor Scheduling}
     1839\label{f:MonitorScheduling}
     1840\end{figure}
     1841
     1842Figure~\ref{f:BBInt} shows a \CFA generic bounded-buffer with internal scheduling, where producers/consumers enter the monitor, see the buffer is full/empty, and block on an appropriate condition variable, @full@/@empty@.
    18611843The @wait@ function atomically blocks the calling thread and implicitly releases the monitor lock(s) for all monitors in the function's parameter list.
    1862 The appropriate condition lock is signalled to unblock an opposite kind of thread after an element is inserted/removed from the buffer.
    1863 Signalling is unconditional, because signalling an empty condition lock does nothing.
    1864 
    1865 Signalling semantics cannot have the signaller and signalled thread in the monitor simultaneously, which means:
    1866 \begin{enumerate}
    1867 \item
    1868 The signalling thread returns immediately, and the signalled thread continues.
    1869 \item
    1870 The signalling thread continues and the signalled thread is marked for urgent unblocking at the next scheduling point (exit/wait).
    1871 \item
    1872 The signalling thread blocks but is marked for urgrent unblocking at the next scheduling point and the signalled thread continues.
    1873 \end{enumerate}
    1874 The first approach is too restrictive, as it precludes solving a reasonable class of problems, \eg dating service (see Figure~\ref{f:DatingService}).
    1875 \CFA supports the next two semantics as both are useful.
    1876 Finally, while it is common to store a @condition@ as a field of the monitor, in \CFA, a @condition@ variable can be created/stored independently.
    1877 Furthermore, a condition variable is tied to a \emph{group} of monitors on first use, called \newterm{branding}, which means that using internal scheduling with distinct sets of monitors requires one condition variable per set of monitors.
     1844The appropriate condition variable is signalled to unblock an opposite kind of thread after an element is inserted/removed from the buffer.
     1845Signalling is unconditional, because signalling an empty condition variable does nothing.
     1846It is common to declare condition variables as monitor fields to prevent shared access, hence no locking is required for access as the conditions are protected by the monitor lock.
     1847In \CFA, a condition variable can be created/stored independently.
     1848To still prevent expensive locking on access, a condition variable is tied to a \emph{group} of monitors on first use, called \newterm{branding}, resulting in a low-cost boolen test to detect sharing from other monitors.
     1849
     1850% Signalling semantics cannot have the signaller and signalled thread in the monitor simultaneously, which means:
     1851% \begin{enumerate}
     1852% \item
     1853% The signalling thread returns immediately and the signalled thread continues.
     1854% \item
     1855% The signalling thread continues and the signalled thread is marked for urgent unblocking at the next scheduling point (exit/wait).
     1856% \item
     1857% The signalling thread blocks but is marked for urgrent unblocking at the next scheduling point and the signalled thread continues.
     1858% \end{enumerate}
     1859% The first approach is too restrictive, as it precludes solving a reasonable class of problems, \eg dating service (see Figure~\ref{f:DatingService}).
     1860% \CFA supports the next two semantics as both are useful.
    18781861
    18791862\begin{figure}
     
    18891872        };
    18901873        void ?{}( Buffer(T) & buffer ) with(buffer) {
    1891                 [front, back, count] = 0;
     1874                front = back = count = 0;
    18921875        }
    1893 
    18941876        void insert( Buffer(T) & mutex buffer, T elem )
    18951877                                with(buffer) {
     
    19081890\end{lrbox}
    19091891
     1892% \newbox\myboxB
     1893% \begin{lrbox}{\myboxB}
     1894% \begin{cfa}[aboveskip=0pt,belowskip=0pt]
     1895% forall( otype T ) { // distribute forall
     1896%       monitor Buffer {
     1897%
     1898%               int front, back, count;
     1899%               T elements[10];
     1900%       };
     1901%       void ?{}( Buffer(T) & buffer ) with(buffer) {
     1902%               [front, back, count] = 0;
     1903%       }
     1904%       T remove( Buffer(T) & mutex buffer ); // forward
     1905%       void insert( Buffer(T) & mutex buffer, T elem )
     1906%                               with(buffer) {
     1907%               if ( count == 10 ) `waitfor( remove, buffer )`;
     1908%               // insert elem into buffer
     1909%
     1910%       }
     1911%       T remove( Buffer(T) & mutex buffer ) with(buffer) {
     1912%               if ( count == 0 ) `waitfor( insert, buffer )`;
     1913%               // remove elem from buffer
     1914%
     1915%               return elem;
     1916%       }
     1917% }
     1918% \end{cfa}
     1919% \end{lrbox}
     1920
    19101921\newbox\myboxB
    19111922\begin{lrbox}{\myboxB}
    19121923\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    1913 forall( otype T ) { // distribute forall
    1914         monitor Buffer {
    1915 
    1916                 int front, back, count;
    1917                 T elements[10];
    1918         };
    1919         void ?{}( Buffer(T) & buffer ) with(buffer) {
    1920                 [front, back, count] = 0;
    1921         }
    1922         T remove( Buffer(T) & mutex buffer ); // forward
    1923         void insert( Buffer(T) & mutex buffer, T elem )
    1924                                 with(buffer) {
    1925                 if ( count == 10 ) `waitfor( remove, buffer )`;
    1926                 // insert elem into buffer
    1927 
    1928         }
    1929         T remove( Buffer(T) & mutex buffer ) with(buffer) {
    1930                 if ( count == 0 ) `waitfor( insert, buffer )`;
    1931                 // remove elem from buffer
    1932 
    1933                 return elem;
    1934         }
    1935 }
     1924monitor ReadersWriter {
     1925        int rcnt, wcnt; // readers/writer using resource
     1926};
     1927void ?{}( ReadersWriter & rw ) with(rw) {
     1928        rcnt = wcnt = 0;
     1929}
     1930void EndRead( ReadersWriter & mutex rw ) with(rw) {
     1931        rcnt -= 1;
     1932}
     1933void EndWrite( ReadersWriter & mutex rw ) with(rw) {
     1934        wcnt = 0;
     1935}
     1936void StartRead( ReadersWriter & mutex rw ) with(rw) {
     1937        if ( wcnt > 0 ) `waitfor( EndWrite, rw );`
     1938        rcnt += 1;
     1939}
     1940void StartWrite( ReadersWriter & mutex rw ) with(rw) {
     1941        if ( wcnt > 0 ) `waitfor( EndWrite, rw );`
     1942        else while ( rcnt > 0 ) `waitfor( EndRead, rw );`
     1943        wcnt = 1;
     1944}
     1945
    19361946\end{cfa}
    19371947\end{lrbox}
    19381948
    1939 \subfloat[Internal scheduling]{\label{f:BBInt}\usebox\myboxA}
    1940 %\qquad
    1941 \subfloat[External scheduling]{\label{f:BBExt}\usebox\myboxB}
    1942 \caption{Generic bounded-buffer}
    1943 \label{f:GenericBoundedBuffer}
     1949\subfloat[Generic bounded buffer, internal scheduling]{\label{f:BBInt}\usebox\myboxA}
     1950\hspace{3pt}
     1951\vrule
     1952\hspace{3pt}
     1953\subfloat[Readers / writer lock, external scheduling]{\label{f:RWExt}\usebox\myboxB}
     1954
     1955\caption{Internal / external scheduling}
     1956\label{f:InternalExternalScheduling}
    19441957\end{figure}
    19451958
    1946 Figure~\ref{f:BBExt} shows a \CFA generic bounded-buffer with external scheduling, where producers/consumers detecting a full/\-empty buffer block and prevent more producers/consumers from entering the monitor until there is a free/empty slot in the buffer.
     1959Figure~\ref{f:BBInt} can be transformed into external scheduling by removing the condition variables and signals/waits, and adding the following lines at the locations of the current @wait@s in @insert@/@remove@, respectively.
     1960\begin{cfa}[aboveskip=2pt,belowskip=1pt]
     1961if ( count == 10 ) `waitfor( remove, buffer )`;       |      if ( count == 0 ) `waitfor( insert, buffer )`;
     1962\end{cfa}
     1963Here, the producers/consumers detects a full/\-empty buffer and prevents more producers/consumers from entering the monitor until there is a free/empty slot in the buffer.
    19471964External scheduling is controlled by the @waitfor@ statement, which atomically blocks the calling thread, releases the monitor lock, and restricts the function calls that can next acquire mutual exclusion.
    19481965If the buffer is full, only calls to @remove@ can acquire the buffer, and if the buffer is empty, only calls to @insert@ can acquire the buffer.
    1949 Threads making calls to functions that are currently excluded, block outside of (external to) the monitor on a calling queue, versus blocking on condition queues inside of (internal to) the monitor.
    1950 External scheduling allows users to wait for events from other threads without concern of unrelated events occurring.
     1966Threads making calls to functions that are currently excluded block outside of (external to) the monitor on the calling queue, versus blocking on condition queues inside of (internal to) the monitor.
     1967Figure~\ref{f:RWExt} shows a readers/writer lock written using external scheduling, where a waiting reader detects a writer using the resource and restricts further calls until the writer exits by calling @EndWrite@.
     1968The writer does a similar action for each reader or writer using the resource.
     1969Note, no new calls to @StarRead@/@StartWrite@ may occur when waiting for the call to @EndRead@/@EndWrite@.
     1970External scheduling allows waiting for events from other threads while restricting unrelated events.
    19511971The mechnaism can be done in terms of control flow, \eg Ada @accept@ or \uC @_Accept@, or in terms of data, \eg Go @select@ on channels.
    1952 While both mechanisms have strengths and weaknesses, this project uses a control-flow mechanism to stay consistent with other language semantics.
    1953 Two challenges specific to \CFA for external scheduling are loose object-definitions (see Section~\ref{s:LooseObjectDefinitions}) and multiple-monitor functions (see Section~\ref{s:Multi-MonitorScheduling}).
    1954 
    1955 For internal scheduling, non-blocking signalling (as in the producer/consumer example) is used when the signaller is providing the cooperation for a waiting thread;
    1956 the signaller enters the monitor and changes state, detects a waiting threads that can use the state, performs a non-blocking signal on the condition queue for the waiting thread, and exits the monitor to run concurrently.
    1957 The waiter unblocks next from the urgent queue, uses/takes the state, and exits the monitor.
    1958 Blocking signalling is the reverse, where the waiter is providing the cooperation for the signalling thread;
    1959 the signaller enters the monitor, detects a waiting thread providing the necessary state, performs a blocking signal to place it on the urgent queue and unblock the waiter.
    1960 The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to use/take the state.
     1972While both mechanisms have strengths and weaknesses, this project uses the control-flow mechanism to be consistent with other language features.
     1973% Two challenges specific to \CFA for external scheduling are loose object-definitions (see Section~\ref{s:LooseObjectDefinitions}) and multiple-monitor functions (see Section~\ref{s:Multi-MonitorScheduling}).
    19611974
    19621975Figure~\ref{f:DatingService} shows a dating service demonstrating non-blocking and blocking signalling.
     
    19811994        int GirlPhNo, BoyPhNo;
    19821995        condition Girls[CCodes], Boys[CCodes];
    1983         condition exchange;
     1996        `condition exchange;`
    19841997};
    19851998int girl( DS & mutex ds, int phNo, int ccode ) {
     
    19922005                `signal( Boys[ccode] );`
    19932006                `wait( exchange );`
    1994         } // if
     2007        }
    19952008        return BoyPhNo;
    19962009}
     
    20352048\end{figure}
    20362049
     2050In summation, for internal scheduling, non-blocking signalling (as in the producer/consumer example) is used when the signaller is providing the cooperation for a waiting thread;
     2051the signaller enters the monitor and changes state, detects a waiting threads that can use the state, performs a non-blocking signal on the condition queue for the waiting thread, and exits the monitor to run concurrently.
     2052The waiter unblocks next from the urgent queue, uses/takes the state, and exits the monitor.
     2053Blocking signalling is the reverse, where the waiter is providing the cooperation for the signalling thread;
     2054the signaller enters the monitor, detects a waiting thread providing the necessary state, performs a blocking signal to place it on the urgent queue and unblock the waiter.
     2055The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to use/take the state.
     2056
    20372057Both internal and external scheduling extend to multiple monitors in a natural way.
    20382058\begin{cquote}
     
    20572077\end{tabular}
    20582078\end{cquote}
    2059 For @wait( e )@, the default semantics is to atomically block the signaller and release all acquired mutex types in the parameter list, \ie @wait( e, m1, m2 )@.
     2079For @wait( e )@, the default semantics is to atomically block the signaller and release all acquired mutex parameters, \ie @wait( e, m1, m2 )@.
    20602080To override the implicit multi-monitor wait, specific mutex parameter(s) can be specified, \eg @wait( e, m1 )@.
    20612081Wait statically verifies the released monitors are the acquired mutex-parameters so unconditional release is safe.
    20622082While \CC supports bulk locking, @wait@ only accepts a single lock for a condition variable, so bulk locking with condition variables is asymmetric.
    20632083Finally, a signaller,
     2084\newpage
    20642085\begin{cfa}
    20652086void baz( M & mutex m1, M & mutex m2 ) {
     
    20692090must have acquired at least the same locks as the waiting thread signalled from the condition queue.
    20702091
    2071 Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex types in the parameter list, \ie @waitfor( rtn, m1, m2 )@.
     2092Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex parameters, \ie @waitfor( rtn, m1, m2 )@.
    20722093To override the implicit multi-monitor wait, specific mutex parameter(s) can be specified, \eg @waitfor( rtn, m1 )@.
    20732094@waitfor@ statically verifies the released monitors are the same as the acquired mutex-parameters of the given function or function pointer.
     
    20922113While deadlock can occur with multiple/nesting acquisition, this is a consequence of locks, and by extension monitors, not being perfectly composable.
    20932114
    2094 % Supporting barging prevention as well as extending internal scheduling to multiple monitors is the main source of complexity in the design and implementation of \CFA concurrency.
    2095 
    2096 
    2097 \subsection{Barging Prevention}
    2098 
    2099 Figure~\ref{f:BargingPrevention} shows \CFA code where bulk acquire adds complexity to the internal-signalling semantics.
     2115
     2116
     2117\subsection{Extended \protect\lstinline@waitfor@}
     2118
     2119Figure~\ref{f:ExtendedWaitfor} show the extended form of the @waitfor@ statement to conditionally accept one of a group of mutex functions, with an optional statement to be performed \emph{after} the mutex function finishes.
     2120For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist.
     2121The \emph{conditional-expression} of a @when@ may call a function, but the function must not block or context switch.
     2122If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) among the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept nondeterministically for this case, \eg Go \lstinline[morekeywords=select]@select@.
     2123If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made.
     2124If there is a @timeout@ clause, it provides an upper bound on waiting.
     2125If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead.
     2126Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking.
     2127If both @timeout@ and @else@ clause are present, the @else@ must be conditional, or the @timeout@ is never triggered.
     2128
     2129\begin{figure}
     2130\centering
     2131\begin{cfa}
     2132`when` ( $\emph{conditional-expression}$ )      $\C{// optional guard}$
     2133        waitfor( $\emph{mutex-member-name}$ ) $\emph{statement}$ $\C{// action after call}$
     2134`or` `when` ( $\emph{conditional-expression}$ ) $\C{// any number of functions}$
     2135        waitfor( $\emph{mutex-member-name}$ ) $\emph{statement}$
     2136`or`    ...
     2137`when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$
     2138        `timeout` $\emph{statement}$ $\C{// optional terminating timeout clause}$
     2139`when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$
     2140        `else`  $\emph{statement}$ $\C{// optional terminating clause}$
     2141\end{cfa}
     2142\caption{Extended \protect\lstinline@waitfor@}
     2143\label{f:ExtendedWaitfor}
     2144\end{figure}
     2145
     2146Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.:
     2147\begin{cfa}
     2148if ( C1 ) waitfor( mem1 );                       when ( C1 ) waitfor( mem1 );
     2149else if ( C2 ) waitfor( mem2 );         or when ( C2 ) waitfor( mem2 );
     2150\end{cfa}
     2151The left example only accepts @mem1@ if @C1@ is true or only @mem2@ if @C2@ is true.
     2152The right example accepts either @mem1@ or @mem2@ if @C1@ and @C2@ are true.
     2153
     2154An interesting use of @waitfor@ is accepting the @mutex@ destructor to know when an object is deallocated.
     2155\begin{cfa}
     2156void insert( Buffer(T) & mutex buffer, T elem ) with( buffer ) {
     2157        if ( count == 10 )
     2158                waitfor( remove, buffer ) {
     2159                        // insert elem into buffer
     2160                } or `waitfor( ^?{}, buffer )` throw insertFail;
     2161}
     2162\end{cfa}
     2163When the buffer is deallocated, the current waiter is unblocked and informed, so it can perform an appropriate action.
     2164However, the basic @waitfor@ semantics do not support this functionality, since using an object after its destructor is called is undefined.
     2165Therefore, to make this useful capability work, the semantics for accepting the destructor is the same as @signal@, \ie the call to the destructor is placed on the urgent queue and the acceptor continues execution, which throws an exception to the acceptor and then the caller is unblocked from the urgent queue to deallocate the object.
     2166Accepting the destructor is an idiomatic way to terminate a thread in \CFA.
     2167
     2168
     2169\subsection{Bulk Barging Prevention}
     2170
     2171Figure~\ref{f:BulkBargingPrevention} shows \CFA code where bulk acquire adds complexity to the internal-signalling semantics.
    21002172The complexity begins at the end of the inner @mutex@ statement, where the semantics of internal scheduling need to be extended for multiple monitors.
    21012173The problem is that bulk acquire is used in the inner @mutex@ statement where one of the monitors is already acquired.
    21022174When the signalling thread reaches the end of the inner @mutex@ statement, it should transfer ownership of @m1@ and @m2@ to the waiting threads to prevent barging into the outer @mutex@ statement by another thread.
    2103 However, both the signalling and waiting thread W1 still need monitor @m1@.
     2175However, both the signalling and waiting threads W1 and W2 need some subset of monitors @m1@ and @m2@.
     2176\begin{cquote}
     2177condition c: (order 1) W2(@m2@), W1(@m1@,@m2@)\ \ \ or\ \ \ (order 2) W1(@m1@,@m2@), W2(@m2@) \\
     2178S: acq. @m1@ $\rightarrow$ acq. @m1,m2@ $\rightarrow$ @signal(c)@ $\rightarrow$ rel. @m2@ $\rightarrow$ pass @m2@ unblock W2 (order 2) $\rightarrow$ rel. @m1@ $\rightarrow$ pass @m1,m2@ unblock W1 \\
     2179\hspace*{2.75in}$\rightarrow$ rel. @m1@ $\rightarrow$ pass @m1,m2@ unblock W1 (order 1)
     2180\end{cquote}
    21042181
    21052182\begin{figure}
     
    21132190        mutex( m1, m2 ) { // $\LstCommentStyle{\color{red}inner}$
    21142191                ... `signal( c )`; ...
    2115                 // m1, m2 acquired
     2192                // m1, m2 still acquired
    21162193        } // $\LstCommentStyle{\color{red}release m2}$
    21172194        // m1 acquired
     
    21282205        ...
    21292206        mutex( m1, m2 ) {
    2130                 ... `wait( c )`; // block and release m1, m2
    2131                 // m1, m2 acquired
     2207                ... `wait( c )`; // release m1, m2
     2208                // m1, m2 reacquired
    21322209        } // $\LstCommentStyle{\color{red}release m2}$
    21332210        // m1 acquired
     
    21422219
    21432220mutex( m2 ) {
    2144         ... `wait( c )`; ...
    2145         // m2 acquired
     2221        ... `wait( c )`; // release m2
     2222        // m2 reacquired
    21462223} // $\LstCommentStyle{\color{red}release m2}$
    21472224
     
    21532230
    21542231\begin{cquote}
    2155 \subfloat[Signalling Thread]{\label{f:SignallingThread}\usebox\myboxA}
     2232\subfloat[Signalling Thread (S)]{\label{f:SignallingThread}\usebox\myboxA}
    21562233\hspace{3\parindentlnth}
    21572234\subfloat[Waiting Thread (W1)]{\label{f:WaitingThread}\usebox\myboxB}
     
    21592236\subfloat[Waiting Thread (W2)]{\label{f:OtherWaitingThread}\usebox\myboxC}
    21602237\end{cquote}
    2161 \caption{Barging Prevention}
    2162 \label{f:BargingPrevention}
     2238\caption{Bulk Barging Prevention}
     2239\label{f:BulkBargingPrevention}
    21632240\end{figure}
    21642241
    2165 One scheduling solution is for the signaller to keep ownership of all locks until the last lock is ready to be transferred, because this semantics fits most closely to the behaviour of single-monitor scheduling.
    2166 However, Figure~\ref{f:OtherWaitingThread} shows this solution is complex depending on other waiters, resulting in options when the signaller finishes the inner mutex-statement.
    2167 The signaller can retain @m2@ until completion of the outer mutex statement and pass the locks to waiter W1, or it can pass @m2@ to waiter W2 after completing the inner mutex-statement, while continuing to hold @m1@.
    2168 In the latter case, waiter W2 must eventually pass @m2@ to waiter W1, which is complex because W1 may have waited before W2, so W2 is unaware of it.
    2169 Furthermore, there is an execution sequence where the signaller always finds waiter W2, and hence, waiter W1 starves.
    2170 
    2171 While a number of approaches were examined~\cite[\S~4.3]{Delisle18}, the solution chosen for \CFA is a novel techique called \newterm{partial signalling}.
    2172 Signalled threads are moved to the urgent queue and the waiter at the front defines the set of monitors necessary for it to unblock.
    2173 Partial signalling transfers ownership of monitors to the front waiter.
    2174 When the signaller thread exits or waits in the monitor, the front waiter is unblocked if all its monitors are released.
    2175 The benefit is encapsulating complexity into only two actions: passing monitors to the next owner when they should be released and conditionally waking threads if all conditions are met.
     2242One scheduling solution is for the signaller S to keep ownership of all locks until the last lock is ready to be transferred, because this semantics fits most closely to the behaviour of single-monitor scheduling.
     2243However, this solution is inefficient if W2 waited first and can be immediate passed @m2@ when released, while S retains @m1@ until completion of the outer mutex statement.
     2244If W1 waited first, the signaller must retain @m1@ amd @m2@ until completion of the outer mutex statement and then pass both to W1.
     2245% Furthermore, there is an execution sequence where the signaller always finds waiter W2, and hence, waiter W1 starves.
     2246To support this efficient semantics (and prevent barging), the implementation maintains a list of monitors acquired for each blocked thread.
     2247When a signaller exits or waits in a monitor function/statement, the front waiter on urgent is unblocked if all its monitors are released.
     2248Implementing a fast subset check for the necessary released monitors is important.
     2249% The benefit is encapsulating complexity into only two actions: passing monitors to the next owner when they should be released and conditionally waking threads if all conditions are met.
    21762250
    21772251
     
    21802254
    21812255In an object-oriented programming-language, a class includes an exhaustive list of operations.
    2182 However, new members can be added via static inheritance or dynamic members, \eg JavaScript~\cite{JavaScript}.
    2183 Similarly, monitor functions can be added at any time in \CFA, making it less clear for programmers and more difficult to implement.
    2184 \begin{cfa}
    2185 monitor M { ... };
    2186 void `f`( M & mutex m );
    2187 void g( M & mutex m ) { waitfor( `f` ); } $\C{// clear which f}$
    2188 void `f`( M & mutex m, int ); $\C{// different f}$
    2189 void h( M & mutex m ) { waitfor( `f` ); } $\C{// unclear which f}$
    2190 \end{cfa}
    2191 Hence, the \CFA code for entering a monitor looks like:
    2192 \begin{cfa}
    2193 if ( $\textrm{\textit{monitor is free}}$ ) $\C{// enter}$
    2194 else if ( $\textrm{\textit{already own monitor}}$ ) $\C{// continue}$
    2195 else if ( $\textrm{\textit{monitor accepts me}}$ ) $\C{// enter}$
    2196 else $\C{// block}$
    2197 \end{cfa}
    2198 For the first two conditions, it is easy to implement a check that can evaluate the condition in a few instructions.
    2199 However, a fast check for \emph{monitor accepts me} is much harder to implement depending on the constraints put on the monitors.
    2200 Figure~\ref{fig:ClassicalMonitor} shows monitors are often expressed as an entry (calling) queue, some acceptor queues, and an urgent stack/queue.
    2201 
    2202 \begin{figure}
    2203 \centering
    2204 \subfloat[Classical monitor] {
    2205 \label{fig:ClassicalMonitor}
    2206 {\resizebox{0.45\textwidth}{!}{\input{monitor.pstex_t}}}
    2207 }% subfloat
    2208 \quad
    2209 \subfloat[Bulk acquire monitor] {
    2210 \label{fig:BulkMonitor}
    2211 {\resizebox{0.45\textwidth}{!}{\input{ext_monitor.pstex_t}}}
    2212 }% subfloat
    2213 \caption{Monitor implementation}
    2214 \label{f:MonitorImplementation}
    2215 \end{figure}
    2216 
    2217 For a fixed (small) number of mutex functions (\eg 128), the accept check reduces to a bitmask of allowed callers, which can be checked with a single instruction.
    2218 This approach requires a unique dense ordering of functions with a small upper-bound and the ordering must be consistent across translation units.
    2219 For object-oriented languages these constraints are common, but \CFA mutex functions can be added in any scope and are only visible in certain translation unit, precluding program-wide dense-ordering among mutex functions.
    2220 
    2221 Figure~\ref{fig:BulkMonitor} shows the \CFA monitor implementation.
    2222 The mutex function called is associated with each thread on the entry queue, while a list of acceptable functions is kept separately.
    2223 The accepted list is a variable-sized array of accepted function pointers, so the single instruction bitmask comparison is replaced by dereferencing a pointer followed by a (usually short) linear search.
     2256A new class can add members via static inheritance but the subclass still has an exhaustive list of operations.
     2257(Dynamic member adding, \eg JavaScript~\cite{JavaScript}, is not considered.)
     2258In the object-oriented scenario, the type and all its operators are always present at compilation (even separate compilation), so it is possible to number the operations in a bit mask and use an $O(1)$ compare with a similar bit mask created for the operations specified in a @waitfor@.
     2259
     2260In \CFA, monitor functions can be statically added/removed in translation units, so it is impossible to apply an $O(1)$ approach.
     2261\begin{cfa}
     2262        monitor M { ... }; // common type, included in .h file
     2263translation unit 1
     2264        void `f`( M & mutex m );
     2265        void g( M & mutex m ) { waitfor( `f`, m ); }
     2266translation unit 2
     2267        void `f`( M & mutex m );
     2268        void `g`( M & mutex m );
     2269        void h( M & mutex m ) { waitfor( `f`, m ) or waitfor( `g`, m ); }
     2270\end{cfa}
     2271The @waitfor@ statements in each translation unit cannot form a unique bit-mask because the monitor type does not carry that information.
     2272Hence, function pointers are used to identify the functions listed in the @waitfor@ statement, stored in a variable-sized array,
     2273Then, the same implementation approach used for the urgent stack is used for the calling queue.
     2274Each caller has a list of monitors acquired, and the @waitfor@ statement performs a (usually short) linear search matching functions in the @waitfor@ list with called functions, and then verifying the associated mutex locks can be transfers.
     2275(A possible way to construct a dense mapping is at link or load-time.)
    22242276
    22252277
     
    22922344\label{f:UnmatchedMutexSets}
    22932345\end{figure}
    2294 
    2295 
    2296 \subsection{Extended \protect\lstinline@waitfor@}
    2297 
    2298 Figure~\ref{f:ExtendedWaitfor} show the extended form of the @waitfor@ statement to conditionally accept one of a group of mutex functions, with a specific action to be performed \emph{after} the mutex function finishes.
    2299 For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist.
    2300 The \emph{conditional-expression} of a @when@ may call a function, but the function must not block or context switch.
    2301 If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) in the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept nondeterministically for this case, \eg Go \lstinline[morekeywords=select]@select@.
    2302 If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made.
    2303 If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead.
    2304 Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking.
    2305 If there is a @timeout@ clause, it provides an upper bound on waiting.
    2306 If both a @timeout@ clause and an @else@ clause are present, the @else@ must be conditional, or the @timeout@ is never triggered.
    2307 In all cases, the statement following is executed \emph{after} a clause is executed to know which of the clauses executed.
    2308 
    2309 \begin{figure}
    2310 \centering
    2311 \begin{cfa}
    2312 `when` ( $\emph{conditional-expression}$ )      $\C{// optional guard}$
    2313         waitfor( $\emph{mutex-member-name}$ )
    2314                 $\emph{statement}$                                      $\C{// action after call}$
    2315 `or` `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$
    2316         waitfor( $\emph{mutex-member-name}$ )
    2317                 $\emph{statement}$                                      $\C{// action after call}$
    2318 `or`    ...                                                                     $\C{// list of waitfor clauses}$
    2319 `when` ( $\emph{conditional-expression}$ )      $\C{// optional guard}$
    2320         `timeout`                                                               $\C{// optional terminating timeout clause}$
    2321                 $\emph{statement}$                                      $\C{// action after timeout}$
    2322 `when` ( $\emph{conditional-expression}$ )      $\C{// optional guard}$
    2323         `else`                                                                  $\C{// optional terminating clause}$
    2324                 $\emph{statement}$                                      $\C{// action when no immediate calls}$
    2325 \end{cfa}
    2326 \caption{Extended \protect\lstinline@waitfor@}
    2327 \label{f:ExtendedWaitfor}
    2328 \end{figure}
    2329 
    2330 Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.:
    2331 \begin{cfa}
    2332 if ( C1 ) waitfor( mem1 );                       when ( C1 ) waitfor( mem1 );
    2333 else if ( C2 ) waitfor( mem2 );         or when ( C2 ) waitfor( mem2 );
    2334 \end{cfa}
    2335 The left example only accepts @mem1@ if @C1@ is true or only @mem2@ if @C2@ is true.
    2336 The right example accepts either @mem1@ or @mem2@ if @C1@ and @C2@ are true.
    2337 
    2338 An interesting use of @waitfor@ is accepting the @mutex@ destructor to know when an object is deallocated.
    2339 \begin{cfa}
    2340 void insert( Buffer(T) & mutex buffer, T elem ) with( buffer ) {
    2341         if ( count == 10 )
    2342                 waitfor( remove, buffer ) {
    2343                         // insert elem into buffer
    2344                 } or `waitfor( ^?{}, buffer )` throw insertFail;
    2345 }
    2346 \end{cfa}
    2347 When the buffer is deallocated, the current waiter is unblocked and informed, so it can perform an appropriate action.
    2348 However, the basic @waitfor@ semantics do not support this functionality, since using an object after its destructor is called is undefined.
    2349 Therefore, to make this useful capability work, the semantics for accepting the destructor is the same as @signal@, \ie the call to the destructor is placed on the urgent queue and the acceptor continues execution, which throws an exception to the acceptor and then the caller is unblocked from the urgent queue to deallocate the object.
    2350 Accepting the destructor is an idiomatic way to terminate a thread in \CFA.
    23512346
    23522347
     
    24892484For situations where threads do not require direct communication, case 9 provides faster creation/destruction by eliminating @mutex@ setup.
    24902485
    2491 \begin{table}[b]
     2486\begin{table}
    24922487\caption{Object property composition}
    24932488\centering
     
    25032498No              & No                                    & \textbf{1}\ \ \ aggregate type                & \textbf{2}\ \ \ @monitor@ aggregate type \\
    25042499\hline
    2505 No              & Yes (stackless)               & \textbf{3}\ \ \ @generator@                   & \textbf{4}\ \ \ @monitor@ generator \\
     2500No              & Yes (stackless)               & \textbf{3}\ \ \ @generator@                   & \textbf{4}\ \ \ @monitor@ @generator@ \\
    25062501\hline
    25072502No              & Yes (stackful)                & \textbf{5}\ \ \ @coroutine@                   & \textbf{6}\ \ \ @monitor@ @coroutine@ \\
     
    25202515
    25212516
    2522 \section{Parallelism}
    2523 \label{s:Parallelism}
    2524 
    2525 Historically, computer performance was about processor speeds.
    2526 However, with heat dissipation being a direct consequence of speed increase, parallelism is the new source for increased performance~\cite{Sutter05, Sutter05b}.
    2527 Therefore, high-performance applications must care about parallelism, which requires concurrency.
    2528 The lowest-level approach of parallelism is to use \newterm{kernel threads} in combination with semantics like @fork@, @join@, \etc.
    2529 However, kernel threads are better as an implementation tool because of complexity and higher cost.
    2530 Therefore, different abstractions are often layered onto kernel threads to simplify them, \eg pthreads.
    2531 
    2532 
    2533 \subsection{User Threads}
    2534 
    2535 A direct improvement on kernel threads is user threads, \eg Erlang~\cite{Erlang} and \uC~\cite{uC++book}.
    2536 This approach provides an interface that matches the language paradigms, gives more control over concurrency by the language runtime, and an abstract (and portable) interface to the underlying kernel threads across operating systems.
    2537 In many cases, user threads can be used on a much larger scale (100,000 threads).
    2538 Like kernel threads, user threads support preemption, which maximizes nondeterminism, but increases the potential for concurrency errors: race, livelock, starvation, and deadlock.
    2539 \CFA adopts user-threads as they represent the truest realization of concurrency and can build any other concurrency approach, \eg thread pools and actors~\cite{Actors}.
    2540 
    2541 A variant of user thread is \newterm{fibres}, which removes preemption, \eg Go~\cite{Go} @goroutine@s.
    2542 Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, making race and deadlock errors more difficult to generate.
    2543 However, preemption is necessary for concurrency that relies on spinning, so there are a class of problems that cannot be programmed with fibres.
    2544 
    2545 
    2546 \subsection{Thread Pools}
    2547 
    2548 In contrast to direct threading is indirect \newterm{thread pools}, \eg Java @executor@, where small jobs (work units) are inserted into a work pool for execution.
    2549 If the jobs are dependent, \ie interact, there is an implicit/explicit dependency graph that ties them together.
    2550 While removing direct concurrency, and hence the amount of context switching, thread pools significantly limit the interaction that can occur among jobs.
    2551 Indeed, jobs should not block because that also blocks the underlying thread, which effectively means the CPU utilization, and therefore throughput, suffers.
    2552 While it is possible to tune the thread pool with sufficient threads, it becomes difficult to obtain high throughput and good core utilization as job interaction increases.
    2553 As well, concurrency errors return, which threads pools are suppose to mitigate.
     2517% \section{Parallelism}
     2518% \label{s:Parallelism}
     2519%
     2520% Historically, computer performance was about processor speeds.
     2521% However, with heat dissipation being a direct consequence of speed increase, parallelism is the new source for increased performance~\cite{Sutter05, Sutter05b}.
     2522% Therefore, high-performance applications must care about parallelism, which requires concurrency.
     2523% The lowest-level approach of parallelism is to use \newterm{kernel threads} in combination with semantics like @fork@, @join@, \etc.
     2524% However, kernel threads are better as an implementation tool because of complexity and higher cost.
     2525% Therefore, different abstractions are often layered onto kernel threads to simplify them, \eg pthreads.
     2526%
     2527%
     2528% \subsection{User Threads}
     2529%
     2530% A direct improvement on kernel threads is user threads, \eg Erlang~\cite{Erlang} and \uC~\cite{uC++book}.
     2531% This approach provides an interface that matches the language paradigms, gives more control over concurrency by the language runtime, and an abstract (and portable) interface to the underlying kernel threads across operating systems.
     2532% In many cases, user threads can be used on a much larger scale (100,000 threads).
     2533% Like kernel threads, user threads support preemption, which maximizes nondeterminism, but increases the potential for concurrency errors: race, livelock, starvation, and deadlock.
     2534% \CFA adopts user-threads to provide more flexibility and a low-cost mechanism to build any other concurrency approach, \eg thread pools and actors~\cite{Actors}.
     2535%
     2536% A variant of user thread is \newterm{fibres}, which removes preemption, \eg Go~\cite{Go} @goroutine@s.
     2537% Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, making race and deadlock errors more difficult to generate.
     2538% However, preemption is necessary for fairness and to reduce tail-latency.
     2539% For concurrency that relies on spinning, if all cores spin the system is livelocked, whereas preemption breaks the livelock.
     2540%
     2541%
     2542% \subsection{Thread Pools}
     2543%
     2544% In contrast to direct threading is indirect \newterm{thread pools}, \eg Java @executor@, where small jobs (work units) are inserted into a work pool for execution.
     2545% If the jobs are dependent, \ie interact, there is an implicit/explicit dependency graph that ties them together.
     2546% While removing direct concurrency, and hence the amount of context switching, thread pools significantly limit the interaction that can occur among jobs.
     2547% Indeed, jobs should not block because that also blocks the underlying thread, which effectively means the CPU utilization, and therefore throughput, suffers.
     2548% While it is possible to tune the thread pool with sufficient threads, it becomes difficult to obtain high throughput and good core utilization as job interaction increases.
     2549% As well, concurrency errors return, which threads pools are suppose to mitigate.
    25542550
    25552551
    25562552\section{\protect\CFA Runtime Structure}
     2553\label{s:CFARuntimeStructure}
    25572554
    25582555Figure~\ref{f:RunTimeStructure} illustrates the runtime structure of a \CFA program.
     
    25712568\label{s:RuntimeStructureCluster}
    25722569
    2573 A \newterm{cluster} is a collection of threads and virtual processors (abstract kernel-thread) that execute the threads (like a virtual machine).
     2570A \newterm{cluster} is a collection of threads and virtual processors (abstract kernel-thread) that execute the threads from its own ready queue (like an OS).
    25742571The purpose of a cluster is to control the amount of parallelism that is possible among threads, plus scheduling and other execution defaults.
    25752572The default cluster-scheduler is single-queue multi-server, which provides automatic load-balancing of threads on processors.
    2576 However, the scheduler is pluggable, supporting alternative schedulers.
     2573However, the scheduler is pluggable, supporting alternative schedulers, such as multi-queue multi-server, with work-stealing/sharing.
    25772574If several clusters exist, both threads and virtual processors, can be explicitly migrated from one cluster to another.
    25782575No automatic load balancing among clusters is performed by \CFA.
     
    25982595
    25992596
    2600 \subsection{Debug Kernel}
    2601 
    2602 There are two versions of the \CFA runtime kernel: debug and non-debug.
    2603 The debugging version has many runtime checks and internal assertions, \eg stack (non-writable) guard page, and checks for stack overflow whenever context switches occur among coroutines and threads, which catches most stack overflows.
    2604 After a program is debugged, the non-debugging version can be used to decrease space and increase performance.
    2605 
    2606 
     2597\begin{comment}
    26072598\section{Implementation}
    26082599\label{s:Implementation}
     
    26262617The exception to this rule is the program main, \ie the initial kernel thread that is given to any program.
    26272618In order to respect C expectations, the stack of the initial kernel thread is used by program main rather than the main processor, allowing it to grow dynamically as in a normal C program.
    2628 
    2629 Finally, an important aspect for a complete threading system is preemption, which introduces extra non-determinism via transparent interleaving, rather than cooperation among threads for proper scheduling and processor fairness from long-running threads.
    2630 Because preemption frequency is usually long (1 millisecond) performance cost is negligible.
     2619\end{comment}
     2620
     2621
     2622\subsection{Preemption}
     2623
     2624Nondeterministic preemption provides fairness from long running threads, and forces concurrent programmers to write more robust programs, rather than relying on section of code between cooperative scheduling to be atomic,
     2625A separate reason for not supporting preemption is that it significantly complicates the runtime system.
    26312626Preemption is normally handled by setting a count-down timer on each virtual processor.
    26322627When the timer expires, an interrupt is delivered, and the interrupt handler resets the count-down timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime-kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code.
     
    26352630The only issue with this approach is that signal masks from one kernel thread may be restored on another as part of returning from the signal handler;
    26362631therefore, the same signal mask is required for all virtual processors in a cluster.
    2637 
    2638 However, on current UNIX systems:
     2632Because preemption frequency is usually long (1 millisecond) performance cost is negligible.
     2633
     2634However, on current Linux systems:
    26392635\begin{cquote}
    26402636A process-directed signal may be delivered to any one of the threads that does not currently have the signal blocked.
     
    26422638SIGNAL(7) - Linux Programmer's Manual
    26432639\end{cquote}
    2644 Hence, the timer-expiry signal, which is generated \emph{externally} by the UNIX kernel to the UNIX process, is delivered to any of its UNIX subprocesses (kernel threads).
     2640Hence, the timer-expiry signal, which is generated \emph{externally} by the Linux kernel to the Linux process, is delivered to any of its Linux subprocesses (kernel threads).
    26452641To ensure each virtual processor receives its own preemption signals, a discrete-event simulation is run on a special virtual processor, and only it sets and receives timer events.
    26462642Virtual processors register an expiration time with the discrete-event simulator, which is inserted in sorted order.
    26472643The simulation sets the count-down timer to the value at the head of the event list, and when the timer expires, all events less than or equal to the current time are processed.
    26482644Processing a preemption event sends an \emph{internal} @SIGUSR1@ signal to the registered virtual processor, which is always delivered to that processor.
     2645
     2646
     2647\subsection{Debug Kernel}
     2648
     2649There are two versions of the \CFA runtime kernel: debug and non-debug.
     2650The debugging version has many runtime checks and internal assertions, \eg stack (non-writable) guard page, and checks for stack overflow whenever context switches occur among coroutines and threads, which catches most stack overflows.
     2651After a program is debugged, the non-debugging version can be used to decrease space and increase performance.
    26492652
    26502653
     
    26892692
    26902693All benchmarks are run using the following harness.
    2691 \newpage
    26922694\begin{cfa}
    26932695unsigned int N = 10_000_000;
     
    26972699Each benchmark is performed @N@ times, where @N@ varies depending on the benchmark;
    26982700the total time is divided by @N@ to obtain the average time for a benchmark.
    2699 All omitted tests for other languages are functionally identical to the shown \CFA test.
     2701All omitted tests for other languages are functionally identical to the \CFA tests and available online~\cite{CforallBenchMarks}.
     2702
     2703
     2704\paragraph{Object Creation}
     2705
     2706Object creation is measured by creating/deleting the specific kind of concurrent object.
     2707Figure~\ref{f:creation} shows the code for \CFA, with results in Table~\ref{tab:creation}.
     2708The only note here is that the call stacks of \CFA coroutines are lazily created, therefore without priming the coroutine to force stack creation, the creation cost is artificially low.
     2709
     2710\newpage
     2711\begin{multicols}{2}
     2712\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
     2713\begin{cfa}
     2714thread MyThread {};
     2715void main( MyThread & ) {}
     2716int main() {
     2717        BENCH( for ( N ) { @MyThread m;@ } )
     2718        sout | result`ns;
     2719}
     2720\end{cfa}
     2721\captionof{figure}{\CFA object-creation benchmark}
     2722\label{f:creation}
     2723
     2724\columnbreak
     2725
     2726\vspace*{-16pt}
     2727\captionof{table}{Object creation comparison (nanoseconds)}
     2728\label{tab:creation}
     2729
     2730\begin{tabular}[t]{@{}r*{3}{D{.}{.}{5.2}}@{}}
     2731\multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
     2732Pthreads                                & 28091         & 28073.39      & 163.1         \\
     2733\CFA Coroutine Lazy             & 6                     & 6.07          & 0.26          \\
     2734\CFA Coroutine Eager    & 520           & 520.61        & 2.04          \\
     2735\CFA Thread                             & 2032          & 2016.29       & 112.07        \\
     2736\uC Coroutine                   & 106           & 107.36        & 1.47          \\
     2737\uC Thread                              & 536.5         & 537.07        & 4.64          \\
     2738Goroutine                               & 3103          & 3086.29       & 90.25         \\
     2739Java Thread                             & 103416.5      & 103732.29     & 1137          \\
     2740\end{tabular}
     2741\end{multicols}
    27002742
    27012743
     
    27142756\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    27152757coroutine C {} c;
    2716 void main( C & ) { for ( ;; ) { @suspend();@ } }
     2758void main( C & ) { for ( ;; ) { @suspend;@ } }
    27172759int main() { // coroutine test
    27182760        BENCH( for ( N ) { @resume( c );@ } )
     
    27472789\paragraph{Mutual-Exclusion}
    27482790
    2749 Mutual exclusion is measured by entering/leaving a critical section.
     2791Uncontented mutual exclusion, which occurs frequently, is measured by entering/leaving a critical section.
    27502792For monitors, entering and leaving a monitor function is measured.
    27512793To put the results in context, the cost of entering a non-inline function and the cost of acquiring and releasing a @pthread_mutex@ lock is also measured.
     
    27902832\paragraph{Internal Scheduling}
    27912833
    2792 Internal scheduling is measured by waiting on and signalling a condition variable.
     2834Internal scheduling is measured using a cycle of two threads signalling and waiting.
    27932835Figure~\ref{f:int-sched} shows the code for \CFA, with results in Table~\ref{tab:int-sched}.
    27942836Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
     
    28012843monitor M { condition c; } m;
    28022844void __attribute__((noinline))
    2803 do_call( M & mutex a1 ) { signal( c ); }
     2845do_call( M & mutex a1 ) { @signal( c );@ }
    28042846thread T {};
    28052847void main( T & this ) {
    28062848        while ( go == 0 ) { yield(); }
    2807         while ( go == 1 ) { @do_call( m );@ }
     2849        while ( go == 1 ) { do_call( m ); }
    28082850}
    28092851int  __attribute__((noinline))
     
    28432885\paragraph{External Scheduling}
    28442886
    2845 External scheduling is measured by accepting a call using the @waitfor@ statement (@_Accept@ in \uC).
     2887External scheduling is measured using a cycle of two threads calling and accepting the call using the @waitfor@ statement.
    28462888Figure~\ref{f:ext-sched} shows the code for \CFA, with results in Table~\ref{tab:ext-sched}.
    28472889Note, the incremental cost of bulk acquire for \CFA, which is largely a fixed cost for small numbers of mutex objects.
     
    28902932
    28912933
    2892 
    2893 \paragraph{Object Creation}
    2894 
    2895 Object creation is measured by creating/deleting the specific kind of concurrent object.
    2896 Figure~\ref{f:creation} shows the code for \CFA, with results in Table~\ref{tab:creation}.
    2897 The only note here is that the call stacks of \CFA coroutines are lazily created, therefore without priming the coroutine to force stack creation, the creation cost is artificially low.
    2898 
    2899 \begin{multicols}{2}
    2900 \lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
    2901 \begin{cfa}
    2902 thread MyThread {};
    2903 void main( MyThread & ) {}
    2904 int main() {
    2905         BENCH( for ( N ) { @MyThread m;@ } )
    2906         sout | result`ns;
    2907 }
    2908 \end{cfa}
    2909 \captionof{figure}{\CFA object-creation benchmark}
    2910 \label{f:creation}
    2911 
    2912 \columnbreak
    2913 
    2914 \vspace*{-16pt}
    2915 \captionof{table}{Object creation comparison (nanoseconds)}
    2916 \label{tab:creation}
    2917 
    2918 \begin{tabular}[t]{@{}r*{3}{D{.}{.}{5.2}}@{}}
    2919 \multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\
    2920 Pthreads                                & 28091         & 28073.39      & 163.1         \\
    2921 \CFA Coroutine Lazy             & 6                     & 6.07          & 0.26          \\
    2922 \CFA Coroutine Eager    & 520           & 520.61        & 2.04          \\
    2923 \CFA Thread                             & 2032          & 2016.29       & 112.07        \\
    2924 \uC Coroutine                   & 106           & 107.36        & 1.47          \\
    2925 \uC Thread                              & 536.5         & 537.07        & 4.64          \\
    2926 Goroutine                               & 3103          & 3086.29       & 90.25         \\
    2927 Java Thread                             & 103416.5      & 103732.29     & 1137          \\
    2928 \end{tabular}
    2929 \end{multicols}
    2930 
    2931 
    29322934\section{Conclusion}
    29332935
     
    29402942The \CFA runtime provides concurrency based on a preemptive M:N user-level threading-system, executing in clusters, which encapsulate scheduling of work on multiple kernel threads providing parallelism.
    29412943The M:N model is judged to be efficient and provide greater flexibility than a 1:1 threading model.
    2942 These concepts and the entire \CFA runtime-system are written in the \CFA language, extensively leveraging the \CFA type-system, which demonstrates the expressiveness of the \CFA language.
     2944These concepts and the \CFA runtime-system are written in the \CFA language, extensively leveraging the \CFA type-system, which demonstrates the expressiveness of the \CFA language.
    29432945Performance comparisons with other concurrent systems/languages show the \CFA approach is competitive across all low-level operations, which translates directly into good performance in well-written concurrent applications.
    29442946C programmers should feel comfortable using these mechanisms for developing complex control-flow in applications, with the ability to obtain maximum available performance by selecting mechanisms at the appropriate level of need.
     
    29912993
    29922994The authors would like to recognize the design assistance of Aaron Moss, Rob Schluntz and Andrew Beach on the features described in this paper.
    2993 Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}), and Peter Buhr is partially funded by the Natural Sciences and Engineering Research Council of Canada.
     2995Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}). %, and Peter Buhr is partially funded by the Natural Sciences and Engineering Research Council of Canada.
    29942996
    29952997{%
  • doc/papers/concurrency/figures/monitor.fig

    rb4d34fa rbd12159  
    88-2
    991200 2
    10 5 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 3600.000 1500 3300 1200 3600 1500 3900
    11 5 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 4500.000 1500 4200 1200 4500 1500 4800
    12 6 2400 2400 2700 2700
    13 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 2550 2550 105 105 2550 2550 2655 2550
    14 4 1 -1 0 0 0 10 0.0000 2 105 90 2550 2610 b\001
    15 -6
    16 6 2400 2700 2700 3000
    17 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 2550 2850 105 105 2550 2850 2655 2850
    18 4 1 -1 0 0 0 10 0.0000 2 75 75 2550 2895 a\001
    19 -6
    20 6 3300 2400 3600 2700
    21 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 3450 2550 105 105 3450 2550 3555 2550
    22 4 1 -1 0 0 0 10 0.0000 2 105 90 3450 2610 d\001
    23 -6
    24 6 1350 5550 5325 5850
    25 1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 1500 5700 80 80 1500 5700 1580 5780
    26 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 2850 5700 105 105 2850 5700 2955 5805
    27 1 3 0 1 -1 -1 0 0 4 0.000 1 0.0000 4350 5700 105 105 4350 5700 4455 5805
    28 4 0 -1 0 0 0 12 0.0000 2 180 765 4575 5775 duplicate\001
    29 4 0 -1 0 0 0 12 0.0000 2 135 1035 3075 5775 blocked task\001
    30 4 0 -1 0 0 0 12 0.0000 2 135 870 1650 5775 active task\001
    31 -6
    32 6 4200 2100 4500 2400
    33 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2250 105 105 4350 2250 4455 2355
    34 4 1 -1 0 0 0 10 0.0000 2 105 90 4350 2310 d\001
     105 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 3300.000 1500 3000 1200 3300 1500 3600
     115 1 0 1 -1 -1 0 0 -1 0.000 0 1 0 0 1500.000 4200.000 1500 3900 1200 4200 1500 4500
     126 1350 5250 5325 5550
     131 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 1500 5400 80 80 1500 5400 1580 5480
     141 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 2850 5400 105 105 2850 5400 2955 5505
     151 3 0 1 -1 -1 0 0 4 0.000 1 0.0000 4350 5400 105 105 4350 5400 4455 5505
     164 0 -1 0 0 0 12 0.0000 2 180 765 4575 5475 duplicate\001
     174 0 -1 0 0 0 12 0.0000 2 135 1035 3075 5475 blocked task\001
     184 0 -1 0 0 0 12 0.0000 2 135 870 1650 5475 active task\001
    3519-6
    36206 4200 1800 4500 2100
    37211 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 1950 105 105 4350 1950 4455 2055
    38 4 1 -1 0 0 0 10 0.0000 2 105 90 4350 2010 b\001
     224 1 -1 0 0 0 10 0.0000 2 105 90 4350 2010 d\001
    3923-6
    40 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1650 3750 105 105 1650 3750 1755 3855
    41 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1950 3750 105 105 1950 3750 2055 3855
    42 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4950 4050 105 105 4950 4050 5055 4155
    43 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5250 4050 105 105 5250 4050 5355 4155
    44 1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3450 4725 80 80 3450 4725 3530 4805
    45 1 3 0 1 -1 -1 1 0 4 0.000 1 0.0000 3450 2850 105 105 3450 2850 3555 2850
    46 1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2850 105 105 4350 2850 4455 2955
     246 4200 1500 4500 1800
     251 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 1650 105 105 4350 1650 4455 1755
     264 1 -1 0 0 0 10 0.0000 2 105 90 4350 1710 b\001
     27-6
     281 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1650 3450 105 105 1650 3450 1755 3555
     291 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 1950 3450 105 105 1950 3450 2055 3555
     301 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4950 3750 105 105 4950 3750 5055 3855
     311 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5250 3750 105 105 5250 3750 5355 3855
     321 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3450 4425 80 80 3450 4425 3530 4505
    47331 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2550 105 105 4350 2550 4455 2655
     341 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4350 2250 105 105 4350 2250 4455 2355
     352 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 5
     36         1500 3000 2100 3000 2100 2700 2400 2700 2400 2100
     372 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4
     38         1500 3600 2100 3600 2100 3900 1500 3900
     392 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3
     40         1500 3300 2100 3300 2250 3525
    48412 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
    49          2400 3000 2625 3150
     42         2100 3000 1950 3225
     432 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3
     44         1500 4200 2100 4200 2250 4425
    50452 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
    51          3300 3000 3525 3150
    52 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 5
    53          1500 3300 2100 3300 2100 3000 2400 3000 2400 2400
     46         2100 3900 1950 4125
    54472 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4
    55          1500 3900 2100 3900 2100 4200 1500 4200
    56 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3
    57          1500 3600 2100 3600 2250 3825
     48         1500 4500 2100 4500 2100 4800 3300 4800
    58492 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
    59          2100 3300 1950 3525
    60 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 3
    61          1500 4500 2100 4500 2250 4725
     50         4800 3600 4650 3825
    62512 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
    63          2100 4200 1950 4425
     52         3300 4800 3525 4950
     532 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5
     54         4200 4050 4200 3150 2700 3150 2700 4050 4200 4050
    64552 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4
    65          1500 4800 2100 4800 2100 5100 3300 5100
    66 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
    67          4800 3900 4650 4125
    68 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
    69          3300 5100 3525 5250
    70 2 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5
    71          4200 4350 4200 3450 2700 3450 2700 4350 4200 4350
    72 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4
    73          2700 2400 2700 3000 3300 3000 3300 2400
    74 2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 4
    75          3600 2400 3600 3000 4050 3000 4050 1800
     56         3600 2100 3600 2700 4050 2700 4050 1500
    76572 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 9
    77          3600 5100 4800 5100 4800 4200 5400 4200 5400 3900 4800 3900
    78          4800 3000 4500 3000 4500 1800
     58         3600 4800 4800 4800 4800 3900 5400 3900 5400 3600 4800 3600
     59         4800 2700 4500 2700 4500 1500
    79602 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
    80          4050 3000 4500 3150
    81 4 1 -1 0 0 0 12 0.0000 2 135 315 3450 5475 exit\001
    82 4 1 -1 0 0 0 12 0.0000 2 135 135 1650 3225 A\001
    83 4 1 -1 0 0 0 12 0.0000 2 135 795 1650 5025 condition\001
    84 4 1 -1 0 0 0 12 0.0000 2 135 135 1650 5250 B\001
    85 4 0 -1 0 0 0 12 0.0000 2 135 420 4950 3825 stack\001
    86 4 0 -1 0 0 0 12 0.0000 2 180 750 4950 3375 acceptor/\001
    87 4 0 -1 0 0 0 12 0.0000 2 180 750 4950 3600 signalled\001
    88 4 1 -1 0 0 0 12 0.0000 2 135 795 1650 3000 condition\001
    89 4 1 4 0 0 0 12 0.0000 2 135 135 2550 2325 X\001
    90 4 1 4 0 0 0 12 0.0000 2 135 135 3450 2325 Y\001
    91 4 1 -1 0 0 0 12 0.0000 2 135 525 3450 3825 shared\001
    92 4 1 -1 0 0 0 12 0.0000 2 135 735 3450 4125 variables\001
    93 4 1 -1 0 0 0 10 0.0000 2 75 75 3450 2895 c\001
    94 4 1 -1 0 0 0 12 0.0000 2 165 1125 3000 2100 mutex queues\001
    95 4 0 -1 0 0 3 12 0.0000 2 150 540 4950 4425 urgent\001
    96 4 1 -1 0 0 0 10 0.0000 2 75 75 4350 2895 a\001
    97 4 1 -1 0 0 0 10 0.0000 2 75 75 4350 2595 c\001
    98 4 0 -1 0 0 0 12 0.0000 2 135 525 4650 2550 arrival\001
    99 4 0 -1 0 0 0 12 0.0000 2 135 630 4650 2325 order of\001
    100 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2025 X\001
     61         4050 2700 4500 2850
     624 1 -1 0 0 0 12 0.0000 2 135 315 3450 5175 exit\001
     634 1 -1 0 0 0 12 0.0000 2 135 795 1650 4725 condition\001
     644 1 -1 0 0 0 12 0.0000 2 135 135 1650 4950 B\001
     654 0 -1 0 0 0 12 0.0000 2 135 420 4950 3525 stack\001
     664 0 -1 0 0 0 12 0.0000 2 180 750 4950 3075 acceptor/\001
     674 0 -1 0 0 0 12 0.0000 2 180 750 4950 3300 signalled\001
     684 1 -1 0 0 0 12 0.0000 2 135 525 3450 3525 shared\001
     694 1 -1 0 0 0 12 0.0000 2 135 735 3450 3825 variables\001
     704 0 -1 0 0 3 12 0.0000 2 150 540 4950 4125 urgent\001
     714 1 -1 0 0 0 10 0.0000 2 75 75 4350 2595 a\001
     724 1 -1 0 0 0 10 0.0000 2 75 75 4350 2295 c\001
     734 0 -1 0 0 0 12 0.0000 2 135 525 4650 2250 arrival\001
     744 0 -1 0 0 0 12 0.0000 2 135 630 4650 2025 order of\001
     754 0 4 50 -1 0 11 0.0000 2 120 135 4075 1725 X\001
     764 0 4 50 -1 0 11 0.0000 2 120 135 4075 2025 Y\001
    101774 0 4 50 -1 0 11 0.0000 2 120 135 4075 2325 Y\001
    102 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2625 Y\001
    103 4 0 4 50 -1 0 11 0.0000 2 120 135 4075 2925 X\001
    104 4 1 -1 0 0 0 12 0.0000 2 165 960 4275 1725 entry queue\001
     784 0 4 50 -1 0 11 0.0000 2 120 135 4075 2625 X\001
     794 1 -1 0 0 0 12 0.0000 2 165 960 4275 1425 entry queue\001
Note: See TracChangeset for help on using the changeset viewer.