Changeset 8dbedfc for doc/papers
- Timestamp: May 25, 2018, 1:37:38 PM
- Branches: ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, with_gc
- Children: 58e822a
- Parents: 13073be (diff), 34ca532 (diff)

Note: this is a merge changeset; the changes displayed below correspond to the merge itself. Use the (diff) links above to see all the changes relative to each parent.

- Location: doc/papers
- Files: 2 edited
  - concurrency/Paper.tex (modified) (12 diffs)
  - general/Paper.tex (modified) (25 diffs)
doc/papers/concurrency/Paper.tex
r13073be r8dbedfc 70 70 %\DeclareTextCommandDefault{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.1ex}}} 71 71 \renewcommand{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.075ex}}} 72 %\def\myCHarFont{\fontencoding{T1}\selectfont}%73 % \def\{{\ttfamily\upshape\myCHarFont \char`\}}}%74 72 75 73 \renewcommand*{\thefootnote}{\Alph{footnote}} % hack because fnsymbol does not work … … 741 739 The coroutine main's stack holds the state for the next generation, @f1@ and @f2@, and the code has the three suspend points, representing the three states in the Fibonacci formula, to context switch back to the caller's resume. 742 740 The interface function, @next@, takes a Fibonacci instance and context switches to it using @resume@; 743 on re turn, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned.741 on restart, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned. 744 742 The first @resume@ is special because it cocalls the coroutine at its coroutine main and allocates the stack; 745 743 when the coroutine main returns, its stack is deallocated. 746 744 Hence, @Fib@ is an object at creation, transitions to a coroutine on its first resume, and transitions back to an object when the coroutine main finishes. 747 745 Figure~\ref{f:Coroutine1State} shows the coroutine version of the C version in Figure~\ref{f:ExternalState}. 748 Coroutine generators are called \newterm{output coroutines} because values are returned by the coroutine.749 750 Figure~\ref{f:CFAFmt} shows an \newterm{input coroutine}, @Format@, for restructuring text into groups of character blocks of fixed size.746 Coroutine generators are called \newterm{output coroutines} because values are only returned. 747 748 Figure~\ref{f:CFAFmt} shows an \newterm{input coroutine}, @Format@, for restructuring text into groups of characters of fixed-size blocks. 751 749 For example, the input of the left is reformatted into the output on the right. 752 750 \begin{quote} … … 763 761 \end{tabular} 764 762 \end{quote} 765 The example takes advantage of resuming coroutines in the constructor to prime the coroutine loops so the first character sent for formatting appears inside the nested loops.763 The example takes advantage of resuming a coroutine in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops. 766 764 The destruction provides a newline if formatted text ends with a full line. 767 765 Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flatten (linearized) and rechecked on each call because execution location is not retained between calls. … … 778 776 void main( Format & fmt ) with( fmt ) { 779 777 for ( ;; ) { 780 for ( g = 0; g < 5; g += 1 ) { // group778 for ( g = 0; g < 5; g += 1 ) { // group 781 779 for ( b = 0; b < 4; b += 1 ) { // block 782 780 `suspend();` … … 814 812 }; 815 813 void format( struct Format * fmt ) { 816 if ( fmt->ch != -1 ) { // not EOF814 if ( fmt->ch != -1 ) { // not EOF ? 
817 815 printf( "%c", fmt->ch ); 818 816 fmt->b += 1; … … 823 821 } 824 822 if ( fmt->g == 5 ) { // group 825 printf( "\n" ); // separator823 printf( "\n" ); // separator 826 824 fmt->g = 0; 827 825 } … … 850 848 851 849 The previous examples are \newterm{asymmetric (semi) coroutine}s because one coroutine always calls a resuming function for another coroutine, and the resumed coroutine always suspends back to its last resumer, similar to call/return for normal functions. 852 However, there is no stack growth because @resume@/@suspend@ context switch to an existing stack frames rather than create a new one.853 \newterm{Symmetric (full) coroutine}s have a coroutine call a resuming function for another coroutine, which eventually forms a cycle.850 However, there is no stack growth because @resume@/@suspend@ context switch to existing stack-frames rather than create new ones. 851 \newterm{Symmetric (full) coroutine}s have a coroutine call a resuming function for another coroutine, which eventually forms a resuming-call cycle. 854 852 (The trivial cycle is a coroutine resuming itself.) 855 853 This control flow is similar to recursion for normal routines, but again there is no stack growth from the context switch. … … 935 933 The @start@ function communicates both the number of elements to be produced and the consumer into the producer's coroutine structure. 936 934 Then the @resume@ to @prod@ creates @prod@'s stack with a frame for @prod@'s coroutine main at the top, and context switches to it. 937 @prod@'s coroutine main starts, creates local variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random val es, calling the consumer to deliver the values, and printing the status returned from the consumer.935 @prod@'s coroutine main starts, creates local variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random values, calling the consumer to deliver the values, and printing the status returned from the consumer. 938 936 939 937 The producer call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status. 940 938 For the first resume, @cons@'s stack is initialized, creating local variables retained between subsequent activations of the coroutine. 941 The consumer iterates until the @done@ flag is set, prints, increments status, and calls back to the producer 's @payment@ member, and on return prints the receipt from the producer and increments the money for the next payment.942 The call from the consumer to the producer's @payment@ memberintroduces the cycle between producer and consumer.939 The consumer iterates until the @done@ flag is set, prints, increments status, and calls back to the producer via @payment@, and on return from @payment@, prints the receipt from the producer and increments @money@ (inflation). 940 The call from the consumer to the @payment@ introduces the cycle between producer and consumer. 943 941 When @payment@ is called, the consumer copies values into the producer's communication variable and a resume is executed. 
944 The context switch restarts the producer at the point where it was last context switched and it continues in member@delivery@ after the resume.945 946 The @delivery@ member returns the status value in @prod@'s @main@ member, where the status is printed.942 The context switch restarts the producer at the point where it was last context switched, so it continues in @delivery@ after the resume. 943 944 @delivery@ returns the status value in @prod@'s coroutine main, where the status is printed. 947 945 The loop then repeats calling @delivery@, where each call resumes the consumer coroutine. 948 946 The context switch to the consumer continues in @payment@. 949 The consumer increments and returns the receipt to the call in @cons@'s @main@ member.947 The consumer increments and returns the receipt to the call in @cons@'s coroutine main. 950 948 The loop then repeats calling @payment@, where each call resumes the producer coroutine. 951 949 … … 954 952 The context switch restarts @cons@ in @payment@ and it returns with the last receipt. 955 953 The consumer terminates its loops because @done@ is true, its @main@ terminates, so @cons@ transitions from a coroutine back to an object, and @prod@ reactivates after the resume in @stop@. 956 The @stop@ member returns and @prod@'s @main@ memberterminates.954 @stop@ returns and @prod@'s coroutine main terminates. 957 955 The program main restarts after the resume in @start@. 958 The @start@ member returns and the program main terminates. 959 960 961 \subsubsection{Construction} 962 963 One important design challenge for implementing coroutines and threads (shown in section \ref{threads}) is that the runtime system needs to run code after the user-constructor runs to connect the fully constructed object into the system. 964 In the case of coroutines, this challenge is simpler since there is no non-determinism from preemption or scheduling. 965 However, the underlying challenge remains the same for coroutines and threads. 966 967 The runtime system needs to create the coroutine's stack and, more importantly, prepare it for the first resumption. 968 The timing of the creation is non-trivial since users expect both to have fully constructed objects once execution enters the coroutine main and to be able to resume the coroutine from the constructor. 969 There are several solutions to this problem but the chosen option effectively forces the design of the coroutine. 970 971 Furthermore, \CFA faces an extra challenge as polymorphic routines create invisible thunks when cast to non-polymorphic routines and these thunks have function scope. 
972 For example, the following code, while looking benign, can run into undefined behaviour because of thunks: 973 974 \begin{cfa} 975 // async: Runs function asynchronously on another thread 976 forall(otype T) 977 extern void async(void (*func)(T*), T* obj); 978 979 forall(otype T) 980 void noop(T*) {} 981 982 void bar() { 983 int a; 984 async(noop, &a); // start thread running noop with argument a 985 } 986 \end{cfa} 987 988 The generated C code\footnote{Code trimmed down for brevity} creates a local thunk to hold type information: 989 990 \begin{cfa} 991 extern void async(/* omitted */, void (*func)(void*), void* obj); 992 993 void noop(/* omitted */, void* obj){} 994 995 void bar(){ 996 int a; 997 void _thunk0(int* _p0){ 998 /* omitted */ 999 noop(/* omitted */, _p0); 1000 } 1001 /* omitted */ 1002 async(/* omitted */, ((void (*)(void*))(&_thunk0)), (&a)); 1003 } 1004 \end{cfa} 1005 The problem in this example is a storage management issue, the function pointer @_thunk0@ is only valid until the end of the block, which limits the viable solutions because storing the function pointer for too long causes undefined behaviour; \ie the stack-based thunk being destroyed before it can be used. 1006 This challenge is an extension of challenges that come with second-class routines. 1007 Indeed, GCC nested routines also have the limitation that nested routine cannot be passed outside of the declaration scope. 1008 The case of coroutines and threads is simply an extension of this problem to multiple call stacks. 1009 1010 1011 \subsubsection{Alternative: Composition} 1012 1013 One solution to this challenge is to use composition/containment, where coroutine fields are added to manage the coroutine. 1014 1015 \begin{cfa} 1016 struct Fibonacci { 1017 int fn; // used for communication 1018 coroutine c; // composition 1019 }; 1020 1021 void FibMain(void*) { 1022 //... 1023 } 1024 1025 void ?{}(Fibonacci& this) { 1026 this.fn = 0; 1027 // Call constructor to initialize coroutine 1028 (this.c){myMain}; 1029 } 1030 \end{cfa} 1031 The downside of this approach is that users need to correctly construct the coroutine handle before using it. 1032 Like any other objects, the user must carefully choose construction order to prevent usage of objects not yet constructed. 1033 However, in the case of coroutines, users must also pass to the coroutine information about the coroutine main, like in the previous example. 1034 This opens the door for user errors and requires extra runtime storage to pass at runtime information that can be known statically. 1035 1036 1037 \subsubsection{Alternative: Reserved keyword} 1038 1039 The next alternative is to use language support to annotate coroutines as follows: 1040 \begin{cfa} 1041 coroutine Fibonacci { 1042 int fn; // used for communication 1043 }; 1044 \end{cfa} 1045 The @coroutine@ keyword means the compiler can find and inject code where needed. 1046 The downside of this approach is that it makes coroutine a special case in the language. 1047 Users wanting to extend coroutines or build their own for various reasons can only do so in ways offered by the language. 1048 Furthermore, implementing coroutines without language supports also displays the power of the programming language used. 1049 While this is ultimately the option used for idiomatic \CFA code, coroutines and threads can still be constructed by users without using the language support. 1050 The reserved keywords are only present to improve ease of use for the common cases. 
1051 1052 1053 \subsubsection{Alternative: Lambda Objects} 956 @start@ returns and the program main terminates. 957 958 959 \subsection{Coroutine Implementation} 960 961 A significant implementation challenge for coroutines (and threads, see section \ref{threads}) is adding extra fields and executing code after/before the coroutine constructor/destructor and coroutine main to create/initialize/de-initialize/destroy extra fields and the stack. 962 There are several solutions to this problem and the chosen option forced the \CFA coroutine design. 963 964 Object-oriented inheritance provides extra fields and code in a restricted context, but it requires programmers to explicitly perform the inheritance: 965 \begin{cfa} 966 struct mycoroutine $\textbf{\textsf{inherits}}$ baseCoroutine { ... } 967 \end{cfa} 968 and the programming language (and possibly its tool set, \eg debugger) may need to understand @baseCoroutine@ because of the stack. 969 Furthermore, the execution of constructs/destructors is in the wrong order for certain operations, \eg for threads; 970 \eg, if the thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived. 971 972 An alternatively is composition: 973 \begin{cfa} 974 struct mycoroutine { 975 ... // declarations 976 baseCoroutine dummy; // composition, last declaration 977 } 978 \end{cfa} 979 which also requires an explicit declaration that must be the last one to ensure correct initialization order. 980 However, there is nothing preventing wrong placement or multiple declarations. 1054 981 1055 982 For coroutines as for threads, many implementations are based on routine pointers or function objects~\cite{Butenhof97, C++14, MS:VisualC++, BoostCoroutines15}. 1056 For example, Boost implements coroutines in terms of four functor object types:983 For example, Boost implements coroutines in terms of four functor object-types: 1057 984 \begin{cfa} 1058 985 asymmetric_coroutine<>::pull_type … … 1061 988 symmetric_coroutine<>::yield_type 1062 989 \end{cfa} 1063 Often, the canonical threading paradigm in languages is based on function pointers, @pthread@ being one of the most well-known examples. 1064 The main problem of this approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type. 1065 Since the custom type is simple to write in \CFA and solves several issues, added support for routine/lambda based coroutines adds very little. 1066 1067 A variation of this would be to use a simple function pointer in the same way @pthread@ does for threads: 1068 \begin{cfa} 1069 void foo( coroutine_t cid, void* arg ) { 1070 int* value = (int*)arg; 990 Similarly, the canonical threading paradigm is often based on function pointers, \eg @pthread@~\cite{pthreads}, \Csharp~\cite{Csharp}, Go~\cite{Go}, and Scala~\cite{Scala}. 991 However, the generic thread-handle (identifier) is limited (few operations), unless it is wrapped in a custom type. 992 \begin{cfa} 993 void mycor( coroutine_t cid, void * arg ) { 994 int * value = (int *)arg; $\C{// type unsafe, pointer-size only}$ 1071 995 // Coroutine body 1072 996 } 1073 1074 997 int main() { 1075 int value = 0; 1076 coroutine_t cid = coroutine_create( &foo, (void*)&value ); 1077 coroutine_resume( &cid ); 1078 } 1079 \end{cfa} 1080 This semantics is more common for thread interfaces but coroutines work equally well. 
1081 As discussed in section \ref{threads}, this approach is superseded by static approaches in terms of expressivity. 1082 1083 1084 \subsubsection{Alternative: Trait-Based Coroutines} 1085 1086 Finally, the underlying approach, which is the one closest to \CFA idioms, is to use trait-based lazy coroutines. 1087 This approach defines a coroutine as anything that satisfies the trait @is_coroutine@ (as defined below) and is used as a coroutine. 1088 1089 \begin{cfa} 1090 trait is_coroutine(dtype T) { 1091 void main(T& this); 1092 coroutine_desc* get_coroutine(T& this); 998 int input = 0, output; 999 coroutine_t cid = coroutine_create( &mycor, (void *)&input ); $\C{// type unsafe, pointer-size only}$ 1000 coroutine_resume( cid, (void *)input, (void **)&output ); $\C{// type unsafe, pointer-size only}$ 1001 } 1002 \end{cfa} 1003 Since the custom type is simple to write in \CFA and solves several issues, added support for routine/lambda-based coroutines adds very little. 1004 1005 The selected approach is to use language support by introducing a new kind of aggregate (structure): 1006 \begin{cfa} 1007 coroutine Fibonacci { 1008 int fn; // communication variables 1093 1009 }; 1094 1095 forall( dtype T | is_coroutine(T) ) void suspend(T&); 1096 forall( dtype T | is_coroutine(T) ) void resume (T&); 1097 \end{cfa} 1098 This ensures that an object is not a coroutine until @resume@ is called on the object. 1099 Correspondingly, any object that is passed to @resume@ is a coroutine since it must satisfy the @is_coroutine@ trait to compile. 1010 \end{cfa} 1011 The @coroutine@ keyword means the compiler (and tool set) can find and inject code where needed. 1012 The downside of this approach is that it makes coroutine a special case in the language. 1013 Users wanting to extend coroutines or build their own for various reasons can only do so in ways offered by the language. 1014 Furthermore, implementing coroutines without language supports also displays the power of a programming language. 1015 While this is ultimately the option used for idiomatic \CFA code, coroutines and threads can still be constructed without using the language support. 1016 The reserved keyword eases use for the common cases. 1017 1018 Part of the mechanism to generalize coroutines is using a \CFA trait, which defines a coroutine as anything satisfying the trait @is_coroutine@, and this trait is used to restrict coroutine-manipulation functions: 1019 \begin{cfa} 1020 trait is_coroutine( dtype T ) { 1021 void main( T & this ); 1022 coroutine_desc * get_coroutine( T & this ); 1023 }; 1024 forall( dtype T | is_coroutine(T) ) void get_coroutine( T & ); 1025 forall( dtype T | is_coroutine(T) ) void suspend( T & ); 1026 forall( dtype T | is_coroutine(T) ) void resume( T & ); 1027 \end{cfa} 1028 This definition ensures there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine. 1029 No return value or additional parameters are necessary for this function, because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values. 1030 As well, any object passed to @suspend@ and @resume@ is a coroutine since it must satisfy the @is_coroutine@ trait to compile. 1100 1031 The advantage of this approach is that users can easily create different types of coroutines, for example, changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ routine. 
1101 The \CFA keyword @coroutine@ simply has the effect of implementing the getter and forward declarations required for users to implement the main routine.1102 1103 \begin{ center}1104 \begin{ tabular}{c c c}1105 \begin{cfa}[tabsize=3] 1106 coroutine MyCoroutine { 1107 int someValue; 1032 The \CFA keyword @coroutine@ implicitly implements the getter and forward declarations required for implementing the coroutine main: 1033 \begin{cquote} 1034 \begin{tabular}{@{}ccc@{}} 1035 \begin{cfa} 1036 coroutine MyCor { 1037 int value; 1038 1108 1039 }; 1109 \end{cfa} & == & \begin{cfa}[tabsize=3] 1110 struct MyCoroutine { 1111 int someValue; 1112 coroutine_desc __cor; 1040 \end{cfa} 1041 & {\Large $\Rightarrow$} & 1042 \begin{tabular}{@{}ccc@{}} 1043 \begin{cfa} 1044 struct MyCor { 1045 int value; 1046 coroutine_desc cor; 1113 1047 }; 1114 1115 static inline 1116 coroutine_desc* get_coroutine( 1117 struct MyCoroutine& this 1118 ) { 1119 return &this.__cor; 1120 } 1121 1122 void main(struct MyCoroutine* this); 1048 \end{cfa} 1049 & 1050 \begin{cfa} 1051 static inline coroutine_desc * 1052 get_coroutine( MyCor & this ) { 1053 return &this.cor; 1054 } 1055 \end{cfa} 1056 & 1057 \begin{cfa} 1058 void main( MyCor * this ); 1059 1060 1061 1123 1062 \end{cfa} 1124 1063 \end{tabular} 1125 \end{center} 1126 1127 The combination of these two approaches allows users new to coroutining and concurrency to have an easy and concise specification, while more advanced users have tighter control on memory layout and initialization. 1128 1129 \subsection{Thread Interface}\label{threads} 1130 The basic building blocks of multithreading in \CFA are \textbf{cfathread}. 1131 Both user and kernel threads are supported, where user threads are the concurrency mechanism and kernel threads are the parallel mechanism. 1132 User threads offer a flexible and lightweight interface. 1133 A thread can be declared using a struct declaration @thread@ as follows: 1134 1135 \begin{cfa} 1136 thread foo {}; 1137 \end{cfa} 1138 1139 As for coroutines, the keyword is a thin wrapper around a \CFA trait: 1140 1141 \begin{cfa} 1142 trait is_thread(dtype T) { 1143 void ^?{}(T & mutex this); 1144 void main(T & this); 1145 thread_desc* get_thread(T & this); 1064 \end{tabular} 1065 \end{cquote} 1066 The combination of these two approaches allows an easy and concise specification to coroutining (and concurrency) for normal users, while more advanced users have tighter control on memory layout and initialization. 1067 1068 1069 \subsection{Thread Interface} 1070 \label{threads} 1071 1072 Both user and kernel threads are supported, where user threads provide concurrency and kernel threads provide parallelism. 1073 Like coroutines and for the same design reasons, the selected approach for user threads is to use language support by introducing a new kind of aggregate (structure) and a \CFA trait: 1074 \begin{cquote} 1075 \begin{tabular}{@{}c@{\hspace{2\parindentlnth}}c@{}} 1076 \begin{cfa} 1077 thread myThread { 1078 // communication variables 1146 1079 }; 1147 \end{cfa} 1148 1149 Obviously, for this thread implementation to be useful it must run some user code. 1150 Several other threading interfaces use a function-pointer representation as the interface of threads (for example \Csharp~\cite{Csharp} and Scala~\cite{Scala}). 1151 However, this proposal considers that statically tying a @main@ routine to a thread supersedes this approach. 
1152 Since the @main@ routine is already a special routine in \CFA (where the program begins), it is a natural extension of the semantics to use overloading to declare mains for different threads (the normal main being the main of the initial thread). 1080 1081 1082 \end{cfa} 1083 & 1084 \begin{cfa} 1085 trait is_thread( dtype T ) { 1086 void main( T & this ); 1087 thread_desc * get_thread( T & this ); 1088 void ^?{}( T & `mutex` this ); 1089 }; 1090 \end{cfa} 1091 \end{tabular} 1092 \end{cquote} 1093 (The qualifier @mutex@ for the destructor parameter is discussed in Section~\ref{s:Monitors}.) 1094 Like a coroutine, the statically-typed @main@ function is the starting point (first stack frame) of a user thread. 1095 The difference is that a coroutine borrows a thread from its caller, so the first thread resuming a coroutine creates an instance of @main@; 1096 whereas, a user thread receives its own thread from the runtime system, which starts in @main@ as some point after the thread constructor is run.\footnote{ 1097 The \lstinline@main@ function is already a special routine in C (where the program begins), so it is a natural extension of the semantics to use overloading to declare mains for different coroutines/threads (the normal main being the main of the initial thread).} 1098 No return value or additional parameters are necessary for this function, because the task type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values. 1099 1100 \begin{comment} % put in appendix with coroutine version ??? 1153 1101 As such the @main@ routine of a thread can be defined as 1154 1102 \begin{cfa} … … 1189 1137 } 1190 1138 \end{cfa} 1191 1192 1139 A consequence of the strongly typed approach to main is that memory layout of parameters and return values to/from a thread are now explicitly specified in the \textbf{api}. 1193 1194 Of course, for threads to be useful, it must be possible to start and stop threads and wait for them to complete execution. 1195 While using an \textbf{api} such as @fork@ and @join@ is relatively common in the literature, such an interface is unnecessary.1196 Indeed, the simplest approach is to use \textbf{raii} principles and have threads @fork@ after the constructor has completed and @join@ before the destructor runs.1197 \begin{cfa} 1198 thread World; 1199 1200 void main( World & this) {1140 \end{comment} 1141 1142 For user threads to be useful, it must be possible to start and stop the underlying thread, and wait for it to complete execution. 1143 While using an API such as @fork@ and @join@ is relatively common, such an interface is awkward and unnecessary. 1144 A simple approach is to use allocation/deallocation principles, and have threads implicitly @fork@ after construction and @join@ before destruction. 1145 \begin{cfa} 1146 thread World {}; 1147 void main( World & this ) { 1201 1148 sout | "World!" | endl; 1202 1149 } 1203 1204 void main() { 1205 World w; 1206 // Thread forks here 1207 1208 // Printing "Hello " and "World!" are run concurrently 1209 sout | "Hello " | endl; 1210 1211 // Implicit join at end of scope 1212 } 1213 \end{cfa} 1214 1215 This semantic has several advantages over explicit semantics: a thread is always started and stopped exactly once, users cannot make any programming errors, and it naturally scales to multiple threads meaning basic synchronization is very simple. 1216 1217 \begin{cfa} 1218 thread MyThread { 1219 //... 
1150 int main() { 1151 World w`[10]`; $\C{// implicit forks after creation}$ 1152 sout | "Hello " | endl; $\C{// "Hello " and 10 "World!" printed concurrently}$ 1153 } $\C{// implicit joins before destruction}$ 1154 \end{cfa} 1155 This semantics ensures a thread is started and stopped exactly once, eliminating some programming error, and scales to multiple threads for basic (termination) synchronization. 1156 This tree-structure (lattice) create/delete from C block-structure is generalized by using dynamic allocation, so threads can outlive the scope in which they are created, much like dynamically allocating memory lets objects outlive the scope in which they are created. 1157 \begin{cfa} 1158 int main() { 1159 MyThread * heapLived; 1160 { 1161 MyThread blockLived; $\C{// fork block-based thread}$ 1162 heapLived = `new`( MyThread ); $\C{// fork heap-based thread}$ 1163 ... 1164 } $\C{// join block-based thread}$ 1165 ... 1166 `delete`( heapLived ); $\C{// join heap-based thread}$ 1167 } 1168 \end{cfa} 1169 The heap-based approach allows arbitrary thread-creation topologies, with respect to fork/join-style concurrency. 1170 1171 Figure~\ref{s:ConcurrentMatrixSummation} shows concurrently adding the rows of a matrix and then totalling the subtotals sequential, after all the row threads have terminated. 1172 The program uses heap-based threads because each thread needs different constructor values. 1173 (Python provides a simple iteration mechanism to initialize array elements to different values allowing stack allocation.) 1174 The allocation/deallocation pattern appears unusual because allocated objects are immediately deleted without any intervening code. 1175 However, for threads, the deletion provides implicit synchronization, which is the intervening code. 1176 While the subtotals are added in linear order rather than completion order, which slight inhibits concurrency, the computation is restricted by the critical-path thread (\ie the thread that takes the longest), and so any inhibited concurrency is very small as totalling the subtotals is trivial. 1177 1178 \begin{figure} 1179 \begin{cfa} 1180 thread Adder { 1181 int * row, cols, & subtotal; $\C{// communication}$ 1220 1182 }; 1221 1222 // main 1223 void main(MyThread& this) { 1224 //... 1225 } 1226 1227 void foo() { 1228 MyThread thrds[10]; 1229 // Start 10 threads at the beginning of the scope 1230 1231 DoStuff(); 1232 1233 // Wait for the 10 threads to finish 1234 } 1235 \end{cfa} 1236 1237 However, one of the drawbacks of this approach is that threads always form a tree where nodes must always outlive their children, \ie they are always destroyed in the opposite order of construction because of C scoping rules. 1238 This restriction is relaxed by using dynamic allocation, so threads can outlive the scope in which they are created, much like dynamically allocating memory lets objects outlive the scope in which they are created. 1239 1240 \begin{cfa} 1241 thread MyThread { 1242 //... 1243 }; 1244 1245 void main(MyThread& this) { 1246 //... 
1247 } 1248 1249 void foo() { 1250 MyThread* long_lived; 1251 { 1252 // Start a thread at the beginning of the scope 1253 MyThread short_lived; 1254 1255 // create another thread that will outlive the thread in this scope 1256 long_lived = new MyThread; 1257 1258 DoStuff(); 1259 1260 // Wait for the thread short_lived to finish 1261 } 1262 DoMoreStuff(); 1263 1264 // Now wait for the long_lived to finish 1265 delete long_lived; 1266 } 1267 \end{cfa} 1268 1269 1270 % ====================================================================== 1271 % ====================================================================== 1272 \section{Concurrency} 1273 % ====================================================================== 1274 % ====================================================================== 1275 Several tools can be used to solve concurrency challenges. 1276 Since many of these challenges appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state (Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka (Scala)~\cite{Akka}). 1277 In these paradigms, interaction among concurrent objects relies on message passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely relate to networking concepts (channels~\cite{CSP,Go} for example). 1278 However, in languages that use routine calls as their core abstraction mechanism, these approaches force a clear distinction between concurrent and non-concurrent paradigms (\ie message passing versus routine calls). 1279 This distinction in turn means that, in order to be effective, programmers need to learn two sets of design patterns. 1183 void ?{}( Adder & adder, int row[], int cols, int & subtotal ) { 1184 adder.[ row, cols, &subtotal ] = [ row, cols, &subtotal ]; 1185 } 1186 void main( Adder & adder ) with( adder ) { 1187 subtotal = 0; 1188 for ( int c = 0; c < cols; c += 1 ) { 1189 subtotal += row[c]; 1190 } 1191 } 1192 int main() { 1193 const int rows = 10, cols = 1000; 1194 int matrix[rows][cols], subtotals[rows], total = 0; 1195 // read matrix 1196 Adder * adders[rows]; 1197 for ( int r = 0; r < rows; r += 1 ) { $\C{// start threads to sum rows}$ 1198 adders[r] = new( matrix[r], cols, &subtotals[r] ); 1199 } 1200 for ( int r = 0; r < rows; r += 1 ) { $\C{// wait for threads to finish}$ 1201 delete( adders[r] ); $\C{// termination join}$ 1202 total += subtotals[r]; $\C{// total subtotal}$ 1203 } 1204 sout | total | endl; 1205 } 1206 \end{cfa} 1207 \caption{Concurrent Matrix Summation} 1208 \label{s:ConcurrentMatrixSummation} 1209 \end{figure} 1210 1211 1212 \section{Synchronization / Mutual Exclusion} 1213 1214 Uncontrolled non-deterministic execution is meaningless. 1215 To reestablish meaningful execution requires mechanisms to reintroduce determinism (control non-determinism), called synchronization and mutual exclusion, where synchronization is a timing relationship among threads and mutual exclusion is an access-control mechanism on data shared by threads. 1216 Since many deterministic challenges appear with the use of mutable shared state, some languages/libraries disallow it (Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka~\cite{Akka} (Scala)). 1217 In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely relate to networking concepts (\eg channels~\cite{CSP,Go}). 
1218 However, in call/return-based languages, these approaches force a clear distinction (\ie introduce a new programming paradigm) between non-concurrent and concurrent computation (\ie function call versus message passing). 1219 This distinction means a programmers needs to learn two sets of design patterns. 1280 1220 While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account. 1281 1282 Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on basic constructs like routine calls and shared objects. 1283 At the lowest level, concurrent paradigms are implemented as atomic operations and locks. 1284 Many such mechanisms have been proposed, including semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}. 1285 However, for productivity reasons it is desirable to have a higher-level construct be the core concurrency paradigm~\cite{Hochstein05}. 1286 1287 An approach that is worth mentioning because it is gaining in popularity is transactional memory~\cite{Herlihy93}. 1288 While this approach is even pursued by system languages like \CC~\cite{Cpp-Transactions}, the performance and feature set is currently too restrictive to be the main concurrency paradigm for system languages, which is why it was rejected as the core paradigm for concurrency in \CFA. 1289 1290 One of the most natural, elegant, and efficient mechanisms for synchronization and communication, especially for shared-memory systems, is the \emph{monitor}. 1221 In contrast, approaches based on statefull models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm. 1222 1223 At the lowest level, concurrent control is implemented as atomic operations, upon which different kinds of locks mechanism are constructed, \eg semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}. 1224 However, for productivity it is always desirable to use the highest-level construct that provides the necessary efficiency~\cite{Hochstein05}. 1225 A newer approach is transactional memory~\cite{Herlihy93}. 1226 While this approach is pursued in hardware~\cite{Nakaike15} and system languages, like \CC~\cite{Cpp-Transactions}, the performance and feature set is still too restrictive to be the main concurrency paradigm for system languages, which is why it was rejected as the core paradigm for concurrency in \CFA. 1227 1228 One of the most natural, elegant, and efficient mechanisms for synchronization and mutual exclusion for shared-memory systems is the \emph{monitor}. 1291 1229 Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}. 1292 Many programming languages ---\eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}---provide monitors as explicit language constructs.1230 Many programming languages -- \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java} -- provide monitors as explicit language constructs. 
1293 1231 In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors. 1294 For these reasons, this project proposes monitors as the core concurrency construct. 1295 1296 1297 \subsection{Basics} 1298 1299 Non-determinism requires concurrent systems to offer support for mutual-exclusion and synchronization. 1300 Mutual-exclusion is the concept that only a fixed number of threads can access a critical section at any given time, where a critical section is a group of instructions on an associated portion of data that requires the restricted access. 1301 On the other hand, synchronization enforces relative ordering of execution and synchronization tools provide numerous mechanisms to establish timing relationships among threads. 1302 1303 1304 \subsubsection{Mutual-Exclusion} 1305 1306 As mentioned above, mutual-exclusion is the guarantee that only a fix number of threads can enter a critical section at once. 1232 For these reasons, this project proposes monitors as the core concurrency construct, upon which even higher-level approaches can be easily constructed.. 1233 1234 1235 \subsection{Mutual Exclusion} 1236 1237 A group of instructions manipulating a specific instance of shared data that must be performed atomically is called an (individual) \newterm{critical-section}~\cite{Dijkstra65}. 1238 A generalization is a \newterm{group critical-section}~\cite{Joung00}, where multiple tasks with the same session may use the resource simultaneously, but different sessions may not use the resource simultaneously. 1239 The readers/writer problem~\cite{Courtois71} is an instance of a group critical-section, where readers have the same session and all writers have a unique session. 1240 \newterm{Mutual exclusion} enforces the correction number of threads are using a critical section at the same time. 1241 1307 1242 However, many solutions exist for mutual exclusion, which vary in terms of performance, flexibility and ease of use. 1308 Methods range from low-level locks, which are fast and flexible but require significant attention to be correct, to higher-level concurrency techniques, which sacrifice some performance in orderto improve ease of use.1309 Ease of use comes by either guaranteeing some problems cannot occur (\eg being deadlock free) or by offering a more explicit coupling between data and correspondingcritical section.1243 Methods range from low-level locks, which are fast and flexible but require significant attention for correctness, to higher-level concurrency techniques, which sacrifice some performance to improve ease of use. 1244 Ease of use comes by either guaranteeing some problems cannot occur (\eg deadlock free), or by offering a more explicit coupling between shared data and critical section. 1310 1245 For example, the \CC @std::atomic<T>@ offers an easy way to express mutual-exclusion on a restricted set of operations (\eg reading/writing large types atomically). 1311 Another challenge with low-level locks is composability.1312 Locks have restricted composability because it takes careful organizing for multiple locks to be used while preventing deadlocks.1313 Easing composability is another feature higher-level mutual-exclusion mechanisms often offer. 
1314 1315 1316 \subsubsection{Synchronization} 1317 1318 As with mutual-exclusion, low-level synchronization primitives often offer good performance and good flexibility at the cost of ease of use.1319 Again, higher-level mechanisms often simplify usage by adding either better coupling between synchronization and data (\eg message passing) or offering a simpler solution to otherwise involved challenges.1246 However, a significant challenge with (low-level) locks is composability because it takes careful organization for multiple locks to be used while preventing deadlock. 1247 Easing composability is another feature higher-level mutual-exclusion mechanisms offer. 1248 1249 1250 \subsection{Synchronization} 1251 1252 Synchronization enforces relative ordering of execution, and synchronization tools provide numerous mechanisms to establish these timing relationships. 1253 Low-level synchronization primitives offer good performance and flexibility at the cost of ease of use. 1254 Higher-level mechanisms often simplify usage by adding better coupling between synchronization and data (\eg message passing), or offering a simpler solution to otherwise involved challenges, \eg barrier lock. 1320 1255 As mentioned above, synchronization can be expressed as guaranteeing that event \textit{X} always happens before \textit{Y}. 1321 Most of the time, synchronization happens within a critical section, where threads must acquire mutual-exclusion in a certain order. 1322 However, it may also be desirable to guarantee that event \textit{Z} does not occur between \textit{X} and \textit{Y}. 1323 Not satisfying this property is called \textbf{barging}. 1324 For example, where event \textit{X} tries to effect event \textit{Y} but another thread acquires the critical section and emits \textit{Z} before \textit{Y}. 1325 The classic example is the thread that finishes using a resource and unblocks a thread waiting to use the resource, but the unblocked thread must compete to acquire the resource. 1256 Often synchronization is used to order access to a critical section, \eg ensuring the next kind of thread to enter a critical section is a reader thread 1257 If a writer thread is scheduled for next access, but another reader thread acquires the critical section first, the reader has \newterm{barged}. 1258 Barging can result in staleness/freshness problems, where a reader barges ahead of a write and reads temporally stale data, or a writer barges ahead of another writer overwriting data with a fresh value preventing the previous value from having an opportunity to be read. 1326 1259 Preventing or detecting barging is an involved challenge with low-level locks, which can be made much easier by higher-level constructs. 1327 This challenge is often split into two different methods, barging avoidance and barging prevention. 1328 Algorithms that use flag variables to detect barging threads are said to be using barging avoidance, while algorithms that baton-pass locks~\cite{Andrews89} between threads instead of releasing the locks are said to be using barging prevention. 1329 1330 1331 % ====================================================================== 1332 % ====================================================================== 1260 This challenge is often split into two different approaches, barging avoidance and barging prevention. 
1261 Algorithms that allow a barger but divert it until later are avoiding the barger, while algorithms that preclude a barger from entering during synchronization in the critical section prevent the barger completely. 1262 baton-pass locks~\cite{Andrews89} between threads instead of releasing the locks are said to be using barging prevention. 1263 1264 1333 1265 \section{Monitors} 1334 % ====================================================================== 1335 % ====================================================================== 1266 \label{s:Monitors} 1267 1336 1268 A \textbf{monitor} is a set of routines that ensure mutual-exclusion when accessing shared state. 1337 1269 More precisely, a monitor is a programming technique that associates mutual-exclusion to routine scopes, as opposed to mutex locks, where mutual-exclusion is defined by lock/release calls independently of any scoping of the calling routine. … … 2501 2433 Given these building blocks, it is possible to reproduce all three of the popular paradigms. 2502 2434 Indeed, \textbf{uthread} is the default paradigm in \CFA. 2503 However, disabling \textbf{preemption} on the \textbf{cfacluster} means \textbf{cfathread} effectively become \textbf{fiber}.2435 However, disabling \textbf{preemption} on a cluster means threads effectively become fibers. 2504 2436 Since several \textbf{cfacluster} with different scheduling policy can coexist in the same application, this allows \textbf{fiber} and \textbf{uthread} to coexist in the runtime of an application. 2505 2437 Finally, it is possible to build executors for thread pools from \textbf{uthread} or \textbf{fiber}, which includes specialized jobs like actors~\cite{Actors}. -
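As a concrete illustration of the monitor semantics described in the concurrency changes above, the following minimal sketch (not part of this changeset) shows how a \CFA monitor ties mutual exclusion to routine scope through the @mutex@ parameter qualifier; the @Aint@ counter and its interface routine are invented here for illustration and only follow the paper's conventions.
\begin{cfa}
monitor Aint { int cnt; };                      // monitor: counter with implicitly protected shared state
void ?{}( Aint & this ) { this.cnt = 0; }       // constructor
int incr( Aint & mutex this ) {                 // mutex qualifier: acquire monitor on entry, release on return
	return ++this.cnt;
}
\end{cfa}
Concurrent calls such as @incr( x )@ are serialized on the monitor instance @x@ without any explicit locking by the caller.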
doc/papers/general/Paper.tex
r13073be r8dbedfc 243 243 Nevertheless, C, first standardized almost forty years ago~\cite{ANSI89:C}, lacks many features that make programming in more modern languages safer and more productive. 244 244 245 \CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that adds modern language-features to C, while maintaining both source and runtime compatibility with C and a familiar programming model for programmers.245 \CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that adds modern language-features to C, while maintaining source and runtime compatibility in the familiar C programming model. 246 246 The four key design goals for \CFA~\cite{Bilson03} are: 247 247 (1) The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler; … … 273 273 Starting with a translator versus a compiler makes it easier and faster to generate and debug C object-code rather than intermediate, assembler or machine code. 274 274 The translator design is based on the \emph{visitor pattern}, allowing multiple passes over the abstract code-tree, which works well for incrementally adding new feature through additional visitor passes. 275 At the heart of the translator is the type resolver, which handles the polymorphic routine/type overload-resolution.275 At the heart of the translator is the type resolver, which handles the polymorphic function/type overload-resolution. 276 276 % @plg2[8]% cd cfa-cc/src; cloc libcfa 277 277 % ------------------------------------------------------------------------------- … … 310 310 311 311 Finally, it is impossible to describe a programming language without usages before definitions. 312 Therefore, syntax and semantics appear before explanations ;313 hence, patience is necessary until details are presented.312 Therefore, syntax and semantics appear before explanations, and related work (Section~\ref{s:RelatedWork}) is deferred until \CFA is presented; 313 hence, patience is necessary until details are discussed. 314 314 315 315 … … 329 329 \end{quote} 330 330 \vspace{-9pt} 331 C already has a limited form of ad-hoc polymorphism in the form ofits basic arithmetic operators, which apply to a variety of different types using identical syntax.331 C already has a limited form of ad-hoc polymorphism in its basic arithmetic operators, which apply to a variety of different types using identical syntax. 332 332 \CFA extends the built-in operator overloading by allowing users to define overloads for any function, not just operators, and even any variable; 333 333 Section~\ref{sec:libraries} includes a number of examples of how this overloading simplifies \CFA programming relative to C. … … 653 653 } 654 654 \end{cfa} 655 Since @pair( T *, T * )@ is a concrete type, there are no implicit parameters passed to @lexcmp@, so the generated code is identical to a function written in standard C using @void *@, yet the \CFA version is type-checked to ensure the fields of both pairs and the arguments to the comparison function match in type.655 Since @pair( T *, T * )@ is a concrete type, there are no implicit parameters passed to @lexcmp@, so the generated code is identical to a function written in standard C using @void *@, yet the \CFA version is type-checked to ensure the members of both pairs and the arguments to the comparison function match in type. 
656 656 657 657 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \newterm{tag-structures}. … … 815 815 \subsection{Member Access} 816 816 817 It is also possible to access multiple fields from a single expression using a \newterm{member-access}.817 It is also possible to access multiple members from a single expression using a \newterm{member-access}. 818 818 The result is a single tuple-valued expression whose type is the tuple of the types of the members, \eg: 819 819 \begin{cfa} … … 1020 1020 \begin{cfa} 1021 1021 forall( dtype T0, dtype T1 | sized(T0) | sized(T1) ) struct _tuple2 { 1022 T0 field_0; T1 field_1; $\C{// generated before the first 2-tuple}$1022 T0 member_0; T1 member_1; $\C{// generated before the first 2-tuple}$ 1023 1023 }; 1024 1024 _tuple2(int, int) f() { 1025 1025 _tuple2(double, double) x; 1026 1026 forall( dtype T0, dtype T1, dtype T2 | sized(T0) | sized(T1) | sized(T2) ) struct _tuple3 { 1027 T0 field_0; T1 field_1; T2 field_2; $\C{// generated before the first 3-tuple}$1027 T0 member_0; T1 member_1; T2 member_2; $\C{// generated before the first 3-tuple}$ 1028 1028 }; 1029 1029 _tuple3(int, double, int) y; … … 1033 1033 1034 1034 \begin{comment} 1035 Since tuples are essentially structures, tuple indexing expressions are just fieldaccesses:1035 Since tuples are essentially structures, tuple indexing expressions are just member accesses: 1036 1036 \begin{cfa} 1037 1037 void f(int, [double, char]); … … 1047 1047 _tuple2(int, double) x; 1048 1048 1049 x. field_0+x.field_1;1050 printf("%d %g\n", x. field_0, x.field_1);1051 f(x. field_0, (_tuple2){ x.field_1, 'z' });1052 \end{cfa} 1053 Note that due to flattening, @x@ used in the argument position is converted into the list of its fields.1049 x.member_0+x.member_1; 1050 printf("%d %g\n", x.member_0, x.member_1); 1051 f(x.member_0, (_tuple2){ x.member_1, 'z' }); 1052 \end{cfa} 1053 Note that due to flattening, @x@ used in the argument position is converted into the list of its members. 1054 1054 In the call to @f@, the second and third argument components are structured into a tuple argument. 1055 1055 Similarly, tuple member expressions are recursively expanded into a list of member access expressions. … … 1083 1083 1084 1084 The various kinds of tuple assignment, constructors, and destructors generate GNU C statement expressions. 1085 A variable is generated to store the value produced by a statement expression, since its fields may need to be constructed with a non-trivial constructor and it may need to be referred to multiple time, \eg in a unique expression.1085 A variable is generated to store the value produced by a statement expression, since its members may need to be constructed with a non-trivial constructor and it may need to be referred to multiple time, \eg in a unique expression. 1086 1086 The use of statement expressions allows the translator to arbitrarily generate additional temporary variables as needed, but binds the implementation to a non-standard extension of the C language. 1087 1087 However, there are other places where the \CFA translator makes use of GNU C extensions, such as its use of nested functions, so this restriction is not new. … … 1493 1493 1494 1494 Heterogeneous data is often aggregated into a structure/union. 
1495 To reduce syntactic noise, \CFA provides a @with@ statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate field-qualification by opening a scope containing the fieldidentifiers.1495 To reduce syntactic noise, \CFA provides a @with@ statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate member-qualification by opening a scope containing the member identifiers. 1496 1496 \begin{cquote} 1497 1497 \vspace*{-\baselineskip}%??? … … 1530 1530 The type must be an aggregate type. 1531 1531 (Enumerations are already opened.) 1532 The object is the implicit qualifier for the open structure- fields.1532 The object is the implicit qualifier for the open structure-members. 1533 1533 1534 1534 All expressions in the expression list are open in parallel within the compound statement, which is different from Pascal, which nests the openings from left to right. 1535 The difference between parallel and nesting occurs for fields with the same name and type:1536 \begin{cfa} 1537 struct S { int `i`; int j; double m; } s, w; 1535 The difference between parallel and nesting occurs for members with the same name and type: 1536 \begin{cfa} 1537 struct S { int `i`; int j; double m; } s, w; $\C{// member i has same type in structure types S and T}$ 1538 1538 struct T { int `i`; int k; int m; } t, w; 1539 with ( s, t ) { 1539 with ( s, t ) { $\C{// open structure variables s and t in parallel}$ 1540 1540 j + k; $\C{// unambiguous, s.j + t.k}$ 1541 1541 m = 5.0; $\C{// unambiguous, s.m = 5.0}$ … … 1549 1549 For parallel semantics, both @s.i@ and @t.i@ are visible, so @i@ is ambiguous without qualification; 1550 1550 for nested semantics, @t.i@ hides @s.i@, so @i@ implies @t.i@. 1551 \CFA's ability to overload variables means fields with the same name but different types are automatically disambiguated, eliminating most qualification when opening multiple aggregates.1551 \CFA's ability to overload variables means members with the same name but different types are automatically disambiguated, eliminating most qualification when opening multiple aggregates. 1552 1552 Qualification or a cast is used to disambiguate. 1553 1553 … … 1555 1555 \begin{cfa} 1556 1556 void ?{}( S & s, int i ) with ( s ) { $\C{// constructor}$ 1557 `s.i = i;` j = 3; m = 5.5; $\C{// initialize fields}$1557 `s.i = i;` j = 3; m = 5.5; $\C{// initialize members}$ 1558 1558 } 1559 1559 \end{cfa} … … 1659 1659 \lstMakeShortInline@% 1660 1660 \end{cquote} 1661 The only exception is bit field specification, which always appear to the right of the base type.1661 The only exception is bit-field specification, which always appear to the right of the base type. 1662 1662 % Specifically, the character @*@ is used to indicate a pointer, square brackets @[@\,@]@ are used to represent an array or function return value, and parentheses @()@ are used to indicate a function parameter. 1663 1663 However, unlike C, \CFA type declaration tokens are distributed across all variables in the declaration list. 
… … 1715 1715 // pointer to array of 5 doubles 1716 1716 1717 // common bit field syntax1717 // common bit-field syntax 1718 1718 1719 1719 … … 1911 1911 \subsection{Type Nesting} 1912 1912 1913 Nested types provide a mechanism to organize associated types and refactor a subset of fields into a named aggregate (\eg sub-aggregates @name@, @address@, @department@, within aggregate @employe@).1913 Nested types provide a mechanism to organize associated types and refactor a subset of members into a named aggregate (\eg sub-aggregates @name@, @address@, @department@, within aggregate @employe@). 1914 1914 Java nested types are dynamic (apply to objects), \CC are static (apply to the \lstinline[language=C++]@class@), and C hoists (refactors) nested types into the enclosing scope, meaning there is no need for type qualification. 1915 1915 Since \CFA in not object-oriented, adopting dynamic scoping does not make sense; 1916 instead \CFA adopts \CC static nesting, using the field-selection operator ``@.@'' for type qualification, as does Java, rather than the \CC type-selection operator ``@::@'' (see Figure~\ref{f:TypeNestingQualification}).1916 instead \CFA adopts \CC static nesting, using the member-selection operator ``@.@'' for type qualification, as does Java, rather than the \CC type-selection operator ``@::@'' (see Figure~\ref{f:TypeNestingQualification}). 1917 1917 \begin{figure} 1918 1918 \centering … … 2005 2005 Destruction parameters are useful for specifying storage-management actions, such as de-initialize but not deallocate.}. 2006 2006 \begin{cfa} 2007 struct VLA { int len, * data; }; $\C{// variable length array of integers}$2008 void ?{}( VLA & vla ) with ( vla ) { len = 10; data = alloc( len); } $\C{// default constructor}$2007 struct VLA { int size, * data; }; $\C{// variable length array of integers}$ 2008 void ?{}( VLA & vla ) with ( vla ) { size = 10; data = alloc( size ); } $\C{// default constructor}$ 2009 2009 void ^?{}( VLA & vla ) with ( vla ) { free( data ); } $\C{// destructor}$ 2010 2010 { … … 2013 2013 \end{cfa} 2014 2014 @VLA@ is a \newterm{managed type}\footnote{ 2015 A managed type affects the runtime environment versus a self-contained type.}: a type requiring a non-trivial constructor or destructor, or with a fieldof a managed type.2015 A managed type affects the runtime environment versus a self-contained type.}: a type requiring a non-trivial constructor or destructor, or with a member of a managed type. 2016 2016 A managed type is implicitly constructed at allocation and destructed at deallocation to ensure proper interaction with runtime resources, in this case, the @data@ array in the heap. 2017 2017 For details of the code-generation placement of implicit constructor and destructor calls among complex executable statements see~\cite[\S~2.2]{Schluntz17}. … … 2019 2019 \CFA also provides syntax for \newterm{initialization} and \newterm{copy}: 2020 2020 \begin{cfa} 2021 void ?{}( VLA & vla, int size, char fill ) with ( vla) { $\C{// initialization}$2022 len = size; data = alloc( len, fill );2021 void ?{}( VLA & vla, int size, char fill = '\0' ) { $\C{// initialization}$ 2022 vla.[ size, data ] = [ size, alloc( size, fill ) ]; 2023 2023 } 2024 2024 void ?{}( VLA & vla, VLA other ) { $\C{// copy, shallow}$ 2025 vla .len = other.len; vla.data = other.data;2025 vla = other; 2026 2026 } 2027 2027 \end{cfa} … … 2036 2036 2037 2037 \CFA constructors may be explicitly called, like Java, and destructors may be explicitly called, like \CC. 
2038 Explicit calls to constructors double as a \CC-style \emph{placement syntax}, useful for construction of member fields in user-defined constructors and reuse of existing storage allocations.2038 Explicit calls to constructors double as a \CC-style \emph{placement syntax}, useful for construction of members in user-defined constructors and reuse of existing storage allocations. 2039 2039 Like the other operators in \CFA, there is a concise syntax for constructor/destructor function calls: 2040 2040 \begin{cfa} … … 2048 2048 y{ x }; $\C{// reallocate y, points to x}$ 2049 2049 x{}; $\C{// reallocate x, not pointing to y}$ 2050 // ^z{}; ^y{}; ^x{}; 2051 } 2050 } // ^z{}; ^y{}; ^x{}; 2052 2051 \end{cfa} 2053 2052 … … 2060 2059 For compatibility with C, a copy constructor from the first union member type is also defined. 2061 2060 For @struct@ types, each of the four functions are implicitly defined to call their corresponding functions on each member of the struct. 2062 To better simulate the behaviour of C initializers, a set of \newterm{ fieldconstructors} is also generated for structures.2061 To better simulate the behaviour of C initializers, a set of \newterm{member constructors} is also generated for structures. 2063 2062 A constructor is generated for each non-empty prefix of a structure's member-list to copy-construct the members passed as parameters and default-construct the remaining members. 2064 To allow users to limit the set of constructors available for a type, when a user declares any constructor or destructor, the corresponding generated function and all fieldconstructors for that type are hidden from expression resolution;2063 To allow users to limit the set of constructors available for a type, when a user declares any constructor or destructor, the corresponding generated function and all member constructors for that type are hidden from expression resolution; 2065 2064 similarly, the generated default constructor is hidden upon declaration of any constructor. 2066 2065 These semantics closely mirror the rule for implicit declaration of constructors in \CC\cite[p.~186]{ANSI98:C++}. … … 2740 2739 2741 2740 \section{Related Work} 2741 \label{s:RelatedWork} 2742 2742 2743 2743 … … 2793 2793 C provides variadic functions through @va_list@ objects, but the programmer is responsible for managing the number of arguments and their types, so the mechanism is type unsafe. 2794 2794 KW-C~\cite{Buhr94a}, a predecessor of \CFA, introduced tuples to C as an extension of the C syntax, taking much of its inspiration from SETL. 2795 The main contributions of that work were adding MRVF, tuple mass and multiple assignment, and record- fieldaccess.2795 The main contributions of that work were adding MRVF, tuple mass and multiple assignment, and record-member access. 2796 2796 \CCeleven introduced @std::tuple@ as a library variadic template structure. 2797 2797 Tuples are a generalization of @std::pair@, in that they allow for arbitrary length, fixed-size aggregation of heterogeneous values.
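The related-work discussion above compares \CFA tuples with KW-C and \CC @std::tuple@; as a hedged sketch of the multiple-return-value and multiple-assignment idiom the paper describes (the routine name @divmod@ and the use of C @printf@ are choices made here for a self-contained example, not taken from the changeset):
\begin{cfa}
#include <stdio.h>                              // C library works unchanged in CFA
[ int, int ] divmod( int num, int den ) {       // multiple return values as a tuple
	return [ num / den, num % den ];
}
int main() {
	int q, r;
	[ q, r ] = divmod( 13, 5 );                 // multiple (tuple) assignment
	printf( "%d %d\n", q, r );                  // prints 2 3
}
\end{cfa}
The tuple returned by @divmod@ is assigned componentwise to @q@ and @r@, so no temporary structure is visible to the caller.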