Changeset ff29f08 for doc/papers/concurrency/Paper.tex
- Timestamp: May 18, 2018, 2:09:21 PM (7 years ago)
- Branches: new-env, with_gc
- Children: 2472a19
- Parents: f6f0cca3 (diff), c7d8100c (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
- Files: 1 edited
doc/papers/concurrency/Paper.tex
rf6f0cca3 → rff29f08

…
\captionsetup{justification=raggedright,singlelinecheck=false}
\usepackage{siunitx}
\sisetup{binary-units=true}

\hypersetup{breaklinks=true}
…
\renewcommand{\linenumberfont}{\scriptsize\sffamily}

\renewcommand{\textfraction}{0.0}	% the entire page may be devoted to floats with no text on the page at all

\lefthyphenmin=3	% hyphen only after 4 characters
…
%\DeclareTextCommandDefault{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.1ex}}}
\renewcommand{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.075ex}}}

\renewcommand*{\thefootnote}{\Alph{footnote}} % hack because fnsymbol does not work
%\renewcommand*{\thefootnote}{\fnsymbol{footnote}}

\makeatletter
…
\setlength{\gcolumnposn}{3.5in}
\setlength{\columnposn}{\gcolumnposn}

\newcommand{\C}[2][\@empty]{\ifx#1\@empty\else\global\setlength{\columnposn}{#1}\global\columnposn=\columnposn\fi\hfill\makebox[\textwidth-\columnposn][l]{\lst@basicstyle{\LstCommentStyle{#2}}}}
\newcommand{\CRT}{\global\columnposn=\gcolumnposn}
…
literate={-}{\makebox[1ex][c]{\raisebox{0.4ex}{\rule{0.8ex}{0.1ex}}}}1 {^}{\raisebox{0.6ex}{$\scriptstyle\land\,$}}1
	{~}{\raisebox{0.3ex}{$\scriptstyle\sim\,$}}1 % {`}{\ttfamily\upshape\hspace*{-0.1ex}`}1
	{<}{\textrm{\textless}}1 {>}{\textrm{\textgreater}}1
	{<-}{$\leftarrow$}2 {=>}{$\Rightarrow$}2 {->}{\makebox[1ex][c]{\raisebox{0.5ex}{\rule{0.8ex}{0.075ex}}}\kern-0.2ex{\textrm{\textgreater}}}2,
moredelim=**[is][\color{red}]{`}{`},
}% lstset
…
\lstMakeShortInline@%

\let\OLDthebibliography\thebibliography
\renewcommand\thebibliography[1]{
	\OLDthebibliography{#1}
	\setlength{\parskip}{0pt}
	\setlength{\itemsep}{4pt plus 0.3ex}
}

\title{\texorpdfstring{Concurrency in \protect\CFA}{Concurrency in Cforall}}
…
\CFA is a modern, polymorphic, \emph{non-object-oriented} extension of the C programming language.
This paper discusses the design of the concurrency and parallelism features in \CFA, and the concurrent runtime-system.
These features are created from scratch as ISO C lacks concurrency, relying largely on the pthreads library.
Coroutines and lightweight (user) threads are introduced into the language.
In addition, monitors are added as a high-level mechanism for mutual exclusion and synchronization.
…
\maketitle


\section{Introduction}

This paper provides a minimal concurrency \newterm{Abstract Program Interface} (API) that is simple, efficient and can be used to build other concurrency features.
An easier approach for programmers is to support higher-level constructs as the basis of concurrency.
Indeed, for highly productive concurrent programming, high-level approaches are much more popular~\cite{Hochstein05}.
Examples of high-level approaches are task (work) based~\cite{TBB}, implicit threading~\cite{OpenMP}, monitors~\cite{Java}, channels~\cite{CSP,Go}, and message passing~\cite{Erlang,MPI}.

The following terminology is used.
A \newterm{thread} is a fundamental unit of execution that runs a sequence of code and requires a stack to maintain state.
Multiple simultaneous threads give rise to \newterm{concurrency}, which requires locking to ensure safe communication and access to shared data.
% Correspondingly, concurrency is defined as the concepts and challenges that occur when multiple independent (sharing memory, timing dependencies, \etc) concurrent threads are introduced.
\newterm{Locking}, and by extension \newterm{locks}, are defined as a mechanism to prevent progress of threads to provide safety.
\newterm{Parallelism} is running multiple threads simultaneously.
Parallelism implies \emph{actual} simultaneous execution, where concurrency only requires \emph{apparent} simultaneous execution.
As such, parallelism only affects performance, which is observed through differences in space and/or time at runtime.

Hence, there are two problems to be solved: concurrency and parallelism.
While these two concepts are often combined, they are distinct, requiring different tools~\cite[\S~2]{Buhr05a}.
Concurrency tools handle synchronization and mutual exclusion, while parallelism tools handle performance, cost and resource utilization.

The proposed concurrency API is implemented in a dialect of C, called \CFA.
The paper discusses how the language features are added to the \CFA translator with respect to parsing, semantic, and type checking, and the corresponding high-performance runtime-library to implement the concurrency features.


\section{\CFA Overview}

The following is a quick introduction to the \CFA language, specifically tailored to the features needed to support concurrency.
Extended versions and explanation of the following code examples are available at the \CFA website~\cite{Cforall} or in Moss~\etal~\cite{Moss18}.

\CFA is an extension of ISO-C, and hence, supports all C paradigms.
%It is a non-object-oriented system-language, meaning most of the major abstractions have either no runtime overhead or can be opted out easily.
Like C, the basics of \CFA revolve around structures and functions.
Virtually all of the code generated by the \CFA translator respects C memory layouts and calling conventions.
While \CFA is not an object-oriented language, lacking the concept of a receiver (\eg @this@) and nominal inheritance-relationships, C does have a notion of objects: ``region of data storage in the execution environment, the contents of which can represent values''~\cite[3.15]{C11}.
While some \CFA features are common in object-oriented programming-languages, they are an independent capability allowing \CFA to adopt them while retaining a procedural paradigm.


\subsection{References}

\CFA provides multi-level rebindable references, as an alternative to pointers, which significantly reduces syntactic noise.
\begin{cfa}
int x = 1, y = 2, z = 3;
int * p1 = &x, ** p2 = &p1, *** p3 = &p2,	$\C{// pointers to x}$
	`&` r1 = x, `&&` r2 = r1, `&&&` r3 = r2;	$\C{// references to x}$
int * p4 = &z, `&` r4 = z;

*p1 = 3; **p2 = 3; ***p3 = 3;	// change x
r1 = 3; r2 = 3; r3 = 3;	// change x: implicit dereferences *r1, **r2, ***r3
**p3 = &y; *p3 = &p4;	// change p1, p2
`&`r3 = &y; `&&`r3 = &`&`r4;	// change r1, r2: cancel implicit dereferences (&*)**r3, (&(&*)*)*r3, &(&*)r4
\end{cfa}
A reference is a handle to an object, like a pointer, but is automatically dereferenced the specified number of levels.
Referencing (address-of @&@) a reference variable cancels one of the implicit dereferences, until there are no more implicit references, after which normal expression behaviour applies.


\subsection{\texorpdfstring{\protect\lstinline{with} Statement}{with Statement}}
\label{s:WithStatement}

Heterogeneous data is aggregated into a structure/union.
To reduce syntactic noise, \CFA provides a @with@ statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate field-qualification by opening a scope containing the field identifiers.
\begin{cquote}
\vspace*{-\baselineskip}%???
\lstDeleteShortInline@%
\begin{cfa}
struct S { char c; int i; double d; };
struct T { double m, n; };
// multiple aggregate parameters
\end{cfa}
\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}|@{\hspace{2\parindentlnth}}l@{}}
\begin{cfa}
void f( S & s, T & t ) {
	`s.`c; `s.`i; `s.`d;
	`t.`m; `t.`n;
}
\end{cfa}
&
\begin{cfa}
void f( S & s, T & t ) `with ( s, t )` {
	c; i; d;	// no qualification
	m; n;
}
\end{cfa}
\end{tabular}
\lstMakeShortInline@%
\end{cquote}
Object-oriented programming languages only provide implicit qualification for the receiver.

In detail, the @with@ statement has the form:
\begin{cfa}
$\emph{with-statement}$:
	'with' '(' $\emph{expression-list}$ ')' $\emph{compound-statement}$
\end{cfa}
and may appear as the body of a function or nested within a function body.
Each expression in the expression-list provides a type and object.
The type must be an aggregate type.
(Enumerations are already opened.)
The object is the implicit qualifier for the open structure-fields.
All expressions in the expression list are opened in parallel within the compound statement;
this semantic differs from Pascal, which nests the openings from left to right.
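A @with@ clause may also form a function body directly and may nest; a minimal sketch, reusing the aggregates @S@ and @T@ from above:
\begin{cfa}
void g( S & s, T & t ) `with ( s )` {	// with clause as function body
	c = 'a'; i = 1; d = 2.5;	// fields of s, no qualification
	with ( t ) {	// nested opening
		m = 1.0; n = 2.0;	// fields of t added to scope
	}
}
\end{cfa}

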
\subsection{Overloading}

\CFA maximizes the ability to reuse names via overloading to aggressively address the naming problem.
Both variables and functions may be overloaded, where selection is based on types, and number of returns (as in Ada~\cite{Ada}) and arguments.
\begin{cquote}
\vspace*{-\baselineskip}%???
\lstDeleteShortInline@%
\begin{cfa}
// selection based on type
\end{cfa}
\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}|@{\hspace{2\parindentlnth}}l@{}}
\begin{cfa}
const short int `MIN` = -32768;
const int `MIN` = -2147483648;
const long int `MIN` = -9223372036854775808L;
\end{cfa}
&
\begin{cfa}
short int si = `MIN`;
int i = `MIN`;
long int li = `MIN`;
\end{cfa}
\end{tabular}
\begin{cfa}
// selection based on type and number of parameters
\end{cfa}
\begin{tabular}{@{}l@{\hspace{2.7\parindentlnth}}|@{\hspace{2\parindentlnth}}l@{}}
\begin{cfa}
void `f`( void );
void `f`( char );
void `f`( int, double );
\end{cfa}
&
\begin{cfa}
`f`();
`f`( 'a' );
`f`( 3, 5.2 );
\end{cfa}
\end{tabular}
\begin{cfa}
// selection based on type and number of returns
\end{cfa}
\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}|@{\hspace{2\parindentlnth}}l@{}}
\begin{cfa}
char `f`( int );
double `f`( int );
[char, double] `f`( int );
\end{cfa}
&
\begin{cfa}
char c = `f`( 3 );
double d = `f`( 3 );
[d, c] = `f`( 3 );
\end{cfa}
\end{tabular}
\lstMakeShortInline@%
\end{cquote}
Overloading is important for \CFA concurrency since the runtime system relies on creating different types to represent concurrency objects.
Therefore, overloading is necessary to prevent the need for long prefixes and other naming conventions to prevent name clashes.
As seen in Section~\ref{basics}, function @main@ is heavily overloaded.

Variable overloading is useful in the parallel semantics of the @with@ statement for fields with the same name:
\begin{cfa}
struct S { int `i`; int j; double m; } s;
struct T { int `i`; int k; int m; } t;
with ( s, t ) {
	j + k;	$\C{// unambiguous, s.j + t.k}$
	m = 5.0;	$\C{// unambiguous, s.m = 5.0}$
	m = 1;	$\C{// unambiguous, t.m = 1}$
	int a = m;	$\C{// unambiguous, a = t.m}$
	double b = m;	$\C{// unambiguous, b = s.m}$
	int c = `s.i` + `t.i`;	$\C{// unambiguous, qualification}$
	(double)m;	$\C{// unambiguous, cast s.m}$
}
\end{cfa}
For parallel semantics, both @s.i@ and @t.i@ are visible with the same type, so only @i@ is ambiguous without qualification.
\subsection{Operators}

Overloading also extends to operators.
Operator-overloading syntax names a routine with the operator symbol and question marks for the operands:
\begin{cquote}
\lstDeleteShortInline@%
\begin{tabular}{@{}ll@{\hspace{\parindentlnth}}|@{\hspace{\parindentlnth}}l@{}}
\begin{cfa}
int ++? (int op);
int ?++ (int op);
int `?+?` (int op1, int op2);
int ?<=?(int op1, int op2);
int ?=? (int & op1, int op2);
int ?+=?(int & op1, int op2);
\end{cfa}
&
\begin{cfa}
// unary prefix increment
// unary postfix increment
// binary plus
// binary less than
// binary assignment
// binary plus-assignment
\end{cfa}
&
\begin{cfa}
struct S { int i, j; };
S `?+?`( S op1, S op2 ) { // add two structures
	return (S){ op1.i + op2.i, op1.j + op2.j };
}
S s1 = {1, 2}, s2 = {2, 3}, s3;
s3 = s1 `+` s2;	// compute sum: s3 == {3, 5}
\end{cfa}
\end{tabular}
\lstMakeShortInline@%
\end{cquote}
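The remaining declared forms follow the same pattern; a small sketch for the structure @S@ above, giving plus-assignment and prefix increment:
\begin{cfa}
S ?+=?( S & op1, S op2 ) {	// binary plus-assignment
	op1.i += op2.i; op1.j += op2.j;
	return op1;
}
S ++?( S & op ) { return op `+=` (S){ 1, 1 }; }	// unary prefix increment via +=
s3 `+=` s1;	// calls ?+=?
`++`s3;	// calls ++?
\end{cfa}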
While concurrency does not use operator overloading directly, it provides an introduction for the syntax of constructors.


\subsection{Parametric Polymorphism}
\label{s:ParametricPolymorphism}

The signature feature of \CFA is parametric-polymorphic functions, generalized using a @forall@ clause (giving the language its name), which allows separately compiled routines to support generic usage over multiple types.
For example, the following sum function works for any type that supports construction from 0 and addition:
\begin{cfa}
forall( otype T | { void `?{}`( T *, zero_t ); T `?+?`( T, T ); } ) // constraint type, 0 and +
T sum( T a[$\,$], size_t size ) {
	`T` total = { `0` };	$\C{// initialize by 0 constructor}$
	for ( size_t i = 0; i < size; i += 1 )
		total = total `+` a[i];	$\C{// select appropriate +}$
	return total;
}
S sa[5];
S total = sum( sa, 5 );	$\C{// use S's 0 construction and +}$
\end{cfa}

\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
\begin{cfa}
trait `sumable`( otype T ) {
	void `?{}`( T &, zero_t );	$\C{// 0 literal constructor}$
	T `?+?`( T, T );	$\C{// assortment of additions}$
	T ?+=?( T &, T );
	T ++?( T & );
	T ?++( T & );
};
forall( otype T `| sumable( T )` )	$\C{// use trait}$
T sum( T a[$\,$], size_t size );
\end{cfa}
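A user type satisfies the trait by supplying the assertions.
For the structure @S@, only the 0 constructor is missing; with it and the operator sketches above, the earlier @sum@ call on an array of @S@ type-checks (a minimal sketch):
\begin{cfa}
void ?{}( S & s, zero_t ) { s.i = 0; s.j = 0; }	$\C{// 0 literal constructor for S}$
\end{cfa}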
Assertions can be @otype@ or @dtype@.
@otype@ refers to a ``complete'' object, \ie an object has a size, default constructor, copy constructor, destructor and an assignment operator.
@dtype@ only guarantees an object has a size and alignment.

Using the return type for discrimination, it is possible to write a type-safe @alloc@ based on the C @malloc@:
\begin{cfa}
forall( dtype T | sized(T) ) T * alloc( void ) { return (T *)malloc( sizeof(T) ); }
int * ip = alloc();	$\C{// select type and size from left-hand side}$
double * dp = alloc();
struct S {...} * sp = alloc();
\end{cfa}
where the return type supplies the type/size of the allocation, which is impossible in most type systems.
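Because @otype@ implies copy construction and assignment, common utility routines need no explicit assertions; a minimal sketch:
\begin{cfa}
forall( otype T ) void swap( T & a, T & b ) {
	T tmp = a;	$\C{// copy construction from otype}$
	a = b; b = tmp;	$\C{// assignment from otype}$
}
int x = 1, y = 2;
swap( x, y );	$\C{// T deduced as int}$
\end{cfa}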

\subsection{Constructors / Destructors}

Object lifetime is a challenge in non-managed programming languages.
\CFA responds with \CC-like constructors and destructors:
\begin{cfa}
struct VLA { int len, * data; };	$\C{// variable length array of integers}$
void ?{}( VLA & vla ) with ( vla ) { len = 10; data = alloc( len ); }	$\C{// default constructor}$
void ?{}( VLA & vla, int size, char fill ) with ( vla ) { len = size; data = alloc( len, fill ); } // initialization
void ?{}( VLA & vla, VLA other ) { vla.len = other.len; vla.data = other.data; }	$\C{// copy, shallow}$
void ^?{}( VLA & vla ) with ( vla ) { free( data ); }	$\C{// destructor}$
{
	VLA x, y = { 20, 0x01 }, z = y;	$\C{// z points to y}$
	// x{}; y{ 20, 0x01 }; z{ z, y };
	^x{};	$\C{// deallocate x}$
	x{};	$\C{// reallocate x}$
	z{ 5, 0xff };	$\C{// reallocate z, not pointing to y}$
	^y{};	$\C{// deallocate y}$
	y{ x };	$\C{// reallocate y, points to x}$
	x{};	$\C{// reallocate x, not pointing to y}$
	// ^z{}; ^y{}; ^x{};
}
\end{cfa}
Like \CC, construction is implicit on allocation (stack/heap) and destruction is implicit on deallocation.
An object and all its fields are constructed/destructed.
\CFA also provides @new@ and @delete@, which behave like @malloc@ and @free@, in addition to constructing and destructing objects:
\begin{cfa}
{	struct S s = {10};	$\C{// allocation, call constructor}$
	...
}	$\C{// deallocation, call destructor}$
struct S * s = new();	$\C{// allocation, call constructor}$
...
delete( s );	$\C{// deallocation, call destructor}$
\end{cfa}
\CFA concurrency uses object lifetime as a means of synchronization and/or mutual exclusion.
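Because the constructor/destructor calls implicitly bracket an object's scope, lifetime alone can express bracketed behaviour; a minimal sketch of the idiom:
\begin{cfa}
struct Guard {};
void ?{}( Guard & g ) { sout | "enter" | endl; }	$\C{// implicit at allocation}$
void ^?{}( Guard & g ) { sout | "exit" | endl; }	$\C{// implicit at deallocation}$
{
	Guard g;	$\C{// prints "enter"}$
	...	$\C{// work bracketed by the guard}$
}	$\C{// prints "exit"}$
\end{cfa}
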

\section{Concurrency Basics}\label{basics}

At its core, concurrency is based on multiple call-stacks and scheduling threads executing on these stacks.
Multiple call stacks (or contexts) and a single thread of execution, called \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}.
In coroutining, the single thread is self-scheduling across the stacks, so execution is deterministic, \ie given fixed inputs, the execution path to the outputs is fixed and predictable.
A \newterm{stackless} coroutine executes on the caller's stack~\cite{Python} but this approach is restrictive, \eg preventing modularization and supporting only iterator/generator-style programming;
a \newterm{stackfull} coroutine executes on its own stack, allowing full generality.
Only stackfull coroutines are a stepping-stone to concurrency.

The transition to concurrency, even for execution with a single thread and multiple stacks, occurs when coroutines also context switch to a scheduling oracle, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}.
Therefore, a minimal concurrency system is possible using coroutines (see Section~\ref{coroutine}) in conjunction with a scheduler to decide where to context switch next.
The resulting execution system now follows a cooperative threading-model, called \newterm{non-preemptive scheduling}.

Because the scheduler is special, it can either be a stackless or stackfull coroutine.
For stackless, the scheduler performs scheduling on the stack of the current coroutine and switches directly to the next coroutine, so there is one context switch.
For stackfull, the current coroutine switches to the scheduler, which performs scheduling, and it then switches to the next coroutine, so there are two context switches.
A stackfull scheduler is often used for simplicity and security, even though there is a slightly higher runtime-cost.

Regardless of the approach used, a subset of concurrency related challenges start to appear.
For the complete set of concurrency challenges to occur, the missing feature is \newterm{preemption}, where context switching occurs randomly between any two instructions, often based on a timer interrupt, called \newterm{preemptive scheduling}.
While a scheduler introduces uncertainty in the order of execution, preemption introduces uncertainty about where context switches occur.
Interestingly, uncertainty is necessary for the runtime (operating) system to give the illusion of parallelism on a single processor and increase performance on multiple processors.
The reason is that only the runtime has complete knowledge about resources and how best to utilize them.
However, the introduction of unrestricted non-determinism results in the need for \newterm{mutual exclusion} and \newterm{synchronization} to restrict non-determinism for correctness;
otherwise, it is impossible to write meaningful programs.
Optimal performance in concurrent applications is often obtained by having as much non-determinism as correctness allows.
…

\subsection{\protect\CFA's Thread Building Blocks}

An important missing feature in C is threading\footnote{While the C11 standard defines a ``threads.h'' header, it is minimal and defined as optional.
As such, library support for threading is far from widespread.
At the time of writing the paper, neither \protect\lstinline|gcc| nor \protect\lstinline|clang| support ``threads.h'' in their standard libraries.}.
On modern architectures, a lack of threading is unacceptable~\cite{Sutter05,Sutter05b}, and therefore existing and new programming languages must have tools for writing efficient concurrent programs to take advantage of parallelism.
As an extension of C, \CFA needs to express these concepts in a way that is as natural as possible to programmers familiar with imperative languages.
Furthermore, because C is a system-level language, programmers expect to choose precisely which features they need and which cost they are willing to pay.
Hence, concurrent programs should be written using high-level mechanisms, and only step down to lower-level mechanisms when performance bottlenecks are encountered.

\subsection{Coroutines: A Stepping Stone}\label{coroutine}

While the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system.
Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed.
Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension.
This capability is accomplished via the coroutine's stack, where suspend/resume context switch among stacks.
Because threading design-challenges are present in coroutines, their design effort is relevant, and this effort can be easily exposed to programmers, giving them a useful new programming paradigm, because a coroutine handles the class of problems that need to retain state between calls, \eg plugins, device drivers, and finite-state machines.
Therefore, the core \CFA coroutine-API has two fundamental features: independent call-stacks and @suspend@/@resume@ operations.

For example, a problem made easier with coroutines is unbounded generators, \eg generating an infinite sequence of Fibonacci numbers, where Figure~\ref{f:C-fibonacci} shows conventional approaches for writing a Fibonacci generator in C.
\begin{displaymath}
\mathsf{fib}(n) = \left \{
\begin{array}{ll}
0	& n = 0	\\
1	& n = 1	\\
\mathsf{fib}(n-1) + \mathsf{fib}(n-2)	& n \ge 2	\\
\end{array}
\right.
\end{displaymath}
Figure~\ref{f:GlobalVariables} illustrates the following problems:
unique unencapsulated global variables necessary to retain state between calls;
only one Fibonacci generator;
execution state must be explicitly retained via explicit state variables.
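The shape of the global-variable version (the full C versions appear in Figure~\ref{f:C-fibonacci}) is approximately:
\begin{cfa}
int f1, f2, state = 1;	$\C{// unencapsulated global variables}$
int fibonacci() {
	int fn;
	switch ( state ) {	$\C{// explicitly retained execution state}$
	  case 1: fn = 0; f1 = fn; state = 2; break;
	  case 2: fn = 1; f2 = f1; f1 = fn; state = 3; break;
	  case 3: fn = f1 + f2; f2 = f1; f1 = fn; break;
	}
	return fn;
}
\end{cfa}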
Figure~\ref{f:ExternalState} addresses these issues:
unencapsulated program global variables become encapsulated structure variables;
unique global variables are replaced by multiple Fibonacci objects;
explicit execution state is removed by precomputing the first two Fibonacci numbers and returning $\mathsf{fib}(n-2)$.

\begin{figure}
…
\begin{lstlisting}[aboveskip=0pt,belowskip=0pt]
`coroutine` Fib { int fn; };
void main( Fib & fib ) with( fib ) {
	int f1, f2;
	fn = 0; f1 = fn; `suspend()`;
…
\begin{lstlisting}[aboveskip=0pt,belowskip=0pt]
`coroutine` Fib { int ret; };
void main( Fib & fib ) with( fib ) {
	int fn, f1 = 1, f2 = 0;
	for ( ;; ) {
…
\end{figure}

Using a coroutine, it is possible to express the Fibonacci formula directly without any of the C problems.
Figure~\ref{f:Coroutine3States} creates a @coroutine@ type:
\begin{cfa}
`coroutine` Fib { int fn; };
\end{cfa}
which provides communication, @fn@, for the \newterm{coroutine main}, @main@, which runs on the coroutine stack, and possibly multiple interface functions, \eg @next@.
Like the structure in Figure~\ref{f:ExternalState}, the coroutine type allows multiple instances, where instances of this type are passed to the (overloaded) coroutine main.
The coroutine main's stack holds the state for the next generation, @f1@ and @f2@, and the code has three suspend points, representing the three states in the Fibonacci formula, to context switch back to the caller's resume.
The interface function, @next@, takes a Fibonacci instance and context switches to it using @resume@;
on return, the Fibonacci field, @fn@, contains the next value in the sequence, which is returned.
The first @resume@ is special because it cocalls the coroutine at its coroutine main and allocates the stack;
when the coroutine main returns, its stack is deallocated.
Hence, @Fib@ is an object at creation, transitions to a coroutine on its first resume, and transitions back to an object when the coroutine main finishes.
Figure~\ref{f:Coroutine1State} shows the coroutine version of the C version in Figure~\ref{f:ExternalState}.
Coroutine generators are called \newterm{output coroutines} because values are returned by the coroutine.
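The interface function described above is a thin wrapper around @resume@ (a sketch):
\begin{cfa}
int next( Fib & fib ) with( fib ) {
	`resume( fib );`	$\C{// context switch to last suspend}$
	return fn;	$\C{// communication field holds the next value}$
}
int main() {
	Fib f1, f2;	$\C{// multiple instances}$
	for ( int i = 1; i <= 10; i += 1 ) {
		sout | next( f1 ) | next( f2 ) | endl;
	}
}
\end{cfa}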

Figure~\ref{f:CFAFmt} shows an \newterm{input coroutine}, @Format@, for restructuring text into groups of character blocks of fixed size.
For example, the input on the left is reformatted into the output on the right.
\begin{quote}
\tt
\begin{tabular}{@{}l|l@{}}
\multicolumn{1}{c|}{\textbf{\textrm{input}}} & \multicolumn{1}{c}{\textbf{\textrm{output}}} \\
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
&
\begin{tabular}[t]{@{}lllll@{}}
abcd & efgh & ijkl & mnop & qrst \\
uvwx & yzab & cdef & ghij & klmn \\
opqr & stuv & wxyz & &
\end{tabular}
\end{tabular}
\end{quote}
The example takes advantage of resuming coroutines in the constructor to prime the coroutine loops, so the first character sent for formatting appears inside the nested loops.
The destructor provides a newline if the formatted text ends with a full line.
Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flattened (linearized) and rechecked on each call because execution location is not retained between calls.

\begin{figure}
\centering
\newbox\myboxA
\begin{lrbox}{\myboxA}
\begin{lstlisting}[aboveskip=0pt,belowskip=0pt]
`coroutine` Format {
	char ch;	// used for communication
	int g, b;	// global because used in destructor
};
void main( Format & fmt ) with( fmt ) {
	for ( ;; ) {
		for ( g = 0; g < 5; g += 1 ) {	// group
			for ( b = 0; b < 4; b += 1 ) {	// block
				`suspend();`
				sout | ch;	// print character
			}
			sout | " ";	// block separator
		}
		sout | endl;	// group separator
	}
}
void ?{}( Format & fmt ) { `resume( fmt );` }
void ^?{}( Format & fmt ) with( fmt ) {
	if ( g != 0 || b != 0 ) sout | endl;
}
void format( Format & fmt ) {
	`resume( fmt );`
}
int main() {
	Format fmt;
	eof: for ( ;; ) {
		sin | fmt.ch;
	  if ( eof( sin ) ) break eof;
		format( fmt );
	}
}
\end{lstlisting}
\end{lrbox}

\newbox\myboxB
\begin{lrbox}{\myboxB}
\begin{lstlisting}[aboveskip=0pt,belowskip=0pt]
struct Format {
	char ch;
	int g, b;
};
void format( struct Format * fmt ) {
	if ( fmt->ch != -1 ) {	// not EOF
		printf( "%c", fmt->ch );
		fmt->b += 1;
		if ( fmt->b == 4 ) {	// block
			printf( " " );	// block separator
			fmt->b = 0;
			fmt->g += 1;
		}
		if ( fmt->g == 5 ) {	// group
			printf( "\n" );	// group separator
			fmt->g = 0;
		}
	} else {
		if ( fmt->g != 0 || fmt->b != 0 ) printf( "\n" );
	}
}
int main() {
	struct Format fmt = { 0, 0, 0 };
	for ( ;; ) {
		scanf( "%c", &fmt.ch );
	  if ( feof( stdin ) ) break;
		format( &fmt );
	}
	fmt.ch = -1;
	format( &fmt );
}
\end{lstlisting}
\end{lrbox}
\subfloat[\CFA Coroutine]{\label{f:CFAFmt}\usebox\myboxA}
\qquad
\subfloat[C Linearized]{\label{f:CFmt}\usebox\myboxB}
\caption{Formatting text into lines of 5 blocks of 4 characters.}
\label{f:fmt-line}
\end{figure}

The previous examples are \newterm{asymmetric (semi) coroutine}s because one coroutine always calls a resuming function for another coroutine, and the resumed coroutine always suspends back to its last resumer, similar to call/return for normal functions.
However, there is no stack growth because @resume@/@suspend@ context switch to existing stack frames rather than create new ones.
\newterm{Symmetric (full) coroutine}s have a coroutine call a resuming function for another coroutine, which eventually forms a cycle.
(The trivial cycle is a coroutine resuming itself.)
This control flow is similar to recursion for normal routines, but again there is no stack growth from the context switch.

\begin{figure}
\centering
\lstset{language=CFA,escapechar={},moredelim=**[is][\protect\color{red}]{`}{`}}% allow $
\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
\begin{cfa}
…
	Prod prod;
	Cons cons = { prod };
	start( prod, 5, cons );
}
…
	`resume( cons );`
}
\end{cfa}
\end{tabular}
…
\label{f:ProdCons}
\end{figure}

Figure~\ref{f:ProdCons} shows a producer/consumer symmetric-coroutine performing bi-directional communication.
Since the solution involves a full-coroutining cycle, the program main creates one coroutine in isolation, passes this coroutine to its partner, and closes the cycle at the call to @start@.
The @start@ function communicates both the number of elements to be produced and the consumer into the producer's coroutine structure.
Then the @resume@ to @prod@ creates @prod@'s stack with a frame for @prod@'s coroutine main at the top, and context switches to it.
@prod@'s coroutine main starts, creates local variables that are retained between coroutine activations, and executes $N$ iterations, each generating two random values, calling the consumer to deliver the values, and printing the status returned from the consumer.

The producer call to @delivery@ transfers values into the consumer's communication variables, resumes the consumer, and returns the consumer status.
For the first resume, @cons@'s stack is initialized, creating local variables retained between subsequent activations of the coroutine.
The consumer iterates until the @done@ flag is set, prints the delivered values, increments the status, and calls back to the producer's @payment@ member, and on return prints the receipt from the producer and increments the money for the next payment.
The call from the consumer to the producer's @payment@ member introduces the cycle between producer and consumer.
When @payment@ is called, the consumer copies values into the producer's communication variable and a resume is executed.
The context switch restarts the producer at the point where it was last context switched, and it continues in member @delivery@ after the resume.

The @delivery@ member returns the status value to @prod@'s @main@ member, where the status is printed.
The loop then repeats calling @delivery@, where each call resumes the consumer coroutine.
The context switch to the consumer continues in @payment@.
The consumer increments and returns the receipt to the call in @cons@'s @main@ member.
The loop then repeats calling @payment@, where each call resumes the producer coroutine.

After iterating $N$ times, the producer calls @stop@.
The @done@ flag is set to stop the consumer's execution and a resume is executed.
The context switch restarts @cons@ in @payment@ and it returns with the last receipt.
The consumer terminates its loops because @done@ is true, its @main@ terminates, so @cons@ transitions from a coroutine back to an object, and @prod@ reactivates after the resume in @stop@.
The @stop@ member returns and @prod@'s @main@ member terminates.
The program main restarts after the resume in @start@.
The @start@ member returns and the program main terminates.
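The two interface members at the heart of the cycle reduce to the following pattern; a sketch with illustrative field names:
\begin{cfa}
int delivery( Cons & cons, int p1, int p2 ) {
	cons.p1 = p1; cons.p2 = p2;	$\C{// transfer into consumer's communication variables}$
	`resume( cons );`	$\C{// restart consumer; it calls back into payment}$
	return cons.status;	$\C{// status set by the consumer}$
}
int payment( Prod & prod, int money ) {
	prod.money = money;	$\C{// transfer into producer's communication variable}$
	`resume( prod );`	$\C{// restart producer inside delivery}$
	return prod.receipt;
}
\end{cfa}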
773 959 774 960 … … 3415 3601 \bibliography{pl,local} 3416 3602 3603 3417 3604 \end{document} 3418 3605