Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/papers/concurrency/Paper.tex

    ra927662 rca0f061f  
    215215{}
    216216\lstnewenvironment{Go}[1][]
    217 {\lstset{language=go,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}}
    218 {}
    219 \lstnewenvironment{python}[1][]
    220 {\lstset{language=python,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}}
     217{\lstset{#1}}
    221218{}
    222219
     
    231228}
    232229
    233 \title{\texorpdfstring{Advanced Control-flow and Concurrency in \protect\CFA}{Advanced Control-flow in Cforall}}
     230\title{\texorpdfstring{Advanced Control-flow in \protect\CFA}{Advanced Control-flow in Cforall}}
    234231
    235232\author[1]{Thierry Delisle}
     
    245242\abstract[Summary]{
    246243\CFA is a modern, polymorphic, non-object-oriented, backwards-compatible extension of the C programming language.
    247 This paper discusses some advanced control-flow and concurrency/parallelism features in \CFA, along with the supporting runtime.
    248 These features are created from scratch because they do not exist in ISO C, or are low-level and/or unimplemented, so C programmers continue to rely on library features, like C pthreads.
    249 \CFA introduces language-level control-flow mechanisms, like coroutines, user-level threading, and monitors for mutual exclusion and synchronization.
    250 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with monitor synchronization mechanisms.
    251 These features also integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers.
     244This paper discusses the advanced control-flow features in \CFA, which include concurrency and parallelism, and its supporting runtime system.
     245These features are created from scratch as ISO C's concurrency is low-level and unimplemented, so C programmers continue to rely on the C pthreads library.
     246\CFA provides high-level control-flow mechanisms, like coroutines and user-level threads, and monitors for mutual exclusion and synchronization.
     247A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with all monitor synchronization mechanisms.
     248All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features.
    252249Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
    253250}%
     
    264261\section{Introduction}
    265262
    266 This paper discusses the design of language-level control-flow and concurrency/parallelism extensions in \CFA and its runtime.
     263This paper discusses the design of advanced, high-level control-flow extensions (especially concurrency and parallelism) in \CFA and its runtime.
    267264\CFA is a modern, polymorphic, non-object-oriented\footnote{
    268265\CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance.
    269266However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.},
    270267backwards-compatible extension of the C programming language~\cite{Moss18}.
    271 Within the \CFA framework, new control-flow features are created from scratch.
    272 ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}.
    273 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
    274 Furthermore, \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone;
    275 no high-level language concurrency features are defined.
    276 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach.
    277 Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}.
     268Within the \CFA framework, new control-flow features were created from scratch.
     269ISO \Celeven defines only a subset of the \CFA extensions, and with respect to concurrency~\cite[\S~7.26]{C11}, the features are largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
     270Furthermore, \Celeven and pthreads concurrency is basic, based on thread fork/join in a function and a few locks, which is low-level and error prone;
     271no high-level language concurrency features exist.
     272Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C concurrency approach.
     273Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests the threading model is kernel-level threading (1:1)~\cite{ThreadModel}.
    278274
    279275In contrast, there has been a renewed interest during the past decade in user-level (M:N, green) threading in old and new programming languages.
     
    288284
    289285A further effort over the past decade is the development of language memory-models to deal with the conflict between certain language features and compiler/hardware optimizations.
    290 This issue can be rephrased as: some language features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.
     286This issue can be rephrased as some features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.
    291287The consequence is that a language must be cognizant of these features and provide sufficient tools to program around any safety issues.
    292288For example, C created the @volatile@ qualifier to provide correct execution for @setjmp@/@logjmp@ (concurrency came later).
    293 The common solution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}.
     289The simplest solution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}.
    294290
    295291While having a sufficient memory-model allows sound libraries to be constructed, writing these libraries can quickly become awkward and error prone, and using these low-level libraries has the same issues.
    296292Essentially, using low-level explicit locks is the concurrent equivalent of assembler programming.
    297 Just as most assembler programming is replaced with high-level programming, explicit locks can be replaced with high-level concurrency in a programming language.
    298 Then the goal is for the compiler to check for correct usage and follow any complex coding conventions implicitly.
     293Just as most assembler programming is replaced with programming in a high-level language, explicit locks can be replaced with high-level concurrency constructs in a programming language.
     294The goal is to get the compiler to check for correct usage and follow any complex coding conventions implicitly.
    299295The drawback is that language constructs may preclude certain specialized techniques, therefore introducing inefficiency or inhibiting concurrency.
    300296For most concurrent programs, these drawbacks are insignificant in comparison to the speed of composition, and subsequent reliability and maintainability of the high-level concurrent program.
     
    303299As stated, this observation applies to non-concurrent forms of complex control-flow, like exception handling and coroutines.
    304300
    305 Adapting the programming language to these features also allows matching the control-flow model with the programming-language style, versus adopting one general (sound) library/paradigm.
     301Adapting the programming language allows matching the control-flow model with the programming-language style, versus adapting to one general (sound) library/paradigm.
    306302For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}.
    307303It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing.
     
    311307however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking thread objects.}
    312308Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
    313 \CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{
    314 \CFA exception handling will be presented in a separate paper.
    315 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++}
    316 } and coroutines) and concurrency.
    317 
    318 Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory.
    319 As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten.
    320 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (e.g., objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
    321 Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base.
    322 \CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems.
    323 
    324 We present comparative examples so the reader can judge if the \CFA control-flow extensions are equivalent or better than those in or proposed for \Celeven, \CC and other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms.
    325 The detailed contributions of this work are:
     309
     310\CFA embraces language extensions and user-level threading to provide advanced control-flow and concurrency.
     311We attempt to show the \CFA extensions and runtime are demonstrably better than those proposed for \CC and other concurrent, imperative programming languages.
     312The contributions of this work are:
    326313\begin{itemize}
    327314\item
     
    628615
    629616
    630 \section{Coroutines: Stepping Stone}
    631 \label{coroutine}
    632 
     617\section{Coroutines: A Stepping Stone}\label{coroutine}
     618
     619Advanced controlWhile the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves).
    633620Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed.
    634621Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension.
     
    656643\begin{lrbox}{\myboxA}
    657644\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    658 `int fn1, fn2, state = 1;`   // single global variables
     645`int f1, f2, state = 1;`   // single global variables
    659646int fib() {
    660647        int fn;
    661648        `switch ( state )` {  // explicit execution state
    662           case 1: fn = 0;  fn1 = fn;  state = 2;  break;
    663           case 2: fn = 1;  fn2 = fn1;  fn1 = fn;  state = 3;  break;
    664           case 3: fn = fn1 + fn2;  fn2 = fn1;  fn1 = fn;  break;
     649          case 1: fn = 0;  f1 = fn;  state = 2;  break;
     650          case 2: fn = 1;  f2 = f1;  f1 = fn;  state = 3;  break;
     651          case 3: fn = f1 + f2;  f2 = f1;  f1 = fn;  break;
    665652        }
    666653        return fn;
     
    679666\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    680667#define FIB_INIT `{ 0, 1 }`
    681 typedef struct { int fn2, fn1; } Fib;
     668typedef struct { int f2, f1; } Fib;
    682669int fib( Fib * f ) {
    683670
    684         int ret = f->fn2;
    685         int fn = f->fn1 + f->fn2;
    686         f->fn2 = f->fn1; f->fn1 = fn;
     671        int ret = f->f2;
     672        int fn = f->f1 + f->f2;
     673        f->f2 = f->f1; f->f1 = fn;
    687674
    688675        return ret;
     
    691678        Fib f1 = FIB_INIT, f2 = FIB_INIT;
    692679        for ( int i = 0; i < 10; i += 1 ) {
    693                 printf( "%d %d\n", fib( &fn1 ), fib( &f2 ) );
     680                printf( "%d %d\n", fib( &f1 ), fib( &f2 ) );
    694681        }
    695682}
     
    697684\end{lrbox}
    698685
    699 \subfloat[3 states: global variables]{\label{f:GlobalVariables}\usebox\myboxA}
     686\subfloat[3 States: global variables]{\label{f:GlobalVariables}\usebox\myboxA}
    700687\qquad
    701 \subfloat[1 state: encapsulated variables]{\label{f:ExternalState}\usebox\myboxB}
    702 \caption{C fibonacci fsm}
     688\subfloat[1 State: external variables]{\label{f:ExternalState}\usebox\myboxB}
     689\caption{C Fibonacci Implementations}
    703690\label{f:C-fibonacci}
    704691
     
    710697`coroutine` Fib { int fn; };
    711698void main( Fib & fib ) with( fib ) {
    712         fn = 0;  int fn1 = fn; `suspend()`;
    713         fn = 1;  int fn2 = fn1;  fn1 = fn; `suspend()`;
    714         for () {
    715                 fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `suspend()`; }
    716 }
    717 int next( Fib & fib ) with( fib ) { `resume( fib );` return fn; }
     699        int f1, f2;
     700        fn = 0;  f1 = fn;  `suspend()`;
     701        fn = 1;  f2 = f1;  f1 = fn;  `suspend()`;
     702        for ( ;; ) {
     703                fn = f1 + f2;  f2 = f1;  f1 = fn;  `suspend()`;
     704        }
     705}
     706int next( Fib & fib ) with( fib ) {
     707        `resume( fib );`
     708        return fn;
     709}
    718710int main() {
    719711        Fib f1, f2;
    720         for ( 10 )
     712        for ( int i = 1; i <= 10; i += 1 ) {
    721713                sout | next( f1 ) | next( f2 );
     714        }
    722715}
    723716\end{cfa}
     
    725718\newbox\myboxB
    726719\begin{lrbox}{\myboxB}
    727 \begin{python}[aboveskip=0pt,belowskip=0pt]
    728 
    729 def Fibonacci():
    730         fn = 0; fn1 = fn; `yield fn`  # suspend
    731         fn = 1; fn2 = fn1; fn1 = fn; `yield fn`
    732         while True:
    733                 fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `yield fn`
    734 
    735 
    736 f1 = Fibonacci()
    737 f2 = Fibonacci()
    738 for i in range( 10 ):
    739         print( `next( f1 )`, `next( f2 )` ) # resume
    740 
    741 \end{python}
     720\begin{cfa}[aboveskip=0pt,belowskip=0pt]
     721`coroutine` Fib { int ret; };
     722void main( Fib & f ) with( fib ) {
     723        int fn, f1 = 1, f2 = 0;
     724        for ( ;; ) {
     725                ret = f2;
     726
     727                fn = f1 + f2;  f2 = f1;  f1 = fn; `suspend();`
     728        }
     729}
     730int next( Fib & fib ) with( fib ) {
     731        `resume( fib );`
     732        return ret;
     733}
     734
     735
     736
     737
     738
     739
     740\end{cfa}
    742741\end{lrbox}
    743 \subfloat[\CFA]{\label{f:Coroutine3States}\usebox\myboxA}
    744 \qquad
    745 \subfloat[Python]{\label{f:Coroutine1State}\usebox\myboxB}
    746 \caption{Fibonacci input coroutine, 3 states, internal variables}
     742\subfloat[3 States, internal variables]{\label{f:Coroutine3States}\usebox\myboxA}
     743\qquad\qquad
     744\subfloat[1 State, internal variables]{\label{f:Coroutine1State}\usebox\myboxB}
     745\caption{\CFA Coroutine Fibonacci Implementations}
    747746\label{f:cfa-fibonacci}
    748747\end{figure}
     
    785784\begin{lrbox}{\myboxA}
    786785\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    787 `coroutine` Fmt {
    788         char ch;   // communication variables
    789         int g, b;   // needed in destructor
     786`coroutine` Format {
     787        char ch;   // used for communication
     788        int g, b;  // global because used in destructor
    790789};
    791 void main( Fmt & fmt ) with( fmt ) {
    792         for () {
    793                 for ( g = 0; g < 5; g += 1 ) { // groups
    794                         for ( b = 0; b < 4; b += 1 ) { // blocks
     790void main( Format & fmt ) with( fmt ) {
     791        for ( ;; ) {
     792                for ( g = 0; g < 5; g += 1 ) {      // group
     793                        for ( b = 0; b < 4; b += 1 ) { // block
    795794                                `suspend();`
    796                                 sout | ch; } // print character
    797                         sout | "  "; } // block separator
    798                 sout | nl; }  // group separator
    799 }
    800 void ?{}( Fmt & fmt ) { `resume( fmt );` } // prime
    801 void ^?{}( Fmt & fmt ) with( fmt ) { // destructor
    802         if ( g != 0 || b != 0 ) // special case
    803                 sout | nl; }
    804 void send( Fmt & fmt, char c ) { fmt.ch = c; `resume( fmt )`; }
     795                                sout | ch;              // separator
     796                        }
     797                        sout | "  ";               // separator
     798                }
     799                sout | nl;
     800        }
     801}
     802void ?{}( Format & fmt ) { `resume( fmt );` }
     803void ^?{}( Format & fmt ) with( fmt ) {
     804        if ( g != 0 || b != 0 ) sout | nl;
     805}
     806void format( Format & fmt ) {
     807        `resume( fmt );`
     808}
    805809int main() {
    806         Fmt fmt;
    807         sout | nlOff;   // turn off auto newline
    808         for ( 41 )
    809                 send( fmt, 'a' );
     810        Format fmt;
     811        eof: for ( ;; ) {
     812                sin | fmt.ch;
     813          if ( eof( sin ) ) break eof;
     814                format( fmt );
     815        }
    810816}
    811817\end{cfa}
     
    814820\newbox\myboxB
    815821\begin{lrbox}{\myboxB}
    816 \begin{python}[aboveskip=0pt,belowskip=0pt]
    817 
    818 
    819 
    820 def Fmt():
    821         try:
    822                 while True:
    823                         for g in range( 5 ):
    824                                 for b in range( 4 ):
    825 
    826                                         print( `(yield)`, end='' )
    827                                 print( '  ', end='' )
    828                         print()
    829 
    830 
    831         except GeneratorExit:
    832                 if g != 0 | b != 0:
    833                         print()
    834 
    835 
    836 fmt = Fmt()
    837 `next( fmt )`                    # prime
    838 for i in range( 41 ):
    839         `fmt.send( 'a' );`      # send to yield
    840 
    841 \end{python}
     822\begin{cfa}[aboveskip=0pt,belowskip=0pt]
     823struct Format {
     824        char ch;
     825        int g, b;
     826};
     827void format( struct Format * fmt ) {
     828        if ( fmt->ch != -1 ) {      // not EOF ?
     829                printf( "%c", fmt->ch );
     830                fmt->b += 1;
     831                if ( fmt->b == 4 ) {  // block
     832                        printf( "  " );      // separator
     833                        fmt->b = 0;
     834                        fmt->g += 1;
     835                }
     836                if ( fmt->g == 5 ) {  // group
     837                        printf( "\n" );     // separator
     838                        fmt->g = 0;
     839                }
     840        } else {
     841                if ( fmt->g != 0 || fmt->b != 0 ) printf( "\n" );
     842        }
     843}
     844int main() {
     845        struct Format fmt = { 0, 0, 0 };
     846        for ( ;; ) {
     847                scanf( "%c", &fmt.ch );
     848          if ( feof( stdin ) ) break;
     849                format( &fmt );
     850        }
     851        fmt.ch = -1;
     852        format( &fmt );
     853}
     854\end{cfa}
    842855\end{lrbox}
    843 \subfloat[\CFA]{\label{f:CFAFmt}\usebox\myboxA}
     856\subfloat[\CFA Coroutine]{\label{f:CFAFmt}\usebox\myboxA}
    844857\qquad
    845 \subfloat[Python]{\label{f:CFmt}\usebox\myboxB}
    846 \caption{Output formatting text}
     858\subfloat[C Linearized]{\label{f:CFmt}\usebox\myboxB}
     859\caption{Formatting text into lines of 5 blocks of 4 characters.}
    847860\label{f:fmt-line}
    848861\end{figure}
     
    865878void main( Prod & prod ) with( prod ) {
    866879        // 1st resume starts here
    867         for ( i; N ) {
     880        for ( int i = 0; i < N; i += 1 ) {
    868881                int p1 = random( 100 ), p2 = random( 100 );
    869882                sout | p1 | " " | p2;
     
    881894}
    882895void start( Prod & prod, int N, Cons &c ) {
    883         &prod.c = &c; // reassignable reference
     896        &prod.c = &c;
    884897        prod.[N, receipt] = [N, 0];
    885898        `resume( prod );`
     
    896909        Prod & p;
    897910        int p1, p2, status;
    898         bool done;
     911        _Bool done;
    899912};
    900913void ?{}( Cons & cons, Prod & p ) {
    901         &cons.p = &p; // reassignable reference
     914        &cons.p = &p;
    902915        cons.[status, done ] = [0, false];
    903916}
     
    956969The program main restarts after the resume in @start@.
    957970@start@ returns and the program main terminates.
    958 
    959 One \emph{killer} application for a coroutine is device drivers, which at one time caused 70\%-85\% of failures in Windows/Linux~\cite{Swift05}.
    960 Many device drivers are a finite state-machine parsing a protocol, e.g.:
    961 \begin{tabbing}
    962 \ldots STX \= \ldots message \ldots \= ESC \= ETX \= \ldots message \ldots  \= ETX \= 2-byte crc \= \ldots      \kill
    963 \ldots STX \> \ldots message \ldots \> ESC \> ETX \> \ldots message \ldots  \> ETX \> 2-byte crc \> \ldots
    964 \end{tabbing}
    965 where a network message begins with the control character STX and ends with an ETX, followed by a 2-byte cyclic-redundancy check.
    966 Control characters may appear in a message if preceded by an ESC.
    967 Because FSMs can be complex and occur frequently in important domains, direct support of the coroutine is crucial in a systems programminglanguage.
    968 
    969 \begin{figure}
    970 \begin{cfa}
    971 enum Status { CONT, MSG, ESTX, ELNTH, ECRC };
    972 `coroutine` Driver {
    973         Status status;
    974         char * msg, byte;
    975 };
    976 void ?{}( Driver & d, char * m ) { d.msg = m; }         $\C[3.0in]{// constructor}$
    977 Status next( Driver & d, char b ) with( d ) {           $\C{// 'with' opens scope}$
    978         byte = b; `resume( d );` return status;
    979 }
    980 void main( Driver & d ) with( d ) {
    981         enum { STX = '\002', ESC = '\033', ETX = '\003', MaxMsg = 64 };
    982         unsigned short int crc;                                                 $\C{// error checking}$
    983   msg: for () {                                                                         $\C{// parse message}$
    984                 status = CONT;
    985                 unsigned int lnth = 0, sum = 0;
    986                 while ( byte != STX ) `suspend();`
    987           emsg: for () {
    988                         `suspend();`                                                    $\C{// process byte}$
    989                         choose ( byte ) {                                               $\C{// switch with default break}$
    990                           case STX:
    991                                 status = ESTX; `suspend();` continue msg;
    992                           case ETX:
    993                                 break emsg;
    994                           case ESC:
    995                                 suspend();
    996                         } // choose
    997                         if ( lnth >= MaxMsg ) {                                 $\C{// buffer full ?}$
    998                                 status = ELNTH; `suspend();` continue msg; }
    999                         msg[lnth++] = byte;
    1000                         sum += byte;
    1001                 } // for
    1002                 msg[lnth] = '\0';                                                       $\C{// terminate string}\CRT$
    1003                 `suspend();`
    1004                 crc = (unsigned char)byte << 8; // prevent sign extension for signed char
    1005                 `suspend();`
    1006                 status = (crc | (unsigned char)byte) == sum ? MSG : ECRC;
    1007                 `suspend();`
    1008         } // for
    1009 }
    1010 \end{cfa}
    1011 \caption{Device driver for simple communication protocol}
    1012 \end{figure}
    1013971
    1014972
Note: See TracChangeset for help on using the changeset viewer.