Changeset 466b1c9 for doc/papers/concurrency
- Timestamp:
- Mar 18, 2019, 2:32:26 PM (7 years ago)
- Branches:
- ADT, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children:
- 78e109f
- Parents:
- 76d3ca6 (diff), a927662 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/concurrency/Paper.tex
r76d3ca6 r466b1c9 215 215 {} 216 216 \lstnewenvironment{Go}[1][] 217 {\lstset{#1}} 217 {\lstset{language=go,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}} 218 {} 219 \lstnewenvironment{python}[1][] 220 {\lstset{language=python,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}} 218 221 {} 219 222 … … 228 231 } 229 232 230 \title{\texorpdfstring{Advanced Control-flow in \protect\CFA}{Advanced Control-flow in Cforall}}233 \title{\texorpdfstring{Advanced Control-flow and Concurrency in \protect\CFA}{Advanced Control-flow in Cforall}} 231 234 232 235 \author[1]{Thierry Delisle} … … 242 245 \abstract[Summary]{ 243 246 \CFA is a modern, polymorphic, non-object-oriented, backwards-compatible extension of the C programming language. 244 This paper discusses the advanced control-flow features in \CFA, which include concurrency and parallelism, and its supporting runtime system.245 These features are created from scratch as ISO C's concurrency is low-level and unimplemented, so C programmers continue to rely on the C pthreads library.246 \CFA provides high-level control-flow mechanisms, like coroutines and user-level threads, and monitors for mutual exclusion and synchronization.247 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with allmonitor synchronization mechanisms.248 All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features.247 This paper discusses some advanced control-flow and concurrency/parallelism features in \CFA, along with the supporting runtime. 248 These features are created from scratch because they do not exist in ISO C, or are low-level and/or unimplemented, so C programmers continue to rely on library features, like C pthreads. 249 \CFA introduces language-level control-flow mechanisms, like coroutines, user-level threading, and monitors for mutual exclusion and synchronization. 250 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with monitor synchronization mechanisms. 251 These features also integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers. 249 252 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages. 250 253 }% … … 261 264 \section{Introduction} 262 265 263 This paper discusses the design of advanced, high-level control-flow extensions (especially concurrency and parallelism)in \CFA and its runtime.266 This paper discusses the design of language-level control-flow and concurrency/parallelism extensions in \CFA and its runtime. 264 267 \CFA is a modern, polymorphic, non-object-oriented\footnote{ 265 268 \CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance. 266 269 However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.}, 267 270 backwards-compatible extension of the C programming language~\cite{Moss18}. 268 Within the \CFA framework, new control-flow features were created from scratch. 269 ISO \Celeven defines only a subset of the \CFA extensions, and with respect to concurrency~\cite[\S~7.26]{C11}, the features are largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}. 270 Furthermore, \Celeven and pthreads concurrency is basic, based on thread fork/join in a function and a few locks, which is low-level and error prone; 271 no high-level language concurrency features exist. 272 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C concurrency approach. 273 Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests the threading model is kernel-level threading (1:1)~\cite{ThreadModel}. 271 Within the \CFA framework, new control-flow features are created from scratch. 272 ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}. 273 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}. 274 Furthermore, \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone; 275 no high-level language concurrency features are defined. 276 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach. 277 Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}. 274 278 275 279 In contrast, there has been a renewed interest during the past decade in user-level (M:N, green) threading in old and new programming languages. … … 284 288 285 289 A further effort over the past decade is the development of language memory-models to deal with the conflict between certain language features and compiler/hardware optimizations. 286 This issue can be rephrased as some features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.290 This issue can be rephrased as: some language features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}. 287 291 The consequence is that a language must be cognizant of these features and provide sufficient tools to program around any safety issues. 288 292 For example, C created the @volatile@ qualifier to provide correct execution for @setjmp@/@logjmp@ (concurrency came later). 289 The simplestsolution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}.293 The common solution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}. 290 294 291 295 While having a sufficient memory-model allows sound libraries to be constructed, writing these libraries can quickly become awkward and error prone, and using these low-level libraries has the same issues. 292 296 Essentially, using low-level explicit locks is the concurrent equivalent of assembler programming. 293 Just as most assembler programming is replaced with programming in a high-level language, explicit locks can be replaced with high-level concurrency constructsin a programming language.294 The goal is to getthe compiler to check for correct usage and follow any complex coding conventions implicitly.297 Just as most assembler programming is replaced with high-level programming, explicit locks can be replaced with high-level concurrency in a programming language. 298 Then the goal is for the compiler to check for correct usage and follow any complex coding conventions implicitly. 295 299 The drawback is that language constructs may preclude certain specialized techniques, therefore introducing inefficiency or inhibiting concurrency. 296 300 For most concurrent programs, these drawbacks are insignificant in comparison to the speed of composition, and subsequent reliability and maintainability of the high-level concurrent program. … … 299 303 As stated, this observation applies to non-concurrent forms of complex control-flow, like exception handling and coroutines. 300 304 301 Adapting the programming language allows matching the control-flow model with the programming-language style, versus adapting toone general (sound) library/paradigm.305 Adapting the programming language to these features also allows matching the control-flow model with the programming-language style, versus adopting one general (sound) library/paradigm. 302 306 For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}. 303 307 It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing. … … 307 311 however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking thread objects.} 308 312 Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem. 309 310 \CFA embraces language extensions and user-level threading to provide advanced control-flow and concurrency. 311 We attempt to show the \CFA extensions and runtime are demonstrably better than those proposed for \CC and other concurrent, imperative programming languages. 312 The contributions of this work are: 313 \CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{ 314 \CFA exception handling will be presented in a separate paper. 315 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++} 316 } and coroutines) and concurrency. 317 318 Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory. 319 As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten. 320 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (e.g., objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC. 321 Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base. 322 \CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems. 323 324 We present comparative examples so the reader can judge if the \CFA control-flow extensions are equivalent or better than those in or proposed for \Celeven, \CC and other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms. 325 The detailed contributions of this work are: 313 326 \begin{itemize} 314 327 \item … … 615 628 616 629 617 \section{Coroutines: A Stepping Stone}\label{coroutine}618 619 Advanced controlWhile the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves). 630 \section{Coroutines: Stepping Stone} 631 \label{coroutine} 632 620 633 Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed. 621 634 Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension. … … 643 656 \begin{lrbox}{\myboxA} 644 657 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 645 `int f 1, f2, state = 1;` // single global variables658 `int fn1, fn2, state = 1;` // single global variables 646 659 int fib() { 647 660 int fn; 648 661 `switch ( state )` { // explicit execution state 649 case 1: fn = 0; f 1 = fn; state = 2; break;650 case 2: fn = 1; f 2 = f1; f1 = fn; state = 3; break;651 case 3: fn = f 1 + f2; f2 = f1; f1 = fn; break;662 case 1: fn = 0; fn1 = fn; state = 2; break; 663 case 2: fn = 1; fn2 = fn1; fn1 = fn; state = 3; break; 664 case 3: fn = fn1 + fn2; fn2 = fn1; fn1 = fn; break; 652 665 } 653 666 return fn; … … 666 679 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 667 680 #define FIB_INIT `{ 0, 1 }` 668 typedef struct { int f 2, f1; } Fib;681 typedef struct { int fn2, fn1; } Fib; 669 682 int fib( Fib * f ) { 670 683 671 int ret = f->f 2;672 int fn = f->f 1 + f->f2;673 f->f 2 = f->f1; f->f1 = fn;684 int ret = f->fn2; 685 int fn = f->fn1 + f->fn2; 686 f->fn2 = f->fn1; f->fn1 = fn; 674 687 675 688 return ret; … … 678 691 Fib f1 = FIB_INIT, f2 = FIB_INIT; 679 692 for ( int i = 0; i < 10; i += 1 ) { 680 printf( "%d %d\n", fib( &f 1 ), fib( &f2 ) );693 printf( "%d %d\n", fib( &fn1 ), fib( &f2 ) ); 681 694 } 682 695 } … … 684 697 \end{lrbox} 685 698 686 \subfloat[3 States: global variables]{\label{f:GlobalVariables}\usebox\myboxA}699 \subfloat[3 states: global variables]{\label{f:GlobalVariables}\usebox\myboxA} 687 700 \qquad 688 \subfloat[1 State: externalvariables]{\label{f:ExternalState}\usebox\myboxB}689 \caption{C Fibonacci Implementations}701 \subfloat[1 state: encapsulated variables]{\label{f:ExternalState}\usebox\myboxB} 702 \caption{C fibonacci fsm} 690 703 \label{f:C-fibonacci} 691 704 … … 697 710 `coroutine` Fib { int fn; }; 698 711 void main( Fib & fib ) with( fib ) { 699 int f1, f2; 700 fn = 0; f1 = fn; `suspend()`; 701 fn = 1; f2 = f1; f1 = fn; `suspend()`; 702 for ( ;; ) { 703 fn = f1 + f2; f2 = f1; f1 = fn; `suspend()`; 704 } 705 } 706 int next( Fib & fib ) with( fib ) { 707 `resume( fib );` 708 return fn; 709 } 712 fn = 0; int fn1 = fn; `suspend()`; 713 fn = 1; int fn2 = fn1; fn1 = fn; `suspend()`; 714 for () { 715 fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `suspend()`; } 716 } 717 int next( Fib & fib ) with( fib ) { `resume( fib );` return fn; } 710 718 int main() { 711 719 Fib f1, f2; 712 for ( int i = 1; i <= 10; i += 1 ) {720 for ( 10 ) 713 721 sout | next( f1 ) | next( f2 ); 714 }715 722 } 716 723 \end{cfa} … … 718 725 \newbox\myboxB 719 726 \begin{lrbox}{\myboxB} 720 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 721 `coroutine` Fib { int ret; }; 722 void main( Fib & f ) with( fib ) { 723 int fn, f1 = 1, f2 = 0; 724 for ( ;; ) { 725 ret = f2; 726 727 fn = f1 + f2; f2 = f1; f1 = fn; `suspend();` 728 } 729 } 730 int next( Fib & fib ) with( fib ) { 731 `resume( fib );` 732 return ret; 733 } 734 735 736 737 738 739 740 \end{cfa} 727 \begin{python}[aboveskip=0pt,belowskip=0pt] 728 729 def Fibonacci(): 730 fn = 0; fn1 = fn; `yield fn` # suspend 731 fn = 1; fn2 = fn1; fn1 = fn; `yield fn` 732 while True: 733 fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `yield fn` 734 735 736 f1 = Fibonacci() 737 f2 = Fibonacci() 738 for i in range( 10 ): 739 print( `next( f1 )`, `next( f2 )` ) # resume 740 741 \end{python} 741 742 \end{lrbox} 742 \subfloat[ 3 States, internal variables]{\label{f:Coroutine3States}\usebox\myboxA}743 \qquad \qquad744 \subfloat[ 1 State, internal variables]{\label{f:Coroutine1State}\usebox\myboxB}745 \caption{ \CFA Coroutine Fibonacci Implementations}743 \subfloat[\CFA]{\label{f:Coroutine3States}\usebox\myboxA} 744 \qquad 745 \subfloat[Python]{\label{f:Coroutine1State}\usebox\myboxB} 746 \caption{Fibonacci input coroutine, 3 states, internal variables} 746 747 \label{f:cfa-fibonacci} 747 748 \end{figure} … … 784 785 \begin{lrbox}{\myboxA} 785 786 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 786 `coroutine` F ormat {787 char ch; // used for communication788 int g, b; // global because used in destructor787 `coroutine` Fmt { 788 char ch; // communication variables 789 int g, b; // needed in destructor 789 790 }; 790 void main( F ormat & fmt ) with( fmt ) {791 for ( ;;) {792 for ( g = 0; g < 5; g += 1 ) { // group793 for ( b = 0; b < 4; b += 1 ) { // block 791 void main( Fmt & fmt ) with( fmt ) { 792 for () { 793 for ( g = 0; g < 5; g += 1 ) { // groups 794 for ( b = 0; b < 4; b += 1 ) { // blocks 794 795 `suspend();` 795 sout | ch; // separator 796 } 797 sout | " "; // separator 798 } 799 sout | nl; 800 } 801 } 802 void ?{}( Format & fmt ) { `resume( fmt );` } 803 void ^?{}( Format & fmt ) with( fmt ) { 804 if ( g != 0 || b != 0 ) sout | nl; 805 } 806 void format( Format & fmt ) { 807 `resume( fmt );` 808 } 796 sout | ch; } // print character 797 sout | " "; } // block separator 798 sout | nl; } // group separator 799 } 800 void ?{}( Fmt & fmt ) { `resume( fmt );` } // prime 801 void ^?{}( Fmt & fmt ) with( fmt ) { // destructor 802 if ( g != 0 || b != 0 ) // special case 803 sout | nl; } 804 void send( Fmt & fmt, char c ) { fmt.ch = c; `resume( fmt )`; } 809 805 int main() { 810 Format fmt; 811 eof: for ( ;; ) { 812 sin | fmt.ch; 813 if ( eof( sin ) ) break eof; 814 format( fmt ); 815 } 806 Fmt fmt; 807 sout | nlOff; // turn off auto newline 808 for ( 41 ) 809 send( fmt, 'a' ); 816 810 } 817 811 \end{cfa} … … 820 814 \newbox\myboxB 821 815 \begin{lrbox}{\myboxB} 822 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 823 struct Format { 824 char ch; 825 int g, b; 826 }; 827 void format( struct Format * fmt ) { 828 if ( fmt->ch != -1 ) { // not EOF ? 829 printf( "%c", fmt->ch ); 830 fmt->b += 1; 831 if ( fmt->b == 4 ) { // block 832 printf( " " ); // separator 833 fmt->b = 0; 834 fmt->g += 1; 835 } 836 if ( fmt->g == 5 ) { // group 837 printf( "\n" ); // separator 838 fmt->g = 0; 839 } 840 } else { 841 if ( fmt->g != 0 || fmt->b != 0 ) printf( "\n" ); 842 } 843 } 844 int main() { 845 struct Format fmt = { 0, 0, 0 }; 846 for ( ;; ) { 847 scanf( "%c", &fmt.ch ); 848 if ( feof( stdin ) ) break; 849 format( &fmt ); 850 } 851 fmt.ch = -1; 852 format( &fmt ); 853 } 854 \end{cfa} 816 \begin{python}[aboveskip=0pt,belowskip=0pt] 817 818 819 820 def Fmt(): 821 try: 822 while True: 823 for g in range( 5 ): 824 for b in range( 4 ): 825 826 print( `(yield)`, end='' ) 827 print( ' ', end='' ) 828 print() 829 830 831 except GeneratorExit: 832 if g != 0 | b != 0: 833 print() 834 835 836 fmt = Fmt() 837 `next( fmt )` # prime 838 for i in range( 41 ): 839 `fmt.send( 'a' );` # send to yield 840 841 \end{python} 855 842 \end{lrbox} 856 \subfloat[\CFA Coroutine]{\label{f:CFAFmt}\usebox\myboxA}843 \subfloat[\CFA]{\label{f:CFAFmt}\usebox\myboxA} 857 844 \qquad 858 \subfloat[ C Linearized]{\label{f:CFmt}\usebox\myboxB}859 \caption{ Formatting text into lines of 5 blocks of 4 characters.}845 \subfloat[Python]{\label{f:CFmt}\usebox\myboxB} 846 \caption{Output formatting text} 860 847 \label{f:fmt-line} 861 848 \end{figure} … … 878 865 void main( Prod & prod ) with( prod ) { 879 866 // 1st resume starts here 880 for ( i nt i = 0; i < N; i += 1) {867 for ( i; N ) { 881 868 int p1 = random( 100 ), p2 = random( 100 ); 882 869 sout | p1 | " " | p2; … … 894 881 } 895 882 void start( Prod & prod, int N, Cons &c ) { 896 &prod.c = &c; 883 &prod.c = &c; // reassignable reference 897 884 prod.[N, receipt] = [N, 0]; 898 885 `resume( prod );` … … 909 896 Prod & p; 910 897 int p1, p2, status; 911 _Bool done;898 bool done; 912 899 }; 913 900 void ?{}( Cons & cons, Prod & p ) { 914 &cons.p = &p; 901 &cons.p = &p; // reassignable reference 915 902 cons.[status, done ] = [0, false]; 916 903 } … … 969 956 The program main restarts after the resume in @start@. 970 957 @start@ returns and the program main terminates. 958 959 One \emph{killer} application for a coroutine is device drivers, which at one time caused 70\%-85\% of failures in Windows/Linux~\cite{Swift05}. 960 Many device drivers are a finite state-machine parsing a protocol, e.g.: 961 \begin{tabbing} 962 \ldots STX \= \ldots message \ldots \= ESC \= ETX \= \ldots message \ldots \= ETX \= 2-byte crc \= \ldots \kill 963 \ldots STX \> \ldots message \ldots \> ESC \> ETX \> \ldots message \ldots \> ETX \> 2-byte crc \> \ldots 964 \end{tabbing} 965 where a network message begins with the control character STX and ends with an ETX, followed by a 2-byte cyclic-redundancy check. 966 Control characters may appear in a message if preceded by an ESC. 967 Because FSMs can be complex and occur frequently in important domains, direct support of the coroutine is crucial in a systems programminglanguage. 968 969 \begin{figure} 970 \begin{cfa} 971 enum Status { CONT, MSG, ESTX, ELNTH, ECRC }; 972 `coroutine` Driver { 973 Status status; 974 char * msg, byte; 975 }; 976 void ?{}( Driver & d, char * m ) { d.msg = m; } $\C[3.0in]{// constructor}$ 977 Status next( Driver & d, char b ) with( d ) { $\C{// 'with' opens scope}$ 978 byte = b; `resume( d );` return status; 979 } 980 void main( Driver & d ) with( d ) { 981 enum { STX = '\002', ESC = '\033', ETX = '\003', MaxMsg = 64 }; 982 unsigned short int crc; $\C{// error checking}$ 983 msg: for () { $\C{// parse message}$ 984 status = CONT; 985 unsigned int lnth = 0, sum = 0; 986 while ( byte != STX ) `suspend();` 987 emsg: for () { 988 `suspend();` $\C{// process byte}$ 989 choose ( byte ) { $\C{// switch with default break}$ 990 case STX: 991 status = ESTX; `suspend();` continue msg; 992 case ETX: 993 break emsg; 994 case ESC: 995 suspend(); 996 } // choose 997 if ( lnth >= MaxMsg ) { $\C{// buffer full ?}$ 998 status = ELNTH; `suspend();` continue msg; } 999 msg[lnth++] = byte; 1000 sum += byte; 1001 } // for 1002 msg[lnth] = '\0'; $\C{// terminate string}\CRT$ 1003 `suspend();` 1004 crc = (unsigned char)byte << 8; // prevent sign extension for signed char 1005 `suspend();` 1006 status = (crc | (unsigned char)byte) == sum ? MSG : ECRC; 1007 `suspend();` 1008 } // for 1009 } 1010 \end{cfa} 1011 \caption{Device driver for simple communication protocol} 1012 \end{figure} 971 1013 972 1014
Note:
See TracChangeset
for help on using the changeset viewer.