- File:
-
- 1 edited
-
doc/papers/concurrency/Paper.tex (modified) (20 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/concurrency/Paper.tex
ra927662 rca0f061f 215 215 {} 216 216 \lstnewenvironment{Go}[1][] 217 {\lstset{language=go,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}} 218 {} 219 \lstnewenvironment{python}[1][] 220 {\lstset{language=python,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}} 217 {\lstset{#1}} 221 218 {} 222 219 … … 231 228 } 232 229 233 \title{\texorpdfstring{Advanced Control-flow and Concurrencyin \protect\CFA}{Advanced Control-flow in Cforall}}230 \title{\texorpdfstring{Advanced Control-flow in \protect\CFA}{Advanced Control-flow in Cforall}} 234 231 235 232 \author[1]{Thierry Delisle} … … 245 242 \abstract[Summary]{ 246 243 \CFA is a modern, polymorphic, non-object-oriented, backwards-compatible extension of the C programming language. 247 This paper discusses some advanced control-flow and concurrency/parallelism features in \CFA, along with the supporting runtime.248 These features are created from scratch because they do not exist in ISO C, or are low-level and/or unimplemented, so C programmers continue to rely on library features, like C pthreads.249 \CFA introduces language-level control-flow mechanisms, like coroutines, user-level threading, and monitors for mutual exclusion and synchronization.250 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with monitor synchronization mechanisms.251 These features also integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers.244 This paper discusses the advanced control-flow features in \CFA, which include concurrency and parallelism, and its supporting runtime system. 245 These features are created from scratch as ISO C's concurrency is low-level and unimplemented, so C programmers continue to rely on the C pthreads library. 246 \CFA provides high-level control-flow mechanisms, like coroutines and user-level threads, and monitors for mutual exclusion and synchronization. 247 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with all monitor synchronization mechanisms. 248 All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features. 252 249 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages. 253 250 }% … … 264 261 \section{Introduction} 265 262 266 This paper discusses the design of language-level control-flow and concurrency/parallelism extensionsin \CFA and its runtime.263 This paper discusses the design of advanced, high-level control-flow extensions (especially concurrency and parallelism) in \CFA and its runtime. 267 264 \CFA is a modern, polymorphic, non-object-oriented\footnote{ 268 265 \CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance. 269 266 However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.}, 270 267 backwards-compatible extension of the C programming language~\cite{Moss18}. 271 Within the \CFA framework, new control-flow features are created from scratch. 272 ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}. 273 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}. 274 Furthermore, \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone; 275 no high-level language concurrency features are defined. 276 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach. 277 Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}. 268 Within the \CFA framework, new control-flow features were created from scratch. 269 ISO \Celeven defines only a subset of the \CFA extensions, and with respect to concurrency~\cite[\S~7.26]{C11}, the features are largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}. 270 Furthermore, \Celeven and pthreads concurrency is basic, based on thread fork/join in a function and a few locks, which is low-level and error prone; 271 no high-level language concurrency features exist. 272 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C concurrency approach. 273 Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests the threading model is kernel-level threading (1:1)~\cite{ThreadModel}. 278 274 279 275 In contrast, there has been a renewed interest during the past decade in user-level (M:N, green) threading in old and new programming languages. … … 288 284 289 285 A further effort over the past decade is the development of language memory-models to deal with the conflict between certain language features and compiler/hardware optimizations. 290 This issue can be rephrased as : some language features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.286 This issue can be rephrased as some features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}. 291 287 The consequence is that a language must be cognizant of these features and provide sufficient tools to program around any safety issues. 292 288 For example, C created the @volatile@ qualifier to provide correct execution for @setjmp@/@logjmp@ (concurrency came later). 293 The commonsolution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}.289 The simplest solution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}. 294 290 295 291 While having a sufficient memory-model allows sound libraries to be constructed, writing these libraries can quickly become awkward and error prone, and using these low-level libraries has the same issues. 296 292 Essentially, using low-level explicit locks is the concurrent equivalent of assembler programming. 297 Just as most assembler programming is replaced with high-level programming, explicit locks can be replaced with high-level concurrencyin a programming language.298 The n the goal is forthe compiler to check for correct usage and follow any complex coding conventions implicitly.293 Just as most assembler programming is replaced with programming in a high-level language, explicit locks can be replaced with high-level concurrency constructs in a programming language. 294 The goal is to get the compiler to check for correct usage and follow any complex coding conventions implicitly. 299 295 The drawback is that language constructs may preclude certain specialized techniques, therefore introducing inefficiency or inhibiting concurrency. 300 296 For most concurrent programs, these drawbacks are insignificant in comparison to the speed of composition, and subsequent reliability and maintainability of the high-level concurrent program. … … 303 299 As stated, this observation applies to non-concurrent forms of complex control-flow, like exception handling and coroutines. 304 300 305 Adapting the programming language to these features also allows matching the control-flow model with the programming-language style, versus adoptingone general (sound) library/paradigm.301 Adapting the programming language allows matching the control-flow model with the programming-language style, versus adapting to one general (sound) library/paradigm. 306 302 For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}. 307 303 It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing. … … 311 307 however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking thread objects.} 312 308 Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem. 313 \CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{ 314 \CFA exception handling will be presented in a separate paper. 315 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++} 316 } and coroutines) and concurrency. 317 318 Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory. 319 As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten. 320 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (e.g., objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC. 321 Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base. 322 \CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems. 323 324 We present comparative examples so the reader can judge if the \CFA control-flow extensions are equivalent or better than those in or proposed for \Celeven, \CC and other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms. 325 The detailed contributions of this work are: 309 310 \CFA embraces language extensions and user-level threading to provide advanced control-flow and concurrency. 311 We attempt to show the \CFA extensions and runtime are demonstrably better than those proposed for \CC and other concurrent, imperative programming languages. 312 The contributions of this work are: 326 313 \begin{itemize} 327 314 \item … … 628 615 629 616 630 \section{Coroutines: Stepping Stone}631 \label{coroutine} 632 617 \section{Coroutines: A Stepping Stone}\label{coroutine} 618 619 Advanced controlWhile the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves). 633 620 Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed. 634 621 Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension. … … 656 643 \begin{lrbox}{\myboxA} 657 644 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 658 `int f n1, fn2, state = 1;` // single global variables645 `int f1, f2, state = 1;` // single global variables 659 646 int fib() { 660 647 int fn; 661 648 `switch ( state )` { // explicit execution state 662 case 1: fn = 0; f n1 = fn; state = 2; break;663 case 2: fn = 1; f n2 = fn1; fn1 = fn; state = 3; break;664 case 3: fn = f n1 + fn2; fn2 = fn1; fn1 = fn; break;649 case 1: fn = 0; f1 = fn; state = 2; break; 650 case 2: fn = 1; f2 = f1; f1 = fn; state = 3; break; 651 case 3: fn = f1 + f2; f2 = f1; f1 = fn; break; 665 652 } 666 653 return fn; … … 679 666 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 680 667 #define FIB_INIT `{ 0, 1 }` 681 typedef struct { int f n2, fn1; } Fib;668 typedef struct { int f2, f1; } Fib; 682 669 int fib( Fib * f ) { 683 670 684 int ret = f->f n2;685 int fn = f->f n1 + f->fn2;686 f->f n2 = f->fn1; f->fn1 = fn;671 int ret = f->f2; 672 int fn = f->f1 + f->f2; 673 f->f2 = f->f1; f->f1 = fn; 687 674 688 675 return ret; … … 691 678 Fib f1 = FIB_INIT, f2 = FIB_INIT; 692 679 for ( int i = 0; i < 10; i += 1 ) { 693 printf( "%d %d\n", fib( &f n1 ), fib( &f2 ) );680 printf( "%d %d\n", fib( &f1 ), fib( &f2 ) ); 694 681 } 695 682 } … … 697 684 \end{lrbox} 698 685 699 \subfloat[3 states: global variables]{\label{f:GlobalVariables}\usebox\myboxA}686 \subfloat[3 States: global variables]{\label{f:GlobalVariables}\usebox\myboxA} 700 687 \qquad 701 \subfloat[1 state: encapsulatedvariables]{\label{f:ExternalState}\usebox\myboxB}702 \caption{C fibonacci fsm}688 \subfloat[1 State: external variables]{\label{f:ExternalState}\usebox\myboxB} 689 \caption{C Fibonacci Implementations} 703 690 \label{f:C-fibonacci} 704 691 … … 710 697 `coroutine` Fib { int fn; }; 711 698 void main( Fib & fib ) with( fib ) { 712 fn = 0; int fn1 = fn; `suspend()`; 713 fn = 1; int fn2 = fn1; fn1 = fn; `suspend()`; 714 for () { 715 fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `suspend()`; } 716 } 717 int next( Fib & fib ) with( fib ) { `resume( fib );` return fn; } 699 int f1, f2; 700 fn = 0; f1 = fn; `suspend()`; 701 fn = 1; f2 = f1; f1 = fn; `suspend()`; 702 for ( ;; ) { 703 fn = f1 + f2; f2 = f1; f1 = fn; `suspend()`; 704 } 705 } 706 int next( Fib & fib ) with( fib ) { 707 `resume( fib );` 708 return fn; 709 } 718 710 int main() { 719 711 Fib f1, f2; 720 for ( 10 )712 for ( int i = 1; i <= 10; i += 1 ) { 721 713 sout | next( f1 ) | next( f2 ); 714 } 722 715 } 723 716 \end{cfa} … … 725 718 \newbox\myboxB 726 719 \begin{lrbox}{\myboxB} 727 \begin{python}[aboveskip=0pt,belowskip=0pt] 728 729 def Fibonacci(): 730 fn = 0; fn1 = fn; `yield fn` # suspend 731 fn = 1; fn2 = fn1; fn1 = fn; `yield fn` 732 while True: 733 fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `yield fn` 734 735 736 f1 = Fibonacci() 737 f2 = Fibonacci() 738 for i in range( 10 ): 739 print( `next( f1 )`, `next( f2 )` ) # resume 740 741 \end{python} 720 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 721 `coroutine` Fib { int ret; }; 722 void main( Fib & f ) with( fib ) { 723 int fn, f1 = 1, f2 = 0; 724 for ( ;; ) { 725 ret = f2; 726 727 fn = f1 + f2; f2 = f1; f1 = fn; `suspend();` 728 } 729 } 730 int next( Fib & fib ) with( fib ) { 731 `resume( fib );` 732 return ret; 733 } 734 735 736 737 738 739 740 \end{cfa} 742 741 \end{lrbox} 743 \subfloat[ \CFA]{\label{f:Coroutine3States}\usebox\myboxA}744 \qquad 745 \subfloat[ Python]{\label{f:Coroutine1State}\usebox\myboxB}746 \caption{ Fibonacci input coroutine, 3 states, internal variables}742 \subfloat[3 States, internal variables]{\label{f:Coroutine3States}\usebox\myboxA} 743 \qquad\qquad 744 \subfloat[1 State, internal variables]{\label{f:Coroutine1State}\usebox\myboxB} 745 \caption{\CFA Coroutine Fibonacci Implementations} 747 746 \label{f:cfa-fibonacci} 748 747 \end{figure} … … 785 784 \begin{lrbox}{\myboxA} 786 785 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 787 `coroutine` F mt {788 char ch; // communication variables789 int g, b; // needed in destructor786 `coroutine` Format { 787 char ch; // used for communication 788 int g, b; // global because used in destructor 790 789 }; 791 void main( F mt & fmt ) with( fmt ) {792 for ( ) {793 for ( g = 0; g < 5; g += 1 ) { // groups794 for ( b = 0; b < 4; b += 1 ) { // block s790 void main( Format & fmt ) with( fmt ) { 791 for ( ;; ) { 792 for ( g = 0; g < 5; g += 1 ) { // group 793 for ( b = 0; b < 4; b += 1 ) { // block 795 794 `suspend();` 796 sout | ch; } // print character 797 sout | " "; } // block separator 798 sout | nl; } // group separator 799 } 800 void ?{}( Fmt & fmt ) { `resume( fmt );` } // prime 801 void ^?{}( Fmt & fmt ) with( fmt ) { // destructor 802 if ( g != 0 || b != 0 ) // special case 803 sout | nl; } 804 void send( Fmt & fmt, char c ) { fmt.ch = c; `resume( fmt )`; } 795 sout | ch; // separator 796 } 797 sout | " "; // separator 798 } 799 sout | nl; 800 } 801 } 802 void ?{}( Format & fmt ) { `resume( fmt );` } 803 void ^?{}( Format & fmt ) with( fmt ) { 804 if ( g != 0 || b != 0 ) sout | nl; 805 } 806 void format( Format & fmt ) { 807 `resume( fmt );` 808 } 805 809 int main() { 806 Fmt fmt; 807 sout | nlOff; // turn off auto newline 808 for ( 41 ) 809 send( fmt, 'a' ); 810 Format fmt; 811 eof: for ( ;; ) { 812 sin | fmt.ch; 813 if ( eof( sin ) ) break eof; 814 format( fmt ); 815 } 810 816 } 811 817 \end{cfa} … … 814 820 \newbox\myboxB 815 821 \begin{lrbox}{\myboxB} 816 \begin{python}[aboveskip=0pt,belowskip=0pt] 817 818 819 820 def Fmt(): 821 try: 822 while True: 823 for g in range( 5 ): 824 for b in range( 4 ): 825 826 print( `(yield)`, end='' ) 827 print( ' ', end='' ) 828 print() 829 830 831 except GeneratorExit: 832 if g != 0 | b != 0: 833 print() 834 835 836 fmt = Fmt() 837 `next( fmt )` # prime 838 for i in range( 41 ): 839 `fmt.send( 'a' );` # send to yield 840 841 \end{python} 822 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 823 struct Format { 824 char ch; 825 int g, b; 826 }; 827 void format( struct Format * fmt ) { 828 if ( fmt->ch != -1 ) { // not EOF ? 829 printf( "%c", fmt->ch ); 830 fmt->b += 1; 831 if ( fmt->b == 4 ) { // block 832 printf( " " ); // separator 833 fmt->b = 0; 834 fmt->g += 1; 835 } 836 if ( fmt->g == 5 ) { // group 837 printf( "\n" ); // separator 838 fmt->g = 0; 839 } 840 } else { 841 if ( fmt->g != 0 || fmt->b != 0 ) printf( "\n" ); 842 } 843 } 844 int main() { 845 struct Format fmt = { 0, 0, 0 }; 846 for ( ;; ) { 847 scanf( "%c", &fmt.ch ); 848 if ( feof( stdin ) ) break; 849 format( &fmt ); 850 } 851 fmt.ch = -1; 852 format( &fmt ); 853 } 854 \end{cfa} 842 855 \end{lrbox} 843 \subfloat[\CFA ]{\label{f:CFAFmt}\usebox\myboxA}856 \subfloat[\CFA Coroutine]{\label{f:CFAFmt}\usebox\myboxA} 844 857 \qquad 845 \subfloat[ Python]{\label{f:CFmt}\usebox\myboxB}846 \caption{ Output formatting text}858 \subfloat[C Linearized]{\label{f:CFmt}\usebox\myboxB} 859 \caption{Formatting text into lines of 5 blocks of 4 characters.} 847 860 \label{f:fmt-line} 848 861 \end{figure} … … 865 878 void main( Prod & prod ) with( prod ) { 866 879 // 1st resume starts here 867 for ( i ; N) {880 for ( int i = 0; i < N; i += 1 ) { 868 881 int p1 = random( 100 ), p2 = random( 100 ); 869 882 sout | p1 | " " | p2; … … 881 894 } 882 895 void start( Prod & prod, int N, Cons &c ) { 883 &prod.c = &c; // reassignable reference896 &prod.c = &c; 884 897 prod.[N, receipt] = [N, 0]; 885 898 `resume( prod );` … … 896 909 Prod & p; 897 910 int p1, p2, status; 898 bool done;911 _Bool done; 899 912 }; 900 913 void ?{}( Cons & cons, Prod & p ) { 901 &cons.p = &p; // reassignable reference914 &cons.p = &p; 902 915 cons.[status, done ] = [0, false]; 903 916 } … … 956 969 The program main restarts after the resume in @start@. 957 970 @start@ returns and the program main terminates. 958 959 One \emph{killer} application for a coroutine is device drivers, which at one time caused 70\%-85\% of failures in Windows/Linux~\cite{Swift05}.960 Many device drivers are a finite state-machine parsing a protocol, e.g.:961 \begin{tabbing}962 \ldots STX \= \ldots message \ldots \= ESC \= ETX \= \ldots message \ldots \= ETX \= 2-byte crc \= \ldots \kill963 \ldots STX \> \ldots message \ldots \> ESC \> ETX \> \ldots message \ldots \> ETX \> 2-byte crc \> \ldots964 \end{tabbing}965 where a network message begins with the control character STX and ends with an ETX, followed by a 2-byte cyclic-redundancy check.966 Control characters may appear in a message if preceded by an ESC.967 Because FSMs can be complex and occur frequently in important domains, direct support of the coroutine is crucial in a systems programminglanguage.968 969 \begin{figure}970 \begin{cfa}971 enum Status { CONT, MSG, ESTX, ELNTH, ECRC };972 `coroutine` Driver {973 Status status;974 char * msg, byte;975 };976 void ?{}( Driver & d, char * m ) { d.msg = m; } $\C[3.0in]{// constructor}$977 Status next( Driver & d, char b ) with( d ) { $\C{// 'with' opens scope}$978 byte = b; `resume( d );` return status;979 }980 void main( Driver & d ) with( d ) {981 enum { STX = '\002', ESC = '\033', ETX = '\003', MaxMsg = 64 };982 unsigned short int crc; $\C{// error checking}$983 msg: for () { $\C{// parse message}$984 status = CONT;985 unsigned int lnth = 0, sum = 0;986 while ( byte != STX ) `suspend();`987 emsg: for () {988 `suspend();` $\C{// process byte}$989 choose ( byte ) { $\C{// switch with default break}$990 case STX:991 status = ESTX; `suspend();` continue msg;992 case ETX:993 break emsg;994 case ESC:995 suspend();996 } // choose997 if ( lnth >= MaxMsg ) { $\C{// buffer full ?}$998 status = ELNTH; `suspend();` continue msg; }999 msg[lnth++] = byte;1000 sum += byte;1001 } // for1002 msg[lnth] = '\0'; $\C{// terminate string}\CRT$1003 `suspend();`1004 crc = (unsigned char)byte << 8; // prevent sign extension for signed char1005 `suspend();`1006 status = (crc | (unsigned char)byte) == sum ? MSG : ECRC;1007 `suspend();`1008 } // for1009 }1010 \end{cfa}1011 \caption{Device driver for simple communication protocol}1012 \end{figure}1013 971 1014 972
Note:
See TracChangeset
for help on using the changeset viewer.