Changeset 933f32f for doc/papers/concurrency/Paper.tex
- Timestamp:
- May 24, 2019, 10:19:41 AM (6 years ago)
- Branches:
- ADT, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children:
- d908563
- Parents:
- 6a9d4b4 (diff), 292642a (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
- Files:
- 1 edited
- doc/papers/concurrency/Paper.tex (modified) (15 diffs)
doc/papers/concurrency/Paper.tex
r6a9d4b4 r933f32f 215 215 {} 216 216 \lstnewenvironment{Go}[1][] 217 {\lstset{#1}} 217 {\lstset{language=go,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}} 218 {} 219 \lstnewenvironment{python}[1][] 220 {\lstset{language=python,moredelim=**[is][\protect\color{red}]{`}{`},#1}\lstset{#1}} 218 221 {} 219 222 … … 228 231 } 229 232 230 \title{\texorpdfstring{ Concurrency in \protect\CFA}{Concurrencyin Cforall}}233 \title{\texorpdfstring{Advanced Control-flow and Concurrency in \protect\CFA}{Advanced Control-flow in Cforall}} 231 234 232 235 \author[1]{Thierry Delisle} … … 238 241 \corres{*Peter A. Buhr, Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada. \email{pabuhr{\char`\@}uwaterloo.ca}} 239 242 240 \fundingInfo{Natural Sciences and Engineering Research Council of Canada}243 % \fundingInfo{Natural Sciences and Engineering Research Council of Canada} 241 244 242 245 \abstract[Summary]{ 243 \CFA is a modern, polymorphic, \emph{non-object-oriented} extension of the C programming language. 244 This paper discusses the design of the concurrency and parallelism features in \CFA, and its concurrent runtime-system. 245 These features are created from scratch as ISO C lacks concurrency, relying largely on the pthreads library for concurrency. 246 Coroutines and lightweight (user) threads are introduced into \CFA; 247 as well, monitors are added as a high-level mechanism for mutual exclusion and synchronization. 248 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously}. 249 All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features. 246 \CFA is a polymorphic, non-object-oriented, concurrent, backwards-compatible extension of the C programming language. 
247 This paper discusses the design philosophy and implementation of its advanced control-flow and concurrent/parallel features, along with the supporting runtime. 248 These features are created from scratch as ISO C has only low-level and/or unimplemented concurrency, so C programmers continue to rely on library features like C pthreads. 249 \CFA introduces modern language-level control-flow mechanisms, like coroutines, user-level threading, and monitors for mutual exclusion and synchronization. 250 Library extensions for executors, futures, and actors are built on these basic mechanisms. 251 The runtime provides significant programmer simplification and safety by eliminating spurious wakeup and reducing monitor barging. 252 The runtime also ensures multiple monitors can be safely acquired \emph{simultaneously} (deadlock free), and this feature is fully integrated with all monitor synchronization mechanisms. 253 All language features integrate with the \CFA polymorphic type-system and exception handling, while respecting the expectations and style of C programmers. 250 254 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages. 251 255 }% 252 256 253 \keywords{co ncurrency, parallelism, coroutines, threads, monitors, runtime, C, Cforall}257 \keywords{coroutines, concurrency, parallelism, threads, monitors, runtime, C, \CFA (Cforall)} 254 258 255 259 … … 262 266 \section{Introduction} 263 267 268 This paper discusses the design philosophy and implementation of advanced language-level control-flow and concurrent/parallel features in \CFA~\cite{Moss18} and its runtime. 269 \CFA is a modern, polymorphic, non-object-oriented\footnote{ 270 \CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance. 
271 However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.}, 272 backwards-compatible extension of the C programming language. 273 Within the \CFA framework, new control-flow features are created from scratch. 274 ISO \Celeven defines only a subset of the \CFA extensions, where the overlapping features are concurrency~\cite[\S~7.26]{C11}. 275 However, \Celeven concurrency is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}. 276 Furthermore, \Celeven and pthreads concurrency is simple, based on thread fork/join in a function and a few locks, which is low-level and error prone; 277 no high-level language concurrency features are defined. 278 Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C11 concurrency approach. 279 Finally, while the \Celeven standard does not state a threading model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}. 280 281 In contrast, there has been a renewed interest during the past decade in user-level (M:N, green) threading in old and new programming languages. 282 As multi-core hardware became available in the 1980/90s, both user and kernel threading were examined. 283 Kernel threading was chosen, largely because of its simplicity and fit with the simpler operating systems and hardware architectures at the time, which gave it a performance advantage~\cite{Drepper03}. 284 Libraries like pthreads were developed for C, and the Solaris operating-system switched from user (JDK 1.1~\cite{JDK1.1}) to kernel threads. 
285 As a result, languages like Java, Scala~\cite{Scala}, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopt the 1:1 kernel-threading model, with a variety of presentation mechanisms. 286 From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,BoostThreads}, including putting green threads back into Java~\cite{Quasar}. 287 The main argument for user-level threading is that user threads are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing smaller work-units to facilitate load balancing by the runtime~\cite{Verch12}. 288 As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{vonBehren03}. 289 Finally, performant user-threading implementations (both time and space) are largely competitive with direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety. 290 291 A further effort over the past two decades is the development of language memory-models to deal with the conflict between language features and compiler/hardware optimizations, i.e., some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}. 292 The consequence is that a language must provide sufficient tools to program around safety issues, as inline and library code is all sequential to the compiler. 293 One solution is low-level qualifiers and functions (e.g., @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs. 
294 A safer solution is high-level language constructs so the \emph{compiler} knows the optimization boundaries, and hence, provides implicit safety. 295 This problem is best known with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{ 296 \CFA exception handling will be presented in a separate paper. 297 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++} 298 } and coroutines. 299 Moreover, solutions in the language allow matching constructs with the language paradigm, i.e., imperative and functional languages have different presentations of the same concept. 300 301 Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary. 302 Two concurrency violations of this philosophy are \emph{spurious wakeup} and \emph{barging}, i.e., random wakeup~\cite[\S~8]{Buhr05a} and signalling-as-hints~\cite[\S~8]{Buhr05a}, where one begets the other. 303 If you believe spurious wakeup is a foundational concurrency property, then unblocking (signalling) a thread is always a hint. 304 If you \emph{do not} believe spurious wakeup is foundational, then signalling-as-hints is a performance decision. 305 Most importantly, removing spurious wakeup and signals-as-hints makes concurrent programming significantly safer because it removes local non-determinism. 306 Clawing back performance where the local non-determinism is unimportant should be an option, not the default. 
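Under the signalling-as-hints model criticized above, a woken thread cannot assume its condition holds, so every wait must sit in a predicate loop. A minimal Python sketch of that discipline (a hypothetical producer/consumer; all names are illustrative):

```python
# Sketch: guarding a condition-variable wait with a predicate loop, as required
# by runtimes (e.g., pthreads) where wakeups may be spurious or another thread
# may barge in and consume the state first.
import threading

queue = []                      # shared state protected by the condition's lock
cond = threading.Condition()

def consumer(results):
    with cond:
        # Re-check the predicate after every wakeup: a spurious wakeup or a
        # barging consumer is then harmless, because we simply wait again.
        while not queue:
            cond.wait()
        results.append(queue.pop(0))

def producer(item):
    with cond:
        queue.append(item)
        cond.notify()           # only a hint under this model, not a guarantee

results = []
t = threading.Thread(target=consumer, args=(results,))
t.start()
producer(42)
t.join()
print(results)                  # prints [42]
```

Removing spurious wakeup and barging, as \CFA does, is what makes the `while` loop (and its re-check cost) unnecessary.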
307 308 \begin{comment} 309 For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}. 310 It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing. 311 As well, user threading is often a complementary feature, allowing light-weight threading to match with low-cost objects, while hiding the application/kernel boundary. 312 User threading also allows layering of implicit concurrency models (no explicit thread creation), such as executors, data-flow, actors, into a single language, so programmers can choose the model that best fits an algorithm.\footnote{ 313 All implicit concurrency models have explicit threading in their implementation, and hence, can be built from explicit threading; 314 however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking thread objects.} 315 Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem. 316 \CFA embraces language extensions and user-level threading to provide advanced control-flow (exception handling\footnote{ 317 \CFA exception handling will be presented in a separate paper. 318 The key feature that dovetails with this paper is non-local exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++} 319 } and coroutines) and concurrency. 
320 321 Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory. 322 As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten. 323 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent feature-set (e.g., objects, inheritance, templates, etc.) means idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC. 324 Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base. 325 \CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems. 326 \end{comment} 327 328 \CFA embraces user-level threading, language extensions for advanced control-flow, and safety as the default. 329 We present comparative examples so the reader can judge if the \CFA control-flow extensions are better and safer than those in or proposed for \Celeven, \CC and other concurrent, imperative programming languages, and perform experiments to show the \CFA runtime is competitive with other similar mechanisms. 330 The main contributions of this work are: 331 \begin{itemize} 332 \item 333 expressive language-level coroutines and user-level threading, which respect the expectations of C programmers. 334 \item 335 monitor synchronization without barging. 336 \item 337 safely acquiring multiple monitors \emph{simultaneously} (deadlock free), while seamlessly integrating this capability with all monitor synchronization mechanisms. 
338 \item 339 providing statically type-safe interfaces that integrate with the \CFA polymorphic type-system and other language features. 340 \item 341 library extensions for executors, futures, and actors built on the basic mechanisms. 342 \item 343 a runtime system with no spurious wakeup. 344 \item 345 experimental results showing comparable performance of the new features with similar mechanisms in other concurrent programming-languages. 346 \end{itemize} 347 348 \begin{comment} 264 349 This paper provides a minimal concurrency \newterm{Application Program Interface} (API) that is simple, efficient and can be used to build other concurrency features. 265 350 While the simplest concurrency system is a thread and a lock, this low-level approach is hard to master. … … 281 366 The proposed concurrency API is implemented in a dialect of C, called \CFA (pronounced C-for-all). 282 367 The paper discusses how the language features are added to the \CFA translator with respect to parsing, semantics, and type checking, and the corresponding high-performance runtime-library to implement the concurrent features. 283 284 368 \end{comment} 369 370 371 \begin{comment} 285 372 \section{\CFA Overview} 286 373 … … 551 638 \end{cfa} 552 639 where the return type supplies the type/size of the allocation, which is impossible in most type systems. 553 554 555 \section{Concurrency} 556 \label{s:Concurrency} 557 558 At its core, concurrency is based on multiple call-stacks and scheduling threads executing on these stacks. 559 Multiple call stacks (or contexts) and a single thread of execution, called \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}. 560 In coroutining, the single thread is self-scheduling across the stacks, so execution is deterministic, \ie the execution path from input to output is fixed and predictable. 
561 A \newterm{stackless} coroutine executes on the caller's stack~\cite{Python} but this approach is restrictive, \eg preventing modularization and supporting only iterator/generator-style programming; 562 a \newterm{stackful} coroutine executes on its own stack, allowing full generality. 563 Only stackful coroutines are a stepping stone to concurrency. 564 565 The transition to concurrency, even for execution with a single thread and multiple stacks, occurs when coroutines also context switch to a \newterm{scheduling oracle}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}. 566 Therefore, a minimal concurrency system is possible using coroutines (see Section \ref{coroutine}) in conjunction with a scheduler to decide where to context switch next. 567 The resulting execution system now follows a cooperative threading-model, called \newterm{non-preemptive scheduling}. 568 569 Because the scheduler is special, it can either be a stackless or stackful coroutine. 570 For stackless, the scheduler performs scheduling on the stack of the current coroutine and switches directly to the next coroutine, so there is one context switch. 571 For stackful, the current coroutine switches to the scheduler, which performs scheduling, and it then switches to the next coroutine, so there are two context switches. 572 A stackful scheduler is often used for simplicity and security. 573 574 Regardless of the approach used, a subset of concurrency related challenges start to appear. 575 For the complete set of concurrency challenges to occur, the missing feature is \newterm{preemption}, where context switching occurs randomly between any two instructions, often based on a timer interrupt, called \newterm{preemptive scheduling}. 576 While a scheduler introduces uncertainty in the order of execution, preemption introduces uncertainty about where context switches occur. 
577 Interestingly, uncertainty is necessary for the runtime (operating) system to give the illusion of parallelism on a single processor and increase performance on multiple processors. 578 The reason is that only the runtime has complete knowledge about resources and how to best utilized them. 579 However, the introduction of unrestricted non-determinism results in the need for \newterm{mutual exclusion} and \newterm{synchronization} to restrict non-determinism for correctness; 580 otherwise, it is impossible to write meaningful programs. 581 Optimal performance in concurrent applications is often obtained by having as much non-determinism as correctness allows. 582 583 An important missing feature in C is threading\footnote{While the C11 standard defines a \protect\lstinline@threads.h@ header, it is minimal and defined as optional. 584 As such, library support for threading is far from widespread. 585 At the time of writing the paper, neither \protect\lstinline@gcc@ nor \protect\lstinline@clang@ support \protect\lstinline@threads.h@ in their standard libraries.}. 586 In modern programming languages, a lack of threading is unacceptable~\cite{Sutter05, Sutter05b}, and therefore existing and new programming languages must have tools for writing efficient concurrent programs to take advantage of parallelism. 587 As an extension of C, \CFA needs to express these concepts in a way that is as natural as possible to programmers familiar with imperative languages. 588 Furthermore, because C is a system-level language, programmers expect to choose precisely which features they need and which cost they are willing to pay. 589 Hence, concurrent programs should be written using high-level mechanisms, and only step down to lower-level mechanisms when performance bottlenecks are encountered. 
590 591 592 \subsection{Coroutines: A Stepping Stone}\label{coroutine} 593 594 While the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves). 640 \end{comment} 641 642 643 \section{Coroutines: Stepping Stone} 644 \label{coroutine} 645 595 646 Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed. 596 647 Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension. … … 616 667 \centering 617 668 \newbox\myboxA 669 % \begin{lrbox}{\myboxA} 670 % \begin{cfa}[aboveskip=0pt,belowskip=0pt] 671 % `int fn1, fn2, state = 1;` // single global variables 672 % int fib() { 673 % int fn; 674 % `switch ( state )` { // explicit execution state 675 % case 1: fn = 0; fn1 = fn; state = 2; break; 676 % case 2: fn = 1; fn2 = fn1; fn1 = fn; state = 3; break; 677 % case 3: fn = fn1 + fn2; fn2 = fn1; fn1 = fn; break; 678 % } 679 % return fn; 680 % } 681 % int main() { 682 % 683 % for ( int i = 0; i < 10; i += 1 ) { 684 % printf( "%d\n", fib() ); 685 % } 686 % } 687 % \end{cfa} 688 % \end{lrbox} 618 689 \begin{lrbox}{\myboxA} 619 690 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 620 `int f1, f2, state = 1;` // single global variables 621 int fib() { 622 int fn; 623 `switch ( state )` { // explicit execution state 624 case 1: fn = 0; f1 = fn; state = 2; break; 625 case 2: fn = 1; f2 = f1; f1 = fn; state = 3; break; 626 case 3: fn = f1 + f2; f2 = f1; f1 = fn; break; 627 } 628 return fn; 629 } 691 #define FIB_INIT { 0, 1 } 692 typedef struct { int fn1, fn; } Fib; 693 int fib( Fib * f ) { 694 695 int ret = f->fn1; 696 f->fn1 = f->fn; 697 f->fn = ret + f->fn; 698 return ret; 699 } 700 701 702 630 703 int main() { 631 704 Fib f1 = FIB_INIT, f2 = FIB_INIT; 632 705 for ( int i = 0; 
i < 10; i += 1 ) { 633 printf( "%d\n", fib() ); 706 printf( "%d %d\n", 707 fib( &f1 ), fib( &f2 ) ); 634 708 } 635 709 } … … 640 714 \begin{lrbox}{\myboxB} 641 715 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 642 #define FIB_INIT `{ 0, 1 }` 643 typedef struct { int f2, f1; } Fib; 644 int fib( Fib * f ) { 645 646 int ret = f->f2; 647 int fn = f->f1 + f->f2; 648 f->f2 = f->f1; f->f1 = fn; 649 650 return ret; 651 } 652 int main() { 653 Fib f1 = FIB_INIT, f2 = FIB_INIT; 654 for ( int i = 0; i < 10; i += 1 ) { 655 printf( "%d %d\n", fib( &f1 ), fib( &f2 ) ); 716 `coroutine` Fib { int fn1; }; 717 void main( Fib & fib ) with( fib ) { 718 int fn; 719 [fn1, fn] = [0, 1]; 720 for () { 721 `suspend();` 722 [fn1, fn] = [fn, fn1 + fn]; 656 723 } 657 724 } 658 \end{cfa} 659 \end{lrbox} 660 661 \subfloat[3 States: global variables]{\label{f:GlobalVariables}\usebox\myboxA} 662 \qquad 663 \subfloat[1 State: external variables]{\label{f:ExternalState}\usebox\myboxB} 664 \caption{C Fibonacci Implementations} 665 \label{f:C-fibonacci} 666 667 \bigskip 668 669 \newbox\myboxA 670 \begin{lrbox}{\myboxA} 671 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 672 `coroutine` Fib { int fn; }; 673 void main( Fib & fib ) with( fib ) { 674 int f1, f2; 675 fn = 0; f1 = fn; `suspend()`; 676 fn = 1; f2 = f1; f1 = fn; `suspend()`; 677 for ( ;; ) { 678 fn = f1 + f2; f2 = f1; f1 = fn; `suspend()`; 679 } 680 } 681 int next( Fib & fib ) with( fib ) { 682 `resume( fib );` 683 return fn; 725 int ?()( Fib & fib ) with( fib ) { 726 `resume( fib );` return fn1; 684 727 } 685 728 int main() { 686 729 Fib f1, f2; 687 for ( int i = 1; i <= 10; i += 1 ) { 688 sout | next( f1 ) | next( f2 ); 689 } 690 } 730 for ( 10 ) { 731 sout | f1() | f2(); 732 } 733 734 691 735 \end{cfa} 692 736 \end{lrbox} 693 \newbox\myboxB 694 \begin{lrbox}{\myboxB} 695 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 696 `coroutine` Fib { int ret; }; 697 void main( Fib & f ) with( fib ) { 698 int fn, f1 = 1, f2 = 0; 699 for ( ;; ) { 700 ret = f2; 701 702 fn 
= f1 + f2; f2 = f1; f1 = fn; `suspend();` 703 704 } 705 } 706 int next( Fib & fib ) with( fib ) { 707 `resume( fib );` 708 return ret; 709 } 710 711 712 713 714 715 \end{cfa} 737 738 \newbox\myboxC 739 \begin{lrbox}{\myboxC} 740 \begin{python}[aboveskip=0pt,belowskip=0pt] 741 742 def Fib(): 743 744 fn1, fn = 0, 1 745 while True: 746 `yield fn1` 747 fn1, fn = fn, fn1 + fn 748 749 750 # next prewritten 751 752 753 f1 = Fib() 754 f2 = Fib() 755 for i in range( 10 ): 756 print( next( f1 ), next( f2 ) ) 757 758 759 760 \end{python} 716 761 \end{lrbox} 717 \subfloat[3 States, internal variables]{\label{f:Coroutine3States}\usebox\myboxA} 718 \qquad\qquad 719 \subfloat[1 State, internal variables]{\label{f:Coroutine1State}\usebox\myboxB} 720 \caption{\CFA Coroutine Fibonacci Implementations} 721 \label{f:cfa-fibonacci} 762 763 \subfloat[C]{\label{f:GlobalVariables}\usebox\myboxA} 764 \hspace{3pt} 765 \vrule 766 \hspace{3pt} 767 \subfloat[\CFA]{\label{f:ExternalState}\usebox\myboxB} 768 \hspace{3pt} 769 \vrule 770 \hspace{3pt} 771 \subfloat[Python]{\label{f:ExternalState}\usebox\myboxC} 772 \caption{Fibonacci Generator} 773 \label{f:C-fibonacci} 774 775 % \bigskip 776 % 777 % \newbox\myboxA 778 % \begin{lrbox}{\myboxA} 779 % \begin{cfa}[aboveskip=0pt,belowskip=0pt] 780 % `coroutine` Fib { int fn; }; 781 % void main( Fib & fib ) with( fib ) { 782 % fn = 0; int fn1 = fn; `suspend()`; 783 % fn = 1; int fn2 = fn1; fn1 = fn; `suspend()`; 784 % for () { 785 % fn = fn1 + fn2; fn2 = fn1; fn1 = fn; `suspend()`; } 786 % } 787 % int next( Fib & fib ) with( fib ) { `resume( fib );` return fn; } 788 % int main() { 789 % Fib f1, f2; 790 % for ( 10 ) 791 % sout | next( f1 ) | next( f2 ); 792 % } 793 % \end{cfa} 794 % \end{lrbox} 795 % \newbox\myboxB 796 % \begin{lrbox}{\myboxB} 797 % \begin{python}[aboveskip=0pt,belowskip=0pt] 798 % 799 % def Fibonacci(): 800 % fn = 0; fn1 = fn; `yield fn` # suspend 801 % fn = 1; fn2 = fn1; fn1 = fn; `yield fn` 802 % while True: 803 % fn = fn1 + fn2; fn2 
= fn1; fn1 = fn; `yield fn` 804 % 805 % 806 % f1 = Fibonacci() 807 % f2 = Fibonacci() 808 % for i in range( 10 ): 809 % print( `next( f1 )`, `next( f2 )` ) # resume 810 % 811 % \end{python} 812 % \end{lrbox} 813 % \subfloat[\CFA]{\label{f:Coroutine3States}\usebox\myboxA} 814 % \qquad 815 % \subfloat[Python]{\label{f:Coroutine1State}\usebox\myboxB} 816 % \caption{Fibonacci input coroutine, 3 states, internal variables} 817 % \label{f:cfa-fibonacci} 722 818 \end{figure} 723 819 … … 759 855 \begin{lrbox}{\myboxA} 760 856 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 761 `coroutine` F ormat {762 char ch; // used for communication763 int g, b; // global because used in destructor857 `coroutine` Fmt { 858 char ch; // communication variables 859 int g, b; // needed in destructor 764 860 }; 765 void main( F ormat & fmt ) with( fmt ) {766 for ( ;;) {767 for ( g = 0; g < 5; g += 1 ) { // group768 for ( b = 0; b < 4; b += 1 ) { // block 861 void main( Fmt & fmt ) with( fmt ) { 862 for () { 863 for ( g = 0; g < 5; g += 1 ) { // groups 864 for ( b = 0; b < 4; b += 1 ) { // blocks 769 865 `suspend();` 770 sout | ch; // separator 771 } 772 sout | " "; // separator 773 } 774 sout | nl; 775 } 776 } 777 void ?{}( Format & fmt ) { `resume( fmt );` } 778 void ^?{}( Format & fmt ) with( fmt ) { 779 if ( g != 0 || b != 0 ) sout | nl; 780 } 781 void format( Format & fmt ) { 782 `resume( fmt );` 783 } 866 sout | ch; } // print character 867 sout | " "; } // block separator 868 sout | nl; } // group separator 869 } 870 void ?{}( Fmt & fmt ) { `resume( fmt );` } // prime 871 void ^?{}( Fmt & fmt ) with( fmt ) { // destructor 872 if ( g != 0 || b != 0 ) // special case 873 sout | nl; } 874 void send( Fmt & fmt, char c ) { fmt.ch = c; `resume( fmt )`; } 784 875 int main() { 785 Format fmt; 786 eof: for ( ;; ) { 787 sin | fmt.ch; 788 if ( eof( sin ) ) break eof; 789 format( fmt ); 790 } 876 Fmt fmt; 877 sout | nlOff; // turn off auto newline 878 for ( 41 ) 879 send( fmt, 'a' ); 791 880 } 792 881 
\end{cfa} … … 795 884 \newbox\myboxB 796 885 \begin{lrbox}{\myboxB} 797 \begin{cfa}[aboveskip=0pt,belowskip=0pt] 798 struct Format { 799 char ch; 800 int g, b; 801 }; 802 void format( struct Format * fmt ) { 803 if ( fmt->ch != -1 ) { // not EOF ? 804 printf( "%c", fmt->ch ); 805 fmt->b += 1; 806 if ( fmt->b == 4 ) { // block 807 printf( " " ); // separator 808 fmt->b = 0; 809 fmt->g += 1; 810 } 811 if ( fmt->g == 5 ) { // group 812 printf( "\n" ); // separator 813 fmt->g = 0; 814 } 815 } else { 816 if ( fmt->g != 0 || fmt->b != 0 ) printf( "\n" ); 817 } 818 } 819 int main() { 820 struct Format fmt = { 0, 0, 0 }; 821 for ( ;; ) { 822 scanf( "%c", &fmt.ch ); 823 if ( feof( stdin ) ) break; 824 format( &fmt ); 825 } 826 fmt.ch = -1; 827 format( &fmt ); 828 } 829 \end{cfa} 886 \begin{python}[aboveskip=0pt,belowskip=0pt] 887 888 889 890 def Fmt(): 891 try: 892 while True: 893 for g in range( 5 ): 894 for b in range( 4 ): 895 896 print( `(yield)`, end='' ) 897 print( ' ', end='' ) 898 print() 899 900 901 except GeneratorExit: 902 if g != 0 or b != 0: 903 print() 904 905 906 fmt = Fmt() 907 `next( fmt )` # prime 908 for i in range( 41 ): 909 `fmt.send( 'a' );` # send to yield 910 911 \end{python} 830 912 \end{lrbox} 831 \subfloat[\CFA Coroutine]{\label{f:CFAFmt}\usebox\myboxA}913 \subfloat[\CFA]{\label{f:CFAFmt}\usebox\myboxA} 832 914 \qquad 833 \subfloat[ C Linearized]{\label{f:CFmt}\usebox\myboxB}834 \caption{ Formatting text into lines of 5 blocks of 4 characters.}915 \subfloat[Python]{\label{f:CFmt}\usebox\myboxB} 916 \caption{Output formatting text} 835 917 \label{f:fmt-line} 836 918 \end{figure} … … 853 935 void main( Prod & prod ) with( prod ) { 854 936 // 1st resume starts here 855 for ( i nt i = 0; i < N; i += 1) {937 for ( i; N ) { 856 938 int p1 = random( 100 ), p2 = random( 100 ); 857 939 sout | p1 | " " | p2; … … 869 951 } 870 952 void start( Prod & prod, int N, Cons &c ) { 871 &prod.c = &c; 953 &prod.c = &c; // reassignable reference 872 954 prod.[N, 
receipt] = [N, 0]; 873 955 `resume( prod );` … … 884 966 Prod & p; 885 967 int p1, p2, status; 886 _Bool done;968 bool done; 887 969 }; 888 970 void ?{}( Cons & cons, Prod & p ) { 889 &cons.p = &p; 971 &cons.p = &p; // reassignable reference 890 972 cons.[status, done ] = [0, false]; 891 973 } … … 945 1027 @start@ returns and the program main terminates. 946 1028 1029 One \emph{killer} application for a coroutine is device drivers, which at one time caused 70\%-85\% of failures in Windows/Linux~\cite{Swift05}. 1030 Many device drivers are a finite state-machine parsing a protocol, e.g.: 1031 \begin{tabbing} 1032 \ldots STX \= \ldots message \ldots \= ESC \= ETX \= \ldots message \ldots \= ETX \= 2-byte crc \= \ldots \kill 1033 \ldots STX \> \ldots message \ldots \> ESC \> ETX \> \ldots message \ldots \> ETX \> 2-byte crc \> \ldots 1034 \end{tabbing} 1035 where a network message begins with the control character STX and ends with an ETX, followed by a 2-byte cyclic-redundancy check. 1036 Control characters may appear in a message if preceded by an ESC. 1037 Because FSMs can be complex and occur frequently in important domains, direct support of the coroutine is crucial in a systems programming language. 
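The protocol described above can also be sketched as a stackless coroutine using a Python generator, where each @send@ delivers one byte and the generator's suspension points play the role of the driver's @suspend@; a simple byte-sum stands in for the 2-byte CRC, and all names are illustrative:

```python
# Sketch: a stackless-coroutine device driver for the STX/ETX protocol above.
# Each send() delivers one byte; the generator suspends between bytes.
STX, ESC, ETX = 0x02, 0x1B, 0x03

def driver(results):
    while True:
        byte = yield                      # wait for start of message
        if byte != STX:
            continue
        msg, total = [], 0
        while True:
            byte = yield                  # next message byte
            if byte == ETX:
                break
            if byte == ESC:               # escaped control character
                byte = yield
            msg.append(byte)
            total += byte                 # byte-sum in place of a real CRC
        hi = yield                        # two checksum bytes follow ETX
        lo = yield
        ok = ((hi << 8) | lo) == total
        results.append((bytes(msg), ok))

results = []
d = driver(results)
next(d)                                   # prime to the first suspension
payload = b"hi"
total = sum(payload)
for b in [STX, *payload, ETX, total >> 8, total & 0xFF]:
    d.send(b)
print(results)                            # prints [(b'hi', True)]
```

As in the \CFA driver, the parser's state lives in the coroutine's suspended execution location rather than in an explicit state variable.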
1038 1039 \begin{figure} 1040 \begin{cfa} 1041 enum Status { CONT, MSG, ESTX, ELNTH, ECRC }; 1042 `coroutine` Driver { 1043 Status status; 1044 char * msg, byte; 1045 }; 1046 void ?{}( Driver & d, char * m ) { d.msg = m; } $\C[3.0in]{// constructor}$ 1047 Status next( Driver & d, char b ) with( d ) { $\C{// 'with' opens scope}$ 1048 byte = b; `resume( d );` return status; 1049 } 1050 void main( Driver & d ) with( d ) { 1051 enum { STX = '\002', ESC = '\033', ETX = '\003', MaxMsg = 64 }; 1052 unsigned short int crc; $\C{// error checking}$ 1053 msg: for () { $\C{// parse message}$ 1054 status = CONT; 1055 unsigned int lnth = 0, sum = 0; 1056 while ( byte != STX ) `suspend();` 1057 emsg: for () { 1058 `suspend();` $\C{// process byte}$ 1059 choose ( byte ) { $\C{// switch with default break}$ 1060 case STX: 1061 status = ESTX; `suspend();` continue msg; 1062 case ETX: 1063 break emsg; 1064 case ESC: 1065 suspend(); 1066 } // choose 1067 if ( lnth >= MaxMsg ) { $\C{// buffer full ?}$ 1068 status = ELNTH; `suspend();` continue msg; } 1069 msg[lnth++] = byte; 1070 sum += byte; 1071 } // for 1072 msg[lnth] = '\0'; $\C{// terminate string}\CRT$ 1073 `suspend();` 1074 crc = (unsigned char)byte << 8; // prevent sign extension for signed char 1075 `suspend();` 1076 status = (crc | (unsigned char)byte) == sum ? MSG : ECRC; 1077 `suspend();` 1078 } // for 1079 } 1080 \end{cfa} 1081 \caption{Device driver for simple communication protocol} 1082 \end{figure} 1083 947 1084 948 1085 \subsection{Coroutine Implementation} … … 1060 1197 \end{cquote} 1061 1198 The combination of these two approaches allows an easy and concise specification to coroutining (and concurrency) for normal users, while more advanced users have tighter control on memory layout and initialization. 1199 1200 1201 \section{Concurrency} 1202 \label{s:Concurrency} 1203 1204 At its core, concurrency is based on multiple call-stacks and scheduling threads executing on these stacks. 
1205 Multiple call stacks (or contexts) and a single thread of execution, called \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}. 1206 In coroutining, the single thread is self-scheduling across the stacks, so execution is deterministic, \ie the execution path from input to output is fixed and predictable. 1207 A \newterm{stackless} coroutine executes on the caller's stack~\cite{Python} but this approach is restrictive, \eg preventing modularization and supporting only iterator/generator-style programming; 1208 a \newterm{stackful} coroutine executes on its own stack, allowing full generality. 1209 Only stackful coroutines are a stepping stone to concurrency. 1210 1211 The transition to concurrency, even for execution with a single thread and multiple stacks, occurs when coroutines also context switch to a \newterm{scheduling oracle}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}. 1212 Therefore, a minimal concurrency system is possible using coroutines (see Section \ref{coroutine}) in conjunction with a scheduler to decide where to context switch next. 1213 The resulting execution system now follows a cooperative threading-model, called \newterm{non-preemptive scheduling}. 1214 1215 Because the scheduler is special, it can be either a stackless or a stackful coroutine. 1216 For stackless, the scheduler performs scheduling on the stack of the current coroutine and switches directly to the next coroutine, so there is one context switch. 1217 For stackful, the current coroutine switches to the scheduler, which performs scheduling, and it then switches to the next coroutine, so there are two context switches. 1218 A stackful scheduler is often used for simplicity and security. 1219 1220 Regardless of the approach used, a subset of concurrency-related challenges starts to appear.
1221 For the complete set of concurrency challenges to occur, the missing feature is \newterm{preemption}, where context switching occurs randomly between any two instructions, often based on a timer interrupt, called \newterm{preemptive scheduling}. 1222 While a scheduler introduces uncertainty in the order of execution, preemption introduces uncertainty about where context switches occur. 1223 Interestingly, uncertainty is necessary for the runtime (operating) system to give the illusion of parallelism on a single processor and increase performance on multiple processors. 1224 The reason is that only the runtime has complete knowledge about resources and how best to utilize them. 1225 However, the introduction of unrestricted non-determinism results in the need for \newterm{mutual exclusion} and \newterm{synchronization} to restrict non-determinism for correctness; 1226 otherwise, it is impossible to write meaningful programs. 1227 Optimal performance in concurrent applications is often obtained by having as much non-determinism as correctness allows. 1228 1229 An important missing feature in C is threading\footnote{While the C11 standard defines a \protect\lstinline@threads.h@ header, it is minimal and defined as optional. 1230 As such, library support for threading is far from widespread. 1231 At the time of writing the paper, neither \protect\lstinline@gcc@ nor \protect\lstinline@clang@ supports \protect\lstinline@threads.h@ in their standard libraries.}. 1232 In modern programming languages, a lack of threading is unacceptable~\cite{Sutter05, Sutter05b}, and therefore existing and new programming languages must have tools for writing efficient concurrent programs to take advantage of parallelism. 1233 As an extension of C, \CFA needs to express these concepts in a way that is as natural as possible to programmers familiar with imperative languages.
1234 Furthermore, because C is a system-level language, programmers expect to choose precisely which features they need and which costs they are willing to pay. 1235 Hence, concurrent programs should be written using high-level mechanisms, and only step down to lower-level mechanisms when performance bottlenecks are encountered. 1062 1236 1063 1237
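The need for mutual exclusion under preemptive scheduling can be illustrated with a minimal sketch (ours, not \CFA code, using Python's preemptively scheduled threads as a stand-in): two threads perform a read-modify-write on a shared counter, and the lock is what makes the final count deterministic, since without it interleaved updates can be lost.

```python
# Minimal sketch: mutual exclusion restores determinism under preemption.
# Two threads increment a shared counter; the lock makes each
# read-modify-write atomic, so the final count is exact.
import threading

count = 0
lock = threading.Lock()

def worker(n):
    global count
    for _ in range(n):
        with lock:              # critical section: read, add, write back
            count += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)                    # always 200000 with the lock held
```

This is the trade-off the paragraph above describes: the lock removes just enough non-determinism for correctness, while leaving the scheduler free to interleave the rest of the execution.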