- File:
-
- 1 edited
-
doc/papers/concurrency/Paper.tex (modified) (10 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/concurrency/Paper.tex
re98c7ab r62dbb00 2 2 3 3 \articletype{RESEARCH ARTICLE}% 4 5 % Referees 6 % Doug Lea, dl@cs.oswego.edu, SUNY Oswego 7 % Herb Sutter, hsutter@microsoft.com, Microsoft Corp 8 % Gor Nishanov, gorn@microsoft.com, Microsoft Corp 9 % James Noble, kjx@ecs.vuw.ac.nz, Victoria University of Wellington, School of Engineering and Computer Science 4 10 5 11 \received{XXXXX} … … 312 318 As a result, languages like Java, Scala, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopt the 1:1 kernel-threading model, with a variety of presentation mechanisms. 313 319 From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,Marcel}, including putting green threads back into Java~\cite{Quasar}. 314 The main argument for user-level threading is that they are lighter weight than kernel threads(locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work units to facilitate load balancing by the runtime~\cite{Verch12}.320 The main argument for user-level threading is that it is lighter weight than kernel threading (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium work units to facilitate load balancing by the runtime~\cite{Verch12}. 315 321 As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{Adya02,vonBehren03}. 316 322 Finally, performant user-threading implementations (both time and space) meet or exceed direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety. 317 323 318 A further effort over the past two decades is the development of language memory models to deal with the conflict between language features and compiler/hardware optimizations, \ie ,some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}.324 A further effort over the past two decades is the development of language memory models to deal with the conflict between language features and compiler/hardware optimizations, \ie some language features are unsafe in the presence of aggressive sequential optimizations~\cite{Buhr95a,Boehm05}. 319 325 The consequence is that a language must provide sufficient tools to program around safety issues, as inline and library code is all sequential to the compiler. 320 One solution is low-level qualifiers and functions (\eg ,@volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs.326 One solution is low-level qualifiers and functions (\eg @volatile@ and atomics) allowing \emph{programmers} to explicitly write safe (race-free~\cite{Boehm12}) programs. 321 327 A safer solution is high-level language constructs so the \emph{compiler} knows the optimization boundaries, and hence, provides implicit safety. 322 328 This problem is best known with respect to concurrency, but applies to other complex control-flow, like exceptions\footnote{ … … 324 330 The key feature that dovetails with this paper is nonlocal exceptions allowing exceptions to be raised across stacks, with synchronous exceptions raised among coroutines and asynchronous exceptions raised among threads, similar to that in \uC~\cite[\S~5]{uC++} 325 331 } and coroutines. 326 Finally, language solutions allow matching constructs with language paradigm, \ie ,imperative and functional languages often have different presentations of the same concept to fit their programming model.332 Finally, language solutions allow matching constructs with language paradigm, \ie imperative and functional languages often have different presentations of the same concept to fit their programming model. 327 333 328 334 Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary. 329 Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, \ie ,once there is spurious wakeup, signals-as-hints follow.335 Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, \ie once there is spurious wakeup, signals-as-hints follow. 330 336 However, spurious wakeup is \emph{not} a foundational concurrency property~\cite[\S~8]{Buhr05a}, it is a performance design choice. 331 337 Similarly, signals-as-hints are often a performance decision. … … 337 343 Most augmented traditional (Fortran 18~\cite{Fortran18}, Cobol 14~\cite{Cobol14}, Ada 12~\cite{Ada12}, Java 11~\cite{Java11}) and new languages (Go~\cite{Go}, Rust~\cite{Rust}, and D~\cite{D}), except \CC, diverge from C with different syntax and semantics, only interoperate indirectly with C, and are not systems languages, for those with managed memory. 338 344 As a result, there is a significant learning curve to move to these languages, and C legacy-code must be rewritten. 339 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (\eg ,objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.345 While \CC, like \CFA, takes an evolutionary approach to extend C, \CC's constantly growing complex and interdependent features-set (\eg objects, inheritance, templates, etc.) mean idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC. 340 346 Hence, rewriting and retraining costs for these languages, even \CC, are prohibitive for companies with a large C software-base. 341 347 \CFA with its orthogonal feature-set, its high-performance runtime, and direct access to all existing C libraries circumvents these problems. … … 367 373 \section{Stateful Function} 368 374 369 The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, \eg ,plugin, device driver, finite-state machine.375 The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, \eg plugin, device driver, finite-state machine. 370 376 Hence, a stateful function may not end when it returns to its caller, allowing it to be restarted with the data and execution location present at the point of suspension. 371 377 This capability is accomplished by retaining a data/execution \emph{closure} between invocations. 372 If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, \eg ,suspending outside the generator is prohibited.373 If the closure is variabl y sized, we call it a \emph{coroutine} (or \emph{stackful}), and as the names implies, often implemented with a separate stack with no programming restrictions.378 If the closure is fixed size, we call it a \emph{generator} (or \emph{stackless}), and its control flow is restricted, \eg suspending outside the generator is prohibited. 379 If the closure is variable size, we call it a \emph{coroutine} (or \emph{stackful}), and as the names implies, often implemented with a separate stack with no programming restrictions. 374 380 Hence, refactoring a stackless coroutine may require changing it to stackful. 375 A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, \ie ,resume/suspend operations are remembered through the closure not the stack.381 A foundational property of all \emph{stateful functions} is that resume/suspend \emph{do not} cause incremental stack growth, \ie resume/suspend operations are remembered through the closure not the stack. 376 382 As well, activating a stateful function is \emph{asymmetric} or \emph{symmetric}, identified by resume/suspend (no cycles) and resume/resume (cycles). 377 383 A fixed closure activated by modified call/return is faster than a variable closure activated by context switching. 378 Additionally, any storage management for the closure (especially in unmanaged languages, \ie ,no garbage collection) must also be factored into design and performance.384 Additionally, any storage management for the closure (especially in unmanaged languages, \ie no garbage collection) must also be factored into design and performance. 379 385 Therefore, selecting between stackless and stackful semantics is a tradeoff between programming requirements and performance, where stackless is faster and stackful is more general. 380 386 Note, creation cost is amortized across usage, so activation cost is usually the dominant factor. … … 648 654 \end{center} 649 655 The example takes advantage of resuming a generator in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops. 650 The destructor provides a newline if formatted text ends with a full line.656 The destructor provides a newline, if formatted text ends with a full line. 651 657 Figure~\ref{f:CFormatSim} shows the C implementation of the \CFA input generator with one additional field and the computed @goto@. 652 658 For contrast, Figure~\ref{f:PythonFormatter} shows the equivalent Python format generator with the same properties as the Fibonacci generator. … … 669 675 In contrast, the execution state is large, with one @resume@ and seven @suspend@s. 670 676 Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states are transcribed directly into the programming language rather than using a table-driven approach. 671 Because FSMs can be complex and frequently occur in important domains, direct support of the generator is crucialin a system programming language.677 Because FSMs can be complex and frequently occur in important domains, direct generator support is important in a system programming language. 672 678 673 679 \begin{figure} … … 796 802 This semantics is basically a tail-call optimization, which compilers already perform. 797 803 The example shows the assembly code to undo the generator's entry code before the direct jump. 798 This assembly code depends on what entry code is generated, specifically if there are local variables ,and the level of optimization.804 This assembly code depends on what entry code is generated, specifically if there are local variables and the level of optimization. 799 805 To provide this new calling convention requires a mechanism built into the compiler, which is beyond the scope of \CFA at this time. 800 806 Nevertheless, it is possible to hand generate any symmetric generators for proof of concept and performance testing. … … 2719 2725 Each benchmark experiment is run 31 times. 2720 2726 All omitted tests for other languages are functionally identical to the \CFA tests and available online~\cite{CforallBenchMarks}. 2721 2727 % tar --exclude=.deps --exclude=Makefile --exclude=Makefile.in --exclude=c.c --exclude=cxx.cpp --exclude=fetch_add.c -cvhf benchmark.tar benchmark 2722 2728 2723 2729 \paragraph{Object Creation} … … 2749 2755 \multicolumn{1}{@{}c}{} & \multicolumn{1}{c}{Median} & \multicolumn{1}{c}{Average} & \multicolumn{1}{c@{}}{Std Dev} \\ 2750 2756 \CFA Coroutine Lazy & 14.3 & 14.3 & 0.32 \\ 2751 \CFA Coroutine Eager & 2203.7 & 2205.6 & 26.03\\2757 \CFA Coroutine Eager & 522.8 & 525.3 & 5.81 \\ 2752 2758 \CFA Thread & 1257.8 & 1291.2 & 86.19 \\ 2753 2759 \uC Coroutine & 92.2 & 91.4 & 1.58 \\
Note:
See TracChangeset
for help on using the changeset viewer.