Changeset ca0f061f


Ignore:
Timestamp:
Mar 5, 2019, 1:36:51 PM (5 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, persistent-indexer, pthread-emulation, qualifiedEnum
Children:
97a1544
Parents:
17a1b21
Message:

second introduction update

Location:
doc
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • doc/bibliography/pl.bib

    r17a1b21 rca0f061f  
    11411141}
    11421142
     1143@inproceedings{Tarditi18,
     1144    keywords    = {Checked C},
     1145    contributer = {a3moss@uwaterloo.ca},
     1146    author      = {Tarditi, David and Elliott, Archibald Samuel and Ruef, Andrew and Hicks, Michael},
     1147    title       = {Checked C: Making C Safe by Extension},
     1148    booktitle   = {2018 IEEE Cybersecurity Development (SecDev)},
     1149    publisher   = {IEEE},
     1150    year        = {2018},
     1151    month       = seep,
     1152    pages       = {53-60},
     1153    url         = {https://www.microsoft.com/en-us/research/publication/checkedc-making-c-safe-by-extension/},
     1154}
     1155
    11431156@book{Yourdon79,
    11441157    keywords    = {software engineering},
     
    13011314    journal     = sigplan,
    13021315    year        = 1986,
    1303     month       = oct, volume = 21, number = 10, pages = {19-28},
     1316    month       = oct,
     1317    volume      = 21,
     1318    number      = 10,
     1319    pages       = {19-28},
    13041320    note        = {Object Oriented Programming Workshop}
    13051321}
     
    77917807% Y
    77927808
     7809@article{Boehm12,
     7810    keywords    = {memory model, race condition},
     7811    contributer = {pabuhr@plg},
     7812    author      = {Boehm, Hans-J. and Adve, Sarita V.},
     7813    title       = {You Don'T Know Jack About Shared Variables or Memory Models},
     7814    journal     = cacm,
     7815    volume      = 55,
     7816    number      = 2,
     7817    month       = feb,
     7818    year        = 2012,
     7819    pages       = {48--54},
     7820    publisher   = {ACM},
     7821    address     = {New York, NY, USA},
     7822}
     7823
    77937824% Z
    77947825
  • doc/papers/concurrency/Paper.tex

    r17a1b21 rca0f061f  
    228228}
    229229
    230 \title{\texorpdfstring{Concurrency in \protect\CFA}{Concurrency in Cforall}}
     230\title{\texorpdfstring{Advanced Control-flow in \protect\CFA}{Advanced Control-flow in Cforall}}
    231231
    232232\author[1]{Thierry Delisle}
     
    241241
    242242\abstract[Summary]{
    243 \CFA is a modern, polymorphic, \emph{non-object-oriented}, backwards-compatible extension of the C programming language.
    244 This paper discusses the concurrency and parallelism features in \CFA, and its runtime system.
     243\CFA is a modern, polymorphic, non-object-oriented, backwards-compatible extension of the C programming language.
     244This paper discusses the advanced control-flow features in \CFA, which include concurrency and parallelism, and its supporting runtime system.
    245245These features are created from scratch as ISO C's concurrency is low-level and unimplemented, so C programmers continue to rely on the C pthreads library.
    246 \CFA provides high-level control-flow mechanisms, like coroutines and lightweight (user) threads, and monitors for mutual exclusion and synchronization.
    247 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously}.
     246\CFA provides high-level control-flow mechanisms, like coroutines and user-level threads, and monitors for mutual exclusion and synchronization.
     247A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with all monitor synchronization mechanisms.
    248248All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features.
    249249Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
    250250}%
    251251
    252 \keywords{concurrency, parallelism, coroutines, threads, monitors, runtime, C, \CFA (Cforall)}
     252\keywords{coroutines, concurrency, parallelism, threads, monitors, runtime, C, \CFA (Cforall)}
    253253
    254254
     
    261261\section{Introduction}
    262262
    263 This paper discusses the design of the high-level concurrency and parallelism features in \CFA, and its runtime.
    264 \CFA is a modern, polymorphic, \emph{non-object-oriented}, backwards-compatible extension of the C programming language~\cite{Moss18}.
    265 Within the \CFA framework, new concurrency features were created from scratch.
    266 While ISO \Celeven defines concurrency~\cite[\S~7.26]{C11}, it is largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
    267 Furthermore, \Celeven and pthreads concurrency is simple: create/join threads in a function and a few locks, which is low-level and error prone;
     263This paper discusses the design of advanced, high-level control-flow extensions (especially concurrency and parallelism) in \CFA and its runtime.
     264\CFA is a modern, polymorphic, non-object-oriented\footnote{
     265\CFA has features often associated with object-oriented programming languages, such as constructors, destructors, virtuals and simple inheritance.
     266However, functions \emph{cannot} be nested in structures, so there is no lexical binding between a structure and set of functions (member/method) implemented by an implicit \lstinline@this@ (receiver) parameter.},
     267backwards-compatible extension of the C programming language~\cite{Moss18}.
     268Within the \CFA framework, new control-flow features were created from scratch.
     269ISO \Celeven defines only a subset of the \CFA extensions, and with respect to concurrency~\cite[\S~7.26]{C11}, the features are largely wrappers for a subset of the pthreads library~\cite{Butenhof97,Pthreads}.
     270Furthermore, \Celeven and pthreads concurrency is basic, based on thread fork/join in a function and a few locks, which is low-level and error prone;
    268271no high-level language concurrency features exist.
    269 Interestingly, 8 years since publication of the \Celeven standard, neither gcc-8 nor clang-8 (most recent versions) support \Celeven @threads.h@, indicating little interest in the C concurrency approach.
    270 Finally, while the \Celeven standard does not state a concurrent threading-model, the strong association with pthreads suggests the threading model is kernel-level threading (1:1)~\cite{ThreadModel}.
    271 
    272 There has been a re-interest during the past decade in user-level (M:N, green) threading in new and old programming languages, and providing high-level constructs like coroutines, monitors, tasks, and actors for presenting advanced control-flow.
     272Interestingly, almost a decade after publication of the \Celeven standard, neither gcc-8, clang-8 nor msvc-19 (most recent versions) support the \Celeven include @threads.h@, indicating little interest in the C concurrency approach.
     273Finally, while the \Celeven standard does not state a concurrent threading-model, the historical association with pthreads suggests the threading model is kernel-level threading (1:1)~\cite{ThreadModel}.
     274
     275In contrast, there has been a renewed interest during the past decade in user-level (M:N, green) threading in old and new programming languages.
    273276As multi-core hardware became available in the 1980/90s, both user and kernel threading were examined.
    274277Kernel threading was chosen, largely because of its simplicity and fit with the simpler operating systems and hardware architectures at the time, which gave it a performance advantage~\cite{Drepper03}.
    275 Libraries like pthreads were developed for C and the Solaris operating-system switched from user (JDK 1.1~\cite{JDK1.1}) to kernel threads.
     278Libraries like pthreads were developed for C, and the Solaris operating-system switched from user (JDK 1.1~\cite{JDK1.1}) to kernel threads.
    276279As a result, languages like Java, Scala~\cite{Scala}, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopted the 1:1 kernel-threading model, with a variety of presentation mechanisms.
    277280From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,BoostThreads}, including putting green threads back into Java~\cite{Quasar}.
    278 Because advanced control-flow (including exception handling) is pervasive in a programming language and its runtime, these features must be understood by the language (i.e., not added via a library) to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.
    279 
    280 The main argument for user-level threading is matching the concurrency model with the programming-language style, versus adapting language concurrency to one general approach.
    281 For example, it is possible to provide coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}.
    282 The user-threading approach facilitates a simpler concurrency construction using thread objects and leveraging sequential patterns versus call-backs and events~\cite{vonBehren03}.
    283 As well, user-level threads are lighter weight than kernel threads, so there is less restriction on programming styles that encourage large numbers of threads performing smaller work-units to facilitate load balancing by the runtime~\cite{Verch12}.
    284 User threading is also able to layer multiple concurrency models into a single language (locks, monitors, tasks, actors, futures), so programmers can chose the model that best fits an application.
    285 Finally, it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
    286 Performant user-threading implementations (both time and space) are appearing that are competitive with direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
    287 
    288 Adding advanced control-flow to \CFA is similar to current and future extensions in \CCeleven through to \CCtwenty.
    289 However, we contend the \CFA extensions are demonstrably better than those proposed for \CC.
    290 For example, a unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while integrating this capability with all monitor synchronization mechanisms.
    291 As well, all control-flow features respect the expectations of C programmers, with statically type-safe interfaces that integrate with the \CFA polymorphic type-system and other language features.
    292 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
     281The main argument for user-level threading is that they are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing smaller work-units to facilitate load balancing by the runtime~\cite{Verch12}.
     282As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{vonBehren03}.
     283Finally, performant user-threading implementations (both time and space) are largely competitive with direct kernel-threading implementations, while achieving the programming advantages of high concurrency levels and safety.
     284
     285A further effort over the past decade is the development of language memory-models to deal with the conflict between certain language features and compiler/hardware optimizations.
     286This issue can be rephrased as some features are pervasive (language and runtime) and cannot be safely added via a library to prevent invalidation by sequential optimizations~\cite{Buhr95a,Boehm05}.
     287The consequence is that a language must be cognizant of these features and provide sufficient tools to program around any safety issues.
     288For example, C created the @volatile@ qualifier to provide correct execution for @setjmp@/@logjmp@ (concurrency came later).
     289The simplest solution is to provide a handful of complex qualifiers and functions (e.g., @volatile@ and atomics) allowing programmers to write consistent/race-free programs, often in the sequentially-consistent memory-model~\cite{Boehm12}.
     290
     291While having a sufficient memory-model allows sound libraries to be constructed, writing these libraries can quickly become awkward and error prone, and using these low-level libraries has the same issues.
     292Essentially, using low-level explicit locks is the concurrent equivalent of assembler programming.
     293Just as most assembler programming is replaced with programming in a high-level language, explicit locks can be replaced with high-level concurrency constructs in a programming language.
     294The goal is to get the compiler to check for correct usage and follow any complex coding conventions implicitly.
     295The drawback is that language constructs may preclude certain specialized techniques, therefore introducing inefficiency or inhibiting concurrency.
     296For most concurrent programs, these drawbacks are insignificant in comparison to the speed of composition, and subsequent reliability and maintainability of the high-level concurrent program.
     297(The same is true for high-level programming versus assembler programming.)
     298Only very rarely should it be necessary to drop down to races and/or explicit locks to apply a specialized technique to achieve maximum speed or concurrency.
     299As stated, this observation applies to non-concurrent forms of complex control-flow, like exception handling and coroutines.
     300
     301Adapting the programming language allows matching the control-flow model with the programming-language style, versus adapting to one general (sound) library/paradigm.
     302For example, it is possible to provide exceptions, coroutines, monitors, and tasks as specialized types in an object-oriented language, integrating these constructs to allow leveraging the type-system (static type-checking) and all other object-oriented capabilities~\cite{uC++}.
     303It is also possible to leverage call/return for blocking communication via new control structures, versus switching to alternative communication paradigms, like channels or message passing.
     304As well, user threading is often a complementary feature, allowing light-weight threading to match with low-cost objects, while hiding the application/kernel boundary.
     305User threading also allows layering of implicit concurrency models (no explicit thread creation), such executors, data-flow, actors, into a single language, so programmers can chose the model that best fits an algorithm.\footnote{
     306All implicit concurrency models have explicit threading in their implementation, and hence, can be build from explicit threading;
     307however, the reverse is seldom true, i.e., given implicit concurrency, e.g., actors, it is virtually impossible to create explicit concurrency, e.g., blocking thread objects.}
     308Finally, with extended language features and user-level threading it is possible to discretely fold locking and non-blocking I/O multiplexing into the language's I/O libraries, so threading implicitly dovetails with the I/O subsystem.
     309
     310\CFA embraces language extensions and user-level threading to provide advanced control-flow and concurrency.
     311We attempt to show the \CFA extensions and runtime are demonstrably better than those proposed for \CC and other concurrent, imperative programming languages.
     312The contributions of this work are:
     313\begin{itemize}
     314\item
     315allowing multiple monitors to be safely acquired \emph{simultaneously} (deadlock free), while seamlessly integrating this capability with all monitor synchronization mechanisms.
     316\item
     317all control-flow features respect the expectations of C programmers, with statically type-safe interfaces that integrate with the \CFA polymorphic type-system and other language features.
     318\item
     319experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
     320\end{itemize}
    293321
    294322\begin{comment}
     
    587615
    588616
    589 \section{Concurrency}
    590 \label{s:Concurrency}
    591 
    592 At its core, concurrency is based on multiple call-stacks and scheduling threads executing on these stacks.
    593 Multiple call stacks (or contexts) and a single thread of execution, called \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}.
    594 In coroutining, the single thread is self-scheduling across the stacks, so execution is deterministic, \ie the execution path from input to output is fixed and predictable.
    595 A \newterm{stackless} coroutine executes on the caller's stack~\cite{Python} but this approach is restrictive, \eg preventing modularization and supporting only iterator/generator-style programming;
    596 a \newterm{stackful} coroutine executes on its own stack, allowing full generality.
    597 Only stackful coroutines are a stepping stone to concurrency.
    598 
    599 The transition to concurrency, even for execution with a single thread and multiple stacks, occurs when coroutines also context switch to a \newterm{scheduling oracle}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}.
    600 Therefore, a minimal concurrency system is possible using coroutines (see Section \ref{coroutine}) in conjunction with a scheduler to decide where to context switch next.
    601 The resulting execution system now follows a cooperative threading-model, called \newterm{non-preemptive scheduling}.
    602 
    603 Because the scheduler is special, it can either be a stackless or stackful coroutine.
    604 For stackless, the scheduler performs scheduling on the stack of the current coroutine and switches directly to the next coroutine, so there is one context switch.
    605 For stackful, the current coroutine switches to the scheduler, which performs scheduling, and it then switches to the next coroutine, so there are two context switches.
    606 A stackful scheduler is often used for simplicity and security.
    607 
    608 Regardless of the approach used, a subset of concurrency related challenges start to appear.
    609 For the complete set of concurrency challenges to occur, the missing feature is \newterm{preemption}, where context switching occurs randomly between any two instructions, often based on a timer interrupt, called \newterm{preemptive scheduling}.
    610 While a scheduler introduces uncertainty in the order of execution, preemption introduces uncertainty about where context switches occur.
    611 Interestingly, uncertainty is necessary for the runtime (operating) system to give the illusion of parallelism on a single processor and increase performance on multiple processors.
    612 The reason is that only the runtime has complete knowledge about resources and how to best utilized them.
    613 However, the introduction of unrestricted non-determinism results in the need for \newterm{mutual exclusion} and \newterm{synchronization} to restrict non-determinism for correctness;
    614 otherwise, it is impossible to write meaningful programs.
    615 Optimal performance in concurrent applications is often obtained by having as much non-determinism as correctness allows.
    616 
    617 An important missing feature in C is threading\footnote{While the C11 standard defines a \protect\lstinline@threads.h@ header, it is minimal and defined as optional.
    618 As such, library support for threading is far from widespread.
    619 At the time of writing the paper, neither \protect\lstinline@gcc@ nor \protect\lstinline@clang@ support \protect\lstinline@threads.h@ in their standard libraries.}.
    620 In modern programming languages, a lack of threading is unacceptable~\cite{Sutter05, Sutter05b}, and therefore existing and new programming languages must have tools for writing efficient concurrent programs to take advantage of parallelism.
    621 As an extension of C, \CFA needs to express these concepts in a way that is as natural as possible to programmers familiar with imperative languages.
    622 Furthermore, because C is a system-level language, programmers expect to choose precisely which features they need and which cost they are willing to pay.
    623 Hence, concurrent programs should be written using high-level mechanisms, and only step down to lower-level mechanisms when performance bottlenecks are encountered.
    624 
    625 
    626 \subsection{Coroutines: A Stepping Stone}\label{coroutine}
    627 
    628 While the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves).
     617\section{Coroutines: A Stepping Stone}\label{coroutine}
     618
     619Advanced controlWhile the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves).
    629620Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed.
    630621Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension.
     
    10941085\end{cquote}
    10951086The combination of these two approaches allows an easy and concise specification to coroutining (and concurrency) for normal users, while more advanced users have tighter control on memory layout and initialization.
     1087
     1088
     1089\section{Concurrency}
     1090\label{s:Concurrency}
     1091
     1092At its core, concurrency is based on multiple call-stacks and scheduling threads executing on these stacks.
     1093Multiple call stacks (or contexts) and a single thread of execution, called \newterm{coroutining}~\cite{Conway63,Marlin80}, does \emph{not} imply concurrency~\cite[\S~2]{Buhr05a}.
     1094In coroutining, the single thread is self-scheduling across the stacks, so execution is deterministic, \ie the execution path from input to output is fixed and predictable.
     1095A \newterm{stackless} coroutine executes on the caller's stack~\cite{Python} but this approach is restrictive, \eg preventing modularization and supporting only iterator/generator-style programming;
     1096a \newterm{stackful} coroutine executes on its own stack, allowing full generality.
     1097Only stackful coroutines are a stepping stone to concurrency.
     1098
     1099The transition to concurrency, even for execution with a single thread and multiple stacks, occurs when coroutines also context switch to a \newterm{scheduling oracle}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}.
     1100Therefore, a minimal concurrency system is possible using coroutines (see Section \ref{coroutine}) in conjunction with a scheduler to decide where to context switch next.
     1101The resulting execution system now follows a cooperative threading-model, called \newterm{non-preemptive scheduling}.
     1102
     1103Because the scheduler is special, it can either be a stackless or stackful coroutine.
     1104For stackless, the scheduler performs scheduling on the stack of the current coroutine and switches directly to the next coroutine, so there is one context switch.
     1105For stackful, the current coroutine switches to the scheduler, which performs scheduling, and it then switches to the next coroutine, so there are two context switches.
     1106A stackful scheduler is often used for simplicity and security.
     1107
     1108Regardless of the approach used, a subset of concurrency related challenges start to appear.
     1109For the complete set of concurrency challenges to occur, the missing feature is \newterm{preemption}, where context switching occurs randomly between any two instructions, often based on a timer interrupt, called \newterm{preemptive scheduling}.
     1110While a scheduler introduces uncertainty in the order of execution, preemption introduces uncertainty about where context switches occur.
     1111Interestingly, uncertainty is necessary for the runtime (operating) system to give the illusion of parallelism on a single processor and increase performance on multiple processors.
     1112The reason is that only the runtime has complete knowledge about resources and how to best utilized them.
     1113However, the introduction of unrestricted non-determinism results in the need for \newterm{mutual exclusion} and \newterm{synchronization} to restrict non-determinism for correctness;
     1114otherwise, it is impossible to write meaningful programs.
     1115Optimal performance in concurrent applications is often obtained by having as much non-determinism as correctness allows.
     1116
     1117An important missing feature in C is threading\footnote{While the C11 standard defines a \protect\lstinline@threads.h@ header, it is minimal and defined as optional.
     1118As such, library support for threading is far from widespread.
     1119At the time of writing the paper, neither \protect\lstinline@gcc@ nor \protect\lstinline@clang@ support \protect\lstinline@threads.h@ in their standard libraries.}.
     1120In modern programming languages, a lack of threading is unacceptable~\cite{Sutter05, Sutter05b}, and therefore existing and new programming languages must have tools for writing efficient concurrent programs to take advantage of parallelism.
     1121As an extension of C, \CFA needs to express these concepts in a way that is as natural as possible to programmers familiar with imperative languages.
     1122Furthermore, because C is a system-level language, programmers expect to choose precisely which features they need and which cost they are willing to pay.
     1123Hence, concurrent programs should be written using high-level mechanisms, and only step down to lower-level mechanisms when performance bottlenecks are encountered.
    10961124
    10971125
Note: See TracChangeset for help on using the changeset viewer.