Changeset ff98952 for doc/proposals/concurrency/text
- Timestamp:
- May 29, 2017, 3:08:17 PM (7 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 4a368547
- Parents:
- 27dde72
- Location:
- doc/proposals/concurrency/text
- Files:
-
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/proposals/concurrency/text/basics.tex
r27dde72 rff98952 13 13 One of the important features that is missing to C is threading. On modern architectures, the lack of threading is becoming less and less forgivable\cite{Sutter05, Sutter05b} and therefore any modern programming language should have the proper tools to allow users to write performant concurrent and/or parallel programs. As an extension of C, \CFA needs to express these concepts an a way that is as natural as possible to programmers used to imperative languages. And being a system level language means programmers will expect to be able to choose precisely which features they need and which cost they are willing to pay. 14 14 15 \section{Coroutines : A stepping stone}\label{coroutine} 16 While the main focus of this proposal is concurrency and parallelism, as mentionned above it is important to adress coroutines which are actually a significant underlying aspect of the concurrency system. Indeed, while having nothing todo with parallelism and arguably little to do with concurrency, coroutines need to deal with context-switchs and and other context management operations. Therefore, this proposal includes coroutines both as an intermediate step for the implementation of threads and a first class feature of \CFA. Furthermore, many design challenges of threads are at least partially present in designing coroutines, which makes the design effort that much more relevant. 17 18 The core API of coroutines revolve around two features : independent call stacks and \code{suspend}/\code{resume}. 19 Here is an example of a solution to the fibonnaci problem using \CFA coroutines : 15 \section{Coroutines A stepping stone}\label{coroutine} 16 While the main focus of this proposal is concurrency and parallelism, as mentionned above it is important to adress coroutines which are actually a significant underlying aspect of the concurrency system. Indeed, while having nothing todo with parallelism and arguably little to do with concurrency, coroutines need to deal with context-switchs and and other context management operations. Therefore, this proposal includes coroutines both as an intermediate step for the implementation of threads and a first class feature of \CFA. Furthermore, many design challenges of threads are at least partially present in designing coroutines, which makes the design effort that much more relevant. The core API of coroutines revolve around two features independent call stacks and \code{suspend}/\code{resume}. 17 18 Here is an example of a solution to the fibonnaci problem using \CFA coroutines: 20 19 \begin{cfacode} 21 20 coroutine Fibonacci { … … 64 63 The runtime system needs to create the coroutine's stack and more importantly prepare it for the first resumption. The timing of the creation is non trivial since users both expect to have fully constructed objects once execution enters the coroutine main and to be able to resume the coroutine from the constructor (Obviously we only solve cases where these two statements don't conflict). There are several solutions to this problem but the chosen options effectively forces the design of the coroutine. 65 64 66 Furthermore, \CFA faces an extra challenge which is that polymorphique routines rely on invisible thunks when casted to non-polymorphic routines and these thunks have function scope. For example, the following code, while looking benign, can run into undefined behaviour because of thunks 65 Furthermore, \CFA faces an extra challenge which is that polymorphique routines rely on invisible thunks when casted to non-polymorphic routines and these thunks have function scope. For example, the following code, while looking benign, can run into undefined behaviour because of thunks: 67 66 68 67 \begin{cfacode} … … 79 78 } 80 79 \end{cfacode} 81 Indeed, the generated C code\footnote{Code trimmed down for brevity} shows that a local thunk is created in order to hold type information 80 Indeed, the generated C code\footnote{Code trimmed down for brevity} shows that a local thunk is created in order to hold type information: 82 81 83 82 \begin{ccode} … … 116 115 117 116 \subsection{Alternative: Reserved keyword} 118 The next alternative is to use language support to annotate coroutines as follows 117 The next alternative is to use language support to annotate coroutines as follows: 119 118 120 119 \begin{cfacode} … … 129 128 \subsection{Alternative: Lamda Objects} 130 129 131 For coroutines as for threads, many implementations are based on routine pointers or function objects\cit. For example, Boost implements coroutines in terms of four functor object types :\code{asymmetric_coroutine<>::pull_type}, \code{asymmetric_coroutine<>::push_type}, \code{symmetric_coroutine<>::call_type}, \code{symmetric_coroutine<>::yield_type}. Often, the canonical threading paradigm in languages is based on function pointers, pthread being one of the most well known example. The main problem of these approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type. Since the custom type is simple to write and \CFA and solves several issues, added support for routine/lambda based coroutines adds very little.130 For coroutines as for threads, many implementations are based on routine pointers or function objects\cit. For example, Boost implements coroutines in terms of four functor object types \code{asymmetric_coroutine<>::pull_type}, \code{asymmetric_coroutine<>::push_type}, \code{symmetric_coroutine<>::call_type}, \code{symmetric_coroutine<>::yield_type}. Often, the canonical threading paradigm in languages is based on function pointers, pthread being one of the most well known example. The main problem of these approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type. Since the custom type is simple to write and \CFA and solves several issues, added support for routine/lambda based coroutines adds very little. 132 131 133 132 \subsection{Trait based coroutines} … … 146 145 147 146 \section{Thread Interface}\label{threads} 148 The basic building blocks of multi-threading in \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread}, and as such, offer a flexible and lightweight threading interface (lightweight compared to \glspl{kthread}). A thread can be declared using a SUE declaration \code{thread} as follows 147 The basic building blocks of multi-threading in \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread}, and as such, offer a flexible and lightweight threading interface (lightweight compared to \glspl{kthread}). A thread can be declared using a SUE declaration \code{thread} as follows: 149 148 150 149 \begin{cfacode} … … 152 151 \end{cfacode} 153 152 154 Like for coroutines, the keyword is a thin wrapper arount a \CFA trait 153 Like for coroutines, the keyword is a thin wrapper arount a \CFA trait: 155 154 156 155 \begin{cfacode} … … 162 161 \end{cfacode} 163 162 164 Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use a function-pointer representation as the interface of threads (for example : \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, this proposal considers that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is already a special routine in \CFA (where the program begins), it is possible naturally extend the semantics using overloading to declare mains for different threads (the normal main being the main of the initial thread). As such the \code{main} routine of a thread can be defined as :163 Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use a function-pointer representation as the interface of threads (for example \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, this proposal considers that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is already a special routine in \CFA (where the program begins), it is possible naturally extend the semantics using overloading to declare mains for different threads (the normal main being the main of the initial thread). As such the \code{main} routine of a thread can be defined as 165 164 \begin{cfacode} 166 165 thread foo {}; … … 171 170 \end{cfacode} 172 171 173 In this example, threads of type \code{foo} will start there execution in the \code{void main(foo*)} routine which in this case prints \code{"Hello World!"}. While this proposoal encourages this approach which enforces strongly type programming, users may prefer to use the routine based thread semantics for the sake of simplicity. With these semantics it is trivial to write a thread type that takes a function pointer as parameter and executes it on its stack asynchronously :172 In this example, threads of type \code{foo} will start there execution in the \code{void main(foo*)} routine which in this case prints \code{"Hello World!"}. While this proposoal encourages this approach which enforces strongly type programming, users may prefer to use the routine based thread semantics for the sake of simplicity. With these semantics it is trivial to write a thread type that takes a function pointer as parameter and executes it on its stack asynchronously 174 173 \begin{cfacode} 175 174 typedef void (*voidFunc)(void); … … 211 210 \end{cfacode} 212 211 213 This semantic has several advantages over explicit semantics : typesafety is guaranteed, a thread is always started and stopped exaclty once and users cannot make any progamming errors. However, one of the apparent drawbacks of this system is that threads now always form a lattice, that is they are always destroyed in opposite order of construction. While this seems like a significant limitation, existing \CFA semantics can solve this problem. Indeed, using dynamic allocation to create threads will naturally let threads outlive the scope in which the thread was created much like dynamically allocating memory will let objects outlive the scope in which thy were created :212 This semantic has several advantages over explicit semantics typesafety is guaranteed, a thread is always started and stopped exaclty once and users cannot make any progamming errors. However, one of the apparent drawbacks of this system is that threads now always form a lattice, that is they are always destroyed in opposite order of construction. While this seems like a significant limitation, existing \CFA semantics can solve this problem. Indeed, using dynamic allocation to create threads will naturally let threads outlive the scope in which the thread was created much like dynamically allocating memory will let objects outlive the scope in which thy were created 214 213 215 214 \begin{cfacode} … … 243 242 \end{cfacode} 244 243 245 Another advantage of this semantic is that it naturally scale to multiple threads meaning basic synchronisation is very simple :244 Another advantage of this semantic is that it naturally scale to multiple threads meaning basic synchronisation is very simple 246 245 247 246 \begin{cfacode} -
doc/proposals/concurrency/text/concurrency.tex
r27dde72 rff98952 387 387 \end{pseudo} 388 388 \end{multicols} 389 \begin{center} 390 Listing 1 391 \end{center} 389 392 390 393 It is particularly important to pay attention to code sections 8 and 3, which are where the existing semantics of internal scheduling need to be extended for multiple monitors. The root of the problem is that \gls{group-acquire} is used in a context where one of the monitors is already acquired and is why it is important to define the behaviour of the previous pseudo-code. When the signaller thread reaches the location where it should "release A \& B" (line 17), it must actually transfer ownership of monitor B to the waiting thread. This ownership trasnfer is required in order to prevent barging. Since the signalling thread still needs the monitor A, simply waking up the waiting thread is not an option because it would violate mutual exclusion. We are therefore left with three options: … … 416 419 \end{multicols} 417 420 However, this solution can become much more complicated depending on what is executed while secretly holding B. Indeed, nothing prevents a user from signalling monitor A on a different condition variable: 421 \newpage 418 422 \begin{multicols}{2} 419 423 Thread 1 420 \begin{pseudo} 424 \begin{pseudo}[numbers=left, firstnumber=1] 421 425 acquire A 422 426 acquire A & B … … 427 431 428 432 Thread 2 429 \begin{pseudo} 433 \begin{pseudo}[numbers=left, firstnumber=6] 430 434 acquire A 431 435 wait A … … 436 440 437 441 Thread 3 438 \begin{pseudo} 442 \begin{pseudo}[numbers=left, firstnumber=10] 439 443 acquire A 440 444 acquire A & B … … 449 453 \end{multicols} 450 454 451 The goal in this solution is to avoid the need to transfer ownership of a subset of the condition monitors. However, this goal is unreacheable in the previous example since \TODO 455 The goal in this solution is to avoid the need to transfer ownership of a subset of the condition monitors. However, this goal is unreacheable in the previous example. Depending on the order of signals (line 12 and 15) two cases can happen. Note that ordering is not determined by a race condition but by whether signalled threads are enqueued in FIFO or FILO order. However, regardless of the answer, users can move line 15 before line 11 and get the reverse effect. 456 457 \paragraph{Case 1: thread 1 will go first.} In this case, the problem is that monitor A needs to be passed to thread 2 when thread 1 is done with it. 458 \paragraph{Case 2: thread 2 will go first.} In this case, the problem is that monitor B needs to be passed to thread 1. This can be done directly or using thread 2 as an intermediate. 459 \\ 460 461 In both cases however, the threads need to be able to distinguish on a per monitor basis which ones need to be released and which ones need to be transferred. Which means monitors cannot be handled as a single homogenous group. 452 462 453 463 \subsubsection{Dependency graphs} 454 In the previouspseudo-code, there is a solution which would statisfy both barging prevention and mutual exclusion. If ownership of both monitors is transferred to the waiter when the signaller releases A and then the waiter transfers back ownership of A when it releases it then the problem is solved. Dynamically finding the correct order is therefore the second possible solution. The problem it encounters is that it effectively boils down to resolving a dependency graph of ownership requirements. Here even the simplest of code snippets requires two transfers and it seems to increase in a manner closer to polynomial. For example the following code which is just a direct extension to three monitors requires at least three ownership transfer and has multiple solutions:464 In the Listing 1 pseudo-code, there is a solution which would statisfy both barging prevention and mutual exclusion. If ownership of both monitors is transferred to the waiter when the signaller releases A and then the waiter transfers back ownership of A when it releases it then the problem is solved. Dynamically finding the correct order is therefore the second possible solution. The problem it encounters is that it effectively boils down to resolving a dependency graph of ownership requirements. Here even the simplest of code snippets requires two transfers and it seems to increase in a manner closer to polynomial. For example the following code which is just a direct extension to three monitors requires at least three ownership transfer and has multiple solutions: 455 465 456 466 \begin{multicols}{2}
Note: See TracChangeset
for help on using the changeset viewer.