Changeset ff98952


Ignore:
Timestamp:
May 29, 2017, 3:08:17 PM (4 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc
Children:
4a368547
Parents:
27dde72
Message:

Updated draft up to 3.4

Location:
doc/proposals/concurrency
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/concurrency/.gitignore

    r27dde72 rff98952  
    1919build/*.toc
    2020*.pdf
     21
     22examples
  • doc/proposals/concurrency/Makefile

    r27dde72 rff98952  
    6464        build/*.tex     \
    6565        build/*.toc     \
    66                
     66
    6767
    6868# File Dependencies #
     
    8181        @ echo "Citation Pass 1"
    8282        @ -${BibTeX} ${basename $@}                                                                                     # Some citations reference others so run steps again to resolve these citations
    83         @ echo "Citation Pass 2"       
     83        @ echo "Citation Pass 2"
    8484        @ ${LaTeX} ${basename ${notdir $@}}.tex
    8585        @ -${BibTeX} ${basename $@}
    86         @ echo "Glossary"       
     86        @ echo "Glossary"
    8787        makeglossaries -q -s ${basename $@}.ist ${basename $@}                                          # Make index from *.aux entries and input index at end of document
    88         @ echo ".dvi generation"       
     88        @ echo ".dvi generation"
    8989        @ -build/bump_ver.sh
    9090        @ ${LaTeX} ${basename ${notdir $@}}.tex                                                                 # Run again to get index title into table of contents
  • doc/proposals/concurrency/text/basics.tex

    r27dde72 rff98952  
    1313One of the important features that is missing to C is threading. On modern architectures, the lack of threading is becoming less and less forgivable\cite{Sutter05, Sutter05b} and therefore any modern programming language should have the proper tools to allow users to write performant concurrent and/or parallel programs. As an extension of C, \CFA needs to express these concepts an a way that is as natural as possible to programmers used to imperative languages. And being a system level language means programmers will expect to be able to choose precisely which features they need and which cost they are willing to pay.
    1414
    15 \section{Coroutines : A stepping stone}\label{coroutine}
    16 While the main focus of this proposal is concurrency and parallelism, as mentionned above it is important to adress coroutines which are actually a significant underlying aspect of the concurrency system. Indeed, while having nothing todo with parallelism and arguably little to do with concurrency, coroutines need to deal with context-switchs and and other context management operations. Therefore, this proposal includes coroutines both as an intermediate step for the implementation of threads and a first class feature of \CFA. Furthermore, many design challenges of threads are at least partially present in designing coroutines, which makes the design effort that much more relevant.
    17 
    18 The core API of coroutines revolve around two features : independent call stacks and \code{suspend}/\code{resume}.
    19 Here is an example of a solution to the fibonnaci problem using \CFA coroutines :
     15\section{Coroutines A stepping stone}\label{coroutine}
     16While the main focus of this proposal is concurrency and parallelism, as mentionned above it is important to adress coroutines which are actually a significant underlying aspect of the concurrency system. Indeed, while having nothing todo with parallelism and arguably little to do with concurrency, coroutines need to deal with context-switchs and and other context management operations. Therefore, this proposal includes coroutines both as an intermediate step for the implementation of threads and a first class feature of \CFA. Furthermore, many design challenges of threads are at least partially present in designing coroutines, which makes the design effort that much more relevant. The core API of coroutines revolve around two features independent call stacks and \code{suspend}/\code{resume}.
     17
     18Here is an example of a solution to the fibonnaci problem using \CFA coroutines:
    2019\begin{cfacode}
    2120        coroutine Fibonacci {
     
    6463The runtime system needs to create the coroutine's stack and more importantly prepare it for the first resumption. The timing of the creation is non trivial since users both expect to have fully constructed objects once execution enters the coroutine main and to be able to resume the coroutine from the constructor (Obviously we only solve cases where these two statements don't conflict). There are several solutions to this problem but the chosen options effectively forces the design of the coroutine.
    6564
    66 Furthermore, \CFA faces an extra challenge which is that polymorphique routines rely on invisible thunks when casted to non-polymorphic routines and these thunks have function scope. For example, the following code, while looking benign, can run into undefined behaviour because of thunks :
     65Furthermore, \CFA faces an extra challenge which is that polymorphique routines rely on invisible thunks when casted to non-polymorphic routines and these thunks have function scope. For example, the following code, while looking benign, can run into undefined behaviour because of thunks:
    6766
    6867\begin{cfacode}
     
    7978}
    8079\end{cfacode}
    81 Indeed, the generated C code\footnote{Code trimmed down for brevity} shows that a local thunk is created in order to hold type information :
     80Indeed, the generated C code\footnote{Code trimmed down for brevity} shows that a local thunk is created in order to hold type information:
    8281
    8382\begin{ccode}
     
    116115
    117116\subsection{Alternative: Reserved keyword}
    118 The next alternative is to use language support to annotate coroutines as follows :
     117The next alternative is to use language support to annotate coroutines as follows:
    119118
    120119\begin{cfacode}
     
    129128\subsection{Alternative: Lamda Objects}
    130129
    131 For coroutines as for threads, many implementations are based on routine pointers or function objects\cit. For example, Boost implements coroutines in terms of four functor object types : \code{asymmetric_coroutine<>::pull_type}, \code{asymmetric_coroutine<>::push_type}, \code{symmetric_coroutine<>::call_type}, \code{symmetric_coroutine<>::yield_type}. Often, the canonical threading paradigm in languages is based on function pointers, pthread being one of the most well known example. The main problem of these approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type. Since the custom type is simple to write and \CFA and solves several issues, added support for routine/lambda based coroutines adds very little.
     130For coroutines as for threads, many implementations are based on routine pointers or function objects\cit. For example, Boost implements coroutines in terms of four functor object types \code{asymmetric_coroutine<>::pull_type}, \code{asymmetric_coroutine<>::push_type}, \code{symmetric_coroutine<>::call_type}, \code{symmetric_coroutine<>::yield_type}. Often, the canonical threading paradigm in languages is based on function pointers, pthread being one of the most well known example. The main problem of these approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type. Since the custom type is simple to write and \CFA and solves several issues, added support for routine/lambda based coroutines adds very little.
    132131
    133132\subsection{Trait based coroutines}
     
    146145
    147146\section{Thread Interface}\label{threads}
    148 The basic building blocks of multi-threading in \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread}, and as such, offer a flexible and lightweight threading interface (lightweight compared to \glspl{kthread}). A thread can be declared using a SUE declaration \code{thread} as follows :
     147The basic building blocks of multi-threading in \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread}, and as such, offer a flexible and lightweight threading interface (lightweight compared to \glspl{kthread}). A thread can be declared using a SUE declaration \code{thread} as follows:
    149148
    150149\begin{cfacode}
     
    152151\end{cfacode}
    153152
    154 Like for coroutines, the keyword is a thin wrapper arount a \CFA trait :
     153Like for coroutines, the keyword is a thin wrapper arount a \CFA trait:
    155154
    156155\begin{cfacode}
     
    162161\end{cfacode}
    163162
    164 Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use a function-pointer representation as the interface of threads (for example : \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, this proposal considers that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is already a special routine in \CFA (where the program begins), it is possible naturally extend the semantics using overloading to declare mains for different threads (the normal main being the main of the initial thread). As such the \code{main} routine of a thread can be defined as :
     163Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use a function-pointer representation as the interface of threads (for example \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, this proposal considers that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is already a special routine in \CFA (where the program begins), it is possible naturally extend the semantics using overloading to declare mains for different threads (the normal main being the main of the initial thread). As such the \code{main} routine of a thread can be defined as
    165164\begin{cfacode}
    166165        thread foo {};
     
    171170\end{cfacode}
    172171
    173 In this example, threads of type \code{foo} will start there execution in the \code{void main(foo*)} routine which in this case prints \code{"Hello World!"}. While this proposoal encourages this approach which enforces strongly type programming, users may prefer to use the routine based thread semantics for the sake of simplicity. With these semantics it is trivial to write a thread type that takes a function pointer as parameter and executes it on its stack asynchronously :
     172In this example, threads of type \code{foo} will start there execution in the \code{void main(foo*)} routine which in this case prints \code{"Hello World!"}. While this proposoal encourages this approach which enforces strongly type programming, users may prefer to use the routine based thread semantics for the sake of simplicity. With these semantics it is trivial to write a thread type that takes a function pointer as parameter and executes it on its stack asynchronously
    174173\begin{cfacode}
    175174        typedef void (*voidFunc)(void);
     
    211210\end{cfacode}
    212211
    213 This semantic has several advantages over explicit semantics : typesafety is guaranteed, a thread is always started and stopped exaclty once and users cannot make any progamming errors. However, one of the apparent drawbacks of this system is that threads now always form a lattice, that is they are always destroyed in opposite order of construction. While this seems like a significant limitation, existing \CFA semantics can solve this problem. Indeed, using dynamic allocation to create threads will naturally let threads outlive the scope in which the thread was created much like dynamically allocating memory will let objects outlive the scope in which thy were created :
     212This semantic has several advantages over explicit semantics typesafety is guaranteed, a thread is always started and stopped exaclty once and users cannot make any progamming errors. However, one of the apparent drawbacks of this system is that threads now always form a lattice, that is they are always destroyed in opposite order of construction. While this seems like a significant limitation, existing \CFA semantics can solve this problem. Indeed, using dynamic allocation to create threads will naturally let threads outlive the scope in which the thread was created much like dynamically allocating memory will let objects outlive the scope in which thy were created
    214213
    215214\begin{cfacode}
     
    243242\end{cfacode}
    244243
    245 Another advantage of this semantic is that it naturally scale to multiple threads meaning basic synchronisation is very simple :
     244Another advantage of this semantic is that it naturally scale to multiple threads meaning basic synchronisation is very simple
    246245
    247246\begin{cfacode}
  • doc/proposals/concurrency/text/concurrency.tex

    r27dde72 rff98952  
    387387\end{pseudo}
    388388\end{multicols}
     389\begin{center}
     390Listing 1
     391\end{center}
    389392
    390393It is particularly important to pay attention to code sections 8 and 3, which are where the existing semantics of internal scheduling need to be extended for multiple monitors. The root of the problem is that \gls{group-acquire} is used in a context where one of the monitors is already acquired and is why it is important to define the behaviour of the previous pseudo-code. When the signaller thread reaches the location where it should "release A \& B" (line 17), it must actually transfer ownership of monitor B to the waiting thread. This ownership trasnfer is required in order to prevent barging. Since the signalling thread still needs the monitor A, simply waking up the waiting thread is not an option because it would violate mutual exclusion. We are therefore left with three options:
     
    416419\end{multicols}
    417420However, this solution can become much more complicated depending on what is executed while secretly holding B. Indeed, nothing prevents a user from signalling monitor A on a different condition variable:
     421\newpage
    418422\begin{multicols}{2}
    419423Thread 1
    420 \begin{pseudo}
     424\begin{pseudo}[numbers=left, firstnumber=1]
    421425acquire A
    422426        acquire A & B
     
    427431
    428432Thread 2
    429 \begin{pseudo}
     433\begin{pseudo}[numbers=left, firstnumber=6]
    430434acquire A
    431435        wait A
     
    436440
    437441Thread 3
    438 \begin{pseudo}
     442\begin{pseudo}[numbers=left, firstnumber=10]
    439443acquire A
    440444        acquire A & B
     
    449453\end{multicols}
    450454
    451 The goal in this solution is to avoid the need to transfer ownership of a subset of the condition monitors. However, this goal is unreacheable in the previous example since \TODO
     455The goal in this solution is to avoid the need to transfer ownership of a subset of the condition monitors. However, this goal is unreacheable in the previous example. Depending on the order of signals (line 12 and 15) two cases can happen. Note that ordering is not determined by a race condition but by whether signalled threads are enqueued in FIFO or FILO order. However, regardless of the answer, users can move line 15 before line 11 and get the reverse effect.
     456
     457\paragraph{Case 1: thread 1 will go first.} In this case, the problem is that monitor A needs to be passed to thread 2 when thread 1 is done with it.
     458\paragraph{Case 2: thread 2 will go first.} In this case, the problem is that monitor B needs to be passed to thread 1. This can be done directly or using thread 2 as an intermediate.
     459\\
     460
     461In both cases however, the threads need to be able to distinguish on a per monitor basis which ones need to be released and which ones need to be transferred. Which means monitors cannot be handled as a single homogenous group.
    452462
    453463\subsubsection{Dependency graphs}
    454 In the previous pseudo-code, there is a solution which would statisfy both barging prevention and mutual exclusion. If ownership of both monitors is transferred to the waiter when the signaller releases A and then the waiter transfers back ownership of A when it releases it then the problem is solved. Dynamically finding the correct order is therefore the second possible solution. The problem it encounters is that it effectively boils down to resolving a dependency graph of ownership requirements. Here even the simplest of code snippets requires two transfers and it seems to increase in a manner closer to polynomial. For example the following code which is just a direct extension to three monitors requires at least three ownership transfer and has multiple solutions:
     464In the Listing 1 pseudo-code, there is a solution which would statisfy both barging prevention and mutual exclusion. If ownership of both monitors is transferred to the waiter when the signaller releases A and then the waiter transfers back ownership of A when it releases it then the problem is solved. Dynamically finding the correct order is therefore the second possible solution. The problem it encounters is that it effectively boils down to resolving a dependency graph of ownership requirements. Here even the simplest of code snippets requires two transfers and it seems to increase in a manner closer to polynomial. For example the following code which is just a direct extension to three monitors requires at least three ownership transfer and has multiple solutions:
    455465
    456466\begin{multicols}{2}
Note: See TracChangeset for help on using the changeset viewer.