Context Navigation

← Previous Changeset
Next Changeset →

Changeset ff98952

Timestamp:

May 29, 2017, 3:08:17 PM (9 years ago)

Author:

Thierry Delisle <tdelisle@…>

Branches:

ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, stuck-waitfor-destruct, with_gc

Children:

Parents:

Message:

Updated draft up to 3.4

Location:

doc/proposals/concurrency

Files:

: 4 edited

.gitignore (modified) (1 diff)
Makefile (modified) (2 diffs)
text/basics.tex (modified) (11 diffs)
text/concurrency.tex (modified) (5 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/proposals/concurrency/.gitignore

r27dde72	rff98952
19	19	build/*.toc
20	20	*.pdf
	21
	22	examples

doc/proposals/concurrency/Makefile

-              r27dde72
+              rff98952
         build/*.tex     \
         build/*.toc     \
 # File Dependencies #
 …
         @ echo "Citation Pass 1"
         @ -${BibTeX} ${basename $@}                                                                                     # Some citations reference others so run steps again to resolve these citations
         @ echo "Citation Pass 2"
+        @ echo "Citation Pass 2"
         @ ${LaTeX} ${basename ${notdir $@}}.tex
         @ -${BibTeX} ${basename $@}
         @ echo "Glossary"
+        @ echo "Glossary"
         makeglossaries -q -s ${basename $@}.ist ${basename $@}                                          # Make index from *.aux entries and input index at end of document
         @ echo ".dvi generation"
+        @ echo ".dvi generation"
         @ -build/bump_ver.sh
         @ ${LaTeX} ${basename ${notdir $@}}.tex                                                                 # Run again to get index title into table of contents

doc/proposals/concurrency/text/basics.tex

-              r27dde72
+              rff98952
 One of the important features that is missing to C is threading. On modern architectures, the lack of threading is becoming less and less forgivable\cite{Sutter05, Sutter05b} and therefore any modern programming language should have the proper tools to allow users to write performant concurrent and/or parallel programs. As an extension of C, \CFA needs to express these concepts an a way that is as natural as possible to programmers used to imperative languages. And being a system level language means programmers will expect to be able to choose precisely which features they need and which cost they are willing to pay.
+\section{Coroutines : A stepping stone}\label{coroutine}
+While the main focus of this proposal is concurrency and parallelism, as mentionned above it is important to adress coroutines which are actually a significant underlying aspect of the concurrency system. Indeed, while having nothing todo with parallelism and arguably little to do with concurrency, coroutines need to deal with context-switchs and and other context management operations. Therefore, this proposal includes coroutines both as an intermediate step for the implementation of threads and a first class feature of \CFA. Furthermore, many design challenges of threads are at least partially present in designing coroutines, which makes the design effort that much more relevant.
+The core API of coroutines revolve around two features : independent call stacks and \code{suspend}/\code{resume}.
+Here is an example of a solution to the fibonnaci problem using \CFA coroutines :
+\section{Coroutines A stepping stone}\label{coroutine}
+While the main focus of this proposal is concurrency and parallelism, as mentionned above it is important to adress coroutines which are actually a significant underlying aspect of the concurrency system. Indeed, while having nothing todo with parallelism and arguably little to do with concurrency, coroutines need to deal with context-switchs and and other context management operations. Therefore, this proposal includes coroutines both as an intermediate step for the implementation of threads and a first class feature of \CFA. Furthermore, many design challenges of threads are at least partially present in designing coroutines, which makes the design effort that much more relevant. The core API of coroutines revolve around two features independent call stacks and \code{suspend}/\code{resume}.
+Here is an example of a solution to the fibonnaci problem using \CFA coroutines:
 \begin{cfacode}
         coroutine Fibonacci {
 …
 The runtime system needs to create the coroutine's stack and more importantly prepare it for the first resumption. The timing of the creation is non trivial since users both expect to have fully constructed objects once execution enters the coroutine main and to be able to resume the coroutine from the constructor (Obviously we only solve cases where these two statements don't conflict). There are several solutions to this problem but the chosen options effectively forces the design of the coroutine.
 Furthermore, \CFA faces an extra challenge which is that polymorphique routines rely on invisible thunks when casted to non-polymorphic routines and these thunks have function scope. For example, the following code, while looking benign, can run into undefined behaviour because of thunks :
+Furthermore, \CFA faces an extra challenge which is that polymorphique routines rely on invisible thunks when casted to non-polymorphic routines and these thunks have function scope. For example, the following code, while looking benign, can run into undefined behaviour because of thunks:
 \begin{cfacode}
 …
+}
 \end{cfacode}
 Indeed, the generated C code\footnote{Code trimmed down for brevity} shows that a local thunk is created in order to hold type information :
+Indeed, the generated C code\footnote{Code trimmed down for brevity} shows that a local thunk is created in order to hold type information:
 \begin{ccode}
 …
 \subsection{Alternative: Reserved keyword}
 The next alternative is to use language support to annotate coroutines as follows :
+The next alternative is to use language support to annotate coroutines as follows:
 \begin{cfacode}
 …
 \subsection{Alternative: Lamda Objects}
 For coroutines as for threads, many implementations are based on routine pointers or function objects\cit. For example, Boost implements coroutines in terms of four functor object types : \code{asymmetric_coroutine<>::pull_type}, \code{asymmetric_coroutine<>::push_type}, \code{symmetric_coroutine<>::call_type}, \code{symmetric_coroutine<>::yield_type}. Often, the canonical threading paradigm in languages is based on function pointers, pthread being one of the most well known example. The main problem of these approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type. Since the custom type is simple to write and \CFA and solves several issues, added support for routine/lambda based coroutines adds very little.
+For coroutines as for threads, many implementations are based on routine pointers or function objects\cit. For example, Boost implements coroutines in terms of four functor object types \code{asymmetric_coroutine<>::pull_type}, \code{asymmetric_coroutine<>::push_type}, \code{symmetric_coroutine<>::call_type}, \code{symmetric_coroutine<>::yield_type}. Often, the canonical threading paradigm in languages is based on function pointers, pthread being one of the most well known example. The main problem of these approach is that the thread usage is limited to a generic handle that must otherwise be wrapped in a custom type. Since the custom type is simple to write and \CFA and solves several issues, added support for routine/lambda based coroutines adds very little.
 \subsection{Trait based coroutines}
 …
 \section{Thread Interface}\label{threads}
 The basic building blocks of multi-threading in \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread}, and as such, offer a flexible and lightweight threading interface (lightweight compared to \glspl{kthread}). A thread can be declared using a SUE declaration \code{thread} as follows :
+The basic building blocks of multi-threading in \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread}, and as such, offer a flexible and lightweight threading interface (lightweight compared to \glspl{kthread}). A thread can be declared using a SUE declaration \code{thread} as follows:
 \begin{cfacode}
 …
 \end{cfacode}
 Like for coroutines, the keyword is a thin wrapper arount a \CFA trait :
+Like for coroutines, the keyword is a thin wrapper arount a \CFA trait:
 \begin{cfacode}
 …
 \end{cfacode}
 Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use a function-pointer representation as the interface of threads (for example : \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, this proposal considers that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is already a special routine in \CFA (where the program begins), it is possible naturally extend the semantics using overloading to declare mains for different threads (the normal main being the main of the initial thread). As such the \code{main} routine of a thread can be defined as :
+Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use a function-pointer representation as the interface of threads (for example \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, this proposal considers that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is already a special routine in \CFA (where the program begins), it is possible naturally extend the semantics using overloading to declare mains for different threads (the normal main being the main of the initial thread). As such the \code{main} routine of a thread can be defined as
 \begin{cfacode}
         thread foo {};
 …
 \end{cfacode}
 In this example, threads of type \code{foo} will start there execution in the \code{void main(foo*)} routine which in this case prints \code{"Hello World!"}. While this proposoal encourages this approach which enforces strongly type programming, users may prefer to use the routine based thread semantics for the sake of simplicity. With these semantics it is trivial to write a thread type that takes a function pointer as parameter and executes it on its stack asynchronously :
+In this example, threads of type \code{foo} will start there execution in the \code{void main(foo*)} routine which in this case prints \code{"Hello World!"}. While this proposoal encourages this approach which enforces strongly type programming, users may prefer to use the routine based thread semantics for the sake of simplicity. With these semantics it is trivial to write a thread type that takes a function pointer as parameter and executes it on its stack asynchronously
 \begin{cfacode}
         typedef void (*voidFunc)(void);
 …
 \end{cfacode}
 This semantic has several advantages over explicit semantics : typesafety is guaranteed, a thread is always started and stopped exaclty once and users cannot make any progamming errors. However, one of the apparent drawbacks of this system is that threads now always form a lattice, that is they are always destroyed in opposite order of construction. While this seems like a significant limitation, existing \CFA semantics can solve this problem. Indeed, using dynamic allocation to create threads will naturally let threads outlive the scope in which the thread was created much like dynamically allocating memory will let objects outlive the scope in which thy were created :
+This semantic has several advantages over explicit semantics typesafety is guaranteed, a thread is always started and stopped exaclty once and users cannot make any progamming errors. However, one of the apparent drawbacks of this system is that threads now always form a lattice, that is they are always destroyed in opposite order of construction. While this seems like a significant limitation, existing \CFA semantics can solve this problem. Indeed, using dynamic allocation to create threads will naturally let threads outlive the scope in which the thread was created much like dynamically allocating memory will let objects outlive the scope in which thy were created
 \begin{cfacode}
 …
 \end{cfacode}
 Another advantage of this semantic is that it naturally scale to multiple threads meaning basic synchronisation is very simple :
+Another advantage of this semantic is that it naturally scale to multiple threads meaning basic synchronisation is very simple
 \begin{cfacode}

doc/proposals/concurrency/text/concurrency.tex

-              r27dde72
+              rff98952
 \end{pseudo}
 \end{multicols}
+\begin{center}
+Listing 1
+\end{center}
 It is particularly important to pay attention to code sections 8 and 3, which are where the existing semantics of internal scheduling need to be extended for multiple monitors. The root of the problem is that \gls{group-acquire} is used in a context where one of the monitors is already acquired and is why it is important to define the behaviour of the previous pseudo-code. When the signaller thread reaches the location where it should "release A \& B" (line 17), it must actually transfer ownership of monitor B to the waiting thread. This ownership trasnfer is required in order to prevent barging. Since the signalling thread still needs the monitor A, simply waking up the waiting thread is not an option because it would violate mutual exclusion. We are therefore left with three options:
 …
 \end{multicols}
 However, this solution can become much more complicated depending on what is executed while secretly holding B. Indeed, nothing prevents a user from signalling monitor A on a different condition variable:
+\newpage
 \begin{multicols}{2}
 Thread 1
 \begin{pseudo}
+\begin{pseudo}[numbers=left, firstnumber=1]
 acquire A
         acquire A & B
 …
 Thread 2
 \begin{pseudo}
+\begin{pseudo}[numbers=left, firstnumber=6]
 acquire A
         wait A
 …
 Thread 3
 \begin{pseudo}
+\begin{pseudo}[numbers=left, firstnumber=10]
 acquire A
         acquire A & B
 …
 \end{multicols}
+The goal in this solution is to avoid the need to transfer ownership of a subset of the condition monitors. However, this goal is unreacheable in the previous example since \TODO
+The goal in this solution is to avoid the need to transfer ownership of a subset of the condition monitors. However, this goal is unreacheable in the previous example. Depending on the order of signals (line 12 and 15) two cases can happen. Note that ordering is not determined by a race condition but by whether signalled threads are enqueued in FIFO or FILO order. However, regardless of the answer, users can move line 15 before line 11 and get the reverse effect.
+\paragraph{Case 1: thread 1 will go first.} In this case, the problem is that monitor A needs to be passed to thread 2 when thread 1 is done with it.
+\paragraph{Case 2: thread 2 will go first.} In this case, the problem is that monitor B needs to be passed to thread 1. This can be done directly or using thread 2 as an intermediate.
+\\
+In both cases however, the threads need to be able to distinguish on a per monitor basis which ones need to be released and which ones need to be transferred. Which means monitors cannot be handled as a single homogenous group.
 \subsubsection{Dependency graphs}
 In the previous pseudo-code, there is a solution which would statisfy both barging prevention and mutual exclusion. If ownership of both monitors is transferred to the waiter when the signaller releases A and then the waiter transfers back ownership of A when it releases it then the problem is solved. Dynamically finding the correct order is therefore the second possible solution. The problem it encounters is that it effectively boils down to resolving a dependency graph of ownership requirements. Here even the simplest of code snippets requires two transfers and it seems to increase in a manner closer to polynomial. For example the following code which is just a direct extension to three monitors requires at least three ownership transfer and has multiple solutions:
+In the Listing 1 pseudo-code, there is a solution which would statisfy both barging prevention and mutual exclusion. If ownership of both monitors is transferred to the waiter when the signaller releases A and then the waiter transfers back ownership of A when it releases it then the problem is solved. Dynamically finding the correct order is therefore the second possible solution. The problem it encounters is that it effectively boils down to resolving a dependency graph of ownership requirements. Here even the simplest of code snippets requires two transfers and it seems to increase in a manner closer to polynomial. For example the following code which is just a direct extension to three monitors requires at least three ownership transfer and has multiple solutions:
 \begin{multicols}{2}

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: