Context Navigation

Reverse Diff

Changes in / [d93d980:24eb51ed]

Location:

doc/proposals/concurrency

Files:

: 2 edited

concurrency.tex (modified) (14 diffs)
version (modified) (1 diff)

Legend:

: Unmodified
: Added
: Removed

doc/proposals/concurrency/concurrency.tex

-              rd93d980
+              r24eb51ed
 \maketitle
-% ### #     # ####### ######  #######
-%  #  ##    #    #    #     # #     #
-%  #  # #   #    #    #     # #     #
-%  #  #  #  #    #    ######  #     #
-%  #  #   # #    #    #   #   #     #
-%  #  #    ##    #    #    #  #     #
-% ### #     #    #    #     # #######
 \section{Introduction}
 This proposal provides a minimal core concurrency API that is both simple, efficient and can be reused to build higher-level features. The simplest possible core is a thread and a lock but this low-level approach is hard to master. An easier approach for users is to support higher-level construct as the basis of the concurrency in \CFA.
 …
 There are actually two problems that need to be solved in the design of the concurrency for a language. Which concurrency tools are available to the users and which parallelism tools are available. While these two concepts are often seen together, they are in fact distinct concepts that require different sorts of tools\cite{Buhr05a}. Concurrency tools need to handle mutual exclusion and synchronization while parallelism tools are more about performance, cost and resource utilization.
-%  #####  ####### #     #  #####  #     # ######  ######  ####### #     #  #####  #     #
-% #     # #     # ##    # #     # #     # #     # #     # #       ##    # #     #  #   #
-% #       #     # # #   # #       #     # #     # #     # #       # #   # #         # #
-% #       #     # #  #  # #       #     # ######  ######  #####   #  #  # #          #
-% #       #     # #   # # #       #     # #   #   #   #   #       #   # # #          #
-% #     # #     # #    ## #     # #     # #    #  #    #  #       #    ## #     #    #
-%  #####  ####### #     #  #####   #####  #     # #     # ####### #     #  #####     #
 \section{Concurrency}
+% Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This project proposes Monitors\cite{Hoare74} as the core concurrency construct.
+% \\
+Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing\cite{Thoth,Harmony,V-Kernel} or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. Many such mechanisms have been proposed, including semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. One of the most natural, elegant, and efficient mechanisms for synchronization and communication, especially for shared memory systems, is the \emph{monitor}.
+Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
+Many programming languages---e.g., Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}---provide monitors as explicit language constructs. In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors. For these reasons, this project proposes Monitors as the core concurrency construct.
+Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This project proposes Monitors\cite{Hoare74} as the core concurrency construct.
 \\
 Finally, an approach that is worth mentionning because it is gaining in popularity is transactionnal memory\cite{Dice10}. However, the performance and feature set is currently too restrictive to be possible to add such a paradigm to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA.
-% #     # ####### #     # ### ####### ####### ######   #####
-% ##   ## #     # ##    #  #     #    #     # #     # #     #
-% # # # # #     # # #   #  #     #    #     # #     # #
-% #  #  # #     # #  #  #  #     #    #     # ######   #####
-% #     # #     # #   # #  #     #    #     # #   #         #
-% #     # #     # #    ##  #     #    #     # #    #  #     #
-% #     # ####### #     # ###    #    ####### #     #  #####
 \subsection{Monitors}
 …
+        }
 \end{lstlisting}
-%  #####     #    #       #
-% #     #   # #   #       #
-% #        #   #  #       #
-% #       #     # #       #
-% #       ####### #       #
-% #     # #     # #       #
-%  #####  #     # ####### #######
 \subsubsection{Call semantics} \label{call}
 …
 The problem is to indentify which object(s) should be acquired. Furthermore we also need to acquire each objects only once. In case of simple routines like \code{f1} and \code{f2} it is easy to identify an exhaustive list of objects to acquire on entering. Adding indirections (\code{f3}) still allows the compiler and programmer to indentify which object will be acquired. However, adding in arrays (\code{f4}) makes it much harder. Array lengths aren't necessarily known in C and even then making sure we only acquire objects once becomes also none trivial. This can be extended to absurd limits like \code{f5} which uses a custom graph of monitors. To keep everyone as sane as possible\cite{Chicken}, this projects imposes the requirement that a routine may only acquire one monitor per parameter and it must be the type of the parameter (ignoring potential qualifiers and indirections).
-% ######     #    #######    #
-% #     #   # #      #      # #
-% #     #  #   #     #     #   #
-% #     # #     #    #    #     #
-% #     # #######    #    #######
-% #     # #     #    #    #     #
-% ######  #     #    #    #     #
 \subsubsection{Data semantics} \label{data}
 Once the call semantics are established, the next step is to establish data semantics. Indeed, until now a monitor is used simply as a generic handle but in most cases monitors contian shared data. This data should be intrinsic to the monitor declaration to prevent any accidental use of data without its appripriate protection. For example here is a more fleshed-out version of the counter showed in \ref{call}:
 …
 Recursive mutex routine calls are allowed in \CFA but if not done carefully it can lead to nested monitor call problems\cite{Lister77}. These problems which are a specific  implementation of the lock acquiring order problem. In the example above, the user uses implicit ordering in the case of function \code{bar} but explicit ordering in the case of \code{baz}. This subtle mistake can mean that calling these two functions concurrently will lead to deadlocks, depending on the implicit ordering matching the explicit ordering. As shown on several occasion\cit, there isn't really any solutions to this problem, users simply need to be carefull when acquiring multiple monitors at the same time.
-% ######  ####### #######    #    ### #        #####
-% #     # #          #      # #    #  #       #     #
-% #     # #          #     #   #   #  #       #
-% #     # #####      #    #     #  #  #        #####
-% #     # #          #    #######  #  #             #
-% #     # #          #    #     #  #  #       #     #
-% ######  #######    #    #     # ### #######  #####
+%
-%             ######  ####### #       #     # #     # ####### ######  #     #
-%             #     # #     # #        #   #  ##   ## #     # #     # #     #
-%             #     # #     # #         # #   # # # # #     # #     # #     #
-%  #####    ######  #     # #          #    #  #  # #     # ######  #######
-%             #       #     # #          #    #     # #     # #   #   #     #
-%             #       #     # #          #    #     # #     # #    #  #     #
-%             #       ####### #######    #    #     # ####### #     # #     #
 \subsubsection{Implementation Details: Interaction with polymorphism}
 At first glance, interaction between monitors and \CFA's concept of polymorphism seem complexe to support. However, it can be reasoned that entry-point locking can solve most of the issues that could be present with polymorphism.
 First of all, interaction between \code{otype} polymorphism and monitors is impossible since monitors do not support copying. Therefore the main question is how to support \code{dtype} polymorphism. We must remember that monitors' main purpose is to ensure mutual exclusion when accessing shared data. This implies that mutual exclusion is only required for routines that do in fact access shared data. However, since \code{dtype} polymorphism always handle incomplete types (by definition) no \code{dtype} polymorphic routine can access shared data since the data would require knowledge about the type. Therefore the only concern when combining \code{dtype} polymorphism and monitors is to protect access to routines. With callsite-locking, this would require significant amount of work since any \code{dtype} routine could have to obtain some lock before calling a routine. However, with entry-point-locking calling a monitor routine becomes exactly the same as calling it from anywhere else.
-% ### #     # #######         #####   #####  #     # ####### ######
-%  #  ##    #    #           #     # #     # #     # #       #     #
-%  #  # #   #    #           #       #       #     # #       #     #
-%  #  #  #  #    #            #####  #       ####### #####   #     #
-%  #  #   # #    #    ###          # #       #     # #       #     #
-%  #  #    ##    #    ###    #     # #     # #     # #       #     #
-% ### #     #    #    ###     #####   #####  #     # ####### ######
 \subsection{Internal scheduling} \label{insched}
 …
 \\
-% ####### #     # #######         #####   #####  #     # ####### ######
-% #        #   #     #           #     # #     # #     # #       #     #
-% #         # #      #           #       #       #     # #       #     #
-% #####      #       #            #####  #       ####### #####   #     #
-% #         # #      #    ###          # #       #     # #       #     #
-% #        #   #     #    ###    #     # #     # #     # #       #     #
-% ####### #     #    #    ###     #####   #####  #     # ####### ######
 \subsection{External scheduling} \label{extsched}
 As one might expect, the alternative to Internal scheduling is to use External scheduling instead. This method is somewhat more robust to deadlocks since one of the threads keeps a relatively tight control on scheduling. Indeed, as the following examples will demonstrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occuring. External scheduling can generally be done either in terms of control flow (ex: \uC) or in terms of data (ex: Go). Of course, both of these paradigms have their own strenghts and weaknesses but for this project control flow semantics where chosen to stay consistent with the rest of the languages semantics. Two challenges specific to \CFA arise when trying to add external scheduling with loose object definitions and multi-monitor routines. The following example shows what a simple use \code{accept} versus \code{wait}/\code{signal} and its advantages.
 …
 In the case of internal scheduling, the call to \code{wait} only guarantees that \code{g} was the last routine to access the monitor. This intails that the routine \code{f} may have acquired mutual exclusion several times while routine \code{h} was waiting. On the other hand, external scheduling guarantees that while routine \code{h} was waiting, no routine other than \code{g} could acquire the monitor.
 \\
-% #       ####### #######  #####  #######    ####### ######        #  #####
-% #       #     # #     # #     # #          #     # #     #       # #     #
-% #       #     # #     # #       #          #     # #     #       # #
-% #       #     # #     #  #####  #####      #     # ######        #  #####
-% #       #     # #     #       # #          #     # #     # #     #       #
-% #       #     # #     # #     # #          #     # #     # #     # #     #
-% ####### ####### #######  #####  #######    ####### ######   #####   #####
 \subsubsection{Loose object definitions}
 …
 An other aspect to consider is what happens if multiple overloads of the same routine are used. For the time being it is assumed that multiple overloads of the same routine should be scheduled regardless of the overload used. However, this could easily be extended in the future.
-% #     # #     # #       ####### ###    #     # ####### #     #
-% ##   ## #     # #          #     #     ##   ## #     # ##    #
-% # # # # #     # #          #     #     # # # # #     # # #   #
-% #  #  # #     # #          #     #     #  #  # #     # #  #  #
-% #     # #     # #          #     #     #     # #     # #   # #
-% #     # #     # #          #     #     #     # #     # #    ##
-% #     #  #####  #######    #    ###    #     # ####### #     #
 \subsubsection{Multi-monitor scheduling}
 …
 Note that the set of monitors passed to the \code{accept} statement must be entirely contained in the set of monitor already acquired in the routine. \code{accept} used in any other context is Undefined Behaviour.
-% ######  ####### #######    #    ### #        #####
-% #     # #          #      # #    #  #       #     #
-% #     # #          #     #   #   #  #       #
-% #     # #####      #    #     #  #  #        #####
-% #     # #          #    #######  #  #             #
-% #     # #          #    #     #  #  #       #     #
-% ######  #######    #    #     # ### #######  #####
+%
-%                #####  #     # ####### #     # #######  #####
-%             #     # #     # #       #     # #       #     #
-%             #     # #     # #       #     # #       #
-%    #####    #     # #     # #####   #     # #####    #####
-%             #   # # #     # #       #     # #             #
-%             #    #  #     # #       #     # #       #     #
-%                #### #  #####  #######  #####  #######  #####
 \subsubsection{Implementation Details: External scheduling queues}
 To support multi-monitor external scheduling means that some kind of entry-queues must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues which means they must be stored inside at least one of the monitors that will be acquired. This in turn adds the requirement a systematic algorithm of disambiguating which queue is relavant regardless of user ordering. The proposed algorithm is to fall back on monitors lock ordering and specify that the monitor that is acquired first is the lock with the relevant entry queue. This assumes that the lock acquiring order is static for the lifetime of all concerned objects but that is a reasonnable constraint. This algorithm choice has two consequences, the entry queue of the highest priority monitor is no longer a true FIFO queue and the queue of the lowest priority monitor is both required and probably unused. The queue can no longer be a FIFO queue because instead of simply containing the waiting threads in order arrival, they also contain the second mutex. Therefore, another thread with the same highest priority monitor but a different lowest priority monitor may arrive first but enter the critical section after a thread with the correct pairing. Secondly, since it may not be known at compile time which monitor will be the lowest priority monitor, every monitor needs to have the correct queues even though it is probable that half the multi-monitor queues will go unused for the entire duration of the program.
 …
 \newpage
-% ######     #    ######     #    #       #       ####### #       ###  #####  #     #
-% #     #   # #   #     #   # #   #       #       #       #        #  #     # ##   ##
-% #     #  #   #  #     #  #   #  #       #       #       #        #  #       # # # #
-% ######  #     # ######  #     # #       #       #####   #        #   #####  #  #  #
-% #       ####### #   #   ####### #       #       #       #        #        # #     #
-% #       #     # #    #  #     # #       #       #       #        #  #     # #     #
-% #       #     # #     # #     # ####### ####### ####### ####### ###  #####  #     #
 \section{Parallelism}
 Historically, computer performance was about processor speeds and instructions count. However, with heat dissipation being an ever growing challenge, parallelism has become the new source of greatest performance \cite{Sutter05, Sutter05b}. In this decade, it is not longer reasonnable to create high-performance application without caring about parallelism. Indeed, parallelism is an important aspect of performance and more specifically throughput and hardware utilization. The lowest level approach of parallelism is to use \glspl{kthread}. However since these have significant costs and limitations \glspl{kthread} are now mostly used as an implementation tool rather than a user oriented one. There are several alternatives to solve these issues which all have strengths and weaknesses.
 …
 While the choice between the three paradigms listed above may have significant performance implication, it is difficult to pin the performance implications of chosing a model at the language level. Indeed, in many situations one of these paradigms will show better performance but it all strongly depends on the usage. Having mostly indepent units of work to execute almost guarantess that the \gls{job} based system will have the best performance. However, add interactions between jobs and the processor utilisation might suffer. User-level threads may allow maximum ressource utilisation but context switches will be more expansive and it is also harder for users to get perfect tunning. As with every example, fibers sit somewhat in the middle of the spectrum. Furthermore, if the units of uninterrupted work are large enough the paradigm choice will be largely amorticised by the actual work done.
-%  #####  #######    #          ####### ######  ######
-% #     # #         # #            #    #     # #     #
-% #       #        #   #           #    #     # #     #
-% #       #####   #     # #####    #    ######  ######
-% #       #       #######          #    #     # #     #
-% #     # #       #     #          #    #     # #     #
-%  #####  #       #     #          #    ######  ######
 \section{\CFA 's Thread Building Blocks}
 As a system level language, \CFA should offer both performance and flexibilty as its primary goals, simplicity and user-friendliness being a secondary concern. Therefore, the core of parallelism in \CFA should prioritize power and efficiency. With this said, it is possible to deconstruct the three paradigms details aboved in order to get simple building blocks. Here is a table showing the core caracteristics of the mentionned paradigms :
 …
 As shown in section \ref{cfaparadigms} these different blocks being available in \CFA it is trivial to reproduce any of these paradigm.
-% ####### #     # ######  #######    #    ######   #####
-%    #    #     # #     # #         # #   #     # #     #
-%    #    #     # #     # #        #   #  #     # #
-%    #    ####### ######  #####   #     # #     #  #####
-%    #    #     # #   #   #       ####### #     #       #
-%    #    #     # #    #  #       #     # #     # #     #
-%    #    #     # #     # ####### #     # ######   #####
 \subsection{Thread Interface}
 …
 % \textbf{\large{Work in progress...}} Do wee need something beyond specifying the number of kernel threads?
-%    #    #       #
-%   # #   #       #
-%  #   #  #       #
-% #     # #       #
-% ####### #       #
-% #     # #       #
-% #     # ####### #######
 \section{Putting it all together}
-% ####### #     # ####### #     # ######  #######
-% #       #     #    #    #     # #     # #
-% #       #     #    #    #     # #     # #
-% #####   #     #    #    #     # ######  #####
-% #       #     #    #    #     # #   #   #
-% #       #     #    #    #     # #    #  #
-% #        #####     #     #####  #     # ######
 \section{Future work}
 Concurrency and parallelism is still a very active field that strongly benefits from hardware advances. As such certain features that aren't necessarily mature enough in their current state could become relevant in the lifetime of \CFA.
 \subsection{Transactions}
-% ####### #     # ######
-% #       ##    # #     #
-% #       # #   # #     #
-% #####   #  #  # #     #
-% #       #   # # #     #
-% #       #    ## #     #
-% ####### #     # ######
 \section*{Acknowledgements}
 …
 \clearpage
 \bibliographystyle{plain}
 \bibliography{cw92,distSharedMem,lfp92,mlw92,parallel,parallelIO,partheory,pl,pldi92,ps,realtime,techreportsPAB,visual,local}
+\bibliography{pl,local}

doc/proposals/concurrency/version

rd93d980	r24eb51ed
1		0.4.99
	1	0.4.95

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changes in / [d93d980:24eb51ed]

Legend:

doc/proposals/concurrency/concurrency.tex

doc/proposals/concurrency/version

Download in other formats: