Changes in / [d93d980:24eb51ed]


Ignore:
Location:
doc/proposals/concurrency
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/concurrency/concurrency.tex

    rd93d980 r24eb51ed  
    9090
    9191\maketitle
    92 
    93 % ### #     # ####### ######  #######
    94 %  #  ##    #    #    #     # #     #
    95 %  #  # #   #    #    #     # #     #
    96 %  #  #  #  #    #    ######  #     #
    97 %  #  #   # #    #    #   #   #     #
    98 %  #  #    ##    #    #    #  #     #
    99 % ### #     #    #    #     # #######
    100 
    10192\section{Introduction}
    10293This proposal provides a minimal core concurrency API that is both simple, efficient and can be reused to build higher-level features. The simplest possible core is a thread and a lock but this low-level approach is hard to master. An easier approach for users is to support higher-level construct as the basis of the concurrency in \CFA.
     
    10596There are actually two problems that need to be solved in the design of the concurrency for a language. Which concurrency tools are available to the users and which parallelism tools are available. While these two concepts are often seen together, they are in fact distinct concepts that require different sorts of tools\cite{Buhr05a}. Concurrency tools need to handle mutual exclusion and synchronization while parallelism tools are more about performance, cost and resource utilization.
    10697
    107 %  #####  ####### #     #  #####  #     # ######  ######  ####### #     #  #####  #     #
    108 % #     # #     # ##    # #     # #     # #     # #     # #       ##    # #     #  #   #
    109 % #       #     # # #   # #       #     # #     # #     # #       # #   # #         # #
    110 % #       #     # #  #  # #       #     # ######  ######  #####   #  #  # #          #
    111 % #       #     # #   # # #       #     # #   #   #   #   #       #   # # #          #
    112 % #     # #     # #    ## #     # #     # #    #  #    #  #       #    ## #     #    #
    113 %  #####  ####### #     #  #####   #####  #     # #     # ####### #     #  #####     #
    114 
    11598\section{Concurrency}
    116 % Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This project proposes Monitors\cite{Hoare74} as the core concurrency construct.
    117 % \\
    118 
    119 Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing\cite{Thoth,Harmony,V-Kernel} or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. Many such mechanisms have been proposed, including semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. One of the most natural, elegant, and efficient mechanisms for synchronization and communication, especially for shared memory systems, is the \emph{monitor}.
    120 
    121 Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
    122 Many programming languages---e.g., Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}---provide monitors as explicit language constructs. In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors. For these reasons, this project proposes Monitors as the core concurrency construct.
     99Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This project proposes Monitors\cite{Hoare74} as the core concurrency construct.
    123100\\
    124101
    125102Finally, an approach that is worth mentionning because it is gaining in popularity is transactionnal memory\cite{Dice10}. However, the performance and feature set is currently too restrictive to be possible to add such a paradigm to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA.
    126 
    127 % #     # ####### #     # ### ####### ####### ######   #####
    128 % ##   ## #     # ##    #  #     #    #     # #     # #     #
    129 % # # # # #     # # #   #  #     #    #     # #     # #
    130 % #  #  # #     # #  #  #  #     #    #     # ######   #####
    131 % #     # #     # #   # #  #     #    #     # #   #         #
    132 % #     # #     # #    ##  #     #    #     # #    #  #     #
    133 % #     # ####### #     # ###    #    ####### #     #  #####
    134103
    135104\subsection{Monitors}
     
    144113        }
    145114\end{lstlisting}
    146 
    147 %  #####     #    #       #
    148 % #     #   # #   #       #
    149 % #        #   #  #       #
    150 % #       #     # #       #
    151 % #       ####### #       #
    152 % #     # #     # #       #
    153 %  #####  #     # ####### #######
    154115
    155116\subsubsection{Call semantics} \label{call}
     
    187148The problem is to indentify which object(s) should be acquired. Furthermore we also need to acquire each objects only once. In case of simple routines like \code{f1} and \code{f2} it is easy to identify an exhaustive list of objects to acquire on entering. Adding indirections (\code{f3}) still allows the compiler and programmer to indentify which object will be acquired. However, adding in arrays (\code{f4}) makes it much harder. Array lengths aren't necessarily known in C and even then making sure we only acquire objects once becomes also none trivial. This can be extended to absurd limits like \code{f5} which uses a custom graph of monitors. To keep everyone as sane as possible\cite{Chicken}, this projects imposes the requirement that a routine may only acquire one monitor per parameter and it must be the type of the parameter (ignoring potential qualifiers and indirections).
    188149
    189 % ######     #    #######    #
    190 % #     #   # #      #      # #
    191 % #     #  #   #     #     #   #
    192 % #     # #     #    #    #     #
    193 % #     # #######    #    #######
    194 % #     # #     #    #    #     #
    195 % ######  #     #    #    #     #
    196 
    197150\subsubsection{Data semantics} \label{data}
    198151Once the call semantics are established, the next step is to establish data semantics. Indeed, until now a monitor is used simply as a generic handle but in most cases monitors contian shared data. This data should be intrinsic to the monitor declaration to prevent any accidental use of data without its appripriate protection. For example here is a more fleshed-out version of the counter showed in \ref{call}:
     
    266219Recursive mutex routine calls are allowed in \CFA but if not done carefully it can lead to nested monitor call problems\cite{Lister77}. These problems which are a specific  implementation of the lock acquiring order problem. In the example above, the user uses implicit ordering in the case of function \code{bar} but explicit ordering in the case of \code{baz}. This subtle mistake can mean that calling these two functions concurrently will lead to deadlocks, depending on the implicit ordering matching the explicit ordering. As shown on several occasion\cit, there isn't really any solutions to this problem, users simply need to be carefull when acquiring multiple monitors at the same time.
    267220
    268 % ######  ####### #######    #    ### #        #####
    269 % #     # #          #      # #    #  #       #     #
    270 % #     # #          #     #   #   #  #       #
    271 % #     # #####      #    #     #  #  #        #####
    272 % #     # #          #    #######  #  #             #
    273 % #     # #          #    #     #  #  #       #     #
    274 % ######  #######    #    #     # ### #######  #####
    275 %
    276 %             ######  ####### #       #     # #     # ####### ######  #     #
    277 %             #     # #     # #        #   #  ##   ## #     # #     # #     #
    278 %             #     # #     # #         # #   # # # # #     # #     # #     #
    279 %  #####    ######  #     # #          #    #  #  # #     # ######  #######
    280 %             #       #     # #          #    #     # #     # #   #   #     #
    281 %             #       #     # #          #    #     # #     # #    #  #     #
    282 %             #       ####### #######    #    #     # ####### #     # #     #
    283 
    284221\subsubsection{Implementation Details: Interaction with polymorphism}
    285222At first glance, interaction between monitors and \CFA's concept of polymorphism seem complexe to support. However, it can be reasoned that entry-point locking can solve most of the issues that could be present with polymorphism.
    286223
    287224First of all, interaction between \code{otype} polymorphism and monitors is impossible since monitors do not support copying. Therefore the main question is how to support \code{dtype} polymorphism. We must remember that monitors' main purpose is to ensure mutual exclusion when accessing shared data. This implies that mutual exclusion is only required for routines that do in fact access shared data. However, since \code{dtype} polymorphism always handle incomplete types (by definition) no \code{dtype} polymorphic routine can access shared data since the data would require knowledge about the type. Therefore the only concern when combining \code{dtype} polymorphism and monitors is to protect access to routines. With callsite-locking, this would require significant amount of work since any \code{dtype} routine could have to obtain some lock before calling a routine. However, with entry-point-locking calling a monitor routine becomes exactly the same as calling it from anywhere else.
    288 
    289 % ### #     # #######         #####   #####  #     # ####### ######
    290 %  #  ##    #    #           #     # #     # #     # #       #     #
    291 %  #  # #   #    #           #       #       #     # #       #     #
    292 %  #  #  #  #    #            #####  #       ####### #####   #     #
    293 %  #  #   # #    #    ###          # #       #     # #       #     #
    294 %  #  #    ##    #    ###    #     # #     # #     # #       #     #
    295 % ### #     #    #    ###     #####   #####  #     # ####### ######
    296225
    297226\subsection{Internal scheduling} \label{insched}
     
    538467\\
    539468
    540 % ####### #     # #######         #####   #####  #     # ####### ######
    541 % #        #   #     #           #     # #     # #     # #       #     #
    542 % #         # #      #           #       #       #     # #       #     #
    543 % #####      #       #            #####  #       ####### #####   #     #
    544 % #         # #      #    ###          # #       #     # #       #     #
    545 % #        #   #     #    ###    #     # #     # #     # #       #     #
    546 % ####### #     #    #    ###     #####   #####  #     # ####### ######
    547 
    548469\subsection{External scheduling} \label{extsched}
    549470As one might expect, the alternative to Internal scheduling is to use External scheduling instead. This method is somewhat more robust to deadlocks since one of the threads keeps a relatively tight control on scheduling. Indeed, as the following examples will demonstrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occuring. External scheduling can generally be done either in terms of control flow (ex: \uC) or in terms of data (ex: Go). Of course, both of these paradigms have their own strenghts and weaknesses but for this project control flow semantics where chosen to stay consistent with the rest of the languages semantics. Two challenges specific to \CFA arise when trying to add external scheduling with loose object definitions and multi-monitor routines. The following example shows what a simple use \code{accept} versus \code{wait}/\code{signal} and its advantages.
     
    575496In the case of internal scheduling, the call to \code{wait} only guarantees that \code{g} was the last routine to access the monitor. This intails that the routine \code{f} may have acquired mutual exclusion several times while routine \code{h} was waiting. On the other hand, external scheduling guarantees that while routine \code{h} was waiting, no routine other than \code{g} could acquire the monitor.
    576497\\
    577 
    578 % #       ####### #######  #####  #######    ####### ######        #  #####
    579 % #       #     # #     # #     # #          #     # #     #       # #     #
    580 % #       #     # #     # #       #          #     # #     #       # #
    581 % #       #     # #     #  #####  #####      #     # ######        #  #####
    582 % #       #     # #     #       # #          #     # #     # #     #       #
    583 % #       #     # #     # #     # #          #     # #     # #     # #     #
    584 % ####### ####### #######  #####  #######    ####### ######   #####   #####
    585498
    586499\subsubsection{Loose object definitions}
     
    674587An other aspect to consider is what happens if multiple overloads of the same routine are used. For the time being it is assumed that multiple overloads of the same routine should be scheduled regardless of the overload used. However, this could easily be extended in the future.
    675588
    676 % #     # #     # #       ####### ###    #     # ####### #     #
    677 % ##   ## #     # #          #     #     ##   ## #     # ##    #
    678 % # # # # #     # #          #     #     # # # # #     # # #   #
    679 % #  #  # #     # #          #     #     #  #  # #     # #  #  #
    680 % #     # #     # #          #     #     #     # #     # #   # #
    681 % #     # #     # #          #     #     #     # #     # #    ##
    682 % #     #  #####  #######    #    ###    #     # ####### #     #
    683 
    684589\subsubsection{Multi-monitor scheduling}
    685590
     
    724629Note that the set of monitors passed to the \code{accept} statement must be entirely contained in the set of monitor already acquired in the routine. \code{accept} used in any other context is Undefined Behaviour.
    725630
    726 % ######  ####### #######    #    ### #        #####
    727 % #     # #          #      # #    #  #       #     #
    728 % #     # #          #     #   #   #  #       #
    729 % #     # #####      #    #     #  #  #        #####
    730 % #     # #          #    #######  #  #             #
    731 % #     # #          #    #     #  #  #       #     #
    732 % ######  #######    #    #     # ### #######  #####
    733 %
    734 %                #####  #     # ####### #     # #######  #####
    735 %             #     # #     # #       #     # #       #     #
    736 %             #     # #     # #       #     # #       #
    737 %    #####    #     # #     # #####   #     # #####    #####
    738 %             #   # # #     # #       #     # #             #
    739 %             #    #  #     # #       #     # #       #     #
    740 %                #### #  #####  #######  #####  #######  #####
    741 
    742 
    743631\subsubsection{Implementation Details: External scheduling queues}
    744632To support multi-monitor external scheduling means that some kind of entry-queues must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues which means they must be stored inside at least one of the monitors that will be acquired. This in turn adds the requirement a systematic algorithm of disambiguating which queue is relavant regardless of user ordering. The proposed algorithm is to fall back on monitors lock ordering and specify that the monitor that is acquired first is the lock with the relevant entry queue. This assumes that the lock acquiring order is static for the lifetime of all concerned objects but that is a reasonnable constraint. This algorithm choice has two consequences, the entry queue of the highest priority monitor is no longer a true FIFO queue and the queue of the lowest priority monitor is both required and probably unused. The queue can no longer be a FIFO queue because instead of simply containing the waiting threads in order arrival, they also contain the second mutex. Therefore, another thread with the same highest priority monitor but a different lowest priority monitor may arrive first but enter the critical section after a thread with the correct pairing. Secondly, since it may not be known at compile time which monitor will be the lowest priority monitor, every monitor needs to have the correct queues even though it is probable that half the multi-monitor queues will go unused for the entire duration of the program.
     
    748636
    749637\newpage
    750 % ######     #    ######     #    #       #       ####### #       ###  #####  #     #
    751 % #     #   # #   #     #   # #   #       #       #       #        #  #     # ##   ##
    752 % #     #  #   #  #     #  #   #  #       #       #       #        #  #       # # # #
    753 % ######  #     # ######  #     # #       #       #####   #        #   #####  #  #  #
    754 % #       ####### #   #   ####### #       #       #       #        #        # #     #
    755 % #       #     # #    #  #     # #       #       #       #        #  #     # #     #
    756 % #       #     # #     # #     # ####### ####### ####### ####### ###  #####  #     #
    757638\section{Parallelism}
    758639Historically, computer performance was about processor speeds and instructions count. However, with heat dissipation being an ever growing challenge, parallelism has become the new source of greatest performance \cite{Sutter05, Sutter05b}. In this decade, it is not longer reasonnable to create high-performance application without caring about parallelism. Indeed, parallelism is an important aspect of performance and more specifically throughput and hardware utilization. The lowest level approach of parallelism is to use \glspl{kthread}. However since these have significant costs and limitations \glspl{kthread} are now mostly used as an implementation tool rather than a user oriented one. There are several alternatives to solve these issues which all have strengths and weaknesses.
     
    774655While the choice between the three paradigms listed above may have significant performance implication, it is difficult to pin the performance implications of chosing a model at the language level. Indeed, in many situations one of these paradigms will show better performance but it all strongly depends on the usage. Having mostly indepent units of work to execute almost guarantess that the \gls{job} based system will have the best performance. However, add interactions between jobs and the processor utilisation might suffer. User-level threads may allow maximum ressource utilisation but context switches will be more expansive and it is also harder for users to get perfect tunning. As with every example, fibers sit somewhat in the middle of the spectrum. Furthermore, if the units of uninterrupted work are large enough the paradigm choice will be largely amorticised by the actual work done.
    775656
    776 %  #####  #######    #          ####### ######  ######
    777 % #     # #         # #            #    #     # #     #
    778 % #       #        #   #           #    #     # #     #
    779 % #       #####   #     # #####    #    ######  ######
    780 % #       #       #######          #    #     # #     #
    781 % #     # #       #     #          #    #     # #     #
    782 %  #####  #       #     #          #    ######  ######
    783 
    784657\section{\CFA 's Thread Building Blocks}
    785658As a system level language, \CFA should offer both performance and flexibilty as its primary goals, simplicity and user-friendliness being a secondary concern. Therefore, the core of parallelism in \CFA should prioritize power and efficiency. With this said, it is possible to deconstruct the three paradigms details aboved in order to get simple building blocks. Here is a table showing the core caracteristics of the mentionned paradigms :
     
    799672
    800673As shown in section \ref{cfaparadigms} these different blocks being available in \CFA it is trivial to reproduce any of these paradigm.
    801 
    802 % ####### #     # ######  #######    #    ######   #####
    803 %    #    #     # #     # #         # #   #     # #     #
    804 %    #    #     # #     # #        #   #  #     # #
    805 %    #    ####### ######  #####   #     # #     #  #####
    806 %    #    #     # #   #   #       ####### #     #       #
    807 %    #    #     # #    #  #       #     # #     # #     #
    808 %    #    #     # #     # ####### #     # ######   #####
    809674
    810675\subsection{Thread Interface}
     
    978843% \textbf{\large{Work in progress...}} Do wee need something beyond specifying the number of kernel threads?
    979844
    980 %    #    #       #
    981 %   # #   #       #
    982 %  #   #  #       #
    983 % #     # #       #
    984 % ####### #       #
    985 % #     # #       #
    986 % #     # ####### #######
    987845\section{Putting it all together}
    988846
    989 
    990 
    991 
    992 
    993 
    994 
    995 
    996 
    997 
    998 % ####### #     # ####### #     # ######  #######
    999 % #       #     #    #    #     # #     # #
    1000 % #       #     #    #    #     # #     # #
    1001 % #####   #     #    #    #     # ######  #####
    1002 % #       #     #    #    #     # #   #   #
    1003 % #       #     #    #    #     # #    #  #
    1004 % #        #####     #     #####  #     # ######
    1005847\section{Future work}
    1006848Concurrency and parallelism is still a very active field that strongly benefits from hardware advances. As such certain features that aren't necessarily mature enough in their current state could become relevant in the lifetime of \CFA.
    1007849\subsection{Transactions}
    1008850
    1009 % ####### #     # ######
    1010 % #       ##    # #     #
    1011 % #       # #   # #     #
    1012 % #####   #  #  # #     #
    1013 % #       #   # # #     #
    1014 % #       #    ## #     #
    1015 % ####### #     # ######
    1016851\section*{Acknowledgements}
    1017852
     
    1022857\clearpage
    1023858\bibliographystyle{plain}
    1024 \bibliography{cw92,distSharedMem,lfp92,mlw92,parallel,parallelIO,partheory,pl,pldi92,ps,realtime,techreportsPAB,visual,local}
     859\bibliography{pl,local}
    1025860
    1026861
  • doc/proposals/concurrency/version

    rd93d980 r24eb51ed  
    1 0.4.99
     10.4.95
Note: See TracChangeset for help on using the changeset viewer.