Changeset f849c8e for doc


Ignore:
Timestamp:
Oct 28, 2016, 1:11:12 PM (8 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
4b1afb6
Parents:
837f999 (diff), 0ebac75 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg2:software/cfa/cfa-cc

Conflicts:

src/Makefile.am
src/Makefile.in

Location:
doc/proposals/concurrency
Files:
2 edited

  • doc/proposals/concurrency/concurrency.tex

    r837f999 rf849c8e  
    9090
    9191\maketitle
     92
     93% ### #     # ####### ######  #######
     94%  #  ##    #    #    #     # #     #
     95%  #  # #   #    #    #     # #     #
     96%  #  #  #  #    #    ######  #     #
     97%  #  #   # #    #    #   #   #     #
     98%  #  #    ##    #    #    #  #     #
     99% ### #     #    #    #     # #######
     100
    92101\section{Introduction}
    93102This proposal provides a minimal core concurrency API that is both simple and efficient, and that can be reused to build higher-level features. The simplest possible core is a thread and a lock, but this low-level approach is hard to master. An easier approach for users is to support higher-level constructs as the basis of concurrency in \CFA.
     
    96105There are actually two problems to solve when designing concurrency for a language: which concurrency tools are available to users, and which parallelism tools are available. While these two concepts are often seen together, they are in fact distinct concepts that require different sorts of tools\cite{Buhr05a}. Concurrency tools handle mutual exclusion and synchronization, while parallelism tools are more about performance, cost, and resource utilization.
    97106
     107%  #####  ####### #     #  #####  #     # ######  ######  ####### #     #  #####  #     #
     108% #     # #     # ##    # #     # #     # #     # #     # #       ##    # #     #  #   #
     109% #       #     # # #   # #       #     # #     # #     # #       # #   # #         # #
     110% #       #     # #  #  # #       #     # ######  ######  #####   #  #  # #          #
     111% #       #     # #   # # #       #     # #   #   #   #   #       #   # # #          #
     112% #     # #     # #    ## #     # #     # #    #  #    #  #       #    ## #     #    #
     113%  #####  ####### #     #  #####   #####  #     # #     # ####### #     #  #####     #
     114
    98115\section{Concurrency}
    99 Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This project proposes Monitors\cite{Hoare74} as the core concurrency construct.
     118
     119Several tools can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects relies on message passing\cite{Thoth,Harmony,V-Kernel} or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e., message passing versus routine calls), which in turn means that programmers need to learn two sets of design patterns to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms, since they often rely on non-concurrent constructs like routine calls and objects. At a lower level, these can be implemented with locks and atomic operations. Many such mechanisms have been proposed, including semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}. However, for productivity reasons it is desirable to have a higher-level construct as the core concurrency paradigm\cite{HPP:Study}. One of the most natural, elegant, and efficient mechanisms for synchronization and communication, especially for shared-memory systems, is the \emph{monitor}.
     120
     121Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
     122Many programming languages---e.g., Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}---provide monitors as explicit language constructs. In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors. For these reasons, this project proposes Monitors as the core concurrency construct.
    100123\\
    101124
    102125Finally, an approach worth mentioning because it is gaining in popularity is transactional memory\cite{Dice10}. However, its performance and feature set are currently too restrictive to add such a paradigm to a language like C or \CC\cit, which is why it was rejected as the core concurrency paradigm in \CFA.
     126
     127% #     # ####### #     # ### ####### ####### ######   #####
     128% ##   ## #     # ##    #  #     #    #     # #     # #     #
     129% # # # # #     # # #   #  #     #    #     # #     # #
     130% #  #  # #     # #  #  #  #     #    #     # ######   #####
     131% #     # #     # #   # #  #     #    #     # #   #         #
     132% #     # #     # #    ##  #     #    #     # #    #  #     #
     133% #     # ####### #     # ###    #    ####### #     #  #####
    103134
    104135\subsection{Monitors}
     
    113144        }
    114145\end{lstlisting}
     146
     147%  #####     #    #       #
     148% #     #   # #   #       #
     149% #        #   #  #       #
     150% #       #     # #       #
     151% #       ####### #       #
     152% #     # #     # #       #
     153%  #####  #     # ####### #######
    115154
    116155\subsubsection{Call semantics} \label{call}
     
    148187The problem is to identify which object(s) should be acquired. Furthermore, each object must be acquired only once. For simple routines like \code{f1} and \code{f2}, it is easy to identify an exhaustive list of objects to acquire on entry. Adding indirection (\code{f3}) still allows the compiler and programmer to identify which object will be acquired. However, adding arrays (\code{f4}) makes it much harder: array lengths are not necessarily known in C, and even then, ensuring each object is acquired only once becomes non-trivial. This can be extended to absurd limits, like \code{f5}, which uses a custom graph of monitors. To keep everyone as sane as possible\cite{Chicken}, this project imposes the requirement that a routine may only acquire one monitor per parameter, and it must be the type of the parameter (ignoring potential qualifiers and indirection).
    149188
     189% ######     #    #######    #
     190% #     #   # #      #      # #
     191% #     #  #   #     #     #   #
     192% #     # #     #    #    #     #
     193% #     # #######    #    #######
     194% #     # #     #    #    #     #
     195% ######  #     #    #    #     #
     196
    150197\subsubsection{Data semantics} \label{data}
    151198Once the call semantics are established, the next step is to establish data semantics. Indeed, until now a monitor has been used simply as a generic handle, but in most cases monitors contain shared data. This data should be intrinsic to the monitor declaration to prevent any accidental use of data without its appropriate protection. For example, here is a more fleshed-out version of the counter shown in \ref{call}:
     
    219266Recursive mutex routine calls are allowed in \CFA, but if not done carefully they can lead to nested monitor call problems\cite{Lister77}. These problems are a specific instance of the lock acquisition-order problem. In the example above, the user relies on implicit ordering in function \code{bar} but explicit ordering in \code{baz}. This subtle mistake can mean that calling these two functions concurrently leads to deadlock, depending on whether the implicit ordering matches the explicit ordering. As shown on several occasions\cit, there is no general solution to this problem; users simply need to be careful when acquiring multiple monitors at the same time.
    220267
     268% ######  ####### #######    #    ### #        #####
     269% #     # #          #      # #    #  #       #     #
     270% #     # #          #     #   #   #  #       #
     271% #     # #####      #    #     #  #  #        #####
     272% #     # #          #    #######  #  #             #
     273% #     # #          #    #     #  #  #       #     #
     274% ######  #######    #    #     # ### #######  #####
     275%
     276%             ######  ####### #       #     # #     # ####### ######  #     #
     277%             #     # #     # #        #   #  ##   ## #     # #     # #     #
     278%             #     # #     # #         # #   # # # # #     # #     # #     #
     279%  #####    ######  #     # #          #    #  #  # #     # ######  #######
     280%             #       #     # #          #    #     # #     # #   #   #     #
     281%             #       #     # #          #    #     # #     # #    #  #     #
     282%             #       ####### #######    #    #     # ####### #     # #     #
     283
    221284\subsubsection{Implementation Details: Interaction with polymorphism}
    222285At first glance, the interaction between monitors and \CFA's concept of polymorphism seems complex to support. However, entry-point locking can solve most of the issues that arise with polymorphism.
    223286
    224287First of all, interaction between \code{otype} polymorphism and monitors is impossible since monitors do not support copying. Therefore, the main question is how to support \code{dtype} polymorphism. Recall that a monitor's main purpose is to ensure mutual exclusion when accessing shared data. This implies that mutual exclusion is only required for routines that do in fact access shared data. However, since \code{dtype} polymorphism always handles incomplete types (by definition), no \code{dtype}-polymorphic routine can access shared data, since doing so would require knowledge of the type. Therefore, the only concern when combining \code{dtype} polymorphism and monitors is to protect access to routines. With call-site locking, this would require a significant amount of work, since any \code{dtype} routine might have to obtain some lock before calling a routine. However, with entry-point locking, calling a monitor routine becomes exactly the same as calling it from anywhere else.
     288
     289% ### #     # #######         #####   #####  #     # ####### ######
     290%  #  ##    #    #           #     # #     # #     # #       #     #
     291%  #  # #   #    #           #       #       #     # #       #     #
     292%  #  #  #  #    #            #####  #       ####### #####   #     #
     293%  #  #   # #    #    ###          # #       #     # #       #     #
     294%  #  #    ##    #    ###    #     # #     # #     # #       #     #
     295% ### #     #    #    ###     #####   #####  #     # ####### ######
    225296
    226297\subsection{Internal scheduling} \label{insched}
     
    467538\\
    468539
     540% ####### #     # #######         #####   #####  #     # ####### ######
     541% #        #   #     #           #     # #     # #     # #       #     #
     542% #         # #      #           #       #       #     # #       #     #
     543% #####      #       #            #####  #       ####### #####   #     #
     544% #         # #      #    ###          # #       #     # #       #     #
     545% #        #   #     #    ###    #     # #     # #     # #       #     #
     546% ####### #     #    #    ###     #####   #####  #     # ####### ######
     547
    469548\subsection{External scheduling} \label{extsched}
    470549As one might expect, the alternative to internal scheduling is external scheduling. This method is somewhat more robust to deadlocks, since one of the threads keeps relatively tight control over scheduling. Indeed, as the following examples demonstrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occurring. External scheduling can generally be done either in terms of control flow (e.g., \uC) or in terms of data (e.g., Go). Both paradigms have their own strengths and weaknesses, but for this project control-flow semantics were chosen to stay consistent with the rest of the language's semantics. Two challenges specific to \CFA arise when adding external scheduling with loose object definitions and multi-monitor routines. The following example shows a simple use of \code{accept} versus \code{wait}/\code{signal} and its advantages.
     
    496575In the case of internal scheduling, the call to \code{wait} only guarantees that \code{g} was the last routine to access the monitor. This entails that routine \code{f} may have acquired mutual exclusion several times while routine \code{h} was waiting. On the other hand, external scheduling guarantees that while routine \code{h} was waiting, no routine other than \code{g} could acquire the monitor.
    497576\\
     577
     578% #       ####### #######  #####  #######    ####### ######        #  #####
     579% #       #     # #     # #     # #          #     # #     #       # #     #
     580% #       #     # #     # #       #          #     # #     #       # #
     581% #       #     # #     #  #####  #####      #     # ######        #  #####
     582% #       #     # #     #       # #          #     # #     # #     #       #
     583% #       #     # #     # #     # #          #     # #     # #     # #     #
     584% ####### ####### #######  #####  #######    ####### ######   #####   #####
    498585
    499586\subsubsection{Loose object definitions}
     
    587674Another aspect to consider is what happens if multiple overloads of the same routine are used. For the time being, it is assumed that all overloads of the same routine should be scheduled regardless of the overload used. However, this could easily be extended in the future.
    588675
     676% #     # #     # #       ####### ###    #     # ####### #     #
     677% ##   ## #     # #          #     #     ##   ## #     # ##    #
     678% # # # # #     # #          #     #     # # # # #     # # #   #
     679% #  #  # #     # #          #     #     #  #  # #     # #  #  #
     680% #     # #     # #          #     #     #     # #     # #   # #
     681% #     # #     # #          #     #     #     # #     # #    ##
     682% #     #  #####  #######    #    ###    #     # ####### #     #
     683
    589684\subsubsection{Multi-monitor scheduling}
    590685
     
    629724Note that the set of monitors passed to the \code{accept} statement must be entirely contained in the set of monitors already acquired by the routine. Using \code{accept} in any other context is undefined behaviour.
    630725
     726% ######  ####### #######    #    ### #        #####
     727% #     # #          #      # #    #  #       #     #
     728% #     # #          #     #   #   #  #       #
     729% #     # #####      #    #     #  #  #        #####
     730% #     # #          #    #######  #  #             #
     731% #     # #          #    #     #  #  #       #     #
     732% ######  #######    #    #     # ### #######  #####
     733%
     734%                #####  #     # ####### #     # #######  #####
     735%             #     # #     # #       #     # #       #     #
     736%             #     # #     # #       #     # #       #
     737%    #####    #     # #     # #####   #     # #####    #####
     738%             #   # # #     # #       #     # #             #
     739%             #    #  #     # #       #     # #       #     #
     740%                #### #  #####  #######  #####  #######  #####
     741
     742
    631743\subsubsection{Implementation Details: External scheduling queues}
    632744Supporting multi-monitor external scheduling means some kind of entry queue must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues, which means the queues must be stored inside at least one of the monitors to be acquired. This in turn requires a systematic algorithm for disambiguating which queue is relevant, regardless of user ordering. The proposed algorithm falls back on the monitor lock-ordering and specifies that the monitor acquired first holds the relevant entry queue. This assumes the lock-acquisition order is static for the lifetime of all concerned objects, which is a reasonable constraint. This choice has two consequences: the entry queue of the highest-priority monitor is no longer a true FIFO queue, and the queue of the lowest-priority monitor is both required and probably unused. The queue can no longer be FIFO because, instead of simply containing the waiting threads in order of arrival, it also contains the second mutex; therefore, another thread with the same highest-priority monitor but a different lowest-priority monitor may arrive first yet enter the critical section after a thread with the correct pairing. Secondly, since the lowest-priority monitor may not be known at compile time, every monitor needs the corresponding queues, even though probably half the multi-monitor queues go unused for the entire duration of the program.
     
    636748
    637749\newpage
     750% ######     #    ######     #    #       #       ####### #       ###  #####  #     #
     751% #     #   # #   #     #   # #   #       #       #       #        #  #     # ##   ##
     752% #     #  #   #  #     #  #   #  #       #       #       #        #  #       # # # #
     753% ######  #     # ######  #     # #       #       #####   #        #   #####  #  #  #
     754% #       ####### #   #   ####### #       #       #       #        #        # #     #
     755% #       #     # #    #  #     # #       #       #       #        #  #     # #     #
     756% #       #     # #     # #     # ####### ####### ####### ####### ###  #####  #     #
    638757\section{Parallelism}
    639758Historically, computer performance was about processor speed and instruction count. However, with heat dissipation being an ever-growing challenge, parallelism has become the new source of performance gains \cite{Sutter05, Sutter05b}. In this decade, it is no longer reasonable to create a high-performance application without caring about parallelism. Indeed, parallelism is an important aspect of performance, more specifically throughput and hardware utilization. The lowest-level approach to parallelism is to use \glspl{kthread}. However, since these have significant costs and limitations, \glspl{kthread} are now mostly used as an implementation tool rather than a user-oriented one. There are several alternatives that address these issues, each with strengths and weaknesses.
     
    654773\subsection{Paradigm performance}
    655774While the choice among the three paradigms listed above may have significant performance implications, it is difficult to pin down the performance impact of choosing a model at the language level. Indeed, in many situations one of these paradigms shows better performance, but it strongly depends on usage. Having mostly independent units of work to execute almost guarantees that the \gls{job}-based system will have the best performance; however, add interactions between jobs, and processor utilization might suffer. User-level threads may allow maximum resource utilization, but context switches are more expensive and it is also harder for users to achieve perfect tuning. As with every example, fibers sit somewhere in the middle of the spectrum. Furthermore, if the units of uninterrupted work are large enough, the paradigm choice is largely amortized by the actual work done.
     775
     776%  #####  #######    #          ####### ######  ######
     777% #     # #         # #            #    #     # #     #
     778% #       #        #   #           #    #     # #     #
     779% #       #####   #     # #####    #    ######  ######
     780% #       #       #######          #    #     # #     #
     781% #     # #       #     #          #    #     # #     #
     782%  #####  #       #     #          #    ######  ######
    656783
    657784\section{\CFA 's Thread Building Blocks}
     
    672799
    673800As shown in section \ref{cfaparadigms}, with these different blocks available in \CFA, it is trivial to reproduce any of these paradigms.
     801
     802% ####### #     # ######  #######    #    ######   #####
     803%    #    #     # #     # #         # #   #     # #     #
     804%    #    #     # #     # #        #   #  #     # #
     805%    #    ####### ######  #####   #     # #     #  #####
     806%    #    #     # #   #   #       ####### #     #       #
     807%    #    #     # #    #  #       #     # #     # #     #
     808%    #    #     # #     # ####### #     # ######   #####
    674809
    675810\subsection{Thread Interface}
     
    843978% \textbf{\large{Work in progress...}} Do wee need something beyond specifying the number of kernel threads?
    844979
     980%    #    #       #
     981%   # #   #       #
     982%  #   #  #       #
     983% #     # #       #
     984% ####### #       #
     985% #     # #       #
     986% #     # ####### #######
    845987\section{Putting it all together}
    846988
     989
     990
     991
     992
     993
     994
     995
     996
     997
     998% ####### #     # ####### #     # ######  #######
     999% #       #     #    #    #     # #     # #
     1000% #       #     #    #    #     # #     # #
     1001% #####   #     #    #    #     # ######  #####
     1002% #       #     #    #    #     # #   #   #
     1003% #       #     #    #    #     # #    #  #
     1004% #        #####     #     #####  #     # ######
    8471005\section{Future work}
    8481006Concurrency and parallelism are still very active fields that strongly benefit from hardware advances. As such, certain features that are not yet mature enough in their current state could become relevant within the lifetime of \CFA.
    8491007\subsection{Transactions}
    8501008
     1009% ####### #     # ######
     1010% #       ##    # #     #
     1011% #       # #   # #     #
     1012% #####   #  #  # #     #
     1013% #       #   # # #     #
     1014% #       #    ## #     #
     1015% ####### #     # ######
    8511016\section*{Acknowledgements}
    8521017
     
    8571022\clearpage
    8581023\bibliographystyle{plain}
    859 \bibliography{pl,local}
     1024\bibliography{cw92,distSharedMem,lfp92,mlw92,parallel,parallelIO,partheory,pl,pldi92,ps,realtime,techreportsPAB,visual,local}
    8601025
    8611026
  • doc/proposals/concurrency/version

    r837f999 rf849c8e  
    1 0.4.95
     10.4.99