- Timestamp:
- Oct 28, 2016, 1:11:12 PM (8 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 4b1afb6
- Parents:
- 837f999 (diff), 0ebac75 (diff)
Note: this is a merge changeset; the changes displayed below correspond to the merge itself. Use the (diff) links above to see all the changes relative to each parent.
- Location:
- doc/proposals/concurrency
- Files:
- 2 edited
Legend:
- Unmodified
- Added
- Removed
doc/proposals/concurrency/concurrency.tex
r837f999 → rf849c8e

\maketitle

% ====== INTRO ======

\section{Introduction}
This proposal provides a minimal core concurrency API that is simple and efficient, and that can be reused to build higher-level features. The simplest possible core is a thread and a lock, but this low-level approach is hard to master. An easier approach for users is to support higher-level constructs as the basis of concurrency in \CFA.

There are actually two problems that need to be solved in the design of concurrency for a language: which concurrency tools are available to users, and which parallelism tools are available. While these two concepts are often seen together, they are in fact distinct and require different sorts of tools~\cite{Buhr05a}. Concurrency tools need to handle mutual exclusion and synchronization, while parallelism tools are more about performance, cost, and resource utilization.

% ====== CONCURRENCY ======

\section{Concurrency}
Several tools can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state (Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka (Scala)~\cite{Akka}). In these paradigms, interaction among concurrent objects relies on message passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms that closely relate to networking concepts. However, in imperative or OO languages these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e., message passing versus routine call), which in turn means that programmers need to learn two sets of design patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms, since they often rely on non-concurrent constructs like routine calls and objects. At a lower level, these approaches can be implemented with locks and atomic operations. Many such mechanisms have been proposed, including semaphores~\cite{Dijkstra68b} and path expressions~\cite{Campbell74}. However, for productivity reasons it is desirable to have a higher-level construct as the core concurrency paradigm~\cite{HPP:Study}. One of the most natural, elegant, and efficient mechanisms for synchronization and communication, especially for shared-memory systems, is the \emph{monitor}.
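To make the gap concrete, here is a minimal sketch (in plain C with POSIX threads, not \CFA) of the lower-level approach that a monitor replaces: a shared counter protected by an explicit lock. The names are illustrative only.
\begin{lstlisting}
#include <pthread.h>

// The data and its lock are associated by convention only; nothing
// stops a caller from reading or writing `value` without the lock.
struct counter {
	pthread_mutex_t lock;
	int value;
};

void increment( struct counter * c ) {
	pthread_mutex_lock( &c->lock );    // caller must remember to acquire...
	c->value += 1;
	pthread_mutex_unlock( &c->lock );  // ...and to release, on every path
}
\end{lstlisting}
A monitor bundles the data, the lock, and the acquire/release protocol into a single construct, so this discipline is enforced by the language rather than merely suggested.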
Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}. Many programming languages---e.g., Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Modula~\cite{Modula-2}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, NeWS~\cite{NeWS}, Emerald~\cite{Emerald}, \uC~\cite{Buhr92a} and Java~\cite{Java}---provide monitors as explicit language constructs. In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as semaphores or locks to simulate monitors. For these reasons, this project proposes monitors as the core concurrency construct.
\\

Finally, an approach worth mentioning because it is gaining in popularity is transactional memory~\cite{Dice10}. However, its performance and feature set are currently too restrictive to add such a paradigm to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA.

% ====== MONITORS ======

\subsection{Monitors}
…
	}
\end{lstlisting}

% ====== CALL SEMANTICS ======

\subsubsection{Call semantics} \label{call}
…
The problem is to identify which object(s) should be acquired. Furthermore, each object must be acquired only once. In the case of simple routines like \code{f1} and \code{f2}, it is easy to identify an exhaustive list of objects to acquire on entry. Adding indirections (\code{f3}) still allows the compiler and programmer to identify which object will be acquired. However, adding arrays (\code{f4}) makes it much harder: array lengths are not necessarily known in C, and even if they were, making sure each object is acquired only once becomes non-trivial. This can be extended to absurd limits, as in \code{f5}, which uses a custom graph of monitors. To keep everyone as sane as possible~\cite{Chicken}, this project imposes the requirement that a routine may only acquire one monitor per parameter, and it must be the type of the parameter (ignoring potential qualifiers and indirections).
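The listing that defines \code{f1} through \code{f5} falls in an elided region of this changeset; purely as an illustration of the stated rule, signatures in the spirit of that discussion might look as follows (the monitor type \code{M} and all names are hypothetical):
\begin{lstlisting}
void f1( M & mutex m );                 // one monitor, the parameter type itself
void f2( M & mutex m1, M & mutex m2 ); // one monitor per parameter
void f3( M * mutex m );                // indirection, target still identifiable
void f4( M * mutex m, int n );         // m as an array of n monitors: length and
                                       // uniqueness unknowable in general
// f5: a routine traversing a user-defined graph of monitors, the absurd limit
\end{lstlisting}
Under the proposed restriction, each \code{mutex} parameter contributes exactly one monitor, so cases like \code{f1} through \code{f3} are expressible while the open-ended cases are rejected.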
% ====== DATA SEMANTICS ======

\subsubsection{Data semantics} \label{data}
Once the call semantics are established, the next step is to establish data semantics. Until now, a monitor has been used simply as a generic handle, but in most cases monitors contain shared data. This data should be intrinsic to the monitor declaration to prevent any accidental use of data without its appropriate protection. For example, here is a more fleshed-out version of the counter shown in \ref{call}:
…
Recursive mutex routine calls are allowed in \CFA, but if not done carefully they can lead to nested monitor call problems~\cite{Lister77}, which are a specific instance of the lock-acquiring-order problem. In the example above, the user relies on implicit ordering in the case of function \code{bar} but on explicit ordering in the case of \code{baz}. This subtle mistake means that calling these two functions concurrently may lead to deadlock, depending on whether the implicit ordering matches the explicit ordering. As shown on several occasions\cit, there is no general solution to this problem; users simply need to be careful when acquiring multiple monitors at the same time.
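Since the example referred to above is elided from this changeset, the following hypothetical pseudo-\CFA sketch (all declarations invented) shows how mixing implicit and explicit ordering can deadlock:
\begin{lstlisting}
M a, b;  // two global monitors

void both( M & mutex m1, M & mutex m2 ) { /* ... */ }

void bar( M & mutex m ) {  // m is already held on entry
	both( m, b );          // implicit ordering: whatever m is, then b
}

void baz( M & mutex m ) {  // m is already held on entry
	both( a, m );          // explicit ordering: always a first
}

// Thread 1 calls bar( a ): holds a, then waits for b.
// Thread 2 calls baz( b ): holds b, then waits for a.
// Deadlock is possible, depending on interleaving.
\end{lstlisting}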
% ====== DETAILS: POLYMORPHISM ======

\subsubsection{Implementation Details: Interaction with polymorphism}
At first glance, the interaction between monitors and \CFA's concept of polymorphism seems complex to support. However, it can be reasoned that entry-point locking solves most of the issues that polymorphism could present.

First of all, interaction between \code{otype} polymorphism and monitors is impossible, since monitors do not support copying. Therefore, the main question is how to support \code{dtype} polymorphism. Recall that a monitor's main purpose is to ensure mutual exclusion when accessing shared data. This implies that mutual exclusion is only required for routines that do in fact access shared data. However, since \code{dtype} polymorphism always handles incomplete types (by definition), no \code{dtype} polymorphic routine can access shared data, because doing so would require knowledge of the type. Therefore, the only concern when combining \code{dtype} polymorphism and monitors is to protect access to routines. With call-site locking, this would require a significant amount of work, since any \code{dtype} routine might have to obtain some lock before calling a routine. With entry-point locking, however, calling a monitor routine is exactly the same as calling it from anywhere else.

% ====== INTERNAL SCHEDULING ======

\subsection{Internal scheduling} \label{insched}
…
\\

% ====== EXTERNAL SCHEDULING ======

\subsection{External scheduling} \label{extsched}
As one might expect, the alternative to internal scheduling is external scheduling. This method is somewhat more robust to deadlocks, since one of the threads keeps relatively tight control over scheduling. Indeed, as the following examples demonstrate, external scheduling allows users to wait for events from other threads without worrying about unrelated events occurring. External scheduling can generally be done either in terms of control flow (e.g., \uC) or in terms of data (e.g., Go). Both of these paradigms have their own strengths and weaknesses, but for this project control-flow semantics were chosen, to stay consistent with the rest of the language's semantics. Two challenges specific to \CFA arise when adding external scheduling: loose object definitions and multi-monitor routines. The following example shows a simple use of \code{accept} versus \code{wait}/\code{signal} and its advantages.
…
In the case of internal scheduling, the call to \code{wait} only guarantees that \code{g} was the last routine to access the monitor. This entails that routine \code{f} may have acquired mutual exclusion several times while routine \code{h} was waiting. On the other hand, external scheduling guarantees that while routine \code{h} was waiting, no routine other than \code{g} could acquire the monitor.
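The routines \code{f}, \code{g} and \code{h} come from an elided listing; a hypothetical pseudo-\CFA sketch of the two styles (declarations invented) captures the difference in guarantees:
\begin{lstlisting}
M m;
condition c;

void g( M & mutex a ) { signal( c ); }

void h_internal( M & mutex a ) {
	wait( c );    // blocks until some thread signals c; while h waits,
	              // any routine, f included, may acquire the monitor
}

void h_external( M & mutex a ) {
	accept( g );  // blocks until g specifically is accepted into the
	              // monitor; no other routine may acquire it meanwhile
}
\end{lstlisting}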
% ====== LOOSE OBJECT DEFINITIONS ======

\subsubsection{Loose object definitions}
…
Another aspect to consider is what happens if multiple overloads of the same routine are used. For the time being, it is assumed that multiple overloads of the same routine should be scheduled regardless of the overload used; however, this could easily be extended in the future.

% ====== MULTI-MONITOR SCHEDULING ======

\subsubsection{Multi-monitor scheduling}
…
Note that the set of monitors passed to the \code{accept} statement must be entirely contained in the set of monitors already acquired by the routine. \code{accept} used in any other context is undefined behaviour.

% ====== DETAILS: EXTERNAL SCHEDULING QUEUES ======

\subsubsection{Implementation Details: External scheduling queues}
Supporting multi-monitor external scheduling means that some kind of entry queue must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues, which means the queues must be stored inside at least one of the monitors to be acquired. This in turn requires a systematic algorithm for disambiguating which queue is relevant, regardless of user ordering. The proposed algorithm is to fall back on the monitors' lock ordering and specify that the monitor acquired first holds the relevant entry queue. This assumes that the lock-acquiring order is static for the lifetime of all concerned objects, but that is a reasonable constraint. This algorithmic choice has two consequences: the entry queue of the highest-priority monitor is no longer a true FIFO queue, and the queue of the lowest-priority monitor is both required and probably unused. The queue can no longer be FIFO because, instead of simply containing the waiting threads in arrival order, entries also record the second mutex; therefore, another thread with the same highest-priority monitor but a different lowest-priority monitor may arrive first yet enter the critical section after a thread with the correct pairing. Secondly, since it may not be known at compile time which monitor will be the lowest-priority one, every monitor needs to carry the correct queues, even though it is probable that half the multi-monitor queues will go unused for the entire duration of the program.
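A hypothetical C-style sketch (field and type names invented) of the proposed queue layout may help:
\begin{lstlisting}
struct thread;
struct monitor;

struct entry_node {
	struct thread * waiter;       // blocked thread, in arrival order
	struct monitor * second;      // the other monitor of the pair, if any
	struct entry_node * next;
};

struct monitor {
	/* lock, owner, etc. */
	struct entry_node * entry_queue;  // lives on the first-acquired monitor
};
\end{lstlisting}
Because dequeuing must skip entries whose \code{second} monitor does not match the current pairing, the queue on the first-acquired monitor is no longer strictly FIFO, as noted above.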
…

\newpage
% ====== PARALLELISM ======

\section{Parallelism}
Historically, computer performance was about processor speeds and instruction counts. However, with heat dissipation being an ever-growing challenge, parallelism has become the new source of greatest performance~\cite{Sutter05, Sutter05b}. In this decade, it is no longer reasonable to create a high-performance application without caring about parallelism. Indeed, parallelism is an important aspect of performance, more specifically of throughput and hardware utilization. The lowest-level approach to parallelism is to use \glspl{kthread}. However, since \glspl{kthread} have significant costs and limitations, they are now mostly used as an implementation tool rather than a user-oriented one. There are several alternatives that address these issues, each with its own strengths and weaknesses.
…
\subsection{Paradigm performance}
While the choice between the three paradigms listed above may have significant performance implications, it is difficult to pin down the performance impact of choosing a model at the language level. Indeed, in many situations one of these paradigms shows better performance, but it strongly depends on the usage. Having mostly independent units of work to execute almost guarantees that a \gls{job}-based system will have the best performance. However, add interactions between jobs and processor utilization may suffer. User-level threads allow maximum resource utilization, but context switches are more expensive and it is also harder for users to achieve perfect tuning. As with every example, fibers sit somewhere in the middle of the spectrum. Furthermore, if the units of uninterrupted work are large enough, the paradigm choice is largely amortized by the actual work done.
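As a rough illustration of the trade-off (plain C with POSIX threads, names invented), the one-kernel-thread-per-task extreme looks as follows; a \gls{job} system instead keeps a small fixed pool of kernel threads pulling tasks from a shared queue, amortizing thread costs across many units of work:
\begin{lstlisting}
#include <pthread.h>
#include <stddef.h>

void * task( void * arg ) { /* one independent unit of work */ return NULL; }

// Each task pays the full kernel-thread cost: creation, a stack,
// and kernel-level context switches.
void thread_per_task( int n ) {
	pthread_t tids[n];
	for ( int i = 0; i < n; i++ ) pthread_create( &tids[i], NULL, task, NULL );
	for ( int i = 0; i < n; i++ ) pthread_join( tids[i], NULL );
}
\end{lstlisting}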
% ====== CFA'S THREAD BUILDING BLOCKS ======

\section{\CFA 's Thread Building Blocks}
…
As shown in section \ref{cfaparadigms}, with these different blocks available in \CFA it is trivial to reproduce any of these paradigms.

% ====== THREAD INTERFACE ======

\subsection{Thread Interface}
…
% \textbf{\large{Work in progress...}} Do we need something beyond specifying the number of kernel threads?

% ====== PUTTING IT ALL TOGETHER ======

\section{Putting it all together}

% ====== FUTURE WORK ======

\section{Future work}
Concurrency and parallelism remain a very active field that strongly benefits from hardware advances. As such, certain features that are not necessarily mature enough in their current state could become relevant over the lifetime of \CFA.
\subsection{Transactions}

% ====== END ======

\section*{Acknowledgements}
…
\clearpage
\bibliographystyle{plain}
\bibliography{cw92,distSharedMem,lfp92,mlw92,parallel,parallelIO,partheory,pl,pldi92,ps,realtime,techreportsPAB,visual,local}
doc/proposals/concurrency/version
r837f999 → rf849c8e
0.4.95 → 0.4.99