Changeset a3eaa29 for doc/proposals
- Timestamp:
- Oct 18, 2016, 12:53:43 PM (8 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- b512454
- Parents:
- 9b4343e
- Location:
- doc/proposals/concurrency
- Files:
-
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/proposals/concurrency/concurrency.tex
r9b4343e ra3eaa29 71 71 \setcounter{secnumdepth}{3} % number subsubsections 72 72 \setcounter{tocdepth}{3} % subsubsections in table of contents 73 \linenumbers % comment out to turn off line numbering73 % \linenumbers % comment out to turn off line numbering 74 74 \makeindex 75 75 \pagestyle{fancy} … … 90 90 \maketitle 91 91 \section{Introduction} 92 This proposal provides a minimal core concurrency API that is both simple, efficient and can be reused to build higher-level features. The simplest possible core is a thread and a lock but this low-level approach is hard to master. An easier approach for users is beto support higher-level construct as the basis of the concurrency in \CFA.92 This proposal provides a minimal core concurrency API that is both simple, efficient and can be reused to build higher-level features. The simplest possible core is a thread and a lock but this low-level approach is hard to master. An easier approach for users is to support higher-level construct as the basis of the concurrency in \CFA. 93 93 Indeed, for highly productive parallel programming high-level approaches are much more popular\cite{HPP:Study}. Examples are task based parallelism, message passing, implicit threading. 94 94 … … 96 96 97 97 \section{Concurrency} 98 Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cit). In these paradigms, interaction among concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This project proposes Monitors\cit as the core concurrency construct. 98 Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared-state (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cite{Akka}). In these paradigms, interaction among concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages, these approaches entail a clear distinction between concurrent and non-concurrent paradigms (i.e. message passing versus routine call). Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This project proposes Monitors\cite{Hoare74} as the core concurrency construct. 99 \\ 99 100 100 101 Finally, an approach that is worth mentionning because it is gaining in popularity is transactionnal memory\cite{Dice10}. However, the performance and feature set is currently too restrictive to be possible to add such a paradigm to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA. … … 114 115 \subsection{Call semantics} \label{call} 115 116 The above example of monitors already displays some of their intrinsic caracteristics. Indeed, it is necessary to use pass-by-reference over pass-by-value for monitor routines. This semantics is important because at their core, monitors are implicit mutual exclusion objects (locks), and these objects cannot be copied. Therefore, monitors are implicitly non-copyable. 117 \\ 116 118 117 119 Another aspect to consider is when a monitor acquires its mutual exclusion. Indeed, a monitor may need to be passed through multiple helper routines that do not acquire the monitor mutual exclusion on entry. Examples of this can be both generic helper routines (\code{swap}, \code{sort}, etc.) or specific helper routines like the following example : … … 125 127 \end{lstlisting} 126 128 *semantics of the declaration of \code{mutex struct counter_t} are discussed in details in section \ref{data} 129 \\ 127 130 128 131 This example is of a monitor implementing an atomic counter. Here, the constructor uses the \code{nomutex} keyword to signify that it does not acquire the coroutine mutual exclusion when constructing. This is because object not yet constructed should never be shared and therefore do not require mutual exclusion. The prefix increment operator 129 132 uses \code{mutex} to protect the incrementing process from race conditions. Finally, we have a conversion operator from \code{counter_t} to \code{Int}. This conversion may or may not require the \code{mutex} key word depending whether or not reading an \code{Int} is an atomic operation or not. 133 \\ 130 134 131 135 Having both \code{mutex} and \code{nomutex} keywords could be argued to be redundant based on the meaning of a routine having neither of these keywords. If there were a meaning to routine \code{void foo(counter_t & this)} then one could argue that it should be to default to the safest option : \code{mutex}. On the other hand, the option of having routine \code{void foo(counter_t & this)} mean \code{nomutex} is unsafe by default and may easily cause subtle errors. It can be argued that this is the more "normal" behavior, \code{nomutex} effectively stating explicitly that "this routine has nothing special". An other alternative is to make one of these keywords mandatory, which would provide the same semantics but without the ambiguity of supporting routine \code{void foo(counter_t & this)}. Mandatory keywords would also have the added benefice of being more clearly self-documented but at the cost of extra typing. In the end, which solution should be picked is still up for debate. For the reminder of this proposal, the explicit approach will be used for the sake of clarity. 136 \\ 132 137 133 138 Regardless of which keyword is kept, it is important to establish when mutex/nomutex may be used depending on type parameters. … … 140 145 \end{lstlisting} 141 146 142 The problem is to indentify which object(s) should be acquired. Furthermore we also need to acquire each objects only once. In case of simple routines like \code{f1} and \code{f2} it is easy to identify an exha stive list of objects to acquire on entering. Adding references (\code{f3}) still allows the compiler and programmer to indentify which object will be acquired. However, adding in arrays (\code{f4}) makes it much harder. Array lengths aren't necessarily known in C and even then making sure we only acquire objects once becomes also none trivial. This can be extended to absurd limits like \code{f5} which uses a custom graph of monitors. To keep everyone as sane as possible, this projects to imposes the requirement that a routine may only acquire one monitor per parameter and it must be the type of the parameter with potential qualifiers and references.147 The problem is to indentify which object(s) should be acquired. Furthermore we also need to acquire each objects only once. In case of simple routines like \code{f1} and \code{f2} it is easy to identify an exhaustive list of objects to acquire on entering. Adding indirections (\code{f3}) still allows the compiler and programmer to indentify which object will be acquired. However, adding in arrays (\code{f4}) makes it much harder. Array lengths aren't necessarily known in C and even then making sure we only acquire objects once becomes also none trivial. This can be extended to absurd limits like \code{f5} which uses a custom graph of monitors. To keep everyone as sane as possible\cite{Chicken}, this projects imposes the requirement that a routine may only acquire one monitor per parameter and it must be the type of the parameter (ignoring potential qualifiers and indirections). 143 148 144 149 \subsection{Data semantics} \label{data} … … 192 197 \end{lstlisting} 193 198 194 This code acquires both locks before entering the critical section. In practice, writing multi-locking routines that can lead to deadlocks can be very tricky. Having language level support for such feature is therefore a significant asset for \CFA. However, this does have significant repercussions relating to scheduling (see \ref{insched} and \ref{extsched}). The ability to acquire multiple monitors at the same time does incur a significant pitfall even without looking into scheduling. For example :199 This code acquires both locks before entering the critical section. In practice, writing multi-locking routines that can not lead to deadlocks can be very tricky. Having language level support for such feature is therefore a significant asset for \CFA. However, this does have significant repercussions relating to scheduling (see \ref{insched} and \ref{extsched}). Furthermore, the ability to acquire multiple monitors at the same time does incur a significant pitfall even without looking into scheduling. For example : 195 200 \begin{lstlisting} 196 201 void foo(A & mutex a, B & mutex a) { … … 211 216 \end{lstlisting} 212 217 213 % TODO 214 TODO: dig further into monitor order aquiring 215 216 Thoughs : calls to \code{baz} and \code{bar} are definitely incompatible because they explicitly acquire locks in reverse order and therefore are explicitly asking for a deadlock. The best that can be done in this situatuin is to detect the deadlock. The case of implicit ordering is less clear because in the case of monitors the runtime system \textit{may} be smart enough to figure out that someone is waiting with explicit ordering... maybe. 217 218 \subsubsection{Internal scheduling} \label{insched} 218 Recursive mutex routine calls are allowed in \CFA but if not done carefully it can lead to nested monitor call problems\cite{Lister77}. These problems which are a specific implementation of the lock acquiring order problem. In the example above, the user uses implicit ordering in the case of function \code{bar} but explicit ordering in the case of \code{baz}. This subtle mistake can mean that calling these two functions concurrently will lead to deadlocks, depending on the implicit ordering matching the explicit ordering. As shown on several occasion\cit, there isn't really any solutions to this problem, users simply need to be carefull when acquiring multiple monitors at the same time. 219 220 \subsection{Internal scheduling} \label{insched} 219 221 Monitors should also be able to schedule what threads access it as a mean of synchronization. Internal scheduling is one of the simple examples of such a feature. It allows users to declare condition variables and wait for them to be signaled. Here is a simple example of such a technique : 220 222 … … 262 264 \end{center} 263 265 264 A direct extension of the single monitor semantics would be to release all locks when waiting and transferring ownership of all locks when signalling. However, for the purpose of synchronization it may be usefull to only release some of the locks but keep others. On the technical side, partially releasing lock is feasible but from the user perspective a choice must be made for the syntax of this feature. It is possible to do without any extra syntax by relying on order of acquisition (Note that here the use of helper routines is irrelevant here, only routines the acquire mutual exclusion have an impact on internal scheduling):266 A direct extension of the single monitor semantics would be to release all locks when waiting and transferring ownership of all locks when signalling. However, for the purpose of synchronization it may be usefull to only release some of the locks but keep others. On the technical side, partially releasing lock is feasible but from the user perspective a choice must be made for the syntax of this feature. It is possible to do without any extra syntax by relying on order of acquisition (Note that here the use of helper routines is irrelevant, only routines the acquire mutual exclusion have an impact on internal scheduling): 265 267 266 268 \begin{center} … … 315 317 316 318 This can be interpreted in two different ways : 319 \begin{flushleft} 317 320 \begin{enumerate} 318 \item \code{wait} atomically releases the monitors acquired by the inner-most mutexroutine, \underline{ignoring} nested calls.319 \item \code{wait} atomically releases the monitors acquired by the inner-most mutexroutine, \underline{considering} nested calls.321 \item \code{wait} atomically releases the monitors acquired by the inner-most routine, \underline{ignoring} nested calls. 322 \item \code{wait} atomically releases the monitors acquired by the inner-most routine, \underline{considering} nested calls. 320 323 \end{enumerate} 321 While the difference between these two is subtle, it has a significant impact. In the first case it means that the calls to \code{foo} would behave the same in Context 1 and 2. This semantic would also mean that the call to \code{wait} in routine \code{baz} would only release \code{monitor b}. While this may seem intuitive with these examples, it does have one significant implication, it creates a strong distinction between acquiring multiple monitors in sequence and acquiring the same monitors simulatenously. 324 \end{flushleft} 325 While the difference between these two is subtle, it has a significant impact. In the first case it means that the calls to \code{foo} would behave the same in Context 1 and 2. This semantic would also mean that the call to \code{wait} in routine \code{baz} would only release \code{monitor b}. While this may seem intuitive with these examples, it does have one significant implication, it creates a strong distinction between acquiring multiple monitors in sequence and acquiring the same monitors simulatenously, i.e. : 322 326 323 327 \begin{center} … … 339 343 \end{center} 340 344 341 This is not intuitive because even if both methods display the same monitors state both inside and outside the critical section respectively, the behavior is different. Furthermore, the actual acquiring order will be exaclty the same since acquiring a monitor from inside its mutual exclusion is a no-op. This means that even if the data and the actual control flow are the same using both methods, the behavior of the \code{wait} will be different. The alternative is option 2, that is releasing acquired monitors, \underline{considering} nesting. This solves the issue of having the two acquiring method differ at the cost of making routine \code{foo} behave differently depending on from which context it is called (Context 1 or 2). Indeed in Context 2, routine \code{foo} actually behaves like routine \code{baz} rather than having the same behavior than in context 1. The fact that both implicit approaches can be unintuitive depending on the perspective may be a sign that the explicit approach is superior.345 This is not intuitive because even if both methods display the same monitors state both inside and outside the critical section respectively, the behavior is different. Furthermore, the actual acquiring order will be exaclty the same since acquiring a monitor from inside its mutual exclusion is a no-op. This means that even if the data and the actual control flow are the same using both methods, the behavior of the \code{wait} will be different. The alternative is option 2, that is releasing acquired monitors, \underline{considering} nesting. This solves the issue of having the two acquiring method differ at the cost of making routine \code{foo} behave differently depending on from which context it is called (Context 1 or 2). Indeed in Context 2, routine \code{foo} actually behaves like routine \code{baz} rather than having the same behavior than in Context 1. The fact that both implicit approaches can be unintuitive depending on the perspective may be a sign that the explicit approach is superior. For this reason this \CFA does not support implicit monitor releasing and uses explicit semantics. 342 346 \\ 343 347 … … 416 420 \\ 417 421 418 All these cases have the re pros and cons. Case 1 is more distinct because it means programmers need to be carefull about where the condition is initialized as well as where it is used. On the other hand, it is very clear and explicitwhich monitor is released and which monitor stays acquired. This is similar to Case 2, which releases only the monitors explictly listed. However, in Case 2, calling the \code{wait} routine instead of the \code{waitRelease} routine releases all the acquired monitor. The Case 3 is an improvement on that since it releases all the monitors except those specified. The result is that the \code{wait} routine can be written as follows :422 All these cases have their pros and cons. Case 1 is more distinct because it means programmers need to be carefull about where the condition is initialized as well as where it is used. On the other hand, it is very clear and explicitly states which monitor is released and which monitor stays acquired. This is similar to Case 2, which releases only the monitors explictly listed. However, in Case 2, calling the \code{wait} routine instead of the \code{waitRelease} routine releases all the acquired monitor. The Case 3 is an improvement on that since it releases all the monitors except those specified. The result is that the \code{wait} routine can be written as follows : 419 423 \begin{lstlisting} 420 424 void wait(condition & cond) { … … 455 459 456 460 Finally, an additionnal semantic which can be very usefull is the \code{signalBlock} routine. This routine behaves like signal for all of the semantics discussed above, but with the subtelty that mutual exclusion is transferred to the waiting task immediately rather than wating for the end of the critical section. 461 \\ 457 462 458 463 \subsection{External scheduling} \label{extsched} 459 As one might expect, the alternative to Internal scheduling is to use External scheduling instead. This method is somewhat more robust to deadlocks since one of the threads keeps a relatively tight control on scheduling. Indeed, as the following examples will demontrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occuring. External scheduling can generally be done either in terms of control flow (see \uC) or in terms of data (see Go). Of course, both of these paradigms have their own strenghts and weaknesses but for this project control flow semantics where chosen to stay consistent with the reset of the languages semantics. Two challenges specific to \CFA arise when trying to add external scheduling whi ch isloose object definitions and multi-monitor routines. The following example shows what a simple use \code{accept} versus \code{wait}/\code{signal} and its advantages.464 As one might expect, the alternative to Internal scheduling is to use External scheduling instead. This method is somewhat more robust to deadlocks since one of the threads keeps a relatively tight control on scheduling. Indeed, as the following examples will demontrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occuring. External scheduling can generally be done either in terms of control flow (see \uC) or in terms of data (see Go). Of course, both of these paradigms have their own strenghts and weaknesses but for this project control flow semantics where chosen to stay consistent with the reset of the languages semantics. Two challenges specific to \CFA arise when trying to add external scheduling whith loose object definitions and multi-monitor routines. The following example shows what a simple use \code{accept} versus \code{wait}/\code{signal} and its advantages. 460 465 461 466 \begin{center} … … 486 491 487 492 In the case of internal scheduling, the call to \code{wait} only guarantees that \code{g} was the last routine to access the monitor. This intails that the routine \code{f} may have acquired mutual exclusion several times while routine \code{h} was waiting. On the other hand, external scheduling guarantees that while routine \code{h} was waiting, no routine other than \code{g} could acquire the monitor. 493 \\ 488 494 489 495 \subsubsection{Loose object definitions} 490 In \uC monitor definitions include an exhaustive list of monitor operations. Since \CFA is not an object oriented it becomes much more difficult to implement but also muchless clear for the user :496 In \uC, monitor declarations include an exhaustive list of monitor operations. Since \CFA is not object oriented it becomes both more difficult to implement but also less clear for the user : 491 497 492 498 \begin{lstlisting} -
doc/proposals/concurrency/version
r9b4343e ra3eaa29 1 0.4. 01 0.4.22
Note: See TracChangeset
for help on using the changeset viewer.