Changeset 3c6e417 for doc/papers/concurrency
- Timestamp: Jun 20, 2019, 1:45:01 PM
- Branches: ADT, arm-eh, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children: 54dd994
- Parents: 1e5dedc4, c0f9efe
Note: this is a merge changeset; the changes shown below correspond to the merge itself, not to the full set of changes relative to each parent.
- Files: 1 edited
doc/papers/concurrency/Paper.tex
Libraries like pthreads were developed for C, and the Solaris operating-system switched from user (JDK 1.1~\cite{JDK1.1}) to kernel threads.
As a result, languages like Java, Scala, Objective-C~\cite{obj-c-book}, \CCeleven~\cite{C11}, and C\#~\cite{Csharp} adopt the 1:1 kernel-threading model, with a variety of presentation mechanisms.
From 2000 onwards, languages like Go~\cite{Go}, Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, D~\cite{D}, and \uC~\cite{uC++,uC++book} have championed the M:N user-threading model, and many user-threading libraries have appeared~\cite{Qthreads,MPC,Marcel}, including putting green threads back into Java~\cite{Quasar}.
The main argument for user-level threading is that user threads are lighter weight than kernel threads (locking and context switching do not cross the kernel boundary), so there is less restriction on programming styles that encourage large numbers of threads performing medium-sized work-units to facilitate load balancing by the runtime~\cite{Verch12}.
As well, user-threading facilitates a simpler concurrency approach using thread objects that leverage sequential patterns versus events with call-backs~\cite{vonBehren03}.

…

Finally, it is important for a language to provide safety over performance \emph{as the default}, allowing careful reduction of safety for performance when necessary.
Two concurrency violations of this philosophy are \emph{spurious wakeup} (random wakeup~\cite[\S~8]{Buhr05a}) and \emph{barging} (signals-as-hints~\cite[\S~8]{Buhr05a}), where one is a consequence of the other, \ie once there is spurious wakeup, signals-as-hints follows.
However, spurious wakeup is \emph{not} a foundational concurrency property~\cite[\S~8]{Buhr05a}; it is a performance design choice.
Similarly, signals-as-hints is often a performance decision.
We argue removing spurious wakeup and signals-as-hints makes concurrent programming significantly safer because it removes local non-determinism and matches with programmer expectation.
(Author experience teaching concurrency is that students are highly confused by these semantics.)
Clawing back performance, when local non-determinism is unimportant, should be an option, not the default.

…

\section{Stateful Function}
The stateful function is an old idea~\cite{Conway63,Marlin80} that is new again~\cite{C++20Coroutine19}, where execution is temporarily suspended and later resumed, \eg plugin, device driver, finite-state machine.
Hence, a stateful function may not end when it returns to its caller, allowing it to be restarted with the data and execution location present at the point of suspension.
This capability is accomplished by retaining a data/execution \emph{closure} between invocations.

…

\end{figure}

Stateful functions appear as generators, coroutines, and threads, where presentations are based on function objects or pointers~\cite{Butenhof97,C++14,MS:VisualC++,BoostCoroutines15}.
For example, Python presents generators as a function object:
\begin{python}
...
\end{python}

…

Figure~\ref{f:FibonacciAsymmetricGenerator} shows an unbounded asymmetric generator for an infinite sequence of Fibonacci numbers written in C and \CFA, with a simple C implementation for the \CFA version.
This generator is an \emph{output generator}, producing a new result on each resumption.
To compute Fibonacci, the previous two values in the sequence are retained to generate the next value, \ie @fn1@ and @fn@, plus the execution location where control restarts when the generator is resumed, \ie top or middle.
An additional requirement is the ability to create an arbitrary number of generators (of any kind), \ie retaining one state in global variables is insufficient;
hence, state is retained in a closure between calls.
Figure~\ref{f:CFibonacci} shows the C approach of manually creating the closure in structure @Fib@, and multiple instances of this closure provide multiple Fibonacci generators.
The C version only has the middle execution state because the top execution state is declaration initialization.
Figure~\ref{f:CFAFibonacciGen} shows the \CFA approach, which also has a manual closure, but replaces the structure with a custom \CFA @generator@ type.
This generator type is then connected to a function that \emph{must be named \lstinline|main|},\footnote{...}

…

As well, the data state is small, where variables @byte@ and @msg@ are communication variables for passing in message bytes and returning the message, and variables @lnth@, @crc@, and @sum@ are local variables that must be retained between calls and are manually hoisted into the generator type.
% Manually, detecting and hoisting local-state variables is easy when the number is small.
In contrast, the execution state is large, with one @resume@ and seven @suspend@s.
Hence, the key benefits of the generator are correctness, safety, and maintenance because the execution states are transcribed directly into the programming language rather than using a table-driven approach.
Because FSMs can be complex and occur frequently in important domains, direct support of the generator is crucial in a systems programming-language.
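To ground the closure discussion, the following is a condensed sketch in the spirit of Figure~\ref{f:CFibonacci}; the names @Fib@, @FibCtor@, and @next@ and the explicit @restart@ flag are illustrative, not the figure's exact code.
\begin{cfa}
typedef struct {
	int fn1, fn;	// data state: last two Fibonacci numbers
	int restart;	// execution state: 0 => top, 1 => middle
} Fib;
#define FibCtor { 1, 0, 0 }
int next( Fib * f ) {
	if ( f->restart == 0 ) {	// top: first resumption returns 0
		f->restart = 1;
		return f->fn;
	}
	int fn = f->fn + f->fn1;	// middle: advance the sequence
	f->fn1 = f->fn;  f->fn = fn;
	return fn;
}
\end{cfa}
Multiple @Fib@ instances give multiple independent generators, which is exactly the property global variables cannot provide.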
…

Figure~\ref{f:CFAPingPongGen} shows a symmetric generator, where the generator resumes another generator, forming a resume/resume cycle.
(The trivial cycle is a generator resuming itself.)
This control flow is similar to recursion for functions but without stack growth.
The steps for symmetric control-flow are creating, executing, and terminating the cycle.
Constructing the cycle must deal with definition-before-use to close the cycle, \ie the first generator must know about the last generator, which is not within scope.

…

Terminating the cycle is accomplished by @suspend@ or @return@, both of which go back to the stack frame that started the cycle (program main in the example).
The starting stack-frame is below the last active generator because the resume/resume cycle does not grow the stack.
Also, since local variables are not retained in the generator function, it does not contain any objects with destructors that must be called, so the cost is the same as a function return.
Destructor cost occurs when the generator instance is deallocated, which is easily controlled by the programmer.

…

Hence, the starter coroutine is remembered on the first resume, and ending the coroutine resumes the starter.
Figure~\ref{f:ProdConsRuntimeStacks} shows this semantic by the dashed lines from the end of the coroutine mains: @prod@ starts @cons@ so @cons@ resumes @prod@ at the end, and the program main starts @prod@ so @prod@ resumes the program main at the end.
For other scenarios, it is always possible to devise a solution with additional programming effort, such as forcing the cycle forward (backward) to a safe point before starting termination.

The producer/consumer example does not illustrate the full power of the starter semantics because @cons@ always ends first.

…

The function definitions ensure there is a statically-typed @main@ function that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the currently executing coroutine handle.
The @main@ function has no return value or additional parameters because the coroutine type allows an arbitrary number of interface functions with corresponding arbitrary typed input/output values versus fixed ones.
The advantage of this approach is that users can easily create different types of coroutines, \eg changing the memory layout of a coroutine is trivial when implementing the @get_coroutine@ function, and possibly redefining \textsf{suspend} and @resume@.

The \CFA custom-type @coroutine@ implicitly implements the getter and forward declarations for the coroutine main.
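For reference, the interface implied by this paragraph can be sketched as a trait; the spellings @is_coroutine@ and @coroutine_desc@ follow the \CFA runtime, but the declarations are abbreviated here.
\begin{cfa}
trait is_coroutine( dtype T ) {
	void main( T & );	// coroutine main: first stack frame
	coroutine_desc * get_coroutine( T & );	// read the executing coroutine handle
};
coroutine Fib { int fn; };	// custom type: generates the closure type, the getter,
							// and the forward declaration of main( Fib & )
\end{cfa}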
…

Once allocated, a VLS is fixed sized.}
on the allocating stack, provided the allocating stack is large enough.
For a VLS, allocation/deallocation is an inexpensive adjustment of the stack pointer, modulo any stack constructor costs (\eg initial frame setup).
For heap stack allocation, allocation/deallocation is an expensive heap allocation (where the heap can be a shared resource), modulo any stack constructor costs.
With heap stack allocation, it is also possible to use a split (segmented) stack calling-convention, available with gcc and clang, so the stack is variable sized.

…

However, coroutines are a stepping stone towards concurrency.

The transition to concurrency, even for a single thread with multiple stacks, occurs when coroutines context switch to a \newterm{scheduling coroutine}, introducing non-determinism from the coroutine perspective~\cite[\S~3]{Buhr05a}.
Therefore, a minimal concurrency system requires coroutines \emph{in conjunction with a nondeterministic scheduler}.
The resulting execution system now follows a cooperative threading-model~\cite{Adya02,libdill}, called \newterm{non-preemptive scheduling}.
Adding \newterm{preemption} introduces non-cooperative scheduling, where context switching occurs randomly between any two instructions, often based on a timer interrupt, called \newterm{preemptive scheduling}.
While a scheduler introduces uncertain execution among explicit context switches, preemption introduces uncertainty by introducing implicit context switches.

…

This semantic ensures a thread is started and stopped exactly once, eliminating some programming errors, and scales to multiple threads for basic (termination) synchronization.
For block allocation to arbitrary depth, including recursion, threads are created/destroyed in a lattice structure (tree with top and bottom).
Arbitrary topologies are possible using dynamic allocation, allowing threads to outlive their declaration scope, identical to normal dynamic allocation.
\begin{cfa}
MyTask * factory( int N ) { ... return `anew( N )`; } $\C{// allocate heap-based threads, implicit start after construction}$
\end{cfa}
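A hedged usage sketch of the factory follows; @adelete@ is assumed to be the array counterpart of @delete@ in the \CFA standard library, matching the started/stopped-exactly-once semantics above.
\begin{cfa}
int main() {
	MyTask * tasks = factory( 10 );	// threads start implicitly after construction
	// ... program main runs concurrently with the 10 tasks ...
	adelete( tasks );	// assumed array-delete: each destructor implicitly joins its thread
}
\end{cfa}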
…

\subsection{Mutual Exclusion}

A group of instructions manipulating a specific instance of shared data that must be performed atomically is called a \newterm{critical section}~\cite{Dijkstra65}, which is enforced by \newterm{simple mutual-exclusion}.
The generalization is called a \newterm{group critical-section}~\cite{Joung00}, where multiple tasks with the same session use the resource simultaneously and different sessions are segregated, which is enforced by \newterm{complex mutual-exclusion} providing the correct kind and number of threads using a group critical-section.
The readers/writer problem~\cite{Courtois71} is an instance of a group critical-section, where readers share a session but writers have a unique session.

However, many solutions exist for mutual exclusion, which vary in terms of performance, flexibility, and ease of use.

…

Preventing or detecting barging is an involved challenge with low-level locks, which is made easier through higher-level constructs.
This challenge is often split into two different approaches: barging avoidance and prevention.
Algorithms that unconditionally release a lock for competing threads to acquire use barging avoidance during synchronization to force a barging thread to wait;
algorithms that conditionally hold locks during synchronization, \eg baton-passing~\cite{Andrews89}, prevent barging completely.

…

For this reason, \CFA requires programmers to identify the kind of parameter with the @mutex@ keyword and uses no keyword to mean \lstinline[morekeywords=nomutex]@nomutex@.

The next semantic decision is establishing which parameter \emph{types} may be qualified with @mutex@.
The following has monitor parameter types that are composed of multiple objects.

…

Users can still force the acquiring order by using @mutex@/\lstinline[morekeywords=nomutex]@nomutex@.
\begin{cfa}
void foo( M & mutex m1, M & mutex m2 ); $\C{// acquire m1 and m2}$
...
\end{cfa}
The bulk-acquire semantics allow @bar@ or @baz@ to acquire a monitor lock and reacquire it in @foo@.
The calls to @bar@ and @baz@ acquire the monitors in opposite order, possibly resulting in deadlock.
However, this case is the simplest instance of the \emph{nested-monitor problem}~\cite{Lister77}, where monitors are acquired in sequence versus bulk.
Detecting the nested-monitor problem requires dynamic tracking of monitor calls, and dealing with it requires rollback semantics~\cite{Dice10}.
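To make the hazard concrete, the following hedged sketch (hypothetical functions, not the paper's elided example) shows how sequential acquisition interleaves where bulk acquire cannot.
\begin{cfa}
monitor M { ... };
M m1, m2;
void inner( M & mutex m ) { ... }	// acquires m
void outer1( M & mutex m ) { inner( m2 ); }	// called as outer1( m1 ): holds m1, waits for m2
void outer2( M & mutex m ) { inner( m1 ); }	// called as outer2( m2 ): holds m2, waits for m1
\end{cfa}
If one thread calls @outer1( m1 )@ while another calls @outer2( m2 )@, each holds one monitor and blocks acquiring the other, and deadlock results; a single bulk-acquire function @void both( M & mutex m1, M & mutex m2 )@ acquires both monitors atomically and avoids the interleaving.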
…

% It is only in this way that a waiting program has an absolute guarantee that it can acquire the resource just released by the signalling program without any danger that a third program will interpose a monitor entry and seize the resource instead.~\cite[p.~550]{Hoare74}
% \end{cquote}
Furthermore, \CFA concurrency has no spurious wakeup~\cite[\S~9]{Buhr05a}, which eliminates an implicit form of self barging.
Hence, a \CFA @wait@ statement is not enclosed in a @while@ loop retesting a blocking predicate, which can cause thread starvation due to barging.

Figure~\ref{f:MonitorScheduling} shows general internal/external scheduling (for the bounded-buffer example in Figure~\ref{f:InternalExternalScheduling}).
External calling threads block on the calling queue if the monitor is occupied; otherwise they enter in FIFO order.
Internal threads block on condition queues via @wait@ and reenter from the condition in FIFO order, or they block on urgent via @signal_block@ or @waitfor@ and reenter implicitly when the monitor becomes empty, \ie the thread in the monitor exits or waits.

There are three signalling mechanisms to unblock waiting threads to enter the monitor.
Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion, so only one can proceed.
For internal scheduling, threads are unblocked from condition queues using @signal@, where the signallee is moved to urgent and the signaller continues (solid line).
Multiple signals move multiple signallees to urgent, until the condition is empty.

…

Executing multiple @waitfor@s from different signalled functions causes the calling threads to move to urgent.
External scheduling requires urgent to be a stack, because the signaller expects to execute immediately after the specified monitor call has exited or waited.
Internal scheduling behaves the same for an urgent stack or queue, except for multiple signalling, where the threads unblock from urgent in reverse order from signalling.
If the restart order is important, multiple signalling by a signal thread can be transformed into daisy-chain signalling among threads, where each thread signals the next thread.
We tried both a stack for @waitfor@ and a queue for signalling, but that resulted in complex semantics about which thread enters next.
Hence, \CFA uses a single urgent stack to correctly handle @waitfor@ and adequately support both forms of signalling.

\begin{figure}

…

\end{figure}

Figure~\ref{f:BBInt} shows a \CFA generic bounded-buffer with internal scheduling, where producers/consumers enter the monitor, detect the buffer is full/empty, and block on an appropriate condition variable, @full@/@empty@.
The @wait@ function atomically blocks the calling thread and implicitly releases the monitor lock(s) for all monitors in the function's parameter list.
The appropriate condition variable is signalled to unblock an opposite kind of thread after an element is inserted/removed from the buffer.
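A condensed sketch in the style of Figure~\ref{f:BBInt} follows; the buffer size and names are illustrative, and the @if@ (not @while@) before each @wait@ relies on the no-barging/no-spurious-wakeup guarantee above.
\begin{cfa}
monitor BoundedBuffer {
	condition full, empty;	// waiting producers / waiting consumers
	int front, back, count;
	int elements[10];
};
void insert( BoundedBuffer & mutex buf, int elem ) with( buf ) {
	if ( count == 10 ) wait( full );	// buffer full: producer blocks, no loop needed
	elements[back] = elem;  back = ( back + 1 ) % 10;  count += 1;
	signal( empty );	// a waiting consumer moves to urgent
}
int remove( BoundedBuffer & mutex buf ) with( buf ) {
	if ( count == 0 ) wait( empty );	// buffer empty: consumer blocks
	int elem = elements[front];  front = ( front + 1 ) % 10;  count -= 1;
	signal( full );	// a waiting producer moves to urgent
	return elem;
}
\end{cfa}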
…

External scheduling is controlled by the @waitfor@ statement, which atomically blocks the calling thread, releases the monitor lock, and restricts the function calls that can next acquire mutual exclusion.
If the buffer is full, only calls to @remove@ can acquire the buffer, and if the buffer is empty, only calls to @insert@ can acquire the buffer.
Threads calling functions that are currently excluded block outside of (external to) the monitor on the calling queue, versus blocking on condition queues inside of (internal to) the monitor.
Figure~\ref{f:RWExt} shows a readers/writer lock written using external scheduling, where a waiting reader detects a writer using the resource and restricts further calls until the writer exits by calling @EndWrite@.
The writer does a similar action for each reader or writer using the resource.
Note, no new calls to @StartRead@/@StartWrite@ may occur when waiting for the call to @EndRead@/@EndWrite@.
External scheduling allows waiting for events from other threads while restricting unrelated events that would otherwise have to wait on conditions in the monitor.
The mechanism can be expressed in terms of control flow, \eg Ada @accept@ or \uC @_Accept@, or in terms of data, \eg Go @select@ on channels.
While both mechanisms have strengths and weaknesses, this project uses the control-flow mechanism to be consistent with other language features.
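A condensed sketch in the style of Figure~\ref{f:RWExt} follows (illustrative, not the figure's exact code):
\begin{cfa}
monitor ReadersWriter {
	int rcnt, wcnt;	// active readers, active writer (0 or 1)
};
void EndRead( ReadersWriter & mutex rw ) with( rw ) { rcnt -= 1; }
void EndWrite( ReadersWriter & mutex rw ) with( rw ) { wcnt = 0; }
void StartRead( ReadersWriter & mutex rw ) with( rw ) {
	if ( wcnt > 0 ) waitfor( EndWrite, rw );	// writer active: only EndWrite may enter
	rcnt += 1;
}
void StartWrite( ReadersWriter & mutex rw ) with( rw ) {
	if ( wcnt > 0 ) waitfor( EndWrite, rw );	// wait for the current writer
	else while ( rcnt > 0 ) waitfor( EndRead, rw );	// drain the readers one at a time
	wcnt = 1;
}
\end{cfa}
While a @waitfor@ is outstanding, calls to the named functions are the only ones accepted, so no new @StartRead@/@StartWrite@ can barge ahead of the waiting thread.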
…

Furthermore, barging corrupts the dating service during an exchange because a barger may also match and change the phone numbers, invalidating the previous exchange phone number.
Putting loops around the @wait@s does not correct the problem;
the simple solution must be restructured to account for barging.

\begin{figure}

…

the signaller enters the monitor and changes state, detects a waiting thread that can use the state, performs a non-blocking signal on the condition queue for the waiting thread, and exits the monitor to run concurrently.
The waiter unblocks next from the urgent queue, uses/takes the state, and exits the monitor.
Blocking signal is the reverse, where the waiter is providing the cooperation for the signalling thread;
the signaller enters the monitor, detects a waiting thread providing the necessary state, and performs a blocking signal that places the signaller on the urgent queue and unblocks the waiter.
The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to use/take the state.

…

While \CC supports bulk locking, @wait@ only accepts a single lock for a condition variable, so bulk locking with condition variables is asymmetric.
Finally, a signaller,
\begin{cfa}
void baz( M & mutex m1, M & mutex m2 ) {
	...
}
\end{cfa}
must have acquired at least the same locks as the waiting thread signalled from a condition queue to allow the locks to be passed, and hence, prevent barging.

Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex parameters, \ie @waitfor( rtn, m1, m2 )@.

…

The \emph{conditional-expression} of a @when@ may call a function, but the function must not block or context switch.
If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) among the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept nondeterministically for this case, \eg Go \lstinline[morekeywords=select]@select@.
If some accept guards are true and there are no outstanding calls to these members, the acceptor is blocked until a call to one of these members is made.
If there is a @timeout@ clause, it provides an upper bound on waiting.
If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead.

…

However, the basic @waitfor@ semantics do not support this functionality, since using an object after its destructor is called is undefined.
Therefore, to make this useful capability work, the semantics for accepting the destructor is the same as @signal@, \ie the call to the destructor is placed on the urgent queue and the acceptor continues execution, which throws an exception to the acceptor and then the caller is unblocked from the urgent queue to deallocate the object.
Accepting the destructor is the idiomatic way to terminate a thread in \CFA.
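Combining guards, external scheduling, and destructor acceptance, the following hedged sketch shows an idiomatic server loop; the thread type with mutex functions and the stack-style buffer are illustrative assumptions.
\begin{cfa}
thread Buffer {
	int elements[10], count;
};
void insert( Buffer & mutex buf, int elem ) with( buf ) {
	elements[count] = elem;  count += 1;
}
int remove( Buffer & mutex buf ) with( buf ) {
	count -= 1;  return elements[count];
}
void main( Buffer & buf ) with( buf ) {
	count = 0;
	for ( ;; ) {
		when ( count < 10 ) waitfor( insert, buf );	// accept insert unless full
		or when ( count > 0 ) waitfor( remove, buf );	// accept remove unless empty
		or waitfor( ^?{}, buf ) break;	// accepting the destructor terminates the loop
	}
}
\end{cfa}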
…

In the object-oriented scenario, the type and all its operators are always present at compilation (even separate compilation), so it is possible to number the operations in a bit mask and use an $O(1)$ compare with a similar bit mask created for the operations specified in a @waitfor@.

However, in \CFA, monitor functions can be statically added/removed in translation units, making a fast subset check difficult.
\begin{cfa}
monitor M { ... }; // common type, included in .h file
translation unit 1
void `f`( M & mutex m );
void g( M & mutex m ) { waitfor( `f`, m ); }
translation unit 2
void `f`( M & mutex m ); $\C{// replacing f and g for type M in this translation unit}$
void `g`( M & mutex m );
void h( M & mutex m ) { waitfor( `f`, m ) or waitfor( `g`, m ); } $\C{// extending type M in this translation unit}$
\end{cfa}
The @waitfor@ statements in each translation unit cannot form a unique bit-mask because the monitor type does not carry that information.
Hence, function pointers are used to identify the functions listed in the @waitfor@ statement, stored in a variable-sized array.
Then, the same implementation approach used for the urgent stack is used for the calling queue.
Each caller has a list of monitors acquired, and the @waitfor@ statement performs a (usually short) linear search matching functions in the @waitfor@ list with called functions, and then verifies the associated mutex locks can be transferred.

…

External scheduling, like internal scheduling, becomes significantly more complex for multi-monitor semantics.
Even in the simplest case, new semantics need to be established.
\begin{cfa}
monitor M { ... };
...
\end{cfa}

…

For completeness and efficiency, \CFA provides a standard set of low-level locks: recursive mutex, condition, semaphore, barrier, \etc, and atomic instructions: @fetchAssign@, @fetchAdd@, @testSet@, @compareSet@, \etc.
Some of these low-level mechanisms are used in the \CFA runtime, but we strongly advocate using high-level mechanisms whenever possible.

…

\subsection{Cluster}
\label{s:RuntimeStructureCluster}

A \newterm{cluster} is a collection of threads and virtual processors (abstract kernel-thread) that execute the (user) threads from its own ready queue (like an OS executing kernel threads).
The purpose of a cluster is to control the amount of parallelism that is possible among threads, plus scheduling and other execution defaults.
The default cluster-scheduler is single-queue multi-server, which provides automatic load-balancing of threads on processors.
However, the scheduler is pluggable, supporting alternative schedulers, such as multi-queue multi-server, with work-stealing/sharing across the virtual processors.
If several clusters exist, both threads and virtual processors can be explicitly migrated from one cluster to another.
No automatic load balancing among clusters is performed by \CFA.
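For example, a hedged sketch of explicit cluster and processor creation follows; the declaration forms are assumptions about the runtime interface, not verbatim library code.
\begin{cfa}
cluster clus;	// assumed: new cluster with its own ready queue and defaults
processor p( clus ), q( clus );	// assumed: two virtual processors (kernel threads) serving clus
// user threads created on clus are load-balanced across p and q by the
// default single-queue multi-server scheduler
\end{cfa}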
…

\label{s:RuntimeStructureProcessor}

A virtual processor is implemented by a kernel thread (\eg UNIX process), which is scheduled for execution on a hardware processor by the underlying operating system.
Programs may use more virtual processors than hardware processors.
On a multiprocessor, kernel threads are distributed across the hardware processors resulting in virtual processors executing in parallel.
(It is possible to use affinity to lock a virtual processor onto a particular hardware processor~\cite{affinityLinux, affinityWindows, affinityFreebsd, affinityNetbsd, affinityMacosx}, which is used when caching issues occur or for heterogeneous hardware processors.)
The \CFA runtime attempts to block unused processors and unblock processors as the system load increases;
balancing the workload with processors is difficult because it requires future knowledge, \ie what the application workload will do next.
Preemption occurs on virtual processors rather than user threads, via operating-system interrupts.
Thus virtual processors execute user threads, where preemption frequency applies to a virtual processor, so preemption occurs randomly across the executed user threads.

…

\subsection{Preemption}

Nondeterministic preemption provides fairness in the presence of long-running threads, and forces concurrent programmers to write more robust programs, rather than relying on sections of code between cooperative-scheduling points being atomic.
A separate reason for not supporting preemption is that it significantly complicates the runtime system.
Preemption is normally handled by setting a count-down timer on each virtual processor.
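To make the count-down timer concrete, the following hedged sketch uses POSIX interval timers; it illustrates the idea only and is not the \CFA runtime's actual preemption code.
\begin{cfa}
#include <signal.h>
#include <sys/time.h>
static void preempt( int sig ) {
	// asynchronous entry: mark the running user thread for preemption;
	// the runtime context switches to the scheduler at the next safe point
}
static void startPreemption( long usec ) {
	signal( SIGALRM, preempt );	// sketch only: production code would use sigaction
	struct itimerval it = { { 0, usec }, { 0, usec } };	// interval and initial expiry
	setitimer( ITIMER_REAL, &it, 0 );	// count-down timer delivering SIGALRM
}
\end{cfa}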
…

There are two versions of the \CFA runtime kernel: debug and non-debug.
The debugging version has many runtime checks and internal assertions, \eg stack (non-writable) guard page, and checks for stack overflow whenever context switches occur among coroutines and threads, which catches most stack overflows.
After a program is debugged, the non-debugging version can be used to significantly decrease space and increase performance.

…

The only note here is that the call stacks of \CFA coroutines are lazily created; therefore, without priming the coroutine to force stack creation, the creation cost is artificially low.

\begin{multicols}{2}
\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}

…

One solution is to offer various tuning options, allowing the scheduler to be adjusted to the requirements of the workload.
However, to be truly flexible, a pluggable scheduler is necessary.
Currently, the \CFA pluggable scheduler is too simple to handle complex scheduling, \eg quality of service and real-time, where the scheduler must interact with mutex objects to deal with issues like priority inversion~\cite{Buhr00b}.

…

\paragraph{Non-Blocking I/O}

…

\section{Acknowledgements}

The authors would like to recognize the design assistance of Aaron Moss, Rob Schluntz, Andrew Beach, and Michael Brooks on the features described in this paper.
Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}). %, and Peter Buhr is partially funded by the Natural Sciences and Engineering Research Council of Canada.