Changes in / [f52ce6e:73edfe9]


File:
1 edited

  • doc/papers/concurrency/Paper.tex

    rf52ce6e r73edfe9  
    16361636For this reason, \CFA requires programmers to identify the kind of parameter with the @mutex@ keyword and uses no keyword to mean \lstinline[morekeywords=nomutex]@nomutex@.
    16371637
     1638\newpage
    16381639The next semantic decision is establishing which parameter \emph{types} may be qualified with @mutex@.
    16391640The following has monitor parameter types that are composed of multiple objects.
     
    17341735
    17351736Users can still force the acquiring order by using @mutex@/\lstinline[morekeywords=nomutex]@nomutex@.
     1737\newpage
    17361738\begin{cfa}
    17371739void foo( M & mutex m1, M & mutex m2 ); $\C{// acquire m1 and m2}$
     
    17441746\end{cfa}
    17451747The bulk-acquire semantics allow @bar@ or @baz@ to acquire a monitor lock and reacquire it in @foo@.
    1746 The calls to @bar@ and @baz@ acquired the monitors in opposite order, possibly resulting in deadlock.
     1748In the calls to @bar@ and @baz@, the monitors are acquired in opposite order, possibly resulting in deadlock.
    17471749However, this case is the simplest instance of the \emph{nested-monitor problem}~\cite{Lister77}, where monitors are acquired in sequence versus bulk.
    17481750Detecting the nested-monitor problem requires dynamic tracking of monitor calls, and dealing with it requires rollback semantics~\cite{Dice10}.
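As an illustration only, not taken from the \CFA runtime, the opposite-order deadlock between @bar@ and @baz@ cannot occur when both locks are taken as one bulk step in a canonical order. The C sketch below assumes POSIX mutexes, the hypothetical helpers lock_both and unlock_both, and address order as the canonical order. It does not address the nested-monitor problem, where the locks are acquired at different call depths.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: acquire two mutexes as one bulk step.
   The locks are always taken in increasing address order, so two callers
   passing the same pair in opposite argument order cannot deadlock. */
void lock_both( pthread_mutex_t * a, pthread_mutex_t * b ) {
    assert( a != NULL && b != NULL );
    if ( (uintptr_t)a > (uintptr_t)b ) {        /* normalize to address order */
        pthread_mutex_t * tmp = a; a = b; b = tmp;
    }
    pthread_mutex_lock( a );
    if ( b != a ) pthread_mutex_lock( b );      /* same mutex passed twice: lock once */
}

/* Hypothetical helper: release both mutexes; release order does not matter. */
void unlock_both( pthread_mutex_t * a, pthread_mutex_t * b ) {
    assert( a != NULL && b != NULL );
    pthread_mutex_unlock( a );
    if ( b != a ) pthread_mutex_unlock( b );
}
```

Calling lock_both( &m1, &m2 ) in one thread and lock_both( &m2, &m1 ) in another acquires the two mutexes in the same order in both threads.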
     
    17951797% It is only in this way that a waiting program has an absolute guarantee that it can acquire the resource just released by the signalling program without any danger that a third program will interpose a monitor entry and seize the resource instead.~\cite[p.~550]{Hoare74}
    17961798% \end{cquote}
    1797 Furthermore, \CFA concurrency has no spurious wakeup~\cite[\S~9]{Buhr05a}, which eliminates an implicit form of self barging.
     1799Furthermore, \CFA concurrency has no spurious wakeup~\cite[\S~9]{Buhr05a}, which eliminates an implicit form of barging.
    17981800Hence, a \CFA @wait@ statement is not enclosed in a @while@ loop retesting a blocking predicate, which can cause thread starvation due to barging.
    17991801
    1800 Figure~\ref{f:MonitorScheduling} shows general internal/external scheduling (for the bounded-buffer example in Figure~\ref{f:InternalExternalScheduling}).
     1802Figure~\ref{f:MonitorScheduling} shows internal/external scheduling (for the bounded-buffer example in Figure~\ref{f:InternalExternalScheduling}).
    18011803External calling threads block on the calling queue, if the monitor is occupied, otherwise they enter in FIFO order.
    1802 Internal threads block on condition queues via @wait@ and they reenter from the condition in FIFO order, or they block on urgent via @signal_block@ or @waitfor@ and reenter implicit when the monitor becomes empty, \ie, the thread in the monitor exits or waits.
     1804Internal threads block on condition queues via @wait@ and they reenter from the condition in FIFO order.
    18031805
    18041806There are three signalling mechanisms to unblock waiting threads to enter the monitor.
    1805 Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion so only one can proceed.
     1807Note, signalling cannot have the signaller and signalled thread in the monitor simultaneously because of the mutual exclusion so only one can proceed.
    18061808For internal scheduling, threads are unblocked from condition queues using @signal@, where the signallee is moved to urgent and the signaller continues (solid line).
    18071809Multiple signals move multiple signallees to urgent, until the condition is empty.
     
    18161818Executing multiple @waitfor@s from different signalled functions causes the calling threads to move to urgent.
    18171819External scheduling requires urgent to be a stack, because the signaller expects to execute immediately after the specified monitor call has exited or waited.
    1818 Internal scheduling behaves the same for an urgent stack or queue, except for multiple signalling, where the threads unblock from urgent in reverse order from signalling.
    1819 If the restart order is important, multiple signalling by a signal thread can be transformed into daisy-chain signalling among threads, where each thread signals the next thread.
    1820 We tried both a stack for @waitfor@ and queue for signalling, but that resulted in complex semantics about which thread enters next.
    1821 Hence, \CFA uses a single urgent stack to correctly handle @waitfor@ and adequately support both forms of signalling.
     1820Internal scheduling behaves the same for an urgent stack or queue, except for signalling multiple threads, where the threads unblock from urgent in reverse order from signalling.
     1821If the restart order is important, multiple signalling by a signal thread can be transformed into shared signalling among threads, where each thread signals the next thread.
     1822Hence, \CFA uses an urgent stack.
    18221823
    18231824\begin{figure}
     
    18371838\end{figure}
    18381839
    1839 Figure~\ref{f:BBInt} shows a \CFA generic bounded-buffer with internal scheduling, where producers/consumers enter the monitor, detect the buffer is full/empty, and block on an appropriate condition variable, @full@/@empty@.
     1840Figure~\ref{f:BBInt} shows a \CFA generic bounded-buffer with internal scheduling, where producers/consumers enter the monitor, see the buffer is full/empty, and block on an appropriate condition variable, @full@/@empty@.
    18401841The @wait@ function atomically blocks the calling thread and implicitly releases the monitor lock(s) for all monitors in the function's parameter list.
    18411842The appropriate condition variable is signalled to unblock an opposite kind of thread after an element is inserted/removed from the buffer.
     
    19611962External scheduling is controlled by the @waitfor@ statement, which atomically blocks the calling thread, releases the monitor lock, and restricts the function calls that can next acquire mutual exclusion.
    19621963If the buffer is full, only calls to @remove@ can acquire the buffer, and if the buffer is empty, only calls to @insert@ can acquire the buffer.
    1963 Calls threads to functions that are currently excluded block outside of (external to) the monitor on the calling queue, versus blocking on condition queues inside of (internal to) the monitor.
     1964Threads making calls to functions that are currently excluded block outside of (external to) the monitor on the calling queue, versus blocking on condition queues inside of (internal to) the monitor.
    19641965Figure~\ref{f:RWExt} shows a readers/writer lock written using external scheduling, where a waiting reader detects a writer using the resource and restricts further calls until the writer exits by calling @EndWrite@.
    19651966The writer does a similar action for each reader or writer using the resource.
    19661967Note, no new calls to @StartRead@/@StartWrite@ may occur when waiting for the call to @EndRead@/@EndWrite@.
    1967 External scheduling allows waiting for events from other threads while restricting unrelated events, that would otherwise have to wait on conditions in the monitor.
     1968External scheduling allows waiting for events from other threads while restricting unrelated events.
    19681969The mechanism can be done in terms of control flow, \eg Ada @accept@ or \uC @_Accept@, or in terms of data, \eg Go @select@ on channels.
    19691970While both mechanisms have strengths and weaknesses, this project uses the control-flow mechanism to be consistent with other language features.
     
    19801981Furthermore, barging corrupts the dating service during an exchange because a barger may also match and change the phone numbers, invalidating the previous exchange phone number.
    19811982Putting loops around the @wait@s does not correct the problem;
    1982 the simple solution must be restructured to account for barging.
     1983the solution must be restructured to account for barging.
    19831984
    19841985\begin{figure}
     
    20482049the signaller enters the monitor and changes state, detects a waiting thread that can use the state, performs a non-blocking signal on the condition queue for the waiting thread, and exits the monitor to run concurrently.
    20492050The waiter unblocks next from the urgent queue, uses/takes the state, and exits the monitor.
    2050 Blocking signal is the reverse, where the waiter is providing the cooperation for the signalling thread;
     2051Blocking signalling is the reverse, where the waiter is providing the cooperation for the signalling thread;
    20512052the signaller enters the monitor, detects a waiting thread providing the necessary state, performs a blocking signal to place it on the urgent queue and unblock the waiter.
    20522053The waiter changes state and exits the monitor, and the signaller unblocks next from the urgent queue to use/take the state.
     
    20792080While \CC supports bulk locking, @wait@ only accepts a single lock for a condition variable, so bulk locking with condition variables is asymmetric.
    20802081Finally, a signaller,
     2082\newpage
    20812083\begin{cfa}
    20822084void baz( M & mutex m1, M & mutex m2 ) {
     
    20842086}
    20852087\end{cfa}
    2086 must have acquired at least the same locks as the waiting thread signalled from a condition queue to allow the locks to be passed, and hence, prevent barging.
     2088must have acquired at least the same locks as the waiting thread signalled from the condition queue.
    20872089
    20882090Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex parameters, \ie @waitfor( rtn, m1, m2 )@.
     
    21172119The \emph{conditional-expression} of a @when@ may call a function, but the function must not block or context switch.
    21182120If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) among the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept nondeterministically for this case, \eg Go \lstinline[morekeywords=select]@select@.
    2119 If some accept guards are true and there are no outstanding calls to these members, the acceptor is blocked until a call to one of these members is made.
     2121If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made.
    21202122If there is a @timeout@ clause, it provides an upper bound on waiting.
    21212123If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead.
     
    21602162However, the basic @waitfor@ semantics do not support this functionality, since using an object after its destructor is called is undefined.
    21612163Therefore, to make this useful capability work, the semantics for accepting the destructor is the same as @signal@, \ie the call to the destructor is placed on the urgent queue and the acceptor continues execution, which throws an exception to the acceptor and then the caller is unblocked from the urgent queue to deallocate the object.
    2162 Accepting the destructor is the idiomatic way to terminate a thread in \CFA.
     2164Accepting the destructor is an idiomatic way to terminate a thread in \CFA.
    21632165
    21642166
     
    22542256In the object-oriented scenario, the type and all its operators are always present at compilation (even separate compilation), so it is possible to number the operations in a bit mask and use an $O(1)$ compare with a similar bit mask created for the operations specified in a @waitfor@.
    22552257
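The bit-mask comparison described for the object-oriented scenario can be sketched in C as follows. This is a minimal illustration under stated assumptions: the operation names and their numbering are hypothetical, and the sketch assumes at most 64 operations per type so the mask fits in one word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical numbering of the mutex operations of one monitor type,
   possible only when all operations are known at compilation. */
enum { OP_INSERT = 0, OP_REMOVE = 1, OP_SIZE = 2 };

/* One bit per operation index (at most 64 operations in this sketch). */
uint64_t op_bit( unsigned op ) {
    assert( op < 64 );
    return UINT64_C(1) << op;
}

/* O(1) test: is the called operation in the set named by the accept statement? */
bool is_accepted( uint64_t accept_mask, unsigned called_op ) {
    return ( accept_mask & op_bit( called_op ) ) != 0;
}
```

An accept statement naming the insert and size operations would build the mask op_bit( OP_INSERT ) | op_bit( OP_SIZE ) once, and each arriving call is then checked with a single AND.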
    2256 However, in \CFA, monitor functions can be statically added/removed in translation units, making a fast subset check difficult.
     2258In \CFA, monitor functions can be statically added/removed in translation units, so it is impossible to apply an $O(1)$ approach.
    22572259\begin{cfa}
    22582260        monitor M { ... }; // common type, included in .h file
     
    22612263        void g( M & mutex m ) { waitfor( `f`, m ); }
    22622264translation unit 2
    2263         void `f`( M & mutex m ); $\C{// replacing f and g for type M in this translation unit}$
     2265        void `f`( M & mutex m );
    22642266        void `g`( M & mutex m );
    2265         void h( M & mutex m ) { waitfor( `f`, m ) or waitfor( `g`, m ); } $\C{// extending type M in this translation unit}$
     2267        void h( M & mutex m ) { waitfor( `f`, m ) or waitfor( `g`, m ); }
    22662268\end{cfa}
    22672269The @waitfor@ statements in each translation unit cannot form a unique bit-mask because the monitor type does not carry that information.
    2268 Hence, function pointers are used to identify the functions listed in the @waitfor@ statement, stored in a variable-sized array.
     2270Hence, function pointers are used to identify the functions listed in the @waitfor@ statement, stored in a variable-sized array.
    22692271Then, the same implementation approach used for the urgent stack is used for the calling queue.
    22702272Each caller has a list of monitors acquired, and the @waitfor@ statement performs a (usually short) linear search matching functions in the @waitfor@ list with called functions, and then verifying the associated mutex locks can be transferred.
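The linear search described above might look like the following C sketch. It is not the \CFA runtime code: the struct layouts, the stub functions, and the names find_acceptable and holds are all hypothetical, and the transfer check is assumed to mean that every monitor named by a @waitfor@ clause appears in the caller's monitor list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*mutex_fn)( void );       /* generic function-pointer identity */

typedef struct {                        /* one clause of a waitfor (hypothetical layout) */
    mutex_fn fn;
    const void * const * monitors;
    size_t nmon;
} accept_clause;

typedef struct {                        /* one blocked call on the calling queue */
    mutex_fn fn;
    const void * const * monitors;
    size_t nmon;
} pending_call;

/* Stand-in mutex functions, used only for their addresses. */
void insert_stub( void ) {}
void remove_stub( void ) {}

/* Is monitor m among the monitors the caller is acquiring? */
bool holds( const pending_call * call, const void * m ) {
    for ( size_t i = 0; i < call->nmon; i += 1 )
        if ( call->monitors[i] == m ) return true;
    return false;
}

/* Return the index of the first clause that names the called function and
   whose monitors are all in the caller's list, or -1 if no clause matches. */
int find_acceptable( const accept_clause clauses[], size_t n, const pending_call * call ) {
    assert( call != NULL );
    for ( size_t c = 0; c < n; c += 1 ) {
        if ( clauses[c].fn != call->fn ) continue;      /* function-pointer compare */
        bool all = true;
        for ( size_t m = 0; m < clauses[c].nmon && all; m += 1 )
            all = holds( call, clauses[c].monitors[m] );
        if ( all ) return (int)c;
    }
    return -1;
}
```

Both loops are linear, which is acceptable in this sketch because the clause list and the monitor lists are expected to be short.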
     
    22762278
    22772279External scheduling, like internal scheduling, becomes significantly more complex for multi-monitor semantics.
    2278 Even in the simplest case, new semantics need to be established.
     2280Even in the simplest case, new semantics needs to be established.
    22792281\begin{cfa}
    22802282monitor M { ... };
     
    25082510
    25092511For completeness and efficiency, \CFA provides a standard set of low-level locks: recursive mutex, condition, semaphore, barrier, \etc, and atomic instructions: @fetchAssign@, @fetchAdd@, @testSet@, @compareSet@, \etc.
    2510 Some of these low-level mechanism are used in the \CFA runtime, but we strongly advocate using high-level mechanisms whenever possible.
     2512However, we strongly advocate using high-level concurrency mechanisms whenever possible.
    25112513
    25122514
     
    25642566\label{s:RuntimeStructureCluster}
    25652567
    2566 A \newterm{cluster} is a collection of threads and virtual processors (abstract kernel-thread) that execute the (user) threads from its own ready queue (like an OS executing kernel threads).
     2568A \newterm{cluster} is a collection of threads and virtual processors (abstract kernel-thread) that execute the threads from its own ready queue (like an OS).
    25672569The purpose of a cluster is to control the amount of parallelism that is possible among threads, plus scheduling and other execution defaults.
    25682570The default cluster-scheduler is single-queue multi-server, which provides automatic load-balancing of threads on processors.
    2569 However, the scheduler is pluggable, supporting alternative schedulers, such as multi-queue multi-server, with work-stealing/sharing across the virtual processors.
     2571However, the scheduler is pluggable, supporting alternative schedulers, such as multi-queue multi-server, with work-stealing/sharing.
    25702572If several clusters exist, both threads and virtual processors can be explicitly migrated from one cluster to another.
    25712573No automatic load balancing among clusters is performed by \CFA.
     
    25802582\label{s:RuntimeStructureProcessor}
    25812583
    2582 A virtual processor is implemented by a kernel thread (\eg UNIX process), which are scheduled for execution on a hardware processor by the underlying operating system.
     2584A virtual processor is implemented by a kernel thread (\eg UNIX process), which is subsequently scheduled for execution on a hardware processor by the underlying operating system.
    25832585Programs may use more virtual processors than hardware processors.
    25842586On a multiprocessor, kernel threads are distributed across the hardware processors resulting in virtual processors executing in parallel.
    25852587(It is possible to use affinity to lock a virtual processor onto a particular hardware processor~\cite{affinityLinux, affinityWindows, affinityFreebsd, affinityNetbsd, affinityMacosx}, which is used when caching issues occur or for heterogeneous hardware processors.)
    25862588The \CFA runtime attempts to block unused processors and unblock processors as the system load increases;
    2587 balancing the workload with processors is difficult because it requires future knowledge, \ie what will the applicaton workload do next.
     2589balancing the workload with processors is difficult.
    25882590Preemption occurs on virtual processors rather than user threads, via operating-system interrupts.
    25892591Thus virtual processors execute user threads, where preemption frequency applies to a virtual processor, so preemption occurs randomly across the executed user threads.
     
    26182620\subsection{Preemption}
    26192621
    2620 Nondeterministic preemption provides fairness from long running threads, and forces concurrent programmers to write more robust programs, rather than relying on section of code between cooperative scheduling to be atomic.
     2622Nondeterministic preemption provides fairness from long running threads, and forces concurrent programmers to write more robust programs, rather than relying on sections of code between cooperative scheduling to be atomic.
    26212623A separate reason for not supporting preemption is that it significantly complicates the runtime system.
    26222624Preemption is normally handled by setting a count-down timer on each virtual processor.
     
    26452647There are two versions of the \CFA runtime kernel: debug and non-debug.
    26462648The debugging version has many runtime checks and internal assertions, \eg stack (non-writable) guard page, and checks for stack overflow whenever context switches occur among coroutines and threads, which catches most stack overflows.
    2647 After a program is debugged, the non-debugging version can be used to significantly decrease space and increase performance.
     2649After a program is debugged, the non-debugging version can be used to decrease space and increase performance.
    26482650
    26492651
     
    27042706The only note here is that the call stacks of \CFA coroutines are lazily created, therefore without priming the coroutine to force stack creation, the creation cost is artificially low.
    27052707
     2708\newpage
    27062709\begin{multicols}{2}
    27072710\lstset{language=CFA,moredelim=**[is][\color{red}]{@}{@},deletedelim=**[is][]{`}{`}}
     
    29542957One solution is to offer various tuning options, allowing the scheduler to be adjusted to the requirements of the workload.
    29552958However, to be truly flexible, a pluggable scheduler is necessary.
    2956 Currently, the \CFA pluggable scheduler is too simple to handle complex scheduling, \eg quality of service and real-time, where the scheduler must interact with mutex objects to deal with issues like priority inversion~\cite{Buhr00b}.
     2959Currently, the \CFA pluggable scheduler is too simple to handle complex scheduling, \eg quality of service and real-time, where the scheduler must interact with mutex objects to deal with issues like priority inversion.
    29572960
    29582961\paragraph{Non-Blocking I/O}
     
    29872990\section{Acknowledgements}
    29882991
    2989 The authors would like to recognize the design assistance of Aaron Moss, Rob Schluntz, Andrew Beach and Michael Brooks on the features described in this paper.
     2992The authors would like to recognize the design assistance of Aaron Moss, Rob Schluntz and Andrew Beach on the features described in this paper.
    29902993Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}). %, and Peter Buhr is partially funded by the Natural Sciences and Engineering Research Council of Canada.
    29912994