Changeset adb60242 for doc/papers


Timestamp:
Jun 30, 2018, 8:35:13 AM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, no_list, persistent-indexer
Children:
2ebcb28, 35a9e41
Parents:
1f133dc
Message:

final corrections

Location:
doc/papers/concurrency
Files:
2 edited

  • doc/papers/concurrency/Paper.tex

    r1f133dc radb60242  
    242242\abstract[Summary]{
    243243\CFA is a modern, polymorphic, \emph{non-object-oriented} extension of the C programming language.
    244 This paper discusses the design of the concurrency and parallelism features in \CFA, and the concurrent runtime-system.
     244This paper discusses the design of the concurrency and parallelism features in \CFA, and its concurrent runtime-system.
    245245These features are created from scratch as ISO C lacks concurrency, largely relying on the pthreads library.
    246 Coroutines and lightweight (user) threads are introduced into the language.
    247 In addition, monitors are added as a high-level mechanism for mutual exclusion and synchronization.
    248 A unique contribution is allowing multiple monitors to be safely acquired simultaneously.
     246Coroutines and lightweight (user) threads are introduced into \CFA;
     247as well, monitors are added as a high-level mechanism for mutual exclusion and synchronization.
     248A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously}.
    249249All features respect the expectations of C programmers, while being fully integrated with the \CFA polymorphic type-system and other language features.
    250 Finally, experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
     250Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.
    251251}%
    252252
     
    593593
    594594While the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves).
    595 Coroutines are generalized routines allowing execution to be temporarily suspend and later resumed.
     595Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed.
    596596Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension.
    597 This capability is accomplish via the coroutine's stack, where suspend/resume context switch among stacks.
     597This capability is accomplished via the coroutine's stack, where suspend/resume context switch among stacks.
    598598Because threading design-challenges are present in coroutines, the coroutine design-effort is relevant; this effort can then be exposed to programmers as a useful new programming paradigm, because a coroutine handles the class of problems that must retain state between calls, \eg plugins, device drivers, and finite-state machines.
    599599Therefore, the two fundamental features of the core \CFA coroutine-API are independent call-stacks and @suspend@/@resume@ operations.
     
    750750\end{quote}
    751751The example takes advantage of resuming a coroutine in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops.
    752 The destruction provides a newline, if formatted text ends with a full line.
    753 Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flatten (linearized) and rechecked on each call because execution location is not retained between calls.
     752The destructor provides a newline, if formatted text ends with a full line.
     753Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flattened (linearized) and rechecked on each call because execution location is not retained between calls.
    754754(Linearized code is the bane of device drivers.)
    755755
     
    956956\end{cfa}
    957957and the programming language (and possibly its tool set, \eg debugger) may need to understand @baseCoroutine@ because of the stack.
    958 Furthermore, the execution of constructs/destructors is in the wrong order for certain operations.
     958Furthermore, the execution of constructors/destructors is in the wrong order for certain operations.
    959959For example, for threads, if the thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived.
    960960
     
    10191019forall( `dtype` T | is_coroutine(T) ) void resume( T & );
    10201020\end{cfa}
    1021 The @dtype@ property of the trait ensures the coroutine descriptor is non-copyable, so all coroutines must be passed by reference (pointer).
     1021The @dtype@ property provides no implicit copying operations and the @is_coroutine@ trait provides no explicit copying operations, so all coroutines must be passed by reference (pointer).
    10221022The routine definitions ensure there is a statically-typed @main@ routine that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the currently executing coroutine handle.
    10231023The @main@ routine has no return value or additional parameters because the coroutine type allows an arbitrary number of interface routines with corresponding arbitrary typed input/output values versus fixed ones.
     
    11691169The allocation/deallocation pattern appears unusual because allocated objects are immediately deallocated without any intervening code.
    11701170However, for threads, the deletion provides implicit synchronization, which is the intervening code.
    1171 While the subtotals are added in linear order rather than completion order, which slight inhibits concurrency, the computation is restricted by the critical-path thread (\ie the thread that takes the longest), and so any inhibited concurrency is very small as totalling the subtotals is trivial.
     1171While the subtotals are added in linear order rather than completion order, which slightly inhibits concurrency, the computation is restricted by the critical-path thread (\ie the thread that takes the longest), and so any inhibited concurrency is very small as totalling the subtotals is trivial.
    11721172
    11731173\begin{figure}
    11741174\begin{cfa}
    1175 thread Adder {
    1176     int * row, cols, & subtotal;                        $\C{// communication}$
    1177 };
     1175`thread` Adder { int * row, cols, & subtotal; } $\C{// communication variables}$
    11781176void ?{}( Adder & adder, int row[], int cols, int & subtotal ) {
    11791177    adder.[ row, cols, &subtotal ] = [ row, cols, &subtotal ];
     
    11891187    Adder * adders[rows];
    11901188    for ( int r = 0; r < rows; r += 1 ) {       $\C{// start threads to sum rows}$
    1191                 adders[r] = new( matrix[r], cols, &subtotals[r] );
     1189                adders[r] = `new( matrix[r], cols, &subtotals[r] );`
    11921190    }
    11931191    for ( int r = 0; r < rows; r += 1 ) {       $\C{// wait for threads to finish}$
    1194                 delete( adders[r] );                            $\C{// termination join}$
     1192                `delete( adders[r] );`                          $\C{// termination join}$
    11951193                total += subtotals[r];                          $\C{// total subtotal}$
    11961194    }
     
    12081206To reestablish meaningful execution requires mechanisms to reintroduce determinism, \ie restrict non-determinism, called mutual exclusion and synchronization, where mutual exclusion is an access-control mechanism on data shared by threads, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}.
    12091207Since many deterministic challenges appear with the use of mutable shared state, some languages/libraries disallow it, \eg Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka~\cite{Akka} (Scala).
    1210 In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely relate to networking concepts, \eg channels~\cite{CSP,Go}.
     1208In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely related to networking concepts, \eg channels~\cite{CSP,Go}.
    12111209However, in call/return-based languages, these approaches force a clear distinction, \ie introduce a new programming paradigm between regular and concurrent computation, \eg routine call versus message passing.
    12121210Hence, a programmer must learn and manipulate two sets of design patterns.
    12131211While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account.
    1214 In contrast, approaches based on statefull models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm.
     1212In contrast, approaches based on stateful models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm.
    12151213
    12161214At the lowest level, concurrent control is implemented by atomic operations, upon which different kinds of locking mechanisms are constructed, \eg semaphores~\cite{Dijkstra68b}, barriers, and path expressions~\cite{Campbell74}.
     
    12521250Algorithms that allow a barger, but divert it until later using current synchronization state (flags), avoid the barger;
    12531251algorithms that preclude a barger from entering during synchronization in the critical section prevent barging completely.
    1254 Techniques like baton-pass locks~\cite{Andrews89} between threads instead of unconditionally releasing locks is an example of barging prevention.
     1252A technique like baton-passing locks~\cite{Andrews89}, where a lock is passed between threads instead of unconditionally released, is an example of barging prevention.
    12551253
    12561254
     
    12651263Copying a lock is insecure because it is possible to copy an open lock and then use the open copy when the original lock is closed to simultaneously access the shared data.
    12661264Copying a monitor is secure because both the lock and shared data are copies, but copying the shared data is meaningless because it no longer represents a unique entity.
    1267 As for coroutines/tasks, a non-copyable (@dtype@) trait is used to capture this requirement, so all locks/monitors must be passed by reference (pointer).
     1265As for coroutines/tasks, the @dtype@ property provides no implicit copying operations and the @is_monitor@ trait provides no explicit copying operations, so all locks/monitors must be passed by reference (pointer).
    12681266\begin{cfa}
    12691267trait is_monitor( `dtype` T ) {
     
    12771275\label{s:MutexAcquisition}
    12781276
    1279 While correctness implicitly implies a monitor's mutual exclusion is acquired and released, there are implementation options about when and where the locking/unlocking occurs.
     1277While correctness implies a monitor's mutual exclusion is acquired and released, there are implementation options about when and where the locking/unlocking occurs.
    12801278(Much of this discussion also applies to basic locks.)
    12811279For example, a monitor may need to be passed through multiple helper routines before it becomes necessary to acquire the monitor mutual-exclusion.
     
    13061304\begin{cfa}
    13071305monitor Tree {
    1308         Tree * left, right;
     1306        Tree * left, * right;
    13091307        // value
    13101308};
    13111309void print( Tree & mutex tree ) {                       $\C{// prefix traversal}$
    13121310        // write value
    1313         print( tree->left );                                    $\C{// multiply acquire monitor lock on each recursion}$
     1311        print( tree->left );                                    $\C{// multiply acquire monitor lock for tree on each recursion}$
    13141312        print( tree->right );
    13151313}
    13161314\end{cfa}
    13171315
    1318 The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameter is redundant.
    1319 Instead, one of qualifier semantics can be the default, and the other required.
    1320 For example, assume the safe @mutex@ qualifier for all monitor parameters because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors.
    1321 On the other hand, assuming \lstinline[morekeywords=nomutex]@nomutex@ is the \emph{normal} parameter behaviour, stating explicitly \emph{this parameter is not special}.
     1316The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameters is redundant.
     1317Instead, the semantics have one qualifier as the default, and the other required.
     1318For example, make the safe @mutex@ qualifier the default because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors.
     1319Alternatively, make the unsafe \lstinline[morekeywords=nomutex]@nomutex@ qualifier the default because it is the \emph{normal} parameter semantics while @mutex@ parameters are rare.
    13221320Providing a default qualifier implies knowing whether a parameter is a monitor.
    13231321Since \CFA relies heavily on traits as an abstraction mechanism, the distinction between a type that is a monitor and a type that looks like a monitor can become blurred.
     
    13681366Having language support for such a feature is therefore a significant asset for \CFA.
    13691367
    1370 The capability to acquire multiple locks before entering a critical section is called \newterm{bulk acquire}.
     1368The capability to acquire multiple locks before entering a critical section is called \newterm{bulk acquire} (see Section~\ref{s:Implementation} for implementation details).
    13711369In the previous example, \CFA guarantees the order of acquisition is consistent across calls to different routines using the same monitors as arguments.
    13721370This consistent ordering means acquiring multiple monitors is safe from deadlock.
     
    13861384
    13871385However, such use leads to lock acquiring order problems resulting in deadlock~\cite{Lister77}, where detecting it requires dynamic tracking of monitor calls, and dealing with it requires rollback semantics~\cite{Dice10}.
    1388 In \CFA, safety is guaranteed by using bulk acquire of all monitors to shared objects, whereas other monitor systems provide no aid.
    1389 While \CFA provides only a partial solution, the \CFA partial solution handles many useful cases.
     1386In \CFA, a safety aid is provided by using bulk acquire of all monitors to shared objects, whereas other monitor systems provide no aid.
     1387While \CFA provides only a partial solution, it handles many useful cases, \eg:
    13901388\begin{cfa}
    13911389monitor BankAccount { ... };
     
    14381436For example, Figure~\ref{f:GenericBoundedBuffer} shows a bounded buffer that may be full/empty so produce/consumer threads must block.
    14391437Leaving the monitor and trying again (busy waiting) is impractical for high-level programming.
    1440 Monitors eliminate busy waiting by providing internal synchronization to schedule threads needing access to the shared data, where the synchronization is blocking (threads are parked) versus spinning.
     1438Monitors eliminate busy waiting by providing synchronization to schedule threads needing access to the shared data, where threads block versus spinning.
    14411439Synchronization is generally achieved with internal~\cite{Hoare74} or external~\cite[\S~2.9.2]{uC++} scheduling, where \newterm{scheduling} defines which thread acquires the critical section next.
    14421440\newterm{Internal scheduling} is characterized by each thread entering the monitor and making an individual decision about proceeding or blocking, while \newterm{external scheduling} is characterized by an entering thread making a decision about proceeding for itself and on behalf of other threads attempting entry.
     
    15311529External scheduling is controlled by the @waitfor@ statement, which atomically blocks the calling thread, releases the monitor lock, and restricts the routine calls that can next acquire mutual exclusion.
    15321530If the buffer is full, only calls to @remove@ can acquire the buffer, and if the buffer is empty, only calls to @insert@ can acquire the buffer.
    1533 Threads making calls to routines that are currently excluded, block outside (external) of the monitor on a calling queue, versus blocking on condition queues inside (internal) of the monitor.
    1534 % External scheduling is more constrained and explicit, which helps programmers reduce the non-deterministic nature of concurrency.
     1531Threads making calls to routines that are currently excluded, block outside of (external to) the monitor on a calling queue, versus blocking on condition queues inside of (internal to) the monitor.
    15351532External scheduling allows users to wait for events from other threads without concern of unrelated events occurring.
    15361533The mechanism can be expressed in terms of control flow, \eg Ada @accept@ or \uC @_Accept@, or in terms of data, \eg Go channels.
     
    15491546A thread blocks until an appropriate partner arrives.
    15501547The complexity is exchanging phone numbers in the monitor because of the mutual-exclusion property.
    1551 For signal scheduling, the @exchange@ condition is necessary to block the thread finding the match, while the matcher unblocks to take the oppose number, post its phone number, and unblock the partner.
     1548For signal scheduling, the @exchange@ condition is necessary to block the thread finding the match, while the matcher unblocks to take the opposite number, post its phone number, and unblock the partner.
    15521549For signal-block scheduling, the implicit urgent-queue replaces the explict @exchange@-condition and @signal_block@ puts the finding thread on the urgent condition and unblocks the matcher.
    15531550
     
    15701567                wait( Girls[ccode] );
    15711568                GirlPhNo = phNo;
    1572                 `exchange.signal();`
     1569                `signal( exchange );`
    15731570        } else {
    15741571                GirlPhNo = phNo;
    15751572                `signal( Boys[ccode] );`
    1576                 `exchange.wait();`
     1573                `wait( exchange );`
    15771574        } // if
    15781575        return BoyPhNo;
     
    16491646}
    16501647\end{cfa}
    1651 must acquire a set of monitor locks greater than or equal to the number of locks for the waiting thread signalled from the condition queue.
     1648must have acquired at least the same locks as the waiting thread signalled from the condition queue.
    16521649
    16531650Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex types in the parameter list, \ie @waitfor( rtn, m1, m2 )@.
     
    16651662\end{cfa}
    16661663
    1667 Given the ability to release a subset of acquired monitors can result in a \newterm{nested monitor}~\cite{Lister77} deadlock.
     1664The ability to release a subset of acquired monitors can result in a \newterm{nested monitor}~\cite{Lister77} deadlock.
    16681665\begin{cfa}
    16691666void foo( M & mutex m1, M & mutex m2 ) {
     
    16761673
    16771674Finally, an important aspect of monitor implementation is barging, \ie can calling threads barge ahead of signalled threads?
    1678 If barging is allowed, synchronization between a singller and signallee is difficult, often requiring multiple unblock/block cycles (looping around a wait rechecking if a condition is met).
     1675If barging is allowed, synchronization between a signaller and signallee is difficult, often requiring multiple unblock/block cycles (looping around a wait rechecking if a condition is met).
     1676In fact, signals-as-hints is completely opposite from that proposed by Hoare in the seminal paper on monitors:
    16791677\begin{quote}
    16801678However, we decree that a signal operation be followed immediately by resumption of a waiting program, without possibility of an intervening procedure call from yet a third program.
     
    16821680\end{quote}
    16831681\CFA scheduling \emph{precludes} barging, which simplifies synchronization among threads in the monitor and increases correctness.
     1682Furthermore, \CFA concurrency has no spurious wakeup~\cite[\S~9]{Buhr05a}, which eliminates an implicit form of barging.
    16841683For example, there are no loops in either bounded buffer solution in Figure~\ref{f:GenericBoundedBuffer}.
    16851684Supporting barging prevention as well as extending internal scheduling to multiple monitors is the main source of complexity in the design and implementation of \CFA concurrency.
    1686 Furthermore, there is no spurious wakeup~\cite[\S~9]{Buhr05a} in \CFA, which eliminates an implict form of barging.
    16871685
    16881686
     
    17721770
    17731771In an object-oriented programming-language, a class includes an exhaustive list of operations.
    1774 However, new members can be added via static inheritance or dynaic members, \eg JavaScript~\cite{JavaScript}.
     1772However, new members can be added via static inheritance or dynamic members, \eg JavaScript~\cite{JavaScript}.
    17751773Similarly, monitor routines can be added at any time in \CFA, making it less clear for programmers and more difficult to implement.
    17761774\begin{cfa}
     
    17811779void h( M & mutex m ) { waitfor( `f` ); }       $\C{// unclear which f}$
    17821780\end{cfa}
    1783 Hence, the cfa-code for the entering a monitor looks like:
     1781Hence, the cfa-code for entering a monitor looks like:
    17841782\begin{cfa}
    17851783if ( $\textrm{\textit{monitor is free}}$ ) $\LstCommentStyle{// \color{red}enter}$
     
    18421840\end{cfa}
    18431841Again, the set of monitors passed to the @waitfor@ statement must be entirely contained in the set of monitors already acquired by the accepting routine.
    1844 
    1845 Note, for internal and external scheduling with multiple monitors, a signalling or accepting thread must match exactly, \ie partial matching results in waiting.
    1846 \begin{cquote}
     1842Also, the order of the monitors in a @waitfor@ statement is unimportant.
     1843
     1844Figure~\ref{f:UnmatchedMutexSets} shows an example where, for internal and external scheduling with multiple monitors, a signalling or accepting thread must match exactly, \ie partial matching results in waiting.
     1845For both examples, the set of monitors is disjoint so unblocking is impossible.
     1846
     1847\begin{figure}
    18471848\lstDeleteShortInline@%
    18481849\begin{tabular}{@{}l@{\hspace{\parindentlnth}}|@{\hspace{\parindentlnth}}l@{}}
     
    18571858        wait( c );
    18581859}
     1860g( `m11`, m2 ); // block on wait
     1861f( `m12`, m2 ); // cannot fulfil
     1862\end{cfa}
     1863&
     1864\begin{cfa}
     1865monitor M1 {} m11, m12;
     1866monitor M2 {} m2;
     1867
     1868void f( M1 & mutex m1, M2 & mutex m2 ) {
     1869
     1870}
     1871void g( M1 & mutex m1, M2 & mutex m2 ) {
     1872        waitfor( f, m1, m2 );
     1873}
    18591874g( `m11`, m2 ); // block on accept
    18601875f( `m12`, m2 ); // cannot fulfil
    18611876\end{cfa}
    1862 &
    1863 \begin{cfa}
    1864 monitor M1 {} m11, m12;
    1865 monitor M2 {} m2;
    1866 
    1867 void f( M1 & mutex m1, M2 & mutex m2 ) {
    1868 
    1869 }
    1870 void g( M1 & mutex m1, M2 & mutex m2 ) {
    1871         waitfor( f, m1, m2 );
    1872 }
    1873 g( `m11`, m2 ); // block on accept
    1874 f( `m12`, m2 ); // cannot fulfil
    1875 \end{cfa}
    18761877\end{tabular}
    18771878\lstMakeShortInline@%
    1878 \end{cquote}
    1879 For both internal and external scheduling, the set of monitors is disjoint so unblocking is impossible.
     1879\caption{Unmatched \protect\lstinline@mutex@ sets}
     1880\label{f:UnmatchedMutexSets}
     1881\end{figure}
    18801882
    18811883
    18821884\subsection{Extended \protect\lstinline@waitfor@}
    18831885
    1884 The extended form of the @waitfor@ statement conditionally accepts one of a group of mutex routines and allows a specific action to be performed \emph{after} the mutex routine finishes.
     1886Figure~\ref{f:ExtendedWaitfor} show the extended form of the @waitfor@ statement to conditionally accept one of a group of mutex routines, with a specific action to be performed \emph{after} the mutex routine finishes.
     1887For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist.
     1888The \emph{conditional-expression} of a @when@ may call a routine, but the routine must not block or context switch.
     1889If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) in the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept non-deterministically for this case, \eg Go \lstinline[morekeywords=select]@select@.
     1890If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made.
     1891If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead.
     1892Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking.
     1893If there is a @timeout@ clause, it provides an upper bound on waiting.
     1894If both a @timeout@ clause and an @else@ clause are present, the @else@ must be conditional, or the @timeout@ is never triggered.
     1896In all cases, the statement following the clauses is executed \emph{after} a clause is executed, making it possible to know which clause executed.
     1896
     1897\begin{figure}
    18851898\begin{cfa}
    18861899`when` ( $\emph{conditional-expression}$ )      $\C{// optional guard}$
     
    18981911                $\emph{statement}$                                      $\C{// action when no immediate calls}$
    18991912\end{cfa}
    1900 For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist.
    1901 The \emph{conditional-expression} of a @when@ may call a routine, but the routine must not block or context switch.
    1902 If there are several mutex calls that can be accepted, selection occurs top-to-bottom in the @waitfor@ clauses versus non-deterministically.
    1903 If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made.
    1904 If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead.
    1905 Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking.
    1906 If there is a @timeout@ clause, it provides an upper bound on waiting, and can only appear with a conditional @else@, otherwise the timeout cannot be triggered.
    1907 In all cases, the statement following is executed \emph{after} a clause is executed to know which of the clauses executed.
     1913\caption{Extended \protect\lstinline@waitfor@}
     1914\label{f:ExtendedWaitfor}
     1915\end{figure}
    19081916
    19091917Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, \eg:
     
    19181926\begin{cfa}
    19191927void insert( Buffer(T) & mutex buffer, T elem ) with( buffer ) {
    1920         if ( count == BufferSize )
     1928        if ( count == 10 )
    19211929                waitfor( remove, buffer ) {
    1922                         elements[back] = elem;
    1923                         back = ( back + 1 ) % BufferSize;
    1924                         count += 1;
     1930                        // insert elem into buffer
    19251931                } or `waitfor( ^?{}, buffer )` throw insertFail;
    19261932}
     
    20002006\label{s:fibers}
    20012007
    2002 A variant of user thread is \newterm{fibers}, which removes preemption, \eg Go~\cite{Go}.
    2003 Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, hence race and deadlock errors are more difficult to generate.
     2008A variant of user thread is \newterm{fibers}, which removes preemption, \eg Go~\cite{Go} @goroutine@s.
     2009Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, making race and deadlock errors more difficult to generate.
    20042010However, preemption is necessary for concurrency that relies on spinning, so there is a class of problems that cannot be programmed without preemption.
    20052011
     
    20672073
    20682074\section{Implementation}
     2075\label{s:Implementation}
    20692076
    20702077Currently, \CFA has fixed-sized stacks, where the stack size can be set at coroutine/thread creation but with no subsequent growth.
    20712078Schemes exist for dynamic stack-growth, such as stack copying and chained stacks.
    2072 However, stack copying requires pointer adjustment to items on the stack, which is impossible without some form of garage collection.
     2079However, stack copying requires pointer adjustment to items on the stack, which is impossible without some form of garbage collection.
    20732080As well, chained stacks require all modules be recompiled to use this feature, which breaks backward compatibility with existing C libraries.
    20742081In the long term, it is likely C libraries will migrate to stack chaining to support concurrency, at only a minimal cost to sequential programs.
     
    20822089In \CFA, ordering of monitor acquisition relies on memory ordering to prevent deadlock~\cite{Havender68}, because all objects have distinct non-overlapping memory layouts, and mutual-exclusion for a monitor is only defined for its lifetime.
    20832090When a mutex call is made, pointers to the concerned monitors are aggregated into a variable-length array and sorted.
    2084 This array persists for the entire duration of the mutual-exclusion and its ordering reused extensively.
    2085 
    2086 To improve performance and simplicity, context switching occur inside a routine call, so only callee-saved registers are copied onto the stack and then the stack register is switched;
     2091This array persists for the entire duration of the mutual-exclusion and is used extensively for synchronization operations.
     2092
     2093To improve performance and simplicity, context switching occurs inside a routine call, so only callee-saved registers are copied onto the stack and then the stack register is switched;
    20872094the corresponding registers are then restored for the other context.
    20882095Note, the instruction pointer is untouched since the context switch is always inside the same routine.
     
    20912098This method is a 2-step context-switch and provides a clear distinction between user and kernel code, where scheduling and other system operations happen.
    20922099The alternative 1-step context-switch uses the \emph{from} thread's stack to schedule and then context-switches directly to the \emph{to} thread's stack.
    2093 Experimental results (not presented) show the performance difference between these two approaches is virtually equivalent, because both approaches are dominated by locking to prevent a race condition.
     2100Experimental results (not presented) show the performance of these two approaches is virtually equivalent, because both approaches are dominated by locking to prevent a race condition.
    20942101
    20952102All kernel threads (@pthreads@) are created with a stack.

     
    20992106
    21002107Finally, an important aspect for a complete threading system is preemption, which introduces extra non-determinism via transparent interleaving, rather than cooperation among threads for proper scheduling and processor fairness from long-running threads.
    2101 Because preemption frequency is usually long, 1 millisecond, performance cost is negligible.
     2108Because the preemption interval is usually long (1 millisecond), the performance cost is negligible.
    21022109Preemption is normally handled by setting a count-down timer on each virtual processor.
    21032110When the timer expires, an interrupt is delivered and the interrupt handler resets the count-down timer.
If the virtual processor is executing user code, the signal handler performs a user-level context-switch;
if it is executing in the language runtime-kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context-switches back to user code.
     
    24252432\section{Conclusion}
    24262433
    2427 This paper demonstrate a concurrency API that is simple, efficient, and able to build higher-level concurrency features.
     2434This paper demonstrates a concurrency API that is simple, efficient, and able to build higher-level concurrency features.
    24282435The approach provides concurrency based on a preemptive M:N user-level threading-system, executing in clusters, which encapsulate scheduling of work on multiple kernel threads providing parallelism.
    24292436The M:N model is judged to be efficient and to provide greater flexibility than a 1:1 threading model.
     
    24632470\label{futur:tools}
    24642471
    2465 While monitors offer a flexible and powerful concurrent for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package.
     2472While monitors offer flexible and powerful concurrency for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package.
    24662473Examples of such tools include futures and promises~\cite{promises}, executors, and actors.
    24672474These additional features are useful when monitors offer a level of abstraction that is inadequate for certain tasks.
  • doc/papers/concurrency/figures/ext_monitor.fig

    r1f133dc radb60242  
    89894 1 -1 0 0 0 12 0.0000 2 135 525 3525 3675 shared\001
    90904 1 -1 0 0 0 12 0.0000 2 135 735 3525 3975 variables\001
    91 4 0 4 50 -1 0 11 0.0000 2 120 135 4150 1875 Y\001
    92 4 0 4 50 -1 0 11 0.0000 2 120 105 4150 2175 Z\001
    93 4 0 4 50 -1 0 11 0.0000 2 120 135 4150 2475 X\001
    94 4 0 4 50 -1 0 11 0.0000 2 120 165 4150 2775 W\001
     914 0 4 50 -1 0 11 0.0000 2 120 135 4150 1875 X\001
     924 0 4 50 -1 0 11 0.0000 2 120 135 4150 2175 Y\001
     934 0 4 50 -1 0 11 0.0000 2 120 135 4150 2475 Y\001
     944 0 4 50 -1 0 11 0.0000 2 120 135 4150 2775 X\001
    95954 0 -1 0 0 3 12 0.0000 2 150 540 5025 4275 urgent\001
    96964 1 0 50 -1 0 11 0.0000 2 165 600 3150 3150 accepted\001