Changeset adb60242 for doc/papers/concurrency/Paper.tex
- Timestamp:
- Jun 30, 2018, 8:35:13 AM (6 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, no_list, persistent-indexer, pthread-emulation, qualifiedEnum
- Children:
- 2ebcb28, 35a9e41
- Parents:
- 1f133dc
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/concurrency/Paper.tex
r1f133dc radb60242 242 242 \abstract[Summary]{ 243 243 \CFA is a modern, polymorphic, \emph{non-object-oriented} extension of the C programming language. 244 This paper discusses the design of the concurrency and parallelism features in \CFA, and theconcurrent runtime-system.244 This paper discusses the design of the concurrency and parallelism features in \CFA, and its concurrent runtime-system. 245 245 These features are created from scratch as ISO C lacks concurrency, relying largely on the pthreads library for concurrency. 246 Coroutines and lightweight (user) threads are introduced into the language.247 In addition, monitors are added as a high-level mechanism for mutual exclusion and synchronization.248 A unique contribution is allowing multiple monitors to be safely acquired simultaneously.246 Coroutines and lightweight (user) threads are introduced into \CFA; 247 as well, monitors are added as a high-level mechanism for mutual exclusion and synchronization. 248 A unique contribution of this work is allowing multiple monitors to be safely acquired \emph{simultaneously}. 249 249 All features respect the expectations of C programmers, while being fully integrate with the \CFA polymorphic type-system and other language features. 250 Finally, experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages.250 Experimental results show comparable performance of the new features with similar mechanisms in other concurrent programming-languages. 251 251 }% 252 252 … … 593 593 594 594 While the focus of this discussion is concurrency and parallelism, it is important to address coroutines, which are a significant building block of a concurrency system (but not concurrent among themselves). 595 Coroutines are generalized routines allowing execution to be temporarily suspend and later resumed.595 Coroutines are generalized routines allowing execution to be temporarily suspended and later resumed. 596 596 Hence, unlike a normal routine, a coroutine may not terminate when it returns to its caller, allowing it to be restarted with the values and execution location present at the point of suspension. 597 This capability is accomplish via the coroutine's stack, where suspend/resume context switch among stacks.597 This capability is accomplished via the coroutine's stack, where suspend/resume context switch among stacks. 598 598 Because threading design-challenges are present in coroutines, their design effort is relevant, and this effort can be easily exposed to programmers giving them a useful new programming paradigm because a coroutine handles the class of problems that need to retain state between calls, \eg plugins, device drivers, and finite-state machines. 599 599 Therefore, the two fundamental features of the core \CFA coroutine-API are independent call-stacks and @suspend@/@resume@ operations. … … 750 750 \end{quote} 751 751 The example takes advantage of resuming a coroutine in the constructor to prime the loops so the first character sent for formatting appears inside the nested loops. 752 The destruct ionprovides a newline, if formatted text ends with a full line.753 Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flatten (linearized) and rechecked on each call because execution location is not retained between calls.752 The destructor provides a newline, if formatted text ends with a full line. 753 Figure~\ref{f:CFmt} shows the C equivalent formatter, where the loops of the coroutine are flattened (linearized) and rechecked on each call because execution location is not retained between calls. 754 754 (Linearized code is the bane of device drivers.) 755 755 … … 956 956 \end{cfa} 957 957 and the programming language (and possibly its tool set, \eg debugger) may need to understand @baseCoroutine@ because of the stack. 958 Furthermore, the execution of construct s/destructors is in the wrong order for certain operations.958 Furthermore, the execution of constructors/destructors is in the wrong order for certain operations. 959 959 For example, for threads if the thread is implicitly started, it must start \emph{after} all constructors, because the thread relies on a completely initialized object, but the inherited constructor runs \emph{before} the derived. 960 960 … … 1019 1019 forall( `dtype` T | is_coroutine(T) ) void resume( T & ); 1020 1020 \end{cfa} 1021 The @dtype@ property of the trait ensures the coroutine descriptor is non-copyable, so all coroutines must be passed by reference (pointer).1021 The @dtype@ property provides no implicit copying operations and the @is_coroutine@ trait provides no explicit copying operations, so all coroutines must be passed by reference (pointer). 1022 1022 The routine definitions ensures there is a statically-typed @main@ routine that is the starting point (first stack frame) of a coroutine, and a mechanism to get (read) the currently executing coroutine handle. 1023 1023 The @main@ routine has no return value or additional parameters because the coroutine type allows an arbitrary number of interface routines with corresponding arbitrary typed input/output values versus fixed ones. … … 1169 1169 The allocation/deallocation pattern appears unusual because allocated objects are immediately deallocated without any intervening code. 1170 1170 However, for threads, the deletion provides implicit synchronization, which is the intervening code. 1171 While the subtotals are added in linear order rather than completion order, which slight inhibits concurrency, the computation is restricted by the critical-path thread (\ie the thread that takes the longest), and so any inhibited concurrency is very small as totalling the subtotals is trivial.1171 While the subtotals are added in linear order rather than completion order, which slightly inhibits concurrency, the computation is restricted by the critical-path thread (\ie the thread that takes the longest), and so any inhibited concurrency is very small as totalling the subtotals is trivial. 1172 1172 1173 1173 \begin{figure} 1174 1174 \begin{cfa} 1175 thread Adder { 1176 int * row, cols, & subtotal; $\C{// communication}$ 1177 }; 1175 `thread` Adder { int * row, cols, & subtotal; } $\C{// communication variables}$ 1178 1176 void ?{}( Adder & adder, int row[], int cols, int & subtotal ) { 1179 1177 adder.[ row, cols, &subtotal ] = [ row, cols, &subtotal ]; … … 1189 1187 Adder * adders[rows]; 1190 1188 for ( int r = 0; r < rows; r += 1 ) { $\C{// start threads to sum rows}$ 1191 adders[r] = new( matrix[r], cols, &subtotals[r] );1189 adders[r] = `new( matrix[r], cols, &subtotals[r] );` 1192 1190 } 1193 1191 for ( int r = 0; r < rows; r += 1 ) { $\C{// wait for threads to finish}$ 1194 delete( adders[r] );$\C{// termination join}$1192 `delete( adders[r] );` $\C{// termination join}$ 1195 1193 total += subtotals[r]; $\C{// total subtotal}$ 1196 1194 } … … 1208 1206 To reestablish meaningful execution requires mechanisms to reintroduce determinism, \ie restrict non-determinism, called mutual exclusion and synchronization, where mutual exclusion is an access-control mechanism on data shared by threads, and synchronization is a timing relationship among threads~\cite[\S~4]{Buhr05a}. 1209 1207 Since many deterministic challenges appear with the use of mutable shared state, some languages/libraries disallow it, \eg Erlang~\cite{Erlang}, Haskell~\cite{Haskell}, Akka~\cite{Akka} (Scala). 1210 In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely relate to networking concepts, \eg channels~\cite{CSP,Go}.1208 In these paradigms, interaction among concurrent objects is performed by stateless message-passing~\cite{Thoth,Harmony,V-Kernel} or other paradigms closely related to networking concepts, \eg channels~\cite{CSP,Go}. 1211 1209 However, in call/return-based languages, these approaches force a clear distinction, \ie introduce a new programming paradigm between regular and concurrent computation, \eg routine call versus message passing. 1212 1210 Hence, a programmer must learn and manipulate two sets of design patterns. 1213 1211 While this distinction can be hidden away in library code, effective use of the library still has to take both paradigms into account. 1214 In contrast, approaches based on stateful lmodels more closely resemble the standard call/return programming-model, resulting in a single programming paradigm.1212 In contrast, approaches based on stateful models more closely resemble the standard call/return programming-model, resulting in a single programming paradigm. 1215 1213 1216 1214 At the lowest level, concurrent control is implemented by atomic operations, upon which different kinds of locking mechanisms are constructed, \eg semaphores~\cite{Dijkstra68b}, barriers, and path expressions~\cite{Campbell74}. … … 1252 1250 Algorithms that allow a barger, but divert it until later using current synchronization state (flags), are avoiding the barger; 1253 1251 algorithms that preclude a barger from entering during synchronization in the critical section prevent barging completely. 1254 Techniques like baton-pass locks~\cite{Andrews89} between threads instead of unconditionally releasing locks is an example of barging prevention.1252 Techniques like baton-passing locks~\cite{Andrews89} between threads instead of unconditionally releasing locks is an example of barging prevention. 1255 1253 1256 1254 … … 1265 1263 Copying a lock is insecure because it is possible to copy an open lock and then use the open copy when the original lock is closed to simultaneously access the shared data. 1266 1264 Copying a monitor is secure because both the lock and shared data are copies, but copying the shared data is meaningless because it no longer represents a unique entity. 1267 As for coroutines/tasks, a non-copyable (@dtype@) trait is used to capture this requirement, so all locks/monitors must be passed by reference (pointer).1265 As for coroutines/tasks, the @dtype@ property provides no implicit copying operations and the @is_monitor@ trait provides no explicit copying operations, so all locks/monitors must be passed by reference (pointer). 1268 1266 \begin{cfa} 1269 1267 trait is_monitor( `dtype` T ) { … … 1277 1275 \label{s:MutexAcquisition} 1278 1276 1279 While correctness impli citly implies a monitor's mutual exclusion is acquired and released, there are implementation options about when and where the locking/unlocking occurs.1277 While correctness implies a monitor's mutual exclusion is acquired and released, there are implementation options about when and where the locking/unlocking occurs. 1280 1278 (Much of this discussion also applies to basic locks.) 1281 1279 For example, a monitor may need to be passed through multiple helper routines before it becomes necessary to acquire the monitor mutual-exclusion. … … 1306 1304 \begin{cfa} 1307 1305 monitor Tree { 1308 Tree * left, right;1306 Tree * left, * right; 1309 1307 // value 1310 1308 }; 1311 1309 void print( Tree & mutex tree ) { $\C{// prefix traversal}$ 1312 1310 // write value 1313 print( tree->left ); $\C{// multiply acquire monitor lock on each recursion}$1311 print( tree->left ); $\C{// multiply acquire monitor lock for tree on each recursion}$ 1314 1312 print( tree->right ); 1315 1313 } 1316 1314 \end{cfa} 1317 1315 1318 The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameter is redundant.1319 Instead, one of qualifier semantics can bethe default, and the other required.1320 For example, assume the safe @mutex@ qualifier for all monitor parametersbecause assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors.1321 On the other hand, assuming \lstinline[morekeywords=nomutex]@nomutex@ is the \emph{normal} parameter behaviour, stating explicitly \emph{this parameter is not special}.1316 The benefit of mandatory monitor qualifiers is self-documentation, but requiring both @mutex@ and \lstinline[morekeywords=nomutex]@nomutex@ for all monitor parameters is redundant. 1317 Instead, the semantics have one qualifier as the default, and the other required. 1318 For example, make the safe @mutex@ qualifier the default because assuming \lstinline[morekeywords=nomutex]@nomutex@ may cause subtle errors. 1319 Alternatively, make the unsafe \lstinline[morekeywords=nomutex]@nomutex@ qualifier the default because it is the \emph{normal} parameter semantics while @mutex@ parameters are rare. 1322 1320 Providing a default qualifier implies knowing whether a parameter is a monitor. 1323 1321 Since \CFA relies heavily on traits as an abstraction mechanism, the distinction between a type that is a monitor and a type that looks like a monitor can become blurred. … … 1368 1366 Having language support for such a feature is therefore a significant asset for \CFA. 1369 1367 1370 The capability to acquire multiple locks before entering a critical section is called \newterm{bulk acquire} .1368 The capability to acquire multiple locks before entering a critical section is called \newterm{bulk acquire} (see Section~\ref{s:Implementation} for implementation details). 1371 1369 In the previous example, \CFA guarantees the order of acquisition is consistent across calls to different routines using the same monitors as arguments. 1372 1370 This consistent ordering means acquiring multiple monitors is safe from deadlock. … … 1386 1384 1387 1385 However, such use leads to lock acquiring order problems resulting in deadlock~\cite{Lister77}, where detecting it requires dynamic tracking of monitor calls, and dealing with it requires rollback semantics~\cite{Dice10}. 1388 In \CFA, safety is guaranteed by using bulk acquire of all monitors to shared objects, whereas other monitor systems provide no aid.1389 While \CFA provides only a partial solution, the \CFA partial solution handles many useful cases.1386 In \CFA, a safety aid is provided by using bulk acquire of all monitors to shared objects, whereas other monitor systems provide no aid. 1387 While \CFA provides only a partial solution, it handles many useful cases, \eg: 1390 1388 \begin{cfa} 1391 1389 monitor BankAccount { ... }; … … 1438 1436 For example, Figure~\ref{f:GenericBoundedBuffer} shows a bounded buffer that may be full/empty so produce/consumer threads must block. 1439 1437 Leaving the monitor and trying again (busy waiting) is impractical for high-level programming. 1440 Monitors eliminate busy waiting by providing internal synchronization to schedule threads needing access to the shared data, where the synchronization is blocking (threads are parked)versus spinning.1438 Monitors eliminate busy waiting by providing synchronization to schedule threads needing access to the shared data, where threads block versus spinning. 1441 1439 Synchronization is generally achieved with internal~\cite{Hoare74} or external~\cite[\S~2.9.2]{uC++} scheduling, where \newterm{scheduling} defines which thread acquires the critical section next. 1442 1440 \newterm{Internal scheduling} is characterized by each thread entering the monitor and making an individual decision about proceeding or blocking, while \newterm{external scheduling} is characterized by an entering thread making a decision about proceeding for itself and on behalf of other threads attempting entry. … … 1531 1529 External scheduling is controlled by the @waitfor@ statement, which atomically blocks the calling thread, releases the monitor lock, and restricts the routine calls that can next acquire mutual exclusion. 1532 1530 If the buffer is full, only calls to @remove@ can acquire the buffer, and if the buffer is empty, only calls to @insert@ can acquire the buffer. 1533 Threads making calls to routines that are currently excluded, block outside (external) of the monitor on a calling queue, versus blocking on condition queues inside (internal) of the monitor. 1534 % External scheduling is more constrained and explicit, which helps programmers reduce the non-deterministic nature of concurrency. 1531 Threads making calls to routines that are currently excluded, block outside of (external to) the monitor on a calling queue, versus blocking on condition queues inside of (internal to) the monitor. 1535 1532 External scheduling allows users to wait for events from other threads without concern of unrelated events occurring. 1536 1533 The mechnaism can be done in terms of control flow, \eg Ada @accept@ or \uC @_Accept@, or in terms of data, \eg Go channels. … … 1549 1546 A thread blocks until an appropriate partner arrives. 1550 1547 The complexity is exchanging phone numbers in the monitor because of the mutual-exclusion property. 1551 For signal scheduling, the @exchange@ condition is necessary to block the thread finding the match, while the matcher unblocks to take the oppos e number, post its phone number, and unblock the partner.1548 For signal scheduling, the @exchange@ condition is necessary to block the thread finding the match, while the matcher unblocks to take the opposite number, post its phone number, and unblock the partner. 1552 1549 For signal-block scheduling, the implicit urgent-queue replaces the explict @exchange@-condition and @signal_block@ puts the finding thread on the urgent condition and unblocks the matcher. 1553 1550 … … 1570 1567 wait( Girls[ccode] ); 1571 1568 GirlPhNo = phNo; 1572 ` exchange.signal();`1569 `signal( exchange );` 1573 1570 } else { 1574 1571 GirlPhNo = phNo; 1575 1572 `signal( Boys[ccode] );` 1576 ` exchange.wait();`1573 `wait( exchange );` 1577 1574 } // if 1578 1575 return BoyPhNo; … … 1649 1646 } 1650 1647 \end{cfa} 1651 must acquire a set of monitor locks greater than or equal to the number of locks forthe waiting thread signalled from the condition queue.1648 must have acquired at least the same locks as the waiting thread signalled from the condition queue. 1652 1649 1653 1650 Similarly, for @waitfor( rtn )@, the default semantics is to atomically block the acceptor and release all acquired mutex types in the parameter list, \ie @waitfor( rtn, m1, m2 )@. … … 1665 1662 \end{cfa} 1666 1663 1667 Given the ability to release a subset of acquired monitors can result in a \newterm{nested monitor}~\cite{Lister77} deadlock.1664 The ability to release a subset of acquired monitors can result in a \newterm{nested monitor}~\cite{Lister77} deadlock. 1668 1665 \begin{cfa} 1669 1666 void foo( M & mutex m1, M & mutex m2 ) { … … 1676 1673 1677 1674 Finally, an important aspect of monitor implementation is barging, \ie can calling threads barge ahead of signalled threads? 1678 If barging is allowed, synchronization between a singller and signallee is difficult, often requiring multiple unblock/block cycles (looping around a wait rechecking if a condition is met). 1675 If barging is allowed, synchronization between a signaller and signallee is difficult, often requiring multiple unblock/block cycles (looping around a wait rechecking if a condition is met). 1676 In fact, signals-as-hints is completely opposite from that proposed by Hoare in the seminal paper on monitors: 1679 1677 \begin{quote} 1680 1678 However, we decree that a signal operation be followed immediately by resumption of a waiting program, without possibility of an intervening procedure call from yet a third program. … … 1682 1680 \end{quote} 1683 1681 \CFA scheduling \emph{precludes} barging, which simplifies synchronization among threads in the monitor and increases correctness. 1682 Furthermore, \CFA concurrency has no spurious wakeup~\cite[\S~9]{Buhr05a}, which eliminates an implict form of barging. 1684 1683 For example, there are no loops in either bounded buffer solution in Figure~\ref{f:GenericBoundedBuffer}. 1685 1684 Supporting barging prevention as well as extending internal scheduling to multiple monitors is the main source of complexity in the design and implementation of \CFA concurrency. 1686 Furthermore, there is no spurious wakeup~\cite[\S~9]{Buhr05a} in \CFA, which eliminates an implict form of barging.1687 1685 1688 1686 … … 1772 1770 1773 1771 In an object-oriented programming-language, a class includes an exhaustive list of operations. 1774 However, new members can be added via static inheritance or dyna ic members, \eg JavaScript~\cite{JavaScript}.1772 However, new members can be added via static inheritance or dynamic members, \eg JavaScript~\cite{JavaScript}. 1775 1773 Similarly, monitor routines can be added at any time in \CFA, making it less clear for programmers and more difficult to implement. 1776 1774 \begin{cfa} … … 1781 1779 void h( M & mutex m ) { waitfor( `f` ); } $\C{// unclear which f}$ 1782 1780 \end{cfa} 1783 Hence, the cfa-code for theentering a monitor looks like:1781 Hence, the cfa-code for entering a monitor looks like: 1784 1782 \begin{cfa} 1785 1783 if ( $\textrm{\textit{monitor is free}}$ ) $\LstCommentStyle{// \color{red}enter}$ … … 1842 1840 \end{cfa} 1843 1841 Again, the set of monitors passed to the @waitfor@ statement must be entirely contained in the set of monitors already acquired by the accepting routine. 1844 1845 Note, for internal and external scheduling with multiple monitors, a signalling or accepting thread must match exactly, \ie partial matching results in waiting. 1846 \begin{cquote} 1842 Also, the order of the monitors in a @waitfor@ statement is unimportant. 1843 1844 Figure~\ref{f:UnmatchedMutexSets} shows an example where, for internal and external scheduling with multiple monitors, a signalling or accepting thread must match exactly, \ie partial matching results in waiting. 1845 For both examples, the set of monitors is disjoint so unblocking is impossible. 1846 1847 \begin{figure} 1847 1848 \lstDeleteShortInline@% 1848 1849 \begin{tabular}{@{}l@{\hspace{\parindentlnth}}|@{\hspace{\parindentlnth}}l@{}} … … 1857 1858 wait( c ); 1858 1859 } 1860 g( `m11`, m2 ); // block on wait 1861 f( `m12`, m2 ); // cannot fulfil 1862 \end{cfa} 1863 & 1864 \begin{cfa} 1865 monitor M1 {} m11, m12; 1866 monitor M2 {} m2; 1867 1868 void f( M1 & mutex m1, M2 & mutex m2 ) { 1869 1870 } 1871 void g( M1 & mutex m1, M2 & mutex m2 ) { 1872 waitfor( f, m1, m2 ); 1873 } 1859 1874 g( `m11`, m2 ); // block on accept 1860 1875 f( `m12`, m2 ); // cannot fulfil 1861 1876 \end{cfa} 1862 &1863 \begin{cfa}1864 monitor M1 {} m11, m12;1865 monitor M2 {} m2;1866 1867 void f( M1 & mutex m1, M2 & mutex m2 ) {1868 1869 }1870 void g( M1 & mutex m1, M2 & mutex m2 ) {1871 waitfor( f, m1, m2 );1872 }1873 g( `m11`, m2 ); // block on accept1874 f( `m12`, m2 ); // cannot fulfil1875 \end{cfa}1876 1877 \end{tabular} 1877 1878 \lstMakeShortInline@% 1878 \end{cquote} 1879 For both internal and external scheduling, the set of monitors is disjoint so unblocking is impossible. 1879 \caption{Unmatched \protect\lstinline@mutex@ sets} 1880 \label{f:UnmatchedMutexSets} 1881 \end{figure} 1880 1882 1881 1883 1882 1884 \subsection{Extended \protect\lstinline@waitfor@} 1883 1885 1884 The extended form of the @waitfor@ statement conditionally accepts one of a group of mutex routines and allows a specific action to be performed \emph{after} the mutex routine finishes. 1886 Figure~\ref{f:ExtendedWaitfor} show the extended form of the @waitfor@ statement to conditionally accept one of a group of mutex routines, with a specific action to be performed \emph{after} the mutex routine finishes. 1887 For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist. 1888 The \emph{conditional-expression} of a @when@ may call a routine, but the routine must not block or context switch. 1889 If there are multiple acceptable mutex calls, selection occurs top-to-bottom (prioritized) in the @waitfor@ clauses, whereas some programming languages with similar mechanisms accept non-deterministically for this case, \eg Go \lstinline[morekeywords=select]@select@. 1890 If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made. 1891 If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead. 1892 Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking. 1893 If there is a @timeout@ clause, it provides an upper bound on waiting. 1894 If both a @timeout@ clause and an @else@ clause are present, the @else@ must be conditional, or the @timeout@ is never triggered. 1895 In all cases, the statement following is executed \emph{after} a clause is executed to know which of the clauses executed. 1896 1897 \begin{figure} 1885 1898 \begin{cfa} 1886 1899 `when` ( $\emph{conditional-expression}$ ) $\C{// optional guard}$ … … 1898 1911 $\emph{statement}$ $\C{// action when no immediate calls}$ 1899 1912 \end{cfa} 1900 For a @waitfor@ clause to be executed, its @when@ must be true and an outstanding call to its corresponding member(s) must exist. 1901 The \emph{conditional-expression} of a @when@ may call a routine, but the routine must not block or context switch. 1902 If there are several mutex calls that can be accepted, selection occurs top-to-bottom in the @waitfor@ clauses versus non-deterministically. 1903 If some accept guards are true and there are no outstanding calls to these members, the acceptor is accept-blocked until a call to one of these members is made. 1904 If all the accept guards are false, the statement does nothing, unless there is a terminating @else@ clause with a true guard, which is executed instead. 1905 Hence, the terminating @else@ clause allows a conditional attempt to accept a call without blocking. 1906 If there is a @timeout@ clause, it provides an upper bound on waiting, and can only appear with a conditional @else@, otherwise the timeout cannot be triggered. 1907 In all cases, the statement following is executed \emph{after} a clause is executed to know which of the clauses executed. 1913 \caption{Extended \protect\lstinline@waitfor@} 1914 \label{f:ExtendedWaitfor} 1915 \end{figure} 1908 1916 1909 1917 Note, a group of conditional @waitfor@ clauses is \emph{not} the same as a group of @if@ statements, e.g.: … … 1918 1926 \begin{cfa} 1919 1927 void insert( Buffer(T) & mutex buffer, T elem ) with( buffer ) { 1920 if ( count == BufferSize)1928 if ( count == 10 ) 1921 1929 waitfor( remove, buffer ) { 1922 elements[back] = elem; 1923 back = ( back + 1 ) % BufferSize; 1924 count += 1; 1930 // insert elem into buffer 1925 1931 } or `waitfor( ^?{}, buffer )` throw insertFail; 1926 1932 } … … 2000 2006 \label{s:fibers} 2001 2007 2002 A variant of user thread is \newterm{fibers}, which removes preemption, \eg Go~\cite{Go} .2003 Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, hence race and deadlock errors aremore difficult to generate.2008 A variant of user thread is \newterm{fibers}, which removes preemption, \eg Go~\cite{Go} @goroutine@s. 2009 Like functional programming, which removes mutation and its associated problems, removing preemption from concurrency reduces nondeterminism, making race and deadlock errors more difficult to generate. 2004 2010 However, preemption is necessary for concurrency that relies on spinning, so there are a class of problems that cannot be programmed without preemption. 2005 2011 … … 2067 2073 2068 2074 \section{Implementation} 2075 \label{s:Implementation} 2069 2076 2070 2077 Currently, \CFA has fixed-sized stacks, where the stack size can be set at coroutine/thread creation but with no subsequent growth. 2071 2078 Schemes exist for dynamic stack-growth, such as stack copying and chained stacks. 2072 However, stack copying requires pointer adjustment to items on the stack, which is impossible without some form of gar age collection.2079 However, stack copying requires pointer adjustment to items on the stack, which is impossible without some form of garbage collection. 2073 2080 As well, chained stacks require all modules be recompiled to use this feature, which breaks backward compatibility with existing C libraries. 2074 2081 In the long term, it is likely C libraries will migrate to stack chaining to support concurrency, at only a minimal cost to sequential programs. … … 2082 2089 In \CFA, ordering of monitor acquisition relies on memory ordering to prevent deadlock~\cite{Havender68}, because all objects have distinct non-overlapping memory layouts, and mutual-exclusion for a monitor is only defined for its lifetime. 2083 2090 When a mutex call is made, pointers to the concerned monitors are aggregated into a variable-length array and sorted. 2084 This array persists for the entire duration of the mutual-exclusion and i ts ordering reused extensively.2085 2086 To improve performance and simplicity, context switching occur inside a routine call, so only callee-saved registers are copied onto the stack and then the stack register is switched;2091 This array persists for the entire duration of the mutual-exclusion and is used extensively for synchronization operations. 2092 2093 To improve performance and simplicity, context switching occurs inside a routine call, so only callee-saved registers are copied onto the stack and then the stack register is switched; 2087 2094 the corresponding registers are then restored for the other context. 2088 2095 Note, the instruction pointer is untouched since the context switch is always inside the same routine. … … 2091 2098 This method is a 2-step context-switch and provides a clear distinction between user and kernel code, where scheduling and other system operations happen. 2092 2099 The alternative 1-step context-switch uses the \emph{from} thread's stack to schedule and then context-switches directly to the \emph{to} thread's stack. 2093 Experimental results (not presented) show the performance difference betweenthese two approaches is virtually equivalent, because both approaches are dominated by locking to prevent a race condition.2100 Experimental results (not presented) show the performance of these two approaches is virtually equivalent, because both approaches are dominated by locking to prevent a race condition. 2094 2101 2095 2102 All kernel threads (@pthreads@) created a stack. … … 2099 2106 2100 2107 Finally, an important aspect for a complete threading system is preemption, which introduces extra non-determinism via transparent interleaving, rather than cooperation among threads for proper scheduling and processor fairness from long-running threads. 2101 Because preemption frequency is usually long , 1 millisecond,performance cost is negligible.2108 Because preemption frequency is usually long (1 millisecond) performance cost is negligible. 2102 2109 Preemption is normally handled by setting a count-down timer on each virtual processor. 2103 2110 When the timer expires, an interrupt is delivered, and the interrupt handler resets the count-down timer, and if the virtual processor is executing in user code, the signal handler performs a user-level context-switch, or if executing in the language runtime-kernel, the preemption is ignored or rolled forward to the point where the runtime kernel context switches back to user code. … … 2425 2432 \section{Conclusion} 2426 2433 2427 This paper demonstrate a concurrency API that is simple, efficient, and able to build higher-level concurrency features.2434 This paper demonstrates a concurrency API that is simple, efficient, and able to build higher-level concurrency features. 2428 2435 The approach provides concurrency based on a preemptive M:N user-level threading-system, executing in clusters, which encapsulate scheduling of work on multiple kernel threads providing parallelism. 2429 2436 The M:N model is judged to be efficient and provide greater flexibility than a 1:1 threading model. … … 2463 2470 \label{futur:tools} 2464 2471 2465 While monitors offer a flexible and powerful concurrentfor \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package.2472 While monitors offer flexible and powerful concurrency for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package. 2466 2473 Examples of such tools can include futures and promises~\cite{promises}, executors and actors. 2467 2474 These additional features are useful when monitors offer a level of abstraction that is inadequate for certain tasks.
Note: See TracChangeset
for help on using the changeset viewer.