Changes in / [210c737:4852232]


Ignore:
Files:
11 deleted
8 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/colby_parsons_MMAth/local.bib

    r210c737 r4852232  
    108108}
    109109
    110 @misc{java:allof:anyof,
    111   author = "Java Util Concurrent",
    112   title = "Class CompletableFuture",
    113   howpublished = {\url{https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html}},
    114   note = "[Online; accessed 23-May-2023]"
    115 }
    116 
    117110@misc{ocaml:channel,
    118111  author = "The OCaml Manual",
     
    209202    version     = {Version 080},
    210203    organization= {Intel},
    211     month       = mar,
     204    month       = march,
    212205    year        = 2023,
    213206}
  • doc/theses/colby_parsons_MMAth/text/actors.tex

    r210c737 r4852232  
    13651365\end{figure}
    13661366
    1367 \begin{figure}
    1368         \centering
    1369         \subfloat[AMD \CFA Matrix Benchmark]{
    1370                 \resizebox{0.5\textwidth}{!}{\input{figures/nasusCFAMatrix.pgf}}
    1371                 \label{f:cfaMatrixAMD}
    1372         }
    1373         \subfloat[Intel \CFA Matrix Benchmark]{
    1374                 \resizebox{0.5\textwidth}{!}{\input{figures/pykeCFAMatrix.pgf}}
    1375                 \label{f:cfaMatrixIntel}
    1376         }
    1377         \caption{The matrix benchmark comparing \CFA stealing heuristics (lower is better).}
    1378 \end{figure}
    1379 
    13801367Figures~\ref{f:cfaRepeatAMD} and~\ref{f:cfaRepeatIntel} show the effects of the stealing heuristics for the repeat benchmark.
    13811368This benchmark is a pathological case for work stealing actor systems, as the majority of work is being performed by the single actor conducting the scatter/gather.
     
    14021389Hence, it is difficult to attribute the AMD gain to the aggressive work stealing in CAF.
    14031390
     1391\begin{figure}
     1392        \centering
     1393        \subfloat[AMD \CFA Matrix Benchmark]{
     1394                \resizebox{0.5\textwidth}{!}{\input{figures/nasusCFAMatrix.pgf}}
     1395                \label{f:cfaMatrixAMD}
     1396        }
     1397        \subfloat[Intel \CFA Matrix Benchmark]{
     1398                \resizebox{0.5\textwidth}{!}{\input{figures/pykeCFAMatrix.pgf}}
     1399                \label{f:cfaMatrixIntel}
     1400        }
     1401        \caption{The matrix benchmark comparing \CFA stealing heuristics (lower is better).}
     1402\end{figure}
     1403
    14041404% Local Variables: %
    14051405% tab-width: 4 %
  • doc/theses/colby_parsons_MMAth/text/channels.tex

    r210c737 r4852232  
    196196
    197197\section{\CFA / Go channel Examples}
    198 To highlight the differences between \CFA's and Go's close semantics, two examples are presented.
     198To highlight the differences between \CFA's and Go's close semantics, three examples are presented.
    199199The first example is a simple shutdown case, where there are producer threads and consumer threads operating on a channel for a fixed duration.
    200200Once the duration ends, producers and consumers terminate immediately leaving unprocessed elements in the channel.
  • doc/theses/colby_parsons_MMAth/text/frontpgs.tex

    r210c737 r4852232  
    8080Concurrent programs are notoriously hard to program and even harder to debug. Furthermore concurrent programs must be performant, as the introduction of concurrency into a program is often done to achieve some form of speedup.
    8181
    82 This thesis presents a suite of high-level concurrent-language features in the new programming language \CFA, all of which are implemented with the aim of improving the performance, productivity, and safety of concurrent programs.
     82This thesis presents a suite of high-level concurrent-language features in the new programming-language \CFA, all of which are implemented with the aim of improving the performance, productivity, and safety of concurrent programs.
    8383\CFA is a non-object-oriented programming language that extends C.
    8484The foundation for concurrency in \CFA was laid by Thierry Delisle~\cite{Delisle18}, who implemented coroutines, user-level threads, and monitors.
  • doc/theses/colby_parsons_MMAth/text/mutex_stmt.tex

    r210c737 r4852232  
    350350
    351351The benchmark used to evaluate the avoidance algorithms repeatedly acquires a fixed number of locks in a random order and then releases them.
    352 The pseudocode for the deadlock avoidance benchmark is shown in \VRef[Figure]{l:deadlock_avoid_pseudo}.
     352The pseudocode for the deadlock avoidance benchmark is shown in \VRef[Listing]{l:deadlock_avoid_pseudo}.
    353353To ensure the comparison exercises the implementation of each lock avoidance algorithm, an identical spinlock is implemented in each language using a set of builtin atomics available in both \CC and \CFA.
    354354The benchmarks are run for a fixed duration of 10 seconds and then terminate.
     
    357357The median is calculated and is plotted alongside the 95\% confidence intervals for each point.
    358358
    359 \begin{figure}
    360 \begin{cfa}
     359\begin{cfa}[caption={Deadlock avoidance benchmark pseudocode},label={l:deadlock_avoid_pseudo}]
     360
    361361size_t n_locks; $\C{// number of locks}$
    362362size_t n_thds; $\C{// number of threads}$
     
    387387    printf( "%lu\n", total );
    388388}
    389 \end{cfa}
    390 \caption{Deadlock avoidance benchmark pseudocode}
    391 \label{l:deadlock_avoid_pseudo}
    392 \end{figure}
     389
     390\end{cfa}
    393391
    394392The performance experiments were run on the following multi-core hardware systems to determine differences across platforms:
  • doc/theses/colby_parsons_MMAth/text/waituntil.tex

    r210c737 r4852232  
    123123
    124124\begin{figure}
    125 \begin{lstlisting}[language=ada,literate=,{moredelim={**[is][\color{red}]{@}{@}}}]
    126 task type buffer is -- thread type
     125\begin{lstlisting}[language=ada,literate=]
     126task type buffer is -- thread
    127127        ... -- buffer declarations
    128128        count : integer := 0;
     
    130130        loop
    131131                select
    132                         @when count < Size@ => -- guard
    133                         @accept insert( elem : in ElemType )@ do  -- method
     132                        when count < Size => -- guard
     133                        accept insert( elem : in ElemType ) do  -- method
    134134                                ... -- add to buffer
    135135                                count := count + 1;
     
    137137                        -- executed if this accept called
    138138                or
    139                         @when count > 0@ => -- guard
    140                         @accept remove( elem : out ElemType )@ do  -- method
     139                        when count > 0 => -- guard
     140                        accept remove( elem : out ElemType ) do  -- method
    141141                                ... --remove and return from buffer via parameter
    142142                                count := count - 1;
    143143                        end;
    144144                        -- executed if this accept called
    145                 or @delay 10.0@;  -- unblock after 10 seconds without call
    146                 or @else@ -- do not block, cannot appear with delay
     145                or delay 10.0;  -- unblock after 10 seconds without call
     146                or else -- do not block, cannot appear with delay
    147147                end select;
    148148        end loop;
     
    222222
    223223\begin{lrbox}{\myboxB}
    224 \begin{lstlisting}[language=uC++,{moredelim={**[is][\color{red}]{@}{@}}}]
     224\begin{lstlisting}[language=uC++=]
    225225_Task BoundedBuffer {
    226226        int * buffer;
     
    243243                for ( ;; ) {
    244244                        _Accept( ~buffer ) break;
    245                         @or _When ( count < Size ) _Accept( insert )@;
    246                         @or _When ( count > 0 ) _Accept( remove )@;
     245                        or _When ( count < Size ) _Accept( insert );
     246                        or _When ( count > 0 ) _Accept( remove );
    247247                }
    248248        }
     
    267267Figure~\ref{l:BB_uC++} shows the outline of a bounded buffer administrator implemented with \uC @_Accept@ statements.
    268268
    269 There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, Java's @allof@/@anyof@ over futures~\cite{java:allof:anyof}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
     269There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
    270270Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources.
    271271
     
    276276Similar, actor systems circumvent the \gls{synch_multiplex} problem as actors only block when waiting for the next message never in a behaviour.
    277277While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
    278 Consider the case where a thread has a single source of communication and it wants a set of $N$ resources.
    279 It must sequentially request the $N$ resources and wait for each response.
    280 During the receives for the $N$ resources, it can receive other communication, and has to save and postpone these communications, or discard them.
     278Consider the case where a thread has a single source of communication and it wants a set of @N@ resources.
     279It must sequentially request the @N@ resources and wait for each response.
     280During the receives for the @N@ resources, it can receive other communication, and has to save and postpone these communications, or discard them.
    281281% If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.
    282282
     
    320320int i = 0;
    321321
    322 waituntil( @Lock@ ) { ... }
    323 or when( i == 0 ) waituntil( i << @Channel@ ) { ... }
    324 and waituntil( @Future@ ) { ... }
    325 or waituntil( @timeout( 1`s )@ ) { ... }
     322waituntil( Lock ) { ... }
     323or when( i == 0 ) waituntil( i << Channel ) { ... }
     324and waituntil( Future ) { ... }
     325or waituntil( timeout( 1`s ) ) { ... }
    326326// else { ... }
    327327\end{cfa}
     
    342342\begin{cfa}
    343343future(int) bar, foo;
    344 waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
    345 \end{cfa}
    346 The reason for this semantics is that prioritizing resources can be useful in certain problems, such as shutdown.
    347 In the rare case where there is a starvation problem with the ordering, it possible to follow a @waituntil@ with its reverse form, alternating which resource has the highest priority:
     344waituntil( foo ) { ... } or waituntil( bar ) { ... }
     345\end{cfa}
     346The reason for this semantics is that prioritizing resources can be useful in certain problems.
     347In the rare case where there is a starvation problem with the ordering, it possible to follow a @waituntil@ with its reverse form:
    348348\begin{cfa}
    349349waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
    350350waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar
    351351\end{cfa}
    352 While this approach is not general for many resources, it handles many basic cases.
    353352
    354353The \CFA @and@ semantics match the @and@ semantics of \uC \lstinline[language=uC++]{_Select}.
    355354When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}.
    356 When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block, and then the thread waits again for the next available clause or the @waituntil@ statement is now satisfied.
     355When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block, and then the thread waits again for the next available clause or the @waituntil@ statement is now true.
    357356This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator.
    358357If the @and@ operator waited for all clauses to be available before running, it is the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements.
     
    416415\section{\lstinline{waituntil} Implementation}
    417416
    418 The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} shows the basic outline of the @waituntil@ algorithm.
    419 Complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives.
    420 The following sections use examples to fill in complexity details missing in Figure~\ref{f:WU_Impl}.
    421 After which, the full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.
     417The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} only shows the basic outline of the @waituntil@ algorithm.
     418The complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives.
     419The following sections then use examples to fill in details missing in Figure~\ref{f:WU_Impl}.
     420The full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.
    422421
    423422\begin{figure}
     
    478477When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked.
    479478Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released.
    480 Immediately releasing the lock after the code block prevents the waiting thread from holding multiple locks and potentially introducing a deadlock.
     479Immediately releasing the lock prevents the waiting thread from holding multiple locks and potentially introducing a deadlock.
    481480As such, the only unregistered nodes associated with locks are the ones that have not run.
    482481
    483482\subsection{Timeouts}
    484483
    485 A timeout for the @waituntil@ statement is a duration passed to routine \lstinline[deletekeywords={timeout}]{timeout}\footnote{\lstinline{timeout} is a quasi-keyword in \CFA, allowing it to be used an identifier.}, \eg:
     484A timeout for the @waituntil@ statement is a duration passed to \lstinline[deletekeywords={timeout}]{timeout}, \eg:
    486485\begin{cquote}
    487486\begin{tabular}{@{}l|l@{}}
     
    509508Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers.
    510509Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding.
    511 In following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, respectively.
     510In following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, respctively.
    512511\begin{cfa}[deletekeywords={timeout}]
    513512waituntil( i << C1 ); and waituntil( timeout( 1`s ) );
    514513or waituntil( i << C2 ); and waituntil( timeout( 3`s ) );
    515514\end{cfa}
    516 If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds.
     515If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second ocurs before 3 seconds.
    517516Note, the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.
    518517
     
    524523
    525524Channels require more complexity to allow synchronous multiplexing.
    526 For locks, when an outside thread releases a lock and unblocks the @waituntil@ thread (WUT), the lock's MX property is passed to the WUT (no spinning locks).
     525For locks, when an outside thread releases a lock and unblocks the waituntil thread (WUT), the lock's MX property is passed to the WUT (no spinning locks).
    527526For futures, the outside thread deliveries a value to the future and unblocks any waiting threads, including WUTs.
    528 In either case, after the WUT unblocks, it is safe to execute its corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future.
     527In either case, after the WUT unblocks it is safe to execute its the corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future.
    529528Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT.
    530529However, for channels, there is a race condition that poses an issue.
     
    537536Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg:
    538537\begin{cfa}
    539 int i;
    540538waituntil( i << A ) {} and waituntil( i << B ) {}
    541539or waituntil( i << C ) {} and waituntil( i << D ) {}
    542540\end{cfa}
    543541If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or the code blocks for @C@ and @D@.
    544 However, four outside threads inserting into each channel can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks.
     542However, four outside threads can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks.
    545543This case introduces a race with complexity that increases with the size of the @waituntil@ statement.
    546544However, due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses.
    547545This approach is a poor solution for two reasons.
    548 It is possible that once all the locks are acquired, the subtree is not satisfied and the locks must be released.
     546It is possible that once all the locks are acquired the subtree is not satisfied and the locks must be released.
    549547This work incurs a high cost for signalling threads and heavily increase contention on internal channel locks.
    550548Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
    551549As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance.
    552 Therefore, the example of reusing variable @i@ by multiple output channels is considered a user error without exclusive-or semantics.
    553 Given aliasing in C, it is impossible to even warn of the potential race.
    554 In the future, it would be interesting to support Go-like syntax, \lstinline[language=Go]{case i := <- ...}, defining a new scoped @i@ variable for each clause.
    555550
    556551It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
     
    558553\begin{cquote}
    559554\begin{tabular}{@{}l|l|l@{}}
    560 \multicolumn{3}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\
     555\multicolumn{3}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\
    561556thread 1 & thread 2 & thread 3 \\
    562557\begin{cfa}
     
    587582Channels introduce another interesting implementation issue.
    588583Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause.
    589 This conjunction poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
     584This poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
    590585Consider the following example, alongside a described ordering of events to highlight the race.
    591586\begin{cquote}
    592587\begin{tabular}{@{}l|l@{}}
    593 \multicolumn{2}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\
     588\multicolumn{2}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\
    594589thread 1 & thread 2 \\
    595590\begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}]
     
    604599\end{tabular}
    605600\end{cquote}
    606 Assume thread 1 executes first, registers with channel @A@ and proceeds in the @waituntil@.
    607 Since @A@ is empty, thread 1 cannot remove, and then thread 1 is interrupted before registering with @B@.
    608 Thread 2 similarly registers with channel @B@, and proceeds in the @waituntil@.
    609 Since @B@ is zero size there is no space to insert, and then thread 2 is interrupted before registering with @A@.
     601Assume thread 1 executes first, registers with channel @A@ and proceeds, since it is empty, and then is interrupted before registering with @B@.
     602Thread 2 similarly registers with channel @B@, and proceeds, since it does not have space to insert, and then is interrupted before registering with @A@.
    610603At this point, thread 1 and 2 resume execution.
    611 Remember from above, each exclusive-or @waituntil@ holds a race to set the winning clause of the statement.
    612 The issue that arises is that these two @waituntil@ statements must have matching winning clauses (both @A@ clauses or both @B@ clauses) to preserve the exclusive-or semantics, since a zero-sized channel needs an insert/remove pair for an operation to occur.
    613 If threads 1 and 2 race to set a winner only in their own @waituntil@, thread 1 can think it successfully removed from @B@, and thread 2 can think it successfully inserted into @A@, which is an error.
    614 Hence, there is a race on two fronts.
    615 If thread 1 wins the race and sees that @B@ has a waiting insertion, then thread 2 must execute the first clause of its @waituntil@ and thread 1 execute its second.
    616 Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 execute its second.
    617 Any other execution scenario is incorrect for exclusive-or semantics.
    618 Note, that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread.
     604There is now a race that must be dealt with on two fronts.
     605If thread 1 and 2 only race to the \gls{cas}, \ie a clause in their own @waituntil@, thread 1 can think that it successfully removed from @B@, and thread 2 may think it successfully inserted into @A@, when only one of these operations occurs.
    619606
    620607The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels.
    621608This approach eliminates the race shown above since thread 1 and 2 cannot both be registering at the same time.
    622609However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic.
    623 Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead.
     610Not all types in a @waituntil@ have an internal lock, and when using non-channel types acquiring all the locks incurs extra unneeded overhead.
    624611Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
    625612This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to be pending.
     
    639626
    640627    // Try to set other status, if we succeed break and return true
    641     while( ! CAS( other.clause_status, &cmp_status, SAT ) ) {
     628    while( !CAS( other.clause_status, &cmp_status, SAT ) ) {
    642629        if ( cmp_status == SAT )
    643630            return false; // If other status is SAT we lost so return false
     
    650637
    651638        // Attempt to set own status flag back to PENDING to retry
    652         if ( ! CAS( mine.clause_status, &cmp_status, PENDING ) )
     639        if ( !CAS( mine.clause_status, &cmp_status, PENDING ) )
    653640            return false; // If we fail then we lost so return false
    654641       
     
    666653These exception mechanisms are supported through the @on_selected@ routine.
    667654This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown.
    668 Hence, the channel close-down mechanism is handled correctly.
    669655
    670656\subsection{Guards and Statement Predicate}\label{s:wu_guards}
     
    672658It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished.
    673659In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which along with guards, make the problem of checking for completion of the statement difficult.
    674 Consider the following @waituntil@.
     660Consider the @waituntil@ in Figure~\ref{f:WU_ComplexPredicate}.
     661When the @waituntil@ thread wakes up, checking if the statement is complete is non-trivial.
     662The predicate that will return if the statement in Figure~\ref{f:WU_ComplexPredicate} is satisfied is the following.
     663\begin{cfa}
     664A && B || C || !GA && B || !GB && A || !GA && !GB && !GC
     665\end{cfa}
     666Which simplifies to:
     667\begin{cfa}
     668( A || !GA ) && ( B || !GB ) || C || !GA && !GB && !GC
     669\end{cfa}
     670Checking a predicate this large with each iteration is expensive so \uC and \CFA both take steps to simplify checking statement completion.
     671
     672\begin{figure}
    675673\begin{cfa}
    676674when( GA ) waituntil( A ) {}
     
    678676or when( GC ) waituntil( C ) {}
    679677\end{cfa}
    680 When the @waituntil@ thread wakes up, the following predicate represents satisfaction:
    681 \begin{cfa}
    682 A && B || C || ! GA && B || ! GB && A || ! GA && ! GB && ! GC
    683 \end{cfa}
    684 which can be simplified to:
    685 \begin{cfa}
    686 ( A || ! GA ) && ( B || ! GB ) || C || ! GA && ! GB && ! GC
    687 \end{cfa}
    688 Checking the complete predicate on each iteration of the pending @waituntil@ is expensive so \uC and \CFA both take steps to simplify checking for statement completion.
     678\caption{\lstinline{waituntil} with a non-trivial predicate}
     679\label{f:WU_ComplexPredicate}
     680\end{figure}
    689681
    690682In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
    691 A diagram of the tree for the complete predicate above is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@.
    692 Each internal node stores the statuses of the two subtrees beneath it.
     683The internal nodes also store the statuses of the two subtrees beneath them.
    693684When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement.
    694685Once the root of the tree has both subtrees marked as @true@ then the statement is complete.
    695686As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again.
    696687To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation.
     688An diagram of the tree for the statement in Figure~\ref{f:WU_ComplexPredicate} is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@.
    697689
    698690\begin{figure}
     
    700692\input{diagrams/uCpp_select_tree.tikz}
    701693\end{center}
    702 \caption{\uC \lstinline[language=uC++]{select} tree modification}
     694\caption{\uC select tree modification}
    703695\label{f:uC_select_tree}
    704696\end{figure}
    705697
    706 The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the complete predicate of a @waituntil@.
    707 The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
    708 This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@.
     698The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
     699The waiting condition of the @waituntil@ statement can be represented as a predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
     700In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the @waituntil@.
    709701Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passes the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise.
    710702To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
    711703The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values.
    712 For example, the following is generated for the complete predicate above:
     704For example, the following would be generated for the @waituntil@ shown in Figure~\ref{f:WU_ComplexPredicate}.
    713705\begin{cfa}
    714706// statement completion predicate
     
    716708    return nodes[0].status && nodes[1].status || nodes[2].status;
    717709}
    718 if ( GA || GB || GC ) {                         $\C{// skip statement if all guards false}$
     710
     711// skip statement if all guards false
     712if ( GA || GB || GC ) {
    719713    select_node nodes[3];
    720     nodes[0].status = ! GA && GB;       $\C{// A's status}$
    721     nodes[1].status = ! GB && GA;       $\C{// B's status}$
    722     nodes[2].status = ! GC;                     $\C{// C's status}$
     714    nodes[0].status = !GA && GB; // A's status
     715    nodes[1].status = !GB && GA; // B's status
     716    nodes[2].status = !GC;       // C's status
     717
    723718    // ... rest of waituntil codegen ...
     719
    724720}
    725721\end{cfa}
     
    727723\uC's @_Select@, supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses.
    728724In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied.
     725
    729726\begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}]
    730727Future_ISM<int> A, B, C, D;
     
    732729and _Select( @D && E@ ) { ... }
    733730\end{lstlisting}
    734 This feature is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after \emph{both} resources are available.
     731This is more expressive that the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after both resources are available.
    735732
    736733In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons.
     
    741738or waituntil( C && D ) { ... }
    742739\end{cfa}
    743 If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold-and-wait situation.
     740If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold and wait situation.
    744741Other semantics are needed to ensure this operation is safe.
    745742One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance};
     
    747744Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally.
    748745One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach.
    749 This approach does not work due to \gls{toctou} issues, \ie it is impossible to ensure that the full set of resources are available without holding them all first.
    750 Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems.
    751 Finally, the problem of operators inside clauses is also a difficult to handle when supporting channels.
    752 It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run.
    753 However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
    754 Ultimately, I did not deemed this feature to be crucial enough when taking into account the complexity and runtime cost.
     746This approach does not work due to \gls{toctou} issues;
     747it is impossible to ensure that the full set of resources are available without holding them all first.
     748Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems involved, but it was not deemed an important feature when taking into account the runtime cost need paid to handle this situation.
     749The problem of operators inside clauses also becomes a difficult issue to handle when supporting channels.
     750It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run, but that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
    755751
    756752\subsection{The full \lstinline{waituntil} picture}
    757753
    758 Given all the complex details handled by the @waituntil@, its full pseudocode is presented in Figure \ref{f:WU_Full_Impl}.
     754Now the details have been discussed, the full pseudocode of the @waituntil@ is presented in Figure~\ref{f:WU_Full_Impl}.
    759755Some things to note are as follows.
    760756The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}.
     
    804800\end{figure}
    805801
    806 \section{\lstinline{waituntil} Performance}
     802\section{Waituntil Performance}
    807803
    808804Similar facilities to @waituntil@ are discussed in Section~\ref{s:History} covering C, Ada, Rust, \CC, and OCaml.
    809805However, these facilities are either not meaningful or feasible to benchmark against.
    810806The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space.
    811 Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement.
     807Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operates on method calls, which is done in \CFA via the @waitfor@ statement, so it is not meaningful to benchmark against the @waituntil@, which cannot wait on this resource.
    812808Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach.
    813809OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.
     
    821817The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels.
    822818The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads.
    823 The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on.
     819The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that is waits on.
    824820Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels.
    825821For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is:
     
    900896Go then is holding a lock for every channel, resulting in worse performance as the number of channels increase.
    901897In \CFA, since races are consolidated without holding all locks, it scales much better both with cores and clauses since more work can occur in parallel.
    902 This scalability difference is more significant on the Intel machine than the AMD machine since the Intel has lower cache-contention costs.
     898This scalability difference is more significant on the Intel machine than the AMD machine since the Intel machine has lower cache contention costs.
    903899
    904900The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks.
     
    932928In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
    933929Interesting, this case may not be as pathological as it seems.
    934 If the set of channels belonging to a \lstinline[language=Go]{select} have channels that overlap with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel.
     930If the set of channels belonging to a select have channels that overlap with the set of another select, these statements lose the ability to operate in parallel.
    935931The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism.
    936932Comparison of this pathological case is shown in Table~\ref{t:pathGo}.
    937 The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than the Intel machine.
     933The AMD results highlight the worst case scenario for Go since contention is more costly on this machine than the Intel machine.
    938934
    939935\begin{table}[t]
     
    966962The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate.
    967963
     964\begin{figure}
     965        \centering
     966        \subfloat[AMD Future Synchronization Benchmark]{
     967                \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
     968                \label{f:futureAMD}
     969        }
     970        \subfloat[Intel Future Synchronization Benchmark]{
     971                \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
     972                \label{f:futureIntel}
     973        }
     974        \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
     975        \caption{}
     976        \label{f:futurePerf}
     977\end{figure}
     978
    968979This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements.
    969980The benchmark is indirect since the performance of futures in \CFA and \uC differ by a significant margin.
     
    10041015
    10051016Results of this benchmark are shown in Figure~\ref{f:futurePerf}.
    1006 Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation.
     1017Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation..
    10071018In detail, \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation.
    10081019However, the bars for both systems have similar height patterns across the experiments.
     
    10111022Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special OR case), whereas \uC has lower variation on the Intel.
    10121023Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates.
    1013 
    1014 \begin{figure}
    1015         \centering
    1016         \subfloat[AMD Future Synchronization Benchmark]{
    1017                 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
    1018                 \label{f:futureAMD}
    1019         }
    1020         \subfloat[Intel Future Synchronization Benchmark]{
    1021                 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
    1022                 \label{f:futureIntel}
    1023         }
    1024         \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
    1025         \label{f:futurePerf}
    1026 %%%%%%%%%%%%%%%
    1027 \vspace*{5.5in}
    1028 \end{figure}
  • libcfa/src/heap.cfa

    r210c737 r4852232  
    1010// Created On       : Tue Dec 19 21:58:35 2017
    1111// Last Modified By : Peter A. Buhr
    12 // Last Modified On : Fri Jul 28 18:27:53 2023
    13 // Update Count     : 1612
     12// Last Modified On : Fri Dec 30 08:37:37 2022
     13// Update Count     : 1605
    1414//
    1515
     
    369369static __thread size_t PAD1 CALIGN TLSMODEL __attribute__(( unused )); // protect false sharing
    370370static __thread Heap * heapManager CALIGN TLSMODEL;
    371 static  __thread bool heapManagerBootFlag CALIGN TLSMODEL = false;
    372371static __thread size_t PAD2 CALIGN TLSMODEL __attribute__(( unused )); // protect further false sharing
    373372
     
    489488                        allocUnfreed = 0;
    490489                        #endif // __CFA_DEBUG__
    491                         heapManagerBootFlag = true;
    492490                } // with
    493491        } // if
     
    501499
    502500        lock( heapMaster.mgrLock );                                                     // protect heapMaster counters
    503 
    504         assert( ! heapManagerBootFlag );
    505501
    506502        // get storage for heap manager
     
    518514
    519515void heapManagerDtor() libcfa_public {
    520   if ( unlikely( ! heapManagerBootFlag ) ) return;              // thread never used ?
    521 
    522516        lock( heapMaster.mgrLock );
    523517
     
    532526        // Do not set heapManager to NULL because it is used after Cforall is shutdown but before the program shuts down.
    533527
    534         heapManagerBootFlag = false;
    535528        unlock( heapMaster.mgrLock );
    536529} // heapManagerDtor
     
    542535extern int cfa_main_returned;                                                   // from interpose.cfa
    543536extern "C" {
    544         void memory_startup( void ) {                                           // singleton => called once at start of program
     537        void memory_startup( void ) {
    545538                if ( ! heapMasterBootFlag ) heapManagerCtor();  // sanity check
    546539        } // memory_startup
     
    902895#endif // __STATISTICS__
    903896
    904 // Uncomment to get allocation addresses for a 0-sized allocation rather than a null pointer.
    905 //#define __NONNULL_0_ALLOC__
    906 #if ! defined( __NONNULL_0_ALLOC__ )
    907 #define __NULL_0_ALLOC__ unlikely( size == 0 ) ||               /* 0 BYTE ALLOCATION RETURNS NULL POINTER */
    908 #else
    909 #define __NULL_0_ALLOC__
    910 #endif // __NONNULL_0_ALLOC__
    911 
    912897#define PROLOG( counter, ... ) \
    913898        BOOT_HEAP_MANAGER; \
    914         if ( \
    915                 __NULL_0_ALLOC__ \
     899        if ( unlikely( size == 0 ) ||                                           /* 0 BYTE ALLOCATION RETURNS NULL POINTER */ \
    916900                unlikely( size > ULONG_MAX - sizeof(Heap.Storage) ) ) { /* error check */ \
    917901                STAT_0_CNT( counter ); \
  • libcfa/src/iostream.cfa

    r210c737 r4852232  
    1010// Created On       : Wed May 27 17:56:53 2015
    1111// Last Modified By : Peter A. Buhr
    12 // Last Modified On : Sun Jul 30 23:01:00 2023
    13 // Update Count     : 1406
     12// Last Modified On : Tue Jul 18 13:56:01 2023
     13// Update Count     : 1403
    1414//
    1515
     
    230230        ostype & ?|?( ostype & os, const char s[] ) {
    231231                enum { Open = 1, Close, OpenClose };
    232                 static const unsigned char mask[256] @= {               // 256 covers all Latin-1 characters
     232                static const unsigned char mask[256] @= {
    233233                        // opening delimiters, no space after
    234234                        ['('] : Open, ['['] : Open, ['{'] : Open,
Note: See TracChangeset for help on using the changeset viewer.