Changeset c68f6e6 for doc/theses


Ignore:
Timestamp:
Jul 31, 2023, 4:38:35 PM (12 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
07b59ec
Parents:
2e94f3e7 (diff), 17c13b9 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc/theses/colby_parsons_MMAth
Files:
1 added
5 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/colby_parsons_MMAth/Makefile

    r2e94f3e7 rc68f6e6  
    8585        diagrams/cyclic_swap \
    8686        diagrams/steal \
     87        diagrams/uCpp_select_tree \
    8788}
    8889
  • doc/theses/colby_parsons_MMAth/glossary.tex

    r2e94f3e7 rc68f6e6  
    4141\newabbreviation{dwcas}{DWCAS}{\Newterm{double-wide (width) compare-and-set (swap)}}
    4242\newabbreviation{dcas}{DCAS}{\Newterm{double compare-and-set (swap)}}
    43 \newabbreviation{dcasw}{DCASW}{\Newterm{weak double compare-and-set (swap)}}
     43\newabbreviation{qpcas}{QPCAS}{\Newterm{queue pointer compare-and-set (swap)}}
    4444\newabbreviation{ll}{LL}{\Newterm{load linked}}
    4545\newabbreviation{sc}{SC}{\Newterm{store conditional}}
  • doc/theses/colby_parsons_MMAth/text/actors.tex

    r2e94f3e7 rc68f6e6  
    578578
    579579In more detail, the \CFA work-stealing algorithm begins by iterating over its message queues twice without finding any work before it tries to steal a queue from another worker.
    580 Stealing a queue is done wait-free (\ie no busy waiting) with a few atomic instructions that only create contention with other stealing workers not the victim.
     580Stealing a queue is done atomically with a few atomic instructions that only create contention with other stealing workers not the victim.
    581581The complexity in the implementation is that victim gulping does not take the mailbox queue;
    582582rather it atomically transfers the mailbox nodes to another queue leaving the mailbox empty, as discussed in Section~\ref{s:executor}.
     
    700700\subsection{Queue Pointer Swap}\label{s:swap}
    701701
    702 To atomically swap the two @worker_queues@ pointers during work stealing, a novel wait-free swap-algorithm is needed.
     702To atomically swap the two @worker_queues@ pointers during work stealing, a novel atomic swap-algorithm is needed.
    703703The \gls{cas} is a read-modify-write instruction available on most modern architectures.
    704704It atomically compares two memory locations, and if the values are equal, it writes a new value into the first memory location.
     
    737737}
    738738\end{cfa}
    739 and can swap two values, where the comparisons are superfluous.
     739\gls{dcas} can be used to swap two values; for this use case the comparisons are superfluous.
    740740\begin{cfa}
    741741DCAS( x, y, x, y, y, x );
    742742\end{cfa}
    743743A restrictive form of \gls{dcas} can be simulated using \gls{ll}/\gls{sc}~\cite{Brown13} or more expensive transactional memory with the same progress property problems as LL/SC.
    744 (There is waning interest in transactional memory and it seems to be fading away.)
     744% (There is waning interest in transactional memory and it seems to be fading away.)
    745745
    746746Similarly, very few architectures have a true memory/memory swap instruction (Motorola M68K, SPARC 32-bit).
     
    749749
    750750Either a true memory/memory swap instruction or a \gls{dcas} would provide the ability to atomically swap two memory locations, but unfortunately neither of these instructions are supported on the architectures used in this work.
    751 Hence, a novel atomic swap for this use case is simulated, called \gls{dcasw}.
    752 The \gls{dcasw} is effectively a \gls{dcas} special cased in two ways:
     751Hence, a novel atomic swap specific to the actor use case is simulated, called \gls{qpcas}.
     752The \gls{qpcas} is effectively a \gls{dcas} special cased in a few ways:
    753753\begin{enumerate}
    754754\item
    755755It works on two separate memory locations, and hence, is logically the same as.
    756756\begin{cfa}
    757 bool DCASW( T * dst, T * src ) {
     757bool QPCAS( T * dst, T * src ) {
    758758        return DCAS( dest, src, *dest, *src, *src, *dest );
    759759}
     
    762762The values swapped are never null pointers, so a null pointer can be used as an intermediate value during the swap.
    763763\end{enumerate}
    764 Figure~\ref{f:dcaswImpl} shows the \CFA pseudocode for the \gls{dcasw}.
     764Figure~\ref{f:qpcasImpl} shows the \CFA pseudocode for the \gls{qpcas}.
    765765In detail, a thief performs the following steps to swap two pointers:
    766766\begin{enumerate}[start=0]
     
    770770verifies the stored copy of the victim queue pointer, @vic_queue@, is valid.
    771771If @vic_queue@ is null, then the victim queue is part of another swap so the operation fails.
    772 No state has changed at this point so no fixup is needed.
    773 Note, @my_queue@ can never be equal to null at this point since thieves only set their own queues pointers to null when stealing.
    774 At no other point is a queue pointer set to null.
    775 Since each worker owns a disjoint range of the queue array, it is impossible for @my_queue@ to be null.
    776 Note, this algorithm is simplified due to each worker owning a disjoint range, allowing only the @vic_queue@ to be checked for null.
    777 This was not listed as a special case of this algorithm, since this requirement can be avoided by modifying Step 1 of Figure~\ref{f:dcaswImpl} to also check @my_queue@ for null.
    778 Further discussion of this generalization is omitted since it is not needed for the presented application.
     772No state has changed at this point so the thief just returns.
     773Note, thieves only set their own queues pointers to null when stealing, and queue pointers are not set to null anywhere else.
     774As such, it is impossible for @my_queue@ to be null since each worker owns a disjoint range of the queue array.
     775Hence, only @vic_queue@ is checked for null.
    779776\item
    780777attempts to atomically set the thief's queue pointer to null.
     
    782779At this point, the thief-turned-victim fails, and since it has not changed any state, it just returns false.
    783780If the @CAS@ succeeds, the thief's queue pointer is now null.
    784 Nulling the pointer is safe since only thieves look at other worker's queue ranges, and whenever thieves need to dereference a queue pointer, it is checked for null.
     781Only thieves look at other worker's queue ranges, and whenever thieves need to dereference a queue pointer, it is checked for null.
     782A thief can only see the null queue pointer when looking for queues to steal or attempting a queue swap.
     783If looking for queues, the thief will skip the null pointer, thus only the queue swap case needs to be considered for correctness.
     784
    785785\item
    786786attempts to atomically set the victim's queue pointer to @my_queue@.
     
    788788If the @CAS@ fails, the thief's queue pointer must be restored to its previous value before returning.
    789789\item
    790 set the thief's queue pointer to @vic_queue@ completing the swap.
     790sets the thief's queue pointer to @vic_queue@ completing the swap.
    791791\end{enumerate}
    792792
     
    820820}
    821821\end{cfa}
    822 \caption{DCASW Concurrent}
    823 \label{f:dcaswImpl}
     822\caption{QPCAS Concurrent}
     823\label{f:qpcasImpl}
    824824\end{figure}
    825825
    826826\begin{theorem}
    827 \gls{dcasw} is correct in both the success and failure cases.
     827\gls{qpcas} is correct in both the success and failure cases.
    828828\end{theorem}
    829 To verify sequential correctness, Figure~\ref{f:seqSwap} shows a simplified \gls{dcasw}.
     829To verify sequential correctness, Figure~\ref{f:seqSwap} shows a simplified \gls{qpcas}.
    830830Step 1 is missing in the sequential example since it only matters in the concurrent context.
    831831By inspection, the sequential swap copies each pointer being swapped, and then the original values of each pointer are reset using the copy of the other pointer.
     
    845845}
    846846\end{cfa}
    847 \caption{DCASW Sequential}
     847\caption{QPCAS Sequential}
    848848\label{f:seqSwap}
    849849\end{figure}
    850850
    851 To verify concurrent correctness, it is necessary to show \gls{dcasw} is wait-free, \ie all thieves fail or succeed in swapping the queues in a finite number of steps.
    852 This property is straightforward, because there are no locks or looping.
    853 As well, there is no retry mechanism in the case of a failed swap, since a failed swap either means the work is already stolen or that work is stolen from the thief.
    854 In both cases, it is apropos for a thief to give up stealing.
    855 
    856 The proof of correctness is shown through the existence of an invariant.
     851% All thieves fail or succeed in swapping the queues in a finite number of steps.
     852% This is straightforward, because there are no locks or looping.
     853% As well, there is no retry mechanism in the case of a failed swap, since a failed swap either means the work is already stolen or that work is stolen from the thief.
     854% In both cases, it is apropos for a thief to give up stealing.
     855
     856The concurrent proof of correctness is shown through the existence of an invariant.
    857857The invariant states when a queue pointer is set to @0p@ by a thief, then the next write to the pointer can only be performed by the same thief.
    858858To show that this invariant holds, it is shown that it is true at each step of the swap.
     
    877877Once a thief atomically sets their queue pointer to be @0p@ in step 2, the invariant guarantees that that pointer does not change.
    878878In the success case of step 3, it is known the value of the victim's queue-pointer, which is not overwritten, must be @vic_queue@ due to the use of @CAS@.
    879 Given that the pointers all have unique memory locations, this first write of the successful swap is correct since it can only occur when the pointer has not changed.
     879Given that the pointers all have unique memory locations (a pointer is never swapped with itself), this first write of the successful swap is correct since it can only occur when the pointer has not changed.
    880880By the invariant, the write back in the successful case is correct since no other worker can write to the @0p@ pointer.
    881881In the failed case of step 3, the outcome is correct in steps 1 and 2 since no writes have occurred so the program state is unchanged.
    882882Therefore, the program state is safely restored to the state it had prior to the @0p@ write in step 2, because the invariant makes the write back to the @0p@ pointer safe.
    883 Note that the assumption of the pointers having unique memory locations prevents the ABA problem in this usage of \gls{dcasw}, but it is not needed for correctness of the general \gls{dcasw} operation.
     883Note that the pointers having unique memory locations prevents the ABA problem.
    884884
    885885\begin{comment}
     
    905905First it is important to state that a thief does not attempt to steal from themselves.
    906906As such, the victim here is not also a thief.
    907 Stepping through the code in \ref{f:dcaswImpl}, for all thieves, steps 0-1 succeed since the victim is not stealing and has no queue pointers set to be @0p@.
     907Stepping through the code in \ref{f:qpcasImpl}, for all thieves, steps 0-1 succeed since the victim is not stealing and has no queue pointers set to be @0p@.
    908908Similarly, for all thieves, step 2 succeed since no one is stealing from any of the thieves.
    909909In step 3, the first thief to @CAS@ wins the race and successfully swaps the queue pointer.
  • doc/theses/colby_parsons_MMAth/text/conclusion.tex

    r2e94f3e7 rc68f6e6  
    1010The @waituntil@ statement aids in writing concurrent programs in both the message passing and shared memory paradigms of concurrency.
    1111Furthermore, no other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.
    12 From the novel copy-queue data structure in the actor system and the plethora of user-supporting safety features, all these utilities build upon existing tools with value added.
     12
     13On overview of the contributions in this thesis include the following:
     14\begin{enumerate}
     15\item The mutex statement, which provides performant and deadlock-free multiple lock acquisition.
     16\item Channels with comparable performance to Go, that have safety and productivity features including deadlock detection and an easy-to-use exception-based channel @close@ routine.
     17\item An in-memory actor system that achieved the lowest latency message send of systems tested due to the novel copy-queue data structure. The actor system presented has built-in detection of six common actor errors, and it has good performance compared to other systems on all benchmarks.
     18\item A @waituntil@ statement which tackles the hard problem of allowing a thread to safely synchronously wait for some set of concurrent resources.
     19\end{enumerate}
     20
     21From the novel copy-queue data structure in the actor system and the plethora of user-supporting safety features, all these utilities build upon existing concurrent tooling with value added.
    1322Performance results verify that each new feature is comparable or better than similar features in other programming languages.
    14 
    15 \PAB{This part seems a little short.}
     23All in all, this suite of concurrent tools expands users' ability to easily write safe and performant multi-threaded programs in \CFA.
    1624
    1725\section{Future Work}
  • doc/theses/colby_parsons_MMAth/text/waituntil.tex

    r2e94f3e7 rc68f6e6  
    382382More detail on channels and their interaction with @waituntil@ appear in Section~\ref{s:wu_chans}.
    383383
    384 The trait is used by having a blocking object return a type that supports the @is_selectable@ trait.
     384The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument.
     385When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait.
    385386This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context.
    386 A selectable type is needed for types that want to support multiple operations such as channels that allow both reading and writing.
     387Indirect support through routines is needed for types that want to support multiple operations such as channels that allow both reading and writing.
    387388
    388389\section{\lstinline{waituntil} Implementation}
     
    520521This work incurs a high cost for signalling threads and heavily increase contention on internal channel locks.
    521522Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
    522 As such, the exclusive-or semantics is lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance.
     523As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance.
    523524
    524525It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
     
    591592If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost.
    592593This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner.
    593 \PAB{I bet one of the readers is going to ask you to write the pseudo code for this algorithm.}
     594The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}.
     595
     596\begin{figure}
     597\begin{cfa}
     598bool pending_set_other( select_node & other, select_node & mine ) {
     599    unsigned long int cmp_status = UNSAT;
     600
     601    // Try to set other status, if we succeed break and return true
     602    while( !CAS( other.clause_status, &cmp_status, SAT ) ) {
     603        if ( cmp_status == SAT )
     604            return false; // If other status is SAT we lost so return false
     605
     606        // Toggle own status flag to allow other thread to potentially win
     607        mine.status = UNSAT;
     608
     609        // Reset compare flag
     610        cmp_status = UNSAT;
     611
     612        // Attempt to set own status flag back to PENDING to retry
     613        if ( !CAS( mine.clause_status, &cmp_status, PENDING ) )
     614            return false; // If we fail then we lost so return false
     615       
     616        // Reset compare flag
     617        cmp_status = UNSAT;
     618    }
     619    return true;
     620}
     621\end{cfa}
     622\caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol}
     623\label{f:WU_DeadlockAvoidance}
     624\end{figure}
    594625
    595626Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support.
     
    600631
    601632It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished.
    602 In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which make the problem of checking for completion of the statement difficult.
    603 \PAB{Show an example of why this is difficult.}
     633In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which along with guards, make the problem of checking for completion of the statement difficult.
     634Consider the @waituntil@ in Figure~\ref{f:WU_ComplexPredicate}.
     635When the @waituntil@ thread wakes up, checking if the statement is complete is non-trivial.
     636The predicate that will return if the statement in Figure~\ref{f:WU_ComplexPredicate} is satisfied is the following.
     637\begin{cfa}
     638A && B || C || !GA && B || !GB && A || !GA && !GB && !GC
     639\end{cfa}
     640Which simplifies to:
     641\begin{cfa}
     642( A || !GA ) && ( B || !GB ) || C || !GA && !GB && !GC
     643\end{cfa}
     644Checking a predicate this large with each iteration is expensive so \uC and \CFA both take steps to simplify checking statement completion.
     645
     646\begin{figure}
     647\begin{cfa}
     648when( GA ) waituntil( A ) {}
     649and when( GB ) waituntil( B ) {}
     650or when( GC ) waituntil( C ) {}
     651\end{cfa}
     652\caption{\lstinline{waituntil} with a non-trivial predicate}
     653\label{f:WU_ComplexPredicate}
     654\end{figure}
    604655
    605656In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
     
    608659Once the root of the tree has both subtrees marked as @true@ then the statement is complete.
    609660As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again.
    610 To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false.
    611 \PAB{Show an example.}
     661To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation.
     662An diagram of the tree for the statement in Figure~\ref{f:WU_ComplexPredicate} is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@.
     663
     664\begin{figure}
     665\begin{center}
     666\input{diagrams/uCpp_select_tree.tikz}
     667\end{center}
     668\caption{\uC select tree modification}
     669\label{f:uC_select_tree}
     670\end{figure}
    612671
    613672The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
     
    616675Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passes the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise.
    617676To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
    618 \PAB{Show an example.}
     677The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values.
     678For example, the following would be generated for the @waituntil@ shown in Figure~\ref{f:WU_ComplexPredicate}.
     679\begin{cfa}
     680// statement completion predicate
     681bool check_completion( select_node * nodes ) {
     682    return nodes[0].status && nodes[1].status || nodes[2].status;
     683}
     684
     685// skip statement if all guards false
     686if ( GA || GB || GC ) {
     687    select_node nodes[3];
     688    nodes[0].status = !GA && GB; // A's status
     689    nodes[1].status = !GB && GA; // B's status
     690    nodes[2].status = !GC;       // C's status
     691
     692    // ... rest of waituntil codegen ...
     693
     694}
     695\end{cfa}
    619696
    620697\uC's @_Select@, supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses.
    621698In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied.
    622 % C_TODO put this is uC++ code style not cfa-style
     699
    623700\begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}]
    624701Future_ISM<int> A, B, C, D;
     
    640717however, that opens the potential for livelock.
    641718Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally.
    642 Additionally, using resource ordering could conflict with other semantics of the @waituntil@ statement.
    643 For example, consider if the locks in the example must be acquired in the order @D@, @B@, @C@, @A@ because of other @waituntil@ statements.
    644 \PAB{I don't understand: If all the locks are available, it becomes complex to both respect the ordering of the \lstinline{waituntil} when choosing which code block to run and also respect the lock ordering of \lstinline{D}, \lstinline{B}, \lstinline{C}, \lstinline{A} at the same time.}
    645719One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach.
    646720This approach does not work due to \gls{toctou} issues;
     
    661735\begin{cfa}
    662736bool when_conditions[N];
    663 for ( node in s )                                                                       $\C[3.75in]{// evaluate guards}$
     737for ( node in nodes )                                                                   $\C[3.75in]{// evaluate guards}$
    664738        if ( node has guard )
    665739                when_conditions[node] = node_guard;
     
    667741                when_conditions[node] = true;
    668742
    669 select_nodes s[N];                                                                      $\C{// declare N select nodes}$
     743if ( any when_conditions[node] == true ) {
     744
     745select_nodes nodes[N];                                                                  $\C{// declare N select nodes}$
    670746try {
    671         for ( node in s )                                                               $\C{// register nodes}$
    672                 if ( when_conditions[node] )
    673                         register_select( resource, node );
    674 
    675         // ... set statuses for nodes with when_conditions[node] == false ...
    676 
    677         while ( statement predicate not satisfied ) {   $\C{// check predicate}$
    678                 // block
    679                 for ( resource in waituntil statement ) {       $\C{// run true code blocks}$
    680                         if ( statement predicate is satisfied ) break;
    681                         if ( resource is avail )
    682                                 try {
    683                                         if( on_selected( resource ) )   $\C{// conditionally run block}$
    684                                                 run code block
    685                                 } finally                                                       $\C{// for exception safety}$
    686                                         unregister_select( resource, node ); $\C{// immediate unregister}$
    687                 }
    688         }
     747    // ... set statuses for nodes with when_conditions[node] == false ...
     748
     749    for ( node in nodes )                                                               $\C{// register nodes}$
     750        if ( when_conditions[node] )
     751            register_select( resource, node );
     752
     753    while ( !check_completion( nodes ) ) {      $\C{// check predicate}$
     754        // block
     755        for ( resource in waituntil statement ) {       $\C{// run true code blocks}$
     756            if ( check_completion( nodes ) ) break;
     757            if ( resource is avail )
     758                try {
     759                    if( on_selected( resource ) )       $\C{// conditionally run block}$
     760                        run code block
     761                } finally                                                       $\C{// for exception safety}$
     762                    unregister_select( resource, node ); $\C{// immediate unregister}$
     763        }
     764    }
    689765} finally {                                                                                     $\C{// for exception safety}$
    690         for ( registered nodes in s )                                   $\C{// deregister nodes}$
     766        for ( registered nodes in nodes )                                       $\C{// deregister nodes}$
    691767                if ( when_conditions[node] && unregister_select( resource, node )
    692768                                && on_selected( resource ) )
    693769                        run code block                                                  $\C{// run code block upon unregister}\CRT$
     770}
     771
    694772}
    695773\end{cfa}
Note: See TracChangeset for help on using the changeset viewer.