
Changeset c68f6e6 for doc/theses

Timestamp:

Jul 31, 2023, 4:38:35 PM

Author:

Peter A. Buhr <pabuhr@…>

Parents:

2e94f3e7 (diff), 17c13b9 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:

doc/theses/colby_parsons_MMAth

Files:

1 added
5 edited

doc/theses/colby_parsons_MMAth/Makefile

r2e94f3e7	rc68f6e6
85	85	diagrams/cyclic_swap \
86	86	diagrams/steal \
	87	diagrams/uCpp_select_tree \
87	88	}
88	89

doc/theses/colby_parsons_MMAth/glossary.tex

r2e94f3e7	rc68f6e6
41	41	\newabbreviation{dwcas}{DWCAS}{\Newterm{double-wide (width) compare-and-set (swap)}}
42	42	\newabbreviation{dcas}{DCAS}{\Newterm{double compare-and-set (swap)}}
43		\newabbreviation{dcasw}{DCASW}{\Newterm{weak double compare-and-set (swap)}}
	43	\newabbreviation{qpcas}{QPCAS}{\Newterm{queue pointer compare-and-set (swap)}}
44	44	\newabbreviation{ll}{LL}{\Newterm{load linked}}
45	45	\newabbreviation{sc}{SC}{\Newterm{store conditional}}

doc/theses/colby_parsons_MMAth/text/actors.tex

-                      r2e94f3e7
+                      rc68f6e6
 In more detail, the \CFA work-stealing algorithm begins by iterating over its message queues twice without finding any work before it tries to steal a queue from another worker.
 Stealing a queue is done wait-free (\ie no busy waiting) with a few atomic instructions that only create contention with other stealing workers not the victim.
+Stealing a queue is done atomically using a few atomic instructions that only create contention with other stealing workers, not the victim.
 The complexity in the implementation is that victim gulping does not take the mailbox queue;
 rather it atomically transfers the mailbox nodes to another queue leaving the mailbox empty, as discussed in Section~\ref{s:executor}.
 …
 \subsection{Queue Pointer Swap}\label{s:swap}
 To atomically swap the two @worker_queues@ pointers during work stealing, a novel wait-free swap-algorithm is needed.
+To atomically swap the two @worker_queues@ pointers during work stealing, a novel atomic swap-algorithm is needed.
 The \gls{cas} is a read-modify-write instruction available on most modern architectures.
 It atomically compares the value in a memory location with an expected value, and if the values are equal, it writes a new value into that memory location.
 …
+}
 \end{cfa}
 and can swap two values, where the comparisons are superfluous.
+\gls{dcas} can be used to swap two values; for this use case the comparisons are superfluous.
 \begin{cfa}
 DCAS( x, y, x, y, y, x );
 \end{cfa}
 A restrictive form of \gls{dcas} can be simulated using \gls{ll}/\gls{sc}~\cite{Brown13} or more expensive transactional memory with the same progress property problems as LL/SC.
 (There is waning interest in transactional memory and it seems to be fading away.)
+% (There is waning interest in transactional memory and it seems to be fading away.)
 Similarly, very few architectures have a true memory/memory swap instruction (Motorola M68K, SPARC 32-bit).
 …
 Either a true memory/memory swap instruction or a \gls{dcas} would provide the ability to atomically swap two memory locations, but unfortunately neither of these instructions is supported on the architectures used in this work.
 Hence, a novel atomic swap for this use case is simulated, called \gls{dcasw}.
 The \gls{dcasw} is effectively a \gls{dcas} special cased in two ways:
+Hence, a novel atomic swap specific to the actor use case is simulated, called \gls{qpcas}.
+The \gls{qpcas} is effectively a \gls{dcas} special cased in a few ways:
 \begin{enumerate}
 \item
 It works on two separate memory locations, and hence, is logically equivalent to:
 \begin{cfa}
 bool DCASW( T * dst, T * src ) {
+bool QPCAS( T * dst, T * src ) {
         return DCAS( dst, src, *dst, *src, *src, *dst );
+}
 …
 The values swapped are never null pointers, so a null pointer can be used as an intermediate value during the swap.
 \end{enumerate}
 Figure~\ref{f:dcaswImpl} shows the \CFA pseudocode for the \gls{dcasw}.
+Figure~\ref{f:qpcasImpl} shows the \CFA pseudocode for the \gls{qpcas}.
 In detail, a thief performs the following steps to swap two pointers:
 \begin{enumerate}[start=0]
 …
 verifies the stored copy of the victim queue pointer, @vic_queue@, is valid.
 If @vic_queue@ is null, then the victim queue is part of another swap so the operation fails.
+No state has changed at this point so no fixup is needed.
+Note, @my_queue@ can never be equal to null at this point since thieves only set their own queue pointers to null when stealing.
+At no other point is a queue pointer set to null.
+Since each worker owns a disjoint range of the queue array, it is impossible for @my_queue@ to be null.
+Note, this algorithm is simplified due to each worker owning a disjoint range, allowing only the @vic_queue@ to be checked for null.
+This was not listed as a special case of this algorithm, since this requirement can be avoided by modifying Step 1 of Figure~\ref{f:qpcasImpl} to also check @my_queue@ for null.
+Further discussion of this generalization is omitted since it is not needed for the presented application.
+No state has changed at this point so the thief just returns.
+Note, thieves only set their own queue pointers to null when stealing, and queue pointers are not set to null anywhere else.
+As such, it is impossible for @my_queue@ to be null since each worker owns a disjoint range of the queue array.
+Hence, only @vic_queue@ is checked for null.
 \item
 attempts to atomically set the thief's queue pointer to null.
 …
 At this point, the thief-turned-victim fails, and since it has not changed any state, it just returns false.
 If the @CAS@ succeeds, the thief's queue pointer is now null.
+Nulling the pointer is safe since only thieves look at other workers' queue ranges, and whenever thieves need to dereference a queue pointer, it is checked for null.
+Only thieves look at other workers' queue ranges, and whenever thieves need to dereference a queue pointer, it is checked for null.
+A thief can only see the null queue pointer when looking for queues to steal or attempting a queue swap.
+If looking for queues, the thief skips the null pointer; thus, only the queue swap case needs to be considered for correctness.
 \item
 attempts to atomically set the victim's queue pointer to @my_queue@.
 …
 If the @CAS@ fails, the thief's queue pointer must be restored to its previous value before returning.
 \item
 set the thief's queue pointer to @vic_queue@ completing the swap.
+sets the thief's queue pointer to @vic_queue@ completing the swap.
 \end{enumerate}
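The thief's steps can be illustrated with a minimal C11 sketch. It is not the \CFA runtime code: the @work_queue@ type, the @qpcas@ name, and the array-of-slots layout are hypothetical, step 0 (elided above) is assumed to take local copies of both pointers, and all atomics use the default sequentially consistent ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a worker's queue; only its address matters here. */
typedef struct work_queue { int id; } work_queue;

/* Thief-side queue-pointer swap (sketch).  my_slot lies in the thief's own range
   of the queue-pointer array and vic_slot in the victim's range; ranges are
   assumed disjoint and a slot is never swapped with itself. */
static bool qpcas( _Atomic(work_queue *) *my_slot, _Atomic(work_queue *) *vic_slot ) {
    /* Step 0 (assumed): take local copies of both queue pointers. */
    work_queue *my_queue = atomic_load( my_slot );
    work_queue *vic_queue = atomic_load( vic_slot );

    /* Only the owner nulls its own slots, so the thief's pointer is never null here. */
    assert( my_queue != NULL );

    /* Step 1: a null victim pointer means it is part of another swap; nothing changed yet. */
    if ( vic_queue == NULL ) return false;

    /* Step 2: null the thief's own pointer; fails if another thief already swapped it. */
    work_queue *expected = my_queue;
    if ( !atomic_compare_exchange_strong( my_slot, &expected, NULL ) ) return false;

    /* Step 3: install the thief's queue in the victim's slot. */
    expected = vic_queue;
    if ( !atomic_compare_exchange_strong( vic_slot, &expected, my_queue ) ) {
        /* Restore: by the invariant, no other worker wrote the slot while it was null. */
        atomic_store( my_slot, my_queue );
        return false;
    }

    /* Step 4: complete the swap. */
    atomic_store( my_slot, vic_queue );
    return true;
}
```

For example, with four slots initialized to four distinct queues, a thief owning slot 0 that calls @qpcas( &slots[0], &slots[2] )@ ends with the two queues exchanged, while a call whose victim slot is already null returns @false@ without writing anything.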
 …
+}
 \end{cfa}
 \caption{DCASW Concurrent}
 \label{f:dcaswImpl}
+\caption{QPCAS Concurrent}
+\label{f:qpcasImpl}
 \end{figure}
 \begin{theorem}
 \gls{dcasw} is correct in both the success and failure cases.
+\gls{qpcas} is correct in both the success and failure cases.
 \end{theorem}
 To verify sequential correctness, Figure~\ref{f:seqSwap} shows a simplified \gls{dcasw}.
+To verify sequential correctness, Figure~\ref{f:seqSwap} shows a simplified \gls{qpcas}.
 Step 1 is missing in the sequential example since it only matters in the concurrent context.
 By inspection, the sequential swap copies each pointer being swapped, and then the original values of each pointer are reset using the copy of the other pointer.
 …
+}
 \end{cfa}
 \caption{DCASW Sequential}
+\caption{QPCAS Sequential}
 \label{f:seqSwap}
 \end{figure}
 To verify concurrent correctness, it is necessary to show \gls{dcasw} is wait-free, \ie all thieves fail or succeed in swapping the queues in a finite number of steps.
 This property is straightforward, because there are no locks or looping.
 As well, there is no retry mechanism in the case of a failed swap, since a failed swap either means the work is already stolen or that work is stolen from the thief.
 In both cases, it is apropos for a thief to give up stealing.
 The proof of correctness is shown through the existence of an invariant.
+% All thieves fail or succeed in swapping the queues in a finite number of steps.
+% This is straightforward, because there are no locks or looping.
+% As well, there is no retry mechanism in the case of a failed swap, since a failed swap either means the work is already stolen or that work is stolen from the thief.
+% In both cases, it is apropos for a thief to give up stealing.
+The concurrent proof of correctness is shown through the existence of an invariant.
 The invariant states when a queue pointer is set to @0p@ by a thief, then the next write to the pointer can only be performed by the same thief.
 To show that this invariant holds, it is shown that it is true at each step of the swap.
 …
 Once a thief atomically sets their queue pointer to be @0p@ in step 2, the invariant guarantees that that pointer does not change.
 In the success case of step 3, it is known the value of the victim's queue-pointer, which is not overwritten, must be @vic_queue@ due to the use of @CAS@.
 Given that the pointers all have unique memory locations, this first write of the successful swap is correct since it can only occur when the pointer has not changed.
+Given that the pointers all have unique memory locations (a pointer is never swapped with itself), this first write of the successful swap is correct since it can only occur when the pointer has not changed.
 By the invariant, the write back in the successful case is correct since no other worker can write to the @0p@ pointer.
 In the failed cases of steps 1 and 2, the outcome is correct since no writes have occurred, so the program state is unchanged.
 In the failed case of step 3, the program state is safely restored to the state it had prior to the @0p@ write in step 2, because the invariant makes the write back to the @0p@ pointer safe.
 Note that the assumption of the pointers having unique memory locations prevents the ABA problem in this usage of \gls{dcasw}, but it is not needed for correctness of the general \gls{dcasw} operation.
+Note that the pointers having unique memory locations prevents the ABA problem.
 \begin{comment}
 …
 First it is important to state that a thief does not attempt to steal from themselves.
 As such, the victim here is not also a thief.
 Stepping through the code in \ref{f:dcaswImpl}, for all thieves, steps 0-1 succeed since the victim is not stealing and has no queue pointers set to be @0p@.
+Stepping through the code in \ref{f:qpcasImpl}, for all thieves, steps 0-1 succeed since the victim is not stealing and has no queue pointers set to be @0p@.
 Similarly, for all thieves, step 2 succeeds since no one is stealing from any of the thieves.
 In step 3, the first thief to @CAS@ wins the race and successfully swaps the queue pointer.

doc/theses/colby_parsons_MMAth/text/conclusion.tex

-                      r2e94f3e7
+                      rc68f6e6
 The @waituntil@ statement aids in writing concurrent programs in both the message passing and shared memory paradigms of concurrency.
 Furthermore, no other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.
+From the novel copy-queue data structure in the actor system to the plethora of user-supporting safety features, all these utilities build upon existing tools with value added.
+The contributions of this thesis include the following:
+\begin{enumerate}
+\item The mutex statement, which provides performant and deadlock-free multiple lock acquisition.
+\item Channels with performance comparable to Go's, which have safety and productivity features including deadlock detection and an easy-to-use exception-based channel @close@ routine.
+\item An in-memory actor system that achieved the lowest-latency message send among the systems tested due to the novel copy-queue data structure. The actor system presented has built-in detection of six common actor errors, and it has good performance compared to other systems on all benchmarks.
+\item A @waituntil@ statement, which tackles the hard problem of allowing a thread to wait safely and synchronously on a set of concurrent resources.
+\end{enumerate}
+From the novel copy-queue data structure in the actor system to the plethora of user-supporting safety features, all these utilities build upon existing concurrent tooling with value added.
 Performance results verify that each new feature is comparable or better than similar features in other programming languages.
+\PAB{This part seems a little short.}
+All in all, this suite of concurrent tools expands users' ability to easily write safe and performant multi-threaded programs in \CFA.
 \section{Future Work}

doc/theses/colby_parsons_MMAth/text/waituntil.tex

-                      r2e94f3e7
+                      rc68f6e6
 More detail on channels and their interaction with @waituntil@ appear in Section~\ref{s:wu_chans}.
+The trait is used by having a blocking object return a type that supports the @is_selectable@ trait.
+The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument.
+When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait.
 This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context.
 A selectable type is needed for types that want to support multiple operations such as channels that allow both reading and writing.
+Indirect support through routines is needed for types that want to support multiple operations such as channels that allow both reading and writing.
 \section{\lstinline{waituntil} Implementation}
 …
 This work incurs a high cost for signalling threads and heavily increases contention on internal channel locks.
 Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
 As such, the exclusive-or semantics is lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance.
+As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators, since supporting them requires significant complexity and significantly affects @waituntil@ performance.
 It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
 …
 If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost.
 This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner.
+\PAB{I bet one of the readers is going to ask you to write the pseudo code for this algorithm.}
+The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}.
+\begin{figure}
+\begin{cfa}
+bool pending_set_other( select_node & other, select_node & mine ) {
+    unsigned long int cmp_status = UNSAT;
+    // Try to set other status, if we succeed break and return true
+    while( !CAS( other.clause_status, &cmp_status, SAT ) ) {
+        if ( cmp_status == SAT )
+            return false; // If other status is SAT we lost so return false
+        // Toggle own status flag to allow other thread to potentially win
+        mine.clause_status = UNSAT;
+        // Reset compare flag
+        cmp_status = UNSAT;
+        // Attempt to set own status flag back to PENDING to retry
+        if ( !CAS( mine.clause_status, &cmp_status, PENDING ) )
+            return false; // If we fail then we lost so return false
+        // Reset compare flag
+        cmp_status = UNSAT;
+    }
+    return true;
+}
+\end{cfa}
+\caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol}
+\label{f:WU_DeadlockAvoidance}
+\end{figure}
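The protocol in the figure above can be approximated in standard C with C11 atomics. The sketch assumes an encoding of @UNSAT@, @PENDING@, and @SAT@, and a hypothetical @select_node@ that holds only a pointer to its @waituntil@ thread's race flag; it illustrates the back-off and retry logic and does not reproduce the \CFA channel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed encoding of a waituntil thread's race flag. */
enum { UNSAT = 0, PENDING = 1, SAT = 2 };

/* Hypothetical: a clause node points at its waituntil thread's shared race flag. */
typedef struct select_node {
    atomic_ulong *clause_status;
} select_node;

/* Try to win the other waituntil's race while our own flag is PENDING.
   Returns true if the other flag was set to SAT, false if either race was lost. */
static bool pending_set_other( select_node *other, select_node *mine ) {
    unsigned long cmp_status = UNSAT;
    while ( !atomic_compare_exchange_strong( other->clause_status, &cmp_status, SAT ) ) {
        if ( cmp_status == SAT ) return false;       /* other race already won */
        /* Other flag is PENDING: drop to UNSAT so the other thread can win. */
        atomic_store( mine->clause_status, UNSAT );
        cmp_status = UNSAT;
        /* Reclaim PENDING; failure means another thread set our flag to SAT. */
        if ( !atomic_compare_exchange_strong( mine->clause_status, &cmp_status, PENDING ) )
            return false;
        cmp_status = UNSAT;                          /* retry against the other flag */
    }
    return true;
}
```

Run single-threaded, the sketch only exercises the two non-retrying outcomes: when the other flag is @UNSAT@ the call sets it to @SAT@ and returns @true@, and when it is already @SAT@ the call returns @false@ and leaves the caller's flag untouched.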
 Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support.
 …
 It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished.
+In \uC and \CFA, the \gls{synch_multiplex} mechanisms support both @and@ and @or@ relationships, which makes checking for completion of the statement difficult.
+\PAB{Show an example of why this is difficult.}
+In \uC and \CFA, the \gls{synch_multiplex} mechanisms support both @and@ and @or@ relationships, which, along with guards, make checking for completion of the statement difficult.
+Consider the @waituntil@ in Figure~\ref{f:WU_ComplexPredicate}.
+When the @waituntil@ thread wakes up, checking if the statement is complete is non-trivial.
+The following predicate returns @true@ when the statement in Figure~\ref{f:WU_ComplexPredicate} is satisfied.
+\begin{cfa}
+A && B || C || !GA && B || !GB && A || !GA && !GB && !GC
+\end{cfa}
+This predicate simplifies to:
+\begin{cfa}
+( A || !GA ) && ( B || !GB ) && ( A || B ) || C || !GA && !GB && !GC
+\end{cfa}
+Checking a predicate this large on each iteration is expensive, so \uC and \CFA both take steps to simplify checking statement completion.
+\begin{figure}
+\begin{cfa}
+when( GA ) waituntil( A ) {}
+and when( GB ) waituntil( B ) {}
+or when( GC ) waituntil( C ) {}
+\end{cfa}
+\caption{\lstinline{waituntil} with a non-trivial predicate}
+\label{f:WU_ComplexPredicate}
+\end{figure}
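Because the completion predicate has only six inputs, a factored form can be checked against it exhaustively over all $2^6 = 64$ assignments. The following C sketch (function names are illustrative) compares the predicate with @( A || !GA ) && ( B || !GB ) && ( A || B ) || C || !GA && !GB && !GC@ and with the same form lacking the @( A || B )@ conjunct. The latter accepts one extra assignment: @GA@ and @GB@ false, @GC@ true, and no resource available, where the statement should still wait for @C@.

```c
#include <assert.h>
#include <stdbool.h>

/* A predicate over the resource states A, B, C and the guards GA, GB, GC. */
typedef bool (*predicate)( bool A, bool B, bool C, bool GA, bool GB, bool GC );

/* The completion predicate written out term by term. */
static bool expanded( bool A, bool B, bool C, bool GA, bool GB, bool GC ) {
    return ( A && B ) || C || ( !GA && B ) || ( !GB && A ) || ( !GA && !GB && !GC );
}

/* Factored form; ( A || B ) stops the first conjunction accepting on !GA && !GB alone. */
static bool factored( bool A, bool B, bool C, bool GA, bool GB, bool GC ) {
    return ( ( A || !GA ) && ( B || !GB ) && ( A || B ) ) || C || ( !GA && !GB && !GC );
}

/* The same factoring without the ( A || B ) conjunct. */
static bool factored_no_ab( bool A, bool B, bool C, bool GA, bool GB, bool GC ) {
    return ( ( A || !GA ) && ( B || !GB ) ) || C || ( !GA && !GB && !GC );
}

/* Number of the 64 assignments on which p and q disagree. */
static int disagreements( predicate p, predicate q ) {
    int count = 0;
    for ( int m = 0; m < 64; m += 1 ) {
        bool A = m & 1, B = ( m >> 1 ) & 1, C = ( m >> 2 ) & 1;
        bool GA = ( m >> 3 ) & 1, GB = ( m >> 4 ) & 1, GC = ( m >> 5 ) & 1;
        if ( p( A, B, C, GA, GB, GC ) != q( A, B, C, GA, GB, GC ) ) count += 1;
    }
    return count;
}
```

Algebraically, @( A || !GA ) && ( B || !GB ) && ( A || B )@ expands to exactly @A && B || A && !GB || !GA && B@, the three resource terms of the original predicate.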
 In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
 …
 Once the root of the tree has both subtrees marked as @true@ then the statement is complete.
 As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again.
+To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false.
+\PAB{Show an example.}
+To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation.
+A diagram of the tree for the statement in Figure~\ref{f:WU_ComplexPredicate} is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@.
+\begin{figure}
+\begin{center}
+\input{diagrams/uCpp_select_tree.tikz}
+\end{center}
+\caption{\uC select tree modification}
+\label{f:uC_select_tree}
+\end{figure}
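The tree approach, including the guard modification, can be sketched in C. The node layout and the @eval@ and @remove_leaf@ names are hypothetical, and the sketch recomputes the predicate recursively instead of pruning subtrees already marked @true@; it only illustrates the shape of the tree and how a false guard collapses a leaf's parent internal node onto its sibling, and is not the \uC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout: leaves hold resource availability, internal nodes an operator. */
typedef enum { LEAF, AND, OR } node_kind;
typedef struct node {
    node_kind kind;
    bool avail;                   /* leaves only: resource is available */
    struct node *left, *right;    /* internal nodes only: operands */
} node;

/* Evaluate the predicate represented by the tree. */
static bool eval( const node *n ) {
    switch ( n->kind ) {
      case LEAF: return n->avail;
      case AND:  return eval( n->left ) && eval( n->right );
      default:   return eval( n->left ) || eval( n->right );
    }
}

/* Remove a leaf whose guard is false: its parent internal node is replaced by the
   sibling.  Returns the new subtree root, or NULL if every leaf below was removed. */
static node *remove_leaf( node *n, const node *leaf ) {
    if ( n == leaf ) return NULL;
    if ( n->kind == LEAF ) return n;
    node *l = remove_leaf( n->left, leaf );
    node *r = remove_leaf( n->right, leaf );
    if ( l == NULL ) return r;    /* collapse the internal node onto the remaining child */
    if ( r == NULL ) return l;
    n->left = l;
    n->right = r;
    return n;
}
```

For the statement @( A and B ) or C@ with @GA@ false, removing leaf @A@ replaces the @and@ node with @B@, leaving @B or C@; removing @B@ as well leaves @C@ alone as the root.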
 The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
 …
 Leveraging the compiler, a predicate routine is generated per @waituntil@ that, when passed the statuses of the resources, returns @true@ when the @waituntil@ is done, and @false@ otherwise.
 To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
+\PAB{Show an example.}
+The generated code simplifies the predicate checked on each iteration so that it does not need to check guard values.
+For example, the following would be generated for the @waituntil@ shown in Figure~\ref{f:WU_ComplexPredicate}.
+\begin{cfa}
+// statement completion predicate
+bool check_completion( select_node * nodes ) {
+    return nodes[0].status && nodes[1].status || nodes[2].status;
+}
+// skip statement if all guards false
+if ( GA || GB || GC ) {
+    select_node nodes[3];
+    nodes[0].status = !GA && GB; // A's status
+    nodes[1].status = !GB && GA; // B's status
+    nodes[2].status = false;     // C's status
+    // ... rest of waituntil codegen ...
+}
+\end{cfa}
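Whether guard-derived initial statuses make the guard-free completion check behave like the full guarded predicate can be tested exhaustively for this three-clause example. In this sketch, @select_node@ is reduced to a hypothetical single @status@ flag, a resource whose guard is false is assumed never to become available (it is not registered), and @mismatches@ counts the 64 assignments on which the simulated generated code disagrees with the guarded predicate given earlier. It compares two presets for the @or@ clause @C@: @!GC@ and @false@.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct select_node { bool status; } select_node;   /* hypothetical: status flag only */

/* Completion check generated for ( A and B ) or C. */
static bool check_completion( const select_node *nodes ) {
    return ( nodes[0].status && nodes[1].status ) || nodes[2].status;
}

/* Guarded predicate; a, b, c are availabilities, and a guarded-off resource never fires. */
static bool reference( bool a, bool b, bool c, bool GA, bool GB, bool GC ) {
    bool A = GA && a, B = GB && b, C = GC && c;
    return ( A && B ) || C || ( !GA && B ) || ( !GB && A ) || ( !GA && !GB && !GC );
}

/* Simulate the generated code: preset statuses from the guards, then mark each
   enabled, available resource.  c_preset is the initial status used for C. */
static bool generated( bool a, bool b, bool c, bool GA, bool GB, bool GC, bool c_preset ) {
    if ( !( GA || GB || GC ) ) return true;   /* statement skipped: trivially complete */
    select_node nodes[3];
    nodes[0].status = !GA && GB;
    nodes[1].status = !GB && GA;
    nodes[2].status = c_preset;
    if ( GA && a ) nodes[0].status = true;
    if ( GB && b ) nodes[1].status = true;
    if ( GC && c ) nodes[2].status = true;
    return check_completion( nodes );
}

/* Count disagreements over all 64 inputs; preset_not_gc picks !GC, else false, for C. */
static int mismatches( bool preset_not_gc ) {
    int bad = 0;
    for ( int m = 0; m < 64; m += 1 ) {
        bool a = m & 1, b = ( m >> 1 ) & 1, c = ( m >> 2 ) & 1;
        bool GA = ( m >> 3 ) & 1, GB = ( m >> 4 ) & 1, GC = ( m >> 5 ) & 1;
        bool preset = preset_not_gc ? !GC : false;
        if ( generated( a, b, c, GA, GB, GC, preset ) != reference( a, b, c, GA, GB, GC ) ) bad += 1;
    }
    return bad;
}
```

Since @C@ sits under an @or@, presetting its status to @true@ when @GC@ is false makes the check succeed immediately even though @A and B@ must still be awaited; with a @false@ preset there are 0 mismatches, versus 14 for the @!GC@ preset.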
 \uC's @_Select@ supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses.
 In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied.
+% C_TODO put this in uC++ code style not cfa-style
 \begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}]
 Future_ISM<int> A, B, C, D;
 …
 however, that opens the potential for livelock.
 Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient if the resource ordering is not used universally.
-Additionally, using resource ordering could conflict with other semantics of the @waituntil@ statement.
-For example, consider if the locks in the example must be acquired in the order @D@, @B@, @C@, @A@ because of other @waituntil@ statements.
-\PAB{I don't understand: If all the locks are available, it becomes complex to both respect the ordering of the \lstinline{waituntil} when choosing which code block to run and also respect the lock ordering of \lstinline{D}, \lstinline{B}, \lstinline{C}, \lstinline{A} at the same time.}
 One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach.
 This approach does not work due to \gls{toctou} issues;
 …
 \begin{cfa}
 bool when_conditions[N];
 for ( node in s )                                                                       $\C[3.75in]{// evaluate guards}$
+for ( node in nodes )                                                                   $\C[3.75in]{// evaluate guards}$
         if ( node has guard )
                 when_conditions[node] = node_guard;
 …
                 when_conditions[node] = true;
+select_nodes s[N];                                                                      $\C{// declare N select nodes}$
+if ( any when_conditions[node] == true ) {
+select_node nodes[N];                                                                   $\C{// declare N select nodes}$
 try {
+        for ( node in s )                                                               $\C{// register nodes}$
+                if ( when_conditions[node] )
+                        register_select( resource, node );
+        // ... set statuses for nodes with when_conditions[node] == false ...
         while ( statement predicate not satisfied ) {   $\C{// check predicate}$
                 // block
                 for ( resource in waituntil statement ) {       $\C{// run true code blocks}$
                         if ( statement predicate is satisfied ) break;
                         if ( resource is avail )
                                 try {
                                         if( on_selected( resource ) )   $\C{// conditionally run block}$
                                                 run code block
                                 } finally                                                       $\C{// for exception safety}$
                                         unregister_select( resource, node ); $\C{// immediate unregister}$
+                }
+        }
+    // ... set statuses for nodes with when_conditions[node] == false ...
+    for ( node in nodes )                                                               $\C{// register nodes}$
+        if ( when_conditions[node] )
+            register_select( resource, node );
+    while ( !check_completion( nodes ) ) {      $\C{// check predicate}$
+        // block
+        for ( resource in waituntil statement ) {       $\C{// run true code blocks}$
+            if ( check_completion( nodes ) ) break;
+            if ( resource is avail )
+                try {
+                    if( on_selected( resource ) )       $\C{// conditionally run block}$
+                        run code block
+                } finally                                                       $\C{// for exception safety}$
+                    unregister_select( resource, node ); $\C{// immediate unregister}$
+        }
+    }
 } finally {                                                                                     $\C{// for exception safety}$
         for ( registered nodes in s )                                   $\C{// deregister nodes}$
+        for ( registered nodes in nodes )                                       $\C{// deregister nodes}$
                 if ( when_conditions[node] && unregister_select( resource, node )
                                 && on_selected( resource ) )
                         run code block                                                  $\C{// run code block upon unregister}\CRT$
+}
+}
 \end{cfa}
