Changeset f496046 for doc/theses/colby_parsons_MMAth
- Timestamp:
- Jul 31, 2023, 12:04:38 PM (17 months ago)
- Branches:
- master
- Children:
- d3c3261
- Parents:
- 14c0f7b
- Location:
- doc/theses/colby_parsons_MMAth
- Files:
-
- 1 added
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/colby_parsons_MMAth/Makefile
r14c0f7b rf496046 85 85 diagrams/cyclic_swap \ 86 86 diagrams/steal \ 87 diagrams/uCpp_select_tree \ 87 88 } 88 89 -
doc/theses/colby_parsons_MMAth/glossary.tex
r14c0f7b rf496046 41 41 \newabbreviation{dwcas}{DWCAS}{\Newterm{double-wide (width) compare-and-set (swap)}} 42 42 \newabbreviation{dcas}{DCAS}{\Newterm{double compare-and-set (swap)}} 43 \newabbreviation{ dcasw}{DCASW}{\Newterm{weak doublecompare-and-set (swap)}}43 \newabbreviation{qpcas}{QPCAS}{\Newterm{queue pointer compare-and-set (swap)}} 44 44 \newabbreviation{ll}{LL}{\Newterm{load linked}} 45 45 \newabbreviation{sc}{SC}{\Newterm{store conditional}} -
doc/theses/colby_parsons_MMAth/text/actors.tex
r14c0f7b rf496046 578 578 579 579 In more detail, the \CFA work-stealing algorithm begins by iterating over its message queues twice without finding any work before it tries to steal a queue from another worker. 580 Stealing a queue is done wait-free (\ie no busy waiting)with a few atomic instructions that only create contention with other stealing workers not the victim.580 Stealing a queue is done atomically with a few atomic instructions that only create contention with other stealing workers not the victim. 581 581 The complexity in the implementation is that victim gulping does not take the mailbox queue; 582 582 rather it atomically transfers the mailbox nodes to another queue leaving the mailbox empty, as discussed in Section~\ref{s:executor}. … … 700 700 \subsection{Queue Pointer Swap}\label{s:swap} 701 701 702 To atomically swap the two @worker_queues@ pointers during work stealing, a novel wait-freeswap-algorithm is needed.702 To atomically swap the two @worker_queues@ pointers during work stealing, a novel atomic swap-algorithm is needed. 703 703 The \gls{cas} is a read-modify-write instruction available on most modern architectures. 704 704 It atomically compares two memory locations, and if the values are equal, it writes a new value into the first memory location. … … 737 737 } 738 738 \end{cfa} 739 and can swap two values, where the comparisons are superfluous.739 \gls{dcas} can be used to swap two values; for this use case the comparisons are superfluous. 740 740 \begin{cfa} 741 741 DCAS( x, y, x, y, y, x ); 742 742 \end{cfa} 743 743 A restrictive form of \gls{dcas} can be simulated using \gls{ll}/\gls{sc}~\cite{Brown13} or more expensive transactional memory with the same progress property problems as LL/SC. 744 (There is waning interest in transactional memory and it seems to be fading away.)744 % (There is waning interest in transactional memory and it seems to be fading away.) 745 745 746 746 Similarly, very few architectures have a true memory/memory swap instruction (Motorola M68K, SPARC 32-bit). … … 749 749 750 750 Either a true memory/memory swap instruction or a \gls{dcas} would provide the ability to atomically swap two memory locations, but unfortunately neither of these instructions are supported on the architectures used in this work. 751 Hence, a novel atomic swap for this use case is simulated, called \gls{dcasw}.752 The \gls{ dcasw} is effectively a \gls{dcas} special cased in twoways:751 Hence, a novel atomic swap specific to the actor use case is simulated, called \gls{qpcas}. 752 The \gls{qpcas} is effectively a \gls{dcas} special cased in a few ways: 753 753 \begin{enumerate} 754 754 \item 755 755 It works on two separate memory locations, and hence, is logically the same as. 756 756 \begin{cfa} 757 bool DCASW( T * dst, T * src ) {757 bool QPCAS( T * dst, T * src ) { 758 758 return DCAS( dest, src, *dest, *src, *src, *dest ); 759 759 } … … 762 762 The values swapped are never null pointers, so a null pointer can be used as an intermediate value during the swap. 763 763 \end{enumerate} 764 Figure~\ref{f: dcaswImpl} shows the \CFA pseudocode for the \gls{dcasw}.764 Figure~\ref{f:qpcasImpl} shows the \CFA pseudocode for the \gls{qpcas}. 765 765 In detail, a thief performs the following steps to swap two pointers: 766 766 \begin{enumerate}[start=0] … … 770 770 verifies the stored copy of the victim queue pointer, @vic_queue@, is valid. 771 771 If @vic_queue@ is null, then the victim queue is part of another swap so the operation fails. 772 No state has changed at this point so no fixup is needed. 773 Note, @my_queue@ can never be equal to null at this point since thieves only set their own queues pointers to null when stealing. 774 At no other point is a queue pointer set to null. 775 Since each worker owns a disjoint range of the queue array, it is impossible for @my_queue@ to be null. 776 Note, this algorithm is simplified due to each worker owning a disjoint range, allowing only the @vic_queue@ to be checked for null. 777 This was not listed as a special case of this algorithm, since this requirement can be avoided by modifying Step 1 of Figure~\ref{f:dcaswImpl} to also check @my_queue@ for null. 778 Further discussion of this generalization is omitted since it is not needed for the presented application. 772 No state has changed at this point so the thief just returns. 773 Note, thieves only set their own queues pointers to null when stealing, and queue pointers are not set to null anywhere else. 774 As such, it is impossible for @my_queue@ to be null since each worker owns a disjoint range of the queue array. 775 Hence, only @vic_queue@ is checked for null. 779 776 \item 780 777 attempts to atomically set the thief's queue pointer to null. … … 782 779 At this point, the thief-turned-victim fails, and since it has not changed any state, it just returns false. 783 780 If the @CAS@ succeeds, the thief's queue pointer is now null. 784 Nulling the pointer is safe since only thieves look at other worker's queue ranges, and whenever thieves need to dereference a queue pointer, it is checked for null. 781 Only thieves look at other worker's queue ranges, and whenever thieves need to dereference a queue pointer, it is checked for null. 782 A thief can only see the null queue pointer when looking for queues to steal or attempting a queue swap. 783 If looking for queues, the thief will skip the null pointer, thus only the queue swap case needs to be considered for correctness. 784 785 785 \item 786 786 attempts to atomically set the victim's queue pointer to @my_queue@. … … 788 788 If the @CAS@ fails, the thief's queue pointer must be restored to its previous value before returning. 789 789 \item 790 set the thief's queue pointer to @vic_queue@ completing the swap.790 sets the thief's queue pointer to @vic_queue@ completing the swap. 791 791 \end{enumerate} 792 792 … … 820 820 } 821 821 \end{cfa} 822 \caption{ DCASWConcurrent}823 \label{f: dcaswImpl}822 \caption{QPCAS Concurrent} 823 \label{f:qpcasImpl} 824 824 \end{figure} 825 825 826 826 \begin{theorem} 827 \gls{ dcasw} is correct in both the success and failure cases.827 \gls{qpcas} is correct in both the success and failure cases. 828 828 \end{theorem} 829 To verify sequential correctness, Figure~\ref{f:seqSwap} shows a simplified \gls{ dcasw}.829 To verify sequential correctness, Figure~\ref{f:seqSwap} shows a simplified \gls{qpcas}. 830 830 Step 1 is missing in the sequential example since it only matters in the concurrent context. 831 831 By inspection, the sequential swap copies each pointer being swapped, and then the original values of each pointer are reset using the copy of the other pointer. … … 845 845 } 846 846 \end{cfa} 847 \caption{ DCASWSequential}847 \caption{QPCAS Sequential} 848 848 \label{f:seqSwap} 849 849 \end{figure} 850 850 851 To verify concurrent correctness, it is necessary to show \gls{dcasw} is wait-free, \ie all thieves fail or succeed in swapping the queues in a finite number of steps.852 This propertyis straightforward, because there are no locks or looping.853 As well, there is no retry mechanism in the case of a failed swap, since a failed swap either means the work is already stolen or that work is stolen from the thief.854 In both cases, it is apropos for a thief to give up stealing.855 856 The proof of correctness is shown through the existence of an invariant.851 % All thieves fail or succeed in swapping the queues in a finite number of steps. 852 % This is straightforward, because there are no locks or looping. 853 % As well, there is no retry mechanism in the case of a failed swap, since a failed swap either means the work is already stolen or that work is stolen from the thief. 854 % In both cases, it is apropos for a thief to give up stealing. 855 856 The concurrent proof of correctness is shown through the existence of an invariant. 857 857 The invariant states when a queue pointer is set to @0p@ by a thief, then the next write to the pointer can only be performed by the same thief. 858 858 To show that this invariant holds, it is shown that it is true at each step of the swap. … … 877 877 Once a thief atomically sets their queue pointer to be @0p@ in step 2, the invariant guarantees that that pointer does not change. 878 878 In the success case of step 3, it is known the value of the victim's queue-pointer, which is not overwritten, must be @vic_queue@ due to the use of @CAS@. 879 Given that the pointers all have unique memory locations , this first write of the successful swap is correct since it can only occur when the pointer has not changed.879 Given that the pointers all have unique memory locations (a pointer is never swapped with itself), this first write of the successful swap is correct since it can only occur when the pointer has not changed. 880 880 By the invariant, the write back in the successful case is correct since no other worker can write to the @0p@ pointer. 881 881 In the failed case of step 3, the outcome is correct in steps 1 and 2 since no writes have occurred so the program state is unchanged. 882 882 Therefore, the program state is safely restored to the state it had prior to the @0p@ write in step 2, because the invariant makes the write back to the @0p@ pointer safe. 883 Note that the assumption of the pointers having unique memory locations prevents the ABA problem in this usage of \gls{dcasw}, but it is not needed for correctness of the general \gls{dcasw} operation.883 Note that the pointers having unique memory locations prevents the ABA problem. 884 884 885 885 \begin{comment} … … 905 905 First it is important to state that a thief does not attempt to steal from themselves. 906 906 As such, the victim here is not also a thief. 907 Stepping through the code in \ref{f: dcaswImpl}, for all thieves, steps 0-1 succeed since the victim is not stealing and has no queue pointers set to be @0p@.907 Stepping through the code in \ref{f:qpcasImpl}, for all thieves, steps 0-1 succeed since the victim is not stealing and has no queue pointers set to be @0p@. 908 908 Similarly, for all thieves, step 2 succeed since no one is stealing from any of the thieves. 909 909 In step 3, the first thief to @CAS@ wins the race and successfully swaps the queue pointer. -
doc/theses/colby_parsons_MMAth/text/waituntil.tex
r14c0f7b rf496046 382 382 More detail on channels and their interaction with @waituntil@ appear in Section~\ref{s:wu_chans}. 383 383 384 The trait is used by having a blocking object return a type that supports the @is_selectable@ trait. 384 The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument. 385 When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait. 385 386 This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context. 386 A selectable typeis needed for types that want to support multiple operations such as channels that allow both reading and writing.387 Indirect support through routines is needed for types that want to support multiple operations such as channels that allow both reading and writing. 387 388 388 389 \section{\lstinline{waituntil} Implementation} … … 520 521 This work incurs a high cost for signalling threads and heavily increase contention on internal channel locks. 521 522 Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible. 522 As such, the exclusive-or semantics islost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance.523 As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance. 523 524 524 525 It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel. … … 591 592 If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost. 592 593 This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner. 593 \PAB{I bet one of the readers is going to ask you to write the pseudo code for this algorithm.} 594 The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}. 595 596 \begin{figure} 597 \begin{cfa} 598 bool pending_set_other( select_node & other, select_node & mine ) { 599 unsigned long int cmp_status = UNSAT; 600 601 // Try to set other status, if we succeed break and return true 602 while( !CAS( other.clause_status, &cmp_status, SAT ) ) { 603 if ( cmp_status == SAT ) 604 return false; // If other status is SAT we lost so return false 605 606 // Toggle own status flag to allow other thread to potentially win 607 mine.status = UNSAT; 608 609 // Reset compare flag 610 cmp_status = UNSAT; 611 612 // Attempt to set own status flag back to PENDING to retry 613 if ( !CAS( mine.clause_status, &cmp_status, PENDING ) ) 614 return false; // If we fail then we lost so return false 615 616 // Reset compare flag 617 cmp_status = UNSAT; 618 } 619 return true; 620 } 621 \end{cfa} 622 \caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol} 623 \label{f:WU_DeadlockAvoidance} 624 \end{figure} 594 625 595 626 Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support. … … 600 631 601 632 It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished. 602 In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which make the problem of checking for completion of the statement difficult. 603 \PAB{Show an example of why this is difficult.} 633 In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which along with guards, make the problem of checking for completion of the statement difficult. 634 Consider the @waituntil@ in Figure~\ref{f:WU_ComplexPredicate}. 635 When the @waituntil@ thread wakes up, checking if the statement is complete is non-trivial. 636 The predicate that will return if the statement in Figure~\ref{f:WU_ComplexPredicate} is satisfied is the following. 637 \begin{cfa} 638 A && B || C || !GA && B || !GB && A || !GA && !GB && !GC 639 \end{cfa} 640 Which simplifies to: 641 \begin{cfa} 642 ( A || !GA ) && ( B || !GB ) || C || !GA && !GB && !GC 643 \end{cfa} 644 Checking a predicate this large with each iteration is expensive so \uC and \CFA both take steps to simplify checking statement completion. 645 646 \begin{figure} 647 \begin{cfa} 648 when( GA ) waituntil( A ) {} 649 and when( GB ) waituntil( B ) {} 650 or when( GC ) waituntil( C ) {} 651 \end{cfa} 652 \caption{\lstinline{waituntil} with a non-trivial predicate} 653 \label{f:WU_ComplexPredicate} 654 \end{figure} 604 655 605 656 In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource. … … 608 659 Once the root of the tree has both subtrees marked as @true@ then the statement is complete. 609 660 As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again. 610 To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false. 611 \PAB{Show an example.} 661 To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation. 662 An diagram of the tree for the statement in Figure~\ref{f:WU_ComplexPredicate} is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@. 663 664 \begin{figure} 665 \begin{center} 666 \input{diagrams/uCpp_select_tree.tikz} 667 \end{center} 668 \caption{\uC select tree modification} 669 \label{f:uC_select_tree} 670 \end{figure} 612 671 613 672 The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate. … … 616 675 Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passes the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise. 617 676 To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate. 618 \PAB{Show an example.} 677 The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values. 678 For example, the following would be generated for the @waituntil@ shown in Figure~\ref{f:WU_ComplexPredicate}. 679 \begin{cfa} 680 // statement completion predicate 681 bool check_completion( select_node * nodes ) { 682 return nodes[0].status && nodes[1].status || nodes[2].status; 683 } 684 685 // skip statement if all guards false 686 if ( GA || GB || GC ) { 687 select_node nodes[3]; 688 nodes[0].status = !GA && GB; // A's status 689 nodes[1].status = !GB && GA; // B's status 690 nodes[2].status = !GC; // C's status 691 692 // ... rest of waituntil codegen ... 693 694 } 695 \end{cfa} 619 696 620 697 \uC's @_Select@, supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses. 621 698 In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied. 622 % C_TODO put this is uC++ code style not cfa-style 699 623 700 \begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}] 624 701 Future_ISM<int> A, B, C, D; … … 640 717 however, that opens the potential for livelock. 641 718 Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally. 642 Additionally, using resource ordering could conflict with other semantics of the @waituntil@ statement.643 For example, consider if the locks in the example must be acquired in the order @D@, @B@, @C@, @A@ because of other @waituntil@ statements.644 \PAB{I don't understand: If all the locks are available, it becomes complex to both respect the ordering of the \lstinline{waituntil} when choosing which code block to run and also respect the lock ordering of \lstinline{D}, \lstinline{B}, \lstinline{C}, \lstinline{A} at the same time.}645 719 One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach. 646 720 This approach does not work due to \gls{toctou} issues; … … 661 735 \begin{cfa} 662 736 bool when_conditions[N]; 663 for ( node in s ) $\C[3.75in]{// evaluate guards}$737 for ( node in nodes ) $\C[3.75in]{// evaluate guards}$ 664 738 if ( node has guard ) 665 739 when_conditions[node] = node_guard; … … 667 741 when_conditions[node] = true; 668 742 669 select_nodes s[N]; $\C{// declare N select nodes}$ 743 if ( any when_conditions[node] == true ) { 744 745 select_nodes nodes[N]; $\C{// declare N select nodes}$ 670 746 try { 671 for ( node in s ) $\C{// register nodes}$ 672 if ( when_conditions[node] ) 673 register_select( resource, node ); 674 675 // ... set statuses for nodes with when_conditions[node] == false ... 676 677 while ( statement predicate not satisfied) { $\C{// check predicate}$678 679 680 if ( statement predicate is satisfied) break;681 682 683 684 685 686 687 688 747 // ... set statuses for nodes with when_conditions[node] == false ... 748 749 for ( node in nodes ) $\C{// register nodes}$ 750 if ( when_conditions[node] ) 751 register_select( resource, node ); 752 753 while ( !check_completion( nodes ) ) { $\C{// check predicate}$ 754 // block 755 for ( resource in waituntil statement ) { $\C{// run true code blocks}$ 756 if ( check_completion( nodes ) ) break; 757 if ( resource is avail ) 758 try { 759 if( on_selected( resource ) ) $\C{// conditionally run block}$ 760 run code block 761 } finally $\C{// for exception safety}$ 762 unregister_select( resource, node ); $\C{// immediate unregister}$ 763 } 764 } 689 765 } finally { $\C{// for exception safety}$ 690 for ( registered nodes in s ) $\C{// deregister nodes}$766 for ( registered nodes in nodes ) $\C{// deregister nodes}$ 691 767 if ( when_conditions[node] && unregister_select( resource, node ) 692 768 && on_selected( resource ) ) 693 769 run code block $\C{// run code block upon unregister}\CRT$ 770 } 771 694 772 } 695 773 \end{cfa}
Note: See TracChangeset
for help on using the changeset viewer.