Index: doc/theses/colby_parsons_MMAth/text/waituntil.tex
===================================================================
--- doc/theses/colby_parsons_MMAth/text/waituntil.tex	(revision 47b7142b40bf38fed2ced7fe498b771ae5633293)
+++ doc/theses/colby_parsons_MMAth/text/waituntil.tex	(revision d4c1d1f51797f1bdf08c6cfa4ebd67c513877cfc)
@@ -522,53 +522,62 @@
 
 It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
-Consider the following example with three threads running concurrently.
-\begin{cfa}
-channel A, B; // zero size channels
-
-// Thread 1:
+Consider the following example where thread 1 is reading and threads 2 and 3 are writing to channels @A@ and @B@ concurrently.
+\begin{cquote}
+\begin{tabular}{@{}l|l|l@{}}
+\multicolumn{3}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\
+thread 1 & thread 2 & thread 3 \\
+\begin{cfa}
 waituntil( i << A ) {}
 or waituntil( i << B ) {}
-
-// Thread 2:
-waituntil( A << 1 ) {}
-
-// Thread 3:
-waituntil( B << 2 ) {}
-\end{cfa}
-
-For Thread 1 to have exclusive-or semantics, it must only consume from exactly one of @A@ or @B@.
-As such, Thread 2 and 3 must race to establish the winning clause of the @waituntil@ in Thread 1.
-This race is consolidated by Thread 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}.
-The winner will successfully \gls{cas} the address, insert the value and then signal Thread 1, whereas the loser will fail their \gls{cas}.
-If there is space in the channel, the losing thread may proceed with their channel operation and not signal, otherwise if there is no space they block as they would in a standard channel insert operation.
-It is important that threads race \emph{before} operating on the channel, since if they lose the race they may need to take a different action with respect to their channel operation.
-In the example case, the winner will handoff their value to Thread 1 and the loser will block.
-If the race was consolidated after the operation, both Thread 2 and 3 could potentially write into @i@ concurrently, which would be erroneous behaviour.
-
-Channels introduce another interesting consideration in their implementation.
+\end{cfa}
+&
+\begin{cfa}
+A << 1;
+
+\end{cfa}
+&
+\begin{cfa}
+B << 2;
+
+\end{cfa}
+\end{tabular}
+\end{cquote}
+For thread 1 to have exclusive-or semantics, it must only consume from exactly one of @A@ or @B@.
+As such, thread 2 and 3 must race to establish the winning clause of the @waituntil@ in thread 1.
+This race is consolidated by thread 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}.
+The winner bypasses the channel and inserts into the WUT's left-hand, and signals thread 1.
+The loser continues checking if there is space in the channel, and if so performs the channel insert operation with a possible signal of a waiting remove thread;
+otherwise, if there is no space, the loser blocks.
+It is important the race occurs \emph{before} operating on the channel, because channel actions are different with respect to each thread.
+If the race was consolidated after the operation, both thread 2 and 3 could potentially write into @i@ concurrently.
+
+Channels introduce another interesting implementation issue.
 Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause.
 This poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
-
 Consider the following example, alongside a described ordering of events to highlight the race.
-Assume Thread 1 executes first, registers with channel @A@ and proceeds since it is empty and then is interrupted before registering with @B@.
-Thread 2 similarly registers with channel @B@, and proceeds since it doesn't have space to insert and then is interrupted before registering with @A@.
-At this point Thread 1 and 2 resume execution.
+\begin{cquote}
+\begin{tabular}{@{}l|l@{}}
+\multicolumn{2}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\
+thread 1 & thread 2 \\
+\begin{cfa}
+waituntil( @i << A@ ) {}
+or waituntil( @i << B@ ) {}
+\end{cfa}
+&
+\begin{cfa}
+waituntil( @B << 2}@ ) {}
+or waituntil( @A << 1@ ) {}
+\end{cfa}
+\end{tabular}
+\end{cquote}
+Assume thread 1 executes first, registers with channel @A@ and proceeds, since it is empty, and then is interrupted before registering with @B@.
+Thread 2 similarly registers with channel @B@, and proceeds, since it does not have space to insert, and then is interrupted before registering with @A@.
+At this point, thread 1 and 2 resume execution.
 There is now a race that must be dealt with on two fronts.
-If Thread 1 and 2 only race to \gls{cas} a winning clause for their own @waituntil@, Thread 1 may think that it successfully removed from @B@ and Thread 2 may think it successfully inserted into @A@, when only one of these operations can occur.
-\begin{cfa}
-channel A, B; // zero size channels
-
-// Thread 1:
-waituntil( i << A ) {}
-or waituntil( i << B ) {}
-
-// Thread 2:
-waituntil( B << 2 ) {}
-or waituntil( A << 1 ) {}
-\end{cfa}
-
-Go solves this problem in their select statement by acquiring the internal locks of all channels before registering the select on the channels.
-This eliminates the race shown above since both Thread 1 and 2 cannot both be registering at the same time.
-This approach is not used in \CFA since the @waituntil@ is polymorphic.
+If thread 1 and 2 only race to the \gls{cas}, \ie a clause in their own @waituntil@, thread 1 can think that it successfully removed from @B@, and thread 2 may think it successfully inserted into @A@, when only one of these operations occurs.
+
+The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels.
+This approach eliminates the race shown above since thread 1 and 2 cannot both be registering at the same time.
+However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic.
 Not all types in a @waituntil@ have an internal lock, and when using non-channel types acquiring all the locks incurs extra unneeded overhead.
 Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
@@ -577,45 +586,47 @@
 If either thread successfully sets the the other thread's @waituntil@ race pointer, then the operation can proceed, if not the signalling threads set its own race pointer back to the initial value and repeats.
 This retry mechanism can potentially introduce a livelock, but in practice a livelock here is highly unlikely.
-Furthermore, the likelihood of a livelock here is zero unless the program is in the niche case of having two or more exclusive-or waituntils with 2 or more clauses in reverse order of priority.
-This livelock case could be fully eliminated using locks like Go, or if a \gls{dcas} instruction is available.
-If any other threads attempt to set a WUT's race pointer and see a pending value, they will wait until the value changes before proceeding to ensure that in the case that the WUT fails, the signal will not be lost.
-This protocol ensures that signals will not be lost and that the two races can be resolved in a safe manner.
-
-Channels in \CFA have exception based shutdown mechanisms that the @waituntil@ statement needs to support.
+Furthermore, the likelihood of a livelock here is zero unless the program is in the niche case of having two or more exclusive-or @waituntil@s with two or more clauses in reverse order of priority.
+This livelock case can be fully eliminated using locks like Go, or if a \gls{dcas} instruction is available.
+If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost.
+This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner.
+\PAB{I bet one of the readers is going to ask you to write the pseudo code for this algorithm.}
+
+Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support.
 These exception mechanisms are supported through the @on_selected@ routine.
-This routine is needed by channels to detect if they are closed upon waking from a @waituntil@ statement, to ensure that the appropriate behaviour is taken and an exception is thrown.
+This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown.
 
 \subsection{Guards and Statement Predicate}\label{s:wu_guards}
-Checking for when a synchronous multiplexing utility is done is trivial when it has an or/xor relationship, since any resource becoming available means that the blocked thread can proceed.
-In \uC and \CFA, their \gls{synch_multiplex} utilities involve both an @and@ and @or@ operator, which make the problem of checking for completion of the statement more difficult.
+
+It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished.
+In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which make the problem of checking for completion of the statement difficult.
+\PAB{Show an example of why this is difficult.}
 
 In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
 The internal nodes also store the statuses of the two subtrees beneath them.
-When resources become available, their corresponding leaf node status is modified and then percolates up into the internal nodes to update the state of the statement.
+When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement.
 Once the root of the tree has both subtrees marked as @true@ then the statement is complete.
-As an optimization, when the internal nodes are updated, their subtrees marked as @true@ are pruned and are not touched again.
+As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again.
 To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false.
+\PAB{Show an example.}
 
 The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
 The waiting condition of the @waituntil@ statement can be represented as a predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
 In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the @waituntil@.
-Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passed the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise.
+Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passes the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise.
 To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
-
-\uC's @_Select@, supports operators both inside and outside of the clauses.
-\eg in the following example the code blocks will run once their corresponding predicate inside the round braces is satisfied.
-
+\PAB{Show an example.}
+
+\uC's @_Select@, supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses.
+In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied.
 % C_TODO put this is uC++ code style not cfa-style
-\begin{cfa}
+\begin{lstlisting}[language=uC++]
 Future_ISM<int> A, B, C, D;
 _Select( A || B && C ) { ... }
 and _Select( D && E ) { ... }
-\end{cfa}
-
-This is more expressive that the @waituntil@ statement in \CFA.
-In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses was avoided for a few reasons.
-As a motivating example, suppose \CFA supported operators inside clauses and consider the code snippet in Figure~\ref{f:wu_inside_op}.
-
-\begin{figure}
+\end{lstlisting}
+This is more expressive that the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after both resources are available.
+
+In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons.
+As a motivating example, suppose \CFA supported operators inside clauses as in:
 \begin{cfa}
 owner_lock A, B, C, D;
@@ -623,23 +634,26 @@
 or waituntil( C && D ) { ... }
 \end{cfa}
-\caption{Example of unsupported operators inside clauses in \CFA.}
-\label{f:wu_inside_op}
-\end{figure}
-
-If the @waituntil@ in Figure~\ref{f:wu_inside_op} works with the same semantics as described and acquires each lock as it becomes available, it opens itself up to possible deadlocks since it is now holding locks and waiting on other resources.
-Other semantics would be needed to ensure that this operation is safe.
-One possibility is to use \CC's @scoped_lock@ approach that was described in Section~\ref{s:DeadlockAvoidance}, however the potential for livelock leaves much to be desired.
-Another possibility would be to use resource ordering similar to \CFA's @mutex@ statement, but that alone is not sufficient if the resource ordering is not used everywhere.
+If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold and wait situation.
+Other semantics are needed to ensure this operation is safe.
+One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance};
+however, that opens the potential for livelock.
+Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally.
 Additionally, using resource ordering could conflict with other semantics of the @waituntil@ statement.
-To show this conflict, consider if the locks in Figure~\ref{f:wu_inside_op} were ordered @D@, @B@, @C@, @A@.
-If all the locks are available, it becomes complex to both respect the ordering of the @waituntil@ in Figure~\ref{f:wu_inside_op} when choosing which code block to run and also respect the lock ordering of @D@, @B@, @C@, @A@ at the same time.
+For example, consider if the locks in the example must be acquired in the order @D@, @B@, @C@, @A@ because of other @waituntil@ statements.
+\PAB{I don't understand: If all the locks are available, it becomes complex to both respect the ordering of the \lstinline{waituntil} when choosing which code block to run and also respect the lock ordering of \lstinline{D}, \lstinline{B}, \lstinline{C}, \lstinline{A} at the same time.}
 One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach.
-This approach won't work due to TOCTOU issues; it is not possible to ensure that the full set resources are available without holding them all first.
-Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems involved, but it was not deemed an important feature when taking into account the runtime cost that would need to be paid to handle these situations.
+This approach does not work due to \gls{toctou} issues;
+it is impossible to ensure that the full set of resources are available without holding them all first.
+Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems involved, but it was not deemed an important feature when taking into account the runtime cost need paid to handle this situation.
 The problem of operators inside clauses also becomes a difficult issue to handle when supporting channels.
-If internal operators were supported, it would require some way to ensure that channels used with internal operators are modified on if and only if the corresponding code block is run, but that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
+It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run, but that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
 
 \subsection{The full \lstinline{waituntil} picture}
-Now that the details have been discussed, the full pseudocode of the waituntil is presented in Figure~\ref{f:WU_Full_Impl}.
+
+Now the details have been discussed, the full pseudocode of the @waituntil@ is presented in Figure~\ref{f:WU_Full_Impl}.
+Some things to note are as follows.
+The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}.
+The @when_conditions@ array is used to store the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards.
+As discussed in Section~\ref{s:wu_chans}, this pseudocode includes conditional code-block execution based on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure all available channel resources have their corresponding code block run.
 
 \begin{figure}
@@ -682,10 +696,4 @@
 \label{f:WU_Full_Impl}
 \end{figure}
-
-In comparison to Figure~\ref{f:WU_Impl}, this pseudocode now includes the specifics discussed in this chapter.
-Some things to note are as follows:
-The @finally@ blocks provide exception-safe RAII unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom that was mentioned in Section~\ref{s:wu_locks}.
-The @when_conditions@ array is used to store the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards.
-As discussed in Section~\ref{s:wu_chans}, this pseudocode includes code blocks conditional on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure that all available channel resources will have their corresponding code block run.
 
 \section{Waituntil Performance}