% ======================================================================
% ======================================================================
\chapter{Waituntil}\label{s:waituntil}
% ======================================================================
% ======================================================================

Consider the following motivating problem.
There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom.
Each stall has its own lock since only one person may occupy a stall at a time.
Humans solve this problem in the following way.
They check if all of the stalls are occupied.
If not, they enter and claim an available stall.
If they are all occupied, people queue and watch the stalls until one is free, and then enter and lock the stall.
This solution can be implemented on a computer easily, if all threads are waiting on all stalls and agree to queue.

Now the problem is extended.
Some stalls are wheelchair accessible and some stalls have gender identification.
Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria.
Immediately, the problem becomes more difficult.
A single queue no longer solves the problem.
What happens when there is a stall available that the person at the front of the queue cannot choose?
The na\"ive solution has each thread spin indefinitely, continually checking every matching kind of stall until a suitable one is free.
This approach is insufficient since it wastes cycles and results in unfairness among waiting threads, as a thread can acquire the first matching stall without regard to the waiting time of other threads.
Waiting for the first appropriate stall (resource) that becomes available without spinning is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria.

\section{History of Synchronous Multiplexing}\label{s:History}

There is a history of tools that provide \gls{synch_multiplex}.
Some well-known \gls{synch_multiplex} tools include Unix system utilities: @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}.
The concept and theory surrounding \gls{synch_multiplex} were introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85},
\begin{quote}
A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85}
\end{quote}
The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}.

Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid.
For example,
\begin{cfa}[mathescape]
(@G1@(x) $\rightarrow$ P @|@ @G2@(y) $\rightarrow$ Q )
\end{cfa}
waits for either channel @x@ or @y@ to have a value, if and only if guards @G1@ and @G2@ are true;
if only one guard is true, only one channel receives, and if both guards are false, no receive occurs.
% extended CSP with a \gls{synch_multiplex} construct @ALT@, which waits for one resource to be available and then executes a corresponding block of code.
In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on.
If a guard is false, then the resource it guards is not in the set of resources being waited on.
If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues.
Guards can be simulated using @if@ statements as shown in~\cite[rule~2.4, p.~183]{Roscoe88}
\begin{lstlisting}[basicstyle=\rm,mathescape]
ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT($\,g$ $P$, $G$ ), $\neg\,b$ ALT( $G$ ) ) (boolean guard elim).
\end{lstlisting}
but require $2^N-1$ @if@ statements, where $N$ is the number of guards.
The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time.
Figure~\ref{f:wu_if} shows in \CFA an example of applying rule 2.4 for three guards.
Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@.

\begin{figure}
\centering
\begin{lrbox}{\myboxA}
\begin{cfa}
when( G1 )
    waituntil( R1 ) S1
or when( G2 )
    waituntil( R2 ) S2
or when( G3 )
    waituntil( R3 ) S3







\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
if ( G1 )
    if ( G2 )
        if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3
        else waituntil( R1 ) S1 or waituntil( R2 ) S2
    else
        if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3
        else waituntil( R1 ) S1
else
    if ( G2 )
        if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3
        else waituntil( R2 ) S2
    else
        if ( G3 ) waituntil( R3 ) S3
\end{cfa}
\end{lrbox}

\subfloat[Guards]{\label{l:guards}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB}
\caption{\CFA guard simulated with \lstinline{if} statement.}
\label{f:wu_if}
\end{figure}

When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors.
The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
@select@ blocks until either some subset of the file descriptors is ready or the timeout expires.
All file descriptors that are ready are returned by modifying the argument sets to only contain the ready descriptors.

This early implementation differs from the theory presented in CSP: when the call to @select@ returns, it may provide more than one ready file descriptor.
As such, @select@ has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics.
It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors.
@select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst-case is the largest numbered file descriptor.
@poll@ reduces the size of the interest sets by changing from a bit mask to an array of per-descriptor structures, independent of the file-descriptor values.
@epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call.

These early \gls{synch_multiplex} tools interact directly with the operating system and are often used to communicate among processes.
Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads.
An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}.
This @select@ allows a task object, with its own thread, to multiplex over a subset of asynchronous calls to its methods.
The Ada @select@ has the same exclusive-or semantics and guards as Occam ALT;
however, it multiplexes over methods rather than channels.

\begin{figure}
\begin{lstlisting}[language=ada,literate=]
task type buffer is -- thread
    ... -- buffer declarations
    count : integer := 0;
begin -- thread starts here
    loop
        select
            when count < Size => -- guard
            accept insert( elem : in ElemType ) do -- method
                ... -- add to buffer
                count := count + 1;
            end;
            -- executed if this accept called
        or
            when count > 0 => -- guard
            accept remove( elem : out ElemType ) do -- method
                ... -- remove and return from buffer via parameter
                count := count - 1;
            end;
            -- executed if this accept called
        or delay 10.0; -- unblock after 10 seconds without call
        else -- do not block, cannot appear with delay
        end select;
    end loop;
end buffer;
buf : buffer; -- create task object and start thread in task body
\end{lstlisting}
\caption{Ada Bounded Buffer}
\label{f:BB_Ada}
\end{figure}

Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
Note, a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
Then the appropriate \lstinline[language=ada]{accept} method is run with the called arguments.
Hence, the \lstinline[language=ada]{select} statement provides rendezvous points for threads, rather than providing channels with message passing.
The \lstinline[language=ada]{select} statement also provides a timeout and @else@ (nonblocking), which changes synchronous multiplexing to asynchronous.
Now the thread polls rather than blocks.

Another example of programming-language \gls{synch_multiplex} is Go using a @select@ statement with channels~\cite{go:selectref}.
Figure~\ref{l:BB_Go} shows the outline of a bounded buffer implemented with a goroutine.
Here two channels are used for inserting and removing by client producers and consumers, respectively.
(The @term@ and @finish@ channels are used to synchronize with the program main.)
Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam and, like ALT and Ada, associates a code block with each clause.
However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}.
Go also provides a timeout via a channel and a @default@ clause, like Ada @else@, for asynchronous multiplexing.
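
Although Go has no guard syntax, a guard can be approximated with a common idiom: a communication on a @nil@ channel never proceeds, so assigning @nil@ to a channel variable removes its \lstinline[language=go]{case} from the \lstinline[language=go]{select}.
The following is a minimal sketch of a guarded bounded buffer under that idiom; the names @boundedBuffer@ and @run@ are hypothetical, and the buffer size is assumed to be at least one.
\begin{lstlisting}[language=go,literate=]
package main

import "fmt"

// boundedBuffer moves values from insert to remove, holding at most size of them.
// Each guard is simulated by setting the clause's channel to nil when it is false.
func boundedBuffer(insert <-chan int, remove chan<- int, size int) {
	var buf []int
	for {
		in, out, head := insert, remove, 0
		if len(buf) >= size { // guard "count < Size" is false
			in = nil
		}
		if len(buf) == 0 { // guard "count > 0" is false
			out = nil
		} else {
			head = buf[0]
		}
		select {
		case v, ok := <-in:
			if !ok { // producer finished: flush and stop
				for _, e := range buf {
					remove <- e
				}
				close(remove)
				return
			}
			buf = append(buf, v)
		case out <- head:
			buf = buf[1:]
		}
	}
}

// run pushes 1..5 through a buffer of size 2 and returns the sum received.
func run() int {
	insert, remove := make(chan int), make(chan int)
	go boundedBuffer(insert, remove, 2)
	go func() {
		for i := 1; i <= 5; i++ {
			insert <- i
		}
		close(insert)
	}()
	sum := 0
	for v := range remove {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(run())
}
\end{lstlisting}
Unlike the @if@ simulation of rule 2.4, each guard here only enables or disables its own channel, so the code grows linearly with the number of guards.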

\begin{figure}
\centering

\begin{lrbox}{\myboxA}
\begin{lstlisting}[language=go,literate=]
func main() {
    insert := make( chan int, Size )
    remove := make( chan int, Size )
    term := make( chan string )
    finish := make( chan string )

    buf := func() {
        var i int
        L: for {
            select { // wait for message
              case i = <- insert:
              case <- term: break L
            }
            remove <- i;
        }
        finish <- "STOP" // completion
    }
    go buf() // start thread in buf
}



\end{lstlisting}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{lstlisting}[language=uC++]
_Task BoundedBuffer {
    ... // buffer declarations
    int count = 0;
  public:
    void insert( int elem ) {
        ... // add to buffer
        count += 1;
    }
    int remove() {
        ... // remove and return from buffer
        count -= 1;
    }
  private:
    void main() {
        for ( ;; ) {
            _Accept( ~BoundedBuffer ) break;
            or _When ( count < Size ) _Accept( insert );
            or _When ( count > 0 ) _Accept( remove );
        }
    }
};
BoundedBuffer buf; // start thread in main method
\end{lstlisting}
\end{lrbox}

\subfloat[Go]{\label{l:BB_Go}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB}

\caption{Bounded Buffer}
\label{f:AdaMultiplexing}
\end{figure}

Finally, \uC provides \gls{synch_multiplex} with Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}.
The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement.
Both @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as timeout and @else@ clauses.

There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources.

\section{Other Approaches to Synchronous Multiplexing}

To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source.
For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
Similarly, actor systems circumvent the \gls{synch_multiplex} problem, as actors only block when waiting for the next message, never within a behaviour.
While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
Consider the case where a thread has a single source of communication and it wants a set of @N@ resources.
It must sequentially request the @N@ resources and wait for each response.
During the receives for the @N@ resources, it can receive other communication, and has to save and postpone these communications, or discard them.
% If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.

\section{\CFA's Waituntil Statement}

The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
There already exists a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods~\cite{Delisle21}, so this @waituntil@ focuses on synchronizing over other resources.
All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @_Select@ supports futures, and Ada's @select@ supports monitor method calls.
The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}.
No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.

\begin{figure}
\begin{cfa}
forall(T & | sized(T))
trait is_selectable {
    // For registering a waituntil stmt on a selectable type
    bool register_select( T &, select_node & );

    // For unregistering a waituntil stmt from a selectable type
    bool unregister_select( T &, select_node & );

    // on_selected is run on the selecting thread prior to executing
    // the statement associated with the select_node
    bool on_selected( T &, select_node & );
};
\end{cfa}
\caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.}
\label{f:wu_trait}
\end{figure}

Currently, locks, channels, futures and timeouts are supported by the @waituntil@ statement, and this set can be expanded through the @is_selectable@ trait as other use-cases arise.
The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing.
Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second.
Note, the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.

\begin{figure}
\begin{cfa}
future(int) Future;
channel(int) Channel;
owner_lock Lock;
int i = 0;

waituntil( Lock ) { ... }
or when( i == 0 ) waituntil( i << Channel ) { ... }
and waituntil( Future ) { ... }
or waituntil( timeout( 1`s ) ) { ... }
// else { ... }
\end{cfa}
\caption{Example of \CFA's waituntil statement}
\label{f:wu_example}
\end{figure}

\section{Waituntil Semantics}

The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like locks, channels, and futures.

\subsection{Statement Semantics}

The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam.
The clauses have an exclusive-or relationship where the first available one is run and only one clause is run.
\CFA's @or@ semantics differ from ALT semantics: instead of randomly picking a clause when multiple are available, the first clause in the @waituntil@ that is available is executed.
For example, in the following, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses.
\begin{cfa}
future(int) bar, foo;
waituntil( foo ) { ... } or waituntil( bar ) { ... }
\end{cfa}
The reason for this semantics is that prioritizing resources can be useful in certain problems.
In the rare case where there is a starvation problem with the ordering, it is possible to follow a @waituntil@ with its reverse form:
\begin{cfa}
waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar
\end{cfa}
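
For contrast, Go's @select@ chooses pseudo-randomly among ready cases, so a fixed clause priority has to be built by hand.
A minimal sketch, using the hypothetical helper @prioritySelect@, first polls the high-priority channel and only then blocks on both.
The priority is only approximate: if neither channel is ready during the first pass and both become ready before the second, Go again picks at random.
\begin{lstlisting}[language=go,literate=]
package main

import "fmt"

// prioritySelect returns the name and value of the clause that fired,
// preferring foo whenever foo is ready at the time of the call.
func prioritySelect(foo, bar <-chan int) (string, int) {
	select { // first pass: poll only the high-priority clause
	case v := <-foo:
		return "foo", v
	default:
	}
	select { // second pass: block on both clauses
	case v := <-foo:
		return "foo", v
	case v := <-bar:
		return "bar", v
	}
}

func main() {
	foo, bar := make(chan int, 1), make(chan int, 1)
	foo <- 1
	bar <- 2
	name, v := prioritySelect(foo, bar) // both ready: foo is chosen
	fmt.Println(name, v)
}
\end{lstlisting}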

The \CFA @and@ semantics match the @and@ semantics of \uC \lstinline[language=uC++]{_Select}.
When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}.
When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block; the thread then waits again for the next available clause, or continues if the @waituntil@ predicate is now satisfied.
This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator.
If the @and@ operator waited for all clauses to be available before running, it would be the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements.
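
The run-as-available behaviour can be sketched in Go, which has no @and@ operator.
The hypothetical helper @waitAll@ below waits for both channels but runs each handler as soon as its own value arrives, disabling a satisfied clause by setting its channel to @nil@.
It is an illustration only and assumes neither channel is closed.
\begin{lstlisting}[language=go,literate=]
package main

import "fmt"

// waitAll returns once a value has arrived on both a and b, running
// each clause's handler as soon as that clause is satisfied.
func waitAll(a, b <-chan int, onA, onB func(int)) {
	for a != nil || b != nil {
		select {
		case v := <-a:
			onA(v)
			a = nil // clause satisfied: stop waiting on it
		case v := <-b:
			onB(v)
			b = nil
		}
	}
}

// demo delivers to b before a and records the order in which handlers ran.
func demo() []string {
	a, b := make(chan int), make(chan int)
	go func() {
		b <- 2 // b becomes available first
		a <- 1
	}()
	var order []string
	waitAll(a, b,
		func(v int) { order = append(order, fmt.Sprint("A", v)) },
		func(v int) { order = append(order, fmt.Sprint("B", v)) })
	return order
}

func main() {
	fmt.Println(demo())
}
\end{lstlisting}
Here the handler for @b@ runs before the value for @a@ has even been sent, which is the overlap that waiting for all clauses first would lose.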

As for normal C expressions, the @and@ operator binds more tightly than the @or@.
To give an @or@ operator higher precedence, parentheses are used.
For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parentheses.
\begin{cfa}
@(@ waituntil( A ) { ... } // bind tightly to or
or waituntil( B ) { ... } @)@
and waituntil( C ) { ... }
\end{cfa}

The guards in the @waituntil@ statement are called @when@ clauses.
Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run.
Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@.
In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}.
When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operation from the statement.
For example, in the following, the two @waituntil@ statements are semantically equivalent.

\begin{lrbox}{\myboxA}
\begin{cfa}
when( true ) waituntil( A ) { ... }
or when( false ) waituntil( B ) { ... }
and waituntil( C ) { ... }
\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
waituntil( A ) { ... }
and waituntil( C ) { ... }

\end{cfa}
\end{lrbox}

\begin{tabular}{@{}lcl@{}}
\usebox\myboxA & $\equiv$ & \usebox\myboxB
\end{tabular}

The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada.
If the statement cannot be satisfied by the resources that are immediately available and there is an @else@ clause, the @else@ clause is run and the thread continues.

\subsection{Type Semantics}

As mentioned, to support interaction with the @waituntil@ statement a type must support the trait in Figure~\ref{f:wu_trait}.
The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively.
When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run.
Many types do not need @on_selected@, but it is provided if a type needs to perform work or checks before the resource can be accessed in the code block.
The register/unregister routines in the trait also return booleans.
The return value of @register_select@ is @true@ if the resource is immediately available and @false@ otherwise.
The return value of @unregister_select@ is @true@ if the corresponding code block should be run after unregistration and @false@ otherwise.
The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource.
More detail on channels and their interaction with @waituntil@ appears in Section~\ref{s:wu_chans}.

The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument.
When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait.
This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context.
Indirect support through routines is needed for types that want to support multiple operations, such as channels that allow both reading and writing.

\section{\lstinline{waituntil} Implementation}

The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} only shows the basic outline of the @waituntil@ algorithm.
The complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives.
The following sections use examples to fill in details missing in Figure~\ref{f:WU_Impl}.
The full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.

\begin{figure}
\begin{cfa}
select_node s[N]; $\C[3.25in]{// declare N select nodes}$
for ( node in s ) $\C{// register nodes}$
    register_select( resource, node );
while ( statement predicate not satisfied ) { $\C{// check predicate}$
    // block until clause(s) satisfied
    for ( resource in waituntil statement ) { $\C{// run true code blocks}$
        if ( resource is avail ) run code block
        if ( statement predicate is satisfied ) break;
    }
}
for ( node in s ) $\C{// deregister nodes}\CRT$
    if ( unregister_select( resource, node ) ) run code block
\end{cfa}
\caption{\lstinline{waituntil} Implementation}
\label{f:WU_Impl}
\end{figure}

The basic steps of the algorithm are:
\begin{enumerate}
\item
The @waituntil@ statement declares $N$ @select_node@s, one per resource that is being waited on, which stores any @waituntil@ data pertaining to that resource.

\item
Each @select_node@ is then registered with the corresponding resource.

\item
The thread executing the @waituntil@ then loops until the statement's predicate is satisfied.
In each iteration, if the predicate is unsatisfied, the @waituntil@ thread blocks.
When another thread satisfies a resource clause (\eg sends to a channel), it unblocks the @waituntil@ thread.
The @waituntil@ thread then checks all clauses for completion, and any completed clauses have their code blocks run.
While checking clause completion, if enough clauses have been run such that the statement predicate is satisfied, the loop exits early.

\item
Once the thread escapes the loop, the @select_node@s are unregistered from the resources.
\end{enumerate}
These steps give a basic overview of how the statement works.
The following sections shed light on the specific changes needed for each kind of resource and provide more implementation detail.
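
The four steps can be sketched in Go for the simplest case, an @or@-only statement over write-once futures.
This is an illustration under simplifying assumptions, not the \CFA implementation: the types @node@ and @future@ and the routine @waitAny@ are hypothetical, each future is protected by its own mutex, and a buffered channel stands in for blocking and unblocking the @waituntil@ thread.
\begin{lstlisting}[language=go,literate=]
package main

import (
	"fmt"
	"sync"
)

// node plays the role of a select_node; all nodes of one statement share wake.
type node struct{ wake chan struct{} }

// future is a write-once resource that nodes can register with.
type future struct {
	mu      sync.Mutex
	set     bool
	waiters map[*node]bool
}

func newFuture() *future { return &future{waiters: map[*node]bool{}} }

// register enqueues n and reports whether the future is already available.
func (f *future) register(n *node) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.waiters[n] = true
	return f.set
}

func (f *future) unregister(n *node) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.waiters, n)
}

func (f *future) available() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.set
}

// fulfil marks the future available and unblocks every registered waiter.
func (f *future) fulfil() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.set = true
	for n := range f.waiters {
		select {
		case n.wake <- struct{}{}:
		default: // a wake-up is already pending
		}
	}
}

// waitAny blocks until some future is available and returns the lowest such index.
func waitAny(fs ...*future) int {
	wake := make(chan struct{}, 1)
	nodes := make([]*node, len(fs))
	ready := false
	for i, f := range fs { // steps 1 and 2: declare and register nodes
		nodes[i] = &node{wake: wake}
		ready = f.register(nodes[i]) || ready
	}
	if !ready { // step 3: block until a clause is satisfied
		<-wake
	}
	winner := -1
	for i, f := range fs { // step 4: unregister, noting the first available clause
		if winner < 0 && f.available() {
			winner = i
		}
		f.unregister(nodes[i])
	}
	return winner
}

func main() {
	f1, f2 := newFuture(), newFuture()
	go f2.fulfil()
	fmt.Println(waitAny(f1, f2))
}
\end{lstlisting}
Because registration and fulfilment both hold the future's mutex, a future fulfilled between registration and blocking still leaves a pending wake-up in the buffered channel, so the waiter cannot miss it.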

\subsection{Locks}\label{s:wu_locks}

The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc.
Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement.
For example, the following waits until the thread has acquired lock @l1@ or locks @l2@ and @l3@.
\begin{cfa}
owner_lock l1, l2, l3;
waituntil( l1 ) { ... }
or waituntil( l2 ) { ... }
and waituntil( l3 ) { ... }
\end{cfa}
Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in the lock's queue of waiting threads.
When the lock schedules this thread, it unblocks and runs the code block associated with the lock and then releases the lock.

In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each lock's waiting queue.
When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked.
Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released.
Immediately releasing the lock prevents the waiting thread from holding multiple locks and potentially introducing a deadlock.
As such, the only lock nodes left to unregister at the end of the statement are the ones whose code blocks have not run.

\subsection{Timeouts}

A timeout for the @waituntil@ statement is a duration passed to \lstinline[deletekeywords={timeout}]{timeout}, \eg:
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
\multicolumn{2}{@{}l@{}}{\lstinline{Duration D1\{ 1`ms \}, D2\{ 2`ms \}, D3\{ 3`ms \};}} \\
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( timeout( D1 ) ) {}
or waituntil( timeout( D2 ) ) {}
or waituntil( timeout( D3 ) ) {}
\end{cfa}
&
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( timeout( min( D1, D2, D3 ) ) ) {}


\end{cfa}
\end{tabular}
\end{cquote}
These two examples are basically equivalent.
Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers.
Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding.
In the following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, respectively.
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ); and waituntil( timeout( 1`s ) );
or waituntil( i << C2 ); and waituntil( timeout( 3`s ) );
\end{cfa}
If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds.
Note, the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.
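
For comparison, Go expresses a timeout as a channel: @time.After@ returns a channel that delivers a value once the duration has elapsed, so it can appear as an ordinary \lstinline[language=go]{case}.
The sketch below mirrors the two uses described above, a timeout joined by @or@ and a minimal delay joined by @and@; the helper names @recvOrTimeout@, @recvAndDelay@ and @demo@ are hypothetical.
\begin{lstlisting}[language=go,literate=]
package main

import (
	"fmt"
	"time"
)

// recvOrTimeout corresponds to a channel read joined by "or" with a timeout:
// it reports false if d elapses before a value arrives.
func recvOrTimeout(c <-chan int, d time.Duration) (int, bool) {
	select {
	case v := <-c:
		return v, true
	case <-time.After(d):
		return 0, false
	}
}

// recvAndDelay corresponds to a channel read joined by "and" with a timeout:
// it returns only after a value has been read and at least d has elapsed.
func recvAndDelay(c <-chan int, d time.Duration) int {
	timer := time.After(d)
	v := <-c
	<-timer
	return v
}

// demo exercises both helpers and reports what happened.
func demo() (timedOut bool, value int, delayed bool) {
	c := make(chan int, 1)
	_, ok := recvOrTimeout(c, 10*time.Millisecond) // nothing sent: times out
	c <- 7
	start := time.Now()
	value = recvAndDelay(c, 20*time.Millisecond)
	return !ok, value, time.Since(start) >= 20*time.Millisecond
}

func main() {
	fmt.Println(demo())
}
\end{lstlisting}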

The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the unslept time remaining (zero if the full duration elapsed).
Instead, \lstinline[deletekeywords={timeout}]{timeout} returns a type that supports the @is_selectable@ trait, allowing the type system to select the correct overloaded routine for this context.
For the @waituntil@, it is more idiomatic for the \lstinline[deletekeywords={timeout}]{timeout} to use the same syntax as other blocking operations instead of having a special language clause.

\subsection{Channels}\label{s:wu_chans}

Channels require more complexity to allow synchronous multiplexing.
For locks, when an outside thread releases a lock and unblocks the waituntil thread (WUT), the lock's mutual-exclusion (MX) property is passed to the WUT (no spinning locks).
For futures, the outside thread delivers a value to the future and unblocks any waiting threads, including WUTs.
In either case, after the WUT unblocks, it is safe to execute its corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future.
Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT.
However, for channels, there is a race condition that poses an issue.
If the outside thread inserts into the channel and unblocks the WUT, there is a race where another thread can remove the channel data, so after the WUT unblocks and attempts to remove from the buffer, it fails, and the WUT must reblock (busy waiting).
This scenario is a \gls{toctou} race that needs to be resolved.
To close the race, the outside thread must detect this case and insert directly into the left-hand side of the channel expression (@i << chan@) rather than into the channel, and then unblock the WUT.
Now the unblocked WUT is guaranteed to have a satisfied resource and its code block can be safely executed.
The insertion circumvents the channel buffer via the wait-morphing in the \CFA channel implementation \see{Section~\ref{s:chan_impl}}, allowing @waituntil@ channel unblocking to not be special-cased.

Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg:
\begin{cfa}
waituntil( i << A ) {} and waituntil( i << B ) {}
or waituntil( i << C ) {} and waituntil( i << D ) {}
\end{cfa}
If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or the code blocks for @C@ and @D@.
However, four outside threads can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks.
This case introduces a race with complexity that increases with the size of the @waituntil@ statement.
However, due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses.
This approach is a poor solution for two reasons.
It is possible that once all the locks are acquired the subtree is not satisfied and the locks must be released.
This work incurs a high cost for signalling threads and heavily increases contention on internal channel locks.
Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators, since they cannot be supported without significant complexity and a significant effect on @waituntil@ performance.

It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
Consider the following example where thread 1 is reading and threads 2 and 3 are writing to channels @A@ and @B@ concurrently.
527\begin{cquote}
528\begin{tabular}{@{}l|l|l@{}}
529\multicolumn{3}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\
530thread 1 & thread 2 & thread 3 \\
531\begin{cfa}
532waituntil( i << A ) {}
533or waituntil( i << B ) {}
534\end{cfa}
535&
536\begin{cfa}
537A << 1;
538
539\end{cfa}
540&
541\begin{cfa}
542B << 2;
543
544\end{cfa}
545\end{tabular}
546\end{cquote}
For thread 1 to have exclusive-or semantics, it must consume from exactly one of @A@ or @B@.
As such, threads 2 and 3 must race to establish the winning clause of the @waituntil@ in thread 1.
This race is consolidated by threads 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}.
The winner bypasses the channel, inserts directly into the WUT's left-hand operand (@i@), and signals thread 1.
The loser continues checking if there is space in the channel, and if so performs the channel insert operation with a possible signal of a waiting remove thread;
otherwise, if there is no space, the loser blocks.
It is important the race occurs \emph{before} operating on the channel, because channel actions are different with respect to each thread.
If the race were consolidated after the operation, both threads 2 and 3 could potentially write into @i@ concurrently.
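
The race to establish the winning clause can be sketched with C11 atomics, as a rough illustration only.
The names @race_slot@, @clause@, and @try_win@ are hypothetical and are not taken from the \CFA runtime; the sketch only assumes, as described above, that signalling threads race with a \gls{cas} to install the address of the winning clause before touching the channel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for a waituntil clause; not the CFA select_node.
typedef struct clause { int id; } clause;

// One race slot per waituntil statement: NULL means no clause has won yet.
typedef struct race_slot { _Atomic(clause *) winner; } race_slot;

// Called by a signalling thread BEFORE it operates on the channel.
// Returns true only for the single caller that installs its clause.
static bool try_win( race_slot * slot, clause * mine ) {
    clause * expected = NULL;
    return atomic_compare_exchange_strong( &slot->winner, &expected, mine );
}
```

Because the compare-and-swap happens before any channel operation, at most one signaller can go on to write into @i@; every other signaller observes a non-null winner and falls back to the normal channel path.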
555
556Channels introduce another interesting implementation issue.
557Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause.
558This poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
559Consider the following example, alongside a described ordering of events to highlight the race.
560\begin{cquote}
561\begin{tabular}{@{}l|l@{}}
562\multicolumn{2}{@{}l@{}}{\lstinline{channel A, B; // zero size channels}} \\
563thread 1 & thread 2 \\
564\begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}]
565waituntil( @i << A@ ) {}
566or waituntil( #i << B# ) {}
567\end{cfa}
568&
569\begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}]
570waituntil( #B << 2# ) {}
571or waituntil( @A << 1@ ) {}
572\end{cfa}
573\end{tabular}
574\end{cquote}
575Assume thread 1 executes first, registers with channel @A@ and proceeds, since it is empty, and then is interrupted before registering with @B@.
576Thread 2 similarly registers with channel @B@, and proceeds, since it does not have space to insert, and then is interrupted before registering with @A@.
At this point, threads 1 and 2 resume execution.
There is now a race that must be dealt with on two fronts.
If threads 1 and 2 only race to the \gls{cas}, \ie a clause in their own @waituntil@, thread 1 may think that it successfully removed from @B@, and thread 2 may think it successfully inserted into @A@, when only one of these operations occurs.
580
581The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels.
This approach eliminates the race shown above since threads 1 and 2 cannot both be registering at the same time.
583However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic.
584Not all types in a @waituntil@ have an internal lock, and when using non-channel types acquiring all the locks incurs extra unneeded overhead.
585Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
586This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to be pending.
587If it succeeds, it then attempts to set the other thread's @waituntil@ race pointer to its success value.
If either thread successfully sets the other thread's @waituntil@ race pointer, then the operation can proceed; if not, the signalling thread sets its own race pointer back to the initial value and repeats.
589This retry mechanism can potentially introduce a livelock, but in practice a livelock here is highly unlikely.
590Furthermore, the likelihood of a livelock here is zero unless the program is in the niche case of having two or more exclusive-or @waituntil@s with two or more clauses in reverse order of priority.
This livelock case can be fully eliminated by acquiring locks, as Go does, or if a \gls{dcas} instruction is available.
592If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost.
593This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner.
594The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}.
595
596\begin{figure}
597\begin{cfa}
bool pending_set_other( select_node & other, select_node & mine ) {
	unsigned long int cmp_status = UNSAT;

	// Try to set other status, if we succeed break and return true
	while( !CAS( other.clause_status, &cmp_status, SAT ) ) {
		if ( cmp_status == SAT )
			return false; // If other status is SAT we lost so return false

		// Toggle own status flag to allow other thread to potentially win
		*mine.clause_status = UNSAT;

		// Reset compare flag
		cmp_status = UNSAT;

		// Attempt to set own status flag back to PENDING to retry
		if ( !CAS( mine.clause_status, &cmp_status, PENDING ) )
			return false; // If we fail then we lost so return false

		// Reset compare flag
		cmp_status = UNSAT;
	}
	return true;
}
621\end{cfa}
622\caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol}
623\label{f:WU_DeadlockAvoidance}
624\end{figure}
625
626Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support.
627These exception mechanisms are supported through the @on_selected@ routine.
628This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown.
629
630\subsection{Guards and Statement Predicate}\label{s:wu_guards}
631
632It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished.
633In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which along with guards, make the problem of checking for completion of the statement difficult.
634Consider the @waituntil@ in Figure~\ref{f:WU_ComplexPredicate}.
635When the @waituntil@ thread wakes up, checking if the statement is complete is non-trivial.
636The predicate that will return if the statement in Figure~\ref{f:WU_ComplexPredicate} is satisfied is the following.
637\begin{cfa}
638A && B || C || !GA && B || !GB && A || !GA && !GB && !GC
639\end{cfa}
640Which simplifies to:
641\begin{cfa}
642( A || !GA ) && ( B || !GB ) || C || !GA && !GB && !GC
643\end{cfa}
644Checking a predicate this large with each iteration is expensive so \uC and \CFA both take steps to simplify checking statement completion.
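
As a sanity check on the first (unsimplified) predicate above, the following C sketch compares it against a direct evaluation of the statement with guard-disabled clauses dropped.
The function names are illustrative only, and the check assumes that a resource whose guard is false is never reported as available.

```c
#include <stdbool.h>

// Completion predicate as given in the text:
// A && B || C || !GA && B || !GB && A || !GA && !GB && !GC
static bool full_predicate( bool A, bool B, bool C, bool GA, bool GB, bool GC ) {
    return (A && B) || C || (!GA && B) || (!GB && A) || (!GA && !GB && !GC);
}

// Reference: evaluate "(A and B) or C" after dropping guard-disabled clauses.
static bool pruned_statement( bool A, bool B, bool C, bool GA, bool GB, bool GC ) {
    if ( !GA && !GB && !GC ) return true;        // statement is skipped entirely
    bool have_and = GA || GB;                    // does any and-operand remain?
    bool and_part = (!GA || A) && (!GB || B);    // a dropped operand is neutral
    return (have_and && and_part) || (GC && C);
}

// Count disagreements over all inputs where no disabled resource is available.
static int count_mismatches( void ) {
    int bad = 0;
    for ( int m = 0; m < 64; m += 1 ) {
        bool A = m & 1, B = m & 2, C = m & 4;
        bool GA = m & 8, GB = m & 16, GC = m & 32;
        if ( (A && !GA) || (B && !GB) || (C && !GC) ) continue;
        if ( full_predicate( A, B, C, GA, GB, GC )
             != pruned_statement( A, B, C, GA, GB, GC ) ) bad += 1;
    }
    return bad;
}
```

Under the stated assumption, the two functions agree on every input, which illustrates why the guard terms must appear in the predicate when guards are evaluated as part of each completion check.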
645
646\begin{figure}
647\begin{cfa}
648when( GA ) waituntil( A ) {}
649and when( GB ) waituntil( B ) {}
650or when( GC ) waituntil( C ) {}
651\end{cfa}
652\caption{\lstinline{waituntil} with a non-trivial predicate}
653\label{f:WU_ComplexPredicate}
654\end{figure}
655
656In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
657The internal nodes also store the statuses of the two subtrees beneath them.
658When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement.
659Once the root of the tree has both subtrees marked as @true@ then the statement is complete.
660As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again.
661To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation.
A diagram of the tree for the statement in Figure~\ref{f:WU_ComplexPredicate} is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@.
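
The percolation step can be illustrated with a small C sketch.
The node layout and the names @node@, @eval@, and @mark_available@ are assumptions made for illustration and do not reflect the actual \uC data structures; pruning of satisfied subtrees and guard handling are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { LEAF, AND, OR } kind;

// Hypothetical node layout for a tree over the waited-on resources.
typedef struct node {
    kind op;
    bool status;                  // leaf: resource available; internal: subtree satisfied
    struct node * left, * right;  // NULL for leaves
    struct node * parent;         // NULL for the root
} node;

// Recompute an internal node from the statuses of its two children.
static bool eval( const node * n ) {
    return n->op == AND ? (n->left->status && n->right->status)
                        : (n->left->status || n->right->status);
}

// Mark a leaf available and percolate upward.
// Returns true when the update satisfies the root, i.e., the statement is complete.
static bool mark_available( node * leaf ) {
    leaf->status = true;
    node * n = leaf;
    while ( n->parent != NULL ) {
        n = n->parent;
        if ( !eval( n ) ) return false;   // this subtree is still unsatisfied
        n->status = true;
    }
    return true;                          // every level up to the root is satisfied
}
```

Each update touches only the path from one leaf to the root, so the cost of a status change is bounded by the depth of the tree rather than by the size of the whole predicate.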
663
664\begin{figure}
665\begin{center}
666\input{diagrams/uCpp_select_tree.tikz}
667\end{center}
668\caption{\uC select tree modification}
669\label{f:uC_select_tree}
670\end{figure}
671
672The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
673The waiting condition of the @waituntil@ statement can be represented as a predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
674In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the @waituntil@.
Leveraging the compiler, a predicate routine is generated per @waituntil@ that, when passed the statuses of the resources, returns @true@ when the @waituntil@ is done, and @false@ otherwise.
676To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
677The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values.
678For example, the following would be generated for the @waituntil@ shown in Figure~\ref{f:WU_ComplexPredicate}.
679\begin{cfa}
680// statement completion predicate
681bool check_completion( select_node * nodes ) {
682 return nodes[0].status && nodes[1].status || nodes[2].status;
683}
684
685// skip statement if all guards false
686if ( GA || GB || GC ) {
687 select_node nodes[3];
688 nodes[0].status = !GA && GB; // A's status
689 nodes[1].status = !GB && GA; // B's status
690 nodes[2].status = !GC; // C's status
691
692 // ... rest of waituntil codegen ...
693
694}
695\end{cfa}
696
\uC's @_Select@ supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses.
698In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied.
699
700\begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}]
Future_ISM<int> A, B, C, D, E;
702_Select( @A || B && C@ ) { ... }
703and _Select( @D && E@ ) { ... }
704\end{lstlisting}
This is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after both resources are available.
706
707In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons.
708As a motivating example, suppose \CFA supported operators inside clauses as in:
709\begin{cfa}
710owner_lock A, B, C, D;
711waituntil( A && B ) { ... }
712or waituntil( C && D ) { ... }
713\end{cfa}
714If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold and wait situation.
715Other semantics are needed to ensure this operation is safe.
716One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance};
717however, that opens the potential for livelock.
718Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally.
719One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach.
720This approach does not work due to \gls{toctou} issues;
721it is impossible to ensure that the full set of resources are available without holding them all first.
Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems involved, but it was not deemed an important feature when taking into account the runtime cost that must be paid to handle this situation.
723The problem of operators inside clauses also becomes a difficult issue to handle when supporting channels.
724It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run, but that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
725
726\subsection{The full \lstinline{waituntil} picture}
727
Now that the details have been discussed, the full pseudocode of the @waituntil@ is presented in Figure~\ref{f:WU_Full_Impl}.
729Some things to note are as follows.
730The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}.
731The @when_conditions@ array is used to store the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards.
732As discussed in Section~\ref{s:wu_chans}, this pseudocode includes conditional code-block execution based on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure all available channel resources have their corresponding code block run.
733
734\begin{figure}
735\begin{cfa}
736bool when_conditions[N];
737for ( node in nodes ) $\C[3.75in]{// evaluate guards}$
738 if ( node has guard )
739 when_conditions[node] = node_guard;
740 else
741 when_conditions[node] = true;
742
743if ( any when_conditions[node] == true ) {
744
select_node nodes[N]; $\C{// declare N select nodes}$
746try {
747 // ... set statuses for nodes with when_conditions[node] == false ...
748
749 for ( node in nodes ) $\C{// register nodes}$
750 if ( when_conditions[node] )
751 register_select( resource, node );
752
753 while ( !check_completion( nodes ) ) { $\C{// check predicate}$
754 // block
755 for ( resource in waituntil statement ) { $\C{// run true code blocks}$
756 if ( check_completion( nodes ) ) break;
757 if ( resource is avail )
758 try {
759 if( on_selected( resource ) ) $\C{// conditionally run block}$
760 run code block
761 } finally $\C{// for exception safety}$
762 unregister_select( resource, node ); $\C{// immediate unregister}$
763 }
764 }
765} finally { $\C{// for exception safety}$
766 for ( registered nodes in nodes ) $\C{// deregister nodes}$
767 if ( when_conditions[node] && unregister_select( resource, node )
768 && on_selected( resource ) )
769 run code block $\C{// run code block upon unregister}\CRT$
770}
771
772}
773\end{cfa}
774\caption{Full \lstinline{waituntil} Pseudocode Implementation}
775\label{f:WU_Full_Impl}
776\end{figure}
777
778\section{Waituntil Performance}
779
780Similar facilities to @waituntil@ are discussed in Section~\ref{s:History} covering C, Ada, Rust, \CC, and OCaml.
781However, these facilities are either not meaningful or feasible to benchmark against.
782The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space.
Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement, so it is not meaningful to benchmark them against the @waituntil@, which cannot wait on this resource.
784Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach.
785OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.
786
787The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA @waituntil@ statement are the Go \lstinline[language=Go]{select} statement and the \uC \lstinline[language=uC++]{_Select} statement.
788As such, two microbenchmarks are presented, one for Go and one for \uC to contrast this feature.
789Given the differences in features, polymorphism, and expressibility between @waituntil@ and \lstinline[language=Go]{select}, and \uC \lstinline[language=uC++]{_Select}, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner.
790
791\subsection{Channel Benchmark}
792
793The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels.
794The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads.
The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on.
796Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels.
797For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is:
798\begin{cfa}
799for ()
800 waituntil( val << chans[0] ); or waituntil( val << chans[1] );
801 or waituntil( val << chans[2] ); or waituntil( val << chans[3] );
802\end{cfa}
803A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds.
The first benchmark measures throughput of the producers and consumers synchronously waiting on the channels and the second has the threads asynchronously wait on the channels using the Go @default@ and \CFA @else@ clause.
805The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} respectively.
806
807\begin{figure}
808 \centering
809 \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
810 \subfloat[AMD]{
811 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}}
812 }
813 \subfloat[Intel]{
814 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}}
815 }
816 \bigskip
817
818 \subfloat[AMD]{
819 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}}
820 }
821 \subfloat[Intel]{
822 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}}
823 }
824 \bigskip
825
826 \subfloat[AMD]{
827 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}}
828 }
829 \subfloat[Intel]{
830 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}}
831 }
832 \caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).}
833 \label{f:select_contend_bench}
834\end{figure}
835
836\begin{figure}
837 \centering
838 \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
839 \subfloat[AMD]{
840 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}}
841 }
842 \subfloat[Intel]{
843 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}}
844 }
845 \bigskip
846
847 \subfloat[AMD]{
848 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}}
849 }
850 \subfloat[Intel]{
851 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}}
852 }
853 \bigskip
854
855 \subfloat[AMD]{
856 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}}
857 }
858 \subfloat[Intel]{
859 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}}
860 }
861 \caption{The asynchronous multiplexing channel benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).}
862 \label{f:select_spin_bench}
863\end{figure}
864
865Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} have similar results when comparing \lstinline[language=Go]{select} and @waituntil@.
In the AMD benchmarks (left column), the performance is very similar as the number of cores scales.
The AMD machine has a high cache-contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go.
Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low core counts, where Go has significantly better performance, due to an optimization in its scheduler.
869Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}.
In the Intel benchmarks (right column), \CFA performs better than Go as the number of cores scales past 2/4 and as the number of clauses increases.
871This difference is due to Go's acquiring all channel locks when registering and unregistering channels on a \lstinline[language=Go]{select}.
Go then is holding a lock for every channel, resulting in worse performance as the number of channels increases.
873In \CFA, since races are consolidated without holding all locks, it scales much better both with cores and clauses since more work can occur in parallel.
874This scalability difference is more significant on the Intel machine than the AMD machine since the Intel machine has lower cache contention costs.
875
876The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks.
877There are pathological cases where Go's throughput has significant jitter.
878Consider a producer and consumer thread, @P1@ and @C1@, selecting from both channels @A@ and @B@.
879\begin{cquote}
880\begin{tabular}{@{}ll@{}}
881@P1@ & @C1@ \\
882\begin{cfa}
883waituntil( A << i ); or waituntil( B << i );
884\end{cfa}
885&
886\begin{cfa}
887waituntil( val << A ); or waituntil( val << B );
888\end{cfa}
889\end{tabular}
890\end{cquote}
891Additionally, there is another producer and consumer thread, @P2@ and @C2@, operating solely on @B@.
892\begin{cquote}
893\begin{tabular}{@{}ll@{}}
894@P2@ & @C2@ \\
895\begin{cfa}
896B << val;
897\end{cfa}
898&
899\begin{cfa}
900val << B;
901\end{cfa}
902\end{tabular}
903\end{cquote}
904In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
Interestingly, this case may not be as pathological as it seems.
906If the set of channels belonging to a select have channels that overlap with the set of another select, these statements lose the ability to operate in parallel.
907The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism.
908Comparison of this pathological case is shown in Table~\ref{t:pathGo}.
909The AMD results highlight the worst case scenario for Go since contention is more costly on this machine than the Intel machine.
910
911\begin{table}[t]
912\centering
913\setlength{\extrarowheight}{2pt}
914\setlength{\tabcolsep}{5pt}
915
\caption{Throughput (channel operations per second) of \CFA and Go for a pathological case of contention in Go's select implementation}
917\label{t:pathGo}
918\begin{tabular}{r|r|r}
919 & \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c}{Go} \\
920 \hline
921 AMD & \input{data/nasus_Order} \\
922 \hline
923 Intel & \input{data/pyke_Order}
924\end{tabular}
925\end{table}
926
927Another difference between Go and \CFA is the order of clause selection when multiple clauses are available.
928Go \emph{randomly} selects a clause~\cite{go:select}, but \CFA chooses in the order clauses are listed.
929This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance.
930In the previous example, threads @P1@ and @C1@ prioritize channel @A@ in the @waituntil@, which can reduce contention for threads @P2@ and @C2@ accessing channel @B@.
If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be significantly less due to extra contention on channel @B@.
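
The difference in clause-selection policy can be illustrated with two small C functions.
This is a sketch only: @first_available@ models choosing in listed order, while @pick_uniform@ models a uniformly random choice driven by a caller-supplied random value @r@; neither is the actual \CFA or Go implementation.

```c
#include <stdbool.h>

// Listed-order policy: return the index of the first available clause, or -1 if none.
static int first_available( const bool avail[], int n ) {
    for ( int i = 0; i < n; i += 1 )
        if ( avail[i] ) return i;
    return -1;
}

// Random policy: return the (r mod count)-th available clause, or -1 if none.
// The caller supplies r from any random source.
static int pick_uniform( const bool avail[], int n, unsigned r ) {
    int count = 0;
    for ( int i = 0; i < n; i += 1 ) count += avail[i] ? 1 : 0;
    if ( count == 0 ) return -1;
    int k = (int)( r % (unsigned)count );
    for ( int i = 0; i < n; i += 1 )
        if ( avail[i] && k-- == 0 ) return i;
    return -1;   // not reached when count > 0
}
```

With the listed-order policy, a thread that lists channel @A@ first only falls through to @B@ when @A@ is unavailable, which is how the implicit priority keeps traffic off a channel shared with other threads.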
932
933\subsection{Future Benchmark}
934
935The future benchmark compares \CFA's @waituntil@ with \uC's \lstinline[language=uC++]{_Select}, with both utilities waiting on futures.
936While both statements have very similar semantics, supporting @and@ and @or@ operators, \lstinline[language=uC++]{_Select} can only wait on futures, whereas the @waituntil@ is polymorphic.
937As such, the underlying implementation of the operators differs between @waituntil@ and \lstinline[language=uC++]{_Select}.
938The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate.
939
940\begin{figure}
941 \centering
942 \subfloat[AMD Future Synchronization Benchmark]{
943 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
944 \label{f:futureAMD}
945 }
946 \subfloat[Intel Future Synchronization Benchmark]{
947 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
948 \label{f:futureIntel}
949 }
950 \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
952 \label{f:futurePerf}
953\end{figure}
954
955This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements.
956The benchmark is indirect since the performance of futures in \CFA and \uC differ by a significant margin.
957The experiment has a server, which cycles fulfilling three futures, @A@, @B@, and @C@, and a client, which waits for these futures to be fulfilled using four different kinds of predicates given in \CFA:
958\begin{cquote}
959\begin{tabular}{@{}l|l@{}}
960OR & AND \\
961\hline
962\begin{cfa}
963waituntil( A ) { get( A ); }
964or waituntil( B ) { get( B ); }
965or waituntil( C ) { get( C ); }
966\end{cfa}
967&
968\begin{cfa}
969waituntil( A ) { get( A ); }
970and waituntil( B ) { get( B ); }
971and waituntil( C ) { get( C ); }
972\end{cfa}
973\\
974\multicolumn{2}{@{}c@{}}{} \\
975AND-OR & OR-AND \\
976\hline
977\begin{cfa}
978waituntil( A ) { get( A ); }
979and waituntil( B ) { get( B ); }
980or waituntil( C ) { get( C ); }
981\end{cfa}
982&
983\begin{cfa}
984@(@ waituntil( A ) { get( A ); }
985or waituntil( B ) { get( B ); } @)@
986and waituntil( C ) { get( C ); }
987\end{cfa}
988\end{tabular}
989\end{cquote}
The server and client use a low-cost synchronization after each fulfillment, so the server does not race ahead of the client.
991
992Results of this benchmark are shown in Figure~\ref{f:futurePerf}.
Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation.
994In detail, \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation.
995However, the bars for both systems have similar height patterns across the experiments.
996The @OR@ column for \CFA is more performant than the other \CFA predicates, due to the special-casing of @waituntil@ statements with only @or@ operators.
997For both \uC and \CFA, the @AND@ experiment is the least performant, which is expected since all three futures need to be fulfilled for each statement completion.
998Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special OR case), whereas \uC has lower variation on the Intel.
999Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates.