% ======================================================================
% ======================================================================
\chapter{Waituntil}\label{s:waituntil}
% ======================================================================
% ======================================================================

Consider the following motivating problem.
There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom.
Each stall has its own lock since only one person may occupy a stall at a time.
Humans solve this problem in the following way.
They check if all of the stalls are occupied.
If not, they enter and claim an available stall.
If all stalls are occupied, people queue and watch the stalls until one is free, and then enter and lock the stall.
This solution is easily implemented on a computer, if all threads wait on all stalls and agree to queue.

Now the problem is extended.
Some stalls are wheelchair accessible and some stalls have specific gender identification.
Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria.
Immediately, the problem becomes more difficult.
A single queue no longer fully solves the problem.
What happens when there is a stall available that the person at the front of the queue cannot choose?
The na\"ive solution has each thread spin indefinitely, continually checking every matching kind of stall until a suitable one is free.
This approach is insufficient since it wastes cycles and results in unfairness among waiting threads, as a thread can acquire the first matching stall without regard to the waiting time of other threads.
Waiting for the first appropriate stall (resource) that becomes available without spinning is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria.

\section{History of Synchronous Multiplexing}
There is a history of tools that provide \gls{synch_multiplex}.
Some well-known \gls{synch_multiplex} tools include the Unix system utilities @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}.
The concept and theory surrounding \gls{synch_multiplex} were introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85},
\begin{quote}
A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85}
\end{quote}
The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}.

Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid.
For example,
\begin{cfa}[mathescape]
(@G1@(x) $\rightarrow$ P @|@ @G2@(y) $\rightarrow$ Q )
\end{cfa}
waits for either channel @x@ or @y@ to have a value, provided the corresponding guard @G1@ or @G2@ is true;
if only one guard is true, only one channel receives, and if both guards are false, no receive occurs.
% extended CSP with a \gls{synch_multiplex} construct @ALT@, which waits for one resource to be available and then executes a corresponding block of code.
In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on.
If a guard is false, then the resource it guards is not in the set of resources being waited on.
If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues.
Guards can be simulated using @if@ statements as shown in~\cite[rule~2.4, p.~183]{Roscoe88}
\begin{lstlisting}[basicstyle=\rm,mathescape]
ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT($\,g$ $P$, $G$ ), $\neg\,b$ ALT( $G$ ) )                        (boolean guard elim).
\end{lstlisting}
but requires $2^N-1$ @if@ statements, where $N$ is the number of guards.
The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time.
Figure~\ref{f:wu_if} shows an example of applying rule 2.4 for three guards.
Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@.

\begin{figure}
\centering
\begin{lrbox}{\myboxA}
\begin{cfa}
when( G1 )
        waituntil( R1 ) S1
or when( G2 )
        waituntil( R2 ) S2
or when( G3 )
        waituntil( R3 ) S3







\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
if ( G1 )
        if ( G2 )
                if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3
                else waituntil( R1 ) S1 or waituntil( R2 ) S2
        else
                if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3
                else waituntil( R1 ) S1
else
        if ( G2 )
                if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3
                else waituntil( R2 ) S2
        else
                if ( G3 ) waituntil( R3 ) S3
\end{cfa}
\end{lrbox}

\subfloat[Guards]{\label{l:guards}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB}
\caption{\CFA guard simulated with \lstinline{if} statement.}
\label{f:wu_if}
\end{figure}

When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
While CSP waits on channels, the earliest known implementation of synchronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors.
The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
@select@ blocks until either some subset of file descriptors are available or the timeout expires.
All file descriptors that are ready are returned by modifying the argument sets to only contain the ready descriptors.

This early implementation differs from the theory presented in CSP: when the call to @select@ returns, it may provide more than one ready file descriptor.
As such, @select@ has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics.
It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors.
@select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst case is the largest numbered file descriptor.
@poll@ reduces the size of the interest sets by changing from a bit mask to a linked data structure, making the size independent of the file-descriptor values.
@epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call.

These early \gls{synch_multiplex} tools interact directly with the operating system, and are often used to communicate among processes.
Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads.
An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}.
The Ada @select@ statement allows a task object, with its own thread, to multiplex over a subset of asynchronous calls to its methods.
The Ada @select@ statement has the same exclusive-or semantics and guards as the Occam ALT;
however, it multiplexes over methods rather than channels.

\begin{figure}
\begin{lstlisting}[language=ada,literate=]
task type buffer is -- thread
        ... -- buffer declarations
        count : integer := 0;
begin -- thread starts here
        loop
                select
                        when count < Size => -- guard
                        accept insert( elem : in ElemType ) do  -- method
                                ... -- add to buffer
                                count := count + 1;
                        end;
                        -- executed if this accept called
                or
                        when count > 0 => -- guard
                        accept remove( elem : out ElemType ) do  -- method
                                ... -- remove and return from buffer via parameter
                                count := count - 1;
                        end;
                        -- executed if this accept called
                or delay 10.0;  -- unblock after 10 seconds without call
                or else -- do not block, cannot appear with delay
                end select;
        end loop;
end buffer;
buf : buffer; -- create task object and start thread in task body
\end{lstlisting}
\caption{Ada Bounded Buffer}
\label{f:BB_Ada}
\end{figure}

Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
Note, a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
Then the appropriate \lstinline[language=ada]{accept} method is run with the caller's arguments.
Hence, the @select@ statement provides rendezvous points for threads, rather than providing channels with message passing.
The \lstinline[language=ada]{select} statement also provides a timeout and @else@ (nonblocking), which changes synchronous multiplexing to asynchronous.
Now the thread polls rather than blocks.

Another example of programming-language \gls{synch_multiplex} is Go's @select@ statement with channels~\cite{go:selectref}.
Figure~\ref{l:BB_Go} shows the outline of a bounded buffer implemented with a Go routine.
Here, two channels are used for inserting and removing by client producers and consumers, respectively.
(The @term@ and @finish@ channels are used to synchronize with the program main.)
Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam, and has associated code blocks for each clause, like ALT and Ada.
However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}.
Go also provides a timeout via a channel and a @default@ clause, like the Ada @else@, for asynchronous multiplexing.

\begin{figure}
\centering

\begin{lrbox}{\myboxA}
\begin{lstlisting}[language=go,literate=]
func main() {
        insert := make( chan int, Size )
        remove := make( chan int, Size )
        term := make( chan string )
        finish := make( chan string )

        buf := func() {
                var i int
          L: for {
                        select { // wait for message
                          case i = <- insert:
                          case <- term: break L
                        }
                        remove <- i;
                }
                finish <- "STOP" // completion
        }
        go buf() // start thread in buf
}



\end{lstlisting}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{lstlisting}[language=uC++]
_Task BoundedBuffer {
        ... // buffer declarations
        int count = 0;
  public:
        void insert( int elem ) {
                ... // add to buffer
                count += 1;
        }
        int remove() {
                ... // remove and return from buffer
                count -= 1;
        }
  private:
        void main() {
                for ( ;; ) {
                        _Accept( ~BoundedBuffer ) break;
                        or _When ( count < Size ) _Accept( insert );
                        or _When ( count > 0 ) _Accept( remove );
                }
        }
};
BoundedBuffer buf; // start thread in main method
\end{lstlisting}
\end{lrbox}

\subfloat[Go]{\label{l:BB_Go}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB}

\caption{Bounded Buffer}
\label{f:AdaMultiplexing}
\end{figure}

Finally, \uC provides \gls{synch_multiplex} with Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}.
The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement.
Both the @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as timeout and @else@ clauses.

There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired, as both rely on polling to wait on multiple resources.

\section{Other Approaches to Synchronous Multiplexing}

To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source.
For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
Similarly, actor systems circumvent the \gls{synch_multiplex} problem because actors only block when waiting for the next message, never within a behaviour.
While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
Consider the case where a thread has a single source of communication and it wants a set of @N@ resources.
It must sequentially request the @N@ resources and wait for each response.
While waiting for the @N@ responses, it can receive other communication, which it must either save and postpone, or discard.
% If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.

\section{\CFA's Waituntil Statement}

The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
There is already a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods, so the @waituntil@ focuses on synchronizing over other resources.
All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @_Select@ supports futures, and Ada's @select@ supports monitor method calls.
The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any object that satisfies the trait in Figure~\ref{f:wu_trait}.
No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.

\begin{figure}
\begin{cfa}
forall(T & | sized(T))
trait is_selectable {
        // For registering a waituntil stmt on a selectable type
        bool register_select( T &, select_node & );

        // For unregistering a waituntil stmt from a selectable type
        bool unregister_select( T &, select_node & );

        // on_selected is run on the selecting thread prior to executing
        // the statement associated with the select_node
        bool on_selected( T &, select_node & );
};
\end{cfa}
\caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.}
\label{f:wu_trait}
\end{figure}

Currently, locks, channels, futures, and timeouts are supported by the @waituntil@ statement, and support can be expanded through the @is_selectable@ trait as other use cases arise.
The @waituntil@ statement supports guarded clauses, both @or@ and @and@ semantics, and provides an @else@ for asynchronous multiplexing.
Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which waits for either @Lock@ to be available, \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled, \emph{or} for a timeout of one second.

\begin{figure}
\begin{cfa}
future(int) Future;
channel(int) Channel;
owner_lock Lock;
int i = 0;

waituntil( Lock ) { ... }
or when( i == 0 ) waituntil( i << Channel ) { ... }
and waituntil( Future ) { ... }
or waituntil( timeout( 1`s ) ) { ... }
// else { ... }
\end{cfa}
\caption{Example of \CFA's waituntil statement}
\label{f:wu_example}
\end{figure}

\section{Waituntil Semantics}

The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie the @and@, @or@, @when@ guard, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like channels, locks, and futures.

\subsection{Statement Semantics}

The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam.
The clauses have an exclusive-or relationship, where the first available clause is run, and only one clause is run.
\CFA's @or@ semantics differ from the ALT semantics in one way: instead of randomly picking a clause when multiple are available, the first available clause in the @waituntil@ is executed.
For example, in the following, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of the @waituntil@ clauses.
\begin{cfa}
future(int) bar, foo;

waituntil( foo ) { ... }
or waituntil( bar ) { ... }
\end{cfa}

The \CFA @and@ semantics match the @and@ semantics of the \uC \lstinline[language=uC++]{_Select}.
When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}.
When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code block, and then the thread waits again until either the next clause becomes available or the @waituntil@ predicate is satisfied.
These semantics allow work to be done in parallel while synchronizing over a set of resources, and furthermore, give a good reason to use the @and@ operator.
If the @and@ operator waited for all clauses to be available before running any code blocks, it would be equivalent to just acquiring those resources consecutively with a sequence of @waituntil@ statements.

As in normal C expressions, the @and@ operator binds more tightly than the @or@.
To give an @or@ operator higher precedence, parentheses are used.
For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parentheses.
\begin{cfa}
@(@ waituntil( A ) { ... }              // bind tightly to or
or waituntil( B ) { ... } @)@
and waituntil( C ) { ... }
\end{cfa}

The guards in the @waituntil@ statement are called @when@ clauses.
Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run.
Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@.
In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}.
When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operator from the statement.
For example, in the following, the two @waituntil@ statements are semantically equivalent.

\begin{lrbox}{\myboxA}
\begin{cfa}
when( true ) waituntil( A ) { ... }
or when( false ) waituntil( B ) { ... }
and waituntil( C ) { ... }
\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
waituntil( A ) { ... }
and waituntil( C ) { ... }

\end{cfa}
\end{lrbox}

\begin{tabular}{@{}lcl@{}}
\usebox\myboxA & $\equiv$ & \usebox\myboxB
\end{tabular}

The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada.
If the statement predicate cannot be immediately satisfied and there is an @else@ clause, the @else@ clause is run and the thread continues.

\subsection{Type Semantics}

As mentioned, to support interaction with the @waituntil@ statement, a type must support the trait in Figure~\ref{f:wu_trait}.
The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively.
When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run.
Many types do not need @on_selected@, but it is provided for types that need to perform work or checks before the resource can be accessed in the code block.
The register/unregister routines in the trait also return booleans.
The return value of @register_select@ is @true@ if the resource is immediately available, and @false@ otherwise.
The return value of @unregister_select@ is @true@ if the corresponding code block should be run after unregistration, and @false@ otherwise.
The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource.
More detail on channels and their interaction with @waituntil@ appears in Section~\ref{s:wu_chans}.

\section{\lstinline{waituntil} Implementation}
The @waituntil@ statement is not inherently complex, and its pseudocode is presented in Figure~\ref{f:WU_Impl}.
The complexity comes from the consideration of race conditions and the synchronization needed when supporting various primitives.
Figure~\ref{f:WU_Impl} aims to introduce the reader to the rudimentary idea and control flow of the @waituntil@.
The following sections then use examples to fill in details that Figure~\ref{f:WU_Impl} does not provide.
Finally, the full pseudocode of the @waituntil@ is presented in Figure~\ref{f:WU_Full_Impl}.
The basic steps of the @waituntil@ statement are:

\begin{figure}
\begin{cfa}
select_nodes s[N];                                                               $\C[3.25in]{// declare N select nodes}$
for ( node in s )                                                                $\C{// register nodes}$
        register_select( resource, node );
while ( statement predicate not satisfied ) {   $\C{// check predicate}$
        // block
        for ( resource in waituntil statement ) {       $\C{// run true code blocks}$
                if ( statement predicate is satisfied ) break;
                if ( resource is avail ) run code block
        }
}
for ( node in s )                                                               $\C{// deregister nodes}\CRT$
        if ( unregister_select( resource, node ) ) run code block
\end{cfa}
\caption{\lstinline{waituntil} Implementation}
\label{f:WU_Impl}
\end{figure}

\begin{enumerate}
\item
The @waituntil@ statement declares $N$ @select_node@s, one per resource being waited on, where each node stores any @waituntil@ data pertaining to that resource.

\item
Each @select_node@ is then registered with the corresponding resource.

\item
The thread executing the @waituntil@ then loops until the statement's predicate is satisfied.
In each iteration, the thread attempts to block.
If any clause is already satisfied, the block fails and the thread proceeds; otherwise, the block succeeds (like a semaphore, where the block is a @P()@ and a satisfied clause is a @V()@).
After proceeding past the block, all clauses are checked for completion, and the completed clauses have their code blocks run.
While checking clause completion, if enough clauses have been run that the statement predicate is satisfied, the loop exits early.
In the case where the block succeeds, the thread is later woken by the thread that marks one of the resources as available.

\item
Once the thread escapes the loop, the @select_node@s are unregistered from the resources.
\end{enumerate}

These steps give a basic overview of how the statement works.
The following sections examine parts of the implementation in more detail.

\subsection{Locks}\label{s:wu_locks}

The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc.
Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement.
For example, the following waits until the thread has acquired lock @l1@, or locks @l2@ and @l3@.
\begin{cfa}
owner_lock l1, l2, l3;
waituntil ( l1 ) { ... }
or waituntil( l2 ) { ... }
and waituntil( l3 ) { ... }
\end{cfa}
Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in each lock's queue of waiting threads.
When a lock schedules this thread, it unblocks and performs the @waituntil@ code to determine if it can proceed.
If it cannot proceed, it blocks again on the @waituntil@ lock, while holding the acquired lock.

In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each of the locks' waiting queues.
When a @select_node@ reaches the front of a lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked.
Now, the lock is temporarily held by the @waituntil@ thread until the node is unregistered, rather than by a thread blocked in a regular lock acquire.
To prevent the waiting thread from holding many locks at once and potentially introducing a deadlock, the node is unregistered right after the corresponding code block is executed.
This prevents deadlocks since the waiting thread never holds a lock while waiting on another resource.
As such, the only nodes unregistered at the end are the ones whose code blocks have not run.

\subsection{Timeouts}

Timeouts in the @waituntil@ take the form of a duration being passed to a @sleep@ or @timeout@ call.
An example is shown in the following code.

\begin{cfa}
waituntil( sleep( 1`ms ) ) {}
waituntil( timeout( 1`s ) ) {} or waituntil( timeout( 2`s ) ) {}
waituntil( timeout( 1`ns ) ) {} and waituntil( timeout( 2`s ) ) {}
\end{cfa}

The timeout implementation highlights a key part of the @waituntil@ semantics: the expression inside a @waituntil()@ is evaluated once at the start of the @waituntil@ algorithm.
As such, calls to these @sleep@ and @timeout@ routines do not block, but instead return a type that supports the @is_selectable@ trait.
This feature leverages \CFA's ability to overload on return type; a call to @sleep@ outside a @waituntil@ resolves to a different @sleep@ that does not return a selectable type and instead blocks for the appropriate duration.
This mechanism of returning a selectable type is needed for types that want to support multiple operations, such as channels that allow both reading and writing.

\subsection{Channels}\label{s:wu_chans}
To support waiting on both reading and writing to channels, the operators @?<<?@ and @?>>?@ are used to read from and write to a channel, respectively, where the left-hand operand is the value being read into or written, and the right-hand operand is the channel.
Channels require significant complexity to multiplex on synchronously, for a few reasons.
First, reading from or writing to a channel is a mutating operation:
if a read or write to a channel occurs, the state of the channel has changed.
In comparison, for standard locks and futures, if a lock is acquired then released, or a future is ready but not accessed, the state of the lock and the future is not permanently modified.
In this way, a @waituntil@ over locks or futures that completes with resources available but not consumed is not an issue.
However, if a thread modifies a channel on behalf of a thread blocked on a @waituntil@ statement, it is important that the corresponding @waituntil@ code block is run, otherwise there is a potentially erroneous mismatch between the channel state and associated side effects.
As such, the @unregister_select@ routine has a boolean return that is used by channels to indicate when the operation was completed but the code block has not yet run.
When the return is @true@, the corresponding code block is run after the unregister.
Furthermore, if both @and@ and @or@ operators are used, the @or@ operators can no longer provide exclusive-or semantics, due to the race between channel operations and unregistration.

It was deemed important that exclusive-or semantics be maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
This approach is infeasible in the case where @and@ and @or@ operators are mixed.
To see why, consider the following @waituntil@ statement.

\begin{cfa}
waituntil( i >> A ) {} and waituntil( i >> B ) {}
or waituntil( i >> C ) {} and waituntil( i >> D ) {}
\end{cfa}

If exclusive-or semantics were followed, this @waituntil@ would only run the code blocks for @A@ and @B@, or the code blocks for @C@ and @D@.
However, racing before operation completion in this case introduces a race whose complexity increases with the size of the @waituntil@ statement.
In the example above, to preserve the exclusive-or when inserting @i@ into @C@, it must be ensured that @i@ can also be inserted into @D@, and the race for the @or@ must also be won.
However, due to time-of-check to time-of-use (TOCTOU) issues, one cannot know that all resources are available without acquiring all the internal locks of the channels in the subtree.
This is not a good solution for two reasons.
First, it is possible that, once all the locks are acquired, the subtree is not satisfied and the locks must all be released, incurring a high cost for signalling threads and heavily increasing contention on the internal channel locks.
Second, the @waituntil@ statement is polymorphic and supports resources that do not have internal locks, which also makes this approach infeasible.
As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators, since they cannot be supported without significant complexity and degraded @waituntil@ statement performance.

Channels introduce another interesting implementation consideration.
Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier for another @waituntil@ clause.
This poses a problem for the special-cased @or@, where clauses need to win a race to operate on a channel.
When one thread is inserting into a channel from a special-cased @or@ while another thread is blocked in a special-cased @or@ consuming from the same channel, there is not one but two races that the inserting thread must consolidate.
(This race can also occur in the mirrored case with a blocked producer and a signalling consumer.)
For the producing thread to know that the insert succeeded, it must win the race for its own @waituntil@ and win the race for the other @waituntil@.

Go solves this problem in its @select@ statement by acquiring the internal locks of all channels before registering the @select@ on the channels.
This eliminates the race since no other thread can operate on the blocked channel while its lock is held.
This approach is not used in \CFA since the @waituntil@ is polymorphic.
Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead.
Instead, \CFA consolidates this race in two phases by introducing an intermediate pending status value for the race.
This race case is detectable, and if detected, the producer first races to set its own race flag to pending.
If it succeeds, it then attempts to set the consumer's race flag to its success value.
If the producer successfully sets the consumer's race flag, the operation can proceed; if not, the producer resets its own race flag to the initial value.
If any other thread attempts to set the producer's flag and sees the pending value, it waits until the value changes before proceeding, ensuring that, should the producer fail, the signal is not lost.
This protocol ensures that signals are not lost and that the two races are resolved safely.
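
The two-phase protocol can be sketched with atomic compare-and-swap operations; the state names and exact flag layout below are illustrative, not the \CFA runtime's:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Race-flag states for a waituntil clause (illustrative names).
const (
	avail   int32 = iota // clause still racing
	pending              // owner is mid-protocol; outcome unknown
	won                  // race decided in this clause's favour
)

// tryHandoff sketches the two-phase protocol: the producer first moves
// its own flag to pending, then tries to claim the blocked consumer's
// flag. On failure it rolls back so no signal is lost.
func tryHandoff(producer, consumer *int32) bool {
	// Phase 1: tentatively claim our own race.
	if !atomic.CompareAndSwapInt32(producer, avail, pending) {
		return false // someone else already won our race
	}
	// Phase 2: try to win the blocked consumer's race.
	if atomic.CompareAndSwapInt32(consumer, avail, won) {
		atomic.StoreInt32(producer, won) // commit both races
		return true
	}
	atomic.StoreInt32(producer, avail) // roll back; producer still signalable
	return false
}

// Any other signaller that observes pending must wait until the
// owner resolves the flag one way or the other.
func waitResolved(flag *int32) int32 {
	for {
		if v := atomic.LoadInt32(flag); v != pending {
			return v
		}
	}
}

func main() {
	p, c := avail, avail
	fmt.Println(tryHandoff(&p, &c)) // true: both races won
	p2, c2 := avail, won            // consumer race already decided
	fmt.Println(tryHandoff(&p2, &c2)) // false: rolled back
	fmt.Println(waitResolved(&p2) == avail)
}
```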

Channels in \CFA have an exception-based shutdown mechanism that the @waituntil@ statement needs to support.
This exception mechanism motivated the @on_selected@ routine.
Channels need this routine to detect if they are closed upon waking from a @waituntil@ statement, ensuring the appropriate behaviour is taken and an exception is thrown.

\subsection{Guards and Statement Predicate}\label{s:wu_guards}
Checking when a \gls{synch_multiplex} utility is done is trivial when the resources have an or/xor relationship, since any resource becoming available means that the blocked thread can proceed.
In \uC and \CFA, the \gls{synch_multiplex} utilities involve both @and@ and @or@ operators, which makes checking for completion of the statement more difficult.

In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
The internal nodes also store the statuses of the two subtrees beneath them.
When a resource becomes available, its corresponding leaf-node status is modified and then percolates up into the internal nodes to update the state of the statement.
Once the root of the tree has both subtrees marked as @true@, the statement is complete.
As an optimization, when the internal nodes are updated, subtrees marked as @true@ are pruned and not touched again.
To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false.
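
A minimal sketch of this tree scheme, with the structure inferred from the description above rather than taken from the \uC sources, is:

```go
package main

import "fmt"

// Node kinds for a minimal status tree: leaves are resources,
// internal nodes are the and/or operators.
type op int

const (
	leaf op = iota
	and
	or
)

type node struct {
	kind        op
	left, right *node
	done        bool // leaf: resource available; internal: subtree satisfied
}

// percolate recomputes internal statuses bottom-up after a leaf changes.
// A satisfied subtree is effectively pruned: once done, it stays done
// and is not revisited.
func percolate(n *node) bool {
	if n == nil || n.done { // pruning: satisfied subtrees are skipped
		return n != nil && n.done
	}
	if n.kind == leaf {
		return n.done
	}
	l, r := percolate(n.left), percolate(n.right)
	if n.kind == and {
		n.done = l && r
	} else {
		n.done = l || r
	}
	return n.done
}

func main() {
	// Predicate: (A and B) or (C and D)
	A, B := &node{kind: leaf}, &node{kind: leaf}
	C, D := &node{kind: leaf}, &node{kind: leaf}
	root := &node{kind: or,
		left:  &node{kind: and, left: A, right: B},
		right: &node{kind: and, left: C, right: D}}
	A.done = true
	fmt.Println(percolate(root)) // false: B not yet available
	B.done = true
	fmt.Println(percolate(root)) // true: left subtree satisfied
}
```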

The \CFA @waituntil@ statement blocks a thread until a set of resources becomes available that satisfies the underlying predicate.
The waiting condition of the @waituntil@ statement can be represented as a predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
In \CFA, this representation is the mechanism used to check if a thread is done waiting on a @waituntil@.
Leveraging the compiler, a predicate routine is generated per @waituntil@ that, when passed the statuses of the resources, returns @true@ when the @waituntil@ is done, and @false@ otherwise.
To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures the predicate function behaves as if that resource is no longer part of the predicate.

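For illustration, a hand-written version of such a generated predicate for @waituntil( A ) and waituntil( B ) or waituntil( C )@ is sketched below; the neutral values chosen for guarded-out clauses (the identity of the joining operator: @true@ under @and@, @false@ under @or@) are inferred from the description, not lifted from generated \CFA code:

```go
package main

import "fmt"

// Hand-written stand-in for the compiler-generated predicate of
//     waituntil( A ) and waituntil( B ) or waituntil( C )
// i.e. (A && B) || C, since and binds tighter than or.
func predicate(a, b, c bool) bool {
	return (a && b) || c
}

// neutral returns the status a guard-disabled clause is pinned to:
// the identity of the operator joining it, so the predicate behaves
// as if the clause were absent.
func neutral(underAnd bool) bool {
	return underAnd
}

func main() {
	// Guard disables B: pin its status to true (identity of and),
	// so the statement degenerates to A || C.
	b := neutral(true)
	fmt.Println(predicate(true, b, false)) // true: A alone now satisfies
	// Guard disables C: pin its status to false (identity of or),
	// so the statement degenerates to A && B.
	c := neutral(false)
	fmt.Println(predicate(true, false, c)) // false: B still required
}
```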
\uC's @_Select@ supports operators both inside and outside of the clauses.
\eg in the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied.

% C_TODO put this in uC++ code style not cfa-style
\begin{cfa}
Future_ISM<int> A, B, C, D, E;
_Select( A || B && C ) { ... }
and _Select( D && E ) { ... }
\end{cfa}

This is more expressive than the @waituntil@ statement in \CFA.
In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses was avoided for a few reasons.
As a motivating example, suppose \CFA supported operators inside clauses and consider the code snippet in Figure~\ref{f:wu_inside_op}.

\begin{figure}
\begin{cfa}
owner_lock A, B, C, D;
waituntil( A && B ) { ... }
or waituntil( C && D ) { ... }
\end{cfa}
\caption{Example of unsupported operators inside clauses in \CFA.}
\label{f:wu_inside_op}
\end{figure}

If the @waituntil@ in Figure~\ref{f:wu_inside_op} works with the same semantics as described and acquires each lock as it becomes available, it opens itself up to possible deadlocks, since it holds locks while waiting on other resources.
Other semantics would be needed to ensure that this operation is safe.
One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance}; however, the potential for livelock leaves much to be desired.
Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient if the resource ordering is not used everywhere.
Additionally, using resource ordering could conflict with other semantics of the @waituntil@ statement.
To show this conflict, consider if the locks in Figure~\ref{f:wu_inside_op} were ordered @D@, @B@, @C@, @A@.
If all the locks are available, it becomes complex to respect both the clause ordering of the @waituntil@ in Figure~\ref{f:wu_inside_op} when choosing which code block to run and the lock ordering of @D@, @B@, @C@, @A@ at the same time.
Another approach is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach: due to TOCTOU issues, it is not possible to ensure that the full set of resources is available without holding them all first.
Operators inside clauses in \CFA could potentially be implemented by carefully circumventing the problems involved, but it was not deemed an important feature when taking into account the runtime cost needed to handle these situations.
The problem of operators inside clauses also becomes difficult to handle when supporting channels.
If internal operators were supported, some mechanism would be needed to ensure that channels used with internal operators are modified if and only if the corresponding code block is run, which is not feasible for the reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.

\subsection{The full \lstinline{waituntil} picture}
Now that the details have been discussed, the full pseudocode of the @waituntil@ statement is presented in Figure~\ref{f:WU_Full_Impl}.

\begin{figure}
\begin{cfa}
select_nodes s[N];                                      $\C[3.25in]{// declare N select nodes}$
bool when_conditions[N];
for ( node in s )                                       $\C{// evaluate guards}$
    if ( node has guard )
        when_conditions[node] = node_guard;
    else
        when_conditions[node] = true;

try {
    for ( node in s )                                   $\C{// register nodes}$
        if ( when_conditions[node] )
            register_select( resource, node );

    // ... set statuses for nodes with when_conditions[node] == false ...

    while ( statement predicate not satisfied ) {       $\C{// check predicate}$
        // block
        for ( resource in waituntil statement ) {       $\C{// run true code blocks}$
            if ( statement predicate is satisfied ) break;
            if ( resource is avail ) {
                try {
                    if ( on_selected( resource ) )             $\C{// conditionally run block}$
                        run code block
                } finally {                                    $\C{// for exception safety}$
                    unregister_select( resource, node );       $\C{// immediate unregister}$
                }
            }
        }
    }
} finally {                                             $\C{// for exception safety}$
    for ( registered nodes in s )                       $\C{// deregister nodes}$
        if ( when_conditions[node] && unregister_select( resource, node ) && on_selected( resource ) )
            run code block $\C{// conditionally run code block upon unregister}\CRT$
}
\end{cfa}
\caption{Full \lstinline{waituntil} Pseudocode Implementation}
\label{f:WU_Full_Impl}
\end{figure}

In comparison to Figure~\ref{f:WU_Impl}, this pseudocode now includes the specifics discussed in this chapter.
Some things to note are as follows.
The @finally@ blocks provide exception-safe RAII unregistering of nodes; in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock freedom, as mentioned in Section~\ref{s:wu_locks}.
The @when_conditions@ array stores the boolean result of evaluating each guard at the beginning of the @waituntil@, and is used to conditionally omit operations on resources with @false@ guards.
As discussed in Section~\ref{s:wu_chans}, this pseudocode includes code blocks conditional on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure that all available channel resources have their corresponding code block run.

\section{Waituntil Performance}
The two \gls{synch_multiplex} utilities comparable with the \CFA @waituntil@ statement are the Go @select@ statement and the \uC @_Select@ statement.
As such, two microbenchmarks are presented, one for Go and one for \uC, to contrast the systems.
The similar utilities discussed at the start of this chapter in C, Ada, Rust, \CC, and OCaml are not meaningful or feasible to benchmark against.
The @select(2)@ and related utilities in C are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space.
Ada's @select@ only operates on methods, which is done in \CFA via the @waitfor@ utility, so it is not meaningful to benchmark it against the @waituntil@, which cannot wait on the same resource.
Rust and \CC only offer a busy-wait-based approach, which is not comparable to a blocking approach.
OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml's @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.
Given the differences in features, polymorphism, and expressibility among @waituntil@, @select@, and @_Select@, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner.

\subsection{Channel Benchmark}
The channel multiplexing microbenchmarks compare \CFA's @waituntil@ and Go's @select@, where the resource being waited on is a set of channels.
The basic structure of the microbenchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer threads and 4 consumer threads.
The number of clauses @C@ is also varied, with results shown for 2, 4, and 8 clauses.
Each clause has a respective channel that it operates on.
Each producer and consumer repeatedly waits to either produce or consume from one of the @C@ clauses and respective channels.
An example in \CFA syntax of the work loop in the consumer main with @C = 4@ clauses follows.

\begin{cfa}
	for (;;)
		waituntil( val << chans[0] ) {} or waituntil( val << chans[1] ) {}
		or waituntil( val << chans[2] ) {} or waituntil( val << chans[3] ) {}
\end{cfa}
A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds.
The first microbenchmark measures the throughput of producers and consumers synchronously waiting on the channels, and the second has the threads asynchronously wait on the channels.
The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} respectively.

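For reference, the corresponding Go consumer loop can be sketched as follows; benchmark scaffolding such as timing and shutdown is omitted, and a fixed iteration count stands in for the timed loop:

```go
package main

import "fmt"

// consume mirrors the CFA consumer work loop with C = 4 clauses:
// each select case corresponds to one or-joined waituntil clause.
func consume(chans [4]chan int, n int) int {
	ops := 0
	for i := 0; i < n; i++ {
		select { // Go's analogue of the or-joined waituntil clauses
		case <-chans[0]:
		case <-chans[1]:
		case <-chans[2]:
		case <-chans[3]:
		}
		ops++
	}
	return ops
}

func main() {
	var chans [4]chan int
	for i := range chans {
		chans[i] = make(chan int, 1)
		chans[i] <- i // one buffered value per channel
	}
	fmt.Println(consume(chans, 4)) // 4: all buffered values consumed
}
```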
\begin{figure}
	\centering
	\captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
	\subfloat[AMD]{
		\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}}
	}
	\subfloat[Intel]{
		\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}}
	}
	\bigskip

	\subfloat[AMD]{
		\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}}
	}
	\subfloat[Intel]{
		\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}}
	}
	\bigskip

	\subfloat[AMD]{
		\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}}
	}
	\subfloat[Intel]{
		\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}}
	}
	\caption{The channel synchronous multiplexing benchmark comparing Go \lstinline{select} and \CFA \lstinline{waituntil} statement throughput (higher is better).}
	\label{f:select_contend_bench}
\end{figure}

\begin{figure}
	\centering
	\captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
	\subfloat[AMD]{
		\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}}
	}
	\subfloat[Intel]{
		\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}}
	}
	\bigskip

	\subfloat[AMD]{
		\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}}
	}
	\subfloat[Intel]{
		\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}}
	}
	\bigskip

	\subfloat[AMD]{
		\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}}
	}
	\subfloat[Intel]{
		\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}}
	}
	\caption{The channel asynchronous multiplexing benchmark comparing Go \lstinline{select} and \CFA \lstinline{waituntil} statement throughput (higher is better).}
	\label{f:select_spin_bench}
\end{figure}

Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} show similar results when comparing @select@ and @waituntil@.
In the AMD benchmarks, the performance is very similar as the number of cores scales.
The AMD machine has been observed to have higher cache-contention costs, which creates a bottleneck on the channel locks and results in similar scaling between \CFA and Go.
At low core counts, Go has significantly better performance, likely due to an optimization in its scheduler.
Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads that park/unpark each other~\cite{go:sched}.
In the Intel benchmarks, \CFA performs better than Go as the number of cores and the number of clauses scale.
This is likely due to Go's implementation choice of acquiring all channel locks when registering and unregistering channels on a @select@.
Go has to hold a lock for every channel, so it follows that performance worsens as the number of channels increases.
In \CFA, since races are consolidated without holding all the locks, it scales much better with both cores and clauses since more work can occur in parallel.
This scalability difference is more significant on the Intel machine than on the AMD machine since the Intel machine has been observed to have lower cache-contention costs.

The Go approach of holding all internal channel locks in the @select@ has additional drawbacks.
It results in pathological cases where Go's system throughput on channels can greatly suffer.
Consider the case where there are two channels, @A@ and @B@.
There are both a producer thread and a consumer thread, @P1@ and @C1@, selecting on both @A@ and @B@.
Additionally, there is another producer and another consumer thread, @P2@ and @C2@, that are both operating solely on @B@.
Compared to \CFA, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
This case may not be as pathological as it seems: whenever the set of channels belonging to one @select@ overlaps with the set of another @select@, those @select@s lose the ability to operate in parallel.
The implementation in \CFA only ever holds a single lock at a time, resulting in better locking granularity.
A comparison of this pathological case is shown in Table~\ref{t:pathGo}.
The AMD results highlight the worst-case scenario for Go, since contention is more costly on this machine than on the Intel machine.

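The communication pattern of this pathological case can be sketched as follows; the sketch demonstrates the setup only, not the serialization cost of Go's all-locks registration:

```go
package main

import (
	"fmt"
	"sync"
)

// runPathological wires up the pattern described above: P1/C1
// multiplex over channels A and B while P2/C2 use only B, so under
// Go's all-locks select the two pairs contend on B's registration.
// It returns the total number of successful consume operations.
func runPathological(n int) int {
	A, B := make(chan int), make(chan int)
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // P1: produces n values over {A, B}
		defer wg.Done()
		for i := 0; i < n; i++ {
			select {
			case A <- i:
			case B <- i:
			}
		}
	}()
	go func() { // P2: produces n values solely on B
		defer wg.Done()
		for i := 0; i < n; i++ {
			B <- i
		}
	}()
	ops := make(chan int, 2)
	go func() { // C1: consumes n values over {A, B}
		for i := 0; i < n; i++ {
			select {
			case <-A:
			case <-B:
			}
		}
		ops <- n
	}()
	go func() { // C2: consumes n values solely from B
		for i := 0; i < n; i++ {
			<-B
		}
		ops <- n
	}()
	wg.Wait()
	return <-ops + <-ops
}

func main() {
	fmt.Println(runPathological(1000)) // 2000: every send is matched
}
```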
\begin{table}[t]
\centering
\setlength{\extrarowheight}{2pt}
\setlength{\tabcolsep}{5pt}

\caption{Throughput (channel operations per second) of \CFA and Go for a pathologically bad case for contention in Go's \lstinline{select} implementation.}
\label{t:pathGo}
\begin{tabular}{*{5}{r|}r}
	& \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c@{}}{Go} \\
	\hline
	AMD		& \input{data/nasus_Order} \\
	\hline
	Intel	& \input{data/pyke_Order}
\end{tabular}
\end{table}

Another difference between Go and \CFA is the order of clause selection when multiple clauses are available.
Go selects a clause pseudo-randomly~\cite{go:select}, whereas \CFA chooses clauses in the order they are listed.
This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour, and even better performance in certain cases, such as the case shown in Table~\ref{t:pathGo}.
If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be less significant, since @P1@ and @C1@ would compete to operate on @B@ more often with random selection.

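Go's documented behaviour, that a ready case is chosen uniformly at pseudo-random so clause order carries no priority, can be observed directly:

```go
package main

import "fmt"

// pick runs one select where both cases are ready; Go chooses
// between ready cases pseudo-randomly, so clause order expresses
// no priority (unlike CFA's ordered waituntil clauses).
func pick() int {
	a, b := make(chan int, 1), make(chan int, 1)
	a <- 1
	b <- 2
	select { // both cases ready: one is chosen pseudo-randomly
	case v := <-a:
		return v
	case v := <-b:
		return v
	}
}

func main() {
	seen := map[int]bool{}
	for i := 0; i < 10000; i++ {
		seen[pick()] = true
	}
	// With uniform choice, both branches appear with overwhelming
	// probability over 10000 trials.
	fmt.Println(seen[1] && seen[2])
}
```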
\subsection{Future Benchmark}
The future benchmark compares \CFA's @waituntil@ with \uC's @_Select@, with both utilities waiting on futures.
\CFA's @waituntil@ and \uC's @_Select@ have very similar semantics; however, @_Select@ can only wait on futures, whereas the @waituntil@ is polymorphic.
They both support @and@ and @or@ operators, but the underlying implementation of the operators differs between @waituntil@ and @_Select@.
The @waituntil@ statement checks for statement completion using a predicate function, whereas the @_Select@ statement maintains a tree that represents the state of the internal predicate.

\begin{figure}
	\centering
	\subfloat[AMD Future Synchronization Benchmark]{
		\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
		\label{f:futureAMD}
	}
	\subfloat[Intel Future Synchronization Benchmark]{
		\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
		\label{f:futureIntel}
	}
	\caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
	\label{f:futurePerf}
\end{figure}

This microbenchmark aims to measure the impact of various predicates on the performance of the @waituntil@ and @_Select@ statements.
It does not try to directly compare the @waituntil@ and @_Select@ statements, since the performance of futures in \CFA and \uC differs by a significant margin, making them incomparable.
Results of this benchmark are shown in Figure~\ref{f:futurePerf}.
Each set of columns is marked with a name representing the predicate for that set of columns.
The predicate names and corresponding @waituntil@ statements are shown below.

779\begin{cfa}
780#ifdef OR
781waituntil( A ) { get( A ); }
782or waituntil( B ) { get( B ); }
783or waituntil( C ) { get( C ); }
784#endif
785#ifdef AND
786waituntil( A ) { get( A ); }
787and waituntil( B ) { get( B ); }
788and waituntil( C ) { get( C ); }
789#endif
790#ifdef ANDOR
791waituntil( A ) { get( A ); }
792and waituntil( B ) { get( B ); }
793or waituntil( C ) { get( C ); }
794#endif
795#ifdef ORAND
796(waituntil( A ) { get( A ); }
797or waituntil( B ) { get( B ); }) // brackets create higher precedence for or
798and waituntil( C ) { get( C ); }
799#endif
800\end{cfa}
801
In Figure~\ref{f:futurePerf}, the @OR@ column for \CFA is more performant than the other \CFA predicates, likely due to the special-casing of @waituntil@ statements with only @or@ operators.
For both \uC and \CFA, the @AND@ column is the least performant, which is expected since all three futures need to be fulfilled for each statement completion, unlike with any of the other operators.
Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special @OR@ case), whereas \uC has lower variation on the Intel.