% ======================================================================
% ======================================================================
\chapter{Waituntil}\label{s:waituntil}
% ======================================================================
% ======================================================================

Consider the following motivating problem.
There are @N@ stalls (resources) in a bathroom and there are @M@ people (threads).
Each stall has its own lock since only one person may occupy a stall at a time.
Humans tend to solve this problem in the following way.
They check if all of the stalls are occupied.
If not, they enter and claim an available stall.
If all stalls are occupied, the people queue and watch the stalls until one is free, and then enter and lock the stall.
This solution can be implemented on a computer easily if all threads are waiting on all stalls and agree to queue.
Now the problem is extended.
Some stalls are wheelchair accessible, some stalls are dirty, and other stalls are clean.
Each person (thread) may choose some subset of dirty, clean, and accessible stalls that they want to wait for.
Immediately the problem becomes much more difficult.
A single queue no longer fully solves the problem: what happens when there is a stall available that the person at the front of the queue will not choose?
The naive solution to this problem has each thread spin indefinitely, continually checking the stalls until a suitable one is free.
This approach is not good enough: it wastes cycles and provides no fairness among threads waiting for stalls, as a thread jumps into the first suitable stall available without any regard for other waiting threads.
Waiting for the first stall (resource) available without spinning is an example of \gls{synch_multiplex}, the ability to wait synchronously for a resource or set of resources.

\section{History of Synchronous Multiplexing}
There is a history of tools that provide \gls{synch_multiplex}.
Some well-known \gls{synch_multiplex} tools include the Unix system utilities select(2)\cite{linux:select}, poll(2)\cite{linux:poll}, and epoll(7)\cite{linux:epoll}, and the select statement provided by Go\cite{go:selectref}.

The theory surrounding \gls{synch_multiplex} was largely introduced by Hoare in his 1985 CSP book~\cite{Hoare85} and his later work with Roscoe on the theoretical language Occam~\cite{Roscoe88}.
The work on Occam in \cite{Roscoe88} calls its \gls{synch_multiplex} primitive ALT, which waits for one resource to be available and then executes a corresponding block of code.
Waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
Both CSP and Occam include \Newterm{guards} for communication channels and the ability to wait for a single channel to be ready out of a set of channels.
A guard is a conditional operator similar to an @if@, except it applies to the resource being waited on.
If a guard is false, then the resource it guards is considered not to be in the set of resources being waited on.
Guards can be simulated using if statements, but to do so requires $2^N$ if statements, where @N@ is the number of guards.
The equivalence between guards and exponential if statements comes from an Occam ALT statement rule~\cite{Roscoe88}, which is presented in \CFA syntax in Figure~\ref{f:wu_if}.
Providing guards allows for easy toggling of waituntil clauses without introducing repeated code.

\begin{figure}
\begin{cfa}
// CFA's guards use the keyword 'when'
when( predicate ) waituntil( A ) {}
or waituntil( B ) {}
// ===
if ( predicate ) {
    waituntil( A ) {}
    or waituntil( B ) {}
} else {
    waituntil( B ) {}
}
\end{cfa}
\caption{Occam's guard to if statement equivalence shown in \CFA syntax.}
\label{f:wu_if}
\end{figure}

When discussing \gls{synch_multiplex} implementations, one must discuss the resources being multiplexed.
While the aforementioned theory waits on channels, the earliest known implementation of a synchronous multiplexing tool, Unix's select(2)\cite{linux:select}, multiplexes over file descriptors.
The select(2) system call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
Select(2) blocks until either some subset of the file descriptors are available or the timeout expires.
All file descriptors that are ready are returned by modifying the argument sets to contain only the ready descriptors.
This early implementation differs from the theory presented in Occam and CSP: when the call to select(2) returns, it may provide more than one ready file descriptor.
As such, select(2) has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics.
This is not a drawback.
A user can easily achieve exclusive-or semantics with select(2) by arbitrarily choosing only one of the returned descriptors to operate on.
Select(2) was followed by poll(2), which was later followed by epoll(7), with each successor improving upon drawbacks in its predecessors.
The syscall poll(2) improved on select(2) by allowing users to monitor descriptors with numbers higher than 1024, which is not supported by select(2).
Epoll(7) then improved on poll(2) by returning the set of ready file descriptors; when one or more descriptors became available, poll(2) would return only the number of ready descriptors, but would not indicate which descriptors were ready.

It is worth noting that the \gls{synch_multiplex} tools mentioned so far interact directly with the operating system and are often used to communicate between processes.
Later, \gls{synch_multiplex} started to appear in user space to support fast multiplexed concurrent communication between threads.
An early example of user-space \gls{synch_multiplex} is the select statement in Ada~\cite[\S~9.7]{Ichbiah79}.
The select statement in Ada allows a task to multiplex over some subset of its own methods that it would like to @accept@ calls to.
Tasks in Ada are essentially objects that have their own thread, and as such have methods, fields, etc.
The Ada select statement has the same exclusive-or semantics and guards as ALT from Occam; however, it multiplexes over methods rather than channels.
A code block is associated with each @accept@, and the method that is accepted first has its corresponding code block run after the task unblocks.
In this way the select statement in Ada provides rendezvous points for threads, rather than providing some resource through message passing.
The select statement in Ada also supports an optional timeout with the same semantics as select(2), and provides an @else@ clause.
The @else@ clause changes the synchronous multiplexing to asynchronous multiplexing.
If an @else@ clause is in a select statement and no calls to the @accept@ed methods are immediately available, the code block associated with the @else@ is run and the task does not block.

A popular example of user-space \gls{synch_multiplex} is Go's select statement~\cite{go:selectref}.
Go's select statement operates on channels, has the same exclusive-or semantics as the ALT primitive from Occam, and has an associated code block for each clause like ALT and Ada.
However, unlike Ada and ALT, Go does not provide any guards for its select statement cases.
Go provides a timeout utility and also provides a @default@ clause, which has the same semantics as Ada's @else@ clause.

\uC provides \gls{synch_multiplex} over futures with its @_Select@ statement and Ada-style \gls{synch_multiplex} over monitor and task methods with its @_Accept@ statement~\cite{uC++}.
The @_Accept@ statement builds upon the select statement offered by Ada by offering both @and@ and @or@ semantics, which can be used together in the same statement.
These semantics are also supported by \uC's @_Select@ statement.
This enables fully expressive \gls{synch_multiplex} predicates.

There are many other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
Note that while C++14 and Rust provide \gls{synch_multiplex}, their implementations leave much to be desired as they both rely on busy-wait polling to wait on multiple resources.

\section{Other Approaches to Synchronous Multiplexing}
To avoid the need for \gls{synch_multiplex}, all communication between threads/processes has to come from a single source.
One key example is Erlang, in which each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
In a similar vein, actor systems circumvent the \gls{synch_multiplex} problem as actors are traditionally non-blocking, so they never block in a behaviour and only block when waiting for the next message.
While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
Consider the case where a thread with a single source of communication (as in Erlang and actor systems) wants one of a set of @N@ resources.
It requests all @N@ resources and waits for responses.
In the meantime the thread may receive other communication, and must either save and postpone the related work or discard it.
After the thread receives one of the @N@ resources, it continues to receive the other ones it requested, even if it does not need them.
If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.

\section{\CFA's Waituntil Statement}
The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
There is already a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods, so the @waituntil@ focuses on synchronizing over other resources.
All of the \gls{synch_multiplex} features mentioned so far are monomorphic, supporting only one kind of resource to wait on: select(2) supports file descriptors, Go's select supports channel operations, \uC's select supports futures, and Ada's select supports monitor method calls.
The waituntil statement in \CFA is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}.
No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's waituntil.

\begin{figure}
\begin{cfa}
forall(T & | sized(T))
trait is_selectable {
    // For registering a waituntil stmt on a selectable type
    bool register_select( T &, select_node & );

    // For unregistering a waituntil stmt from a selectable type
    bool unregister_select( T &, select_node & );

    // on_selected is run on the selecting thread prior to executing the statement associated with the select_node
    void on_selected( T &, select_node & );
};
\end{cfa}
\caption{Trait for types that can be passed into \CFA's waituntil statement.}
\label{f:wu_trait}
\end{figure}

Currently, locks, channels, futures, and timeouts are supported by the waituntil statement, but this support will be expanded as other use cases arise.
The @waituntil@ statement supports guarded clauses, like Ada and Occam, supports both @or@ and @and@ semantics, like \uC, and provides an @else@ clause for asynchronous multiplexing.
An example of \CFA waituntil usage is shown in Figure~\ref{f:wu_example}.
In Figure~\ref{f:wu_example}, the waituntil statement is waiting for either @Lock@ to be available, or for a value to be read from @Channel@ into @i@ and for @Future@ to be fulfilled.

\begin{figure}
\begin{cfa}
future(int) Future;
channel(int) Channel;
owner_lock Lock;
int i = 0;

waituntil( Lock ) { ... }
or when( i == 0 ) waituntil( i << Channel ) { ... }
and waituntil( Future ) { ... }
\end{cfa}
\caption{Example of \CFA's waituntil statement.}
\label{f:wu_example}
\end{figure}

\section{Waituntil Semantics}
There are two parts of the waituntil semantics to discuss: the semantics of the statement itself, \ie the @and@, @or@, @when@ guard, and @else@ semantics, and the semantics of how the waituntil interacts with types like channels, locks, and futures.

\subsection{Waituntil Statement Semantics}
The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam: the clauses have an exclusive-or relationship, where the first clause to become available is run and only one clause is run.
\CFA's @or@ semantics differ from ALT semantics in one respect: instead of randomly picking a clause when multiple are available, the clause that appears first in the order of clauses is picked.
\eg in the following example, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses.
\begin{cfa}
future(int) bar;
future(int) foo;
waituntil( foo ) { ... }
or waituntil( bar ) { ... }
\end{cfa}

The @and@ semantics match the @and@ semantics used by \uC.
When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all of them to be available, but runs the corresponding code blocks \emph{as they become available}.
As @and@ clauses become available, the thread is woken to run those clauses' code blocks and then waits again until all clauses have been run.
This allows work to be done in parallel while synchronizing over a set of resources, and furthermore gives a good reason to use the @and@ operator.
If the @and@ operator waited for all clauses to be available before running, it would not provide much more use than just acquiring those resources one by one in subsequent lines of code.
The @and@ operator binds more tightly than the @or@ operator.
To give an @or@ operator higher precedence, brackets can be used.
\eg the following waituntil unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via brackets.
\begin{cfa}
(waituntil( A ) { ... }
or waituntil( B ) { ... } )
and waituntil( C ) { ... }
\end{cfa}

The guards in the waituntil statement are called @when@ clauses.
The boolean expression inside each @when@ is evaluated once before the waituntil statement is run.
The guards in Occam's ALT effectively toggle clauses on and off, where a clause is only evaluated and waited on if the corresponding guard is @true@.
The guards in the waituntil statement operate the same way, but require some nuance since both @and@ and @or@ operators are supported.
This is discussed further in Section~\ref{s:wu_guards}.
When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operator from the statement.
\eg in the following example the two waituntil statements are semantically the same.
\begin{cfa}
when(true) waituntil( A ) { ... }
or when(false) waituntil( B ) { ... }
and waituntil( C ) { ... }
// ===
waituntil( A ) { ... }
and waituntil( C ) { ... }
\end{cfa}

The @else@ clause on the waituntil has identical semantics to the @else@ clause in Ada.
If not all of the resources are immediately available and there is an @else@ clause, the @else@ clause is run and the thread does not block.
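
For example, a thread can poll a future without blocking; the following is a hedged sketch, where the trailing @or else@ placement mirrors the @waitfor@ statement and is an assumption rather than a definitive syntax reference.
\begin{cfa}
future(int) Future;
waituntil( Future ) { int x = get( Future ); }  // runs only if Future is already fulfilled
or else { /* Future not ready; continue without blocking */ }
\end{cfa}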

\subsection{Waituntil Type Semantics}
As described earlier, to support interaction with the waituntil statement a type must support the trait shown in Figure~\ref{f:wu_trait}.
The waituntil statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively.
When a resource becomes available, @on_selected@ is run.
Many types do not need @on_selected@, but it is provided since some types may need to perform work or checks before the resource can be accessed in the code block.
The register/unregister routines in the trait return booleans.
The return value of @register_select@ is @true@ if the resource is immediately available, and @false@ otherwise.
The return value of @unregister_select@ is @true@ if the corresponding code block should be run after unregistration, and @false@ otherwise.
The routine @on_selected@ and the return value of @unregister_select@ were needed to support channels as a resource.
More detail on channels and their interaction with waituntil is provided in Section~\ref{s:wu_chans}.
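
To make the trait concrete, the following is a deliberately simplified sketch of a selectable type: a one-shot flag that becomes available once set.
The type, field names, and omission of any locking or waking logic are illustrative assumptions; the actual runtime types used by locks, channels, and futures are more involved.
\begin{cfa}
struct oneshot_flag {
    bool is_set;                    // has the flag been raised?
    select_node * waiter;           // registered waituntil node, if any
};
bool register_select( oneshot_flag & f, select_node & node ) {
    if ( f.is_set ) return true;    // resource already available; no need to block
    f.waiter = &node;               // remember the node so a setter could wake it (setter omitted)
    return false;                   // caller may need to block
}
bool unregister_select( oneshot_flag & f, select_node & node ) {
    if ( f.waiter == &node ) f.waiter = 0p;
    return false;                   // no deferred code block needs to run
}
void on_selected( oneshot_flag & f, select_node & node ) {
    // nothing to check before the corresponding code block runs
}
\end{cfa}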

\section{Waituntil Implementation}
The waituntil statement is not inherently complex, and can be described in a few steps.
The complexity of the statement comes from the consideration of race conditions and synchronization needed when supporting various primitives.
The basic steps of the waituntil statement are the following:

\begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt]

\item
First, the waituntil statement creates a @select_node@ per resource that is being waited on.
The @select_node@ is an object that stores the waituntil data pertaining to one of the resources.

\item
Then, each @select_node@ is registered with the corresponding resource.

\item
The thread executing the waituntil then loops until the @waituntil@ statement's predicate is satisfied.
In each iteration of the loop the thread attempts to block.
If any clauses are already satisfied the block fails and the thread proceeds; otherwise the block succeeds.
In the case where the block succeeds, the thread is woken by the thread that marks one of the resources as available.
After proceeding past the block, all clauses are checked for completion and the completed clauses have their code blocks run.

\item
Once the thread escapes the loop, the @select_node@s are unregistered from the resources.
\end{enumerate}
Pseudocode detailing these steps is presented in the following code block.
\begin{cfa}
select_node s[N];                          // one select_node per resource
for ( node in s )
    register_select( resource, node );     // step 2: register each node with its resource
while ( statement predicate not satisfied ) {
    // try to block; the block fails if a clause is already satisfied
    for ( resource in waituntil statement )
        if ( resource is avail ) run code block
}
for ( node in s )
    unregister_select( resource, node );   // step 4: unregister remaining nodes
\end{cfa}
These steps give a basic overview of how the statement works.
Digging into parts of the implementation sheds light on the specifics and provides more detail.

\subsection{Locks}
Locks are one of the resources supported by the @waituntil@ statement.
When a thread waits on multiple locks via a waituntil, it enqueues a @select_node@ in each of the locks' waiting queues.
When a @select_node@ reaches the front of the queue and gains ownership of a lock, the blocked thread is notified.
The lock is held until the node is unregistered.
To prevent the waiting thread from holding many locks at once and potentially introducing a deadlock, the node is unregistered right after the corresponding code block is executed.
This prevents deadlocks since the waiting thread never holds a lock while waiting on another resource.
As such, the only nodes unregistered at the end are the ones whose code blocks have not run.
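
For example, the following sketch waits on two locks joined by @and@; because each lock is released when its node is unregistered right after its code block runs, the thread never waits on one lock while holding the other.
\begin{cfa}
owner_lock A, B;
waituntil( A ) { /* holds A here; A is released right after this block */ }
and waituntil( B ) { /* holds B here; B is released right after this block */ }
\end{cfa}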

\subsection{Timeouts}
Timeouts in the waituntil take the form of a duration being passed to a @sleep@ or @timeout@ call.
An example is shown in the following code.

\begin{cfa}
waituntil( sleep( 1`ms ) ) {}
waituntil( timeout( 1`s ) ) {} or waituntil( timeout( 2`s ) ) {}
waituntil( timeout( 1`ns ) ) {} and waituntil( timeout( 2`s ) ) {}
\end{cfa}

The timeout implementation highlights a key part of the waituntil semantics: the expression inside a @waituntil()@ is evaluated once at the start of the @waituntil@ algorithm.
As such, calls to these @sleep@ and @timeout@ routines do not block, but instead return a type that supports the @is_selectable@ trait.
This feature leverages \CFA's ability to overload on return type; a call to @sleep@ outside a waituntil resolves to a different @sleep@ that does not return a selectable type and instead blocks for the appropriate duration.
This mechanism of returning a selectable type is also needed for types that want to support multiple operations, such as channels that allow both reading and writing.
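
The return-type overloading can be pictured with the following hedged declarations; the name of the selectable return type is illustrative, not the actual runtime name.
\begin{cfa}
void sleep( Duration d );                   // blocking version, selected outside a waituntil
select_timeout_node sleep( Duration d );    // non-blocking version, selected inside a waituntil by its return type
\end{cfa}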

\subsection{Channels}\label{s:wu_chans}
To support waiting on both reading and writing to channels, the operators @?<<?@ and @?>>?@ are used to read from and write to a channel, respectively, where the left-hand operand is the value being read into/written and the right-hand operand is the channel.
Channels require significant complexity to synchronously multiplex on for a few reasons.
First, reading or writing to a channel is a mutating operation;
if a read or write to a channel occurs, the state of the channel has changed.
In comparison, for standard locks and futures, if a lock is acquired then released, or a future is ready but not accessed, the state of the lock and the future is not permanently modified.
In this way, a waituntil over locks or futures that completes with resources available but not consumed is not an issue.
However, if a thread modifies a channel on behalf of a thread blocked on a waituntil statement, it is important that the corresponding waituntil code block is run; otherwise there is a potentially erroneous mismatch between the channel state and the associated side effects.
As such, the @unregister_select@ routine has a boolean return that is used by channels to indicate when the operation was completed but the block has not been run yet.
When the return is @true@, the corresponding code block is run after the unregister.
Furthermore, if both @and@ and @or@ operators are used, the @or@ operators can no longer provide exclusive-or semantics due to the race between channel operations and unregisters.
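
In terms of the earlier pseudocode, the final unregistration loop therefore becomes the following sketch, where a code block whose channel operation completed during the race with unregistration is run based on @unregister_select@'s return value.
\begin{cfa}
for ( node in s )
    if ( unregister_select( resource, node ) )
        run corresponding code block
\end{cfa}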

It was deemed important that exclusive-or semantics be maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
This approach is infeasible in the case where @and@ and @or@ operators are used.
To show this, consider the following waituntil statement.

\begin{cfa}
waituntil( i >> A ) {} and waituntil( i >> B ) {}
or waituntil( i >> C ) {} and waituntil( i >> D ) {}
\end{cfa}

If exclusive-or semantics were followed, this waituntil would only run the code blocks for @A@ and @B@, or the code blocks for @C@ and @D@.
However, racing before operation completion in this case introduces a race whose complexity increases with the size of the waituntil statement.
In the example above, for @i@ to be inserted into @C@, to ensure the exclusive-or it must be ensured that @i@ can also be inserted into @D@.
Furthermore, the race for the @or@ would also need to be won.
However, due to TOCTOU issues, one cannot know that all resources are available without acquiring all the internal locks of the channels in the subtree.
This is not a good solution for two reasons.
It is possible that once all the locks are acquired the subtree is not satisfied and the locks must all be released.
This would incur a high cost for signalling threads and heavily increase contention on internal channel locks.
Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since they cannot be supported without significant complexity and costs to waituntil statement performance.

Channels introduce another interesting consideration in their implementation.
Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier for another @waituntil@ clause.
This poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
When a special-case @or@ is inserting into a channel on one thread and another thread is blocked in a special-case @or@ consuming from the same channel, there are not one but two races that need to be consolidated by the inserting thread.
(This race can also occur in the mirrored case with a blocked producer and a signalling consumer.)
For the producing thread to know that the insert succeeded, it needs to win the race for its own waituntil and win the race for the other waituntil.

Go solves this problem in its select statement by acquiring the internal locks of all channels before registering the select on the channels.
This eliminates the race since no other thread can operate on the blocked channel while its lock is held.
This approach is not used in \CFA since the waituntil is polymorphic.
Not all types in a waituntil have an internal lock, and when using non-channel types acquiring all the locks incurs extra unneeded overhead.
Instead this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
This race case is detectable, and if detected, the thread attempting to signal first races to set its own race flag to pending.
If it succeeds, it then attempts to set the consumer's race flag to its success value.
If the producer successfully sets the consumer's race flag, then the operation can proceed; if not, the signalling thread sets its own race flag back to the initial value.
If any other thread attempts to set the producer's flag and sees a pending value, it waits until the value changes before proceeding, to ensure that, in the case that the producer fails, the signal is not lost.
This protocol ensures that signals are not lost and that the two races can be resolved in a safe manner.
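
The following is a condensed sketch of this two-phase protocol from the signalling thread's perspective; the field name, flag states, and routine name are illustrative, and @CAS@ abbreviates an atomic compare-and-swap that returns @true@ on success.
\begin{cfa}
enum { UNSAT, PENDING, SAT };                       // illustrative race-flag states
bool try_signal( select_node & mine, select_node & theirs ) {
    // Phase 1: tentatively claim my own waituntil by marking my flag pending.
    if ( ! CAS( &mine.status, UNSAT, PENDING ) ) return false;  // lost my own race
    // Phase 2: try to win the blocked thread's race.
    if ( CAS( &theirs.status, UNSAT, SAT ) ) {
        mine.status = SAT;                          // both races won; the channel operation proceeds
        return true;
    }
    mine.status = UNSAT;                            // lost the other race; retract the tentative claim
    return false;
}
// Threads that observe PENDING on another clause's flag wait for it to change,
// so a retracted claim cannot result in a lost signal.
\end{cfa}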

Channels in \CFA have exception-based shutdown mechanisms that the waituntil statement needs to support.
These exception mechanisms are what motivated the @on_selected@ routine.
This routine is needed by channels to detect if they are closed upon waking from a waituntil statement, ensuring that the appropriate behaviour is taken.

\subsection{Guards and Statement Predicate}\label{s:wu_guards}
Checking when a synchronous multiplexing utility is done is trivial when it has an or/xor relationship, since any resource becoming available means that the blocked thread can proceed.
In \uC and \CFA, the \gls{synch_multiplex} utilities involve both an @and@ and an @or@ operator, which makes checking for completion of the statement more difficult.

In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
The internal nodes also store the statuses of the two subtrees beneath them.
When resources become available, their corresponding leaf node status is modified and then percolates up into the internal nodes to update the state of the statement.
Once the root of the tree has both subtrees marked as @true@, the statement is complete.
As an optimization, when the internal nodes are updated, their subtrees marked as @true@ are pruned and are not touched again.
To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false.

The \CFA waituntil statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
The waiting condition of the waituntil statement can be represented as a predicate over the resources, joined by the waituntil operators, where a resource is @true@ if it is available and @false@ otherwise.
In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the waituntil.
Leveraging the compiler, a predicate routine is generated per waituntil that, when passed the statuses of the resources, returns @true@ when the waituntil is done and @false@ otherwise.
To support guards on the \CFA waituntil statement, the status of a resource disabled by a guard is set to a boolean value that ensures the predicate function behaves as if that resource is no longer part of the predicate.
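
As an illustration, a hypothetical generated predicate for the statement @waituntil( A ) {} or waituntil( B ) {} and waituntil( C ) {}@ might take the following shape; the routine name and array parameter are assumptions for exposition only.
\begin{cfa}
// @and@ binds tighter than @or@, so the statement completes when A has run,
// or when both B and C have run.
bool statement_predicate( bool clause_sat[3] ) {
    return clause_sat[0] || ( clause_sat[1] && clause_sat[2] );
}
\end{cfa}
Per the guard handling above, a clause disabled by a false @when@ has its status preset so this routine behaves as if the clause were not present.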

\uC's @_Select@ supports operators both inside and outside of the clauses.
\eg in the following example the code blocks run once their corresponding predicate inside the round braces is satisfied.

% C_TODO put this in uC++ code style not cfa-style
\begin{cfa}
Future_ISM<int> A, B, C, D, E;
_Select( A || B && C ) { ... }
and _Select( D && E ) { ... }
\end{cfa}

This is more expressive than the waituntil statement in \CFA.
In \CFA, since the waituntil statement supports more resources than just futures, implementing operators inside clauses was avoided for a few reasons.
As a motivating example, suppose \CFA supported operators inside clauses and consider the code snippet in Figure~\ref{f:wu_inside_op}.

\begin{figure}
\begin{cfa}
owner_lock A, B, C, D;
waituntil( A && B ) { ... }
or waituntil( C && D ) { ... }
\end{cfa}
\caption{Example of unsupported operators inside clauses in \CFA.}
\label{f:wu_inside_op}
\end{figure}

If the waituntil in Figure~\ref{f:wu_inside_op} works with the same semantics as described and acquires each lock as it becomes available, it opens itself up to possible deadlocks since it is now holding locks while waiting on other resources.
Other semantics would be needed to ensure that this operation is safe.
One possibility is to use \CC's @scoped_lock@ approach that was described in Section~\ref{s:DeadlockAvoidance}; however, the potential for livelock leaves much to be desired.
Another possibility would be to use resource ordering similar to \CFA's @mutex@ statement, but that alone is not sufficient if the resource ordering is not used everywhere.
Additionally, using resource ordering could conflict with other semantics of the waituntil statement.
To show this conflict, consider if the locks in Figure~\ref{f:wu_inside_op} were ordered @D@, @B@, @C@, @A@.
If all the locks are available, it becomes complex to both respect the ordering of the waituntil in Figure~\ref{f:wu_inside_op} when choosing which code block to run and also respect the lock ordering of @D@, @B@, @C@, @A@ at the same time.
One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this quickly becomes a poor approach.
This approach does not work due to TOCTOU issues; it is not possible to ensure that the full set of resources is available without holding them all first.
Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems involved, but it was not deemed an important feature when taking into account the runtime cost that would need to be paid to handle these situations.
The problem of operators inside clauses also becomes a difficult issue to handle when supporting channels.
If internal operators were supported, it would require some way to ensure that channels used with internal operators are modified if and only if the corresponding code block is run, which is not feasible for the reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.

\section{Waituntil Performance}
The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA waituntil statement are the Go @select@ statement and the \uC @_Select@ statement.
As such, two microbenchmarks are presented, one for Go and one for \uC, to contrast the systems.
The similar utilities discussed at the start of this chapter in C, Ada, Rust, \CC, and OCaml are either not meaningful or not feasible to benchmark against.
The select(2) and related utilities in C are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the waituntil exists solely in user space.
Ada's @select@ only operates on methods, which is done in \CFA via the @waitfor@ utility, so it is not meaningful to benchmark against the @waituntil@, which cannot wait on the same resource.
Rust and \CC only offer a busy-wait-based approach, which is not comparable to a blocking approach.
OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.
Given the differences in features, polymorphism, and expressibility between @waituntil@, @select@, and @_Select@, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner.

\subsection{Channel Benchmark}
The channel multiplexing microbenchmarks compare \CFA's waituntil and Go's select, where the resource being waited on is a set of channels.
The basic structure of the microbenchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there would be 4 producer threads and 4 consumer threads.
The number of clauses @C@ is also varied, with results shown for 2, 4, and 8 clauses.
Each clause has a respective channel that it operates on.
Each producer and consumer repeatedly waits to either produce or consume from one of the @C@ clauses and respective channels.
An example in \CFA syntax of the work loop in the consumer main with @C = 4@ clauses follows.

\begin{cfa}
    for (;;)
        waituntil( val << chans[0] ) {} or waituntil( val << chans[1] ) {}
        or waituntil( val << chans[2] ) {} or waituntil( val << chans[3] ) {}
\end{cfa}
A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds.
The first microbenchmark measures the throughput of the producers and consumers synchronously waiting on the channels, and the second has the threads asynchronously wait on the channels.
The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench}, respectively.

\begin{figure}
    \centering
    \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
    \subfloat[AMD]{
        \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}}
    }
    \subfloat[Intel]{
        \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}}
    }
    \bigskip

    \subfloat[AMD]{
        \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}}
    }
    \subfloat[Intel]{
        \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}}
    }
    \bigskip

    \subfloat[AMD]{
        \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}}
    }
    \subfloat[Intel]{
        \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}}
    }
    \caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA waituntil statement throughput (higher is better).}
    \label{f:select_contend_bench}
\end{figure}

\begin{figure}
    \centering
    \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
    \subfloat[AMD]{
        \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}}
    }
    \subfloat[Intel]{
        \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}}
    }
    \bigskip

    \subfloat[AMD]{
        \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}}
    }
    \subfloat[Intel]{
        \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}}
    }
    \bigskip

    \subfloat[AMD]{
        \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}}
    }
    \subfloat[Intel]{
        \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}}
    }
    \caption{The asynchronous multiplexing channel benchmark comparing Go select and \CFA waituntil statement throughput (higher is better).}
    \label{f:select_spin_bench}
\end{figure}

Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} show similar results when comparing @select@ and @waituntil@.
In the AMD benchmarks, the performance is very similar as the number of cores scales.
The AMD machine has been observed to have a higher cache-contention cost, which creates a bottleneck on the channel locks and results in similar scaling between \CFA and Go.
At low core counts, Go has significantly better performance, which is likely due to an optimization in its scheduler.
Go heavily optimizes thread handoffs on the local run queue, which can result in very good performance for low numbers of threads that are parking/unparking each other~\cite{go:sched}.
In the Intel benchmarks, \CFA performs better than Go as the number of cores and the number of clauses scale.
This is likely due to Go's implementation choice of acquiring all channel locks when registering and unregistering channels on a @select@.
Go then has to hold a lock for every channel, so it follows that this results in worse performance as the number of channels increases.
In \CFA, since races are consolidated without holding all locks, the implementation scales much better with both cores and clauses since more work can occur in parallel.
This scalability difference is more significant on the Intel machine than on the AMD machine since the Intel machine has been observed to have lower cache-contention costs.

The Go approach of holding all internal channel locks in the select has some additional drawbacks.
This approach results in some pathological cases where Go's system throughput on channels can greatly suffer.
Consider the case where there are two channels, @A@ and @B@.
There are a producer thread and a consumer thread, @P1@ and @C1@, each selecting over both @A@ and @B@.
Additionally, there are another producer and another consumer thread, @P2@ and @C2@, that both operate solely on @B@.
Compared to \CFA, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
This case may not be as pathological as it may seem.
If the set of channels belonging to one select overlaps with the set of another select, the two selects lose the ability to operate in parallel.
The implementation in \CFA only ever holds a single lock at a time, resulting in better locking granularity.
A comparison of this pathological case is shown in Table~\ref{t:pathGo}.
The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than on the Intel machine.

\begin{table}[t]
\centering
\setlength{\extrarowheight}{2pt}
\setlength{\tabcolsep}{5pt}

\caption{Throughput (channel operations per second) of \CFA and Go for a pathologically bad case for contention in Go's select implementation.}
\label{t:pathGo}
\begin{tabular}{*{5}{r|}r}
    & \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c@{}}{Go} \\
    \hline
    AMD         & \input{data/nasus_Order} \\
    \hline
    Intel       & \input{data/pyke_Order}
\end{tabular}
\end{table}

Another difference between Go and \CFA is the order of clause selection when multiple clauses are available.
Go ``randomly'' selects a clause~\cite{go:select}, but \CFA chooses the clauses in the order they are listed.
This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance in certain cases, such as the case shown in Table~\ref{t:pathGo}.
If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be less significant since @P1@ and @C1@ would compete to operate on @B@ more often with random selection.

\subsection{Future Benchmark}
The future benchmark compares \CFA's waituntil with \uC's @_Select@, with both utilities waiting on futures.
Both \CFA's @waituntil@ and \uC's @_Select@ have very similar semantics; however, @_Select@ can only wait on futures, whereas the @waituntil@ is polymorphic.
They both support @and@ and @or@ operators, but the underlying implementation of the operators differs between @waituntil@ and @_Select@.
The @waituntil@ statement checks for statement completion using a predicate function, whereas the @_Select@ statement maintains a tree that represents the state of the internal predicate.

\begin{figure}
    \centering
    \subfloat[AMD Future Synchronization Benchmark]{
        \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
        \label{f:futureAMD}
    }
    \subfloat[Intel Future Synchronization Benchmark]{
        \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
        \label{f:futureIntel}
    }
    \caption{\CFA waituntil and \uC \_Select statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
    \label{f:futurePerf}
\end{figure}

This microbenchmark aims to measure the impact of various predicates on the performance of the @waituntil@ and @_Select@ statements.
This benchmark and section do not try to directly compare the @waituntil@ and @_Select@ statements since the performance of futures in \CFA and \uC differs by a significant margin, making them incomparable.
Results of this benchmark are shown in Figure~\ref{f:futurePerf}.
Each set of columns is marked with a name representing the predicate for that set of columns.
The predicate names and corresponding waituntil statements are shown below:

\begin{cfa}
#ifdef OR
waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
#endif
#ifdef AND
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
and waituntil( C ) { get( C ); }
#endif
#ifdef ANDOR
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
#endif
#ifdef ORAND
(waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); }) // brackets create higher precedence for or
and waituntil( C ) { get( C ); }
#endif
\end{cfa}

In Figure~\ref{f:futurePerf}, the @OR@ column for \CFA is more performant than the other \CFA predicates, likely due to the special-casing of waituntil statements with only @or@ operators.
For both \uC and \CFA, the @AND@ column is the least performant, which is expected since all three futures need to be fulfilled for each statement completion, unlike in any of the other predicates.
Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special @OR@ case), whereas \uC has lower variation on the Intel.