% ======================================================================
% ======================================================================
\chapter{Waituntil}\label{s:waituntil}
% ======================================================================
% ======================================================================

Consider the following motivating problem.
There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom.
Each stall has its own lock since only one person may occupy a stall at a time.
Humans solve this problem in the following way.
They check if all of the stalls are occupied.
If not, they enter and claim an available stall.
If they are all occupied, people queue and watch the stalls until one is free, and then enter and lock the stall.
This solution is easily implemented on a computer, provided all threads wait on all stalls and agree to queue.

Now the problem is extended.
Some stalls are wheelchair accessible and some stalls have gender identification.
Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria.
Immediately, the problem becomes more difficult.
A single queue no longer solves the problem.
What happens when there is a stall available that the person at the front of the queue cannot choose?
The na\"ive solution has each thread spin indefinitely, continually checking every matching kind of stall until a suitable one is free.
This approach is insufficient since it wastes cycles and results in unfairness among waiting threads, as a thread can acquire the first matching stall without regard to the waiting time of other threads.
Waiting for the first appropriate stall (resource) that becomes available without spinning is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria.

\section{History of Synchronous Multiplexing}\label{s:History}

There is a history of tools that provide \gls{synch_multiplex}.
Some well known \gls{synch_multiplex} tools include Unix system utilities: @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}.
The concept and theory surrounding \gls{synch_multiplex} was introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85},
\begin{quote}
A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85}
\end{quote}
The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}.

Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid.
For example,
\begin{cfa}[mathescape]
(@G1@(x) $\rightarrow$ P @|@ @G2@(y) $\rightarrow$ Q )
\end{cfa}
waits for either channel @x@ or @y@ to have a value, if and only if guards @G1@ and @G2@ are true;
if only one guard is true, only one channel receives, and if both guards are false, no receive occurs.
% extended CSP with a \gls{synch_multiplex} construct @ALT@, which waits for one resource to be available and then executes a corresponding block of code.
In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on.
If a guard is false, then the resource it guards is not in the set of resources being waited on.
If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues.
Guards can be simulated using @if@ statements as shown in~\cite[rule~2.4, p~183]{Roscoe88},
\begin{lstlisting}[basicstyle=\rm,mathescape]
ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT($\,g$ $P$, $G$ ), $\neg\,$b ALT( $G$ ) ) (boolean guard elim).
\end{lstlisting}
but this simulation requires $2^N-1$ @if@ statements, where $N$ is the number of guards.
The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time.
Figure~\ref{f:wu_if} shows a \CFA example of applying rule 2.4 to three guards.
Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@.

\begin{figure}
\centering
\begin{lrbox}{\myboxA}
\begin{cfa}
when( G1 )
	waituntil( R1 ) S1
or when( G2 )
	waituntil( R2 ) S2
or when( G3 )
	waituntil( R3 ) S3








\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
if ( G1 )
	if ( G2 )
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R1 ) S1 or waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3
		else waituntil( R1 ) S1
else
	if ( G2 )
		if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R3 ) S3
\end{cfa}
\end{lrbox}

\subfloat[Guards]{\label{l:guards}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB}
\caption{\CFA guard simulated with \lstinline{if} statement.}
\label{f:wu_if}
\end{figure}

When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors, which conceptually differ from channels in name only.
The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
@select@ blocks until either some subset of file descriptors are available or the timeout expires.
All file descriptors that are ready are returned by modifying the argument sets to only contain the ready descriptors.

This early implementation differs from the theory presented in CSP: when the call to @select@ returns, it may provide more than one ready file descriptor.
As such, @select@ has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics.
It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors.
@select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst-case is the largest numbered file descriptor.
@poll@ reduces the size of the interest sets by changing from a bit mask to a linked data structure, independent of the file-descriptor values.
@epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call.

These early \gls{synch_multiplex} tools interact directly with the operating system and are often used to communicate among processes.
Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads.
An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}.
This @select@ allows a task object, with its own thread, to multiplex over a subset of asynchronous calls to its methods.
The Ada @select@ has the same exclusive-or semantics and guards as the Occam ALT;
however, it multiplexes over methods rather than channels.

\begin{figure}
\begin{lstlisting}[language=ada,literate=,{moredelim={**[is][\color{red}]{@}{@}}}]
task type buffer is -- thread type
	... -- buffer declarations
	count : integer := 0;
begin -- thread starts here
	loop
		select
			@when count < Size@ => -- guard
			@accept insert( elem : in ElemType )@ do -- method
				... -- add to buffer
				count := count + 1;
			end;
			-- executed if this accept called
		or
			@when count > 0@ => -- guard
			@accept remove( elem : out ElemType )@ do -- method
				... -- remove and return from buffer via parameter
				count := count - 1;
			end;
			-- executed if this accept called
		or @delay 10.0@; -- unblock after 10 seconds without call
		or @else@ -- do not block, cannot appear with delay
		end select;
	end loop;
end buffer;
buf : buffer; -- create task object and start thread in task body
\end{lstlisting}
\caption{Ada Bounded Buffer}
\label{f:BB_Ada}
\end{figure}

Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
Note that a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
Then the appropriate \lstinline[language=ada]{accept} method is run with the called arguments.
Hence, the \lstinline[language=ada]{select} statement provides rendezvous points for threads, rather than providing channels with message passing.
The \lstinline[language=ada]{select} statement also provides a timeout and @else@ (nonblocking) clause, which changes synchronous multiplexing to asynchronous.
Now the thread polls rather than blocks.

Another example of programming-language \gls{synch_multiplex} is Go using a @select@ statement with channels~\cite{go:selectref}.
Figure~\ref{l:BB_Go} shows the outline of a bounded buffer administrator implemented with a Go routine.
Here two channels are used for inserting and removing by client producers and consumers, respectively.
(The @done@ channel is used to synchronize with the program main.)
Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam and associated code blocks for each clause like ALT and Ada.
However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}.
As such, the exponential blowup can be seen by comparing Go and \uC in Figure~\ref{f:AdaMultiplexing}.
Go also provides a timeout via a channel and a @default@ clause like the Ada @else@ for asynchronous multiplexing.

\begin{figure}
\centering

\begin{lrbox}{\myboxA}
\begin{lstlisting}[language=go,literate=,{moredelim={**[is][\color{red}]{@}{@}}}]
insert := make( chan int )
remove := make( chan *int )
buffer := make( chan int, Size )
done := make( chan int )
count := 0
func in_buf( val int ) {
	buffer <- val
	count++
}
func out_buf( ptr *int ) {
	*ptr = <-buffer
	count--
}
func BoundedBuffer() {
  L: for {
		if ( count < Size && count > 0 ) {
			select { // wait for message
				@case i := <- insert: in_buf( i )@
				@case p := <- remove: out_buf( p )@
				case <- done: break L
			}
		} else if ( count < Size ) {
			select { // wait for message
				@case i := <- insert: in_buf( i )@
				case <- done: break L
			}
		} else if ( count > 0 ) {
			select { // wait for message
				@case p := <- remove: out_buf( p )@
				case <- done: break L
			}
		}
	}
	done <- 0
}
func main() {
	go BoundedBuffer() // start administrator
}
\end{lstlisting}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{lstlisting}[language=uC++,{moredelim={**[is][\color{red}]{@}{@}}}]












_Task BoundedBuffer {
	int * buffer;
	int front = 0, back = 0, count = 0;
  public:
	// ... constructor implementation
	void insert( int elem ) {
		buffer[front] = elem;
		front = ( front + 1 ) % Size;
		count += 1;
	}
	int remove() {
		int ret = buffer[back];
		back = ( back + 1 ) % Size;
		count -= 1;
		return ret;
	}
  private:
	void main() {
		for ( ;; ) {
			_Accept( ~BoundedBuffer ) break;
			@or _When ( count < Size ) _Accept( insert )@;
			@or _When ( count > 0 ) _Accept( remove )@;
		}
	}
};
BoundedBuffer buf; // start thread in main method
\end{lstlisting}
\end{lrbox}

\subfloat[Go]{\label{l:BB_Go}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB}

\caption{Bounded Buffer}
\label{f:AdaMultiplexing}
\end{figure}

Finally, \uC provides \gls{synch_multiplex} with Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}.
The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement.
Both @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as timeout and @else@ clauses.
Figure~\ref{l:BB_uC++} shows the outline of a bounded buffer administrator implemented with \uC @_Accept@ statements.

There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, Java's @allof@/@anyof@ over futures~\cite{java:allof:anyof}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources.

\section{Other Approaches to Synchronous Multiplexing}

To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source.
For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
Similarly, actor systems circumvent the \gls{synch_multiplex} problem as actors only block when waiting for the next message, never within a behaviour.
While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
Consider the case where a thread has a single source of communication and it wants a set of $N$ resources.
It must sequentially request the $N$ resources and wait for each response.
While waiting for the $N$ resources, it can receive other communications, which it must either save and postpone or discard.
% If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.

\section{\CFA's Waituntil Statement}

The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
There already exists a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods~\cite{Delisle21}, so this @waituntil@ focuses on synchronizing over other resources.
All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @select@ supports futures, and Ada's @select@ supports monitor method calls.
The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}.
No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.

\begin{figure}
\begin{cfa}
forall(T & | sized(T))
trait is_selectable {
	// For registering a waituntil stmt on a selectable type
	bool register_select( T &, select_node & );

	// For unregistering a waituntil stmt from a selectable type
	bool unregister_select( T &, select_node & );

	// on_selected is run on the selecting thread prior to executing
	// the statement associated with the select_node
	bool on_selected( T &, select_node & );
};
\end{cfa}
\caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.}
\label{f:wu_trait}
\end{figure}

Currently, locks, channels, futures, and timeouts are supported by the @waituntil@ statement, and this set can be expanded through the @is_selectable@ trait as other use-cases arise.
The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing.
Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second.
Note that the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.

\section{Waituntil Semantics}

The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like locks, channels, and futures.

\subsection{Statement Semantics}

The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam.
The clauses have an exclusive-or relationship where the first available one is run and only one clause is run.
\CFA's @or@ semantics differ from ALT semantics: instead of randomly picking a clause when multiple are available, the first clause in the @waituntil@ that is available is executed.
For example, in the following, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses.
\begin{cfa}
future(int) bar, foo;
waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
\end{cfa}
The reason for these semantics is that prioritizing resources can be useful in certain problems, such as shutdown.
In the rare case where there is a starvation problem with the ordering, it is possible to follow a @waituntil@ with its reverse form, alternating which resource has the highest priority:
\begin{cfa}
waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar
\end{cfa}
While this approach is not general for many resources, it handles many basic cases.

\begin{figure}
\begin{cfa}
future(int) Future;
channel(int) Channel;
owner_lock Lock;
int i = 0;

waituntil( @Lock@ ) { ... }
or when( i == 0 ) waituntil( i << @Channel@ ) { ... }
and waituntil( @Future@ ) { ... }
or waituntil( @timeout( 1`s )@ ) { ... }
// else { ... }
\end{cfa}
\caption{Example of \CFA's waituntil statement}
\label{f:wu_example}
\end{figure}

The \CFA @and@ semantics match the @and@ semantics of \uC \lstinline[language=uC++]{_Select}.
When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}.
When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block, and then the thread waits again until either the next clause becomes available or the @waituntil@ statement is satisfied.
This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator.
If the @and@ operator waited for all clauses to be available before running, it would be the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements.

As with normal C expressions, the @and@ operator binds more tightly than the @or@.
To give an @or@ operator higher precedence, parentheses are used.
For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parentheses.
\begin{cfa}
@(@ waituntil( A ) { ... } // bind tightly to or
or waituntil( B ) { ... } @)@
and waituntil( C ) { ... }
\end{cfa}

The guards in the @waituntil@ statement are called @when@ clauses.
Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run.
Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@.
In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}.
When a guard is false, its clause is removed; conceptually, the clause and its preceding operator are deleted from the statement.
For example, in the following, the two @waituntil@ statements are semantically equivalent.

\begin{lrbox}{\myboxA}
\begin{cfa}
when( true ) waituntil( A ) { ... }
or when( false ) waituntil( B ) { ... }
and waituntil( C ) { ... }
\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
waituntil( A ) { ... }
and waituntil( C ) { ... }

\end{cfa}
\end{lrbox}

\begin{tabular}{@{}lcl@{}}
\usebox\myboxA & $\equiv$ & \usebox\myboxB
\end{tabular}

The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada.
If the statement cannot be immediately satisfied and there is an @else@ clause, the @else@ clause is run and the thread continues.

\subsection{Type Semantics}

As mentioned, to support interaction with the @waituntil@ statement a type must support the trait in Figure~\ref{f:wu_trait}.
The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively.
When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run.
Many types do not need @on_selected@, but it is provided if a type needs to perform work or checks before the resource can be accessed in the code block.
The register/unregister routines in the trait also return booleans.
@register_select@ returns @true@ if the resource is immediately available and @false@ otherwise.
@unregister_select@ returns @true@ if the corresponding code block should be run after unregistration and @false@ otherwise.
The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource.
More detail on channels and their interaction with @waituntil@ appears in Section~\ref{s:wu_chans}.

The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument.
When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait.
This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context.
Indirect support through routines is needed for types that want to support multiple operations, such as channels that allow both reading and writing.

\section{\lstinline{waituntil} Implementation}

The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} shows the basic outline of the @waituntil@ algorithm.
Complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives.
The following sections use examples to fill in complexity details missing in Figure~\ref{f:WU_Impl}.
Afterwards, the full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.

\begin{figure}
\begin{cfa}
select_nodes s[N]; $\C[3.25in]{// declare N select nodes}$
for ( node in s ) $\C{// register nodes}$
	register_select( resource, node );
while ( statement predicate not satisfied ) { $\C{// check predicate}$
	// block until clause(s) satisfied
	for ( resource in waituntil statement ) { $\C{// run true code blocks}$
		if ( resource is avail ) run code block
		if ( statement predicate is satisfied ) break;
	}
}
for ( node in s ) $\C{// deregister nodes}\CRT$
	if ( unregister_select( resource, node ) ) run code block
\end{cfa}
\caption{\lstinline{waituntil} Implementation}
\label{f:WU_Impl}
\end{figure}

The basic steps of the algorithm are:
\begin{enumerate}
\item
The @waituntil@ statement declares $N$ @select_node@s, one per resource that is being waited on, each of which stores any @waituntil@ data pertaining to that resource.

\item
Each @select_node@ is then registered with the corresponding resource.

\item
The thread executing the @waituntil@ then loops until the statement's predicate is satisfied.
In each iteration, if the predicate is unsatisfied, the @waituntil@ thread blocks.
When another thread satisfies a resource clause (\eg sends to a channel), it unblocks the @waituntil@ thread.
This thread checks all clauses for completion, and any completed clauses have their code blocks run.
While checking clause completion, if enough clauses have been run such that the statement predicate is satisfied, the loop exits early.

\item
Once the thread escapes the loop, the @select_node@s are unregistered from the resources.
\end{enumerate}
These steps give a basic overview of how the statement works.
The following sections shed light on the specific changes and provide more implementation detail.

\subsection{Locks}\label{s:wu_locks}

The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc.
Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement.
For example, the following waits until the thread has acquired lock @l1@ or locks @l2@ and @l3@.
\begin{cfa}
owner_lock l1, l2, l3;
waituntil( l1 ) { ... }
or waituntil( l2 ) { ... }
and waituntil( l3 ) { ... }
\end{cfa}
Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in the lock's queue of waiting threads.
When the lock schedules this thread, it unblocks and runs the code block associated with the lock and then releases the lock.

In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each of the lock's waiting queues.
When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked.
Now, the lock is held by the @waituntil@ thread until the code block is executed, after which the node is unregistered and the lock released.
Immediately releasing the lock after the code block prevents the waiting thread from holding multiple locks and potentially introducing a deadlock.
As such, the only unregistered nodes associated with locks are the ones that have not run.
490 |
|
---|
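Gathering the operations used by locks (and by the other resources in this chapter), the selectable-resource interface can be sketched as the following trait; the routine names come from the pseudocode later in this chapter, and the exact trait assertions in the \CFA runtime may differ.
\begin{cfa}
// hedged sketch of the selectable-resource interface; exact assertions may differ
trait is_selectable( T & ) {
	bool register_select( T &, select_node & );	$\C{// enqueue node on resource}$
	bool unregister_select( T &, select_node & );	$\C{// remove node from resource}$
	bool on_selected( T &, select_node & );	$\C{// post-unblock hook, before code block}$
};
\end{cfa}
Any type providing these three routines can then appear as a @waituntil@ clause, which is what makes the statement polymorphic over locks, futures, channels, and timeouts.
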
\subsection{Timeouts}

A timeout for the @waituntil@ statement is a duration passed to routine \lstinline[deletekeywords={timeout}]{timeout}\footnote{\lstinline{timeout} is a quasi-keyword in \CFA, allowing it to be used as an identifier.}, \eg:
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
\multicolumn{2}{@{}l@{}}{\lstinline{Duration D1\{ 1`ms \}, D2\{ 2`ms \}, D3\{ 3`ms \};}} \\
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( timeout( D1 ) ) {}
or waituntil( timeout( D2 ) ) {}
or waituntil( timeout( D3 ) ) {}
\end{cfa}
&
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( timeout( min( D1, D2, D3 ) ) ) {}


\end{cfa}
\end{tabular}
\end{cquote}
These two examples are basically equivalent.
Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers.
Multiple timeouts can also be used with @and@ to provide a minimum delay before proceeding.
In the following example, either channel @C1@ or @C2@ must be satisfied, but nothing can be done for at least 1 or 3 seconds after the channel read, respectively.
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ){} and waituntil( timeout( 1`s ) ){}
or waituntil( i << C2 ){} and waituntil( timeout( 3`s ) ){}
\end{cfa}
If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds.
Note that the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.

The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of unslept time if interrupted.
Instead, \lstinline[deletekeywords={timeout}]{timeout} returns a type that supports the @is_selectable@ trait, allowing the type system to select the correct overloaded routine for this context.
For the @waituntil@, it is more idiomatic for the \lstinline[deletekeywords={timeout}]{timeout} to use the same syntax as other blocking operations instead of having a special language clause.

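A minimal sketch of this design, with an illustrative (hypothetical) wrapper type, is:
\begin{cfa}
// hedged sketch: @timeout@ wraps the duration in a selectable type rather than
// blocking; the type and field names here are illustrative only
struct select_timeout_node { Duration duration; };
select_timeout_node timeout( Duration d ) {
	return (select_timeout_node){ d };
}
// the wrapper then satisfies @is_selectable@ by registering a timer event that
// marks its @select_node@ available once the duration expires
\end{cfa}
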
\subsection{Channels}\label{s:wu_chans}

Channels require more complexity to allow synchronous multiplexing.
For locks, when an outside thread releases a lock and unblocks the @waituntil@ thread (WUT), the lock's mutual-exclusion (MX) property is passed to the WUT (no spinning locks).
For futures, the outside thread delivers a value to the future and unblocks any waiting threads, including WUTs.
In either case, after the WUT unblocks, it is safe to execute its corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future.
Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT.
However, for channels, there is a race condition that poses an issue.
If the outside thread inserts into the channel and unblocks the WUT, another thread can remove the channel data first, so when the WUT unblocks and attempts to remove from the buffer, it fails and must reblock (busy waiting).
This scenario is a \gls{toctou} race that needs to be consolidated.
To close the race, the outside thread must detect this case and insert directly into the left-hand side of the channel expression (@i << chan@) rather than into the channel, and then unblock the WUT.
Now the unblocked WUT is guaranteed to have a satisfied resource, and its code block can be safely executed.
The insertion circumvents the channel buffer via the wait-morphing in the \CFA channel implementation \see{Section~\ref{s:chan_impl}}, allowing @waituntil@ channel unblocking to not be special-cased.
Note that all channel operations are fair and no preference is given between @waituntil@ and direct channel operations when unblocking.

Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg:
\begin{cfa}
int i;
waituntil( i << A ) {} and waituntil( i << B ) {}
or waituntil( i << C ) {} and waituntil( i << D ) {}
\end{cfa}
If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or only the code blocks for @C@ and @D@.
However, four outside threads inserting into each channel can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks.
This case introduces a race with complexity that increases with the size of the @waituntil@ statement.
Due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of the channels in the subtree of the @waituntil@ clauses.
This approach is a poor solution for two reasons.
First, it is possible that once all the locks are acquired, the subtree is not satisfied and the locks must be released.
This work incurs a high cost for signalling threads and heavily increases contention on internal channel locks.
Second, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators, since supporting them requires significant complexity and would significantly degrade @waituntil@ performance.
Therefore, the example of reusing variable @i@ across multiple output channels is considered a user error without exclusive-or semantics.
Given aliasing in C, it is impossible to even warn of the potential race.
In the future, it would be interesting to support Go-like syntax, \lstinline[language=Go]{case i := <- ...}, defining a new scoped @i@ variable for each clause.

It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation is special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
Consider the following example, where thread 1 is reading and threads 2 and 3 are writing to channels @A@ and @B@ concurrently.
\begin{cquote}
\begin{tabular}{@{}l|l|l@{}}
\multicolumn{3}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\
thread 1 & thread 2 & thread 3 \\
\begin{cfa}
waituntil( i << A ) {}
or waituntil( i << B ) {}
\end{cfa}
&
\begin{cfa}
A << 1;

\end{cfa}
&
\begin{cfa}
B << 2;

\end{cfa}
\end{tabular}
\end{cquote}
For thread 1 to have exclusive-or semantics, it must consume from exactly one of @A@ or @B@.
As such, threads 2 and 3 must race to establish the winning clause of the @waituntil@ in thread 1.
This race is consolidated by threads 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}.
The winner bypasses the channel, inserts into the left-hand side of the WUT's winning clause, and signals thread 1.
The loser continues checking if there is space in the channel, and if so, performs the channel insert operation with a possible signal of a waiting remove thread;
otherwise, if there is no space, the loser blocks.
It is important that the race occurs \emph{before} operating on the channel, because the channel actions differ for each thread.
If the race were consolidated after the operation, both threads 2 and 3 could potentially write into @i@ concurrently.

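The signalling side of this race can be sketched as follows; the @race_ptr@ pointer and the surrounding names are illustrative assumptions, not the exact runtime definitions.
\begin{cfa}
// hedged sketch of the signalling side of the exclusive-or race
select_node * expected = 0p;
if ( CAS( &wut.race_ptr, &expected, &my_node ) ) {
	// won: bypass the channel buffer, insert directly into the
	// left-hand side of the winning clause, and signal the WUT
	*my_node.extra = val;
	unpark( wut.thread );
} else {
	// lost: fall back to a normal channel insert, which may signal a
	// waiting remover, or block if the channel is full
	insert( chan, val );
}
\end{cfa}
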
Channels introduce another interesting implementation issue.
Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause.
This conjunction poses a problem when dealing with the special-cased @or@, where the clauses need to win a race to operate on a channel.
Consider the following example, alongside a described ordering of events to highlight the race.
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
\multicolumn{2}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\
thread 1 & thread 2 \\
\begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}]
waituntil( @i << A@ ) {}
or waituntil( #i << B# ) {}
\end{cfa}
&
\begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}]
waituntil( #B << 2# ) {}
or waituntil( @A << 1@ ) {}
\end{cfa}
\end{tabular}
\end{cquote}
Assume thread 1 executes first, registers with channel @A@ and proceeds in the @waituntil@.
Since @A@ is empty, thread 1 cannot remove, and then thread 1 is interrupted before registering with @B@.
Thread 2 similarly registers with channel @B@, and proceeds in the @waituntil@.
Since @B@ is zero-sized, there is no space to insert, and then thread 2 is interrupted before registering with @A@.
At this point, threads 1 and 2 resume execution.
Recall from above that each exclusive-or @waituntil@ holds a race to set the winning clause of the statement.
The issue that arises is that these two @waituntil@ statements must have matching winning clauses (both @A@ clauses or both @B@ clauses) to preserve the exclusive-or semantics, since a zero-sized channel needs an insert/remove pair for an operation to occur.
If threads 1 and 2 race to set a winner only in their own @waituntil@, thread 1 can think it successfully removed from @B@, and thread 2 can think it successfully inserted into @A@, which is an error.
Hence, there is a race on two fronts.
If thread 1 wins the race and sees that @B@ has a waiting insertion, then thread 2 must execute the first clause of its @waituntil@ and thread 1 its second.
Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 its second.
Any other execution scenario is incorrect for exclusive-or semantics.
Note that priority execution of multiple satisfied @waituntil@ clauses (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread.

The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels.
This approach eliminates the race shown above, since threads 1 and 2 cannot both be registering at the same time.
However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic.
Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead.
Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to pending.
If it succeeds, it then attempts to set the other thread's @waituntil@ race pointer to its success value.
If either thread successfully sets the other thread's @waituntil@ race pointer, then the operation can proceed;
if not, the signalling thread sets its own race pointer back to the initial value and retries.
This retry mechanism can potentially introduce a livelock, but in practice a livelock here is highly unlikely;
indeed, the likelihood of a livelock is zero unless the program is in the niche case of having two or more exclusive-or @waituntil@s with two or more clauses in reverse order of priority.
This livelock case can be fully eliminated by using locks like Go, or if a \gls{dcas} instruction is available.
If any other thread attempts to set a WUT's race pointer and sees a pending value, it waits until the value changes before proceeding, ensuring that, in the case the WUT fails, the signal is not lost.
This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner.
The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}.

\begin{figure}
\begin{cfa}
bool pending_set_other( select_node & other, select_node & mine ) {
	unsigned long int cmp_status = UNSAT;

	// try to set other's status; if we succeed, break and return true
	while ( ! CAS( other.clause_status, &cmp_status, SAT ) ) {
		if ( cmp_status == SAT )
			return false;	// if other's status is SAT, we lost so return false

		// toggle own status flag to allow the other thread to potentially win
		mine.clause_status = UNSAT;

		// reset compare flag
		cmp_status = UNSAT;

		// attempt to set own status flag back to PENDING to retry
		if ( ! CAS( mine.clause_status, &cmp_status, PENDING ) )
			return false;	// if we fail, we lost so return false

		// reset compare flag
		cmp_status = UNSAT;
	}
	return true;
}
\end{cfa}
\caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol}
\label{f:WU_DeadlockAvoidance}
\end{figure}

Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support.
These exception mechanisms are supported through the @on_selected@ routine, which channels use to detect if they are closed after unblocking in a @waituntil@ statement, ensuring the appropriate behaviour is taken and an exception is thrown.
Hence, the channel close-down mechanism is handled correctly.

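As an illustration, a channel's @on_selected@ hook plausibly takes the following shape; the exception-raising helper and field names are hypothetical, not the exact runtime definitions.
\begin{cfa}
// hedged sketch of a channel's @on_selected@ hook; names are illustrative
bool on_selected( channel(T) & chan, select_node & node ) {
	if ( chan.closed )	// unblocked by channel shutdown rather than data?
		raise_channel_closed( chan );	// hypothetical helper throwing the exception
	return true;	// otherwise, safe to run the corresponding code block
}
\end{cfa}
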
\subsection{Guards and Statement Predicate}\label{s:wu_guards}

It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished.
In \uC and \CFA, the \gls{synch_multiplex} mechanisms support both and/or relationships, which, along with guards, make the problem of checking for statement completion difficult.
Consider the following @waituntil@.
\begin{cfa}
when( GA ) waituntil( A ) {}
and when( GB ) waituntil( B ) {}
or when( GC ) waituntil( C ) {}
\end{cfa}
When the @waituntil@ thread wakes up, the following predicate represents satisfaction:
\begin{cfa}
A && B || C || ! GA && B || ! GB && A || ! GA && ! GB && ! GC
\end{cfa}
which can be simplified to:
\begin{cfa}
( A || ! GA ) && ( B || ! GB ) || C || ! GA && ! GB && ! GC
\end{cfa}
Checking the complete predicate on each iteration of the pending @waituntil@ is expensive, so \uC and \CFA both take steps to simplify checking for statement completion.

In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
A diagram of the tree for the complete predicate above is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@.
Each internal node stores the statuses of the two subtrees beneath it.
When resources become available, their corresponding leaf-node status is modified, which percolates up the tree to update the state of the statement.
Once the root of the tree has both subtrees marked as @true@, the statement is complete.
As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again.
To support statement guards in \uC, the tree is modified to remove an internal node when a guard is false, maintaining the appropriate predicate representation.

\begin{figure}
\begin{center}
\input{diagrams/uCpp_select_tree.tikz}
\end{center}
\caption{\uC \lstinline[language=uC++]{select} tree modification}
\label{f:uC_select_tree}
\end{figure}

The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the complete predicate of the @waituntil@.
The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@.
Leveraging the compiler, a predicate routine is generated per @waituntil@.
This predicate routine accepts the statuses of the resources being waited on as arguments.
A resource status is an integer that indicates whether a resource is not available, is available, or has already run its associated code block.
The predicate routine returns @true@ when the @waituntil@ is done, \ie enough resources have run their associated code blocks to satisfy the @waituntil@'s predicate, and @false@ otherwise.
To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures the predicate function behaves as if that resource is no longer part of the predicate.
The generated code allows the predicate checked on each iteration to be simplified to not check guard values.
For example, the following is generated for the complete predicate above:
\begin{cfa}
// statement completion predicate
bool check_completion( select_node * nodes ) {
	return nodes[0].status && nodes[1].status || nodes[2].status;
}
if ( GA || GB || GC ) {	$\C{// skip statement if all guards false}$
	select_node nodes[3];
	nodes[0].status = ! GA && GB;	$\C{// A's status}$
	nodes[1].status = ! GB && GA;	$\C{// B's status}$
	nodes[2].status = ! GC;	$\C{// C's status}$
	// ... rest of waituntil codegen ...
}
\end{cfa}

\uC's @_Select@ statement supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses.
In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied.
\begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}]
Future_ISM<int> A, B, C, D, E;
_Select( @A || B && C@ ) { ... }
and _Select( @D && E@ ) { ... }
\end{lstlisting}
This feature is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after \emph{both} resources are available.

In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons.
As a motivating example, suppose \CFA supported operators inside clauses as in:
\begin{cfa}
owner_lock A, B, C, D;
waituntil( A && B ) { ... }
or waituntil( C && D ) { ... }
\end{cfa}
If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold-and-wait situation.
Other semantics are needed to ensure this operation is safe.
One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance};
however, that opens the potential for livelock.
Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient if the resource ordering is not used universally.
Yet another approach is to wait until all resources for a given clause are available before acquiring them, but this does not work due to \gls{toctou} issues, \ie it is impossible to ensure that the full set of resources is available without holding them all first.
Operators inside clauses in \CFA could potentially be implemented with careful circumvention of these problems.
Finally, the problem of operators inside clauses is also difficult to handle when supporting channels.
It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run.
However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
Ultimately, this feature was not considered crucial after taking into account the complexity and runtime cost.

\subsection{The full \lstinline{waituntil} picture}

Given all the complex details handled by the @waituntil@, its full pseudocode is presented in Figure~\ref{f:WU_Full_Impl}.
Some things to note are as follows.
The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}.
The @when_conditions@ array stores the boolean result of evaluating each guard at the beginning of the @waituntil@, and is used to conditionally omit operations on resources with @false@ guards.
As discussed in Section~\ref{s:wu_chans}, this pseudocode includes conditional code-block execution based on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure all available channel resources have their corresponding code block run.

\begin{figure}
\begin{cfa}
bool when_conditions[N];
for ( node in nodes )	$\C[3.75in]{// evaluate guards}$
	if ( node has guard )
		when_conditions[node] = node_guard;
	else
		when_conditions[node] = true;

if ( any when_conditions[node] are true ) {
	select_nodes nodes[N];	$\C{// declare N select nodes}$
	try {
		// ... set statuses for nodes with when_conditions[node] == false ...

		for ( node in nodes )	$\C{// register nodes}$
			if ( when_conditions[node] )
				register_select( resource, node );

		while ( ! check_completion( nodes ) ) {	$\C{// check predicate}$
			// block
			for ( resource in waituntil statement ) {	$\C{// run true code blocks}$
				if ( check_completion( nodes ) ) break;
				if ( resource is avail )
					try {
						if ( on_selected( resource ) )	$\C{// conditionally run block}$
							run code block
					} finally	$\C{// for exception safety}$
						unregister_select( resource, node );	$\C{// immediate unregister}$
			}
		}
	} finally {	$\C{// for exception safety}$
		for ( registered nodes in nodes )	$\C{// deregister nodes}$
			if ( when_conditions[node] && unregister_select( resource, node )
					&& on_selected( resource ) )
				run code block	$\C{// run code block upon unregister}\CRT$
	}
}
\end{cfa}
\caption{Full \lstinline{waituntil} Pseudocode Implementation}
\label{f:WU_Full_Impl}
\end{figure}

\section{\lstinline{waituntil} Performance}

Similar facilities to @waituntil@ are discussed in Section~\ref{s:History}, covering C, Ada, Rust, \CC, and OCaml.
However, benchmarking against these facilities is either not meaningful or not feasible.
The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space.
Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement.
Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach.
OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml's @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.

The two \gls{synch_multiplex} utilities in the realm of comparability with the \CFA @waituntil@ statement are the Go \lstinline[language=Go]{select} statement and the \uC \lstinline[language=uC++]{_Select} statement.
As such, two microbenchmarks are presented, one for Go and one for \uC, to contrast this feature.
Given the differences in features, polymorphism, and expressibility among @waituntil@, \lstinline[language=Go]{select}, and \lstinline[language=uC++]{_Select}, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner.

\subsection{Channel Benchmark}

The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels.
Although UNIX's @select@, @poll@ and @epoll@ multiplex over file descriptors, which are effectively channels, these utilities are not included in the benchmarks.
Reading or writing to a file descriptor requires a system call, which is much more expensive than operating on a user-space channel.
As such, it is infeasible to compare UNIX's @select@, @poll@ and @epoll@ with \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}.
The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads.
The number of resource clauses $C$ is varied across 2, 4, and 8 clauses, where each clause waits on a different channel.
Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels.
For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is:
\begin{cfa}
for ()
	waituntil( val << chans[0] ); or waituntil( val << chans[1] );
	or waituntil( val << chans[2] ); or waituntil( val << chans[3] );
\end{cfa}
A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds.
The first benchmark measures the throughput of the producers and consumers synchronously waiting on the channels, and the second has the threads asynchronously wait on the channels using the Go @default@ and \CFA @else@ clause.
The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench}, respectively.

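The corresponding producer work-loop mirrors the consumer, multiplexing channel inserts instead of removes:
\begin{cfa}
// producer work-loop with C = 4 clauses, mirroring the consumer loop above
for ()
	waituntil( chans[0] << val ); or waituntil( chans[1] << val );
	or waituntil( chans[2] << val ); or waituntil( chans[3] << val );
\end{cfa}
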
\begin{figure}
\centering
\captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
\subfloat[AMD]{
\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}}
}
\subfloat[Intel]{
\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}}
}
\bigskip

\subfloat[AMD]{
\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}}
}
\subfloat[Intel]{
\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}}
}
\bigskip

\subfloat[AMD]{
\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}}
}
\subfloat[Intel]{
\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}}
}
\caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).}
\label{f:select_contend_bench}
\end{figure}

\begin{figure}
\centering
\captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
\subfloat[AMD]{
\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}}
}
\subfloat[Intel]{
\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}}
}
\bigskip

\subfloat[AMD]{
\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}}
}
\subfloat[Intel]{
\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}}
}
\bigskip

\subfloat[AMD]{
\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}}
}
\subfloat[Intel]{
\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}}
}
\caption{The asynchronous multiplexing channel benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).}
\label{f:select_spin_bench}
\end{figure}

Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} have similar results when comparing \lstinline[language=Go]{select} and @waituntil@.
In the AMD benchmarks (left column), the performance is very similar as the number of cores scales.
The AMD machine has a high cache-contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go.
Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low core counts, where Go has significantly better performance due to an optimization in its scheduler.
Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}.
In the Intel benchmarks (right column), \CFA performs better than Go as the number of cores scales past 2/4 and as the number of clauses increases.
This difference arises because Go acquires all channel locks when registering and unregistering channels on a \lstinline[language=Go]{select}.
Go therefore holds a lock for every channel, resulting in worse performance as the number of channels increases.
In \CFA, races are consolidated without holding all locks, so the implementation scales much better with both cores and clauses because more work can occur in parallel.
This scalability difference is more significant on the Intel machine than the AMD machine since the Intel has lower cache-contention costs.

The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks.
There are pathological cases where Go's throughput has significant jitter.
Consider a producer and consumer thread, @P1@ and @C1@, selecting from both channels @A@ and @B@.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
@P1@ & @C1@ \\
\begin{cfa}
waituntil( A << i ); or waituntil( B << i );
\end{cfa}
&
\begin{cfa}
waituntil( val << A ); or waituntil( val << B );
\end{cfa}
\end{tabular}
\end{cquote}
Additionally, there is another producer and consumer thread, @P2@ and @C2@, operating solely on @B@.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
@P2@ & @C2@ \\
\begin{cfa}
B << val;
\end{cfa}
&
\begin{cfa}
val << B;
\end{cfa}
\end{tabular}
\end{cquote}
In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
Interestingly, this case may not be as pathological as it seems.
If the set of channels belonging to a \lstinline[language=Go]{select} overlaps with the set of channels belonging to another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel.
The implementation in \CFA only holds a single lock at a time, resulting in finer locking granularity, and hence, more parallelism.
A comparison of this pathological case is shown in Table~\ref{t:pathGo}.
The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than on the Intel machine.

\begin{table}[t]
\centering
\setlength{\extrarowheight}{2pt}
\setlength{\tabcolsep}{5pt}

\caption{Throughput (channel operations per second) of \CFA and Go in a pathological case for contention in Go's select implementation}
\label{t:pathGo}
\begin{tabular}{r|r|r}
	& \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c}{Go} \\
\hline
AMD & \input{data/nasus_Order} \\
\hline
Intel & \input{data/pyke_Order}
\end{tabular}
\end{table}

Another difference between Go and \CFA is the order of clause selection when multiple clauses are available.
Go \emph{randomly} selects a clause~\cite{go:select}, but \CFA chooses in the order clauses are listed.
This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance.
In the previous example, threads @P1@ and @C1@ prioritize channel @A@ in the @waituntil@, which can reduce contention for threads @P2@ and @C2@ accessing channel @B@.
If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be significantly less due to extra contention on channel @B@.

\subsection{Future Benchmark}

The future benchmark compares \CFA's @waituntil@ with \uC's \lstinline[language=uC++]{_Select}, with both utilities waiting on futures.
While both statements have very similar semantics, supporting @and@ and @or@ operators, \lstinline[language=uC++]{_Select} can only wait on futures, whereas the @waituntil@ is polymorphic.
As such, the underlying implementation of the operators differs between @waituntil@ and \lstinline[language=uC++]{_Select}.
The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate.

This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements.
The benchmark is indirect since the performance of futures in \CFA and \uC differs by a significant margin.
The experiment has a server, which cycles fulfilling three futures, @A@, @B@, and @C@, and a client, which waits for these futures to be fulfilled using four different kinds of predicates given in \CFA:
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
OR & AND \\
\hline
\begin{cfa}
waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
\end{cfa}
&
\begin{cfa}
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
and waituntil( C ) { get( C ); }
\end{cfa}
\\
\multicolumn{2}{@{}c@{}}{} \\
AND-OR & OR-AND \\
\hline
\begin{cfa}
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
\end{cfa}
&
\begin{cfa}
@(@ waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); } @)@
and waituntil( C ) { get( C ); }
\end{cfa}
\end{tabular}
\end{cquote}
The server and client use a low-cost synchronization after each fulfillment, so the server does not race ahead of the client.

Results of this benchmark are shown in Figure~\ref{f:futurePerf}.
Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation.
In detail, \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation.
However, the bars for both systems have similar height patterns across the experiments.
The @OR@ column for \CFA performs better than the other \CFA predicates, due to the special-casing of @waituntil@ statements with only @or@ operators.
For both \uC and \CFA, the @AND@ experiment is the least performant, which is expected since all three futures need to be fulfilled for each statement completion.
Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special @OR@ case), whereas \uC has lower variation on the Intel.
Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates.

\begin{figure}
\centering
\subfloat[AMD Future Synchronization Benchmark]{
\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
\label{f:futureAMD}
}
\subfloat[Intel Future Synchronization Benchmark]{
\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
\label{f:futureIntel}
}
\caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
\label{f:futurePerf}
\end{figure}