% ======================================================================
% ======================================================================
\chapter{Waituntil}\label{s:waituntil}
% ======================================================================
% ======================================================================

Consider the following motivating problem.
There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom.
Each stall has its own lock since only one person may occupy a stall at a time.
Humans solve this problem in the following way.
They check if all of the stalls are occupied.
If not, they enter and claim an available stall.
If all of the stalls are occupied, people queue and watch the stalls until one is free, and then enter and lock the stall.
This solution can be implemented on a computer easily, if all threads are waiting on all stalls and agree to queue.

Now the problem is extended.
Some stalls are wheelchair accessible and some stalls have gender identification.
Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria.
Immediately, the problem becomes more difficult.
A single queue no longer solves the problem.
What happens when there is a stall available that the person at the front of the queue cannot choose?
The na\"ive solution has each thread spin indefinitely, continually checking every matching kind of stall until a suitable one is free.
This approach is unsatisfactory since it wastes cycles and results in unfairness among waiting threads, as a thread can acquire the first matching stall without regard to the waiting time of other threads.
Waiting for the first appropriate stall (resource) that becomes available without spinning is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria.

\section{History of Synchronous Multiplexing}\label{s:History}

There is a history of tools that provide \gls{synch_multiplex}.
Some well-known \gls{synch_multiplex} tools include the Unix system utilities @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}.
The concept and theory surrounding \gls{synch_multiplex} were introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85}:
\begin{quote}
A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85}
\end{quote}
The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}.

Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid.
For example,
\begin{cfa}[mathescape]
(@G1@(x) $\rightarrow$ P @|@ @G2@(y) $\rightarrow$ Q )
\end{cfa}
waits for either channel @x@ or @y@ to have a value, if and only if guards @G1@ and @G2@ are true;
if only one guard is true, only one channel receives, and if both guards are false, no receive occurs.
% extended CSP with a \gls{synch_multiplex} construct @ALT@, which waits for one resource to be available and then executes a corresponding block of code.
In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on.
If a guard is false, then the resource it guards is not in the set of resources being waited on.
If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues.
Guards can be simulated using @if@ statements as shown in~\cite[rule~2.4, p~183]{Roscoe88},
\begin{lstlisting}[basicstyle=\rm,mathescape]
ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT($\,g$ $P$, $G$ ), $\neg\,b$ ALT( $G$ ) ) (boolean guard elim).
\end{lstlisting}
but the simulation requires $2^N-1$ @if@ statements, where $N$ is the number of guards.
The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time.
Figure~\ref{f:wu_if} shows in \CFA an example of applying rule 2.4 for three guards.
Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@.

\begin{figure}
\centering
\begin{lrbox}{\myboxA}
\begin{cfa}
when( G1 )
	waituntil( R1 ) S1
or when( G2 )
	waituntil( R2 ) S2
or when( G3 )
	waituntil( R3 ) S3







\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
if ( G1 )
	if ( G2 )
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R1 ) S1 or waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3
		else waituntil( R1 ) S1
else
	if ( G2 )
		if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R3 ) S3
\end{cfa}
\end{lrbox}

\subfloat[Guards]{\label{l:guards}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB}
\caption{\CFA guard simulated with \lstinline{if} statement.}
\label{f:wu_if}
\end{figure}

When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
While CSP waits on channels, the earliest known implementation of synchronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors, which conceptually differ from channels in name only.
The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
@select@ blocks until either some subset of file descriptors are available or the timeout expires.
All file descriptors that are ready are returned by modifying the argument sets to only contain the ready descriptors.

This early implementation differs from the theory presented in CSP: when a call to @select@ returns, it may provide more than one ready file descriptor.
As such, @select@ has logical-or multiplexing semantics, whereas the theory describes exclusive-or semantics.
It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors.
@select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst case is the largest numbered file descriptor.
@poll@ reduces the size of the interest sets by changing from a bit mask to a linked data structure, independent of the file-descriptor values.
@epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call.
These early \gls{synch_multiplex} tools interact directly with the operating system and are often used to communicate among processes.
Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads.
An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}.
This @select@ allows a task object, with its own thread, to multiplex over a subset of asynchronous calls to its methods.
The Ada @select@ has the same exclusive-or semantics and guards as the Occam ALT;
however, it multiplexes over methods rather than channels.
\begin{figure}
\begin{lstlisting}[language=ada,literate=,{moredelim={**[is][\color{red}]{@}{@}}}]
task type buffer is -- thread type
	... -- buffer declarations
	count : integer := 0;
begin -- thread starts here
	loop
		select
			@when count < Size@ => -- guard
			@accept insert( elem : in ElemType )@ do -- method
				... -- add to buffer
				count := count + 1;
			end;
			-- executed if this accept called
		or
			@when count > 0@ => -- guard
			@accept remove( elem : out ElemType )@ do -- method
				... -- remove and return from buffer via parameter
				count := count - 1;
			end;
			-- executed if this accept called
		or @delay 10.0@; -- unblock after 10 seconds without call
		or @else@ -- do not block, cannot appear with delay
		end select;
	end loop;
end buffer;
buf : buffer; -- create task object and start thread in task body
\end{lstlisting}
\caption{Ada Bounded Buffer}
\label{f:BB_Ada}
\end{figure}

Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
Note that a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
Then the appropriate \lstinline[language=ada]{accept} method is run with the called arguments.
Hence, the \lstinline[language=ada]{select} statement provides rendezvous points for threads, rather than providing channels with message passing.
The \lstinline[language=ada]{select} statement also provides a timeout and @else@ (nonblocking), which changes synchronous multiplexing to asynchronous.
Now the thread polls rather than blocks.

Another example of programming-language \gls{synch_multiplex} is Go's @select@ statement with channels~\cite{go:selectref}.
Figure~\ref{l:BB_Go} shows the outline of a bounded buffer administrator implemented with a Go routine.
Here, two channels are used for inserting and removing by client producers and consumers, respectively.
(The @done@ channel is used to synchronize with the program main.)
Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam, and has associated code blocks for each clause like ALT and Ada.
However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}.
As such, the exponential blowup can be seen comparing the Go and \uC code in Figure~\ref{f:AdaMultiplexing}.
Go also provides a timeout via a channel and a @default@ clause like the Ada @else@ for asynchronous multiplexing.

\begin{figure}
\centering

\begin{lrbox}{\myboxA}
\begin{lstlisting}[language=go,literate=,{moredelim={**[is][\color{red}]{@}{@}}}]
insert := make( chan int )
remove := make( chan * int )
buffer := make( chan int, Size )
done := make( chan int )
count := 0
func in_buf( val int ) {
	buffer <- val
	count++
}
func out_buf( ptr * int ) {
	*ptr = <-buffer
	count--
}
func BoundedBuffer() {
	L: for {
		if ( count < Size && count > 0 ) {
			select { // wait for message
				@case i := <- insert: in_buf( i )@
				@case p := <- remove: out_buf( p )@
				case <- done: break L
			}
		} else if ( count < Size ) {
			select { // wait for message
				@case i := <- insert: in_buf( i )@
				case <- done: break L
			}
		} else if ( count > 0 ) {
			select { // wait for message
				@case p := <- remove: out_buf( p )@
				case <- done: break L
			}
		}
	}
	done <- 0
}
func main() {
	go BoundedBuffer() // start administrator
}
\end{lstlisting}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{lstlisting}[language=uC++,{moredelim={**[is][\color{red}]{@}{@}}}]












_Task BoundedBuffer {
	int * buffer;
	int front = 0, back = 0, count = 0;
  public:
	// ... constructor implementation
	void insert( int elem ) {
		buffer[front] = elem;
		front = ( front + 1 ) % Size;
		count += 1;
	}
	int remove() {
		int ret = buffer[back];
		back = ( back + 1 ) % Size;
		count -= 1;
		return ret;
	}
  private:
	void main() {
		for ( ;; ) {
			_Accept( ~BoundedBuffer ) break;
			@or _When ( count < Size ) _Accept( insert )@;
			@or _When ( count > 0 ) _Accept( remove )@;
		}
	}
};
BoundedBuffer buf; // start thread in main method
\end{lstlisting}
\end{lrbox}

\subfloat[Go]{\label{l:BB_Go}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB}

\caption{Bounded Buffer}
\label{f:AdaMultiplexing}
\end{figure}

Finally, \uC provides \gls{synch_multiplex} with an Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}.
The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement.
Both the @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as timeout and @else@ clauses.
Figure~\ref{l:BB_uC++} shows the outline of a bounded buffer administrator implemented with \uC @_Accept@ statements.

There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, Java's @allof@/@anyof@ over futures~\cite{java:allof:anyof}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources.

\section{Other Approaches to Synchronous Multiplexing}

To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source.
For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
Similarly, actor systems circumvent the \gls{synch_multiplex} problem since actors only block when waiting for the next message, never in a behaviour.
While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
Consider the case where a thread has a single source of communication and it wants a set of $N$ resources.
It must sequentially request the $N$ resources and wait for each response.
During the receives for the $N$ resources, it can receive other communication, and must either save and postpone these communications or discard them.
% If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.

\section{\CFA's Waituntil Statement}

The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
There already exists a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods~\cite{Delisle21}, so the @waituntil@ focuses on synchronizing over other resources.
All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @_Select@ supports futures, and Ada's @select@ supports monitor method calls.
The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}.
No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.

\begin{figure}
\begin{cfa}
forall(T & | sized(T))
trait is_selectable {
	// For registering a waituntil stmt on a selectable type
	bool register_select( T &, select_node & );

	// For unregistering a waituntil stmt from a selectable type
	bool unregister_select( T &, select_node & );

	// on_selected is run on the selecting thread prior to executing
	// the statement associated with the select_node
	bool on_selected( T &, select_node & );
};
\end{cfa}
\caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.}
\label{f:wu_trait}
\end{figure}

Currently, locks, channels, futures, and timeouts are supported by the @waituntil@ statement, and this set can be expanded through the @is_selectable@ trait as other use cases arise.
The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing.
Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second.
Note that the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.

\section{Waituntil Semantics}

The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like locks, channels, and futures.

\subsection{Statement Semantics}

The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam.
The clauses have an exclusive-or relationship where the first available one is run and only one clause is run.
\CFA's @or@ semantics differ from the ALT semantics in one respect: instead of randomly picking a clause when multiple are available, the first available clause in the @waituntil@ is executed.
For example, in the following, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses.
\begin{cfa}
future(int) bar, foo;
waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
\end{cfa}
The reason for these semantics is that prioritizing resources can be useful in certain problems, such as shutdown.
In the rare case where there is a starvation problem with the ordering, it is possible to follow a @waituntil@ with its reverse form, alternating which resource has the highest priority:
\begin{cfa}
waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar
\end{cfa}
While this approach is not general for many resources, it handles many basic cases.

\begin{figure}
\begin{cfa}
future(int) Future;
channel(int) Channel;
owner_lock Lock;
int i = 0;

waituntil( @Lock@ ) { ... }
or when( i == 0 ) waituntil( i << @Channel@ ) { ... }
and waituntil( @Future@ ) { ... }
or waituntil( @timeout( 1`s )@ ) { ... }
// else { ... }
\end{cfa}
\caption{Example of \CFA's waituntil statement}
\label{f:wu_example}
\end{figure}

The \CFA @and@ semantics match the @and@ semantics of the \uC \lstinline[language=uC++]{_Select}.
When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}.
When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code block, and then waits again for the next available clause, unless the statement predicate is now satisfied.
This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator.
If the @and@ operator waited for all clauses to be available before running any, it would be the same as acquiring those resources consecutively by a sequence of @waituntil@ statements.
368 | As with normal C expressions, the @and@ operator binds more tightly than the @or@. |
---|
369 | To give an @or@ operator higher precedence, parenthesis are used. |
---|
370 | For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parenthesis. |
---|
371 | \begin{cfa} |
---|
372 | @(@ waituntil( A ) { ... } // bind tightly to or |
---|
373 | or waituntil( B ) { ... } @)@ |
---|
374 | and waituntil( C ) { ... } |
---|
375 | \end{cfa} |
---|

The guards in the @waituntil@ statement are called @when@ clauses.
Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run.
Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@.
In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}.
When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operator from the statement.
For example, in the following, the two @waituntil@ statements are semantically equivalent.

\begin{lrbox}{\myboxA}
\begin{cfa}
when( true ) waituntil( A ) { ... }
or when( false ) waituntil( B ) { ... }
and waituntil( C ) { ... }
\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
waituntil( A ) { ... }
and waituntil( C ) { ... }

\end{cfa}
\end{lrbox}

\begin{tabular}{@{}lcl@{}}
\usebox\myboxA & $\equiv$ & \usebox\myboxB
\end{tabular}

The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada.
If no clause is immediately available and there is an @else@ clause, the @else@ clause is run and the thread continues.

\subsection{Type Semantics}

As mentioned, to support interaction with the @waituntil@ statement, a type must support the trait in Figure~\ref{f:wu_trait}.
The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively.
When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run.
Many types do not need @on_selected@, but it is provided if a type needs to perform work or checks before the resource can be accessed in the code block.
The register/unregister routines in the trait also return booleans.
The return value of @register_select@ is @true@ if the resource is immediately available, and @false@ otherwise.
The return value of @unregister_select@ is @true@ if the corresponding code block should be run after unregistration, and @false@ otherwise.
The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource.
More detail on channels and their interaction with @waituntil@ appears in Section~\ref{s:wu_chans}.

The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument.
When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait.
This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context.
Indirect support through routines is needed for types that want to support multiple operations, such as channels that allow both reading and writing.

\section{\lstinline{waituntil} Implementation}

The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} shows the basic outline of the @waituntil@ algorithm.
Complexity comes from the consideration of race conditions and the synchronization needed when supporting various primitives.
The following sections use examples to fill in the complexity details missing in Figure~\ref{f:WU_Impl}.
Afterwards, the full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.

\begin{figure}
\begin{cfa}
select_nodes s[N];	$\C[3.25in]{// declare N select nodes}$
for ( node in s )	$\C{// register nodes}$
	register_select( resource, node );
while ( statement predicate not satisfied ) {	$\C{// check predicate}$
	// block until clause(s) satisfied
	for ( resource in waituntil statement ) {	$\C{// run true code blocks}$
		if ( resource is avail ) run code block
		if ( statement predicate is satisfied ) break;
	}
}
for ( node in s )	$\C{// deregister nodes}\CRT$
	if ( unregister_select( resource, node ) ) run code block
\end{cfa}
\caption{\lstinline{waituntil} Implementation}
\label{f:WU_Impl}
\end{figure}

The basic steps of the algorithm are:
\begin{enumerate}
\item
The @waituntil@ statement declares $N$ @select_node@s, one per resource being waited on, where each node stores any @waituntil@ data pertaining to its resource.

\item
Each @select_node@ is then registered with the corresponding resource.

\item
The thread executing the @waituntil@ then loops until the statement's predicate is satisfied.
In each iteration, if the predicate is unsatisfied, the @waituntil@ thread blocks.
When another thread satisfies a resource clause (\eg sends to a channel), it unblocks the @waituntil@ thread.
This thread checks all clauses for completion, and any completed clauses have their code blocks run.
While checking clause completion, if enough clauses have been run that the statement predicate is satisfied, the loop exits early.

\item
Once the thread escapes the loop, the @select_node@s are unregistered from the resources.
\end{enumerate}
These steps give a basic overview of how the statement works.
The following sections shed light on the specific changes and provide more implementation detail.
470 | |
---|
471 | \subsection{Locks}\label{s:wu_locks} |
---|
472 | |
---|
473 | The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc. |
---|
474 | Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement. |
---|
475 | For example, the following waits until the thread has acquired lock @l1@ or locks @l2@ and @l3@. |
---|
476 | \begin{cfa} |
---|
477 | owner_lock l1, l2, l3; |
---|
478 | waituntil ( l1 ) { ... } |
---|
479 | or waituntil( l2 ) { ... } |
---|
480 | and waituntil( l3 ) { ... } |
---|
481 | \end{cfa} |
---|
482 | Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in the lock's queue of waiting threads. |
---|
483 | When the lock schedules this thread, it unblocks and runs the code block associated with the lock and then releases the lock. |
---|
484 | |
---|
485 | In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each of the lock's waiting queues. |
---|
486 | When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked. |
---|
487 | Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released. |
---|
488 | Immediately releasing the lock after the code block prevents the waiting thread from holding multiple locks and potentially introducing a deadlock. |
---|
489 | As such, the only lock nodes left to unregister at the end of the statement are the ones whose code blocks have not run. |
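Go has no selectable locks, but the idea can be approximated by modelling a lock as a channel of capacity one, where a successful send acquires it; a \lstinline[language=Go]{select} over two such locks then waits for whichever is free first, analogous to a two-clause @or@ (a sketch with illustrative names; it cannot express the @l2@-and-@l3@ conjunction):

```go
package main

import "fmt"

// chanLock models a selectable lock: a successful send acquires it,
// a receive releases it. Capacity 1 means at most one holder.
type chanLock chan struct{}

// acquireEither blocks until one of the two locks is acquired and
// reports which one, like "waituntil( l1 ) or waituntil( l2 )".
func acquireEither(l1, l2 chanLock) int {
	select {
	case l1 <- struct{}{}:
		return 1
	case l2 <- struct{}{}:
		return 2
	}
}

func release(l chanLock) { <-l }

func main() {
	l1 := make(chanLock, 1)
	l2 := make(chanLock, 1)
	l1 <- struct{}{} // l1 is already held by someone else
	fmt.Println("acquired lock", acquireEither(l1, l2))
	release(l2)
	release(l1)
}
```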
---|
490 | |
---|
491 | \subsection{Timeouts} |
---|
492 | |
---|
493 | A timeout for the @waituntil@ statement is a duration passed to routine \lstinline[deletekeywords={timeout}]{timeout}\footnote{\lstinline{timeout} is a quasi-keyword in \CFA, allowing it to be used as an identifier.}, \eg: |
---|
494 | \begin{cquote} |
---|
495 | \begin{tabular}{@{}l|l@{}} |
---|
496 | \multicolumn{2}{@{}l@{}}{\lstinline{Duration D1\{ 1`ms \}, D2\{ 2`ms \}, D3\{ 3`ms \};}} \\ |
---|
497 | \begin{cfa}[deletekeywords={timeout}] |
---|
498 | waituntil( i << C1 ) {} |
---|
499 | or waituntil( i << C2 ) {} |
---|
500 | or waituntil( i << C3 ) {} |
---|
501 | or waituntil( timeout( D1 ) ) {} |
---|
502 | or waituntil( timeout( D2 ) ) {} |
---|
503 | or waituntil( timeout( D3 ) ) {} |
---|
504 | \end{cfa} |
---|
505 | & |
---|
506 | \begin{cfa}[deletekeywords={timeout}] |
---|
507 | waituntil( i << C1 ) {} |
---|
508 | or waituntil( i << C2 ) {} |
---|
509 | or waituntil( i << C3 ) {} |
---|
510 | or waituntil( timeout( min( D1, D2, D3 ) ) ) {} |
---|
511 | |
---|
512 | |
---|
513 | \end{cfa} |
---|
514 | \end{tabular} |
---|
515 | \end{cquote} |
---|
516 | These two examples are basically equivalent. |
---|
517 | Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers. |
---|
518 | Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding. |
---|
519 | In the following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, respectively. |
---|
520 | \begin{cfa}[deletekeywords={timeout}] |
---|
521 | waituntil( i << C1 ){} and waituntil( timeout( 1`s ) ){} |
---|
522 | or waituntil( i << C2 ){} and waituntil( timeout( 3`s ) ){} |
---|
523 | \end{cfa} |
---|
524 | If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds. |
---|
525 | Note that the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics. |
---|
526 | |
---|
527 | The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of time elapsed since the call started. |
---|
528 | Instead, \lstinline[deletekeywords={timeout}]{timeout} returns a type that supports the @is_selectable@ trait, allowing the type system to select the correct overloaded routine for this context. |
---|
529 | For the @waituntil@, it is more idiomatic for the \lstinline[deletekeywords={timeout}]{timeout} to use the same syntax as other blocking operations instead of having a special language clause. |
---|
530 | |
---|
531 | \subsection{Channels}\label{s:wu_chans} |
---|
532 | |
---|
533 | Channels require more complexity to allow synchronous multiplexing. |
---|
534 | For locks, when an outside thread releases a lock and unblocks the @waituntil@ thread (WUT), the lock's MX property is passed to the WUT (no spinning locks). |
---|
535 | For futures, the outside thread delivers a value to the future and unblocks any waiting threads, including WUTs. |
---|
536 | In either case, after the WUT unblocks, it is safe to execute its corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future. |
---|
537 | Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT. |
---|
538 | However, for channels, there is a race condition that poses an issue. |
---|
539 | If the outside thread inserts into the channel and unblocks the WUT, there is a race where another thread can remove the channel data, so after the WUT unblocks and attempts to remove from the buffer, it fails, and the WUT must reblock (busy waiting). |
---|
540 | This scenario is a \gls{toctou} race that needs to be consolidated. |
---|
541 | To close the race, the outside thread must detect this case and insert directly into the left-hand side of the channel expression (@i << chan@) rather than into the channel, and then unblock the WUT. |
---|
542 | Now the unblocked WUT is guaranteed to have a satisfied resource and its code block can be safely executed. |
---|
543 | The insertion circumvents the channel buffer via the wait-morphing in the \CFA channel implementation \see{Section~\ref{s:chan_impl}}, allowing @waituntil@ channel unblocking to not be special-cased. |
---|
544 | Note that all channel operations are fair and no preference is given between @waituntil@ and direct channel operations when unblocking. |
---|
545 | |
---|
546 | Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg: |
---|
547 | \begin{cfa} |
---|
548 | int i; |
---|
549 | waituntil( i << A ) {} and waituntil( i << B ) {} |
---|
550 | or waituntil( i << C ) {} and waituntil( i << D ) {} |
---|
551 | \end{cfa} |
---|
552 | If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or the code blocks for @C@ and @D@. |
---|
553 | However, four outside threads inserting into each channel can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks. |
---|
554 | This case introduces a race with complexity that increases with the size of the @waituntil@ statement. |
---|
555 | However, due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses. |
---|
556 | This approach is a poor solution for two reasons. |
---|
557 | It is possible that once all the locks are acquired, the subtree is not satisfied and the locks must be released. |
---|
558 | This work incurs a high cost for signalling threads and heavily increases contention on internal channel locks. |
---|
559 | Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible. |
---|
560 | As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators, since they cannot be supported without significant complexity that also degrades @waituntil@ performance. |
---|
561 | Therefore, the example of reusing variable @i@ by multiple output channels is considered a user error without exclusive-or semantics. |
---|
562 | Given aliasing in C, it is impossible to even warn of the potential race. |
---|
563 | In the future, it would be interesting to support Go-like syntax, \lstinline[language=Go]{case i := <- ...}, defining a new scoped @i@ variable for each clause. |
---|
564 | |
---|
565 | It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel. |
---|
566 | Consider the following example where thread 1 is reading and threads 2 and 3 are writing to channels @A@ and @B@ concurrently. |
---|
567 | \begin{cquote} |
---|
568 | \begin{tabular}{@{}l|l|l@{}} |
---|
569 | \multicolumn{3}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\ |
---|
570 | thread 1 & thread 2 & thread 3 \\ |
---|
571 | \begin{cfa} |
---|
572 | waituntil( i << A ) {} |
---|
573 | or waituntil( i << B ) {} |
---|
574 | \end{cfa} |
---|
575 | & |
---|
576 | \begin{cfa} |
---|
577 | A << 1; |
---|
578 | |
---|
579 | \end{cfa} |
---|
580 | & |
---|
581 | \begin{cfa} |
---|
582 | B << 2; |
---|
583 | |
---|
584 | \end{cfa} |
---|
585 | \end{tabular} |
---|
586 | \end{cquote} |
---|
587 | For thread 1 to have exclusive-or semantics, it must only consume from exactly one of @A@ or @B@. |
---|
588 | As such, thread 2 and 3 must race to establish the winning clause of the @waituntil@ in thread 1. |
---|
589 | This race is consolidated by thread 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}. |
---|
590 | The winner bypasses the channel, inserts into the WUT's left-hand side, and signals thread 1. |
---|
591 | The loser continues checking if there is space in the channel, and if so performs the channel insert operation with a possible signal of a waiting remove thread; |
---|
592 | otherwise, if there is no space, the loser blocks. |
---|
593 | It is important the race occurs \emph{before} operating on the channel, because channel actions are different with respect to each thread. |
---|
594 | If the race was consolidated after the operation, both thread 2 and 3 could potentially write into @i@ concurrently. |
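The pre-operation race can be sketched with an atomic compare-and-swap; the following Go sketch (with illustrative names, not the \CFA runtime's types) shows two signalling threads racing to claim a winner word before touching the channel, so exactly one clause delivers directly to the waiting thread:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// waiter stands in for the blocked waituntil thread's shared state.
type waiter struct {
	winner atomic.Int32  // 0 = no winner yet, else id of the winning clause
	val    atomic.Int32  // the WUT's left-hand-side variable
	wake   chan struct{} // closed to unblock the waituntil thread
}

// signal is run by an outside thread inserting on behalf of one clause.
func signal(w *waiter, clause, v int32) bool {
	if w.winner.CompareAndSwap(0, clause) { // race occurs BEFORE the channel op
		w.val.Store(v) // bypass the buffer: deliver straight into the WUT's variable
		close(w.wake)  // unblock the waituntil thread
		return true
	}
	return false // lost: would fall back to a normal channel insert (or block)
}

func main() {
	w := &waiter{wake: make(chan struct{})}
	var wg sync.WaitGroup
	var wins atomic.Int32
	for clause := int32(1); clause <= 2; clause++ {
		wg.Add(1)
		go func(c int32) {
			defer wg.Done()
			if signal(w, c, c*10) {
				wins.Add(1)
			}
		}(clause)
	}
	<-w.wake
	wg.Wait()
	fmt.Println("winning clauses:", wins.Load()) // exactly one clause wins
}
```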
---|
595 | |
---|
596 | Channels introduce another interesting implementation issue. |
---|
597 | Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause. |
---|
598 | This conjunction poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel. |
---|
599 | Consider the following example, alongside a described ordering of events to highlight the race. |
---|
600 | \begin{cquote} |
---|
601 | \begin{tabular}{@{}l|l@{}} |
---|
602 | \multicolumn{2}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\ |
---|
603 | thread 1 & thread 2 \\ |
---|
604 | \begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}] |
---|
605 | waituntil( @i << A@ ) {} |
---|
606 | or waituntil( #i << B# ) {} |
---|
607 | \end{cfa} |
---|
608 | & |
---|
609 | \begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}] |
---|
610 | waituntil( #B << 2# ) {} |
---|
611 | or waituntil( @A << 1@ ) {} |
---|
612 | \end{cfa} |
---|
613 | \end{tabular} |
---|
614 | \end{cquote} |
---|
615 | Assume thread 1 executes first, registers with channel @A@ and proceeds in the @waituntil@. |
---|
616 | Since @A@ is empty, thread 1 cannot remove, and then thread 1 is interrupted before registering with @B@. |
---|
617 | Thread 2 similarly registers with channel @B@, and proceeds in the @waituntil@. |
---|
618 | Since @B@ is zero size there is no space to insert, and then thread 2 is interrupted before registering with @A@. |
---|
619 | At this point, thread 1 and 2 resume execution. |
---|
620 | Remember from above, each exclusive-or @waituntil@ holds a race to set the winning clause of the statement. |
---|
621 | The issue that arises is that these two @waituntil@ statements must have matching winning clauses (both @A@ clauses or both @B@ clauses) to preserve the exclusive-or semantics, since a zero-sized channel needs an insert/remove pair for an operation to occur. |
---|
622 | If threads 1 and 2 race to set a winner only in their own @waituntil@, thread 1 can think it successfully removed from @B@, and thread 2 can think it successfully inserted into @A@, which is an error. |
---|
623 | Hence, there is a race on two fronts. |
---|
624 | If thread 1 wins the race and sees that @B@ has a waiting insertion, then thread 2 must execute the first clause of its @waituntil@ and thread 1 execute its second. |
---|
625 | Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 execute its second. |
---|
626 | Any other execution scenario is incorrect for exclusive-or semantics. |
---|
627 | Note that priority execution of multiple satisfied @waituntil@ clauses (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread. |
---|
628 | |
---|
629 | The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels. |
---|
630 | This approach eliminates the race shown above since thread 1 and 2 cannot both be registering at the same time. |
---|
631 | However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic. |
---|
632 | Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead. |
---|
633 | Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race. |
---|
634 | This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to be pending. |
---|
635 | If it succeeds, it then attempts to set the other thread's @waituntil@ race pointer to its success value. |
---|
636 | If either thread successfully sets the other thread's @waituntil@ race pointer, the operation can proceed; if not, the signalling thread sets its own race pointer back to the initial value and retries. |
---|
637 | This retry mechanism can potentially introduce a livelock, but in practice a livelock here is highly unlikely; |
---|
638 | indeed, the likelihood of a livelock is zero unless the program is in the niche case of having two or more exclusive-or @waituntil@s, each with two or more clauses, in reverse order of priority. |
---|
639 | This livelock case can be fully eliminated using locks like Go, or if a \gls{dcas} instruction is available. |
---|
640 | If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost. |
---|
641 | This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner. |
---|
642 | The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}. |
---|
643 | |
---|
644 | \begin{figure} |
---|
645 | \begin{cfa} |
---|
646 | bool pending_set_other( select_node & other, select_node & mine ) { |
---|
647 | unsigned long int cmp_status = UNSAT; |
---|
648 | |
---|
649 | // Try to set other status, if we succeed break and return true |
---|
650 | while( ! CAS( other.clause_status, &cmp_status, SAT ) ) { |
---|
651 | if ( cmp_status == SAT ) |
---|
652 | return false; // If other status is SAT we lost so return false |
---|
653 | |
---|
654 | // Toggle own status flag to allow other thread to potentially win |
---|
655 | mine.clause_status = UNSAT; |
---|
656 | |
---|
657 | // Reset compare flag |
---|
658 | cmp_status = UNSAT; |
---|
659 | |
---|
660 | // Attempt to set own status flag back to PENDING to retry |
---|
661 | if ( ! CAS( mine.clause_status, &cmp_status, PENDING ) ) |
---|
662 | return false; // If we fail then we lost so return false |
---|
663 | |
---|
664 | // Reset compare flag |
---|
665 | cmp_status = UNSAT; |
---|
666 | } |
---|
667 | return true; |
---|
668 | } |
---|
669 | \end{cfa} |
---|
670 | \caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol} |
---|
671 | \label{f:WU_DeadlockAvoidance} |
---|
672 | \end{figure} |
---|
673 | |
---|
674 | Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support. |
---|
675 | These exception mechanisms are supported through the @on_selected@ routine. |
---|
676 | This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown. |
---|
677 | Hence, the channel close-down mechanism is handled correctly. |
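For comparison, Go's comma-ok receive plays the role of @on_selected@'s close check; a minimal sketch (illustrative names) where a clause detects a closed channel after being selected, returning a flag where \CFA raises an exception:

```go
package main

import "fmt"

// recvEither mirrors a two-clause waituntil over channels; the comma-ok
// form reports whether the selected channel was closed, analogous to the
// check on_selected performs after unblocking.
func recvEither(a, b chan int) (int, bool) {
	select {
	case v, open := <-a:
		return v, open
	case v, open := <-b:
		return v, open
	}
}

func main() {
	a := make(chan int, 1)
	a <- 5
	v, open := recvEither(a, nil) // a nil channel is never selected
	fmt.Println(v, open)
	closed := make(chan int)
	close(closed)
	_, open = recvEither(nil, closed) // the closed clause is selected
	fmt.Println(open)
}
```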
---|
678 | |
---|
679 | \subsection{Guards and Statement Predicate}\label{s:wu_guards} |
---|
680 | |
---|
681 | For a purely or/exclusive-or relationship, checking when a \gls{synch_multiplex} utility is done is trivial, since any resource becoming available means the blocked thread can proceed and the @waituntil@ statement is finished. |
---|
682 | In \uC and \CFA, the \gls{synch_multiplex} mechanisms support both @and@ and @or@ relationships, which, along with guards, makes checking for completion of the statement difficult. |
---|
683 | Consider the following @waituntil@. |
---|
684 | \begin{cfa} |
---|
685 | when( GA ) waituntil( A ) {} |
---|
686 | and when( GB ) waituntil( B ) {} |
---|
687 | or when( GC ) waituntil( C ) {} |
---|
688 | \end{cfa} |
---|
689 | When the @waituntil@ thread wakes up, the following predicate represents satisfaction: |
---|
690 | \begin{cfa} |
---|
691 | A && B || C || ! GA && B || ! GB && A || ! GA && ! GB && ! GC |
---|
692 | \end{cfa} |
---|
693 | which, because a resource with a @false@ guard can never become available, can be simplified to: |
---|
694 | \begin{cfa} |
---|
695 | ( A || ! GA ) && ( B || ! GB ) && ( GA || GB ) || C || ! GA && ! GB && ! GC |
---|
696 | \end{cfa} |
---|
697 | Checking the complete predicate on each iteration of the pending @waituntil@ is expensive so \uC and \CFA both take steps to simplify checking for statement completion. |
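As a sanity check, the full predicate above can be validated by brute force against the clause-removal semantics of guards; the following Go sketch (illustrative, restricted to reachable states, where a resource with a @false@ guard never becomes available) enumerates all combinations:

```go
package main

import "fmt"

// fullPredicate is the complete wakeup predicate for
//   when(GA) waituntil(A) and when(GB) waituntil(B) or when(GC) waituntil(C)
func fullPredicate(A, B, C, GA, GB, GC bool) bool {
	return A && B || C || !GA && B || !GB && A || !GA && !GB && !GC
}

// removalSemantics drops clauses whose guard is false and evaluates what
// remains; an empty statement is trivially satisfied.
func removalSemantics(A, B, C, GA, GB, GC bool) bool {
	if !GA && !GB && !GC {
		return true // all guards false: statement is skipped
	}
	andSat := true
	andPresent := false
	if GA {
		andSat = andSat && A
		andPresent = true
	}
	if GB {
		andSat = andSat && B
		andPresent = true
	}
	sat := andPresent && andSat
	if GC {
		sat = sat || C
	}
	return sat
}

func main() {
	ok := true
	for m := 0; m < 64; m++ {
		A, B, C := m&1 != 0, m&2 != 0, m&4 != 0
		GA, GB, GC := m&8 != 0, m&16 != 0, m&32 != 0
		if !GA && A || !GB && B || !GC && C {
			continue // unreachable: a disabled resource stays unavailable
		}
		if fullPredicate(A, B, C, GA, GB, GC) != removalSemantics(A, B, C, GA, GB, GC) {
			ok = false
		}
	}
	fmt.Println("predicates agree:", ok)
}
```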
---|
698 | |
---|
699 | In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource. |
---|
700 | A diagram of the tree for the complete predicate above is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@. |
---|
701 | Each internal node stores the statuses of the two subtrees beneath it. |
---|
702 | When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement. |
---|
703 | Once the root of the tree has both subtrees marked as @true@ then the statement is complete. |
---|
704 | As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again. |
---|
705 | To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation. |
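The tree update can be modelled concisely; the following Go sketch (a minimal model, not \uC's actual data structure) shows leaf marking, upward percolation, and pruning for the tree of @A && B || C@:

```go
package main

import "fmt"

// tnode is one node of a completion tree: leaves hold resource
// availability, internal nodes combine their children with && or ||.
type tnode struct {
	op          byte // '&' or '|' at internal nodes, 0 at leaves
	status      bool
	left, right *tnode
	parent      *tnode
}

// percolate re-evaluates an internal node and pushes a newly true
// status toward the root; a subtree already true is pruned.
func (n *tnode) percolate() {
	if n.status {
		return // pruned: a true subtree is not examined again
	}
	if n.op == '&' {
		n.status = n.left.status && n.right.status
	} else {
		n.status = n.left.status || n.right.status
	}
	if n.status && n.parent != nil {
		n.parent.percolate()
	}
}

// mark records that a leaf's resource became available.
func mark(leaf *tnode) {
	leaf.status = true
	if leaf.parent != nil {
		leaf.parent.percolate()
	}
}

func main() {
	// Build ( A && B ) || C.
	A, B, C := &tnode{}, &tnode{}, &tnode{}
	and := &tnode{op: '&', left: A, right: B}
	root := &tnode{op: '|', left: and, right: C}
	A.parent, B.parent = and, and
	and.parent, C.parent = root, root
	mark(A)
	fmt.Println("after A:", root.status)
	mark(B)
	fmt.Println("after B:", root.status)
}
```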
---|
706 | |
---|
707 | \begin{figure} |
---|
708 | \begin{center} |
---|
709 | \input{diagrams/uCpp_select_tree.tikz} |
---|
710 | \end{center} |
---|
711 | \caption{\uC \lstinline[language=uC++]{select} tree modification} |
---|
712 | \label{f:uC_select_tree} |
---|
713 | \end{figure} |
---|
714 | |
---|
715 | The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the complete predicate of a @waituntil@. |
---|
716 | The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise. |
---|
717 | This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@. |
---|
718 | Leveraging the compiler, a predicate routine is generated per @waituntil@. |
---|
719 | This predicate routine accepts the statuses of the resources being waited on as arguments. |
---|
720 | A resource status is an integer that indicates whether a resource is either not available, available, or has already run its associated code block. |
---|
721 | The predicate routine returns @true@ when the @waituntil@ is done, \ie enough resources have run their associated code blocks to satisfy the @waituntil@'s predicate, and false otherwise. |
---|
722 | To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate. |
---|
723 | The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values. |
---|
724 | For example, the following is generated for the complete predicate above: |
---|
725 | \begin{cfa} |
---|
726 | // statement completion predicate |
---|
727 | bool check_completion( select_node * nodes ) { |
---|
728 | return nodes[0].status && nodes[1].status || nodes[2].status; |
---|
729 | } |
---|
730 | if ( GA || GB || GC ) { $\C{// skip statement if all guards false}$ |
---|
731 | select_node nodes[3]; |
---|
732 | nodes[0].status = ! GA && GB; $\C{// A's status}$ |
---|
733 | nodes[1].status = ! GB && GA; $\C{// B's status}$ |
---|
734 | nodes[2].status = false; $\C{// C's status (a disabled or-clause acts as false)}$ |
---|
735 | // ... rest of waituntil codegen ... |
---|
736 | } |
---|
737 | \end{cfa} |
---|
738 | |
---|
739 | \uC's @_Select@ supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses. |
---|
740 | In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied. |
---|
741 | \begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}] |
---|
742 | Future_ISM<int> A, B, C, D, E; |
---|
743 | _Select( @A || B && C@ ) { ... } |
---|
744 | and _Select( @D && E@ ) { ... } |
---|
745 | \end{lstlisting} |
---|
746 | This feature is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after \emph{both} resources are available. |
---|
747 | |
---|
748 | In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons. |
---|
749 | As a motivating example, suppose \CFA supported operators inside clauses as in: |
---|
750 | \begin{cfa} |
---|
751 | owner_lock A, B, C, D; |
---|
752 | waituntil( A && B ) { ... } |
---|
753 | or waituntil( C && D ) { ... } |
---|
754 | \end{cfa} |
---|
755 | If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold-and-wait situation. |
---|
756 | Other semantics are needed to ensure this operation is safe. |
---|
757 | One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance}; |
---|
758 | however, that opens the potential for livelock. |
---|
759 | Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally. |
---|
760 | Another approach is to wait until all resources for a given clause are available before acquiring any of them, but this also quickly breaks down. |
---|
761 | It does not work due to \gls{toctou} issues, \ie it is impossible to ensure that the full set of resources remains available without holding them all first. |
---|
762 | Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems. |
---|
763 | Finally, the problem of operators inside clauses is also difficult to handle when supporting channels. |
---|
764 | It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run. |
---|
765 | However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}. |
---|
766 | Ultimately, this feature was not considered crucial after taking into account the complexity and runtime cost. |
---|
767 | |
---|
768 | \subsection{The full \lstinline{waituntil} picture} |
---|
769 | |
---|
770 | Given all the complex details handled by the @waituntil@, its full pseudocode is presented in Figure \ref{f:WU_Full_Impl}. |
---|
771 | Some things to note are as follows. |
---|
772 | The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}. |
---|
773 | The @when_conditions@ array is used to store the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards. |
---|
774 | As discussed in Section~\ref{s:wu_chans}, this pseudocode includes conditional code-block execution based on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure all available channel resources have their corresponding code block run. |
---|
775 | |
---|
776 | \begin{figure} |
---|
777 | \begin{cfa} |
---|
778 | bool when_conditions[N]; |
---|
779 | for ( node in nodes ) $\C[3.75in]{// evaluate guards}$ |
---|
780 | if ( node has guard ) |
---|
781 | when_conditions[node] = node_guard; |
---|
782 | else |
---|
783 | when_conditions[node] = true; |
---|
784 | |
---|
785 | if ( any when_conditions[node] are true ) { |
---|
786 | select_nodes nodes[N]; $\C{// declare N select nodes}$ |
---|
787 | try { |
---|
788 | // ... set statuses for nodes with when_conditions[node] == false ... |
---|
789 | |
---|
790 | for ( node in nodes ) $\C{// register nodes}$ |
---|
791 | if ( when_conditions[node] ) |
---|
792 | register_select( resource, node ); |
---|
793 | |
---|
794 | while ( !check_completion( nodes ) ) { $\C{// check predicate}$ |
---|
795 | // block |
---|
796 | for ( resource in waituntil statement ) { $\C{// run true code blocks}$ |
---|
797 | if ( check_completion( nodes ) ) break; |
---|
798 | if ( resource is avail ) |
---|
799 | try { |
---|
800 | if( on_selected( resource ) ) $\C{// conditionally run block}$ |
---|
801 | run code block |
---|
802 | } finally $\C{// for exception safety}$ |
---|
803 | unregister_select( resource, node ); $\C{// immediate unregister}$ |
---|
804 | } |
---|
805 | } |
---|
806 | } finally { $\C{// for exception safety}$ |
---|
807 | for ( registered nodes in nodes ) $\C{// deregister nodes}$ |
---|
808 | if ( when_conditions[node] && unregister_select( resource, node ) |
---|
809 | && on_selected( resource ) ) |
---|
810 | run code block $\C{// run code block upon unregister}\CRT$ |
---|
811 | } |
---|
812 | } |
---|
813 | \end{cfa} |
---|
814 | \caption{Full \lstinline{waituntil} Pseudocode Implementation} |
---|
815 | \label{f:WU_Full_Impl} |
---|
816 | \end{figure} |
---|
817 | |
---|
818 | \section{\lstinline{waituntil} Performance} |
---|
819 | |
---|
820 | Similar facilities to @waituntil@ are discussed in Section~\ref{s:History} covering C, Ada, Rust, \CC, and OCaml. |
---|
821 | However, these facilities are either not meaningful or feasible to benchmark against. |
---|
822 | The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space. |
---|
823 | Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement. |
---|
824 | Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach. |
---|
825 | OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@. |
---|
826 | |
---|
827 | The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA @waituntil@ statement are the Go \lstinline[language=Go]{select} statement and the \uC \lstinline[language=uC++]{_Select} statement. |
---|
828 | As such, two microbenchmarks are presented, one for Go and one for \uC to contrast this feature. |
---|
829 | Given the differences in features, polymorphism, and expressibility among @waituntil@, Go's \lstinline[language=Go]{select}, and \uC's \lstinline[language=uC++]{_Select}, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner. |
---|
830 | |
---|
831 | \subsection{Channel Benchmark} |
---|
832 | |
---|
833 | The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels. |
---|
834 | Although UNIX's @select@, @poll@, and @epoll@ multiplex over file descriptors, which are effectively channels, these utilities are not included in the benchmarks. |
---|
835 | Reading or writing to a file descriptor requires a system call which is much more expensive than operating on a user-space channel. |
---|
836 | As such, it is infeasible to compare UNIX's @select@, @poll@, and @epoll@ with \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}. |
---|
837 | The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads. |
---|
838 | The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on. |
---|
839 | Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels. |
---|
840 | For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is: |
---|
841 | \begin{cfa} |
---|
842 | for () |
---|
843 | waituntil( val << chans[0] ); or waituntil( val << chans[1] ); |
---|
844 | or waituntil( val << chans[2] ); or waituntil( val << chans[3] ); |
---|
845 | \end{cfa} |
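For comparison, the Go-side consumer loop has the analogous form; this bounded sketch (illustrative, using a fixed operation count rather than the benchmark's 10-second window) multiplexes receives over the four channels:

```go
package main

import "fmt"

// consume performs n multiplexed receives over the C = 4 channels and
// returns the number of completed operations, mirroring the benchmark's
// consumer loop.
func consume(chans [4]chan int, n int) int {
	ops := 0
	for i := 0; i < n; i++ {
		select {
		case <-chans[0]:
		case <-chans[1]:
		case <-chans[2]:
		case <-chans[3]:
		}
		ops++
	}
	return ops
}

func main() {
	var chans [4]chan int
	for i := range chans {
		chans[i] = make(chan int, 1)
		chans[i] <- i // pre-fill so the four receives complete immediately
	}
	fmt.Println("operations:", consume(chans, 4))
}
```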
---|
846 | A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds. |
---|
847 | The first benchmark measures throughput of the producers and consumer synchronously waiting on the channels and the second has the threads asynchronously wait on the channels using the Go @default@ and \CFA @else@ clause. |
---|
848 | The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} respectively. |
---|
849 | |
---|
850 | \begin{figure} |
---|
851 | \centering |
---|
852 | \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize} |
---|
853 | \subfloat[AMD]{ |
---|
854 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}} |
---|
855 | } |
---|
856 | \subfloat[Intel]{ |
---|
857 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}} |
---|
858 | } |
---|
859 | \bigskip |
---|
860 | |
---|
861 | \subfloat[AMD]{ |
---|
862 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}} |
---|
863 | } |
---|
864 | \subfloat[Intel]{ |
---|
865 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}} |
---|
866 | } |
---|
867 | \bigskip |
---|
868 | |
---|
869 | \subfloat[AMD]{ |
---|
870 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}} |
---|
871 | } |
---|
872 | \subfloat[Intel]{ |
---|
873 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}} |
---|
874 | } |
---|
875 | \caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).} |
---|
876 | \label{f:select_contend_bench} |
---|
877 | \end{figure} |
---|
878 | |
---|
\begin{figure}
\centering
\captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}}
}
\bigskip

\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}}
}
\bigskip

\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}}
}
\caption{The asynchronous multiplexing channel benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).}
\label{f:select_spin_bench}
\end{figure}

Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} show similar results when comparing \lstinline[language=Go]{select} and @waituntil@.
In the AMD benchmarks (left column), the performance is very similar as the number of cores scales.
The AMD machine has a high cache-contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go.
Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low core counts, where Go has significantly better performance due to an optimization in its scheduler.
Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}.
In the Intel benchmarks (right column), \CFA performs better than Go as the number of cores scales past 2/4 and as the number of clauses increases.
This difference is due to Go acquiring all channel locks when registering and unregistering channels on a \lstinline[language=Go]{select}.
Hence, Go holds a lock for every channel, resulting in worse performance as the number of channels increases.
In \CFA, races are consolidated without holding all locks, so it scales much better with both cores and clauses because more work can occur in parallel.
This scalability difference is more significant on the Intel machine than the AMD machine since the Intel has lower cache-contention costs.

The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks.
There are pathological cases where Go's throughput has significant jitter.
Consider a producer and consumer thread, @P1@ and @C1@, selecting from both channels @A@ and @B@.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
@P1@ & @C1@ \\
\begin{cfa}
waituntil( A << i ); or waituntil( B << i );
\end{cfa}
&
\begin{cfa}
waituntil( val << A ); or waituntil( val << B );
\end{cfa}
\end{tabular}
\end{cquote}
Additionally, there is another producer and consumer thread, @P2@ and @C2@, operating solely on @B@.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
@P2@ & @C2@ \\
\begin{cfa}
B << val;
\end{cfa}
&
\begin{cfa}
val << B;
\end{cfa}
\end{tabular}
\end{cquote}
In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
Interestingly, this case may not be as pathological as it seems.
If the set of channels belonging to one \lstinline[language=Go]{select} overlaps with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel.
The implementation in \CFA only holds a single lock at a time, resulting in finer locking granularity and, hence, more parallelism.
A comparison of this pathological case is shown in Table~\ref{t:pathGo}.
The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than on the Intel machine.

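The pathological scenario above can be sketched directly in Go.
The following self-contained program is a hypothetical reconstruction for illustration, not the benchmark harness used here: @P1@/@C1@ select over both channels while @P2@/@C2@ communicate only on @B@, yet in Go every select over \{@A@, @B@\} briefly locks both channels, so the @B@-only pair contends with the selecting pair.

```go
package main

import (
	"fmt"
	"sync"
)

// runPathological mirrors the scenario above: P1/C1 select over both
// channels A and B, while P2/C2 communicate only on B.  In Go, each
// select over {A, B} momentarily locks both channels, so P2/C2 contend
// with P1/C1 even though they never touch A.
func runPathological(ops int) bool {
	A := make(chan int)
	B := make(chan int)
	var wg sync.WaitGroup
	wg.Add(4)
	go func() { // P1: send to whichever of A or B has a ready receiver
		defer wg.Done()
		for i := 0; i < ops; i++ {
			select {
			case A <- i:
			case B <- i:
			}
		}
	}()
	go func() { // C1: receive from whichever of A or B is ready
		defer wg.Done()
		for i := 0; i < ops; i++ {
			select {
			case <-A:
			case <-B:
			}
		}
	}()
	go func() { // P2: send only on B
		defer wg.Done()
		for i := 0; i < ops; i++ {
			B <- i
		}
	}()
	go func() { // C2: receive only from B
		defer wg.Done()
		for i := 0; i < ops; i++ {
			<-B
		}
	}()
	wg.Wait() // all 2*ops communications completed
	return true
}

func main() {
	fmt.Println(runPathological(1000))
}
```

Timing this configuration against one where @P2@/@C2@ run on a channel disjoint from any select gives a rough, informal measure of the serialization cost described above.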
\begin{table}[t]
\centering
\setlength{\extrarowheight}{2pt}
\setlength{\tabcolsep}{5pt}

\caption{Throughput (channel operations per second) of \CFA and Go in a pathological case for contention in Go's select implementation}
\label{t:pathGo}
\begin{tabular}{r|r|r}
& \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c}{Go} \\
\hline
AMD & \input{data/nasus_Order} \\
\hline
Intel & \input{data/pyke_Order}
\end{tabular}
\end{table}

Another difference between Go and \CFA is the order of clause selection when multiple clauses are available.
Go \emph{randomly} selects a clause~\cite{go:select}, but \CFA chooses in the order clauses are listed.
This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance.
In the previous example, threads @P1@ and @C1@ prioritize channel @A@ in the @waituntil@, which can reduce contention for threads @P2@ and @C2@ accessing channel @B@.
If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be significantly less due to extra contention on channel @B@.

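Go's random choice is easy to observe.
As an illustrative sketch in standard Go (not tied to the benchmark code), the following program makes both clauses of a select ready on every iteration and counts how often each is chosen; the Go specification states a uniform pseudo-random ready case is selected, so with a large iteration count both counters are essentially always nonzero, whereas an ordered scheme like \CFA's would always pick the first clause.

```go
package main

import "fmt"

// pickCounts makes both clauses of the select ready on every iteration
// (both channels are pre-filled) and counts how often each is chosen.
// Go chooses uniformly at random among ready cases, so for large n both
// counters are expected to be nonzero.
func pickCounts(n int) (a, b int) {
	A := make(chan int, n)
	B := make(chan int, n)
	for i := 0; i < n; i++ { // pre-fill so both receives are always ready
		A <- i
		B <- i
	}
	for i := 0; i < n; i++ {
		select {
		case <-A:
			a++
		case <-B:
			b++
		}
	}
	return a, b
}

func main() {
	a, b := pickCounts(1000)
	fmt.Println(a+b == 1000, a > 0 && b > 0)
}
```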
\subsection{Future Benchmark}

The future benchmark compares \CFA's @waituntil@ with \uC's \lstinline[language=uC++]{_Select}, with both constructs waiting on futures.
While both statements have very similar semantics, supporting @and@ and @or@ operators, \lstinline[language=uC++]{_Select} can only wait on futures, whereas the @waituntil@ statement is polymorphic.
As such, the underlying implementation of the operators differs between @waituntil@ and \lstinline[language=uC++]{_Select}.
The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate.

This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements.
The benchmark is indirect since the performance of futures in \CFA and \uC differs by a significant margin.
The experiment has a server, which cycles fulfilling three futures, @A@, @B@, and @C@, and a client, which waits for these futures to be fulfilled using four different kinds of predicates given in \CFA:
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
OR & AND \\
\hline
\begin{cfa}
waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
\end{cfa}
&
\begin{cfa}
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
and waituntil( C ) { get( C ); }
\end{cfa}
\\
\multicolumn{2}{@{}c@{}}{} \\
AND-OR & OR-AND \\
\hline
\begin{cfa}
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
\end{cfa}
&
\begin{cfa}
@(@ waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); } @)@
and waituntil( C ) { get( C ); }
\end{cfa}
\end{tabular}
\end{cquote}
The server and client use a low-cost synchronization after each fulfillment, so the server does not race ahead of the client.

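For readers unfamiliar with future-based multiplexing, the OR and AND predicates above can be approximated in Go by modelling a future as a one-shot buffered channel (an assumption of this sketch; Go has no built-in futures): OR maps to a select across the channels, and AND maps to receiving from each in turn.

```go
package main

import "fmt"

// A future is modelled here as a one-shot channel of capacity 1:
// fulfilling the future is a single send, waiting on it is a receive.
type future chan int

// fulfilled is a helper producing a future already holding value v.
func fulfilled(v int) future {
	f := make(future, 1)
	f <- v
	return f
}

// waitAny approximates the OR predicate: complete as soon as any one of
// the three futures is fulfilled.
func waitAny(A, B, C future) int {
	select {
	case v := <-A:
		return v
	case v := <-B:
		return v
	case v := <-C:
		return v
	}
}

// waitAll approximates the AND predicate: complete only once all three
// futures are fulfilled (the fixed receive order does not matter, since
// every receive must eventually succeed).
func waitAll(A, B, C future) int {
	return <-A + <-B + <-C
}

func main() {
	fmt.Println(waitAll(fulfilled(1), fulfilled(2), fulfilled(3))) // 6
	v := waitAny(fulfilled(1), fulfilled(2), fulfilled(3))
	fmt.Println(v >= 1 && v <= 3)
}
```

The mixed AND-OR and OR-AND predicates compose by nesting these two helpers, which is roughly what the predicate function in the \CFA implementation evaluates.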
Results of this benchmark are shown in Figure~\ref{f:futurePerf}.
Each pair of bars is marked with the predicate name for that experiment, and the value at the top of each bar is the standard deviation.
In detail, the \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation.
However, the bars for both systems have similar height patterns across the experiments.
The @OR@ column for \CFA outperforms the other \CFA predicates, due to the special-casing of @waituntil@ statements with only @or@ operators.
For both \uC and \CFA, the @AND@ experiment is the least performant, which is expected since all three futures need to be fulfilled for each statement completion.
Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special @OR@ case), whereas \uC has lower variation on the Intel.
Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates.

\begin{figure}
\centering
\subfloat[AMD Future Synchronization Benchmark]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
	\label{f:futureAMD}
}
\subfloat[Intel Future Synchronization Benchmark]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
	\label{f:futureIntel}
}
\caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
\label{f:futurePerf}
\end{figure}