% ======================================================================
% ======================================================================
\chapter{Waituntil}\label{s:waituntil}
% ======================================================================
% ======================================================================

Consider the following motivating problem.
There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom.
Each stall has its own lock since only one person may occupy a stall at a time.
Humans solve this problem in the following way.
They check if all of the stalls are occupied.
If not, they enter and claim an available stall.
If all are occupied, people queue and watch the stalls until one is free, and then enter and lock the stall.
This solution is easy to implement on a computer, provided all threads wait on all stalls and agree to queue.

Now the problem is extended.
Some stalls are wheelchair accessible and some stalls have gender identification.
Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria.
Immediately, the problem becomes more difficult.
A single queue no longer solves the problem.
What happens when there is a stall available that the person at the front of the queue cannot choose?
The na\"ive solution has each thread spin indefinitely, continually checking every matching kind of stall until a suitable one is free.
This approach is insufficient since it wastes cycles and results in unfairness among waiting threads, as a thread can acquire the first matching stall without regard to the waiting time of other threads.
Waiting for the first appropriate stall (resource) that becomes available, without spinning, is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria.

\section{History of Synchronous Multiplexing}
There is a history of tools that provide \gls{synch_multiplex}.
Some well-known \gls{synch_multiplex} tools include the Unix system utilities @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}.
The concept and theory surrounding \gls{synch_multiplex} was introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85},
\begin{quote}
A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85}
\end{quote}
The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}.

Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid.
For example,
\begin{cfa}[mathescape]
(@G1@(x) $\rightarrow$ P @|@ @G2@(y) $\rightarrow$ Q )
\end{cfa}
waits for either channel @x@ or @y@ to have a value, if and only if guards @G1@ and @G2@ are true;
if only one guard is true, only one channel receives, and if both guards are false, no receive occurs.
% extended CSP with a \gls{synch_multiplex} construct @ALT@, which waits for one resource to be available and then executes a corresponding block of code.
In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on.
If a guard is false, then the resource it guards is not in the set of resources being waited on.
If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues.
Guards can be simulated using @if@ statements, as shown in~\cite[rule~2.4, p.~183]{Roscoe88}
\begin{lstlisting}[basicstyle=\rm,mathescape]
ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT($\,g$ $P$, $G$ ), $\neg\,$b ALT( $G$ ) ) (boolean guard elim).
\end{lstlisting}
but requires $2^N-1$ @if@ statements, where $N$ is the number of guards.
The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time.
Figure~\ref{f:wu_if} shows in \CFA an example of applying rule 2.4 for three guards.
Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@.
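The exponential expansion is only needed when guards must be folded into \emph{static} code; a runtime representation of the clause set avoids it. As a minimal sketch (in Go, since it offers a testable dynamic-select API; the function name @guardedSelect@ is illustrative, not from any of the cited systems), the enabled clauses can be collected into a slice and multiplexed with @reflect.Select@, so $N$ guards cost $N$ checks rather than $2^N-1$ @if@ statements:

```go
package main

import (
	"fmt"
	"reflect"
)

// guardedSelect builds only the guard-true cases at runtime, avoiding the
// 2^N-1 static expansion of rule 2.4. It assumes at least one guard is true;
// reflect.Select then waits over the dynamic case list.
func guardedSelect(chans []chan int, guards []bool) (clause int, val int) {
	var cases []reflect.SelectCase
	var idx []int // map dynamic case index back to original clause index
	for i, ch := range chans {
		if guards[i] { // a false guard removes the clause from the waited-on set
			cases = append(cases, reflect.SelectCase{
				Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ch),
			})
			idx = append(idx, i)
		}
	}
	chosen, v, _ := reflect.Select(cases)
	return idx[chosen], int(v.Int())
}

func main() {
	a, b, c := make(chan int, 1), make(chan int, 1), make(chan int, 1)
	b <- 42
	// only clause 1 is enabled, so it is the only one that can be selected
	clause, val := guardedSelect([]chan int{a, b, c}, []bool{false, true, false})
	fmt.Println(clause, val) // 1 42
}
```

This is essentially what a language implementation does internally: the static guard syntax compiles down to a dynamic set of active clauses.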

\begin{figure}
\centering
\begin{lrbox}{\myboxA}
\begin{cfa}
when( G1 )
	waituntil( R1 ) S1
or when( G2 )
	waituntil( R2 ) S2
or when( G3 )
	waituntil( R3 ) S3







\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
if ( G1 )
	if ( G2 )
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R1 ) S1 or waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3
		else waituntil( R1 ) S1
else
	if ( G2 )
		if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R3 ) S3
\end{cfa}
\end{lrbox}

\subfloat[Guards]{\label{l:guards}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB}
\caption{\CFA guard simulated with \lstinline{if} statement.}
\label{f:wu_if}
\end{figure}

When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors.
The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
@select@ blocks until either some subset of the file descriptors are available or the timeout expires.
All file descriptors that are ready are returned by modifying the argument sets to only contain the ready descriptors.

This early implementation differs from the theory presented in CSP: when a call to @select@ returns, it may provide more than one ready file descriptor.
As such, @select@ has logical-or multiplexing semantics, whereas the theory describes exclusive-or semantics.
It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors.
@select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst case is the largest-numbered file descriptor.
@poll@ reduces the size of the interest set by changing from a bit mask to a linked data structure, independent of the file-descriptor values.
@epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call.
These early \gls{synch_multiplex} tools interact directly with the operating system and are often used to communicate among processes.
Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads.
An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}.
This @select@ allows a task object, with its own thread, to multiplex over a subset of asynchronous calls to its methods.
The Ada @select@ has the same exclusive-or semantics and guards as the Occam ALT;
however, it multiplexes over methods rather than channels.

\begin{figure}
\begin{lstlisting}[language=ada,literate=]
task type buffer is -- thread
	... -- buffer declarations
	count : integer := 0;
begin -- thread starts here
	loop
		select
			when count < Size => -- guard
				accept insert( elem : in ElemType ) do -- method
					... -- add to buffer
					count := count + 1;
				end;
				-- executed if this accept called
		or
			when count > 0 => -- guard
				accept remove( elem : out ElemType ) do -- method
					... -- remove and return from buffer via parameter
					count := count - 1;
				end;
				-- executed if this accept called
		or delay 10.0; -- unblock after 10 seconds without call
		else -- do not block, cannot appear with delay
		end select;
	end loop;
end buffer;
buf : buffer; -- create task object and start thread in task body
\end{lstlisting}
\caption{Ada Bounded Buffer}
\label{f:BB_Ada}
\end{figure}


Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
Note, a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
Then the appropriate \lstinline[language=ada]{accept} method is run with the call's arguments.
Hence, the \lstinline[language=ada]{select} statement provides rendezvous points for threads, rather than providing channels with message passing.
The \lstinline[language=ada]{select} statement also provides a timeout and @else@ (nonblocking) clause, which changes synchronous multiplexing to asynchronous.
Now the thread polls rather than blocks.

Another example of programming-language \gls{synch_multiplex} is Go's @select@ statement with channels~\cite{go:selectref}.
Figure~\ref{l:BB_Go} shows the outline of a bounded buffer implemented with a Go routine.
Here, two channels are used for inserting and removing by client producers and consumers, respectively.
(The @term@ and @finish@ channels are used to synchronize with the program main.)
Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam and associated code blocks for each clause like ALT and Ada.
However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}.
Go also provides a timeout via a channel and a @default@ clause like Ada's @else@ for asynchronous multiplexing.
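Although Go lacks guard syntax, a common Go idiom simulates guards by exploiting the language rule that operations on a @nil@ channel block forever, so a @nil@ case can never be selected. A minimal sketch (the helper name @guarded@ is illustrative):

```go
package main

import "fmt"

// guarded simulates an Occam/Ada-style guard in Go: receiving on a nil
// channel blocks forever, so returning nil effectively removes that case
// from the select.
func guarded(guard bool, ch chan int) chan int {
	if guard {
		return ch
	}
	return nil
}

// pick waits for whichever enabled case is ready and reports which fired.
func pick(a, b chan int, ga, gb bool) (string, int) {
	select {
	case v := <-guarded(ga, a):
		return "a", v
	case v := <-guarded(gb, b):
		return "b", v
	}
}

func main() {
	a, b := make(chan int, 1), make(chan int, 1)
	a <- 1
	b <- 2
	// guard on a is false, so b is chosen even though a has data
	which, v := pick(a, b, false, true)
	fmt.Println(which, v) // b 2
}
```

Note that if every guard is false, the simulated select blocks forever, whereas Occam's ALT with all-false guards does nothing and continues; a @default@ case would be needed to recover that behaviour.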

\begin{figure}
\centering

\begin{lrbox}{\myboxA}
\begin{lstlisting}[language=go,literate=]
func main() {
	insert := make( chan int, Size )
	remove := make( chan int, Size )
	term := make( chan string )
	finish := make( chan string )

	buf := func() {
		var i int
	  L: for {
			select { // wait for message
			case i = <- insert:
			case <- term: break L
			}
			remove <- i
		}
		finish <- "STOP" // completion
	}
	go buf() // start thread in buf
}




\end{lstlisting}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{lstlisting}[language=uC++]
_Task BoundedBuffer {
	... // buffer declarations
	int count = 0;
  public:
	void insert( int elem ) {
		... // add to buffer
		count += 1;
	}
	int remove() {
		... // remove and return from buffer
		count -= 1;
	}
  private:
	void main() {
		for ( ;; ) {
			_Accept( ~BoundedBuffer ) break;
			or _When ( count < Size ) _Accept( insert );
			or _When ( count > 0 ) _Accept( remove );
		}
	}
};
BoundedBuffer buf; // start thread in main method
\end{lstlisting}
\end{lrbox}

\subfloat[Go]{\label{l:BB_Go}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB}

\caption{Bounded Buffer}
\label{f:AdaMultiplexing}
\end{figure}


Finally, \uC provides \gls{synch_multiplex} with Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}.
The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement.
Both @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as timeout and @else@ clauses.

There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources.

\section{Other Approaches to Synchronous Multiplexing}

To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source.
For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
Similarly, actor systems circumvent the \gls{synch_multiplex} problem as actors only block when waiting for the next message, never within a behaviour.
While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
Consider the case where a thread has a single source of communication and it wants a set of @N@ resources.
It must sequentially request the @N@ resources and wait for each response.
During the receives for the @N@ resources, it can receive other communication, and has to save and postpone these communications, or discard them.
% If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.

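The save-and-postpone bookkeeping described above can be sketched with a single Go channel standing in for an Erlang-style mailbox (all names here are illustrative, not any real actor API):

```go
package main

import "fmt"

type msg struct {
	kind string // which resource (or other traffic) this message carries
	val  int
}

// awaitResources drains a single mailbox until every wanted resource reply
// arrives, saving unrelated messages for later replay -- the bookkeeping
// burden a single communication source imposes on the programmer.
func awaitResources(mailbox chan msg, wanted map[string]bool) (got map[string]int, postponed []msg) {
	got = map[string]int{}
	for len(got) < len(wanted) {
		m := <-mailbox
		if wanted[m.kind] {
			got[m.kind] = m.val
		} else {
			postponed = append(postponed, m) // must save and handle later
		}
	}
	return
}

func main() {
	mailbox := make(chan msg, 10)
	// resource replies arrive interleaved with unrelated communication
	mailbox <- msg{"other", 99}
	mailbox <- msg{"R1", 1}
	mailbox <- msg{"other", 98}
	mailbox <- msg{"R2", 2}
	got, postponed := awaitResources(mailbox, map[string]bool{"R1": true, "R2": true})
	fmt.Println(got["R1"]+got["R2"], len(postponed)) // 3 2
}
```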
\section{\CFA's Waituntil Statement}

The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
There already exists a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods~\cite{Delisle21}, so this @waituntil@ focuses on synchronizing over other resources.
All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @_Select@ supports futures, and Ada's @select@ supports monitor method calls.
The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}.
No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.

\begin{figure}
\begin{cfa}
forall(T & | sized(T))
trait is_selectable {
	// For registering a waituntil stmt on a selectable type
	bool register_select( T &, select_node & );

	// For unregistering a waituntil stmt from a selectable type
	bool unregister_select( T &, select_node & );

	// on_selected is run on the selecting thread prior to executing
	// the statement associated with the select_node
	bool on_selected( T &, select_node & );
};
\end{cfa}
\caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.}
\label{f:wu_trait}
\end{figure}


Currently, locks, channels, futures, and timeouts are supported by the @waituntil@ statement, and this set can be expanded through the @is_selectable@ trait as other use cases arise.
The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing.
Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second.
Note, the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.

\begin{figure}
\begin{cfa}
future(int) Future;
channel(int) Channel;
owner_lock Lock;
int i = 0;

waituntil( Lock ) { ... }
or when( i == 0 ) waituntil( i << Channel ) { ... }
and waituntil( Future ) { ... }
or waituntil( timeout( 1`s ) ) { ... }
// else { ... }
\end{cfa}
\caption{Example of \CFA's waituntil statement}
\label{f:wu_example}
\end{figure}


\section{Waituntil Semantics}

The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like locks, channels, and futures.

\subsection{Statement Semantics}

The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam.
The clauses have an exclusive-or relationship where the first available one is run and only one clause is run.
\CFA's @or@ semantics differ from ALT semantics in one respect: instead of randomly picking a clause when multiple are available, the first available clause in the @waituntil@ is executed.
For example, in the following, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses.
\begin{cfa}
future(int) bar, foo;
waituntil( foo ) { ... } or waituntil( bar ) { ... }
\end{cfa}
The reason for this semantics is that prioritizing resources is useful in certain problems.
In the rare case where the ordering causes a starvation problem, it is possible to follow a @waituntil@ with its reverse form:
\begin{cfa}
waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo
waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar
\end{cfa}
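For contrast, Go's @select@ picks uniformly at random among ready cases; recovering fixed-priority behaviour there requires a nested select. A sketch of the pattern (the name @recvPriority@ is illustrative), which makes the randomized primitive behave like \CFA's ordered @or@:

```go
package main

import "fmt"

// recvPriority prefers foo when both channels are ready, mimicking the
// fixed, clause-order priority of CFA's waituntil `or`. The outer
// non-blocking select gives foo first chance; only if foo is empty does
// the inner select wait on both.
func recvPriority(foo, bar chan int) (string, int) {
	select {
	case v := <-foo:
		return "foo", v
	default: // foo not immediately ready; fall through
	}
	select {
	case v := <-foo:
		return "foo", v
	case v := <-bar:
		return "bar", v
	}
}

func main() {
	foo := make(chan int, 1)
	bar := make(chan int, 1)
	foo <- 1
	bar <- 2
	// both ready: a plain select would pick randomly, this always picks foo
	which, v := recvPriority(foo, bar)
	fmt.Println(which, v) // foo 1
}
```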

The \CFA @and@ semantics match the @and@ semantics of \uC's \lstinline[language=uC++]{_Select}.
When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}.
When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code block, and then the thread waits again for the next available clause, unless the @waituntil@ statement's predicate is now satisfied.
This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator.
If the @and@ operator waited for all clauses to be available before running any, it would be the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements.
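An analogous sketch of this run-as-available behaviour in Go (again using the nil-channel idiom; @waitBoth@ is an illustrative name) loops until both clauses have fired, running each code block the moment its channel is ready and then disabling that case:

```go
package main

import "fmt"

// waitBoth waits until both channels have produced a value (the "and"
// predicate), but runs each code block as its clause becomes available,
// nil-ing the channel once its block has run so that case never fires again.
func waitBoth(a, b chan int, blockA, blockB func(int)) {
	for a != nil || b != nil {
		select {
		case v := <-a:
			blockA(v) // run this clause's block immediately
			a = nil   // clause satisfied; stop waiting on it
		case v := <-b:
			blockB(v)
			b = nil
		}
	}
}

func main() {
	a, b := make(chan int, 1), make(chan int, 1)
	a <- 1
	b <- 2
	sum := 0
	// blocks run in arrival order, possibly long before the whole
	// statement completes
	waitBoth(a, b, func(v int) { sum += v }, func(v int) { sum += v })
	fmt.Println(sum) // 3
}
```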

As for normal C expressions, the @and@ operator binds more tightly than the @or@.
To give an @or@ operator higher precedence, parentheses are used.
For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parentheses.
\begin{cfa}
@(@ waituntil( A ) { ... } // bind tightly to or
or waituntil( B ) { ... } @)@
and waituntil( C ) { ... }
\end{cfa}

The guards in the @waituntil@ statement are called @when@ clauses.
Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run.
Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@.
In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}.
When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operation from the statement.
For example, in the following, the two @waituntil@ statements are semantically equivalent.

\begin{lrbox}{\myboxA}
\begin{cfa}
when( true ) waituntil( A ) { ... }
or when( false ) waituntil( B ) { ... }
and waituntil( C ) { ... }
\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
waituntil( A ) { ... }
and waituntil( C ) { ... }

\end{cfa}
\end{lrbox}

\begin{tabular}{@{}lcl@{}}
\usebox\myboxA & $\equiv$ & \usebox\myboxB
\end{tabular}

The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada.
If no resource is immediately available and there is an @else@ clause, the @else@ clause is run and the thread continues.


\subsection{Type Semantics}

As mentioned, to support interaction with the @waituntil@ statement, a type must support the trait in Figure~\ref{f:wu_trait}.
The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively.
When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run.
Many types do not need @on_selected@, but it is provided if a type needs to perform work or checks before the resource can be accessed in the code block.
The register/unregister routines in the trait also return booleans.
The return value of @register_select@ is @true@ if the resource is immediately available and @false@ otherwise.
The return value of @unregister_select@ is @true@ if the corresponding code block should be run after unregistration and @false@ otherwise.
The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource.
More detail on channels and their interaction with @waituntil@ appears in Section~\ref{s:wu_chans}.

The trait is used by having a blocking object return a type that supports the @is_selectable@ trait.
This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context.
A selectable type is needed for types that want to support multiple operations, such as channels that allow both reading and writing.

\section{\lstinline{waituntil} Implementation}

The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} only shows the basic outline of the @waituntil@ algorithm.
The complexity comes from the consideration of race conditions and the synchronization needed when supporting various primitives.
The following sections use examples to fill in details missing from Figure~\ref{f:WU_Impl}.
The full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.

\begin{figure}
\begin{cfa}
select_nodes s[N]; $\C[3.25in]{// declare N select nodes}$
for ( node in s ) $\C{// register nodes}$
	register_select( resource, node );
while ( statement predicate not satisfied ) { $\C{// check predicate}$
	// block until clause(s) satisfied
	for ( resource in waituntil statement ) { $\C{// run true code blocks}$
		if ( resource is avail ) run code block
		if ( statement predicate is satisfied ) break;
	}
}
for ( node in s ) $\C{// deregister nodes}\CRT$
	if ( unregister_select( resource, node ) ) run code block
\end{cfa}
\caption{\lstinline{waituntil} Implementation}
\label{f:WU_Impl}
\end{figure}


The basic steps of the algorithm are:
\begin{enumerate}
\item
The @waituntil@ statement declares $N$ @select_node@s, one per resource being waited on, where each @select_node@ stores any @waituntil@ data pertaining to that resource.

\item
Each @select_node@ is then registered with the corresponding resource.

\item
The thread executing the @waituntil@ then loops until the statement's predicate is satisfied.
In each iteration, if the predicate is unsatisfied, the @waituntil@ thread blocks.
When another thread satisfies a resource clause (\eg sends to a channel), it unblocks the @waituntil@ thread.
This thread checks all clauses for completion, and any completed clauses have their code blocks run.
While checking clause completion, if enough clauses have been run so that the statement predicate is satisfied, the loop exits early.

\item
Once the thread escapes the loop, the @select_node@s are unregistered from the resources.
\end{enumerate}
These steps give a basic overview of how the statement works.
The following sections shed light on the specific changes and provide more implementation detail.

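The register/block/run/unregister loop above can be sketched in Go, specialized to a toy future type; this is a simplified analogue of the \CFA runtime, not its actual implementation, and all names (@future@, @waitUntilAll@) are illustrative. It demonstrates steps 1--4 with @and@ semantics: notifications registered with each resource, a condition variable for blocking, code blocks run as clauses complete, and unregistration at the end:

```go
package main

import (
	"fmt"
	"sync"
)

// future is a toy resource: it becomes available once fulfilled, and calls
// a registered notification (the analogue of waking the waituntil thread).
type future struct {
	mu     sync.Mutex
	done   bool
	val    int
	notify func()
}

func (f *future) fulfil(v int) {
	f.mu.Lock()
	f.val, f.done = v, true
	n := f.notify
	f.mu.Unlock()
	if n != nil {
		n() // unblock the waituntil thread
	}
}

// waitUntilAll mirrors the algorithm outline: register, loop checking the
// predicate (here: all clauses run), run blocks as clauses become
// available, block otherwise, then unregister.
func waitUntilAll(fs []*future, blocks []func(int)) {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	for _, f := range fs { // steps 1+2: register with each resource
		f.mu.Lock()
		f.notify = func() { mu.Lock(); cond.Broadcast(); mu.Unlock() }
		f.mu.Unlock()
	}
	ran := make([]bool, len(fs))
	remaining := len(fs)
	mu.Lock()
	for remaining > 0 { // step 3: loop until predicate satisfied
		progress := false
		for i, f := range fs {
			f.mu.Lock()
			avail, v := f.done, f.val
			f.mu.Unlock()
			if avail && !ran[i] {
				blocks[i](v) // run code block as clause becomes available
				ran[i] = true
				remaining--
				progress = true
			}
		}
		if remaining > 0 && !progress {
			cond.Wait() // block until some clause is satisfied
		}
	}
	mu.Unlock()
	for _, f := range fs { // step 4: unregister
		f.mu.Lock()
		f.notify = nil
		f.mu.Unlock()
	}
}

func main() {
	f1, f2 := new(future), new(future)
	results := make(chan int, 2)
	go f1.fulfil(1)
	go f2.fulfil(2)
	waitUntilAll([]*future{f1, f2},
		[]func(int){func(v int) { results <- v }, func(v int) { results <- v }})
	fmt.Println(<-results + <-results) // 3
}
```

Note the real statement must additionally handle clauses that need code blocks run at unregistration (the @unregister_select@ return value), which this sketch omits.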
\subsection{Locks}\label{s:wu_locks}

The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc.
Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement.
For example, the following waits until the thread has acquired lock @l1@ or locks @l2@ and @l3@.
\begin{cfa}
owner_lock l1, l2, l3;
waituntil( l1 ) { ... }
or waituntil( l2 ) { ... }
and waituntil( l3 ) { ... }
\end{cfa}
Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in the lock's queue of waiting threads.
When the lock schedules this thread, it unblocks and runs the code block associated with the lock and then releases the lock.

In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each of the locks' waiting queues.
When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked.
Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released.
Immediately releasing the lock prevents the waiting thread from holding multiple locks and potentially introducing a deadlock.
As such, the only unregistered nodes associated with locks are the ones that have not run.

\subsection{Timeouts}

A timeout for the @waituntil@ statement is a duration passed to \lstinline[deletekeywords={timeout}]{timeout}, \eg:
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
\multicolumn{2}{@{}l@{}}{\lstinline{Duration D1\{ 1`ms \}, D2\{ 2`ms \}, D3\{ 3`ms \};}} \\
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( timeout( D1 ) ) {}
or waituntil( timeout( D2 ) ) {}
or waituntil( timeout( D3 ) ) {}
\end{cfa}
&
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( min( timeout( D1 ), timeout( D2 ), timeout( D3 ) ) ) {}


\end{cfa}
\end{tabular}
\end{cquote}
These two examples are basically equivalent.
Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers.
Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding.
In the following example, either channel @C1@ or @C2@ must be satisfied, but nothing can be done for at least 1 or 3 seconds after the respective channel read.
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ); and waituntil( timeout( 1`s ) );
or waituntil( i << C2 ); and waituntil( timeout( 3`s ) );
\end{cfa}
If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger.
Note, the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.
---|
489 | |
---|
490 | The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of time elapsed since the call started. |
---|
491 | Instead, \lstinline[deletekeywords={timeout}]{timeout} returns a type that supports the @is_selectable@ trait, allowing the type system to select the correct overloaded routine for this context. |
---|
492 | For the @waituntil@, it is more idiomatic for the \lstinline[deletekeywords={timeout}]{timeout} to use the same syntax as other blocking operations instead of having a special language clause. |
---|
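
The following sketch suggests the shape of such a selectable interface; the trait name matches the text, but the exact routine signatures in the \CFA runtime may differ, so this is conceptual only.
\begin{cfa}
// sketch: a selectable resource provides the registration and selection
// hooks used throughout this chapter; exact signatures may differ
forall( T & )
trait is_selectable {
	bool register_select( T &, select_node & );		// add a waiting node to the resource
	bool unregister_select( T &, select_node & );	// remove a waiting node
	bool on_selected( T &, select_node & );			// post-wake check, e.g., closed channel
};
\end{cfa}
Because \lstinline[deletekeywords={timeout}]{timeout} returns a type satisfying this trait, a timer can appear in a clause anywhere a lock, future, or channel can.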

\subsection{Channels}\label{s:wu_chans}

Channels require more complexity to allow synchronous multiplexing.
For locks, when an outside thread releases a lock and unblocks the waituntil thread (WUT), the lock's MX property is passed to the WUT (no spinning locks).
For futures, the outside thread delivers a value to the future and unblocks any waiting threads, including WUTs.
In either case, after the WUT unblocks, it is safe to execute the corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future.
Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT.
However, for channels, there is a race issue.
If the outside thread inserts into the channel and unblocks the WUT, another thread can remove the channel data first, so when the WUT unblocks and attempts to remove from the buffer, it fails and must reblock (busy waiting).
This scenario is a \gls{toctou} race that must be closed.
To close the race, the outside thread must detect this case and insert directly into the left-hand side of the channel expression (@i << chan@) rather than into the channel, and then unblock the WUT.
Now the unblocked WUT is guaranteed to have a satisfied resource and its code block can be safely executed.
The insertion circumvents the channel buffer via the wait-morphing in the \CFA channel implementation \see{Section~\ref{s:chan_impl}}, allowing @waituntil@ channel unblocking to not be special-cased.
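
The direct insertion can be sketched as follows; the lock, node, and field names are illustrative, not the actual runtime identifiers.
\begin{cfa}
// sketch: outside thread inserting value val into channel chan;
// names are illustrative, not the actual runtime identifiers
lock( chan.mx_lock );
if ( /* a WUT node is registered on chan */ ) {
	node.extra = val;				// write directly into i, bypassing the buffer
	set_avail( node );				// mark the clause satisfied before unblocking
	unpark( node.blocked_thread );	// WUT wakes with the resource already satisfied
} else {
	insert( chan.buffer, val );		// normal channel insert
}
unlock( chan.mx_lock );
\end{cfa}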

Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg:
\begin{cfa}
waituntil( i << A ) {} and waituntil( i << B ) {}
or waituntil( i << C ) {} and waituntil( i << D ) {}
\end{cfa}
If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or only those for @C@ and @D@.
However, four outside threads can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks.
This case introduces a race whose complexity increases with the size of the @waituntil@ statement.
Due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses.
This approach is a poor solution for two reasons.
First, it is possible that once all the locks are acquired, the subtree is not satisfied and the locks must all be released.
This work incurs a high cost for signalling threads and heavily increases contention on the internal channel locks.
Second, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
As such, the exclusive-or semantics is lost when using both @and@ and @or@ operators, since it cannot be supported without significant complexity that would also significantly degrade @waituntil@ performance.

It was deemed important that exclusive-or semantics be maintained when only @or@ operators are used, so this situation is special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
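
One way to realize this race, sketched below with illustrative names, is a single winner word per @waituntil@ that each clause must claim with an atomic compare-and-swap before touching its channel; the actual implementation may differ in detail.
\begin{cfa}
// sketch of the or-only special case; names are illustrative
if ( CAS( wut.winner, NONE, my_clause_id ) ) {
	// this clause won the race: operate on the channel and run its code block
} else {
	// another clause already won: leave the channel untouched
}
\end{cfa}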

Channels introduce another interesting consideration in their implementation.
Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause, \eg:
\begin{cfa}
waituntil( i << A ) { @B << i;@ }
or waituntil( i << B ) { @A << i;@ }
\end{cfa}
This poses a problem when dealing with the special-cased @or@, where the clauses need to win a race to operate on a channel.
When a special-case @or@ is inserting into a channel on one thread, and another thread is blocked in a special-case @or@ consuming from the same channel, there is not one but two races that must be consolidated by the inserting thread.
(This race can also occur in the mirrored case with a blocked producer and signalling consumer.)
For the producing thread to know that the insert succeeded, it needs to win the race for its own @waituntil@ and win the race for the other @waituntil@.

Go solves this problem in its @select@ statement by acquiring the internal locks of all channels before registering the select on the channels.
This eliminates the race because no other thread can operate on the blocked channel while its lock is held.
This approach is not used in \CFA since the @waituntil@ is polymorphic.
Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead.
Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
This race case is detectable, and if detected, the outside thread first races to set its own race flag to pending.
If it succeeds, it then attempts to set the WUT's race flag to its success value.
If the outside thread successfully sets the WUT's race flag, then the operation can proceed;
if not, the signalling thread sets its own race flag back to the initial value.
If any other thread attempts to set the producer's flag and sees a pending value, it waits until the value changes before proceeding, to ensure that, if the producer fails, the signal is not lost.
This protocol ensures that signals are not lost and that the two races are resolved in a safe manner.
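
The two-phase protocol can be sketched as follows; the flag names and values are illustrative rather than the actual runtime identifiers.
\begin{cfa}
// sketch of the two-phase race consolidation; names and values are illustrative
enum { UNSAT, PENDING, SAT };
if ( CAS( my.race_flag, UNSAT, PENDING ) ) {	// phase 1: reserve own clause
	if ( CAS( wut.race_flag, UNSAT, SAT ) ) {	// phase 2: claim the WUT's clause
		my.race_flag = SAT;						// commit: both races won
		// perform the direct handoff and unpark the WUT
	} else {
		my.race_flag = UNSAT;					// roll back so a later signal is not lost
	}
}
// threads that observe PENDING on a flag wait for it to resolve before proceeding
\end{cfa}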

Channels in \CFA have an exception-based shutdown mechanism that the @waituntil@ statement needs to support.
This mechanism motivated the @on_selected@ routine, which channels use upon waking from a @waituntil@ statement to detect if they are closed, ensuring the appropriate behaviour is taken and an exception is thrown.

\subsection{Guards and Statement Predicate}\label{s:wu_guards}
Checking when a synchronous multiplexing utility is done is trivial when it has an or/xor relationship, since any resource becoming available means that the blocked thread can proceed.
In \uC and \CFA, the \gls{synch_multiplex} utilities involve both an @and@ and @or@ operator, which makes checking for completion of the statement more difficult.

In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
The internal nodes also store the statuses of the two subtrees beneath them.
When a resource becomes available, its corresponding leaf-node status is modified and then percolates up into the internal nodes to update the state of the statement.
Once the root of the tree has both subtrees marked as @true@, the statement is complete.
As an optimization, when the internal nodes are updated, their subtrees marked as @true@ are pruned and are not touched again.
To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false.
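
The percolation step can be sketched as follows; the node structure and field names are illustrative of the \uC scheme rather than its actual implementation.
\begin{cfa}
// sketch of leaf-to-root status percolation; names are illustrative
void percolate( Node * leaf ) {
	leaf->status = true;						// resource now available
	for ( Node * n = leaf->parent; n != 0 && ! n->status; n = n->parent ) {
		if ( n->op == AND ) n->status = n->left->status && n->right->status;
		else n->status = n->left->status || n->right->status;	// OR node
		if ( ! n->status ) break;				// no change: stop percolating
	}
	// the statement is complete when the root's status becomes true
}
\end{cfa}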

The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
The waiting condition of the @waituntil@ statement can be represented as a predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available and @false@ otherwise.
In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the @waituntil@.
Leveraging the compiler, a predicate routine is generated per @waituntil@ that, when passed the statuses of the resources, returns @true@ when the @waituntil@ is done and @false@ otherwise.
To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures the predicate function behaves as if that resource is no longer part of the predicate.
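
For instance, for a statement of the form @waituntil( A ) and waituntil( B ) or waituntil( C )@, the generated predicate might look like the following sketch; the routine name and array encoding are illustrative.
\begin{cfa}
// hypothetical generated predicate for: waituntil( A ) and waituntil( B ) or waituntil( C )
static inline bool is_done( bool avail[3] ) {
	return avail[0] && avail[1] || avail[2];
}
// a guard disabling a clause sets its avail entry so the predicate acts as if
// the clause were absent, e.g., true under and, false under or
\end{cfa}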

\uC's @_Select@ supports operators both inside and outside of the clauses,
\eg in the following example the code blocks run once their corresponding predicate inside the round brackets is satisfied.

% C_TODO put this in uC++ code style not cfa-style
\begin{cfa}
Future_ISM<int> A, B, C, D, E;
_Select( A || B && C ) { ... }
and _Select( D && E ) { ... }
\end{cfa}

This is more expressive than the @waituntil@ statement in \CFA.
In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses was avoided for a few reasons.
As a motivating example, suppose \CFA supported operators inside clauses and consider the code snippet in Figure~\ref{f:wu_inside_op}.

\begin{figure}
\begin{cfa}
owner_lock A, B, C, D;
waituntil( A && B ) { ... }
or waituntil( C && D ) { ... }
\end{cfa}
\caption{Example of unsupported operators inside clauses in \CFA.}
\label{f:wu_inside_op}
\end{figure}

If the @waituntil@ in Figure~\ref{f:wu_inside_op} works with the same semantics as described and acquires each lock as it becomes available, it opens itself up to possible deadlock, since it is now holding locks while waiting on other resources.
Other semantics are needed to ensure this operation is safe.
One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance};
however, the potential for livelock leaves much to be desired.
Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is not sufficient if the resource ordering is not used everywhere.
Additionally, using resource ordering can conflict with other semantics of the @waituntil@ statement.
To show this conflict, consider if the locks in Figure~\ref{f:wu_inside_op} are ordered @D@, @B@, @C@, @A@.
If all the locks are available, it becomes complex to respect both the clause order of the @waituntil@ in Figure~\ref{f:wu_inside_op} when choosing which code block to run and the lock ordering of @D@, @B@, @C@, @A@ at the same time.
A final alternative is to wait until all resources for a given clause are available before proceeding to acquire them, but this approach fails due to TOCTOU issues:
it is not possible to ensure that the full set of resources is available without holding them all first.
Operators inside clauses in \CFA could potentially be implemented with careful circumvention of these problems, but it was not deemed an important feature given the runtime cost needed to handle these situations.
The problem of operators inside clauses also becomes difficult to handle when supporting channels.
If internal operators were supported, it would require some way to ensure channels used with internal operators are modified if and only if the corresponding code block is run, which is not feasible for the reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.

\subsection{The full \lstinline{waituntil} picture}
Now that the details have been discussed, the full pseudocode of the @waituntil@ statement is presented in Figure~\ref{f:WU_Full_Impl}.

\begin{figure}
\begin{cfa}
select_nodes s[N];								$\C[3.75in]{// declare N select nodes}$
bool when_conditions[N];
for ( node in s )								$\C{// evaluate guards}$
	if ( node has guard )
		when_conditions[node] = node_guard;
	else
		when_conditions[node] = true;

try {
	for ( node in s )							$\C{// register nodes}$
		if ( when_conditions[node] )
			register_select( resource, node );

	// ... set statuses for nodes with when_conditions[node] == false ...

	while ( statement predicate not satisfied ) {	$\C{// check predicate}$
		// block
		for ( resource in waituntil statement ) {	$\C{// run true code blocks}$
			if ( statement predicate is satisfied ) break;
			if ( resource is avail )
				try {
					if ( on_selected( resource ) )	$\C{// conditionally run block}$
						run code block
				} finally							$\C{// for exception safety}$
					unregister_select( resource, node );	$\C{// immediate unregister}$
		}
	}
} finally {										$\C{// for exception safety}$
	for ( registered nodes in s )				$\C{// deregister nodes}$
		if ( when_conditions[node] && unregister_select( resource, node )
				&& on_selected( resource ) )
			run code block						$\C{// run code block upon unregister}\CRT$
}
\end{cfa}
\caption{Full \lstinline{waituntil} Pseudocode Implementation}
\label{f:WU_Full_Impl}
\end{figure}

In comparison to Figure~\ref{f:WU_Impl}, this pseudocode now includes the specifics discussed in this chapter.
Some things to note are as follows.
The @finally@ blocks provide exception-safe RAII unregistering of nodes;
in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock freedom, as mentioned in Section~\ref{s:wu_locks}.
The @when_conditions@ array stores the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards.
As discussed in Section~\ref{s:wu_chans}, this pseudocode includes code blocks conditional on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure that all available channel resources have their corresponding code block run.

\section{Waituntil Performance}

Similar facilities to @waituntil@ are discussed at the start of this chapter in C, Ada, Rust, \CC, and OCaml.
However, these facilities are either not meaningful or not feasible to benchmark against.
The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space.
Ada's @select@ only operates on methods, which is done in \CFA via the @waitfor@ utility, so it is not meaningful to benchmark against the @waituntil@, which cannot wait on the same resource.
Rust and \CC only offer a busy-wait based approach, which is not comparable to a blocking approach.
OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml's @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.

The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA @waituntil@ statement are the Go @select@ statement and the \uC @_Select@ statement.
As such, two microbenchmarks are presented, one for Go and one for \uC, to contrast the systems.
Given the differences in features, polymorphism, and expressibility among @waituntil@, @select@, and @_Select@, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner.

\subsection{Channel Benchmark}

The channel multiplexing microbenchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels.
The basic structure of the microbenchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads.
The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on.
Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels.
For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is:
\begin{cfa}
for ()
	waituntil( val << chans[0] ); or waituntil( val << chans[1] );
	or waituntil( val << chans[2] ); or waituntil( val << chans[3] );
\end{cfa}
A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds.
The first microbenchmark measures throughput of the producers and consumers synchronously waiting on the channels, and the second has the threads asynchronously wait on the channels.
The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench}, respectively.

\begin{figure}
\centering
\captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}}
}
\bigskip

\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}}
}
\bigskip

\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}}
}
\caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).}
\label{f:select_contend_bench}
\end{figure}


\begin{figure}
\centering
\captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize}
\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}}
}
\bigskip

\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}}
}
\bigskip

\subfloat[AMD]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}}
}
\subfloat[Intel]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}}
}
\caption{The asynchronous multiplexing channel benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).}
\label{f:select_spin_bench}
\end{figure}


Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} show similar results when comparing @select@ and @waituntil@.
In the AMD benchmarks, the performance is very similar as the number of cores scales.
The AMD machine has been observed to have higher cache-contention costs, which creates a bottleneck on the channel locks and results in similar scaling between \CFA and Go.
At low core counts, Go has significantly better performance, likely due to an optimization in its scheduler.
Go heavily optimizes thread handoffs on its local run-queue, which can result in very good performance for low numbers of threads that are parking/unparking each other~\cite{go:sched}.
In the Intel benchmarks, \CFA performs better than Go as the number of cores and the number of clauses scale.
This is likely due to Go's implementation choice of acquiring all channel locks when registering and unregistering channels on a @select@.
Go must hold a lock for every channel, so this results in worse performance as the number of channels increases.
In \CFA, since races are consolidated without holding all locks, it scales much better with both cores and clauses since more work can occur in parallel.
This scalability difference is more significant on the Intel machine than the AMD machine since the Intel machine has been observed to have lower cache-contention costs.

The Go approach of holding all internal channel locks in the @select@ has additional drawbacks.
It results in pathological cases where Go's channel throughput can greatly suffer.
Consider the case where there are two channels, @A@ and @B@.
There is a producer thread and a consumer thread, @P1@ and @C1@, each selecting on both @A@ and @B@.
Additionally, there is another producer and another consumer thread, @P2@ and @C2@, that both operate solely on @B@.
Compared to \CFA, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
This case may not be as pathological as it seems.
If the set of channels belonging to one select overlaps with the set of another select, the two selects lose the ability to operate in parallel.
The implementation in \CFA only ever holds a single lock at a time, resulting in better locking granularity.
Comparison of this pathological case is shown in Table~\ref{t:pathGo}.
The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than on the Intel machine.
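
The pathological setup can be sketched in abbreviated \CFA syntax as follows; the thread-main declarations are condensed for brevity.
\begin{cfa}
// abbreviated sketch: P1/C1 multiplex over A and B; P2/C2 use only B
void main( P1 & ) { for () { waituntil( A << i ) {} or waituntil( B << i ) {} } }
void main( C1 & ) { for () { waituntil( i << A ) {} or waituntil( i << B ) {} } }
void main( P2 & ) { for () { B << i; } }	// in Go, P1/C1's select still locks B
void main( C2 & ) { for () { i << B; } }
\end{cfa}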

\begin{table}[t]
\centering
\setlength{\extrarowheight}{2pt}
\setlength{\tabcolsep}{5pt}

\caption{Throughput (channel operations per second) of \CFA and Go for a pathologically bad case for contention in Go's select implementation.}
\label{t:pathGo}
\begin{tabular}{*{5}{r|}r}
	& \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c@{}}{Go} \\
	\hline
	AMD		& \input{data/nasus_Order} \\
	\hline
	Intel	& \input{data/pyke_Order}
\end{tabular}
\end{table}


Another difference between Go and \CFA is the order of clause selection when multiple clauses are available.
Go ``randomly'' selects a clause~\cite{go:select}, whereas \CFA chooses the clauses in the order they are listed.
This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance in certain cases, such as the one shown in Table~\ref{t:pathGo}.
If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be less significant, since @P1@ and @C1@ would compete to operate on @B@ more often with random selection.
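
For example, under the setup of Table~\ref{t:pathGo}, @P1@ and @C1@ can list the uncontended channel @A@ first so they favour it, leaving @B@ mostly to @P2@ and @C2@:
\begin{cfa}
// clause order sets implicit priority
waituntil( i << A ) { ... }		// listed first: preferred when both are ready
or waituntil( i << B ) { ... }	// only chosen when A has no pending value
\end{cfa}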

\subsection{Future Benchmark}
The future benchmark compares \CFA's @waituntil@ with \uC's @_Select@, with both utilities waiting on futures.
While \CFA's @waituntil@ and \uC's @_Select@ have very similar semantics, @_Select@ can only wait on futures, whereas the @waituntil@ is polymorphic.
They both support @and@ and @or@ operators, but the underlying implementation of the operators differs between @waituntil@ and @_Select@.
The @waituntil@ statement checks for statement completion using a predicate function, whereas the @_Select@ statement maintains a tree that represents the state of the internal predicate.

\begin{figure}
\centering
\subfloat[AMD Future Synchronization Benchmark]{
	\resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}}
	\label{f:futureAMD}
}
\subfloat[Intel Future Synchronization Benchmark]{
	\resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}}
	\label{f:futureIntel}
}
\caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).}
\label{f:futurePerf}
\end{figure}


This microbenchmark aims to measure the impact of various predicates on the performance of the @waituntil@ and @_Select@ statements.
This benchmark does not attempt to directly compare the @waituntil@ and @_Select@ statements since the performance of futures in \CFA and \uC differs by a significant margin, making the two incomparable.
Results of this benchmark are shown in Figure~\ref{f:futurePerf}.
Each set of columns is marked with a name representing the predicate for that set of columns.
The predicate names and the corresponding @waituntil@ statements are shown below:

\begin{cfa}
#ifdef OR
waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
#endif
#ifdef AND
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
and waituntil( C ) { get( C ); }
#endif
#ifdef ANDOR
waituntil( A ) { get( A ); }
and waituntil( B ) { get( B ); }
or waituntil( C ) { get( C ); }
#endif
#ifdef ORAND
( waituntil( A ) { get( A ); }
or waituntil( B ) { get( B ); } )	// parentheses give or higher precedence
and waituntil( C ) { get( C ); }
#endif
\end{cfa}


In Figure~\ref{f:futurePerf}, the @OR@ column for \CFA is more performant than the other \CFA predicates, likely due to the special-casing of @waituntil@ statements with only @or@ operators.
For both \uC and \CFA, the @AND@ predicate is the least performant, which is expected since all three futures need to be fulfilled for each statement completion, unlike the other predicates.
Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special @OR@ case), whereas \uC has lower variation on the Intel.
---|