[a0c746df] | 1 | % ====================================================================== |
---|
| 2 | % ====================================================================== |
---|
| 3 | \chapter{Waituntil}\label{s:waituntil} |
---|
| 4 | % ====================================================================== |
---|
| 5 | % ====================================================================== |
---|
| 6 | |
---|
[5e81a9c] | 7 | Consider the following motivating problem. |
---|
[847ab8f] | 8 | There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom. |
---|
[5e81a9c] | 9 | Each stall has its own lock since only one person may occupy a stall at a time. |
---|
[847ab8f] | 10 | Humans solve this problem in the following way. |
---|
[5e81a9c] | 11 | They check if all of the stalls are occupied. |
---|
[847ab8f] | 12 | If not, they enter and claim an available stall. |
---|
| 13 | If they are all occupied, people queue and watch the stalls until one is free, and then enter and lock the stall. |
---|
| 14 | This solution can be implemented on a computer easily, if all threads are waiting on all stalls and agree to queue. |
---|
| 15 | |
---|
[5e81a9c] | 16 | Now the problem is extended. |
---|
[c03c1ac] | 17 | Some stalls are wheelchair accessible and some stalls have gender identification. |
---|
[847ab8f] | 18 | Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria. |
---|
| 19 | Immediately, the problem becomes more difficult. |
---|
[c03c1ac] | 20 | A single queue no longer solves the problem. |
---|
[847ab8f] | 21 | What happens when there is a stall available that the person at the front of the queue cannot choose? |
---|
[c03c1ac] | 22 | The na\"ive solution has each thread spin indefinitely continually checking every matching kind of stall(s) until a suitable one is free. |
---|
[847ab8f] | 23 | This approach is insufficient since it wastes cycles and results in unfairness among waiting threads as a thread can acquire the first matching stall without regard to the waiting time of other threads. |
---|
| 24 | Waiting for the first appropriate stall (resource) that becomes available without spinning is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria. |
---|
[a0c746df] | 25 | |
---|
[1ae3f185] | 26 | \section{History of Synchronous Multiplexing}\label{s:History} |
---|
| 27 | |
---|
[a0c746df] | 28 | There is a history of tools that provide \gls{synch_multiplex}. |
---|
[847ab8f] | 29 | Some well known \gls{synch_multiplex} tools include Unix system utilities: @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}. |
---|
| 30 | The concept and theory surrounding \gls{synch_multiplex} was introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85}, |
---|
| 31 | \begin{quote} |
---|
| 32 | A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85} |
---|
| 33 | \end{quote} |
---|
| 34 | The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}. |
---|
| 35 | |
---|
| 36 | Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid. |
---|
| 37 | For example, |
---|
| 38 | \begin{cfa}[mathescape] |
---|
| 39 | (@G1@(x) $\rightarrow$ P @|@ @G2@(y) $\rightarrow$ Q ) |
---|
| 40 | \end{cfa} |
---|
[c03c1ac] | 41 | waits for either channel @x@ or @y@ to have a value, if and only if guards @G1@ and @G2@ are true; |
---|
[847ab8f] | 42 | if only one guard is true, only one channel receives, and if both guards are false, no receive occurs. |
---|
| 43 | % extended CSP with a \gls{synch_multiplex} construct @ALT@, which waits for one resource to be available and then executes a corresponding block of code. |
---|
| 44 | In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources. |
---|
[a0c746df] | 45 | Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on. |
---|
[847ab8f] | 46 | If a guard is false, then the resource it guards is not in the set of resources being waited on. |
---|
[7ed01be] | 47 | If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues. |
---|
[847ab8f] | 48 | Guards can be simulated using @if@ statements as shown in~\cite[rule~2.4, p~183]{Roscoe88} |
---|
| 49 | \begin{lstlisting}[basicstyle=\rm,mathescape] |
---|
| 50 | ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT($\,g$ $P$, $G$ ), $\neg\,$b ALT( $G$ ) ) (boolean guard elim). |
---|
| 51 | \end{lstlisting} |
---|
| 52 | but require $2^N-1$ @if@ statements, where $N$ is the number of guards. |
---|
| 53 | The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time. |
---|
[c03c1ac] | 54 | Figure~\ref{f:wu_if} shows in \CFA an example of applying rule 2.4 for three guards. |
---|
[847ab8f] | 55 | Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@. |
---|
[760c88c] | 56 | |
---|
| 57 | \begin{figure} |
---|
[847ab8f] | 58 | \centering |
---|
| 59 | \begin{lrbox}{\myboxA} |
---|
[760c88c] | 60 | \begin{cfa} |
---|
[847ab8f] | 61 | when( G1 ) |
---|
| 62 | waituntil( R1 ) S1 |
---|
| 63 | or when( G2 ) |
---|
| 64 | waituntil( R2 ) S2 |
---|
| 65 | or when( G3 ) |
---|
| 66 | waituntil( R3 ) S3 |
---|
| 67 | |
---|
| 68 | |
---|
| 69 | |
---|
| 70 | |
---|
| 71 | |
---|
| 72 | |
---|
| 73 | |
---|
| 74 | \end{cfa} |
---|
| 75 | \end{lrbox} |
---|
| 76 | |
---|
| 77 | \begin{lrbox}{\myboxB} |
---|
| 78 | \begin{cfa} |
---|
| 79 | if ( G1 ) |
---|
| 80 | if ( G2 ) |
---|
| 81 | if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3 |
---|
| 82 | else waituntil( R1 ) S1 or waituntil( R2 ) S2 |
---|
| 83 | else |
---|
| 84 | if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3 |
---|
| 85 | else waituntil( R1 ) S1 |
---|
| 86 | else |
---|
| 87 | if ( G2 ) |
---|
| 88 | if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3 |
---|
| 89 | else waituntil( R2 ) S2 |
---|
| 90 | else |
---|
| 91 | if ( G3 ) waituntil( R3 ) S3 |
---|
[760c88c] | 92 | \end{cfa} |
---|
[847ab8f] | 93 | \end{lrbox} |
---|
| 94 | |
---|
| 95 | \subfloat[Guards]{\label{l:guards}\usebox\myboxA} |
---|
| 96 | \hspace*{5pt} |
---|
| 97 | \vrule |
---|
| 98 | \hspace*{5pt} |
---|
| 99 | \subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB} |
---|
| 100 | \caption{\CFA guard simulated with \lstinline{if} statement.} |
---|
[760c88c] | 101 | \label{f:wu_if} |
---|
| 102 | \end{figure} |
---|
[a0c746df] | 103 | |
---|
[847ab8f] | 104 | When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important. |
---|
[aae9c17] | 105 | While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors, which conceptually differ from channels in name only. |
---|
[847ab8f] | 106 | The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout. |
---|
| 107 | @select@ blocks until either some subset of file descriptors are available or the timeout expires. |
---|
| 108 | All file descriptors that are ready are returned by modifying the argument sets to only contain the ready descriptors. |
---|
| 109 | |
---|
| 110 | This early implementation differs from the theory presented in CSP: when the call from @select@ returns it may provide more than one ready file descriptor. |
---|
| 111 | As such, @select@ has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics. |
---|
| 112 | It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors. |
---|
| 113 | @select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst-case is the largest numbered file descriptor. |
---|
[c03c1ac] | 114 | @poll@ reduces the size of the interest sets changing from a bit mask to a linked data structure, independent of the file-descriptor values. |
---|
[847ab8f] | 115 | @epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call. |
---|
| 116 | |
---|
| 117 | These early \gls{synch_multiplex} tools interact directly with the operating system and others are used to communicate among processes. |
---|
| 118 | Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads. |
---|
| 119 | An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}. |
---|
[c03c1ac] | 120 | This @select@ allows a task object, with their own threads, to multiplex over a subset of asynchronous calls to its methods. |
---|
| 121 | The Ada @select@ has the same exclusive-or semantics and guards as Occam ALT; |
---|
[847ab8f] | 122 | however, it multiplexes over methods rather than channels. |
---|
| 123 | |
---|
| 124 | \begin{figure} |
---|
[afb3d68] | 125 | \begin{lstlisting}[language=ada,literate=,{moredelim={**[is][\color{red}]{@}{@}}}] |
---|
| 126 | task type buffer is -- thread type |
---|
[847ab8f] | 127 | ... -- buffer declarations |
---|
| 128 | count : integer := 0; |
---|
| 129 | begin -- thread starts here |
---|
| 130 | loop |
---|
| 131 | select |
---|
[afb3d68] | 132 | @when count < Size@ => -- guard |
---|
| 133 | @accept insert( elem : in ElemType )@ do -- method |
---|
[847ab8f] | 134 | ... -- add to buffer |
---|
| 135 | count := count + 1; |
---|
| 136 | end; |
---|
| 137 | -- executed if this accept called |
---|
| 138 | or |
---|
[afb3d68] | 139 | @when count > 0@ => -- guard |
---|
| 140 | @accept remove( elem : out ElemType )@ do -- method |
---|
[847ab8f] | 141 | ... --remove and return from buffer via parameter |
---|
| 142 | count := count - 1; |
---|
| 143 | end; |
---|
| 144 | -- executed if this accept called |
---|
[afb3d68] | 145 | or @delay 10.0@; -- unblock after 10 seconds without call |
---|
| 146 | or @else@ -- do not block, cannot appear with delay |
---|
[847ab8f] | 147 | end select; |
---|
| 148 | end loop; |
---|
| 149 | end buffer; |
---|
| 150 | var buf : buffer; -- create task object and start thread in task body |
---|
| 151 | \end{lstlisting} |
---|
| 152 | \caption{Ada Bounded Buffer} |
---|
| 153 | \label{f:BB_Ada} |
---|
| 154 | \end{figure} |
---|
| 155 | |
---|
[c03c1ac] | 156 | Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task. |
---|
[aae9c17] | 157 | Note that a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine. |
---|
[847ab8f] | 158 | The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@. |
---|
| 159 | Then the appropriate \lstinline[language=ada]{accept} method is run with the called arguments. |
---|
[c03c1ac] | 160 | Hence, the \lstinline[language=ada]{select} statement provides rendezvous points for threads, rather than providing channels with message passing. |
---|
[847ab8f] | 161 | The \lstinline[language=ada]{select} statement also provides a timeout and @else@ (nonblocking), which changes synchronous multiplexing to asynchronous. |
---|
| 162 | Now the thread polls rather than blocks. |
---|
| 163 | |
---|
| 164 | Another example of programming-language \gls{synch_multiplex} is Go using a @select@ statement with channels~\cite{go:selectref}. |
---|
[000d68f] | 165 | Figure~\ref{l:BB_Go} shows the outline of a bounded buffer administrator implemented with a Go routine. |
---|
[847ab8f] | 166 | Here two channels are used for inserting and removing by client producers and consumers, respectively. |
---|
[000d68f] | 167 | (The @done@ channel is used to synchronize with the program main.) |
---|
[847ab8f] | 168 | Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam and associated code blocks for each clause like ALT and Ada. |
---|
| 169 | However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}. |
---|
[9509d67a] | 170 | As such, the exponential blowup can be seen comparing Go and \uC in Figure~\ref{f:AdaMultiplexing}. |
---|
[847ab8f] | 171 | Go also provides a timeout via a channel and a @default@ clause like Ada @else@ for asynchronous multiplexing. |
---|
| 172 | |
---|
| 173 | \begin{figure} |
---|
| 174 | \centering |
---|
| 175 | |
---|
| 176 | \begin{lrbox}{\myboxA} |
---|
[d5f5eb7] | 177 | \begin{lstlisting}[language=go,literate=,{moredelim={**[is][\color{red}]{@}{@}}}] |
---|
[000d68f] | 178 | insert := make( chan int ) |
---|
| 179 | remove := make( chan * int ) |
---|
| 180 | buffer := make( chan int Size ) |
---|
| 181 | done := make( chan int ) |
---|
| 182 | count := 0 |
---|
| 183 | func in_buf( int val ) { |
---|
[d5f5eb7] | 184 | buffer <- val |
---|
| 185 | count++ |
---|
[000d68f] | 186 | } |
---|
| 187 | func out_buf( int * ptr ) { |
---|
[d5f5eb7] | 188 | *ptr := <-buffer |
---|
| 189 | count-- |
---|
[000d68f] | 190 | } |
---|
| 191 | func BoundedBuffer { |
---|
[d5f5eb7] | 192 | L: for { |
---|
| 193 | if ( count < Size && count > 0 ) { |
---|
| 194 | select { // wait for message |
---|
| 195 | @case i := <- insert: in_buf( i )@ |
---|
| 196 | @case p := <- remove: out_buf( p )@ |
---|
| 197 | case <- done: break L |
---|
| 198 | } |
---|
| 199 | } else if ( count < Size ) { |
---|
| 200 | select { // wait for message |
---|
| 201 | @case i := <- insert: in_buf( i )@ |
---|
| 202 | case <- done: break L |
---|
| 203 | } |
---|
| 204 | } else ( count > 0 ) { |
---|
| 205 | select { // wait for message |
---|
| 206 | @case p := <- remove: out_buf( p )@ |
---|
| 207 | case <- done: break L; |
---|
| 208 | } |
---|
| 209 | } |
---|
| 210 | } |
---|
| 211 | done <- 0 |
---|
[000d68f] | 212 | } |
---|
[847ab8f] | 213 | func main() { |
---|
[000d68f] | 214 | go BoundedBuffer() // start administrator |
---|
[847ab8f] | 215 | } |
---|
| 216 | \end{lstlisting} |
---|
| 217 | \end{lrbox} |
---|
| 218 | |
---|
| 219 | \begin{lrbox}{\myboxB} |
---|
[afb3d68] | 220 | \begin{lstlisting}[language=uC++,{moredelim={**[is][\color{red}]{@}{@}}}] |
---|
[d5f5eb7] | 221 | |
---|
| 222 | |
---|
| 223 | |
---|
| 224 | |
---|
| 225 | |
---|
| 226 | |
---|
| 227 | |
---|
| 228 | |
---|
| 229 | |
---|
| 230 | |
---|
| 231 | |
---|
| 232 | |
---|
[847ab8f] | 233 | _Task BoundedBuffer { |
---|
[000d68f] | 234 | int * buffer; |
---|
| 235 | int front = back = count = 0; |
---|
[847ab8f] | 236 | public: |
---|
[d5f5eb7] | 237 | // ... constructor implementation |
---|
[847ab8f] | 238 | void insert( int elem ) { |
---|
[000d68f] | 239 | buffer[front] = elem; |
---|
[d5f5eb7] | 240 | front = ( front + 1 ) % Size; |
---|
[847ab8f] | 241 | count += 1; |
---|
| 242 | } |
---|
| 243 | int remove() { |
---|
[000d68f] | 244 | int ret = buffer[back]; |
---|
[d5f5eb7] | 245 | back = ( back + 1 ) % Size; |
---|
[847ab8f] | 246 | count -= 1; |
---|
[d5f5eb7] | 247 | return ret; |
---|
[847ab8f] | 248 | } |
---|
| 249 | private: |
---|
| 250 | void main() { |
---|
| 251 | for ( ;; ) { |
---|
| 252 | _Accept( ~buffer ) break; |
---|
[afb3d68] | 253 | @or _When ( count < Size ) _Accept( insert )@; |
---|
| 254 | @or _When ( count > 0 ) _Accept( remove )@; |
---|
[847ab8f] | 255 | } |
---|
| 256 | } |
---|
| 257 | }; |
---|
| 258 | buffer buf; // start thread in main method |
---|
| 259 | \end{lstlisting} |
---|
| 260 | \end{lrbox} |
---|
| 261 | |
---|
| 262 | \subfloat[Go]{\label{l:BB_Go}\usebox\myboxA} |
---|
| 263 | \hspace*{5pt} |
---|
| 264 | \vrule |
---|
| 265 | \hspace*{5pt} |
---|
| 266 | \subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB} |
---|
| 267 | |
---|
| 268 | \caption{Bounded Buffer} |
---|
| 269 | \label{f:AdaMultiplexing} |
---|
| 270 | \end{figure} |
---|
| 271 | |
---|
| 272 | Finally, \uC provides \gls{synch_multiplex} with Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}. |
---|
| 273 | The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement. |
---|
| 274 | Both @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as, timeout, and @else@ clauses. |
---|
[000d68f] | 275 | Figure~\ref{l:BB_uC++} shows the outline of a bounded buffer administrator implemented with \uC @_Accept@ statements. |
---|
[847ab8f] | 276 | |
---|
[afb3d68] | 277 | There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, Java's @allof@/@anyof@ over futures~\cite{java:allof:anyof}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}. |
---|
[847ab8f] | 278 | Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources. |
---|
[760c88c] | 279 | |
---|
[a0c746df] | 280 | \section{Other Approaches to Synchronous Multiplexing} |
---|
[847ab8f] | 281 | |
---|
| 282 | To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source. |
---|
| 283 | For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources. |
---|
| 284 | Similar, actor systems circumvent the \gls{synch_multiplex} problem as actors only block when waiting for the next message never in a behaviour. |
---|
[a0c746df] | 285 | While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues. |
---|
[afb3d68] | 286 | Consider the case where a thread has a single source of communication and it wants a set of $N$ resources. |
---|
| 287 | It must sequentially request the $N$ resources and wait for each response. |
---|
| 288 | During the receives for the $N$ resources, it can receive other communication, and has to save and postpone these communications, or discard them. |
---|
[847ab8f] | 289 | % If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered. |
---|
[a0c746df] | 290 | |
---|
| 291 | \section{\CFA's Waituntil Statement} |
---|
[847ab8f] | 292 | |
---|
[760c88c] | 293 | The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement. |
---|
[c03c1ac] | 294 | There already exists a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods~\cite{Delisle21}, so this @waituntil@ focuses on synchronizing over other resources. |
---|
[847ab8f] | 295 | All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @select@ supports futures, and Ada's @select@ supports monitor method calls. |
---|
| 296 | The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}. |
---|
| 297 | No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@. |
---|
[760c88c] | 298 | |
---|
| 299 | \begin{figure} |
---|
| 300 | \begin{cfa} |
---|
| 301 | forall(T & | sized(T)) |
---|
| 302 | trait is_selectable { |
---|
[847ab8f] | 303 | // For registering a waituntil stmt on a selectable type |
---|
| 304 | bool register_select( T &, select_node & ); |
---|
[760c88c] | 305 | |
---|
[847ab8f] | 306 | // For unregistering a waituntil stmt from a selectable type |
---|
| 307 | bool unregister_select( T &, select_node & ); |
---|
[760c88c] | 308 | |
---|
[847ab8f] | 309 | // on_selected is run on the selecting thread prior to executing |
---|
| 310 | // the statement associated with the select_node |
---|
| 311 | bool on_selected( T &, select_node & ); |
---|
[760c88c] | 312 | }; |
---|
| 313 | \end{cfa} |
---|
[847ab8f] | 314 | \caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.} |
---|
[760c88c] | 315 | \label{f:wu_trait} |
---|
| 316 | \end{figure} |
---|
| 317 | |
---|
[c03c1ac] | 318 | Currently locks, channels, futures and timeouts are supported by the @waituntil@ statement, and this set can be expanded through the @is_selectable@ trait as other use-cases arise. |
---|
| 319 | The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing. |
---|
[847ab8f] | 320 | Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second. |
---|
[aae9c17] | 321 | Note that the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm. |
---|
[760c88c] | 322 | |
---|
| 323 | \section{Waituntil Semantics} |
---|
| 324 | |
---|
[c03c1ac] | 325 | The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like locks, channels, and futures. |
---|
[847ab8f] | 326 | |
---|
| 327 | \subsection{Statement Semantics} |
---|
| 328 | |
---|
| 329 | The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam. |
---|
| 330 | The clauses have an exclusive-or relationship where the first available one is run and only one clause is run. |
---|
| 331 | \CFA's @or@ semantics differ from ALT semantics: instead of randomly picking a clause when multiple are available, the first clause in the @waituntil@ that is available is executed. |
---|
| 332 | For example, in the following example, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses. |
---|
[760c88c] | 333 | \begin{cfa} |
---|
[847ab8f] | 334 | future(int) bar, foo; |
---|
[afb3d68] | 335 | waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo |
---|
[c03c1ac] | 336 | \end{cfa} |
---|
[2cb15b0] | 337 | The reason for these semantics is that prioritizing resources can be useful in certain problems, such as shutdown. |
---|
[afb3d68] | 338 | In the rare case where there is a starvation problem with the ordering, it possible to follow a @waituntil@ with its reverse form, alternating which resource has the highest priority: |
---|
[c03c1ac] | 339 | \begin{cfa} |
---|
| 340 | waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo |
---|
| 341 | waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar |
---|
[760c88c] | 342 | \end{cfa} |
---|
[afb3d68] | 343 | While this approach is not general for many resources, it handles many basic cases. |
---|
[760c88c] | 344 | |
---|
[2cb15b0] | 345 | \begin{figure} |
---|
| 346 | \begin{cfa} |
---|
| 347 | future(int) Future; |
---|
| 348 | channel(int) Channel; |
---|
| 349 | owner_lock Lock; |
---|
| 350 | int i = 0; |
---|
| 351 | |
---|
| 352 | waituntil( @Lock@ ) { ... } |
---|
| 353 | or when( i == 0 ) waituntil( i << @Channel@ ) { ... } |
---|
| 354 | and waituntil( @Future@ ) { ... } |
---|
| 355 | or waituntil( @timeout( 1`s )@ ) { ... } |
---|
| 356 | // else { ... } |
---|
| 357 | \end{cfa} |
---|
| 358 | \caption{Example of \CFA's waituntil statement} |
---|
| 359 | \label{f:wu_example} |
---|
| 360 | \end{figure} |
---|
| 361 | |
---|
[847ab8f] | 362 | The \CFA @and@ semantics match the @and@ semantics of \uC \lstinline[language=uC++]{_Select}. |
---|
| 363 | When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}. |
---|
[afb3d68] | 364 | When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block, and then the thread waits again for the next available clause or the @waituntil@ statement is now satisfied. |
---|
[847ab8f] | 365 | This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator. |
---|
[c03c1ac] | 366 | If the @and@ operator waited for all clauses to be available before running, it is the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements. |
---|
[847ab8f] | 367 | |
---|
[aae9c17] | 368 | As with normal C expressions, the @and@ operator binds more tightly than the @or@. |
---|
[847ab8f] | 369 | To give an @or@ operator higher precedence, parenthesis are used. |
---|
| 370 | For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parenthesis. |
---|
[760c88c] | 371 | \begin{cfa} |
---|
[847ab8f] | 372 | @(@ waituntil( A ) { ... } // bind tightly to or |
---|
| 373 | or waituntil( B ) { ... } @)@ |
---|
[760c88c] | 374 | and waituntil( C ) { ... } |
---|
| 375 | \end{cfa} |
---|
| 376 | |
---|
[847ab8f] | 377 | The guards in the @waituntil@ statement are called @when@ clauses. |
---|
| 378 | Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run. |
---|
| 379 | Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@. |
---|
| 380 | In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}. |
---|
| 381 | When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operation from the statement. |
---|
| 382 | For example, in the following, the two @waituntil@ statements are semantically equivalent. |
---|
| 383 | |
---|
| 384 | \begin{lrbox}{\myboxA} |
---|
[760c88c] | 385 | \begin{cfa} |
---|
[847ab8f] | 386 | when( true ) waituntil( A ) { ... } |
---|
| 387 | or when( false ) waituntil( B ) { ... } |
---|
[760c88c] | 388 | and waituntil( C ) { ... } |
---|
[847ab8f] | 389 | \end{cfa} |
---|
| 390 | \end{lrbox} |
---|
| 391 | |
---|
| 392 | \begin{lrbox}{\myboxB} |
---|
| 393 | \begin{cfa} |
---|
[760c88c] | 394 | waituntil( A ) { ... } |
---|
| 395 | and waituntil( C ) { ... } |
---|
[847ab8f] | 396 | |
---|
[760c88c] | 397 | \end{cfa} |
---|
[847ab8f] | 398 | \end{lrbox} |
---|
| 399 | |
---|
| 400 | \begin{tabular}{@{}lcl@{}} |
---|
| 401 | \usebox\myboxA & $\equiv$ & \usebox\myboxB |
---|
| 402 | \end{tabular} |
---|
| 403 | |
---|
| 404 | The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada. |
---|
| 405 | If all resources are not immediately available and there is an @else@ clause, the @else@ clause is run and the thread continues. |
---|
[760c88c] | 406 | |
---|
[847ab8f] | 407 | \subsection{Type Semantics} |
---|
[760c88c] | 408 | |
---|
[847ab8f] | 409 | As mentioned, to support interaction with the @waituntil@ statement a type must support the trait in Figure~\ref{f:wu_trait}. |
---|
| 410 | The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively. |
---|
[494a7e5] | 411 | When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run. |
---|
[847ab8f] | 412 | Many types do not need @on_selected@, but it is provided if a type needs to perform work or checks before the resource can be accessed in the code block. |
---|
[494a7e5] | 413 | The register/unregister routines in the trait also return booleans. |
---|
[847ab8f] | 414 | The return value of @register_select@ is @true@, if the resource is immediately available and @false@ otherwise. |
---|
| 415 | The return value of @unregister_select@ is @true@, if the corresponding code block should be run after unregistration and @false@ otherwise. |
---|
| 416 | The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource. |
---|
| 417 | More detail on channels and their interaction with @waituntil@ appear in Section~\ref{s:wu_chans}. |
---|
[760c88c] | 418 | |
---|
[f496046] | 419 | The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument. |
---|
| 420 | When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait. |
---|
[c03c1ac] | 421 | This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context. |
---|
[f496046] | 422 | Indirect support through routines is needed for types that want to support multiple operations such as channels that allow both reading and writing. |
---|
[c03c1ac] | 423 | |
---|
[847ab8f] | 424 | \section{\lstinline{waituntil} Implementation} |
---|
[c03c1ac] | 425 | |
---|
[afb3d68] | 426 | The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} shows the basic outline of the @waituntil@ algorithm. |
---|
| 427 | Complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives. |
---|
| 428 | The following sections use examples to fill in complexity details missing in Figure~\ref{f:WU_Impl}. |
---|
| 429 | After which, the full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}. |
---|
[760c88c] | 430 | |
---|
[847ab8f] | 431 | \begin{figure} |
---|
| 432 | \begin{cfa} |
---|
[c03c1ac] | 433 | select_nodes s[N]; $\C[3.25in]{// declare N select nodes}$ |
---|
| 434 | for ( node in s ) $\C{// register nodes}$ |
---|
[847ab8f] | 435 | register_select( resource, node ); |
---|
| 436 | while ( statement predicate not satisfied ) { $\C{// check predicate}$ |
---|
[c03c1ac] | 437 | // block until clause(s) satisfied |
---|
[7ed01be] | 438 | for ( resource in waituntil statement ) { $\C{// run true code blocks}$ |
---|
[847ab8f] | 439 | if ( resource is avail ) run code block |
---|
[c03c1ac] | 440 | if ( statement predicate is satisfied ) break; |
---|
| 441 | } |
---|
[847ab8f] | 442 | } |
---|
| 443 | for ( node in s ) $\C{// deregister nodes}\CRT$ |
---|
[7ed01be] | 444 | if ( unregister_select( resource, node ) ) run code block |
---|
[847ab8f] | 445 | \end{cfa} |
---|
| 446 | \caption{\lstinline{waituntil} Implementation} |
---|
| 447 | \label{f:WU_Impl} |
---|
| 448 | \end{figure} |
---|
[5e81a9c] | 449 | |
---|
[c03c1ac] | 450 | The basic steps of the algorithm are: |
---|
[847ab8f] | 451 | \begin{enumerate} |
---|
[5e81a9c] | 452 | \item |
---|
[847ab8f] | 453 | The @waituntil@ statement declares $N$ @select_node@s, one per resource that is being waited on, which stores any @waituntil@ data pertaining to that resource. |
---|
[5e81a9c] | 454 | |
---|
| 455 | \item |
---|
[847ab8f] | 456 | Each @select_node@ is then registered with the corresponding resource. |
---|
[5e81a9c] | 457 | |
---|
| 458 | \item |
---|
[847ab8f] | 459 | The thread executing the @waituntil@ then loops until the statement's predicate is satisfied. |
---|
[c03c1ac] | 460 | In each iteration, if the predicate is unsatisfied, the @waituntil@ thread blocks. |
---|
| 461 | When another thread satisfies a resource clause (\eg sends to a channel), it unblocks the @waituntil@ thread. |
---|
| 462 | This thread checks all clauses for completion, and any completed clauses have their code blocks run. |
---|
[7ed01be] | 463 | While checking clause completion, if enough clauses have been run such that the statement predicate is satisfied, the loop exits early. |
---|
[760c88c] | 464 | |
---|
[5e81a9c] | 465 | \item |
---|
| 466 | Once the thread escapes the loop, the @select_nodes@ are unregistered from the resources. |
---|
| 467 | \end{enumerate} |
---|
| 468 | These steps give a basic overview of how the statement works. |
---|
[c03c1ac] | 469 | The following sections shed light on the specific changes and provide more implementation detail. |
---|
[760c88c] | 470 | |
---|
[494a7e5] | 471 | \subsection{Locks}\label{s:wu_locks} |
---|
[847ab8f] | 472 | |
---|
| 473 | The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc. |
---|
| 474 | Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement. |
---|
| 475 | For example, the following waits until the thread has acquired lock @l1@ or locks @l2@ and @l3@. |
---|
| 476 | \begin{cfa} |
---|
| 477 | owner_lock l1, l2, l3; |
---|
| 478 | waituntil ( l1 ) { ... } |
---|
| 479 | or waituntil( l2 ) { ... } |
---|
| 480 | and waituntil( l3 ) { ... } |
---|
| 481 | \end{cfa} |
---|
| 482 | Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in the lock's queue of waiting threads. |
---|
[c03c1ac] | 483 | When the lock schedules this thread, it unblocks and runs the code block associated with the lock and then releases the lock. |
---|
[847ab8f] | 484 | |
---|
| 485 | In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each of the lock's waiting queues. |
---|
| 486 | When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked. |
---|
[c03c1ac] | 487 | Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released. |
---|
[afb3d68] | 488 | Immediately releasing the lock after the code block prevents the waiting thread from holding multiple locks and potentially introducing a deadlock. |
---|
[c03c1ac] | 489 | As such, the only unregistered nodes associated with locks are the ones that have not run. |
---|
[760c88c] | 490 | |
---|
| 491 | \subsection{Timeouts} |
---|
[847ab8f] | 492 | |
---|
[afb3d68] | 493 | A timeout for the @waituntil@ statement is a duration passed to routine \lstinline[deletekeywords={timeout}]{timeout}\footnote{\lstinline{timeout} is a quasi-keyword in \CFA, allowing it to be used an identifier.}, \eg: |
---|
[c03c1ac] | 494 | \begin{cquote} |
---|
| 495 | \begin{tabular}{@{}l|l@{}} |
---|
| 496 | \multicolumn{2}{@{}l@{}}{\lstinline{Duration D1\{ 1`ms \}, D2\{ 2`ms \}, D3\{ 3`ms \};}} \\ |
---|
| 497 | \begin{cfa}[deletekeywords={timeout}] |
---|
| 498 | waituntil( i << C1 ) {} |
---|
| 499 | or waituntil( i << C2 ) {} |
---|
| 500 | or waituntil( i << C3 ) {} |
---|
| 501 | or waituntil( timeout( D1 ) ) {} |
---|
| 502 | or waituntil( timeout( D2 ) ) {} |
---|
| 503 | or waituntil( timeout( D3 ) ) {} |
---|
| 504 | \end{cfa} |
---|
| 505 | & |
---|
| 506 | \begin{cfa}[deletekeywords={timeout}] |
---|
| 507 | waituntil( i << C1 ) {} |
---|
| 508 | or waituntil( i << C2 ) {} |
---|
| 509 | or waituntil( i << C3 ) {} |
---|
[a59e338] | 510 | or waituntil( timeout( min( D1, D2, D3 ) ) ) {} |
---|
[c03c1ac] | 511 | |
---|
[760c88c] | 512 | |
---|
| 513 | \end{cfa} |
---|
[c03c1ac] | 514 | \end{tabular} |
---|
| 515 | \end{cquote} |
---|
[a59e338] | 516 | These two examples are basically equivalent. |
---|
[c03c1ac] | 517 | Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers. |
---|
| 518 | Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding. |
---|
[afb3d68] | 519 | In following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, respectively. |
---|
[c03c1ac] | 520 | \begin{cfa}[deletekeywords={timeout}] |
---|
[9509d67a] | 521 | waituntil( i << C1 ){} and waituntil( timeout( 1`s ) ){} |
---|
| 522 | or waituntil( i << C2 ){} and waituntil( timeout( 3`s ) ){} |
---|
[c03c1ac] | 523 | \end{cfa} |
---|
[afb3d68] | 524 | If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds. |
---|
[aae9c17] | 525 | Note that the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics. |
---|
[760c88c] | 526 | |
---|
[c03c1ac] | 527 | The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of time elapsed since the call started. |
---|
| 528 | Instead, \lstinline[deletekeywords={timeout}]{timeout} returns a type that supports the @is_selectable@ trait, allowing the type system to select the correct overloaded routine for this context. |
---|
| 529 | For the @waituntil@, it is more idiomatic for the \lstinline[deletekeywords={timeout}]{timeout} to use the same syntax as other blocking operations instead of having a special language clause. |
---|
[760c88c] | 530 | |
---|
| 531 | \subsection{Channels}\label{s:wu_chans} |
---|
| 532 | |
---|
[c03c1ac] | 533 | Channels require more complexity to allow synchronous multiplexing. |
---|
[afb3d68] | 534 | For locks, when an outside thread releases a lock and unblocks the @waituntil@ thread (WUT), the lock's MX property is passed to the WUT (no spinning locks). |
---|
[c03c1ac] | 535 | For futures, the outside thread deliveries a value to the future and unblocks any waiting threads, including WUTs. |
---|
[afb3d68] | 536 | In either case, after the WUT unblocks, it is safe to execute its corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future. |
---|
[c03c1ac] | 537 | Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT. |
---|
[47b7142] | 538 | However, for channels, there is a race condition that poses an issue. |
---|
| 539 | If the outside thread inserts into the channel and unblocks the WUT, there is a race where another thread can remove the channel data, so after the WUT unblocks and attempts to remove from the buffer, it fails, and the WUT must reblock (busy waiting). |
---|
[c03c1ac] | 540 | This scenario is a \gls{toctou} race that needs to be consolidated. |
---|
| 541 | To close the race, the outside thread must detect this case and insert directly into the left-hand side of the channel expression (@i << chan@) rather than into the channel, and then unblock the WUT. |
---|
| 542 | Now the unblocked WUT is guaranteed to have a satisfied resource and its code block can safely executed. |
---|
| 543 | The insertion circumvents the channel buffer via the wait-morphing in the \CFA channel implementation \see{Section~\ref{s:chan_impl}}, allowing @waituntil@ channel unblocking to not be special-cased. |
---|
[9509d67a] | 544 | Note that all channel operations are fair and no preference is given between @waituntil@ and direct channel operations when unblocking. |
---|
[c03c1ac] | 545 | |
---|
| 546 | Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg: |
---|
[760c88c] | 547 | \begin{cfa} |
---|
[afb3d68] | 548 | int i; |
---|
[c03c1ac] | 549 | waituntil( i << A ) {} and waituntil( i << B ) {} |
---|
| 550 | or waituntil( i << C ) {} and waituntil( i << D ) {} |
---|
[760c88c] | 551 | \end{cfa} |
---|
[c03c1ac] | 552 | If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or the code blocks for @C@ and @D@. |
---|
[afb3d68] | 553 | However, four outside threads inserting into each channel can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks. |
---|
[c03c1ac] | 554 | This case introduces a race with complexity that increases with the size of the @waituntil@ statement. |
---|
| 555 | However, due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses. |
---|
| 556 | This approach is a poor solution for two reasons. |
---|
[afb3d68] | 557 | It is possible that once all the locks are acquired, the subtree is not satisfied and the locks must be released. |
---|
[c03c1ac] | 558 | This work incurs a high cost for signalling threads and heavily increase contention on internal channel locks. |
---|
[5e81a9c] | 559 | Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible. |
---|
[f496046] | 560 | As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance. |
---|
[afb3d68] | 561 | Therefore, the example of reusing variable @i@ by multiple output channels is considered a user error without exclusive-or semantics. |
---|
| 562 | Given aliasing in C, it is impossible to even warn of the potential race. |
---|
| 563 | In the future, it would be interesting to support Go-like syntax, \lstinline[language=Go]{case i := <- ...}, defining a new scoped @i@ variable for each clause. |
---|
[c03c1ac] | 564 | |
---|
| 565 | It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel. |
---|
[d4c1d1f] | 566 | Consider the following example where thread 1 is reading and threads 2 and 3 are writing to channels @A@ and @B@ concurrently. |
---|
| 567 | \begin{cquote} |
---|
| 568 | \begin{tabular}{@{}l|l|l@{}} |
---|
[afb3d68] | 569 | \multicolumn{3}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\ |
---|
[d4c1d1f] | 570 | thread 1 & thread 2 & thread 3 \\ |
---|
[47b7142] | 571 | \begin{cfa} |
---|
| 572 | waituntil( i << A ) {} |
---|
| 573 | or waituntil( i << B ) {} |
---|
| 574 | \end{cfa} |
---|
[d4c1d1f] | 575 | & |
---|
| 576 | \begin{cfa} |
---|
| 577 | A << 1; |
---|
[c03c1ac] | 578 | |
---|
[d4c1d1f] | 579 | \end{cfa} |
---|
| 580 | & |
---|
| 581 | \begin{cfa} |
---|
| 582 | B << 2; |
---|
[760c88c] | 583 | |
---|
[d4c1d1f] | 584 | \end{cfa} |
---|
| 585 | \end{tabular} |
---|
| 586 | \end{cquote} |
---|
| 587 | For thread 1 to have exclusive-or semantics, it must only consume from exactly one of @A@ or @B@. |
---|
| 588 | As such, thread 2 and 3 must race to establish the winning clause of the @waituntil@ in thread 1. |
---|
| 589 | This race is consolidated by thread 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}. |
---|
| 590 | The winner bypasses the channel and inserts into the WUT's left-hand, and signals thread 1. |
---|
| 591 | The loser continues checking if there is space in the channel, and if so performs the channel insert operation with a possible signal of a waiting remove thread; |
---|
| 592 | otherwise, if there is no space, the loser blocks. |
---|
| 593 | It is important the race occurs \emph{before} operating on the channel, because channel actions are different with respect to each thread. |
---|
| 594 | If the race was consolidated after the operation, both thread 2 and 3 could potentially write into @i@ concurrently. |
---|
| 595 | |
---|
| 596 | Channels introduce another interesting implementation issue. |
---|
[47b7142] | 597 | Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause. |
---|
[afb3d68] | 598 | This conjunction poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel. |
---|
[47b7142] | 599 | Consider the following example, alongside a described ordering of events to highlight the race. |
---|
[d4c1d1f] | 600 | \begin{cquote} |
---|
| 601 | \begin{tabular}{@{}l|l@{}} |
---|
[afb3d68] | 602 | \multicolumn{2}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\ |
---|
[d4c1d1f] | 603 | thread 1 & thread 2 \\ |
---|
[1ae3f185] | 604 | \begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}] |
---|
[d4c1d1f] | 605 | waituntil( @i << A@ ) {} |
---|
[1ae3f185] | 606 | or waituntil( #i << B# ) {} |
---|
[c03c1ac] | 607 | \end{cfa} |
---|
[d4c1d1f] | 608 | & |
---|
[1ae3f185] | 609 | \begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}] |
---|
| 610 | waituntil( #B << 2# ) {} |
---|
[d4c1d1f] | 611 | or waituntil( @A << 1@ ) {} |
---|
| 612 | \end{cfa} |
---|
| 613 | \end{tabular} |
---|
| 614 | \end{cquote} |
---|
[afb3d68] | 615 | Assume thread 1 executes first, registers with channel @A@ and proceeds in the @waituntil@. |
---|
| 616 | Since @A@ is empty, thread 1 cannot remove, and then thread 1 is interrupted before registering with @B@. |
---|
| 617 | Thread 2 similarly registers with channel @B@, and proceeds in the @waituntil@. |
---|
| 618 | Since @B@ is zero size there is no space to insert, and then thread 2 is interrupted before registering with @A@. |
---|
[d4c1d1f] | 619 | At this point, thread 1 and 2 resume execution. |
---|
[afb3d68] | 620 | Remember from above, each exclusive-or @waituntil@ holds a race to set the winning clause of the statement. |
---|
| 621 | The issue that arises is that these two @waituntil@ statements must have matching winning clauses (both @A@ clauses or both @B@ clauses) to preserve the exclusive-or semantics, since a zero-sized channel needs an insert/remove pair for an operation to occur. |
---|
| 622 | If threads 1 and 2 race to set a winner only in their own @waituntil@, thread 1 can think it successfully removed from @B@, and thread 2 can think it successfully inserted into @A@, which is an error. |
---|
| 623 | Hence, there is a race on two fronts. |
---|
| 624 | If thread 1 wins the race and sees that @B@ has a waiting insertion, then thread 2 must execute the first clause of its @waituntil@ and thread 1 execute its second. |
---|
| 625 | Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 execute its second. |
---|
| 626 | Any other execution scenario is incorrect for exclusive-or semantics. |
---|
[aae9c17] | 627 | Note that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread. |
---|
[760c88c] | 628 | |
---|
[d4c1d1f] | 629 | The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels. |
---|
| 630 | This approach eliminates the race shown above since thread 1 and 2 cannot both be registering at the same time. |
---|
| 631 | However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic. |
---|
[afb3d68] | 632 | Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead. |
---|
[c03c1ac] | 633 | Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race. |
---|
[47b7142] | 634 | This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to be pending. |
---|
| 635 | If it succeeds, it then attempts to set the other thread's @waituntil@ race pointer to its success value. |
---|
| 636 | If either thread successfully sets the the other thread's @waituntil@ race pointer, then the operation can proceed, if not the signalling threads set its own race pointer back to the initial value and repeats. |
---|
| 637 | This retry mechanism can potentially introduce a livelock, but in practice a livelock here is highly unlikely. |
---|
[d4c1d1f] | 638 | Furthermore, the likelihood of a livelock here is zero unless the program is in the niche case of having two or more exclusive-or @waituntil@s with two or more clauses in reverse order of priority. |
---|
| 639 | This livelock case can be fully eliminated using locks like Go, or if a \gls{dcas} instruction is available. |
---|
| 640 | If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost. |
---|
| 641 | This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner. |
---|
[f496046] | 642 | The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}. |
---|
| 643 | |
---|
| 644 | \begin{figure} |
---|
| 645 | \begin{cfa} |
---|
| 646 | bool pending_set_other( select_node & other, select_node & mine ) { |
---|
[d5f5eb7] | 647 | unsigned long int cmp_status = UNSAT; |
---|
| 648 | |
---|
| 649 | // Try to set other status, if we succeed break and return true |
---|
| 650 | while( ! CAS( other.clause_status, &cmp_status, SAT ) ) { |
---|
| 651 | if ( cmp_status == SAT ) |
---|
| 652 | return false; // If other status is SAT we lost so return false |
---|
| 653 | |
---|
| 654 | // Toggle own status flag to allow other thread to potentially win |
---|
| 655 | mine.status = UNSAT; |
---|
| 656 | |
---|
| 657 | // Reset compare flag |
---|
| 658 | cmp_status = UNSAT; |
---|
| 659 | |
---|
| 660 | // Attempt to set own status flag back to PENDING to retry |
---|
| 661 | if ( ! CAS( mine.clause_status, &cmp_status, PENDING ) ) |
---|
| 662 | return false; // If we fail then we lost so return false |
---|
| 663 | |
---|
| 664 | // Reset compare flag |
---|
| 665 | cmp_status = UNSAT; |
---|
| 666 | } |
---|
| 667 | return true; |
---|
[f496046] | 668 | } |
---|
| 669 | \end{cfa} |
---|
| 670 | \caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol} |
---|
| 671 | \label{f:WU_DeadlockAvoidance} |
---|
| 672 | \end{figure} |
---|
[760c88c] | 673 | |
---|
[d4c1d1f] | 674 | Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support. |
---|
[47b7142] | 675 | These exception mechanisms are supported through the @on_selected@ routine. |
---|
[d4c1d1f] | 676 | This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown. |
---|
[afb3d68] | 677 | Hence, the channel close-down mechanism is handled correctly. |
---|
[760c88c] | 678 | |
---|
| 679 | \subsection{Guards and Statement Predicate}\label{s:wu_guards} |
---|
[d4c1d1f] | 680 | |
---|
| 681 | It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished. |
---|
[f496046] | 682 | In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which along with guards, make the problem of checking for completion of the statement difficult. |
---|
[afb3d68] | 683 | Consider the following @waituntil@. |
---|
[f496046] | 684 | \begin{cfa} |
---|
[afb3d68] | 685 | when( GA ) waituntil( A ) {} |
---|
| 686 | and when( GB ) waituntil( B ) {} |
---|
| 687 | or when( GC ) waituntil( C ) {} |
---|
[f496046] | 688 | \end{cfa} |
---|
[afb3d68] | 689 | When the @waituntil@ thread wakes up, the following predicate represents satisfaction: |
---|
[f496046] | 690 | \begin{cfa} |
---|
[afb3d68] | 691 | A && B || C || ! GA && B || ! GB && A || ! GA && ! GB && ! GC |
---|
[f496046] | 692 | \end{cfa} |
---|
[afb3d68] | 693 | which can be simplified to: |
---|
[f496046] | 694 | \begin{cfa} |
---|
[afb3d68] | 695 | ( A || ! GA ) && ( B || ! GB ) || C || ! GA && ! GB && ! GC |
---|
[f496046] | 696 | \end{cfa} |
---|
[afb3d68] | 697 | Checking the complete predicate on each iteration of the pending @waituntil@ is expensive so \uC and \CFA both take steps to simplify checking for statement completion. |
---|
[760c88c] | 698 | |
---|
[5e81a9c] | 699 | In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource. |
---|
[afb3d68] | 700 | A diagram of the tree for the complete predicate above is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@. |
---|
| 701 | Each internal node stores the statuses of the two subtrees beneath it. |
---|
[d4c1d1f] | 702 | When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement. |
---|
[760c88c] | 703 | Once the root of the tree has both subtrees marked as @true@ then the statement is complete. |
---|
[d4c1d1f] | 704 | As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again. |
---|
[f496046] | 705 | To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation. |
---|
| 706 | |
---|
| 707 | \begin{figure} |
---|
| 708 | \begin{center} |
---|
| 709 | \input{diagrams/uCpp_select_tree.tikz} |
---|
| 710 | \end{center} |
---|
[afb3d68] | 711 | \caption{\uC \lstinline[language=uC++]{select} tree modification} |
---|
[f496046] | 712 | \label{f:uC_select_tree} |
---|
| 713 | \end{figure} |
---|
[760c88c] | 714 | |
---|
[afb3d68] | 715 | The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the complete predicate of a @waituntil@. |
---|
| 716 | The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise. |
---|
| 717 | This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@. |
---|
[aae9c17] | 718 | Leveraging the compiler, a predicate routine is generated per @waituntil@. |
---|
| 719 | This predicate routine accepts the statuses of the resources being waited on as arguments. |
---|
| 720 | A resource status is an integer that indicates whether a resource is either not available, available, or has already run its associated code block. |
---|
| 721 | The predicate routine returns @true@ when the @waituntil@ is done, \ie enough resources have run their associated code blocks to satisfy the @waituntil@'s predicate, and false otherwise. |
---|
[847ab8f] | 722 | To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate. |
---|
[f496046] | 723 | The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values. |
---|
[afb3d68] | 724 | For example, the following is generated for the complete predicate above: |
---|
[f496046] | 725 | \begin{cfa} |
---|
| 726 | // statement completion predicate |
---|
| 727 | bool check_completion( select_node * nodes ) { |
---|
[d5f5eb7] | 728 | return nodes[0].status && nodes[1].status || nodes[2].status; |
---|
[f496046] | 729 | } |
---|
[afb3d68] | 730 | if ( GA || GB || GC ) { $\C{// skip statement if all guards false}$ |
---|
[d5f5eb7] | 731 | select_node nodes[3]; |
---|
| 732 | nodes[0].status = ! GA && GB; $\C{// A's status}$ |
---|
| 733 | nodes[1].status = ! GB && GA; $\C{// B's status}$ |
---|
| 734 | nodes[2].status = ! GC; $\C{// C's status}$ |
---|
| 735 | // ... rest of waituntil codegen ... |
---|
[f496046] | 736 | } |
---|
| 737 | \end{cfa} |
---|
[760c88c] | 738 | |
---|
[d4c1d1f] | 739 | \uC's @_Select@, supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses. |
---|
| 740 | In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied. |
---|
[1ae3f185] | 741 | \begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}] |
---|
[760c88c] | 742 | Future_ISM<int> A, B, C, D; |
---|
[1ae3f185] | 743 | _Select( @A || B && C@ ) { ... } |
---|
| 744 | and _Select( @D && E@ ) { ... } |
---|
[d4c1d1f] | 745 | \end{lstlisting} |
---|
[afb3d68] | 746 | This feature is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after \emph{both} resources are available. |
---|
[760c88c] | 747 | |
---|
[d4c1d1f] | 748 | In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons. |
---|
| 749 | As a motivating example, suppose \CFA supported operators inside clauses as in: |
---|
[760c88c] | 750 | \begin{cfa} |
---|
| 751 | owner_lock A, B, C, D; |
---|
| 752 | waituntil( A && B ) { ... } |
---|
| 753 | or waituntil( C && D ) { ... } |
---|
| 754 | \end{cfa} |
---|
[afb3d68] | 755 | If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold-and-wait situation. |
---|
[d4c1d1f] | 756 | Other semantics are needed to ensure this operation is safe. |
---|
| 757 | One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance}; |
---|
| 758 | however, that opens the potential for livelock. |
---|
| 759 | Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally. |
---|
[760c88c] | 760 | One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach. |
---|
[afb3d68] | 761 | This approach does not work due to \gls{toctou} issues, \ie it is impossible to ensure that the full set of resources are available without holding them all first. |
---|
| 762 | Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems. |
---|
| 763 | Finally, the problem of operators inside clauses is also a difficult to handle when supporting channels. |
---|
| 764 | It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run. |
---|
| 765 | However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}. |
---|
[aae9c17] | 766 | Ultimately, this feature was not considered crucial after taking into account the complexity and runtime cost. |
---|
[494a7e5] | 767 | |
---|
| 768 | \subsection{The full \lstinline{waituntil} picture} |
---|
[d4c1d1f] | 769 | |
---|
[afb3d68] | 770 | Given all the complex details handled by the @waituntil@, its full pseudocode is presented in Figure \ref{f:WU_Full_Impl}. |
---|
[d4c1d1f] | 771 | Some things to note are as follows. |
---|
| 772 | The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}. |
---|
| 773 | The @when_conditions@ array is used to store the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards. |
---|
| 774 | As discussed in Section~\ref{s:wu_chans}, this pseudocode includes conditional code-block execution based on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure all available channel resources have their corresponding code block run. |
---|
[494a7e5] | 775 | |
---|
| 776 | \begin{figure} |
---|
| 777 | \begin{cfa} |
---|
| 778 | bool when_conditions[N]; |
---|
[f496046] | 779 | for ( node in nodes ) $\C[3.75in]{// evaluate guards}$ |
---|
[c03c1ac] | 780 | if ( node has guard ) |
---|
| 781 | when_conditions[node] = node_guard; |
---|
| 782 | else |
---|
| 783 | when_conditions[node] = true; |
---|
[494a7e5] | 784 | |
---|
[000d68f] | 785 | if ( any when_conditions[node] are true ) { |
---|
[d5f5eb7] | 786 | select_nodes nodes[N]; $\C{// declare N select nodes}$ |
---|
| 787 | try { |
---|
| 788 | // ... set statuses for nodes with when_conditions[node] == false ... |
---|
| 789 | |
---|
| 790 | for ( node in nodes ) $\C{// register nodes}$ |
---|
| 791 | if ( when_conditions[node] ) |
---|
| 792 | register_select( resource, node ); |
---|
| 793 | |
---|
| 794 | while ( !check_completion( nodes ) ) { $\C{// check predicate}$ |
---|
| 795 | // block |
---|
| 796 | for ( resource in waituntil statement ) { $\C{// run true code blocks}$ |
---|
| 797 | if ( check_completion( nodes ) ) break; |
---|
| 798 | if ( resource is avail ) |
---|
| 799 | try { |
---|
| 800 | if( on_selected( resource ) ) $\C{// conditionally run block}$ |
---|
| 801 | run code block |
---|
| 802 | } finally $\C{// for exception safety}$ |
---|
| 803 | unregister_select( resource, node ); $\C{// immediate unregister}$ |
---|
| 804 | } |
---|
| 805 | } |
---|
| 806 | } finally { $\C{// for exception safety}$ |
---|
| 807 | for ( registered nodes in nodes ) $\C{// deregister nodes}$ |
---|
| 808 | if ( when_conditions[node] && unregister_select( resource, node ) |
---|
| 809 | && on_selected( resource ) ) |
---|
| 810 | run code block $\C{// run code block upon unregister}\CRT$ |
---|
| 811 | } |
---|
[f496046] | 812 | } |
---|
[494a7e5] | 813 | \end{cfa} |
---|
| 814 | \caption{Full \lstinline{waituntil} Pseudocode Implementation} |
---|
| 815 | \label{f:WU_Full_Impl} |
---|
| 816 | \end{figure} |
---|
| 817 | |
---|
[afb3d68] | 818 | \section{\lstinline{waituntil} Performance} |
---|
[c03c1ac] | 819 | |
---|
[1ae3f185] | 820 | Similar facilities to @waituntil@ are discussed in Section~\ref{s:History} covering C, Ada, Rust, \CC, and OCaml. |
---|
[c03c1ac] | 821 | However, these facilities are either not meaningful or feasible to benchmark against. |
---|
| 822 | The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space. |
---|
[afb3d68] | 823 | Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement. |
---|
[1ae3f185] | 824 | Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach. |
---|
[5e81a9c] | 825 | OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@. |
---|
[c03c1ac] | 826 | |
---|
[1ae3f185] | 827 | The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA @waituntil@ statement are the Go \lstinline[language=Go]{select} statement and the \uC \lstinline[language=uC++]{_Select} statement. |
---|
| 828 | As such, two microbenchmarks are presented, one for Go and one for \uC to contrast this feature. |
---|
| 829 | Given the differences in features, polymorphism, and expressibility between @waituntil@ and \lstinline[language=Go]{select}, and \uC \lstinline[language=uC++]{_Select}, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner. |
---|
[760c88c] | 830 | |
---|
| 831 | \subsection{Channel Benchmark} |
---|
| 832 | |
---|
[1ae3f185] | 833 | The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels. |
---|
[aae9c17] | 834 | Although Unix's select, poll and epoll multiplex over file descriptors, which are effectively channels, these utilities are not included in the benchmarks. |
---|
| 835 | Reading or writing to a file descriptor requires a system call which is much more expensive than operating on a user-space channel. |
---|
| 836 | As such, it is infeasible to compare Unix's select, poll and epoll with \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}. |
---|
[1ae3f185] | 837 | The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads. |
---|
[afb3d68] | 838 | The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on. |
---|
[c03c1ac] | 839 | Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels. |
---|
| 840 | For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is: |
---|
[14e1053] | 841 | \begin{cfa} |
---|
[c03c1ac] | 842 | for () |
---|
| 843 | waituntil( val << chans[0] ); or waituntil( val << chans[1] ); |
---|
| 844 | or waituntil( val << chans[2] ); or waituntil( val << chans[3] ); |
---|
[14e1053] | 845 | \end{cfa} |
---|
| 846 | A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds. |
---|
[1ae3f185] | 847 | The first benchmark measures throughput of the producers and consumer synchronously waiting on the channels and the second has the threads asynchronously wait on the channels using the Go @default@ and \CFA @else@ clause. |
---|
[14e1053] | 848 | The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} respectively. |
---|
| 849 | |
---|
| 850 | \begin{figure} |
---|
| 851 | \centering |
---|
[847ab8f] | 852 | \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize} |
---|
[14e1053] | 853 | \subfloat[AMD]{ |
---|
| 854 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}} |
---|
| 855 | } |
---|
| 856 | \subfloat[Intel]{ |
---|
| 857 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}} |
---|
| 858 | } |
---|
[847ab8f] | 859 | \bigskip |
---|
[14e1053] | 860 | |
---|
| 861 | \subfloat[AMD]{ |
---|
| 862 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}} |
---|
| 863 | } |
---|
| 864 | \subfloat[Intel]{ |
---|
| 865 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}} |
---|
| 866 | } |
---|
[847ab8f] | 867 | \bigskip |
---|
[14e1053] | 868 | |
---|
| 869 | \subfloat[AMD]{ |
---|
| 870 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}} |
---|
| 871 | } |
---|
| 872 | \subfloat[Intel]{ |
---|
| 873 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}} |
---|
| 874 | } |
---|
[847ab8f] | 875 | \caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).} |
---|
[14e1053] | 876 | \label{f:select_contend_bench} |
---|
| 877 | \end{figure} |
---|
[760c88c] | 878 | |
---|
[14e1053] | 879 | \begin{figure} |
---|
| 880 | \centering |
---|
[847ab8f] | 881 | \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize} |
---|
[14e1053] | 882 | \subfloat[AMD]{ |
---|
| 883 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}} |
---|
| 884 | } |
---|
| 885 | \subfloat[Intel]{ |
---|
| 886 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}} |
---|
| 887 | } |
---|
[847ab8f] | 888 | \bigskip |
---|
[14e1053] | 889 | |
---|
| 890 | \subfloat[AMD]{ |
---|
| 891 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}} |
---|
| 892 | } |
---|
| 893 | \subfloat[Intel]{ |
---|
| 894 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}} |
---|
| 895 | } |
---|
[847ab8f] | 896 | \bigskip |
---|
[14e1053] | 897 | |
---|
| 898 | \subfloat[AMD]{ |
---|
| 899 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}} |
---|
| 900 | } |
---|
| 901 | \subfloat[Intel]{ |
---|
| 902 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}} |
---|
| 903 | } |
---|
[847ab8f] | 904 | \caption{The asynchronous multiplexing channel benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).} |
---|
[14e1053] | 905 | \label{f:select_spin_bench} |
---|
| 906 | \end{figure} |
---|
[760c88c] | 907 | |
---|
[1ae3f185] | 908 | Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} have similar results when comparing \lstinline[language=Go]{select} and @waituntil@. |
---|
| 909 | In the AMD benchmarks (left column), the performance is very similar as the number of cores scale. |
---|
[aae9c17] | 910 | The AMD machine has a high-caching contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go. |
---|
[1ae3f185] | 911 | Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low cores, where Go has significantly better performance, due to an optimization in its scheduler. |
---|
| 912 | Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}. |
---|
| 913 | In the Intel benchmarks (right column), \CFA performs better than Go as the number of cores scales past 2/4 and as the number of clauses increase. |
---|
| 914 | This difference is due to Go's acquiring all channel locks when registering and unregistering channels on a \lstinline[language=Go]{select}. |
---|
| 915 | Go then is holding a lock for every channel, resulting in worse performance as the number of channels increase. |
---|
[14e1053] | 916 | In \CFA, since races are consolidated without holding all locks, it scales much better both with cores and clauses since more work can occur in parallel. |
---|
[afb3d68] | 917 | This scalability difference is more significant on the Intel machine than the AMD machine since the Intel has lower cache-contention costs. |
---|
[1ae3f185] | 918 | |
---|
| 919 | The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks. |
---|
| 920 | There are pathological cases where Go's throughput has significant jitter. |
---|
| 921 | Consider a producer and consumer thread, @P1@ and @C1@, selecting from both channels @A@ and @B@. |
---|
| 922 | \begin{cquote} |
---|
| 923 | \begin{tabular}{@{}ll@{}} |
---|
| 924 | @P1@ & @C1@ \\ |
---|
| 925 | \begin{cfa} |
---|
| 926 | waituntil( A << i ); or waituntil( B << i ); |
---|
| 927 | \end{cfa} |
---|
| 928 | & |
---|
| 929 | \begin{cfa} |
---|
| 930 | waituntil( val << A ); or waituntil( val << B ); |
---|
| 931 | \end{cfa} |
---|
| 932 | \end{tabular} |
---|
| 933 | \end{cquote} |
---|
| 934 | Additionally, there is another producer and consumer thread, @P2@ and @C2@, operating solely on @B@. |
---|
| 935 | \begin{cquote} |
---|
| 936 | \begin{tabular}{@{}ll@{}} |
---|
| 937 | @P2@ & @C2@ \\ |
---|
| 938 | \begin{cfa} |
---|
| 939 | B << val; |
---|
| 940 | \end{cfa} |
---|
| 941 | & |
---|
| 942 | \begin{cfa} |
---|
| 943 | val << B; |
---|
| 944 | \end{cfa} |
---|
| 945 | \end{tabular} |
---|
| 946 | \end{cquote} |
---|
| 947 | In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired. |
---|
[aae9c17] | 948 | Interestingly, this case may not be as pathological as it seems. |
---|
[afb3d68] | 949 | If the set of channels belonging to a \lstinline[language=Go]{select} have channels that overlap with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel. |
---|
[1ae3f185] | 950 | The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism. |
---|
[14e1053] | 951 | Comparison of this pathological case is shown in Table~\ref{t:pathGo}. |
---|
[afb3d68] | 952 | The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than the Intel machine. |
---|
[14e1053] | 953 | |
---|
| 954 | \begin{table}[t] |
---|
| 955 | \centering |
---|
| 956 | \setlength{\extrarowheight}{2pt} |
---|
| 957 | \setlength{\tabcolsep}{5pt} |
---|
| 958 | |
---|
[000d68f] | 959 | \caption{Throughput (channel operations per second) of \CFA and Go in a pathological case for contention in Go's select implementation} |
---|
[14e1053] | 960 | \label{t:pathGo} |
---|
[1ae3f185] | 961 | \begin{tabular}{r|r|r} |
---|
| 962 | & \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c}{Go} \\ |
---|
[847ab8f] | 963 | \hline |
---|
| 964 | AMD & \input{data/nasus_Order} \\ |
---|
| 965 | \hline |
---|
| 966 | Intel & \input{data/pyke_Order} |
---|
[14e1053] | 967 | \end{tabular} |
---|
| 968 | \end{table} |
---|
| 969 | |
---|
| 970 | Another difference between Go and \CFA is the order of clause selection when multiple clauses are available. |
---|
[1ae3f185] | 971 | Go \emph{randomly} selects a clause~\cite{go:select}, but \CFA chooses in the order clauses are listed. |
---|
| 972 | This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance. |
---|
| 973 | In the previous example, threads @P1@ and @C1@ prioritize channel @A@ in the @waituntil@, which can reduce contention for threads @P2@ and @C2@ accessing channel @B@. |
---|
| 974 | If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be significant less due to extra contention on channel @B@. |
---|
[760c88c] | 975 | |
---|
| 976 | \subsection{Future Benchmark} |
---|
[1ae3f185] | 977 | |
---|
| 978 | The future benchmark compares \CFA's @waituntil@ with \uC's \lstinline[language=uC++]{_Select}, with both utilities waiting on futures. |
---|
| 979 | While both statements have very similar semantics, supporting @and@ and @or@ operators, \lstinline[language=uC++]{_Select} can only wait on futures, whereas the @waituntil@ is polymorphic. |
---|
| 980 | As such, the underlying implementation of the operators differs between @waituntil@ and \lstinline[language=uC++]{_Select}. |
---|
| 981 | The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate. |
---|
[14e1053] | 982 | |
---|
[1ae3f185] | 983 | This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements. |
---|
| 984 | The benchmark is indirect since the performance of futures in \CFA and \uC differ by a significant margin. |
---|
| 985 | The experiment has a server, which cycles fulfilling three futures, @A@, @B@, and @C@, and a client, which waits for these futures to be fulfilled using four different kinds of predicates given in \CFA: |
---|
| 986 | \begin{cquote} |
---|
| 987 | \begin{tabular}{@{}l|l@{}} |
---|
| 988 | OR & AND \\ |
---|
| 989 | \hline |
---|
[14e1053] | 990 | \begin{cfa} |
---|
| 991 | waituntil( A ) { get( A ); } |
---|
| 992 | or waituntil( B ) { get( B ); } |
---|
| 993 | or waituntil( C ) { get( C ); } |
---|
[1ae3f185] | 994 | \end{cfa} |
---|
| 995 | & |
---|
| 996 | \begin{cfa} |
---|
[14e1053] | 997 | waituntil( A ) { get( A ); } |
---|
| 998 | and waituntil( B ) { get( B ); } |
---|
| 999 | and waituntil( C ) { get( C ); } |
---|
[1ae3f185] | 1000 | \end{cfa} |
---|
| 1001 | \\ |
---|
| 1002 | \multicolumn{2}{@{}c@{}}{} \\ |
---|
| 1003 | AND-OR & OR-AND \\ |
---|
| 1004 | \hline |
---|
| 1005 | \begin{cfa} |
---|
[14e1053] | 1006 | waituntil( A ) { get( A ); } |
---|
| 1007 | and waituntil( B ) { get( B ); } |
---|
| 1008 | or waituntil( C ) { get( C ); } |
---|
[1ae3f185] | 1009 | \end{cfa} |
---|
| 1010 | & |
---|
| 1011 | \begin{cfa} |
---|
| 1012 | @(@ waituntil( A ) { get( A ); } |
---|
| 1013 | or waituntil( B ) { get( B ); } @)@ |
---|
[14e1053] | 1014 | and waituntil( C ) { get( C ); } |
---|
| 1015 | \end{cfa} |
---|
[1ae3f185] | 1016 | \end{tabular} |
---|
| 1017 | \end{cquote} |
---|
| 1018 | The server and client use a low cost synchronize after each fulfillment, so the server does not race ahead of the client. |
---|
[a0c746df] | 1019 | |
---|
[1ae3f185] | 1020 | Results of this benchmark are shown in Figure~\ref{f:futurePerf}. |
---|
[afb3d68] | 1021 | Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation. |
---|
[1ae3f185] | 1022 | In detail, \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation. |
---|
| 1023 | However, the bars for both systems have similar height patterns across the experiments. |
---|
| 1024 | The @OR@ column for \CFA is more performant than the other \CFA predicates, due to the special-casing of @waituntil@ statements with only @or@ operators. |
---|
| 1025 | For both \uC and \CFA, the @AND@ experiment is the least performant, which is expected since all three futures need to be fulfilled for each statement completion. |
---|
[14e1053] | 1026 | Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special OR case), whereas \uC has lower variation on the Intel. |
---|
[1ae3f185] | 1027 | Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates. |
---|
[afb3d68] | 1028 | |
---|
| 1029 | \begin{figure} |
---|
| 1030 | \centering |
---|
| 1031 | \subfloat[AMD Future Synchronization Benchmark]{ |
---|
| 1032 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}} |
---|
| 1033 | \label{f:futureAMD} |
---|
| 1034 | } |
---|
| 1035 | \subfloat[Intel Future Synchronization Benchmark]{ |
---|
| 1036 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}} |
---|
| 1037 | \label{f:futureIntel} |
---|
| 1038 | } |
---|
| 1039 | \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).} |
---|
| 1040 | \label{f:futurePerf} |
---|
| 1041 | \end{figure} |
---|