| [a0c746df] | 1 | % ====================================================================== | 
|---|
|  | 2 | % ====================================================================== | 
|---|
|  | 3 | \chapter{Waituntil}\label{s:waituntil} | 
|---|
|  | 4 | % ====================================================================== | 
|---|
|  | 5 | % ====================================================================== | 
|---|
|  | 6 |  | 
|---|
| [5e81a9c] | 7 | Consider the following motivating problem. | 
|---|
| [847ab8f] | 8 | There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom. | 
|---|
| [5e81a9c] | 9 | Each stall has its own lock since only one person may occupy a stall at a time. | 
|---|
| [847ab8f] | 10 | Humans solve this problem in the following way. | 
|---|
| [5e81a9c] | 11 | They check if all of the stalls are occupied. | 
|---|
| [847ab8f] | 12 | If not, they enter and claim an available stall. | 
|---|
|  | 13 | If they are all occupied, people queue and watch the stalls until one is free, and then enter and lock the stall. | 
|---|
|  | 14 | This solution can be implemented on a computer easily, if all threads are waiting on all stalls and agree to queue. | 
|---|
|  | 15 |  | 
|---|
| [5e81a9c] | 16 | Now the problem is extended. | 
|---|
| [c03c1ac] | 17 | Some stalls are wheelchair accessible and some stalls have gender identification. | 
|---|
| [847ab8f] | 18 | Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria. | 
|---|
|  | 19 | Immediately, the problem becomes more difficult. | 
|---|
| [c03c1ac] | 20 | A single queue no longer solves the problem. | 
|---|
| [847ab8f] | 21 | What happens when there is a stall available that the person at the front of the queue cannot choose? | 
|---|
| [c03c1ac] | 22 | The na\"ive solution has each thread spin indefinitely continually checking every matching kind of stall(s) until a suitable one is free. | 
|---|
| [847ab8f] | 23 | This approach is insufficient since it wastes cycles and results in unfairness among waiting threads as a thread can acquire the first matching stall without regard to the waiting time of other threads. | 
|---|
|  | 24 | Waiting for the first appropriate stall (resource) that becomes available without spinning is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria. | 
|---|
| [a0c746df] | 25 |  | 
|---|
| [1ae3f185] | 26 | \section{History of Synchronous Multiplexing}\label{s:History} | 
|---|
|  | 27 |  | 
|---|
| [a0c746df] | 28 | There is a history of tools that provide \gls{synch_multiplex}. | 
|---|
| [847ab8f] | 29 | Some well known \gls{synch_multiplex} tools include Unix system utilities: @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}. | 
|---|
|  | 30 | The concept and theory surrounding \gls{synch_multiplex} was introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85}, | 
|---|
|  | 31 | \begin{quote} | 
|---|
|  | 32 | A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85} | 
|---|
|  | 33 | \end{quote} | 
|---|
|  | 34 | The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}. | 
|---|
|  | 35 |  | 
|---|
|  | 36 | Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid. | 
|---|
|  | 37 | For example, | 
|---|
|  | 38 | \begin{cfa}[mathescape] | 
|---|
|  | 39 | (@G1@(x) $\rightarrow$ P @|@ @G2@(y) $\rightarrow$ Q ) | 
|---|
|  | 40 | \end{cfa} | 
|---|
| [c03c1ac] | 41 | waits for either channel @x@ or @y@ to have a value, if and only if guards @G1@ and @G2@ are true; | 
|---|
| [847ab8f] | 42 | if only one guard is true, only one channel receives, and if both guards are false, no receive occurs. | 
|---|
|  | 43 | % extended CSP with a \gls{synch_multiplex} construct @ALT@, which waits for one resource to be available and then executes a corresponding block of code. | 
|---|
|  | 44 | In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources. | 
|---|
| [a0c746df] | 45 | Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on. | 
|---|
| [847ab8f] | 46 | If a guard is false, then the resource it guards is not in the set of resources being waited on. | 
|---|
| [7ed01be] | 47 | If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues. | 
|---|
| [847ab8f] | 48 | Guards can be simulated using @if@ statements as shown in~\cite[rule~2.4, p~183]{Roscoe88} | 
|---|
|  | 49 | \begin{lstlisting}[basicstyle=\rm,mathescape] | 
|---|
|  | 50 | ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT($\,g$ $P$, $G$ ), $\neg\,$b ALT( $G$ ) )                        (boolean guard elim). | 
|---|
|  | 51 | \end{lstlisting} | 
|---|
|  | 52 | but require $2^N-1$ @if@ statements, where $N$ is the number of guards. | 
|---|
|  | 53 | The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time. | 
|---|
| [c03c1ac] | 54 | Figure~\ref{f:wu_if} shows in \CFA an example of applying rule 2.4 for three guards. | 
|---|
| [847ab8f] | 55 | Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@. | 
|---|
| [760c88c] | 56 |  | 
|---|
|  | 57 | \begin{figure} | 
|---|
| [847ab8f] | 58 | \centering | 
|---|
|  | 59 | \begin{lrbox}{\myboxA} | 
|---|
| [760c88c] | 60 | \begin{cfa} | 
|---|
| [847ab8f] | 61 | when( G1 ) | 
|---|
|  | 62 | waituntil( R1 ) S1 | 
|---|
|  | 63 | or when( G2 ) | 
|---|
|  | 64 | waituntil( R2 ) S2 | 
|---|
|  | 65 | or when( G3 ) | 
|---|
|  | 66 | waituntil( R3 ) S3 | 
|---|
|  | 67 |  | 
|---|
|  | 68 |  | 
|---|
|  | 69 |  | 
|---|
|  | 70 |  | 
|---|
|  | 71 |  | 
|---|
|  | 72 |  | 
|---|
|  | 73 |  | 
|---|
|  | 74 | \end{cfa} | 
|---|
|  | 75 | \end{lrbox} | 
|---|
|  | 76 |  | 
|---|
|  | 77 | \begin{lrbox}{\myboxB} | 
|---|
|  | 78 | \begin{cfa} | 
|---|
|  | 79 | if ( G1 ) | 
|---|
|  | 80 | if ( G2 ) | 
|---|
|  | 81 | if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3 | 
|---|
|  | 82 | else waituntil( R1 ) S1 or waituntil( R2 ) S2 | 
|---|
|  | 83 | else | 
|---|
|  | 84 | if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3 | 
|---|
|  | 85 | else waituntil( R1 ) S1 | 
|---|
|  | 86 | else | 
|---|
|  | 87 | if ( G2 ) | 
|---|
|  | 88 | if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3 | 
|---|
|  | 89 | else waituntil( R2 ) S2 | 
|---|
|  | 90 | else | 
|---|
|  | 91 | if ( G3 ) waituntil( R3 ) S3 | 
|---|
| [760c88c] | 92 | \end{cfa} | 
|---|
| [847ab8f] | 93 | \end{lrbox} | 
|---|
|  | 94 |  | 
|---|
|  | 95 | \subfloat[Guards]{\label{l:guards}\usebox\myboxA} | 
|---|
|  | 96 | \hspace*{5pt} | 
|---|
|  | 97 | \vrule | 
|---|
|  | 98 | \hspace*{5pt} | 
|---|
|  | 99 | \subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB} | 
|---|
|  | 100 | \caption{\CFA guard simulated with \lstinline{if} statement.} | 
|---|
| [760c88c] | 101 | \label{f:wu_if} | 
|---|
|  | 102 | \end{figure} | 
|---|
| [a0c746df] | 103 |  | 
|---|
| [847ab8f] | 104 | When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important. | 
|---|
| [aae9c17] | 105 | While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors, which conceptually differ from channels in name only. | 
|---|
| [847ab8f] | 106 | The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout. | 
|---|
|  | 107 | @select@ blocks until either some subset of file descriptors are available or the timeout expires. | 
|---|
|  | 108 | All file descriptors that are ready are returned by modifying the argument sets to only contain the ready descriptors. | 
|---|
|  | 109 |  | 
|---|
|  | 110 | This early implementation differs from the theory presented in CSP: when the call from @select@ returns it may provide more than one ready file descriptor. | 
|---|
|  | 111 | As such, @select@ has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics. | 
|---|
|  | 112 | It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors. | 
|---|
|  | 113 | @select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst-case is the largest numbered file descriptor. | 
|---|
| [c03c1ac] | 114 | @poll@ reduces the size of the interest sets changing from a bit mask to a linked data structure, independent of the file-descriptor values. | 
|---|
| [847ab8f] | 115 | @epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call. | 
|---|
|  | 116 |  | 
|---|
|  | 117 | These early \gls{synch_multiplex} tools interact directly with the operating system and others are used to communicate among processes. | 
|---|
|  | 118 | Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads. | 
|---|
|  | 119 | An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}. | 
|---|
| [c03c1ac] | 120 | This @select@ allows a task object, with their own threads, to multiplex over a subset of asynchronous calls to its methods. | 
|---|
|  | 121 | The Ada @select@ has the same exclusive-or semantics and guards as Occam ALT; | 
|---|
| [847ab8f] | 122 | however, it multiplexes over methods rather than channels. | 
|---|
|  | 123 |  | 
|---|
|  | 124 | \begin{figure} | 
|---|
| [afb3d68] | 125 | \begin{lstlisting}[language=ada,literate=,{moredelim={**[is][\color{red}]{@}{@}}}] | 
|---|
|  | 126 | task type buffer is -- thread type | 
|---|
| [847ab8f] | 127 | ... -- buffer declarations | 
|---|
|  | 128 | count : integer := 0; | 
|---|
|  | 129 | begin -- thread starts here | 
|---|
|  | 130 | loop | 
|---|
|  | 131 | select | 
|---|
| [afb3d68] | 132 | @when count < Size@ => -- guard | 
|---|
|  | 133 | @accept insert( elem : in ElemType )@ do  -- method | 
|---|
| [847ab8f] | 134 | ... -- add to buffer | 
|---|
|  | 135 | count := count + 1; | 
|---|
|  | 136 | end; | 
|---|
|  | 137 | -- executed if this accept called | 
|---|
|  | 138 | or | 
|---|
| [afb3d68] | 139 | @when count > 0@ => -- guard | 
|---|
|  | 140 | @accept remove( elem : out ElemType )@ do  -- method | 
|---|
| [847ab8f] | 141 | ... --remove and return from buffer via parameter | 
|---|
|  | 142 | count := count - 1; | 
|---|
|  | 143 | end; | 
|---|
|  | 144 | -- executed if this accept called | 
|---|
| [afb3d68] | 145 | or @delay 10.0@;  -- unblock after 10 seconds without call | 
|---|
|  | 146 | or @else@ -- do not block, cannot appear with delay | 
|---|
| [847ab8f] | 147 | end select; | 
|---|
|  | 148 | end loop; | 
|---|
|  | 149 | end buffer; | 
|---|
|  | 150 | var buf : buffer; -- create task object and start thread in task body | 
|---|
|  | 151 | \end{lstlisting} | 
|---|
|  | 152 | \caption{Ada Bounded Buffer} | 
|---|
|  | 153 | \label{f:BB_Ada} | 
|---|
|  | 154 | \end{figure} | 
|---|
|  | 155 |  | 
|---|
| [c03c1ac] | 156 | Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task. | 
|---|
| [aae9c17] | 157 | Note that a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine. | 
|---|
| [847ab8f] | 158 | The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@. | 
|---|
|  | 159 | Then the appropriate \lstinline[language=ada]{accept} method is run with the called arguments. | 
|---|
| [c03c1ac] | 160 | Hence, the \lstinline[language=ada]{select} statement provides rendezvous points for threads, rather than providing channels with message passing. | 
|---|
| [847ab8f] | 161 | The \lstinline[language=ada]{select} statement also provides a timeout and @else@ (nonblocking), which changes synchronous multiplexing to asynchronous. | 
|---|
|  | 162 | Now the thread polls rather than blocks. | 
|---|
|  | 163 |  | 
|---|
|  | 164 | Another example of programming-language \gls{synch_multiplex} is Go using a @select@ statement with channels~\cite{go:selectref}. | 
|---|
| [000d68f] | 165 | Figure~\ref{l:BB_Go} shows the outline of a bounded buffer administrator implemented with a Go routine. | 
|---|
| [847ab8f] | 166 | Here two channels are used for inserting and removing by client producers and consumers, respectively. | 
|---|
| [000d68f] | 167 | (The @done@ channel is used to synchronize with the program main.) | 
|---|
| [847ab8f] | 168 | Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam and associated code blocks for each clause like ALT and Ada. | 
|---|
|  | 169 | However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}. | 
|---|
| [9509d67a] | 170 | As such, the exponential blowup can be seen comparing Go and \uC in Figure~\ref{f:AdaMultiplexing}. | 
|---|
| [847ab8f] | 171 | Go also provides a timeout via a channel and a @default@ clause like Ada @else@ for asynchronous multiplexing. | 
|---|
|  | 172 |  | 
|---|
|  | 173 | \begin{figure} | 
|---|
|  | 174 | \centering | 
|---|
|  | 175 |  | 
|---|
|  | 176 | \begin{lrbox}{\myboxA} | 
|---|
| [d5f5eb7] | 177 | \begin{lstlisting}[language=go,literate=,{moredelim={**[is][\color{red}]{@}{@}}}] | 
|---|
| [000d68f] | 178 | insert := make( chan int ) | 
|---|
|  | 179 | remove := make( chan * int ) | 
|---|
|  | 180 | buffer := make( chan int Size ) | 
|---|
|  | 181 | done := make( chan int ) | 
|---|
|  | 182 | count := 0 | 
|---|
|  | 183 | func in_buf( int val ) { | 
|---|
| [d5f5eb7] | 184 | buffer <- val | 
|---|
|  | 185 | count++ | 
|---|
| [000d68f] | 186 | } | 
|---|
|  | 187 | func out_buf( int * ptr ) { | 
|---|
| [d5f5eb7] | 188 | *ptr := <-buffer | 
|---|
|  | 189 | count-- | 
|---|
| [000d68f] | 190 | } | 
|---|
|  | 191 | func BoundedBuffer { | 
|---|
| [d5f5eb7] | 192 | L: for { | 
|---|
|  | 193 | if ( count < Size && count > 0 ) { | 
|---|
|  | 194 | select { // wait for message | 
|---|
|  | 195 | @case i := <- insert: in_buf( i )@ | 
|---|
|  | 196 | @case p := <- remove: out_buf( p )@ | 
|---|
|  | 197 | case <- done: break L | 
|---|
|  | 198 | } | 
|---|
|  | 199 | } else if ( count < Size ) { | 
|---|
|  | 200 | select { // wait for message | 
|---|
|  | 201 | @case i := <- insert: in_buf( i )@ | 
|---|
|  | 202 | case <- done: break L | 
|---|
|  | 203 | } | 
|---|
|  | 204 | } else ( count > 0 ) { | 
|---|
|  | 205 | select { // wait for message | 
|---|
|  | 206 | @case p := <- remove: out_buf( p )@ | 
|---|
|  | 207 | case <- done: break L; | 
|---|
|  | 208 | } | 
|---|
|  | 209 | } | 
|---|
|  | 210 | } | 
|---|
|  | 211 | done <- 0 | 
|---|
| [000d68f] | 212 | } | 
|---|
| [847ab8f] | 213 | func main() { | 
|---|
| [000d68f] | 214 | go BoundedBuffer() // start administrator | 
|---|
| [847ab8f] | 215 | } | 
|---|
|  | 216 | \end{lstlisting} | 
|---|
|  | 217 | \end{lrbox} | 
|---|
|  | 218 |  | 
|---|
|  | 219 | \begin{lrbox}{\myboxB} | 
|---|
| [afb3d68] | 220 | \begin{lstlisting}[language=uC++,{moredelim={**[is][\color{red}]{@}{@}}}] | 
|---|
| [d5f5eb7] | 221 |  | 
|---|
|  | 222 |  | 
|---|
|  | 223 |  | 
|---|
|  | 224 |  | 
|---|
|  | 225 |  | 
|---|
|  | 226 |  | 
|---|
|  | 227 |  | 
|---|
|  | 228 |  | 
|---|
|  | 229 |  | 
|---|
|  | 230 |  | 
|---|
|  | 231 |  | 
|---|
|  | 232 |  | 
|---|
| [847ab8f] | 233 | _Task BoundedBuffer { | 
|---|
| [000d68f] | 234 | int * buffer; | 
|---|
|  | 235 | int front = back = count = 0; | 
|---|
| [847ab8f] | 236 | public: | 
|---|
| [d5f5eb7] | 237 | // ... constructor implementation | 
|---|
| [847ab8f] | 238 | void insert( int elem ) { | 
|---|
| [000d68f] | 239 | buffer[front] = elem; | 
|---|
| [d5f5eb7] | 240 | front = ( front + 1 ) % Size; | 
|---|
| [847ab8f] | 241 | count += 1; | 
|---|
|  | 242 | } | 
|---|
|  | 243 | int remove() { | 
|---|
| [000d68f] | 244 | int ret = buffer[back]; | 
|---|
| [d5f5eb7] | 245 | back = ( back + 1 ) % Size; | 
|---|
| [847ab8f] | 246 | count -= 1; | 
|---|
| [d5f5eb7] | 247 | return ret; | 
|---|
| [847ab8f] | 248 | } | 
|---|
|  | 249 | private: | 
|---|
|  | 250 | void main() { | 
|---|
|  | 251 | for ( ;; ) { | 
|---|
|  | 252 | _Accept( ~buffer ) break; | 
|---|
| [afb3d68] | 253 | @or _When ( count < Size ) _Accept( insert )@; | 
|---|
|  | 254 | @or _When ( count > 0 ) _Accept( remove )@; | 
|---|
| [847ab8f] | 255 | } | 
|---|
|  | 256 | } | 
|---|
|  | 257 | }; | 
|---|
|  | 258 | buffer buf; // start thread in main method | 
|---|
|  | 259 | \end{lstlisting} | 
|---|
|  | 260 | \end{lrbox} | 
|---|
|  | 261 |  | 
|---|
|  | 262 | \subfloat[Go]{\label{l:BB_Go}\usebox\myboxA} | 
|---|
|  | 263 | \hspace*{5pt} | 
|---|
|  | 264 | \vrule | 
|---|
|  | 265 | \hspace*{5pt} | 
|---|
|  | 266 | \subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB} | 
|---|
|  | 267 |  | 
|---|
|  | 268 | \caption{Bounded Buffer} | 
|---|
|  | 269 | \label{f:AdaMultiplexing} | 
|---|
|  | 270 | \end{figure} | 
|---|
|  | 271 |  | 
|---|
|  | 272 | Finally, \uC provides \gls{synch_multiplex} with Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}. | 
|---|
|  | 273 | The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement. | 
|---|
|  | 274 | Both @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as, timeout, and @else@ clauses. | 
|---|
| [000d68f] | 275 | Figure~\ref{l:BB_uC++} shows the outline of a bounded buffer administrator implemented with \uC @_Accept@ statements. | 
|---|
| [847ab8f] | 276 |  | 
|---|
| [afb3d68] | 277 | There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, Java's @allof@/@anyof@ over futures~\cite{java:allof:anyof}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}. | 
|---|
| [847ab8f] | 278 | Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources. | 
|---|
| [760c88c] | 279 |  | 
|---|
| [a0c746df] | 280 | \section{Other Approaches to Synchronous Multiplexing} | 
|---|
| [847ab8f] | 281 |  | 
|---|
|  | 282 | To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source. | 
|---|
|  | 283 | For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources. | 
|---|
|  | 284 | Similar, actor systems circumvent the \gls{synch_multiplex} problem as actors only block when waiting for the next message never in a behaviour. | 
|---|
| [a0c746df] | 285 | While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues. | 
|---|
| [afb3d68] | 286 | Consider the case where a thread has a single source of communication and it wants a set of $N$ resources. | 
|---|
|  | 287 | It must sequentially request the $N$ resources and wait for each response. | 
|---|
|  | 288 | During the receives for the $N$ resources, it can receive other communication, and has to save and postpone these communications, or discard them. | 
|---|
| [847ab8f] | 289 | % If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered. | 
|---|
| [a0c746df] | 290 |  | 
|---|
|  | 291 | \section{\CFA's Waituntil Statement} | 
|---|
| [847ab8f] | 292 |  | 
|---|
| [760c88c] | 293 | The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement. | 
|---|
| [c03c1ac] | 294 | There already exists a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods~\cite{Delisle21}, so this @waituntil@ focuses on synchronizing over other resources. | 
|---|
| [847ab8f] | 295 | All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @select@ supports futures, and Ada's @select@ supports monitor method calls. | 
|---|
|  | 296 | The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}. | 
|---|
|  | 297 | No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@. | 
|---|
| [760c88c] | 298 |  | 
|---|
|  | 299 | \begin{figure} | 
|---|
|  | 300 | \begin{cfa} | 
|---|
|  | 301 | forall(T & | sized(T)) | 
|---|
|  | 302 | trait is_selectable { | 
|---|
| [847ab8f] | 303 | // For registering a waituntil stmt on a selectable type | 
|---|
|  | 304 | bool register_select( T &, select_node & ); | 
|---|
| [760c88c] | 305 |  | 
|---|
| [847ab8f] | 306 | // For unregistering a waituntil stmt from a selectable type | 
|---|
|  | 307 | bool unregister_select( T &, select_node & ); | 
|---|
| [760c88c] | 308 |  | 
|---|
| [847ab8f] | 309 | // on_selected is run on the selecting thread prior to executing | 
|---|
|  | 310 | // the statement associated with the select_node | 
|---|
|  | 311 | bool on_selected( T &, select_node & ); | 
|---|
| [760c88c] | 312 | }; | 
|---|
|  | 313 | \end{cfa} | 
|---|
| [847ab8f] | 314 | \caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.} | 
|---|
| [760c88c] | 315 | \label{f:wu_trait} | 
|---|
|  | 316 | \end{figure} | 
|---|
|  | 317 |  | 
|---|
| [c03c1ac] | 318 | Currently locks, channels, futures and timeouts are supported by the @waituntil@ statement, and this set can be expanded through the @is_selectable@ trait as other use-cases arise. | 
|---|
|  | 319 | The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing. | 
|---|
| [847ab8f] | 320 | Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second. | 
|---|
| [aae9c17] | 321 | Note that the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm. | 
|---|
| [760c88c] | 322 |  | 
|---|
|  | 323 | \section{Waituntil Semantics} | 
|---|
|  | 324 |  | 
|---|
| [c03c1ac] | 325 | The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like locks, channels, and futures. | 
|---|
| [847ab8f] | 326 |  | 
|---|
|  | 327 | \subsection{Statement Semantics} | 
|---|
|  | 328 |  | 
|---|
|  | 329 | The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam. | 
|---|
|  | 330 | The clauses have an exclusive-or relationship where the first available one is run and only one clause is run. | 
|---|
|  | 331 | \CFA's @or@ semantics differ from ALT semantics: instead of randomly picking a clause when multiple are available, the first clause in the @waituntil@ that is available is executed. | 
|---|
|  | 332 | For example, in the following example, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses. | 
|---|
| [760c88c] | 333 | \begin{cfa} | 
|---|
| [847ab8f] | 334 | future(int) bar, foo; | 
|---|
| [afb3d68] | 335 | waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo | 
|---|
| [c03c1ac] | 336 | \end{cfa} | 
|---|
| [2cb15b0] | 337 | The reason for these semantics is that prioritizing resources can be useful in certain problems, such as shutdown. | 
|---|
| [afb3d68] | 338 | In the rare case where there is a starvation problem with the ordering, it possible to follow a @waituntil@ with its reverse form, alternating which resource has the highest priority: | 
|---|
| [c03c1ac] | 339 | \begin{cfa} | 
|---|
|  | 340 | waituntil( foo ) { ... } or waituntil( bar ) { ... } // prioritize foo | 
|---|
|  | 341 | waituntil( bar ) { ... } or waituntil( foo ) { ... } // prioritize bar | 
|---|
| [760c88c] | 342 | \end{cfa} | 
|---|
| [afb3d68] | 343 | While this approach is not general for many resources, it handles many basic cases. | 
|---|
| [760c88c] | 344 |  | 
|---|
| [2cb15b0] | 345 | \begin{figure} | 
|---|
|  | 346 | \begin{cfa} | 
|---|
|  | 347 | future(int) Future; | 
|---|
|  | 348 | channel(int) Channel; | 
|---|
|  | 349 | owner_lock Lock; | 
|---|
|  | 350 | int i = 0; | 
|---|
|  | 351 |  | 
|---|
|  | 352 | waituntil( @Lock@ ) { ... } | 
|---|
|  | 353 | or when( i == 0 ) waituntil( i << @Channel@ ) { ... } | 
|---|
|  | 354 | and waituntil( @Future@ ) { ... } | 
|---|
|  | 355 | or waituntil( @timeout( 1`s )@ ) { ... } | 
|---|
|  | 356 | // else { ... } | 
|---|
|  | 357 | \end{cfa} | 
|---|
|  | 358 | \caption{Example of \CFA's waituntil statement} | 
|---|
|  | 359 | \label{f:wu_example} | 
|---|
|  | 360 | \end{figure} | 
|---|
|  | 361 |  | 
|---|
| [847ab8f] | 362 | The \CFA @and@ semantics match the @and@ semantics of \uC \lstinline[language=uC++]{_Select}. | 
|---|
|  | 363 | When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}. | 
|---|
| [afb3d68] | 364 | When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code-block, and then the thread waits again for the next available clause or the @waituntil@ statement is now satisfied. | 
|---|
| [847ab8f] | 365 | This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator. | 
|---|
| [c03c1ac] | 366 | If the @and@ operator waited for all clauses to be available before running, it is the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements. | 
|---|
| [847ab8f] | 367 |  | 
|---|
| [aae9c17] | 368 | As with normal C expressions, the @and@ operator binds more tightly than the @or@. | 
|---|
| [847ab8f] | 369 | To give an @or@ operator higher precedence, parenthesis are used. | 
|---|
|  | 370 | For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parenthesis. | 
|---|
| [760c88c] | 371 | \begin{cfa} | 
|---|
| [847ab8f] | 372 | @(@ waituntil( A ) { ... }              // bind tightly to or | 
|---|
|  | 373 | or waituntil( B ) { ... } @)@ | 
|---|
| [760c88c] | 374 | and waituntil( C ) { ... } | 
|---|
|  | 375 | \end{cfa} | 
|---|
|  | 376 |  | 
|---|
| [847ab8f] | 377 | The guards in the @waituntil@ statement are called @when@ clauses. | 
|---|
|  | 378 | Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run. | 
|---|
|  | 379 | Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@. | 
|---|
|  | 380 | In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}. | 
|---|
|  | 381 | When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operation from the statement. | 
|---|
|  | 382 | For example, in the following, the two @waituntil@ statements are semantically equivalent. | 
|---|
|  | 383 |  | 
|---|
|  | 384 | \begin{lrbox}{\myboxA} | 
|---|
| [760c88c] | 385 | \begin{cfa} | 
|---|
| [847ab8f] | 386 | when( true ) waituntil( A ) { ... } | 
|---|
|  | 387 | or when( false ) waituntil( B ) { ... } | 
|---|
| [760c88c] | 388 | and waituntil( C ) { ... } | 
|---|
| [847ab8f] | 389 | \end{cfa} | 
|---|
|  | 390 | \end{lrbox} | 
|---|
|  | 391 |  | 
|---|
|  | 392 | \begin{lrbox}{\myboxB} | 
|---|
|  | 393 | \begin{cfa} | 
|---|
| [760c88c] | 394 | waituntil( A ) { ... } | 
|---|
|  | 395 | and waituntil( C ) { ... } | 
|---|
| [847ab8f] | 396 |  | 
|---|
| [760c88c] | 397 | \end{cfa} | 
|---|
| [847ab8f] | 398 | \end{lrbox} | 
|---|
|  | 399 |  | 
|---|
|  | 400 | \begin{tabular}{@{}lcl@{}} | 
|---|
|  | 401 | \usebox\myboxA & $\equiv$ & \usebox\myboxB | 
|---|
|  | 402 | \end{tabular} | 
|---|
|  | 403 |  | 
|---|
|  | 404 | The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada. | 
|---|
|  | 405 | If all resources are not immediately available and there is an @else@ clause, the @else@ clause is run and the thread continues. | 
|---|
| [760c88c] | 406 |  | 
|---|
| [847ab8f] | 407 | \subsection{Type Semantics} | 
|---|
| [760c88c] | 408 |  | 
|---|
| [847ab8f] | 409 | As mentioned, to support interaction with the @waituntil@ statement a type must support the trait in Figure~\ref{f:wu_trait}. | 
|---|
|  | 410 | The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively. | 
|---|
| [494a7e5] | 411 | When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run. | 
|---|
| [847ab8f] | 412 | Many types do not need @on_selected@, but it is provided if a type needs to perform work or checks before the resource can be accessed in the code block. | 
|---|
| [494a7e5] | 413 | The register/unregister routines in the trait also return booleans. | 
|---|
| [847ab8f] | 414 | The return value of @register_select@ is @true@, if the resource is immediately available and @false@ otherwise. | 
|---|
|  | 415 | The return value of @unregister_select@ is @true@, if the corresponding code block should be run after unregistration and @false@ otherwise. | 
|---|
|  | 416 | The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource. | 
|---|
|  | 417 | More detail on channels and their interaction with @waituntil@ appear in Section~\ref{s:wu_chans}. | 
|---|
| [760c88c] | 418 |  | 
|---|
| [f496046] | 419 | The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument. | 
|---|
|  | 420 | When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait. | 
|---|
| [c03c1ac] | 421 | This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context. | 
|---|
| [f496046] | 422 | Indirect support through routines is needed for types that want to support multiple operations such as channels that allow both reading and writing. | 
|---|
| [c03c1ac] | 423 |  | 
|---|
| [847ab8f] | 424 | \section{\lstinline{waituntil} Implementation} | 
|---|
| [c03c1ac] | 425 |  | 
|---|
| [afb3d68] | 426 | The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} shows the basic outline of the @waituntil@ algorithm. | 
|---|
|  | 427 | Complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives. | 
|---|
|  | 428 | The following sections use examples to fill in complexity details missing in Figure~\ref{f:WU_Impl}. | 
|---|
|  | 429 | After which, the full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}. | 
|---|
| [760c88c] | 430 |  | 
|---|
| [847ab8f] | 431 | \begin{figure} | 
|---|
|  | 432 | \begin{cfa} | 
|---|
| [c03c1ac] | 433 | select_nodes s[N];                                                              $\C[3.25in]{// declare N select nodes}$ | 
|---|
|  | 434 | for ( node in s )                                                               $\C{// register nodes}$ | 
|---|
| [847ab8f] | 435 | register_select( resource, node ); | 
|---|
|  | 436 | while ( statement predicate not satisfied ) {   $\C{// check predicate}$ | 
|---|
| [c03c1ac] | 437 | // block until clause(s) satisfied | 
|---|
| [7ed01be] | 438 | for ( resource in waituntil statement ) {       $\C{// run true code blocks}$ | 
|---|
| [847ab8f] | 439 | if ( resource is avail ) run code block | 
|---|
| [c03c1ac] | 440 | if ( statement predicate is satisfied ) break; | 
|---|
|  | 441 | } | 
|---|
| [847ab8f] | 442 | } | 
|---|
|  | 443 | for ( node in s )                                                               $\C{// deregister nodes}\CRT$ | 
|---|
| [7ed01be] | 444 | if ( unregister_select( resource, node ) ) run code block | 
|---|
| [847ab8f] | 445 | \end{cfa} | 
|---|
|  | 446 | \caption{\lstinline{waituntil} Implementation} | 
|---|
|  | 447 | \label{f:WU_Impl} | 
|---|
|  | 448 | \end{figure} | 
|---|
| [5e81a9c] | 449 |  | 
|---|
| [c03c1ac] | 450 | The basic steps of the algorithm are: | 
|---|
| [847ab8f] | 451 | \begin{enumerate} | 
|---|
| [5e81a9c] | 452 | \item | 
|---|
| [847ab8f] | 453 | The @waituntil@ statement declares $N$ @select_node@s, one per resource that is being waited on, which stores any @waituntil@ data pertaining to that resource. | 
|---|
| [5e81a9c] | 454 |  | 
|---|
|  | 455 | \item | 
|---|
| [847ab8f] | 456 | Each @select_node@ is then registered with the corresponding resource. | 
|---|
| [5e81a9c] | 457 |  | 
|---|
|  | 458 | \item | 
|---|
| [847ab8f] | 459 | The thread executing the @waituntil@ then loops until the statement's predicate is satisfied. | 
|---|
| [c03c1ac] | 460 | In each iteration, if the predicate is unsatisfied, the @waituntil@ thread blocks. | 
|---|
|  | 461 | When another thread satisfies a resource clause (\eg sends to a  channel), it unblocks the @waituntil@ thread. | 
|---|
|  | 462 | This thread checks all clauses for completion, and any completed clauses have their code blocks run. | 
|---|
| [7ed01be] | 463 | While checking clause completion, if enough clauses have been run such that the statement predicate is satisfied, the loop exits early. | 
|---|
| [760c88c] | 464 |  | 
|---|
| [5e81a9c] | 465 | \item | 
|---|
|  | 466 | Once the thread escapes the loop, the @select_nodes@ are unregistered from the resources. | 
|---|
|  | 467 | \end{enumerate} | 
|---|
|  | 468 | These steps give a basic overview of how the statement works. | 
|---|
| [c03c1ac] | 469 | The following sections shed light on the specific changes and provide more implementation detail. | 
|---|
| [760c88c] | 470 |  | 
|---|
| [494a7e5] | 471 | \subsection{Locks}\label{s:wu_locks} | 
|---|
| [847ab8f] | 472 |  | 
|---|
|  | 473 | The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc. | 
|---|
|  | 474 | Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement. | 
|---|
|  | 475 | For example, the following waits until the thread has acquired lock @l1@ or locks @l2@ and @l3@. | 
|---|
|  | 476 | \begin{cfa} | 
|---|
|  | 477 | owner_lock l1, l2, l3; | 
|---|
|  | 478 | waituntil ( l1 ) { ... } | 
|---|
|  | 479 | or waituntil( l2 ) { ... } | 
|---|
|  | 480 | and waituntil( l3 ) { ... } | 
|---|
|  | 481 | \end{cfa} | 
|---|
|  | 482 | Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in the lock's queue of waiting threads. | 
|---|
| [c03c1ac] | 483 | When the lock schedules this thread, it unblocks and runs the code block associated with the lock and then releases the lock. | 
|---|
| [847ab8f] | 484 |  | 
|---|
|  | 485 | In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each of the lock's waiting queues. | 
|---|
|  | 486 | When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked. | 
|---|
| [c03c1ac] | 487 | Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released. | 
|---|
| [afb3d68] | 488 | Immediately releasing the lock after the code block prevents the waiting thread from holding multiple locks and potentially introducing a deadlock. | 
|---|
| [c03c1ac] | 489 | As such, the only unregistered nodes associated with locks are the ones that have not run. | 
|---|
| [760c88c] | 490 |  | 
|---|
|  | 491 | \subsection{Timeouts} | 
|---|
| [847ab8f] | 492 |  | 
|---|
| [afb3d68] | 493 | A timeout for the @waituntil@ statement is a duration passed to routine \lstinline[deletekeywords={timeout}]{timeout}\footnote{\lstinline{timeout} is a quasi-keyword in \CFA, allowing it to be used an identifier.}, \eg: | 
|---|
| [c03c1ac] | 494 | \begin{cquote} | 
|---|
|  | 495 | \begin{tabular}{@{}l|l@{}} | 
|---|
|  | 496 | \multicolumn{2}{@{}l@{}}{\lstinline{Duration D1\{ 1`ms \}, D2\{ 2`ms \}, D3\{ 3`ms \};}} \\ | 
|---|
|  | 497 | \begin{cfa}[deletekeywords={timeout}] | 
|---|
|  | 498 | waituntil( i << C1 ) {} | 
|---|
|  | 499 | or waituntil( i << C2 ) {} | 
|---|
|  | 500 | or waituntil( i << C3 ) {} | 
|---|
|  | 501 | or waituntil( timeout( D1 ) ) {} | 
|---|
|  | 502 | or waituntil( timeout( D2 ) ) {} | 
|---|
|  | 503 | or waituntil( timeout( D3 ) ) {} | 
|---|
|  | 504 | \end{cfa} | 
|---|
|  | 505 | & | 
|---|
|  | 506 | \begin{cfa}[deletekeywords={timeout}] | 
|---|
|  | 507 | waituntil( i << C1 ) {} | 
|---|
|  | 508 | or waituntil( i << C2 ) {} | 
|---|
|  | 509 | or waituntil( i << C3 ) {} | 
|---|
| [a59e338] | 510 | or waituntil( timeout( min( D1, D2, D3 ) ) ) {} | 
|---|
| [c03c1ac] | 511 |  | 
|---|
| [760c88c] | 512 |  | 
|---|
|  | 513 | \end{cfa} | 
|---|
| [c03c1ac] | 514 | \end{tabular} | 
|---|
|  | 515 | \end{cquote} | 
|---|
| [a59e338] | 516 | These two examples are basically equivalent. | 
|---|
| [c03c1ac] | 517 | Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers. | 
|---|
|  | 518 | Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding. | 
|---|
| [afb3d68] | 519 | In following example, either channel @C1@ or @C2@ must be satisfied but nothing can be done for at least 1 or 3 seconds after the channel read, respectively. | 
|---|
| [c03c1ac] | 520 | \begin{cfa}[deletekeywords={timeout}] | 
|---|
| [9509d67a] | 521 | waituntil( i << C1 ){} and waituntil( timeout( 1`s ) ){} | 
|---|
|  | 522 | or waituntil( i << C2 ){} and waituntil( timeout( 3`s ) ){} | 
|---|
| [c03c1ac] | 523 | \end{cfa} | 
|---|
| [afb3d68] | 524 | If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds. | 
|---|
| [aae9c17] | 525 | Note that the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics. | 
|---|
| [760c88c] | 526 |  | 
|---|
| [c03c1ac] | 527 | The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of time elapsed since the call started. | 
|---|
|  | 528 | Instead, \lstinline[deletekeywords={timeout}]{timeout} returns a type that supports the @is_selectable@ trait, allowing the type system to select the correct overloaded routine for this context. | 
|---|
|  | 529 | For the @waituntil@, it is more idiomatic for the \lstinline[deletekeywords={timeout}]{timeout} to use the same syntax as other blocking operations instead of having a special language clause. | 
|---|
| [760c88c] | 530 |  | 
|---|
|  | 531 | \subsection{Channels}\label{s:wu_chans} | 
|---|
|  | 532 |  | 
|---|
| [c03c1ac] | 533 | Channels require more complexity to allow synchronous multiplexing. | 
|---|
| [afb3d68] | 534 | For locks, when an outside thread releases a lock and unblocks the @waituntil@ thread (WUT), the lock's MX property is passed to the WUT (no spinning locks). | 
|---|
| [c03c1ac] | 535 | For futures, the outside thread deliveries a value to the future and unblocks any waiting threads, including WUTs. | 
|---|
| [afb3d68] | 536 | In either case, after the WUT unblocks, it is safe to execute its corresponding code block knowing access to the resource is protected by the lock or the read-only state of the future. | 
|---|
| [c03c1ac] | 537 | Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT. | 
|---|
| [47b7142] | 538 | However, for channels, there is a race condition that poses an issue. | 
|---|
|  | 539 | If the outside thread inserts into the channel and unblocks the WUT, there is a race where another thread can remove the channel data, so after the WUT unblocks and attempts to remove from the buffer, it fails, and the WUT must reblock (busy waiting). | 
|---|
| [c03c1ac] | 540 | This scenario is a \gls{toctou} race that needs to be consolidated. | 
|---|
|  | 541 | To close the race, the outside thread must detect this case and insert directly into the left-hand side of the channel expression (@i << chan@) rather than into the channel, and then unblock the WUT. | 
|---|
|  | 542 | Now the unblocked WUT is guaranteed to have a satisfied resource and its code block can safely executed. | 
|---|
|  | 543 | The insertion circumvents the channel buffer via the wait-morphing in the \CFA channel implementation \see{Section~\ref{s:chan_impl}}, allowing @waituntil@ channel unblocking to not be special-cased. | 
|---|
| [9509d67a] | 544 | Note that all channel operations are fair and no preference is given between @waituntil@ and direct channel operations when unblocking. | 
|---|
| [c03c1ac] | 545 |  | 
|---|
|  | 546 | Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg: | 
|---|
| [760c88c] | 547 | \begin{cfa} | 
|---|
| [afb3d68] | 548 | int i; | 
|---|
| [c03c1ac] | 549 | waituntil( i << A ) {} and waituntil( i << B ) {} | 
|---|
|  | 550 | or waituntil( i << C ) {} and waituntil( i << D ) {} | 
|---|
| [760c88c] | 551 | \end{cfa} | 
|---|
| [c03c1ac] | 552 | If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or the code blocks for @C@ and @D@. | 
|---|
| [afb3d68] | 553 | However, four outside threads inserting into each channel can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks. | 
|---|
| [c03c1ac] | 554 | This case introduces a race with complexity that increases with the size of the @waituntil@ statement. | 
|---|
|  | 555 | However, due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses. | 
|---|
|  | 556 | This approach is a poor solution for two reasons. | 
|---|
| [afb3d68] | 557 | It is possible that once all the locks are acquired, the subtree is not satisfied and the locks must be released. | 
|---|
| [c03c1ac] | 558 | This work incurs a high cost for signalling threads and heavily increase contention on internal channel locks. | 
|---|
| [5e81a9c] | 559 | Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible. | 
|---|
| [f496046] | 560 | As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since it cannot be supported without significant complexity and significantly affects @waituntil@ performance. | 
|---|
| [afb3d68] | 561 | Therefore, the example of reusing variable @i@ by multiple output channels is considered a user error without exclusive-or semantics. | 
|---|
|  | 562 | Given aliasing in C, it is impossible to even warn of the potential race. | 
|---|
|  | 563 | In the future, it would be interesting to support Go-like syntax, \lstinline[language=Go]{case i := <- ...}, defining a new scoped @i@ variable for each clause. | 
|---|
| [c03c1ac] | 564 |  | 
|---|
|  | 565 | It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel. | 
|---|
| [d4c1d1f] | 566 | Consider the following example where thread 1 is reading and threads 2 and 3 are writing to channels @A@ and @B@ concurrently. | 
|---|
|  | 567 | \begin{cquote} | 
|---|
|  | 568 | \begin{tabular}{@{}l|l|l@{}} | 
|---|
| [afb3d68] | 569 | \multicolumn{3}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\ | 
|---|
| [d4c1d1f] | 570 | thread 1 & thread 2 & thread 3 \\ | 
|---|
| [47b7142] | 571 | \begin{cfa} | 
|---|
|  | 572 | waituntil( i << A ) {} | 
|---|
|  | 573 | or waituntil( i << B ) {} | 
|---|
|  | 574 | \end{cfa} | 
|---|
| [d4c1d1f] | 575 | & | 
|---|
|  | 576 | \begin{cfa} | 
|---|
|  | 577 | A << 1; | 
|---|
| [c03c1ac] | 578 |  | 
|---|
| [d4c1d1f] | 579 | \end{cfa} | 
|---|
|  | 580 | & | 
|---|
|  | 581 | \begin{cfa} | 
|---|
|  | 582 | B << 2; | 
|---|
| [760c88c] | 583 |  | 
|---|
| [d4c1d1f] | 584 | \end{cfa} | 
|---|
|  | 585 | \end{tabular} | 
|---|
|  | 586 | \end{cquote} | 
|---|
|  | 587 | For thread 1 to have exclusive-or semantics, it must only consume from exactly one of @A@ or @B@. | 
|---|
|  | 588 | As such, thread 2 and 3 must race to establish the winning clause of the @waituntil@ in thread 1. | 
|---|
|  | 589 | This race is consolidated by thread 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}. | 
|---|
|  | 590 | The winner bypasses the channel and inserts into the WUT's left-hand, and signals thread 1. | 
|---|
|  | 591 | The loser continues checking if there is space in the channel, and if so performs the channel insert operation with a possible signal of a waiting remove thread; | 
|---|
|  | 592 | otherwise, if there is no space, the loser blocks. | 
|---|
|  | 593 | It is important the race occurs \emph{before} operating on the channel, because channel actions are different with respect to each thread. | 
|---|
|  | 594 | If the race was consolidated after the operation, both thread 2 and 3 could potentially write into @i@ concurrently. | 
|---|
|  | 595 |  | 
|---|
|  | 596 | Channels introduce another interesting implementation issue. | 
|---|
| [47b7142] | 597 | Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause. | 
|---|
| [afb3d68] | 598 | This conjunction poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel. | 
|---|
| [47b7142] | 599 | Consider the following example, alongside a described ordering of events to highlight the race. | 
|---|
| [d4c1d1f] | 600 | \begin{cquote} | 
|---|
|  | 601 | \begin{tabular}{@{}l|l@{}} | 
|---|
| [afb3d68] | 602 | \multicolumn{2}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\ | 
|---|
| [d4c1d1f] | 603 | thread 1 & thread 2 \\ | 
|---|
| [1ae3f185] | 604 | \begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}] | 
|---|
| [d4c1d1f] | 605 | waituntil( @i << A@ ) {} | 
|---|
| [1ae3f185] | 606 | or waituntil( #i << B# ) {} | 
|---|
| [c03c1ac] | 607 | \end{cfa} | 
|---|
| [d4c1d1f] | 608 | & | 
|---|
| [1ae3f185] | 609 | \begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}] | 
|---|
|  | 610 | waituntil( #B << 2# ) {} | 
|---|
| [d4c1d1f] | 611 | or waituntil( @A << 1@ ) {} | 
|---|
|  | 612 | \end{cfa} | 
|---|
|  | 613 | \end{tabular} | 
|---|
|  | 614 | \end{cquote} | 
|---|
| [afb3d68] | 615 | Assume thread 1 executes first, registers with channel @A@ and proceeds in the @waituntil@. | 
|---|
|  | 616 | Since @A@ is empty, thread 1 cannot remove, and then thread 1 is interrupted before registering with @B@. | 
|---|
|  | 617 | Thread 2 similarly registers with channel @B@, and proceeds in the @waituntil@. | 
|---|
|  | 618 | Since @B@ is zero size there is no space to insert, and then thread 2 is interrupted before registering with @A@. | 
|---|
| [d4c1d1f] | 619 | At this point, thread 1 and 2 resume execution. | 
|---|
| [afb3d68] | 620 | Remember from above, each exclusive-or @waituntil@ holds a race to set the winning clause of the statement. | 
|---|
|  | 621 | The issue that arises is that these two @waituntil@ statements must have matching winning clauses (both @A@ clauses or both @B@ clauses) to preserve the exclusive-or semantics, since a zero-sized channel needs an insert/remove pair for an operation to occur. | 
|---|
|  | 622 | If threads 1 and 2 race to set a winner only in their own @waituntil@, thread 1 can think it successfully removed from @B@, and thread 2 can think it successfully inserted into @A@, which is an error. | 
|---|
|  | 623 | Hence, there is a race on two fronts. | 
|---|
|  | 624 | If thread 1 wins the race and sees that @B@ has a waiting insertion, then thread 2 must execute the first clause of its @waituntil@ and thread 1 execute its second. | 
|---|
|  | 625 | Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 execute its second. | 
|---|
|  | 626 | Any other execution scenario is incorrect for exclusive-or semantics. | 
|---|
| [aae9c17] | 627 | Note that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread. | 
|---|
| [760c88c] | 628 |  | 
|---|
| [d4c1d1f] | 629 | The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels. | 
|---|
|  | 630 | This approach eliminates the race shown above since thread 1 and 2 cannot both be registering at the same time. | 
|---|
|  | 631 | However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic. | 
|---|
| [afb3d68] | 632 | Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead. | 
|---|
| [c03c1ac] | 633 | Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race. | 
|---|
| [47b7142] | 634 | This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to be pending. | 
|---|
|  | 635 | If it succeeds, it then attempts to set the other thread's @waituntil@ race pointer to its success value. | 
|---|
|  | 636 | If either thread successfully sets the the other thread's @waituntil@ race pointer, then the operation can proceed, if not the signalling threads set its own race pointer back to the initial value and repeats. | 
|---|
|  | 637 | This retry mechanism can potentially introduce a livelock, but in practice a livelock here is highly unlikely. | 
|---|
| [d4c1d1f] | 638 | Furthermore, the likelihood of a livelock here is zero unless the program is in the niche case of having two or more exclusive-or @waituntil@s with two or more clauses in reverse order of priority. | 
|---|
|  | 639 | This livelock case can be fully eliminated using locks like Go, or if a \gls{dcas} instruction is available. | 
|---|
|  | 640 | If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding to ensure that, in the case the WUT fails, the signal is not lost. | 
|---|
|  | 641 | This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner. | 
|---|
| [f496046] | 642 | The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}. | 
|---|
|  | 643 |  | 
|---|
|  | 644 | \begin{figure} | 
|---|
|  | 645 | \begin{cfa} | 
|---|
|  | 646 | bool pending_set_other( select_node & other, select_node & mine ) { | 
|---|
| [d5f5eb7] | 647 | unsigned long int cmp_status = UNSAT; | 
|---|
|  | 648 |  | 
|---|
|  | 649 | // Try to set other status, if we succeed break and return true | 
|---|
|  | 650 | while( ! CAS( other.clause_status, &cmp_status, SAT ) ) { | 
|---|
|  | 651 | if ( cmp_status == SAT ) | 
|---|
|  | 652 | return false; // If other status is SAT we lost so return false | 
|---|
|  | 653 |  | 
|---|
|  | 654 | // Toggle own status flag to allow other thread to potentially win | 
|---|
|  | 655 | mine.status = UNSAT; | 
|---|
|  | 656 |  | 
|---|
|  | 657 | // Reset compare flag | 
|---|
|  | 658 | cmp_status = UNSAT; | 
|---|
|  | 659 |  | 
|---|
|  | 660 | // Attempt to set own status flag back to PENDING to retry | 
|---|
|  | 661 | if ( ! CAS( mine.clause_status, &cmp_status, PENDING ) ) | 
|---|
|  | 662 | return false; // If we fail then we lost so return false | 
|---|
|  | 663 |  | 
|---|
|  | 664 | // Reset compare flag | 
|---|
|  | 665 | cmp_status = UNSAT; | 
|---|
|  | 666 | } | 
|---|
|  | 667 | return true; | 
|---|
| [f496046] | 668 | } | 
|---|
|  | 669 | \end{cfa} | 
|---|
|  | 670 | \caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol} | 
|---|
|  | 671 | \label{f:WU_DeadlockAvoidance} | 
|---|
|  | 672 | \end{figure} | 
|---|
| [760c88c] | 673 |  | 
|---|
| [d4c1d1f] | 674 | Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support. | 
|---|
| [47b7142] | 675 | These exception mechanisms are supported through the @on_selected@ routine. | 
|---|
| [d4c1d1f] | 676 | This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown. | 
|---|
| [afb3d68] | 677 | Hence, the channel close-down mechanism is handled correctly. | 
|---|
| [760c88c] | 678 |  | 
|---|
|  | 679 | \subsection{Guards and Statement Predicate}\label{s:wu_guards} | 
|---|
| [d4c1d1f] | 680 |  | 
|---|
|  | 681 | It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished. | 
|---|
| [f496046] | 682 | In \uC and \CFA, the \gls{synch_multiplex} mechanism have both an and/or relationship, which along with guards, make the problem of checking for completion of the statement difficult. | 
|---|
| [afb3d68] | 683 | Consider the following @waituntil@. | 
|---|
| [f496046] | 684 | \begin{cfa} | 
|---|
| [afb3d68] | 685 | when( GA ) waituntil( A ) {} | 
|---|
|  | 686 | and when( GB ) waituntil( B ) {} | 
|---|
|  | 687 | or when( GC ) waituntil( C ) {} | 
|---|
| [f496046] | 688 | \end{cfa} | 
|---|
| [afb3d68] | 689 | When the @waituntil@ thread wakes up, the following predicate represents satisfaction: | 
|---|
| [f496046] | 690 | \begin{cfa} | 
|---|
| [afb3d68] | 691 | A && B || C || ! GA && B || ! GB && A || ! GA && ! GB && ! GC | 
|---|
| [f496046] | 692 | \end{cfa} | 
|---|
| [afb3d68] | 693 | which can be simplified to: | 
|---|
| [f496046] | 694 | \begin{cfa} | 
|---|
| [afb3d68] | 695 | ( A || ! GA ) && ( B || ! GB ) || C || ! GA && ! GB && ! GC | 
|---|
| [f496046] | 696 | \end{cfa} | 
|---|
| [afb3d68] | 697 | Checking the complete predicate on each iteration of the pending @waituntil@ is expensive so \uC and \CFA both take steps to simplify checking for statement completion. | 
|---|
| [760c88c] | 698 |  | 
|---|
| [5e81a9c] | 699 | In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource. | 
|---|
| [afb3d68] | 700 | A diagram of the tree for the complete predicate above is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@. | 
|---|
|  | 701 | Each internal node stores the statuses of the two subtrees beneath it. | 
|---|
| [d4c1d1f] | 702 | When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement. | 
|---|
| [760c88c] | 703 | Once the root of the tree has both subtrees marked as @true@ then the statement is complete. | 
|---|
| [d4c1d1f] | 704 | As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again. | 
|---|
| [f496046] | 705 | To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation. | 
|---|
|  | 706 |  | 
|---|
|  | 707 | \begin{figure} | 
|---|
|  | 708 | \begin{center} | 
|---|
|  | 709 | \input{diagrams/uCpp_select_tree.tikz} | 
|---|
|  | 710 | \end{center} | 
|---|
| [afb3d68] | 711 | \caption{\uC \lstinline[language=uC++]{select} tree modification} | 
|---|
| [f496046] | 712 | \label{f:uC_select_tree} | 
|---|
|  | 713 | \end{figure} | 
|---|
| [760c88c] | 714 |  | 
|---|
| [afb3d68] | 715 | The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the complete predicate of a @waituntil@. | 
|---|
|  | 716 | The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise. | 
|---|
|  | 717 | This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@. | 
|---|
| [aae9c17] | 718 | Leveraging the compiler, a predicate routine is generated per @waituntil@. | 
|---|
|  | 719 | This predicate routine accepts the statuses of the resources being waited on as arguments. | 
|---|
|  | 720 | A resource status is an integer that indicates whether a resource is either not available, available, or has already run its associated code block. | 
|---|
|  | 721 | The predicate routine returns @true@ when the @waituntil@ is done, \ie enough resources have run their associated code blocks to satisfy the @waituntil@'s predicate, and false otherwise. | 
|---|
| [847ab8f] | 722 | To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate. | 
|---|
| [f496046] | 723 | The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values. | 
|---|
| [afb3d68] | 724 | For example, the following is generated for the complete predicate above: | 
|---|
| [f496046] | 725 | \begin{cfa} | 
|---|
|  | 726 | // statement completion predicate | 
|---|
|  | 727 | bool check_completion( select_node * nodes ) { | 
|---|
| [d5f5eb7] | 728 | return nodes[0].status && nodes[1].status || nodes[2].status; | 
|---|
| [f496046] | 729 | } | 
|---|
| [afb3d68] | 730 | if ( GA || GB || GC ) {                         $\C{// skip statement if all guards false}$ | 
|---|
| [d5f5eb7] | 731 | select_node nodes[3]; | 
|---|
|  | 732 | nodes[0].status = ! GA && GB;   $\C{// A's status}$ | 
|---|
|  | 733 | nodes[1].status = ! GB && GA;   $\C{// B's status}$ | 
|---|
|  | 734 | nodes[2].status = ! GC;                 $\C{// C's status}$ | 
|---|
|  | 735 | // ... rest of waituntil codegen ... | 
|---|
| [f496046] | 736 | } | 
|---|
|  | 737 | \end{cfa} | 
|---|
| [760c88c] | 738 |  | 
|---|
| [d4c1d1f] | 739 | \uC's @_Select@, supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses. | 
|---|
|  | 740 | In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied. | 
|---|
| [1ae3f185] | 741 | \begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}] | 
|---|
| [760c88c] | 742 | Future_ISM<int> A, B, C, D; | 
|---|
| [1ae3f185] | 743 | _Select( @A || B && C@ ) { ... } | 
|---|
|  | 744 | and _Select( @D && E@ ) { ... } | 
|---|
| [d4c1d1f] | 745 | \end{lstlisting} | 
|---|
| [afb3d68] | 746 | This feature is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after \emph{both} resources are available. | 
|---|
| [760c88c] | 747 |  | 
|---|
| [d4c1d1f] | 748 | In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons. | 
|---|
|  | 749 | As a motivating example, suppose \CFA supported operators inside clauses as in: | 
|---|
| [760c88c] | 750 | \begin{cfa} | 
|---|
|  | 751 | owner_lock A, B, C, D; | 
|---|
|  | 752 | waituntil( A && B ) { ... } | 
|---|
|  | 753 | or waituntil( C && D ) { ... } | 
|---|
|  | 754 | \end{cfa} | 
|---|
| [afb3d68] | 755 | If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold-and-wait situation. | 
|---|
| [d4c1d1f] | 756 | Other semantics are needed to ensure this operation is safe. | 
|---|
|  | 757 | One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance}; | 
|---|
|  | 758 | however, that opens the potential for livelock. | 
|---|
|  | 759 | Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient, if the resource ordering is not used universally. | 
|---|
| [760c88c] | 760 | One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach. | 
|---|
| [afb3d68] | 761 | This approach does not work due to \gls{toctou} issues, \ie it is impossible to ensure that the full set of resources are available without holding them all first. | 
|---|
|  | 762 | Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems. | 
|---|
|  | 763 | Finally, the problem of operators inside clauses is also a difficult to handle when supporting channels. | 
|---|
|  | 764 | It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run. | 
|---|
|  | 765 | However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}. | 
|---|
| [aae9c17] | 766 | Ultimately, this feature was not considered crucial after taking into account the complexity and runtime cost. | 
|---|
| [494a7e5] | 767 |  | 
|---|
|  | 768 | \subsection{The full \lstinline{waituntil} picture} | 
|---|
| [d4c1d1f] | 769 |  | 
|---|
| [afb3d68] | 770 | Given all the complex details handled by the @waituntil@, its full pseudocode is presented in Figure \ref{f:WU_Full_Impl}. | 
|---|
| [d4c1d1f] | 771 | Some things to note are as follows. | 
|---|
|  | 772 | The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}. | 
|---|
|  | 773 | The @when_conditions@ array is used to store the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards. | 
|---|
|  | 774 | As discussed in Section~\ref{s:wu_chans}, this pseudocode includes conditional code-block execution based on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure all available channel resources have their corresponding code block run. | 
|---|
| [494a7e5] | 775 |  | 
|---|
|  | 776 | \begin{figure} | 
|---|
|  | 777 | \begin{cfa} | 
|---|
|  | 778 | bool when_conditions[N]; | 
|---|
| [f496046] | 779 | for ( node in nodes )                                                                   $\C[3.75in]{// evaluate guards}$ | 
|---|
| [c03c1ac] | 780 | if ( node has guard ) | 
|---|
|  | 781 | when_conditions[node] = node_guard; | 
|---|
|  | 782 | else | 
|---|
|  | 783 | when_conditions[node] = true; | 
|---|
| [494a7e5] | 784 |  | 
|---|
| [000d68f] | 785 | if ( any when_conditions[node] are true ) { | 
|---|
| [d5f5eb7] | 786 | select_nodes nodes[N];                                                                  $\C{// declare N select nodes}$ | 
|---|
|  | 787 | try { | 
|---|
|  | 788 | // ... set statuses for nodes with when_conditions[node] == false ... | 
|---|
|  | 789 |  | 
|---|
|  | 790 | for ( node in nodes )                                                           $\C{// register nodes}$ | 
|---|
|  | 791 | if ( when_conditions[node] ) | 
|---|
|  | 792 | register_select( resource, node ); | 
|---|
|  | 793 |  | 
|---|
|  | 794 | while ( !check_completion( nodes ) ) {  $\C{// check predicate}$ | 
|---|
|  | 795 | // block | 
|---|
|  | 796 | for ( resource in waituntil statement ) {       $\C{// run true code blocks}$ | 
|---|
|  | 797 | if ( check_completion( nodes ) ) break; | 
|---|
|  | 798 | if ( resource is avail ) | 
|---|
|  | 799 | try { | 
|---|
|  | 800 | if( on_selected( resource ) )   $\C{// conditionally run block}$ | 
|---|
|  | 801 | run code block | 
|---|
|  | 802 | } finally                                                       $\C{// for exception safety}$ | 
|---|
|  | 803 | unregister_select( resource, node ); $\C{// immediate unregister}$ | 
|---|
|  | 804 | } | 
|---|
|  | 805 | } | 
|---|
|  | 806 | } finally {                                                                                     $\C{// for exception safety}$ | 
|---|
|  | 807 | for ( registered nodes in nodes )                                       $\C{// deregister nodes}$ | 
|---|
|  | 808 | if ( when_conditions[node] && unregister_select( resource, node ) | 
|---|
|  | 809 | && on_selected( resource ) ) | 
|---|
|  | 810 | run code block                                                  $\C{// run code block upon unregister}\CRT$ | 
|---|
|  | 811 | } | 
|---|
| [f496046] | 812 | } | 
|---|
| [494a7e5] | 813 | \end{cfa} | 
|---|
|  | 814 | \caption{Full \lstinline{waituntil} Pseudocode Implementation} | 
|---|
|  | 815 | \label{f:WU_Full_Impl} | 
|---|
|  | 816 | \end{figure} | 
|---|
|  | 817 |  | 
|---|
| [afb3d68] | 818 | \section{\lstinline{waituntil} Performance} | 
|---|
| [c03c1ac] | 819 |  | 
|---|
| [1ae3f185] | 820 | Similar facilities to @waituntil@ are discussed in Section~\ref{s:History} covering C, Ada, Rust, \CC, and OCaml. | 
|---|
| [c03c1ac] | 821 | However, these facilities are either not meaningful or feasible to benchmark against. | 
|---|
|  | 822 | The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space. | 
|---|
| [afb3d68] | 823 | Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement. | 
|---|
| [1ae3f185] | 824 | Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach. | 
|---|
| [5e81a9c] | 825 | OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@. | 
|---|
| [c03c1ac] | 826 |  | 
|---|
| [1ae3f185] | 827 | The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA @waituntil@ statement are the Go \lstinline[language=Go]{select} statement and the \uC \lstinline[language=uC++]{_Select} statement. | 
|---|
|  | 828 | As such, two microbenchmarks are presented, one for Go and one for \uC to contrast this feature. | 
|---|
|  | 829 | Given the differences in features, polymorphism, and expressibility between @waituntil@ and \lstinline[language=Go]{select}, and \uC \lstinline[language=uC++]{_Select}, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner. | 
|---|
| [760c88c] | 830 |  | 
|---|
|  | 831 | \subsection{Channel Benchmark} | 
|---|
|  | 832 |  | 
|---|
| [1ae3f185] | 833 | The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels. | 
|---|
| [aae9c17] | 834 | Although Unix's select, poll and epoll multiplex over file descriptors, which are effectively channels, these utilities are not included in the benchmarks. | 
|---|
|  | 835 | Reading or writing to a file descriptor requires a system call which is much more expensive than operating on a user-space channel. | 
|---|
|  | 836 | As such, it is infeasible to compare Unix's select, poll and epoll with \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}. | 
|---|
| [1ae3f185] | 837 | The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads. | 
|---|
| [afb3d68] | 838 | The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on. | 
|---|
| [c03c1ac] | 839 | Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels. | 
|---|
|  | 840 | For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is: | 
|---|
| [14e1053] | 841 | \begin{cfa} | 
|---|
| [c03c1ac] | 842 | for () | 
|---|
|  | 843 | waituntil( val << chans[0] ); or waituntil( val << chans[1] ); | 
|---|
|  | 844 | or waituntil( val << chans[2] ); or waituntil( val << chans[3] ); | 
|---|
| [14e1053] | 845 | \end{cfa} | 
|---|
|  | 846 | A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds. | 
|---|
| [1ae3f185] | 847 | The first benchmark measures throughput of the producers and consumer synchronously waiting on the channels and the second has the threads asynchronously wait on the channels using the Go @default@ and \CFA @else@ clause. | 
|---|
| [14e1053] | 848 | The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} respectively. | 
|---|
|  | 849 |  | 
|---|
|  | 850 | \begin{figure} | 
|---|
|  | 851 | \centering | 
|---|
| [847ab8f] | 852 | \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize} | 
|---|
| [14e1053] | 853 | \subfloat[AMD]{ | 
|---|
|  | 854 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}} | 
|---|
|  | 855 | } | 
|---|
|  | 856 | \subfloat[Intel]{ | 
|---|
|  | 857 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}} | 
|---|
|  | 858 | } | 
|---|
| [847ab8f] | 859 | \bigskip | 
|---|
| [14e1053] | 860 |  | 
|---|
|  | 861 | \subfloat[AMD]{ | 
|---|
|  | 862 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}} | 
|---|
|  | 863 | } | 
|---|
|  | 864 | \subfloat[Intel]{ | 
|---|
|  | 865 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}} | 
|---|
|  | 866 | } | 
|---|
| [847ab8f] | 867 | \bigskip | 
|---|
| [14e1053] | 868 |  | 
|---|
|  | 869 | \subfloat[AMD]{ | 
|---|
|  | 870 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}} | 
|---|
|  | 871 | } | 
|---|
|  | 872 | \subfloat[Intel]{ | 
|---|
|  | 873 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}} | 
|---|
|  | 874 | } | 
|---|
| [847ab8f] | 875 | \caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).} | 
|---|
| [14e1053] | 876 | \label{f:select_contend_bench} | 
|---|
|  | 877 | \end{figure} | 
|---|
| [760c88c] | 878 |  | 
|---|
| [14e1053] | 879 | \begin{figure} | 
|---|
|  | 880 | \centering | 
|---|
| [847ab8f] | 881 | \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize} | 
|---|
| [14e1053] | 882 | \subfloat[AMD]{ | 
|---|
|  | 883 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}} | 
|---|
|  | 884 | } | 
|---|
|  | 885 | \subfloat[Intel]{ | 
|---|
|  | 886 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}} | 
|---|
|  | 887 | } | 
|---|
| [847ab8f] | 888 | \bigskip | 
|---|
| [14e1053] | 889 |  | 
|---|
|  | 890 | \subfloat[AMD]{ | 
|---|
|  | 891 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}} | 
|---|
|  | 892 | } | 
|---|
|  | 893 | \subfloat[Intel]{ | 
|---|
|  | 894 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}} | 
|---|
|  | 895 | } | 
|---|
| [847ab8f] | 896 | \bigskip | 
|---|
| [14e1053] | 897 |  | 
|---|
|  | 898 | \subfloat[AMD]{ | 
|---|
|  | 899 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}} | 
|---|
|  | 900 | } | 
|---|
|  | 901 | \subfloat[Intel]{ | 
|---|
|  | 902 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}} | 
|---|
|  | 903 | } | 
|---|
| [847ab8f] | 904 | \caption{The asynchronous multiplexing channel benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).} | 
|---|
| [14e1053] | 905 | \label{f:select_spin_bench} | 
|---|
|  | 906 | \end{figure} | 
|---|
| [760c88c] | 907 |  | 
|---|
| [1ae3f185] | 908 | Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} have similar results when comparing \lstinline[language=Go]{select} and @waituntil@. | 
|---|
|  | 909 | In the AMD benchmarks (left column), the performance is very similar as the number of cores scale. | 
|---|
| [aae9c17] | 910 | The AMD machine has a high-caching contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go. | 
|---|
| [1ae3f185] | 911 | Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low cores, where Go has significantly better performance, due to an optimization in its scheduler. | 
|---|
|  | 912 | Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}. | 
|---|
|  | 913 | In the Intel benchmarks (right column), \CFA performs better than Go as the number of cores scales past 2/4 and as the number of clauses increase. | 
|---|
|  | 914 | This difference is due to Go's acquiring all channel locks when registering and unregistering channels on a \lstinline[language=Go]{select}. | 
|---|
|  | 915 | Go then is holding a lock for every channel, resulting in worse performance as the number of channels increase. | 
|---|
| [14e1053] | 916 | In \CFA, since races are consolidated without holding all locks, it scales much better both with cores and clauses since more work can occur in parallel. | 
|---|
| [afb3d68] | 917 | This scalability difference is more significant on the Intel machine than the AMD machine since the Intel has lower cache-contention costs. | 
|---|
| [1ae3f185] | 918 |  | 
|---|
|  | 919 | The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks. | 
|---|
|  | 920 | There are pathological cases where Go's throughput has significant jitter. | 
|---|
|  | 921 | Consider a producer and consumer thread, @P1@ and @C1@, selecting from both channels @A@ and @B@. | 
|---|
|  | 922 | \begin{cquote} | 
|---|
|  | 923 | \begin{tabular}{@{}ll@{}} | 
|---|
|  | 924 | @P1@ & @C1@ \\ | 
|---|
|  | 925 | \begin{cfa} | 
|---|
|  | 926 | waituntil( A << i ); or waituntil( B << i ); | 
|---|
|  | 927 | \end{cfa} | 
|---|
|  | 928 | & | 
|---|
|  | 929 | \begin{cfa} | 
|---|
|  | 930 | waituntil( val << A ); or waituntil( val << B ); | 
|---|
|  | 931 | \end{cfa} | 
|---|
|  | 932 | \end{tabular} | 
|---|
|  | 933 | \end{cquote} | 
|---|
|  | 934 | Additionally, there is another producer and consumer thread, @P2@ and @C2@, operating solely on @B@. | 
|---|
|  | 935 | \begin{cquote} | 
|---|
|  | 936 | \begin{tabular}{@{}ll@{}} | 
|---|
|  | 937 | @P2@ & @C2@ \\ | 
|---|
|  | 938 | \begin{cfa} | 
|---|
|  | 939 | B << val; | 
|---|
|  | 940 | \end{cfa} | 
|---|
|  | 941 | & | 
|---|
|  | 942 | \begin{cfa} | 
|---|
|  | 943 | val << B; | 
|---|
|  | 944 | \end{cfa} | 
|---|
|  | 945 | \end{tabular} | 
|---|
|  | 946 | \end{cquote} | 
|---|
|  | 947 | In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired. | 
|---|
| [aae9c17] | 948 | Interestingly, this case may not be as pathological as it seems. | 
|---|
| [afb3d68] | 949 | If the set of channels belonging to a \lstinline[language=Go]{select} have channels that overlap with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel. | 
|---|
| [1ae3f185] | 950 | The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism. | 
|---|
| [14e1053] | 951 | Comparison of this pathological case is shown in Table~\ref{t:pathGo}. | 
|---|
| [afb3d68] | 952 | The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than the Intel machine. | 
|---|
| [14e1053] | 953 |  | 
|---|
|  | 954 | \begin{table}[t] | 
|---|
|  | 955 | \centering | 
|---|
|  | 956 | \setlength{\extrarowheight}{2pt} | 
|---|
|  | 957 | \setlength{\tabcolsep}{5pt} | 
|---|
|  | 958 |  | 
|---|
| [000d68f] | 959 | \caption{Throughput (channel operations per second) of \CFA and Go in a pathological case for contention in Go's select implementation} | 
|---|
| [14e1053] | 960 | \label{t:pathGo} | 
|---|
| [1ae3f185] | 961 | \begin{tabular}{r|r|r} | 
|---|
|  | 962 | & \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c}{Go} \\ | 
|---|
| [847ab8f] | 963 | \hline | 
|---|
|  | 964 | AMD             & \input{data/nasus_Order} \\ | 
|---|
|  | 965 | \hline | 
|---|
|  | 966 | Intel   & \input{data/pyke_Order} | 
|---|
| [14e1053] | 967 | \end{tabular} | 
|---|
|  | 968 | \end{table} | 
|---|
|  | 969 |  | 
|---|
|  | 970 | Another difference between Go and \CFA is the order of clause selection when multiple clauses are available. | 
|---|
| [1ae3f185] | 971 | Go \emph{randomly} selects a clause~\cite{go:select}, but \CFA chooses in the order clauses are listed. | 
|---|
|  | 972 | This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance. | 
|---|
|  | 973 | In the previous example, threads @P1@ and @C1@ prioritize channel @A@ in the @waituntil@, which can reduce contention for threads @P2@ and @C2@ accessing channel @B@. | 
|---|
|  | 974 | If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be significant less due to extra contention on channel @B@. | 
|---|
| [760c88c] | 975 |  | 
|---|
|  | 976 | \subsection{Future Benchmark} | 
|---|
| [1ae3f185] | 977 |  | 
|---|
|  | 978 | The future benchmark compares \CFA's @waituntil@ with \uC's \lstinline[language=uC++]{_Select}, with both utilities waiting on futures. | 
|---|
|  | 979 | While both statements have very similar semantics, supporting @and@ and @or@ operators, \lstinline[language=uC++]{_Select} can only wait on futures, whereas the @waituntil@ is polymorphic. | 
|---|
|  | 980 | As such, the underlying implementation of the operators differs between @waituntil@ and \lstinline[language=uC++]{_Select}. | 
|---|
|  | 981 | The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate. | 
|---|
| [14e1053] | 982 |  | 
|---|
| [1ae3f185] | 983 | This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements. | 
|---|
|  | 984 | The benchmark is indirect since the performance of futures in \CFA and \uC differ by a significant margin. | 
|---|
|  | 985 | The experiment has a server, which cycles fulfilling three futures, @A@, @B@, and @C@, and a client, which waits for these futures to be fulfilled using four different kinds of predicates given in \CFA: | 
|---|
|  | 986 | \begin{cquote} | 
|---|
|  | 987 | \begin{tabular}{@{}l|l@{}} | 
|---|
|  | 988 | OR & AND \\ | 
|---|
|  | 989 | \hline | 
|---|
| [14e1053] | 990 | \begin{cfa} | 
|---|
|  | 991 | waituntil( A ) { get( A ); } | 
|---|
|  | 992 | or waituntil( B ) { get( B ); } | 
|---|
|  | 993 | or waituntil( C ) { get( C ); } | 
|---|
| [1ae3f185] | 994 | \end{cfa} | 
|---|
|  | 995 | & | 
|---|
|  | 996 | \begin{cfa} | 
|---|
| [14e1053] | 997 | waituntil( A ) { get( A ); } | 
|---|
|  | 998 | and waituntil( B ) { get( B ); } | 
|---|
|  | 999 | and waituntil( C ) { get( C ); } | 
|---|
| [1ae3f185] | 1000 | \end{cfa} | 
|---|
|  | 1001 | \\ | 
|---|
|  | 1002 | \multicolumn{2}{@{}c@{}}{} \\ | 
|---|
|  | 1003 | AND-OR & OR-AND \\ | 
|---|
|  | 1004 | \hline | 
|---|
|  | 1005 | \begin{cfa} | 
|---|
| [14e1053] | 1006 | waituntil( A ) { get( A ); } | 
|---|
|  | 1007 | and waituntil( B ) { get( B ); } | 
|---|
|  | 1008 | or waituntil( C ) { get( C ); } | 
|---|
| [1ae3f185] | 1009 | \end{cfa} | 
|---|
|  | 1010 | & | 
|---|
|  | 1011 | \begin{cfa} | 
|---|
|  | 1012 | @(@ waituntil( A ) { get( A ); } | 
|---|
|  | 1013 | or waituntil( B ) { get( B ); } @)@ | 
|---|
| [14e1053] | 1014 | and waituntil( C ) { get( C ); } | 
|---|
|  | 1015 | \end{cfa} | 
|---|
| [1ae3f185] | 1016 | \end{tabular} | 
|---|
|  | 1017 | \end{cquote} | 
|---|
|  | 1018 | The server and client use a low cost synchronize after each fulfillment, so the server does not race ahead of the client. | 
|---|
| [a0c746df] | 1019 |  | 
|---|
| [1ae3f185] | 1020 | Results of this benchmark are shown in Figure~\ref{f:futurePerf}. | 
|---|
| [afb3d68] | 1021 | Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation. | 
|---|
| [1ae3f185] | 1022 | In detail, \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation. | 
|---|
|  | 1023 | However, the bars for both systems have similar height patterns across the experiments. | 
|---|
|  | 1024 | The @OR@ column for \CFA is more performant than the other \CFA predicates, due to the special-casing of @waituntil@ statements with only @or@ operators. | 
|---|
|  | 1025 | For both \uC and \CFA, the @AND@ experiment is the least performant, which is expected since all three futures need to be fulfilled for each statement completion. | 
|---|
| [14e1053] | 1026 | Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special OR case), whereas \uC has lower variation on the Intel. | 
|---|
| [1ae3f185] | 1027 | Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates. | 
|---|
| [afb3d68] | 1028 |  | 
|---|
|  | 1029 | \begin{figure} | 
|---|
|  | 1030 | \centering | 
|---|
|  | 1031 | \subfloat[AMD Future Synchronization Benchmark]{ | 
|---|
|  | 1032 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}} | 
|---|
|  | 1033 | \label{f:futureAMD} | 
|---|
|  | 1034 | } | 
|---|
|  | 1035 | \subfloat[Intel Future Synchronization Benchmark]{ | 
|---|
|  | 1036 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}} | 
|---|
|  | 1037 | \label{f:futureIntel} | 
|---|
|  | 1038 | } | 
|---|
|  | 1039 | \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).} | 
|---|
|  | 1040 | \label{f:futurePerf} | 
|---|
|  | 1041 | \end{figure} | 
|---|