Changeset 2a301ff for doc/theses/colby_parsons_MMAth/text/waituntil.tex
- Timestamp: Aug 31, 2023, 11:31:15 PM
- Branches: master, stuck-waitfor-destruct
- Children: 950c58e
- Parents: 92355883, 686912c
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
- File: doc/theses/colby_parsons_MMAth/text/waituntil.tex (modified) (5 diffs)
doc/theses/colby_parsons_MMAth/text/waituntil.tex
% ======================================================================

Consider the following motivating problem.
There are $N$ stalls (resources) in a bathroom and there are $M$ people (threads) using the bathroom.
Each stall has its own lock since only one person may occupy a stall at a time.
Humans solve this problem in the following way.
They check if all of the stalls are occupied.
If not, they enter and claim an available stall.
If they are all occupied, people queue and watch the stalls until one is free, and then enter and lock the stall.
This solution can be implemented on a computer easily, if all threads are waiting on all stalls and agree to queue.

Now the problem is extended.
Some stalls are wheelchair accessible and some stalls have gender identification.
Each person (thread) may be limited to only one kind of stall or may choose among different kinds of stalls that match their criteria.
Immediately, the problem becomes more difficult.
A single queue no longer solves the problem.
What happens when there is a stall available that the person at the front of the queue cannot choose?
The na\"ive solution has each thread spin indefinitely, continually checking every matching kind of stall until a suitable one is free.
This approach is insufficient since it wastes cycles and results in unfairness among waiting threads, as a thread can acquire the first matching stall without regard to the waiting time of other threads.
Waiting for the first appropriate stall (resource) that becomes available without spinning is an example of \gls{synch_multiplex}: the ability to wait synchronously for one or more resources based on some selection criteria.

\section{History of Synchronous Multiplexing}\label{s:History}

There is a history of tools that provide \gls{synch_multiplex}.
Some well-known \gls{synch_multiplex} tools include the Unix system utilities @select@~\cite{linux:select}, @poll@~\cite{linux:poll}, and @epoll@~\cite{linux:epoll}, and the @select@ statement provided by Go~\cite{go:selectref}, Ada~\cite[\S~9.7]{Ada16}, and \uC~\cite[\S~3.3.1]{uC++}.
The concept and theory surrounding \gls{synch_multiplex} was introduced by Hoare in his 1985 book, Communicating Sequential Processes (CSP)~\cite{Hoare85}:
\begin{quote}
A communication is an event that is described by a pair $c.v$ where $c$ is the name of the channel on which the communication takes place and $v$ is the value of the message which passes.~\cite[p.~113]{Hoare85}
\end{quote}
The ideas in CSP were implemented by Roscoe and Hoare in the language Occam~\cite{Roscoe88}.

Both CSP and Occam include the ability to wait for a \Newterm{choice} among receiver channels and \Newterm{guards} to toggle which receives are valid.
For example,
\begin{cfa}[mathescape]
( @G1@( x ) $\rightarrow$ P @|@ @G2@( y ) $\rightarrow$ Q )
\end{cfa}
waits for either channel @x@ or @y@ to have a value, if and only if guards @G1@ and @G2@ are true;
if only one guard is true, only one channel can receive, and if both guards are false, no receive occurs.
In detail, waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on.
If a guard is false, then the resource it guards is not in the set of resources being waited on.
If all guards are false, the ALT, Occam's \gls{synch_multiplex} statement, does nothing and the thread continues.
Guards can be simulated using @if@ statements, as shown by rule 2.4 (boolean guard elimination) of~\cite[p.~183]{Roscoe88},
\begin{lstlisting}[basicstyle=\rm,mathescape]
ALT( $b$ & $g$ $P$, $G$ ) = IF ( $b$ ALT( $g$ $P$, $G$ ), $\neg\,b$ ALT( $G$ ) )
\end{lstlisting}
but doing so requires $2^N-1$ @if@ statements, where $N$ is the number of guards.
The exponential blowup comes from applying rule 2.4 repeatedly, since it works on one guard at a time.
Figure~\ref{f:wu_if} shows in \CFA an example of applying rule 2.4 for three guards.
Also, notice the additional code duplication for statements @S1@, @S2@, and @S3@.
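As an aside, a channel-based @select@ can avoid this exponential simulation entirely. The following runnable Go sketch (the function name @selectGuarded@ and the two-channel setup are ours, not from the thesis) shows the well-known nil-channel idiom: a false guard disables its case by swapping the channel for @nil@, since receiving from a nil channel blocks forever, so only $N$ @if@ statements are needed.

```go
package main

import "fmt"

// selectGuarded simulates Occam-style guards on a Go select:
// a false guard disables its case by replacing the channel with nil,
// because a receive from a nil channel can never proceed.
// This needs only N ifs, not the 2^N-1 of the if-simulation rule.
func selectGuarded(guardA, guardB bool) string {
	a := make(chan int, 1)
	b := make(chan int, 1)
	a <- 1
	b <- 2
	ca, cb := a, b
	if !guardA {
		ca = nil // case A can never fire
	}
	if !guardB {
		cb = nil // case B can never fire
	}
	select {
	case v := <-ca:
		return fmt.Sprintf("A %d", v)
	case v := <-cb:
		return fmt.Sprintf("B %d", v)
	}
}

func main() {
	fmt.Println(selectGuarded(true, false)) // only case A is enabled
}
```

The same trick generalizes to any number of guarded cases, one @if@ per guard.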
\begin{figure}
\centering
\begin{lrbox}{\myboxA}
\begin{cfa}
when( G1 )
	waituntil( R1 ) S1
or when( G2 )
	waituntil( R2 ) S2
or when( G3 )
	waituntil( R3 ) S3
\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
if ( G1 )
	if ( G2 )
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R1 ) S1 or waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R1 ) S1 or waituntil( R3 ) S3
		else waituntil( R1 ) S1
else
	if ( G2 )
		if ( G3 ) waituntil( R2 ) S2 or waituntil( R3 ) S3
		else waituntil( R2 ) S2
	else
		if ( G3 ) waituntil( R3 ) S3
\end{cfa}
\end{lrbox}

\subfloat[Guards]{\label{l:guards}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[Simulated Guards]{\label{l:simulated_guards}\usebox\myboxB}
\caption{\CFA guards simulated with \lstinline{if} statements.}
\label{f:wu_if}
\end{figure}

When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
While CSP waits on channels, the earliest known implementation of synchronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors.
The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
@select@ blocks until either some subset of the file descriptors are ready or the timeout expires.
All file descriptors that are ready are returned by modifying the argument sets to contain only the ready descriptors.

This early implementation differs from the theory presented in CSP: when the call to @select@ returns, it may provide more than one ready file descriptor.
As such, @select@ has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics.
It is possible to achieve exclusive-or semantics with @select@ by arbitrarily operating on only one of the returned descriptors.
@select@ passes the interest set of file descriptors between application and kernel in the form of a worst-case sized bit-mask, where the worst case is the largest-numbered file descriptor.
@poll@ reduces the size of the interest set by changing from a bit mask to a linked data structure, independent of the file-descriptor values.
@epoll@ further reduces the data passed per call by keeping the interest set in the kernel, rather than supplying it on every call.

These early \gls{synch_multiplex} tools interact directly with the operating system and are often used to communicate among processes.
Later, \gls{synch_multiplex} started to appear in applications, via programming languages, to support fast multiplexed concurrent communication among threads.
An early example of \gls{synch_multiplex} is the @select@ statement in Ada~\cite[\S~9.7]{Ichbiah79}.
This @select@ allows a task object, with its own thread, to multiplex over a subset of asynchronous calls to its methods.
The Ada @select@ has the same exclusive-or semantics and guards as the Occam ALT;
however, it multiplexes over methods rather than channels.

\begin{figure}
\begin{lstlisting}[language=ada,literate=,{moredelim={**[is][\color{red}]{@}{@}}}]
task type buffer is   -- thread type
	...   -- buffer declarations
	count : integer := 0;
begin   -- thread starts here
	loop
		select
			@when count < Size@ =>   -- guard
				@accept insert( elem : in ElemType )@ do   -- method
					...   -- add to buffer
					count := count + 1;
				end;
				-- executed if this accept called
		or
			@when count > 0@ =>   -- guard
				@accept remove( elem : out ElemType )@ do   -- method
					...
					-- remove and return from buffer via parameter
					count := count - 1;
				end;
				-- executed if this accept called
		or @delay 10.0@;   -- unblock after 10 seconds without call
		or @else@   -- do not block, cannot appear with delay
		end select;
	end loop;
end buffer;
buf : buffer;   -- create task object and start thread in task body
\end{lstlisting}
\caption{Ada Bounded Buffer}
\label{f:BB_Ada}
\end{figure}

Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
Note, a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
Then the appropriate \lstinline[language=ada]{accept} method is run with the call's arguments.
Hence, the \lstinline[language=ada]{select} statement provides rendezvous points for threads, rather than providing channels with message passing.
The \lstinline[language=ada]{select} statement also provides a timeout and an @else@ (nonblocking) clause, which changes synchronous multiplexing to asynchronous:
now the thread polls rather than blocks.

Another example of programming-language \gls{synch_multiplex} is Go's @select@ statement over channels~\cite{go:selectref}.
Figure~\ref{l:BB_Go} shows the outline of a bounded buffer administrator implemented with a Go routine.
Here, two channels are used for inserting and removing by client producers and consumers, respectively.
(The @done@ channel is used to synchronize with the program main.)
Go's @select@ has the same exclusive-or semantics as the ALT primitive from Occam, and it has associated code blocks for each clause, like ALT and Ada.
However, unlike Ada and ALT, Go does not provide guards for the \lstinline[language=go]{case} clauses of the \lstinline[language=go]{select}.
As such, the exponential blowup can be seen by comparing Go and \uC in Figure~\ref{f:AdaMultiplexing}.
Go also provides a timeout via a channel and a @default@ clause, like the Ada @else@, for asynchronous multiplexing.

\begin{figure}
\centering

\begin{lrbox}{\myboxA}
\begin{lstlisting}[language=go,literate=,{moredelim={**[is][\color{red}]{@}{@}}}]
insert := make( chan int )
remove := make( chan *int )
buffer := make( chan int, Size )
done := make( chan int )
count := 0
func in_buf( val int ) {
	buffer <- val
	count++
}
func out_buf( ptr *int ) {
	*ptr = <-buffer
	count--
}
func BoundedBuffer() {
  L: for {
		if ( count < Size && count > 0 ) {
			select {   // wait for message
			@case i := <- insert: in_buf( i )@
			@case p := <- remove: out_buf( p )@
			case <- done: break L
			}
		} else if ( count < Size ) {
			select {   // wait for message
			@case i := <- insert: in_buf( i )@
			case <- done: break L
			}
		} else if ( count > 0 ) {
			select {   // wait for message
			@case p := <- remove: out_buf( p )@
			case <- done: break L
			}
		}
	}
	done <- 0
}
func main() {
	go BoundedBuffer()   // start administrator
}
\end{lstlisting}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{lstlisting}[language=uC++,{moredelim={**[is][\color{red}]{@}{@}}}]
_Task BoundedBuffer {
	int * buffer;
	int front = back = count = 0;
  public:
	// ... constructor implementation
	void insert( int elem ) {
		buffer[front] = elem;
		front = ( front + 1 ) % Size;
		count += 1;
	}
	int remove() {
		int ret = buffer[back];
		back = ( back + 1 ) % Size;
		count -= 1;
		return ret;
	}
  private:
	void main() {
		for ( ;; ) {
			_Accept( ~BoundedBuffer ) break;
			@or _When ( count < Size ) _Accept( insert )@;
			@or _When ( count > 0 ) _Accept( remove )@;
		}
	}
};
BoundedBuffer buf;   // start thread in task's main method
\end{lstlisting}
\end{lrbox}

\subfloat[Go]{\label{l:BB_Go}\usebox\myboxA}
\hspace*{5pt}
\vrule
\hspace*{5pt}
\subfloat[\uC]{\label{l:BB_uC++}\usebox\myboxB}

\caption{Bounded Buffer}
\label{f:AdaMultiplexing}
\end{figure}

Finally, \uC provides \gls{synch_multiplex} with Ada-style @select@ over monitor and task methods with the @_Accept@ statement~\cite[\S~2.9.2.1]{uC++}, and over futures with the @_Select@ statement~\cite[\S~3.3.1]{uC++}.
The @_Select@ statement extends the ALT/Go @select@ by offering both @and@ and @or@ semantics, which can be used together in the same statement.
Both the @_Accept@ and @_Select@ statements provide guards for multiplexing clauses, as well as timeout and @else@ clauses.
Figure~\ref{l:BB_uC++} shows the outline of a bounded buffer administrator implemented with \uC @_Accept@ statements.

There are other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, Java's @allOf@/@anyOf@ over futures~\cite{java:allof:anyof}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
Note that while C++14 and Rust provide \gls{synch_multiplex}, the implementations leave much to be desired as both rely on polling to wait on multiple resources.

\section{Other Approaches to Synchronous Multiplexing}

To avoid the need for \gls{synch_multiplex}, all communication among threads/processes must come from a single source.
For example, in Erlang each process has a single heterogeneous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
Similarly, actor systems circumvent the \gls{synch_multiplex} problem as actors only block when waiting for the next message, never in a behaviour.
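The single-mailbox style can be sketched in Go (a hypothetical example, not from the thesis: the @message@/@deposit@/@query@ types and the @account@ process are ours). One channel carries heterogeneous messages, so the process only ever waits in one place:

```go
package main

import "fmt"

// Erlang-style single mailbox: every communication arrives on one
// channel of heterogeneous messages, dispatched by a type switch,
// so no synchronous multiplexing over several sources is needed.
type message interface{}

type deposit struct{ amount int }
type query struct{ reply chan int }

func account(mailbox chan message, done chan struct{}) {
	balance := 0
	for msg := range mailbox { // the sole blocking point
		switch m := msg.(type) { // dispatch on message kind
		case deposit:
			balance += m.amount
		case query:
			m.reply <- balance
		}
	}
	close(done)
}

func main() {
	mailbox := make(chan message)
	done := make(chan struct{})
	go account(mailbox, done)

	mailbox <- deposit{40}
	mailbox <- deposit{2}
	reply := make(chan int)
	mailbox <- query{reply}
	fmt.Println(<-reply) // 42
	close(mailbox)
	<-done
}
```

Note how any message kind can arrive at any time, which is exactly the downside discussed next: unrelated messages must be handled, postponed, or discarded while waiting for a particular one.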
While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
Consider the case where a thread has a single source of communication and it wants a set of $N$ resources.
It must sequentially request the $N$ resources and wait for each response.
During the receives for the $N$ resources, it can receive other communication, and has to save and postpone these communications, or discard them.
% If the requests for the other resources need to be retracted, the burden falls on the programmer to determine how to synchronize appropriately to ensure that only one resource is delivered.

\section{\CFA's Waituntil Statement}

The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
There already exists a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods~\cite{Delisle21}, so the @waituntil@ focuses on synchronizing over other resources.
All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only waiting on one kind of resource: Unix @select@ supports file descriptors, Go's @select@ supports channel operations, \uC's @_Select@ supports futures, and Ada's @select@ supports monitor method calls.
The \CFA @waituntil@ is polymorphic and provides \gls{synch_multiplex} over any object that satisfies the trait in Figure~\ref{f:wu_trait}.
No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's @waituntil@.

\begin{figure}
\begin{cfa}
forall( T & | sized(T) )
trait is_selectable {
	// For registering a waituntil stmt on a selectable type
	bool register_select( T &, select_node & );

	// For unregistering a waituntil stmt from a selectable type
	bool unregister_select( T &, select_node & );

	// on_selected is run on the selecting thread prior to executing
	// the statement associated with the select_node
	bool on_selected( T &, select_node & );
};
\end{cfa}
\caption{Trait for types that can be passed into \CFA's \lstinline{waituntil} statement.}
\label{f:wu_trait}
\end{figure}

Currently, locks, channels, futures, and timeouts are supported by the @waituntil@ statement, and this set can be expanded through the @is_selectable@ trait as other use-cases arise.
The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing.
Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second.
Note, the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.

\section{Waituntil Semantics}

The @waituntil@ semantics has two parts: the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the @waituntil@ interacts with types like locks, channels, and futures.

\subsection{Statement Semantics}

The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam.
The clauses have an exclusive-or relationship where the first available one is run and only one clause is run.
\CFA's @or@ semantics differ from the ALT semantics in one respect: instead of randomly picking a clause when multiple are available, the first available clause in the @waituntil@ is executed.
For example, in the following, if @foo@ and @bar@ are both available, @foo@ is always selected since it comes first in the order of @waituntil@ clauses.
\begin{cfa}
future(int) bar, foo;
waituntil( foo ) { ... } or waituntil( bar ) { ... }   // prioritize foo
\end{cfa}
The reason for these semantics is that prioritizing resources can be useful in certain problems, such as shutdown.
In the rare case where there is a starvation problem with the ordering, it is possible to follow a @waituntil@ with its reverse form, alternating which resource has the highest priority:
\begin{cfa}
waituntil( foo ) { ... } or waituntil( bar ) { ... }   // prioritize foo
waituntil( bar ) { ... } or waituntil( foo ) { ... }   // prioritize bar
\end{cfa}
While this approach is not general for many resources, it handles many basic cases.

\begin{figure}
\begin{cfa}
future(int) Future;
channel(int) Channel;
owner_lock Lock;
int i = 0;

waituntil( @Lock@ ) { ... }
or when( i == 0 ) waituntil( i << @Channel@ ) { ... }
and waituntil( @Future@ ) { ... }
or waituntil( @timeout( 1`s )@ ) { ... }
// else { ... }
\end{cfa}
\caption{Example of \CFA's waituntil statement}
\label{f:wu_example}
\end{figure}
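For contrast, Go's @select@ chooses randomly among ready cases, but the ordered priority described above can be approximated there with a default-guarded poll before a fair select. This runnable sketch (the function @prioritized@ is ours, not from the thesis) makes @foo@ win whenever it is ready:

```go
package main

import "fmt"

// prioritized approximates CFA's ordered or-semantics on top of
// Go's randomized select: first poll the high-priority channel
// alone (default makes the poll nonblocking), and only if it is
// empty fall through to a fair blocking select over both.
func prioritized(foo, bar chan int) string {
	select {
	case v := <-foo: // foo wins whenever it is ready
		return fmt.Sprintf("foo %d", v)
	default: // foo not ready, do not block
	}
	select {
	case v := <-foo:
		return fmt.Sprintf("foo %d", v)
	case v := <-bar:
		return fmt.Sprintf("bar %d", v)
	}
}

func main() {
	foo := make(chan int, 1)
	bar := make(chan int, 1)
	foo <- 1
	bar <- 2
	// both ready: plain select would pick randomly,
	// but the poll makes foo deterministic here
	fmt.Println(prioritized(foo, bar)) // foo 1
}
```

This is also why the alternating-priority workaround above matters: a fixed poll order, like a fixed clause order, can starve the low-priority resource under load.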

The \CFA @and@ semantics match the @and@ semantics of \uC \lstinline[language=uC++]{_Select}.
When multiple clauses are joined by @and@, the @waituntil@ makes a thread wait for all to be available, but still runs the corresponding code blocks \emph{as they become available}.
When an @and@ clause becomes available, the waiting thread unblocks and runs that clause's code block, and then the thread waits again until either the next clause is available or the @waituntil@ statement is satisfied.
This semantics allows work to be done in parallel while synchronizing over a set of resources, and furthermore, gives a good reason to use the @and@ operator.
If the @and@ operator waited for all clauses to be available before running, it would be the same as acquiring those resources consecutively by a sequence of @waituntil@ statements.

As for normal C expressions, the @and@ operator binds more tightly than the @or@.
To give an @or@ operator higher precedence, parentheses are used.
For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parentheses.
\begin{cfa}
@(@ waituntil( A ) { ... }	// bind tightly to or
or waituntil( B ) { ... } @)@
and waituntil( C ) { ... }
\end{cfa}
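
To illustrate the \emph{as they become available} semantics of @and@, the following sketch waits for two futures, which are assumed to be fulfilled by other threads; the names are illustrative.
\begin{cfa}
future(int) X, Y;
waituntil( X ) { sout | "X ready"; }
and waituntil( Y ) { sout | "Y ready"; }
// both code blocks have run, in fulfillment order
\end{cfa}
If @Y@ is fulfilled first, its code block runs immediately while the thread continues to wait for @X@, rather than the code blocks running in clause order.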

The guards in the @waituntil@ statement are called @when@ clauses.
Each boolean expression inside a @when@ is evaluated \emph{once} before the @waituntil@ statement is run.
Like Occam's ALT, the guards toggle clauses on and off, where a @waituntil@ clause is only evaluated and waited on if the corresponding guard is @true@.
In addition, the @waituntil@ guards require some nuance since both @and@ and @or@ operators are supported \see{Section~\ref{s:wu_guards}}.
When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operator from the statement.
For example, in the following, the two @waituntil@ statements are semantically equivalent.

\begin{lrbox}{\myboxA}
\begin{cfa}
when( true ) waituntil( A ) { ... }
or when( false ) waituntil( B ) { ... }
and waituntil( C ) { ... }
\end{cfa}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{cfa}
waituntil( A ) { ... }
and waituntil( C ) { ... }

\end{cfa}
\end{lrbox}

\begin{tabular}{@{}lcl@{}}
\usebox\myboxA & $\equiv$ & \usebox\myboxB
\end{tabular}

The @else@ clause on the @waituntil@ has identical semantics to the @else@ clause in Ada.
If all resources are not immediately available and there is an @else@ clause, the @else@ clause is run and the thread continues.

\subsection{Type Semantics}

As mentioned, to support interaction with the @waituntil@ statement, a type must support the trait in Figure~\ref{f:wu_trait}.
The @waituntil@ statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@, respectively.
When a resource becomes available, @on_selected@ is run, and if it returns false, the corresponding code block is not run.
Many types do not need @on_selected@, but it is provided if a type needs to perform work or checks before the resource can be accessed in the code block.
The register/unregister routines in the trait also return booleans.
The return value of @register_select@ is @true@, if the resource is immediately available, and @false@ otherwise.
The return value of @unregister_select@ is @true@, if the corresponding code block should be run after unregistration, and @false@ otherwise.
The routine @on_selected@ and the return value of @unregister_select@ are needed to support channels as a resource.
More detail on channels and their interaction with @waituntil@ appears in Section~\ref{s:wu_chans}.

The trait can be used directly by having a blocking object support the @is_selectable@ trait, or it can be used indirectly through routines that take the object as an argument.
When used indirectly, the object's routine returns a type that supports the @is_selectable@ trait.
This feature leverages \CFA's ability to overload on return type to select the correct overloaded routine for the @waituntil@ context.
Indirect support through routines is needed for types that want to support multiple operations, such as channels that allow both reading and writing.

\section{\lstinline{waituntil} Implementation}

The @waituntil@ statement is not inherently complex, and Figure~\ref{f:WU_Impl} shows the basic outline of the @waituntil@ algorithm.
Complexity comes from the consideration of race conditions and synchronization needed when supporting various primitives.
The following sections use examples to fill in complexity details missing in Figure~\ref{f:WU_Impl}.
Afterwards, the full pseudocode for the @waituntil@ algorithm is presented in Figure~\ref{f:WU_Full_Impl}.

\begin{figure}
\begin{cfa}
select_nodes s[N];	$\C[3.25in]{// declare N select nodes}$
for ( node in s )	$\C{// register nodes}$
	register_select( resource, node );
while ( statement predicate not satisfied ) {	$\C{// check predicate}$
	// block until clause(s) satisfied
	for ( resource in waituntil statement ) {	$\C{// run true code blocks}$
		if ( resource is avail ) run code block
		if ( statement predicate is satisfied ) break;
	}
}
for ( node in s )	$\C{// deregister nodes}\CRT$
	if ( unregister_select( resource, node ) ) run code block
\end{cfa}
\caption{\lstinline{waituntil} Implementation}
\label{f:WU_Impl}
\end{figure}

The basic steps of the algorithm are:
\begin{enumerate}
\item
The @waituntil@ statement declares $N$ @select_node@s, one per resource that is being waited on, where each node stores any @waituntil@ data pertaining to its resource.

\item
Each @select_node@ is then registered with the corresponding resource.

\item
The thread executing the @waituntil@ then loops until the statement's predicate is satisfied.
In each iteration, if the predicate is unsatisfied, the @waituntil@ thread blocks.
When another thread satisfies a resource clause (\eg sends to a channel), it unblocks the @waituntil@ thread.
This thread checks all clauses for completion, and any completed clauses have their code blocks run.
While checking clause completion, if enough clauses have been run such that the statement predicate is satisfied, the loop exits early.

\item
Once the thread escapes the loop, the @select_nodes@ are unregistered from the resources.
\end{enumerate}
These steps give a basic overview of how the statement works.
The following sections shed light on the specific changes and provide more implementation detail.

\subsection{Locks}\label{s:wu_locks}

The \CFA runtime supports a number of spinning and blocking locks, \eg semaphore, MCS, futex, Go mutex, spinlock, owner, \etc.
Many of these locks satisfy the @is_selectable@ trait, and hence, are resources supported by the @waituntil@ statement.
For example, the following waits until the thread has acquired lock @l1@ or locks @l2@ and @l3@.
\begin{cfa}
owner_lock l1, l2, l3;
waituntil( l1 ) { ... }
or waituntil( l2 ) { ... }
and waituntil( l3 ) { ... }
\end{cfa}
Implicitly, the @waituntil@ is calling the lock acquire for each of these locks to establish a position in each lock's queue of waiting threads.
When a lock schedules this thread, it unblocks and runs the code block associated with the lock and then releases the lock.

In detail, when a thread waits on multiple locks via a @waituntil@, it enqueues a @select_node@ in each of the lock's waiting queues.
When a @select_node@ reaches the front of the lock's queue and gains ownership, the thread blocked on the @waituntil@ is unblocked.
Now, the lock is held by the @waituntil@ thread until the code block is executed, and then the node is unregistered, during which the lock is released.
Immediately releasing the lock after the code block prevents the waiting thread from holding multiple locks and potentially introducing a deadlock.
As such, the only lock nodes unregistered during the final cleanup are the ones whose code blocks have not run.

\subsection{Timeouts}

A timeout for the @waituntil@ statement is a duration passed to routine \lstinline[deletekeywords={timeout}]{timeout}\footnote{\lstinline{timeout} is a quasi-keyword in \CFA, allowing it to be used as an identifier.}, \eg:
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
\multicolumn{2}{@{}l@{}}{\lstinline{Duration D1\{ 1`ms \}, D2\{ 2`ms \}, D3\{ 3`ms \};}} \\
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( timeout( D1 ) ) {}
or waituntil( timeout( D2 ) ) {}
or waituntil( timeout( D3 ) ) {}
\end{cfa}
&
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ) {}
or waituntil( i << C2 ) {}
or waituntil( i << C3 ) {}
or waituntil( timeout( min( D1, D2, D3 ) ) ) {}


\end{cfa}
\end{tabular}
\end{cquote}
These two examples are basically equivalent.
Here, the multiple timeouts are useful because the durations can change during execution and the separate clauses provide different code blocks if a timeout triggers.
Multiple timeouts can also be used with @and@ to provide a minimal delay before proceeding.
In the following example, either channel @C1@ or @C2@ must be satisfied, but nothing can be done for at least 1 or 3 seconds after the channel read, respectively.
\begin{cfa}[deletekeywords={timeout}]
waituntil( i << C1 ); and waituntil( timeout( 1`s ) );
or waituntil( i << C2 ); and waituntil( timeout( 3`s ) );
\end{cfa}
If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds.
Note, the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.

The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of time elapsed since the call started.
Instead, \lstinline[deletekeywords={timeout}]{timeout} returns a type that supports the @is_selectable@ trait, allowing the type system to select the correct overloaded routine for this context.
For the @waituntil@, it is more idiomatic for the \lstinline[deletekeywords={timeout}]{timeout} to use the same syntax as other blocking operations, instead of having a special language clause.

\subsection{Channels}\label{s:wu_chans}

Channels require more complexity to allow synchronous multiplexing.
For locks, when an outside thread releases a lock and unblocks the @waituntil@ thread (WUT), the lock's mutual-exclusion (MX) property is passed to the WUT (ignoring spinning locks).
For futures, the outside thread delivers a value to the future and unblocks any waiting threads, including WUTs.
In either case, after the WUT unblocks, it is safe to execute its corresponding code block, knowing access to the resource is protected by the lock or the read-only state of the future.
Similarly, for channels, when an outside thread inserts a value into a channel, it must unblock the WUT.
However, for channels, there is a race condition that poses an issue.
If the outside thread inserts into the channel and unblocks the WUT, there is a race where another thread can remove the channel data, so after the WUT unblocks and attempts to remove from the buffer, it fails, and the WUT must reblock (busy waiting).
This scenario is a \gls{toctou} race that needs to be consolidated.
To close the race, the outside thread must detect this case and insert directly into the left-hand side of the channel expression (@i << chan@) rather than into the channel, and then unblock the WUT.
Now the unblocked WUT is guaranteed to have a satisfied resource and its code block can safely be executed.
The insertion circumvents the channel buffer via the wait-morphing in the \CFA channel implementation \see{Section~\ref{s:chan_impl}}, allowing @waituntil@ channel unblocking to not be special-cased.

Furthermore, if both @and@ and @or@ operators are used, the @or@ operations stop behaving like exclusive-or due to the race among channel operations, \eg:
\begin{cfa}
int i;
waituntil( i << A ) {} and waituntil( i << B ) {}
or waituntil( i << C ) {} and waituntil( i << D ) {}
\end{cfa}
If exclusive-or semantics are followed, only the code blocks for @A@ and @B@ are run, or the code blocks for @C@ and @D@.
However, four outside threads inserting into each channel can simultaneously put values into @i@ and attempt to unblock the WUT to run the four code-blocks.
This case introduces a race with complexity that increases with the size of the @waituntil@ statement.
However, due to TOCTOU issues, it is impossible to know if all resources are available without acquiring all the internal locks of channels in the subtree of the @waituntil@ clauses.
This approach is a poor solution for two reasons.
It is possible that once all the locks are acquired, the subtree is not satisfied and the locks must be released.
This work incurs a high cost for signalling threads and heavily increases contention on the internal channel locks.
Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators, since they cannot be supported without significant complexity that also significantly affects @waituntil@ performance.
Therefore, the example of reusing variable @i@ by multiple output channels is considered a user error without exclusive-or semantics.
Given aliasing in C, it is impossible to even warn of the potential race.
In the future, it would be interesting to support Go-like syntax, \lstinline[language=Go]{case i := <- ...}, defining a new scoped @i@ variable for each clause.

It was deemed important that exclusive-or semantics are maintained when only @or@ operators are used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
Consider the following example, where thread 1 is reading and threads 2 and 3 are writing to channels @A@ and @B@ concurrently.
\begin{cquote}
\begin{tabular}{@{}l|l|l@{}}
\multicolumn{3}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\
thread 1 & thread 2 & thread 3 \\
\begin{cfa}
waituntil( i << A ) {}
or waituntil( i << B ) {}
\end{cfa}
&
\begin{cfa}
A << 1;

\end{cfa}
&
\begin{cfa}
B << 2;

\end{cfa}
\end{tabular}
\end{cquote}
For thread 1 to have exclusive-or semantics, it must only consume from exactly one of @A@ or @B@.
As such, threads 2 and 3 must race to establish the winning clause of the @waituntil@ in thread 1.
This race is consolidated by threads 2 and 3 each attempting to set a pointer to the winning clause's @select_node@ address using \gls{cas}.
The winner bypasses the channel, inserts into the left-hand side of the WUT's expression, and signals thread 1.
The loser continues checking if there is space in the channel, and if so, performs the channel insert operation with a possible signal of a waiting remove thread;
otherwise, if there is no space, the loser blocks.
It is important that the race occurs \emph{before} operating on the channel, because channel actions are different with respect to each thread.
If the race was consolidated after the operation, both threads 2 and 3 could potentially write into @i@ concurrently.

Channels introduce another interesting implementation issue.
Supporting both reading and writing to a channel in a @waituntil@ means that one @waituntil@ clause may be the notifier of another @waituntil@ clause.
This conjunction poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
Consider the following example, alongside a described ordering of events to highlight the race.
\begin{cquote}
\begin{tabular}{@{}l|l@{}}
\multicolumn{2}{@{}l@{}}{\lstinline{channel(int) A, B; // zero size channels}} \\
thread 1 & thread 2 \\
\begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}]
waituntil( @i << A@ ) {}
or waituntil( #i << B# ) {}
\end{cfa}
&
\begin{cfa}[moredelim={**[is][\color{blue}]{\#}{\#}}]
waituntil( #B << 2# ) {}
or waituntil( @A << 1@ ) {}
\end{cfa}
\end{tabular}
\end{cquote}
Assume thread 1 executes first, registers with channel @A@ and proceeds in the @waituntil@.
Since @A@ is empty, thread 1 cannot remove, and then thread 1 is interrupted before registering with @B@.
Thread 2 similarly registers with channel @B@, and proceeds in the @waituntil@.
Since @B@ is zero size, there is no space to insert, and then thread 2 is interrupted before registering with @A@.
At this point, threads 1 and 2 resume execution.
Remember from above, each exclusive-or @waituntil@ holds a race to set the winning clause of the statement.
The issue that arises is that these two @waituntil@ statements must have matching winning clauses (both @A@ clauses or both @B@ clauses) to preserve the exclusive-or semantics, since a zero-sized channel needs an insert/remove pair for an operation to occur.
If threads 1 and 2 race to set a winner only in their own @waituntil@, thread 1 can think it successfully removed from @B@, and thread 2 can think it successfully inserted into @A@, which is an error.
Hence, there is a race on two fronts.
If thread 1 wins the race and sees that @B@ has a waiting insertion, then thread 2 must execute the first clause of its @waituntil@ and thread 1 its second.
Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 its second.
Any other execution scenario is incorrect for exclusive-or semantics.
Note, the priority execution of multiple satisfied @waituntil@ clauses (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread.

The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels.
This approach eliminates the race shown above, since threads 1 and 2 cannot both be registering at the same time.
However, this approach cannot be used in \CFA, since the @waituntil@ is polymorphic.
Not all types in a @waituntil@ have an internal lock, and when using non-channel types, acquiring all the locks incurs extra unneeded overhead.
Instead, this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
This race case is detectable, and if detected, each thread first races to set its own @waituntil@ race pointer to be pending.
If it succeeds, it then attempts to set the other thread's @waituntil@ race pointer to its success value.
If either thread successfully sets the other thread's @waituntil@ race pointer, then the operation can proceed;
if not, the signalling thread sets its own race pointer back to the initial value and retries.
This retry mechanism can potentially introduce a livelock, but in practice a livelock here is highly unlikely.
Furthermore, the likelihood of a livelock is zero unless the program is in the niche case of having two or more exclusive-or @waituntil@s with two or more clauses in reverse order of priority.
This livelock case can be fully eliminated using locks like Go, or if a \gls{dcas} instruction is available.
If any other threads attempt to set a WUT's race pointer and see a pending value, they wait until the value changes before proceeding, to ensure that, in the case the WUT fails, the signal is not lost.
This protocol ensures that signals cannot be lost and that the two races can be resolved in a safe manner.
The implementation of this protocol is shown in Figure~\ref{f:WU_DeadlockAvoidance}.

\begin{figure}
\begin{cfa}
bool pending_set_other( select_node & other, select_node & mine ) {
	unsigned long int cmp_status = UNSAT;

	// Try to set other status; if successful, return true
	while ( ! CAS( other.clause_status, &cmp_status, SAT ) ) {
		if ( cmp_status == SAT )
			return false;	// other status is SAT, so we lost the race

		// Toggle own status flag to allow other thread to potentially win
		mine.clause_status = UNSAT;

		// Reset compare flag
		cmp_status = UNSAT;

		// Attempt to set own status flag back to PENDING to retry
		if ( ! CAS( mine.clause_status, &cmp_status, PENDING ) )
			return false;	// failed to reset, so we lost the race

		// Reset compare flag
		cmp_status = UNSAT;
	}
	return true;
}
\end{cfa}
\caption{Exclusive-or \lstinline{waituntil} channel deadlock avoidance protocol}
\label{f:WU_DeadlockAvoidance}
\end{figure}

Channels in \CFA have exception-based shutdown mechanisms that the @waituntil@ statement needs to support.
These exception mechanisms are supported through the @on_selected@ routine.
This routine is needed by channels to detect if they are closed after unblocking in a @waituntil@ statement, to ensure the appropriate behaviour is taken and an exception is thrown.
Hence, the channel close-down mechanism is handled correctly.

\subsection{Guards and Statement Predicate}\label{s:wu_guards}
679 680 It is trivial to check when a synchronous multiplexing utility is done for the or/xor relationship, since any resource becoming available means that the blocked thread can proceed and the @waituntil@ statement is finished. 681 In \uC and \CFA, the \gls{synch_multiplex} mechanisms have both an and/or relationship, which, along with guards, makes the problem of checking for completion of the statement difficult. 682 Consider the following @waituntil@. 683 \begin{cfa} 684 when( GA ) waituntil( A ) {} 685 and when( GB ) waituntil( B ) {} 686 or when( GC ) waituntil( C ) {} 687 \end{cfa} 688 When the @waituntil@ thread wakes up, the following predicate represents satisfaction: 689 \begin{cfa} 690 A && B || C || ! GA && B || ! GB && A || ! GA && ! GB && ! GC 691 \end{cfa} 692 which can be simplified to: 693 \begin{cfa} 694 ( A || ! GA ) && ( B || ! GB ) || C || ! GA && ! GB && ! GC 695 \end{cfa} 696 Checking the complete predicate on each iteration of the pending @waituntil@ is expensive, so \uC and \CFA both take steps to simplify checking for statement completion. 697 698 In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource. 699 A diagram of the tree for the complete predicate above is shown in Figure~\ref{f:uC_select_tree}, alongside the modification of the tree that occurs when @GA@ is @false@. 700 Each internal node stores the statuses of the two subtrees beneath it. 701 When resources become available, their corresponding leaf node status is modified, which percolates up the tree to update the state of the statement. 314 702 Once the root of the tree has both subtrees marked as @true@, then the statement is complete. 315 As an optimization, when the internal nodes are updated, their subtrees marked as @true@ are effectively pruned and are not touched again.
316 To support \uC's @_Select@ statement guards, the tree prunes the branch if the guard is false. 317 318 The \CFA waituntil statement blocks a thread until a set of resources have become available that satisfy the underlying predicate. 319 The waiting condition of the waituntil statement can be represented as a predicate over the resources, joined by the waituntil operators, where a resource is @true@ if it is available, and @false@ otherwise. 320 In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the waituntil. 321 Leveraging the compiler, a routine is generated per waituntil that is passed the statuses of the resources and returns a boolean that is @true@ when the waituntil is done, and false otherwise. 322 To support guards on the \CFA waituntil statement, the status of a resource disabled by a guard is set to ensure that the predicate function behaves as if that resource is no longer part of the predicate. 323 324 In \uC's @_Select@, it supports operators both inside and outside the clauses of their statement. 325 \eg in the following example the code blocks will run once their corresponding predicate inside the round braces is satisfied. 326 327 % C_TODO put this is uC++ code style not cfa-style 328 \begin{cfa} 703 As an optimization, when the internal nodes are updated, the subtrees marked as @true@ are pruned and not examined again. 704 To support statement guards in \uC, the tree is modified to remove an internal node if a guard is false to maintain the appropriate predicate representation. 705 706 \begin{figure} 707 \begin{center} 708 \input{diagrams/uCpp_select_tree.tikz} 709 \end{center} 710 \caption{\uC \lstinline[language=uC++]{select} tree modification} 711 \label{f:uC_select_tree} 712 \end{figure} 713 714 The \CFA @waituntil@ statement blocks a thread until a set of resources have become available that satisfy the complete predicate of a @waituntil@. 
715 The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise. 716 This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@. 717 Leveraging the compiler, a predicate routine is generated per @waituntil@ that, when passed the statuses of the resources, returns @true@ when the @waituntil@ is done, and @false@ otherwise. 718 To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate. 719 The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values. 720 For example, the following is generated for the complete predicate above: 721 \begin{cfa} 722 // statement completion predicate 723 bool check_completion( select_node * nodes ) { 724 return nodes[0].status && nodes[1].status || nodes[2].status; 725 } 726 if ( GA || GB || GC ) { $\C{// skip statement if all guards false}$ 727 select_node nodes[3]; 728 nodes[0].status = ! GA && GB; $\C{// A's status}$ 729 nodes[1].status = ! GB && GA; $\C{// B's status}$ 730 nodes[2].status = ! GC; $\C{// C's status}$ 731 // ... rest of waituntil codegen ... 732 } 733 \end{cfa} 734 735 \uC's @_Select@ supports operators both inside and outside of the \lstinline[language=uC++]{_Select} clauses. 736 In the following example, the code blocks run once their corresponding predicate inside the round braces is satisfied. 737 \begin{lstlisting}[language=uC++,{moredelim=**[is][\color{red}]{@}{@}}] 329 738 Future_ISM<int> A, B, C, D, E; 330 _Select( A || B && C ) { ... } 331 and _Select( D && E ) { ... } 332 \end{cfa} 333 334 This is more expressive that the waituntil statement in \CFA.
335 In \CFA, since the waituntil statement supports more resources than just futures, implmenting operators inside clauses was avoided for a few reasons. 336 As an example, suppose \CFA supported operators inside clauses and consider the code snippet in Figure~\ref{f:wu_inside_op}. 337 338 \begin{figure} 739 _Select( @A || B && C@ ) { ... } 740 and _Select( @D && E@ ) { ... } 741 \end{lstlisting} 742 This feature is more expressive than the @waituntil@ statement in \CFA, allowing the code block for @&&@ to only run after \emph{both} resources are available. 743 744 In \CFA, since the @waituntil@ statement supports more resources than just futures, implementing operators inside clauses is avoided for a few reasons. 745 As a motivating example, suppose \CFA supported operators inside clauses as in: 339 746 \begin{cfa} 340 747 owner_lock A, B, C, D; … … 342 749 or waituntil( C && D ) { ... } 343 750 \end{cfa} 344 \caption{Example of unsupported operators inside clauses in \CFA.} 345 \label{f:wu_inside_op} 346 \end{figure} 347 348 If the waituntil in Figure~\ref{f:wu_inside_op} works with the same semantics as described and acquires each lock as it becomes available, it opens itself up to possible deadlocks since it is now holding locks and waiting on other resources. 349 As such other semantics would be needed to ensure that this operation is safe. 350 One possibility is to use \CC's @scoped_lock@ approach that was described in Section~\ref{s:DeadlockAvoidance}, however the potential for livelock leaves much to be desired. 351 Another possibility would be to use resource ordering similar to \CFA's @mutex@ statement, but that alone is not sufficient if the resource ordering is not used everywhere. 352 Additionally, using resource ordering could conflict with other semantics of the waituntil statement. 353 To show this conflict, consider if the locks in Figure~\ref{f:wu_inside_op} were ordered @D@, @B@, @C@, @A@. 
354 If all the locks are available, it becomes complex to both respect the ordering of the waituntil in Figure~\ref{f:wu_inside_op} when choosing which code block to run and also respect the lock ordering of @D@, @B@, @C@, @A@ at the same time. 751 If the @waituntil@ acquires each lock as it becomes available, there is a possible deadlock since it is in a hold-and-wait situation. 752 Other semantics are needed to ensure this operation is safe. 753 One possibility is to use \CC's @scoped_lock@ approach described in Section~\ref{s:DeadlockAvoidance}; 754 however, that opens the potential for livelock. 755 Another possibility is to use resource ordering similar to \CFA's @mutex@ statement, but that alone is insufficient if the resource ordering is not used universally. 355 756 One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach.
360 361 \section{Waituntil Performance} 362 The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA waituntil statement are the Go @select@ statement and the \uC @_Select@ statement. 363 As such, two microbenchmarks are presented, one for Go and one for \uC to contrast the systems. 364 The similar utilities discussed at the start of this chapter in C, Ada, Rust, \CC, and OCaml are either not meaningful or feasible to benchmark against. 365 The select(2) and related utilities in C are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the waituntil exists solely in userspace. 366 Ada's @select@ only operates on methods, which is done in \CFA via the @waitfor@ utility so it is not feasible to benchmark against the @waituntil@, which cannot wait on the same resource. 367 Rust and \CC only offer a busy-wait based approach which is not meaningly comparable to a blocking approach. 368 OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, which makes the OCaml @select@ infeasible to compare it with Go's @select@ and \CFA's @waituntil@. 369 Given the differences in features, polymorphism, and expressibility between the waituntil and @select@, and @_Select@, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner. 757 This approach does not work due to \gls{toctou} issues, \ie it is impossible to ensure that the full set of resources are available without holding them all first. 758 Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems. 759 Finally, the problem of operators inside clauses is also difficult to handle when supporting channels. 760 It would require some way to ensure channels used with internal operators are modified if and only if the corresponding code block is run.
761 However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}. 762 Ultimately, I did not deem this feature to be crucial enough when taking into account the complexity and runtime cost. 763 764 \subsection{The full \lstinline{waituntil} picture} 765 766 Given all the complex details handled by the @waituntil@, its full pseudocode is presented in Figure~\ref{f:WU_Full_Impl}. 767 Some things to note are as follows. 768 The @finally@ blocks provide exception-safe \gls{raii} unregistering of nodes, and in particular, the @finally@ inside the innermost loop performs the immediate unregistering required for deadlock-freedom mentioned in Section~\ref{s:wu_locks}. 769 The @when_conditions@ array is used to store the boolean result of evaluating each guard at the beginning of the @waituntil@, and it is used to conditionally omit operations on resources with @false@ guards. 770 As discussed in Section~\ref{s:wu_chans}, this pseudocode includes conditional code-block execution based on the result of both @on_selected@ and @unregister_select@, which allows the channel implementation to ensure all available channel resources have their corresponding code block run. 771 772 \begin{figure} 773 \begin{cfa} 774 bool when_conditions[N]; 775 for ( node in nodes ) $\C[3.75in]{// evaluate guards}$ 776 if ( node has guard ) 777 when_conditions[node] = node_guard; 778 else 779 when_conditions[node] = true; 780 781 if ( any when_conditions[node] are true ) { 782 select_nodes nodes[N]; $\C{// declare N select nodes}$ 783 try { 784 // ... set statuses for nodes with when_conditions[node] == false ...
785 786 for ( node in nodes ) $\C{// register nodes}$ 787 if ( when_conditions[node] ) 788 register_select( resource, node ); 789 790 while ( !check_completion( nodes ) ) { $\C{// check predicate}$ 791 // block 792 for ( resource in waituntil statement ) { $\C{// run true code blocks}$ 793 if ( check_completion( nodes ) ) break; 794 if ( resource is avail ) 795 try { 796 if ( on_selected( resource ) ) $\C{// conditionally run block}$ 797 run code block 798 } finally $\C{// for exception safety}$ 799 unregister_select( resource, node ); $\C{// immediate unregister}$ 800 } 801 } 802 } finally { $\C{// for exception safety}$ 803 for ( registered nodes in nodes ) $\C{// deregister nodes}$ 804 if ( when_conditions[node] && unregister_select( resource, node ) 805 && on_selected( resource ) ) 806 run code block $\C{// run code block upon unregister}\CRT$ 807 } 808 } 809 \end{cfa} 810 \caption{Full \lstinline{waituntil} Pseudocode Implementation} 811 \label{f:WU_Full_Impl} 812 \end{figure} 813 814 \section{\lstinline{waituntil} Performance} 815 816 Similar facilities to @waituntil@ are discussed in Section~\ref{s:History} covering C, Ada, Rust, \CC, and OCaml. 817 However, these facilities are neither meaningful nor feasible to benchmark against. 818 The UNIX @select@ and related utilities are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the @waituntil@ exists solely in user space. 819 Ada's \lstinline[language=Ada]{select} and \uC's \lstinline[language=uC++]{_Accept} only operate on method calls, which is done in \CFA via the @waitfor@ statement. 820 Rust and \CC only offer a busy-wait approach, which is not comparable to a blocking approach. 821 OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.
822 823 The two \gls{synch_multiplex} utilities that are in the realm of comparability with the \CFA @waituntil@ statement are the Go \lstinline[language=Go]{select} statement and the \uC \lstinline[language=uC++]{_Select} statement. 824 As such, two microbenchmarks are presented, one for Go and one for \uC to contrast this feature. 825 Given the differences in features, polymorphism, and expressibility between @waituntil@ and \lstinline[language=Go]{select}, and \uC \lstinline[language=uC++]{_Select}, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner. 370 826 371 827 \subsection{Channel Benchmark} 372 The channel microbenchmark compares \CFA's waituntil and Go's select, where the resource being waited on is a set of channels. 373 374 %C_TODO explain benchmark 375 376 %C_TODO show results 377 378 %C_TODO discuss results 828 829 The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels. 830 The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads. 831 The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on. 832 Each producer and consumer repeatedly waits to either produce or consume from one of the $C$ clauses and respective channels. 833 For example, in \CFA syntax, the work loop in the consumer main with $C = 4$ clauses is: 834 \begin{cfa} 835 for () 836 waituntil( val << chans[0] ); or waituntil( val << chans[1] ); 837 or waituntil( val << chans[2] ); or waituntil( val << chans[3] ); 838 \end{cfa} 839 A successful consumption is counted as a channel operation, and the throughput of these operations is measured over 10 seconds. 
840 The first benchmark measures throughput of the producers and consumers synchronously waiting on the channels and the second has the threads asynchronously wait on the channels using the Go @default@ and \CFA @else@ clause. 841 The results are shown in Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} respectively. 842 843 \begin{figure} 844 \centering 845 \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize} 846 \subfloat[AMD]{ 847 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_2.pgf}} 848 } 849 \subfloat[Intel]{ 850 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_2.pgf}} 851 } 852 \bigskip 853 854 \subfloat[AMD]{ 855 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_4.pgf}} 856 } 857 \subfloat[Intel]{ 858 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_4.pgf}} 859 } 860 \bigskip 861 862 \subfloat[AMD]{ 863 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Contend_8.pgf}} 864 } 865 \subfloat[Intel]{ 866 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Contend_8.pgf}} 867 } 868 \caption{The channel synchronous multiplexing benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).} 869 \label{f:select_contend_bench} 870 \end{figure} 871 872 \begin{figure} 873 \centering 874 \captionsetup[subfloat]{labelfont=footnotesize,textfont=footnotesize} 875 \subfloat[AMD]{ 876 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_2.pgf}} 877 } 878 \subfloat[Intel]{ 879 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_2.pgf}} 880 } 881 \bigskip 882 883 \subfloat[AMD]{ 884 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_4.pgf}} 885 } 886 \subfloat[Intel]{ 887 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_4.pgf}} 888 } 889 \bigskip 890 891 \subfloat[AMD]{ 892 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Spin_8.pgf}} 893 } 894 \subfloat[Intel]{ 895 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Spin_8.pgf}} 896 } 897 \caption{The
asynchronous multiplexing channel benchmark comparing Go select and \CFA \lstinline{waituntil} statement throughput (higher is better).} 898 \label{f:select_spin_bench} 899 \end{figure} 900 901 Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} have similar results when comparing \lstinline[language=Go]{select} and @waituntil@. 902 In the AMD benchmarks (left column), the performance is very similar as the number of cores scales. 903 The AMD machine has a high cache-contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go. 904 Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low cores, where Go has significantly better performance, due to an optimization in its scheduler. 905 Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}. 906 In the Intel benchmarks (right column), \CFA performs better than Go as the number of cores scales past 2/4 and as the number of clauses increases. 907 This difference is due to Go's acquiring all channel locks when registering and unregistering channels on a \lstinline[language=Go]{select}. 908 Go is then holding a lock for every channel, resulting in worse performance as the number of channels increases. 909 In \CFA, since races are consolidated without holding all locks, it scales much better both with cores and clauses since more work can occur in parallel. 910 This scalability difference is more significant on the Intel machine than the AMD machine since the Intel has lower cache-contention costs. 911 912 The Go approach of holding all internal channel-locks in the \lstinline[language=Go]{select} has additional drawbacks.
913 There are pathological cases where Go's throughput has significant jitter. 914 Consider a producer and consumer thread, @P1@ and @C1@, selecting from both channels @A@ and @B@. 915 \begin{cquote} 916 \begin{tabular}{@{}ll@{}} 917 @P1@ & @C1@ \\ 918 \begin{cfa} 919 waituntil( A << i ); or waituntil( B << i ); 920 \end{cfa} 921 & 922 \begin{cfa} 923 waituntil( val << A ); or waituntil( val << B ); 924 \end{cfa} 925 \end{tabular} 926 \end{cquote} 927 Additionally, there is another producer and consumer thread, @P2@ and @C2@, operating solely on @B@. 928 \begin{cquote} 929 \begin{tabular}{@{}ll@{}} 930 @P2@ & @C2@ \\ 931 \begin{cfa} 932 B << val; 933 \end{cfa} 934 & 935 \begin{cfa} 936 val << B; 937 \end{cfa} 938 \end{tabular} 939 \end{cquote} 940 In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired. 941 Interestingly, this case may not be as pathological as it seems. 942 If the set of channels belonging to a \lstinline[language=Go]{select} has channels that overlap with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel. 943 The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism. 944 Comparison of this pathological case is shown in Table~\ref{t:pathGo}. 945 The AMD results highlight the worst-case scenario for Go since contention is more costly on this machine than the Intel machine.
946 947 \begin{table}[t] 948 \centering 949 \setlength{\extrarowheight}{2pt} 950 \setlength{\tabcolsep}{5pt} 951 952 \caption{Throughput (channel operations per second) of \CFA and Go in a pathological case for contention in Go's select implementation} 953 \label{t:pathGo} 954 \begin{tabular}{r|r|r} 955 & \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c}{Go} \\ 956 \hline 957 AMD & \input{data/nasus_Order} \\ 958 \hline 959 Intel & \input{data/pyke_Order} 960 \end{tabular} 961 \end{table} 962 963 Another difference between Go and \CFA is the order of clause selection when multiple clauses are available. 964 Go \emph{randomly} selects a clause~\cite{go:select}, but \CFA chooses in the order clauses are listed. 965 This \CFA design decision allows users to set implicit priorities, which can result in more predictable behaviour and even better performance. 966 In the previous example, threads @P1@ and @C1@ prioritize channel @A@ in the @waituntil@, which can reduce contention for threads @P2@ and @C2@ accessing channel @B@. 967 If \CFA did not have priorities, the performance difference in Table~\ref{t:pathGo} would be significantly less due to extra contention on channel @B@.
974 The @waituntil@ statement checks for statement completion using a predicate function, whereas the \lstinline[language=uC++]{_Select} statement maintains a tree that represents the state of the internal predicate. 975 976 This benchmark aims to indirectly measure the impact of various predicates on the performance of the @waituntil@ and \lstinline[language=uC++]{_Select} statements. 977 The benchmark is indirect since the performance of futures in \CFA and \uC differs by a significant margin. 978 The experiment has a server, which cycles fulfilling three futures, @A@, @B@, and @C@, and a client, which waits for these futures to be fulfilled using four different kinds of predicates given in \CFA: 979 \begin{cquote} 980 \begin{tabular}{@{}l|l@{}} 981 OR & AND \\ 982 \hline 983 \begin{cfa} 984 waituntil( A ) { get( A ); } 985 or waituntil( B ) { get( B ); } 986 or waituntil( C ) { get( C ); } 987 \end{cfa} 988 & 989 \begin{cfa} 990 waituntil( A ) { get( A ); } 991 and waituntil( B ) { get( B ); } 992 and waituntil( C ) { get( C ); } 993 \end{cfa} 994 \\ 995 \multicolumn{2}{@{}c@{}}{} \\ 996 AND-OR & OR-AND \\ 997 \hline 998 \begin{cfa} 999 waituntil( A ) { get( A ); } 1000 and waituntil( B ) { get( B ); } 1001 or waituntil( C ) { get( C ); } 1002 \end{cfa} 1003 & 1004 \begin{cfa} 1005 @(@ waituntil( A ) { get( A ); } 1006 or waituntil( B ) { get( B ); } @)@ 1007 and waituntil( C ) { get( C ); } 1008 \end{cfa} 1009 \end{tabular} 1010 \end{cquote} 1011 The server and client use a low-cost synchronization after each fulfillment, so the server does not race ahead of the client. 1012 1013 Results of this benchmark are shown in Figure~\ref{f:futurePerf}. 1014 Each pair of bars is marked with the predicate name for that experiment and the value at the top of each bar is the standard deviation. 1015 In detail, \uC results are lower in all cases due to the performance difference between futures and the more complex \gls{synch_multiplex} implementation.
1016 However, the bars for both systems have similar height patterns across the experiments. 1017 The @OR@ column for \CFA is more performant than the other \CFA predicates, due to the special-casing of @waituntil@ statements with only @or@ operators. 1018 For both \uC and \CFA, the @AND@ experiment is the least performant, which is expected since all three futures need to be fulfilled for each statement completion. 1019 Interestingly, \CFA has lower variation across predicates on the AMD (excluding the special OR case), whereas \uC has lower variation on the Intel. 1020 Given the differences in semantics and implementation between \uC and \CFA, this test only illustrates the overall costs among the different kinds of predicates. 1021 1022 \begin{figure} 1023 \centering 1024 \subfloat[AMD Future Synchronization Benchmark]{ 1025 \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Future.pgf}} 1026 \label{f:futureAMD} 1027 } 1028 \subfloat[Intel Future Synchronization Benchmark]{ 1029 \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Future.pgf}} 1030 \label{f:futureIntel} 1031 } 1032 \caption{\CFA \lstinline{waituntil} and \uC \lstinline{_Select} statement throughput synchronizing on a set of futures with varying wait predicates (higher is better).} 1033 \label{f:futurePerf} 1034 \end{figure}