Changes in / [4c2e561:3397eed]
Files: 1 deleted, 12 edited
- doc/theses/colby_parsons_MMAth/Makefile (modified) (1 diff)
- doc/theses/colby_parsons_MMAth/diagrams/steal.tikz (deleted)
- doc/theses/colby_parsons_MMAth/local.bib (modified) (1 diff)
- doc/theses/colby_parsons_MMAth/text/channels.tex (modified) (1 diff)
- doc/theses/colby_parsons_MMAth/text/conclusion.tex (modified) (3 diffs)
- doc/theses/colby_parsons_MMAth/text/intro.tex (modified) (1 diff)
- doc/theses/colby_parsons_MMAth/text/waituntil.tex (modified) (20 diffs)
- doc/theses/colby_parsons_MMAth/thesis.tex (modified) (1 diff)
- src/Common/ScopedMap.h (modified) (3 diffs)
- src/Concurrency/KeywordsNew.cpp (modified) (2 diffs)
- src/GenPoly/ErasableScopedMap.h (modified) (2 diffs)
- src/GenPoly/ScopedSet.h (modified) (8 diffs)
- src/SymTab/FixFunction.cc (modified) (2 diffs)
doc/theses/colby_parsons_MMAth/Makefile
r4c2e561 r3397eed
 23  23   text/channels \
 24  24   text/waituntil \
 25     - text/conclusion \
 26  25   }
 27  26
doc/theses/colby_parsons_MMAth/local.bib
r4c2e561 r3397eed
168 168   }
169 169
    170 + @misc{openmp,
    171 + author = "OpenMP",
    172 + title = "OPENMP API Specification",
    173 + howpublished = {\href{https://www.openmp.org/spec-html/5.0/openmpch1.html}},
    174 + note = "[Online; accessed 23-May-2023]"
    175 + }
    176 +
170 177   @techreport{wilson94,
171 178   title={The suif compiler system: a parallelizing and optimizing research compiler},
doc/theses/colby_parsons_MMAth/text/channels.tex
r4c2e561 r3397eed
 80  80   This approach is similar to wait morphing for locks~\cite[p.~82]{Butenhof97} and improves performance in a few ways.
 81  81   First, each thread interacting with the channel only acquires and releases the internal channel lock once.
 82     - As a result, contention on the internal lock is decreased; only entering threads compete for the lock since unblocking threads do not reacquire the lock.
     82 + As a result, contention on the internal lock is decreased, as only entering threads compete for the lock as unblocking threads do not reacquire the lock.
 83  83   The other advantage of Go's wait-morphing approach is that it eliminates the bottleneck of waiting for signalled threads to run.
 84  84   Note, the property of acquiring/releasing the lock only once can also be achieved with a different form of cooperation, called \Newterm{baton passing}.
doc/theses/colby_parsons_MMAth/text/conclusion.tex
r4c2e561 r3397eed
  5   5   % ======================================================================
  6   6   This thesis presented a suite of safe and efficient concurrency tools that provide users with the means to write scalable programs in \CFA through many avenues.
  7     - If users prefer the message passing paradigm of concurrency, \CFA now provides message passing tools in the form of a performant actor system and channels.
      7 + If users prefer the message passing paradigm of concurrency, \CFA now provides tools in the form of a performant actor system and channels.
  8   8   For shared memory concurrency the mutex statement provides a safe and easy-to-use interface for mutual exclusion.
  9     - The waituntil statement provided by this works aids in writing concurrent programs in both the message passing and shared memory paradigms of concurrency.
 10     - Furthermore, no other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's waituntil.
      9 + The waituntil statement provided by this works aids in writing concurrent programs in both the message passing and shared memory worlds of concurrency.
     10 + Furthermore no other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's waituntil.
 11  11   From the novel copy queue data structure in the actor system, to the plethora of user-supporting safety features, all these utilities build upon existing tools with value added.
 12  12
  …   …
 15  15   This thesis scratches the surface of implicit concurrency by providing an actor system.
 16  16   There is room for more implicit concurrency tools in \CFA.
 17     - User-defined implicit concurrency in the form of annotated loops or recursive functions exists in many other languages and libraries~\cite{uC++, OpenMP}. Similar implicit concurrency mechanisms could be implemented and expanded on in \CFA.
     17 + User-defined implicit concurrency in the form of annotated loops or recursive functions exists in many other languages and libraries~\cite{uC++,openmp} and could be implemented and expanded on in \CFA.
 18  18   Additionally, the problem of automatic parallelism of sequential programs via the compiler is an interesting research space that other languages have approached~\cite{wilson94,haskell:parallel} that could also be explored in \CFA.
     19 +
     20 +
     21 + \subsection{Synchronously Multiplexing System Calls}
     22 + There are many tools that try to sychronize on or asynchronously check I/O, since improvements in this area pay dividends in many areas of computer science~\cite{linux:select,linux:poll,linux:epoll,linux:iouring}.
     23 + Research on improving user-space tools to synchronize over I/O and other system calls is an interesting area to explore in the world of concurrent tooling.
 19  24
 20  25   \subsection{Advanced Actor Stealing Heuristics}
 21  26   In this thesis, two basic victim selection heuristics were chosen when implementing the work stealing actor system. Good victim selection is an active area of work stealing research, especially when taking into account NUMA effects and cache locality~\cite{barghi18,wolke17}. The actor system in \CFA is modular and exploration of other victim selection heuristics for queue stealing in \CFA could constitute future work. The other question that arises with work stealing is: When should a worker thread steal? Work stealing systems can often be too aggressive when stealing, causing the cost of the steal to be higher than what is saved by sharing the work. In the presented actor system, a worker thread steals work after it checks all its work queues for work twice and sees them all empty. Given that thief threads often have cycles to spare, there is room for a more nuanced approach to choosing when to steal.
 22     -
 23     - \subsection{Synchronously Multiplexing System Calls}
 24     - There are many tools that try to sychronously wait for or asynchronously check I/O, since improvements in this area pay dividends in many areas of computer science~\cite{linux:select,linux:poll,linux:epoll,linux:iouring}.
 25     - Research on improving user-space tools to synchronize over I/O and other system calls is an interesting area to explore in the world of concurrent tooling.
 26     -
     27 +
 27  28   \subsection{Better Atomic Operations}
  …   …
 33  34   The semantics and safety of these builtins require careful navigation since they require the user to have a nuanced understanding of concurrent memory ordering models to pass via flags.
 34  35   Furthermore, these atomics also often require a user to understand how to fence appropriately to ensure correctness.
 35     - All these problems and more could benefit from language support in \CFA.
 36     - Adding good language support for atomics is a difficult and nuanced problem, which if solved well would allow for easier and safer writing of low-level concurrent code.
 37     -
     36 + All these problems and more could benefit from language support, and adding said language support in \CFA could constitute a great research contribution, and allow for easier writing of low-level safe concurrent code.
doc/theses/colby_parsons_MMAth/text/intro.tex
r4c2e561 r3397eed
 13  13   The groundwork for concurrent features in \CFA was implemented by Thierry Delisle~\cite{Delisle18}, who contributed the threading system, coroutines, monitors and other tools.
 14  14   This thesis builds on top of that foundation by providing a suite of high-level concurrent features.
 15     - The features include a @mutex@ statement, channels, a @waituntil@ statement, and an actor system.
     15 + The features include a @mutex@ statement, channels and a @waituntil@ statement, and an actor system.
 16  16   All of these features exist in other programming in some shape or form, however this thesis extends the original ideas by improving performance, productivity, and safety.
doc/theses/colby_parsons_MMAth/text/waituntil.tex
r4c2e561 r3397eed
  5   5   % ======================================================================
  6   6
  7     - Consider the following motivating problem.
      7 + Consider the following motivational problem that we shall title the bathroom problem.
  8   8   There are @N@ stalls (resources) in a bathroom and there are @M@ people (threads).
  9     - Each stall has its own lock since only one person may occupy a stall at a time.
 10     - Humans tend to solve this problem in the following way.
 11     - They check if all of the stalls are occupied.
 12     - If not they enter and claim an available stall.
 13     - If they are all occupied, the people queue and watch the stalls until one is free and then enter and lock the stall.
 14     - This solution can be implemented on a computer easily if all threads are waiting on all stalls and agree to queue.
 15     - Now the problem is extended.
 16     - Some stalls are wheelchair accessible, some stalls are dirty and other stalls are clean.
 17     - Each person (thread) may choose some subset of dirty, clean and accessible stalls that they want to wait for.
 18     - Immediately the problem becomes much more difficult.
 19     - A single queue no longer fully solves the problem: What happens when there is a stall available that the person at the front of the queue will not choose?
 20     - The naive solution to this problem has each thread to spin indefinitely continually checking the stalls until an suitable one is free.
 21     - This is not good enough since this approach wastes cycles and results in no fairness among threads waiting for stalls as a thread will jump in the first stall available without any regard to other waiting threads.
 22     - Waiting for the first stall (resource) available without spinning is an example of \gls{synch_multiplex}, the ability to wait synchronously for a resource or set of resources.
      9 + Each stall has their own lock since only one person may occupy a stall at a time.
     10 + The standard way that humans solve this problem is that they check if the stalls are occupied and if they all are they queue and watch the stalls until one is free and then enter and lock the stall.
     11 + This solution is simple in real life, but can be difficult to implement in a concurrent context as it requires the threads to somehow wait on all the stalls at the same time.
     12 + The naive solution to this is for each thread to spin indefinitely continually checking the stalls until one is free.
     13 + This wastes cycles and also results in no fairness among threads waiting for stalls as a thread will jump in the first stall available without any regard to other waiting threads.
     14 + The ability to wait for the first stall available without spinning can be done with concurrent tools that provide \gls{synch_multiplex}, the ability to wait synchronously for a resource or set of resources.
 23  15
 24  16   \section{History of Synchronous Multiplexing}
 25  17   There is a history of tools that provide \gls{synch_multiplex}.
 26     - Some well known \gls{synch_multiplex} tools include unix system utilities: select(2)\cite{linux:select}, poll(2)\cite{linux:poll}, and epoll(7)\cite{linux:epoll}, and the select statement provided by Go\cite{go:selectref}.
 27     -
 28     - The theory surrounding \gls{synch_multiplex} was largely introduced by Hoare in his 1985 CSP book \cite{Hoare85} and his later work with Roscoe on the theoretical language Occam\cite{Roscoe88}.
     18 + Some of the most well known include the set of unix system utilities: select(2)\cite{linux:select}, poll(2)\cite{linux:poll}, and epoll(7)\cite{linux:epoll}, and the select statement provided by Go\cite{go:selectref}.
     19 +
     20 + Before one can examine the history of \gls{synch_multiplex} implementations in detail, the preceding theory must be discussed.
     21 + The theory surrounding this topic was largely introduced by Hoare in his 1985 CSP book \cite{Hoare85} and his later work with Roscoe on the theoretical language Occam\cite{Roscoe88}.
     22 + Both include guards for communication channels and the ability to wait for a single channel to be ready out of a set of channels.
 29  23   The work on Occam in \cite{Roscoe88} calls their \gls{synch_multiplex} primitive ALT, which waits for one resource to be available and then executes a corresponding block of code.
 30     - Waiting for one resource out of a set of resources can be thought of as a logical exclusive-or over the set of resources.
 31     - Both CSP and Occam include \Newterm{guards} for communication channels and the ability to wait for a single channel to be ready out of a set of channels.
     24 + Waiting for one resource out of a set of resources can be thought of as a logical exclusive or over the set of resources.
 32  25   Guards are a conditional operator similar to an @if@, except they apply to the resource being waited on.
 33  26   If a guard is false then the resource it guards is considered to not be in the set of resources being waited on.
 34     - Guards can be simulated using if statements, but to do so requires \[2^N\] if statements, where @N@ is the number of guards.
     27 + Guards can be simulated using if statements, but to do so requires \[2^N\] if cases, where @N@ is the number of guards.
 35  28   The equivalence between guards and exponential if statements comes from an Occam ALT statement rule~\cite{Roscoe88}, which is presented in \CFA syntax in Figure~\ref{f:wu_if}.
 36  29   Providing guards allows for easy toggling of waituntil clauses without introducing repeated code.
  …   …
 38  31   \begin{figure}
 39  32   \begin{cfa}
 40     - // CFA's guards use the keyword 'when'
 41     - when( predicate ) waituntil( A ) {}
     33 + when( predicate ) waituntil( A ) {}
 42  34   or waituntil( B ) {}
 43  35   // ===
  …   …
 53  45   \end{figure}
 54  46
 55     - When discussing \gls{synch_multiplex} implementations, one must discuss the resources being multiplexed.
 56     - While the aforementioned theory waits on channels, the earliest known implementation of a synchronous multiplexing tool, Unix's select(2)\cite{linux:select}, is multiplexed over file descriptors.
 57     - The select(2) system call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
 58     - Select(2) will block until either some subset of file descriptors are available or the timeout expires.
 59     - All file descriptors that are ready will be returned by modifying the argument sets to only contain the ready descriptors.
 60     - This early implementation differs from the theory presented in Occam and CSP; when the call from select(2) returns it may provide more than one ready file descriptor.
 61     - As such, select(2) has logical-or multiplexing semantics, whereas the theory described exclusive-or semantics.
     47 + Switching to implementations, it is important to discuss the resources being multiplexed.
     48 + While the aforementioned theory discusses waiting on channels, the earliest known implementation of a synchronous multiplexing tool, Unix's select(2), is multiplexed over file descriptors.
     49 + The select(2) system call takes in sets of file descriptors to wait on as arguments and an optional timeout.
     50 + Select will block until either some subset of file descriptors are available or the timeout expires.
     51 + All file descriptors that are ready will be returned.
     52 + This early implementation differs from the theory as when the call from select returns it may provide more than one ready file descriptor.
     53 + As such select has a logical or multiplexing semantics, whereas the theory described exclusive-or semantics.
 62  54   This is not a drawback.
 63  55   A user can easily achieve exclusive-or semantics with select by arbitrarily choosing only one of the returned descriptors to operate on.
 64     - Select(2) was followed by poll(2), which was later followed by epoll(7), with each successor improving upon drawbacks in their predecessors.
 65     - The syscall poll(2) improved on select(2) by allowing users to monitor descriptors with numbers higher than 1024 which was not supported by select.
 66     - Epoll(7) then improved on poll(2) to return the set of file descriptors; when one or more descriptors became available poll(2) would return the number of availables descriptors, but would not indicate which descriptors were ready.
     56 + Select was followed by poll(2), and later epoll(7), which both improved upon drawbacks in their predecessors.
     57 + The syscall poll(2) improved on select by allowing users to monitor descriptors with numbers higher than 1024 which was not supported by select.
     58 + Epoll then improved on poll to return the set of file descriptors since poll would only say that some descriptor from the set was ready but not return which ones were ready.
 67  59
 68  60   It is worth noting these \gls{synch_multiplex} tools mentioned so far interact directly with the operating system and are often used to communicate between processes.
 69     - Later, \gls{synch_multiplex} started to appear in user-space to support fast multiplexed concurrent communication between threads.
     61 + Later \gls{synch_multiplex} started to appear in user-space to support fast multiplexed concurrent communication between threads.
 70  62   An early example of \gls{synch_multiplex} is the select statement in Ada~\cite[\S~9.7]{Ichbiah79}.
 71  63   The select statement in Ada allows a task to multiplex over some subset of its own methods that it would like to @accept@ calls to.
 72     - Tasks in Ada are essentially objects that have their own thread, and as such have methods, fields, etc.
 73     - The Ada select statement has the same exclusive-or semantics and guards as ALT from Occam, however it multiplexes over methods on rather than multiplexing over channels.
 74     - A code block is associated with each @accept@, and the method that is accepted first has its corresponding code block run after the task unblocks.
     64 + Tasks in Ada can be thought of as threads which are an object of a specific class, and as such have methods, fields, etc.
     65 + This statement has the same exclusive-or semantics as ALT from Occam, and supports guards as described in Occam, however it multiplexes over methods on rather than multiplexing over channels.
     66 + A code block is associated with each @accept@, and the method that is @accept@ed first has its corresponding code block run after the task unblocks.
 75  67   In this way the select statement in Ada provides rendezvous points for threads, rather than providing some resource through message passing.
 76  68   The select statement in Ada also supports an optional timeout with the same semantics as select(2), and provides an @else@.
  …   …
 83  75   Go provides a timeout utility and also provides a @default@ clause which has the same semantics as Ada's @else@ clause.
 84  76
 85     - \uC provides \gls{synch_multiplex} over futures with their @_Select@ statement and Ada-style \gls{synch_multiplex} over monitor and task methods with their @_Accept@ statement~\cite{uC++}.
     77 + \uC provides \gls{synch_multiplex} over futures with their @_Select@ statement and Ada-style \gls{synch_multiplex} over monitor methods with their @_Accept@ statement~\cite{uC++}.
 86  78   Their @_Accept@ statement builds upon the select statement offered by Ada, by offering both @and@ and @or@ semantics, which can be used together in the same statement.
 87  79   These semantics are also supported for \uC's @_Select@ statement.
  …   …
 89  81
 90  82   There are many other languages that provide \gls{synch_multiplex}, including Rust's @select!@ over futures~\cite{rust:select}, OCaml's @select@ over channels~\cite{ocaml:channel}, and C++14's @when_any@ over futures~\cite{cpp:whenany}.
 91     - Note that while C++14 and Rust provide \gls{synch_multiplex}, their implementations leave much to be desired as they both rely on busy-waiting polling to wait on multiple resources.
     83 + Note that while C++14 and Rust provide \gls{synch_multiplex}, their implemetations leave much to be desired as they both rely on busy-waiting polling to wait on multiple resources.
 92  84
 93  85   \section{Other Approaches to Synchronous Multiplexing}
 94  86   To avoid the need for \gls{synch_multiplex}, all communication between threads/processes has to come from a single source.
 95  87   One key example is Erlang, in which each process has a single heterogenous mailbox that is the sole source of concurrent communication, removing the need for \gls{synch_multiplex} as there is only one place to wait on resources.
 96     - In a similar vein, actor systems circumvent the \gls{synch_multiplex} problem as actors are traditionally non-blocking, so they will never block in a behaviour and only block when waiting for the next message.
     88 + In a similar vein, actor systems circumvent the \gls{synch_multiplex} problem as actors are traditionally non-blocking, so they will only block when waiting for the next message.
 97  89   While these approaches solve the \gls{synch_multiplex} problem, they introduce other issues.
 98  90   Consider the case where a thread has a single source of communication (like erlang and actor systems) wants one of a set of @N@ resources.
  …   …
105  97   The new \CFA \gls{synch_multiplex} utility introduced in this work is the @waituntil@ statement.
106  98   There is a @waitfor@ statement in \CFA that supports Ada-style \gls{synch_multiplex} over monitor methods, so this @waituntil@ focuses on synchronizing over other resources.
107     - All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only supporting one resource to wait on: select(2) supports file descriptors, Go's select supports channel operations, \uC's select supports futures, and Ada's select supports monitor method calls.
     99 + All of the \gls{synch_multiplex} features mentioned so far are monomorphic, only supporting one resource to wait on, select(2) supports file descriptors, Go's select supports channel operations, \uC's select supports futures, and Ada's select supports monitor method calls.
108 100   The waituntil statement in \CFA is polymorphic and provides \gls{synch_multiplex} over any objects that satisfy the trait in Figure~\ref{f:wu_trait}.
109 101   No other language provides a synchronous multiplexing tool polymorphic over resources like \CFA's waituntil.
    102 + All others them tie themselves to some specific type of resource.
110 103
111 104   \begin{figure}
  …   …
128 121
129 122   Currently locks, channels, futures and timeouts are supported by the waituntil statement, but this will be expanded as other use cases arise.
130     - The @waituntil@ statement supports guarded clauses, like Ada, and Occam, supports both @or@, and @and@ semantics, like \uC, and provides an @else@ for asynchronous multiplexing. An example of \CFA waituntil usage is shown in Figure~\ref{f:wu_example}. In Figure~\ref{f:wu_example} the waituntil statement is waiting for either @Lock@ to be available or for a value to be read from @Channel@ into @i@ and for @Future@ to be fulfilled.
    123 + The waituntil statement supports guarded clauses, like Ada, and Occam, supports both @or@, and @and@ semantics, like \uC, and provides an @else@ for asynchronous multiplexing. An example of \CFA waituntil usage is shown in Figure~\ref{f:wu_example}. In Figure~\ref{f:wu_example} the waituntil statement is waiting for either @Lock@ to be available or for a value to be read from @Channel@ into @i@ and for @Future@ to be fulfilled. The semantics of the waituntil statement will be discussed in detail in the next section.
131 124
132 125   \begin{figure}
  …   …
147 140   \section{Waituntil Semantics}
148 141   There are two parts of the waituntil semantics to discuss, the semantics of the statement itself, \ie @and@, @or@, @when@ guards, and @else@ semantics, and the semantics of how the waituntil interacts with types like channels, locks and futures.
    142 + To start, the semantics of the statement itself will be discussed.
149 143
150 144   \subsection{Waituntil Statement Semantics}
151 145   The @or@ semantics are the most straightforward and nearly match those laid out in the ALT statement from Occam, the clauses have an exclusive-or relationship where the first one to be available will be run and only one clause is run.
152 146   \CFA's @or@ semantics differ from ALT semantics in one respect, instead of randomly picking a clause when multiple are available, the clause that appears first in the order of clauses will be picked.
153     - \eg in the following example, if @foo@ and @bar@ are both available, @foo@ will always be selected since it comes first in the order of @waituntil@ clauses.
    147 + \eg in the following example, if @foo@ and @bar@ are both available, @foo@ will always be selected since it comes first in the order of waituntil clauses.
154 148   \begin{cfa}
155 149   future(int) bar;
  …   …
160 154
161 155   The @and@ semantics match the @and@ semantics used by \uC.
162     - When multiple clauses are joined by @and@, the @waituntil@ will make a thread wait for all to be available, but will run the corresponding code blocks \emph{as they become available}.
    156 + When multiple clauses are joined by @and@, the waituntil will make a thread wait for all to be available, but will run the corresponding code blocks \emph{as they become available}.
163 157   As @and@ clauses are made available, the thread will be woken to run those clauses' code blocks and then the thread will wait again until all clauses have been run.
164 158   This allows work to be done in parallel while synchronizing over a set of resources, and furthermore gives a good reason to use the @and@ operator.
  …   …
174 168
175 169   The guards in the waituntil statement are called @when@ clauses.
176     - Each the boolean expression inside a @when@ is evaluated once before the waituntil statement is run.
    170 + The @when@ clause is passed a boolean expression.
    171 + All the @when@ boolean expressions are evaluated before the waituntil statement is run.
177 172   The guards in Occam's ALT effectively toggle clauses on and off, where a clause will only be evaluated and waited on if the corresponding guard is @true@.
178 173   The guards in the waituntil statement operate the same way, but require some nuance since both @and@ and @or@ operators are supported.
179     - This will be discussed further in Section~\ref{s:wu_guards}.
180 174   When a guard is false and a clause is removed, it can be thought of as removing that clause and its preceding operator from the statement.
181 175   \eg in the following example the two waituntil statements are semantically the same.
  …   …
196 190   The waituntil statement expects types to register and unregister themselves via calls to @register_select@ and @unregister_select@ respectively.
197 191   When a resource becomes available, @on_selected@ is run.
198     - Many types do not need @on_selected@, but it is provided since some types may need to perform some work or checks before the resource can be accessed in the code block.
    192 + Many types may not need @on_selected@, but it is provided since some types may need to check and set things before the resource can be accessed in the code block.
199 193   The register/unregister routines in the trait return booleans.
200 194   The return value of @register_select@ is @true@ if the resource is immediately available, and @false@ otherwise.
  …   …
206 200   The waituntil statement is not inherently complex, and can be described as a few steps.
207 201 The complexity of the statement comes from the consideration of race conditions and synchronization needed when supporting various primitives. 208 The basic steps of the waituntil statement are the following: 209 210 \begin{enumerate}[topsep=5pt,itemsep=3pt,parsep=0pt] 211 212 \item 202 The basic steps that the waituntil statement follows are the following. 203 213 204 First the waituntil statement creates a @select_node@ per resource that is being waited on. 214 205 The @select_node@ is an object that stores the waituntil data pertaining to one of the resources. 215 216 \item217 206 Then, each @select_node@ is then registered with the corresponding resource. 218 219 \item 220 The thread executing the waituntil then enters a loop that will loop until the @waituntil@ statement's predicate is satisfied. 207 The thread executing the waituntil then enters a loop that will loop until the entire waituntil statement being satisfied. 221 208 In each iteration of the loop the thread attempts to block. 222 209 If any clauses are satified the block will fail and the thread will proceed, otherwise the block succeeds. 223 210 After proceeding past the block all clauses are checked for completion and the completed clauses have their code blocks run. 211 Once the thread escapes the loop, the @select_nodes@ are unregistered from the resources. 224 212 In the case where the block suceeds, the thread will be woken by the thread that marks one of the resources as available. 225 226 \item227 Once the thread escapes the loop, the @select_nodes@ are unregistered from the resources.228 \end{enumerate}229 213 Pseudocode detailing these steps is presented in the following code block. 
214 230 215 \begin{cfa} 231 216 select_nodes s[N]; // N select nodes 232 217 for ( node in s ) 233 218 register_select( resource, node ); 234 while( statement predicatenot satisfied ) {219 while( statement not satisfied ) { 235 220 // try to block 236 221 for ( resource in waituntil statement ) … … 240 225 unregister_select( resource, node ); 241 226 \end{cfa} 242 These steps give a basic overview of how the statement works. 243 Digging into parts of the implementation will shed light on the specifics and provide more detail. 227 228 These steps give a basic, but mildly inaccurate overview of how the statement works. 229 Digging into some parts of the implementation will shed light on more of the specifics and provide some accuracy. 244 230 245 231 \subsection{Locks} 246 Locks are one of the resources supported by the @waituntil@statement.232 Locks are one of the resources supported in the waituntil statement. 247 233 When a thread waits on multiple locks via a waituntil, it enqueues a @select_node@ in each of the lock's waiting queues. 248 234 When a @select_node@ reaches the front of the queue and gains ownership of a lock, the blocked thread is notified. … … 262 248 \end{cfa} 263 249 264 The timeout implementation highlights a key part of the waituntil semantics, the expression inside a @waituntil()@ is evaluated once at the start of the @waituntil@ algorithm. 265 As such, calls to these @sleep@ and @timeout@ routines do not block, but instead return a type that supports the @is_selectable@ trait. 266 This feature leverages \CFA's ability to overload on return type; a call to @sleep@ outside a waituntil will call a different @sleep@ that does not return a type, which will block for the appropriate duration. 267 This mechanism of returning a selectable type is needed for types that want to support multiple operations such as channels that allow both reading and writing. 
250 The timeout implementation highlights a key part of the waituntil semantics, the expression is evaluated before the waituntil runs. 251 As such calls to @sleep@ and @timeout@ do not block, but instead return a type that supports the @is_selectable@ trait. 252 This mechanism is needed for types that want to support multiple operations such as channels that support reading and writing. 268 253 269 254 \subsection{Channels}\label{s:wu_chans} 270 To support both waiting on both reading and writing to channels, the op erators @?<<?@ and @?>>?@ are used read and write to a channel respectively, where the lefthand operand is the value being read into/writtenand the righthand operand is the channel.271 Channels require significant complexity to synchronously multiplexon for a few reasons.272 First, reading or writing to a channel is a mutating operation; 273 If a read or write to a channel occurs, the state of the channel has changed.274 In comparison, for standard locks and futures, if a lock is acquired then released or a future is ready but not accessed, the state of the lock and the future is not permanentlymodified.275 In this way , a waituntil over locks or futures that completes with resources available but not consumedis not an issue.255 To support both waiting on both reading and writing to channels, the opperators @?<<?@ and @?>>?@ are used to show reading and writing to a channel respectively, where the lefthand operand is the value and the righthand operand is the channel. 256 Channels require significant complexity to wait on for a few reasons. 257 The first reason is that reading or writing to a channel is a mutating operation. 258 What this means is that if a read or write to a channel occurs, the state of the channel has changed. 259 In comparison, for standard locks and futures, if a lock is acquired then released or a future is ready but not accessed, the states of the lock and the future are not modified. 
+ 260 In this way if a waituntil over locks or futures have some resources available that were not consumed, it is not an issue.
  276 261 However, if a thread modifies a channel on behalf of a thread blocked on a waituntil statement, it is important that the corresponding waituntil code block is run, otherwise there is a potentially erroneous mismatch between the channel state and associated side effects.
  277 262 As such, the @unregister_select@ routine has a boolean return that is used by channels to indicate when the operation was completed but the block was not run yet.
- 278 When the return is @true@, the corresponding code block is run after the unregister.
- 279 Furthermore, if both @and@ and @or@ operators are used, the @or@ operators have to stop behaving like exclusive-or semantics due to the race between channel operations and unregisters.
+ 263 As such some channel code blocks may be run as part of the unregister.
+ 264 Furthermore if there are both @and@ and @or@ operators, the @or@ operators stop behaving like exclusive-or semantics since this race between operations and unregisters exists.
  280 265
  281 266 It was deemed important that exclusive-or semantics were maintained when only @or@ operators were used, so this situation has been special-cased, and is handled by having all clauses race to set a value \emph{before} operating on the channel.
  … …
  294 279 However, due to TOCTOU issues, one cannot know that all resources are available without acquiring all the internal locks of channels in the subtree.
  295 280 This is not a good solution for two reasons.
- 296 It is possible that once all the locks are acquired the subtree is not satisfied and the locks must all be released.
- 297 This would incur a high cost for signalling threads and heavily increase contention on internal channel locks.
- 298 Furthermore, the @waituntil@ statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
+ 281 It is possible that once all the locks are acquired that the subtree is not satisfied and they must all be released.
+ 282 This would incur high cost for signalling threads and also heavily increase contention on internal channel locks.
+ 283 Furthermore, the waituntil statement is polymorphic and can support resources that do not have internal locks, which also makes this approach infeasible.
  299 284 As such, the exclusive-or semantics are lost when using both @and@ and @or@ operators since they can not be supported without significant complexity and hits to waituntil statement performance.
  300 285
- 301 Channels introduce another interesting consideration in their implementation.
- 302 Supporting both reading and writing to a channel in A @waituntil@ means that one @waituntil@ clause may be the notifier for another @waituntil@ clause.
- 303 This poses a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
- 304 When both a special-case @or@ is inserting to a channel on one thread and another thread is blocked in a special-case @or@ consuming from the same channel there is not one but two races that need to be consolidated by the inserting thread.
- 305 (This race can also occur in the mirrored case with a blocked producer and signalling consumer.)
- 306 For the producing thread to know that the insert succeeded, they need to win the race for their own waituntil and win the race for the other waituntil.
- 307
+ 286 The mechanism by which the predicate of the waituntil is checked is discussed in more detail in Section~\ref{s:wu_guards}.
+ 287
+ 288 Another consideration introduced by channels is that supporting both reading and writing to a channel in a waituntil means that one waituntil clause may be the notifier for another waituntil clause.
+ 289 This becomes a problem when dealing with the special-cased @or@ where the clauses need to win a race to operate on a channel.
+ 290 When you have both a special-case @or@ inserting on one thread and another special-case @or@ consuming is blocked on another thread there is not one but two races that need to be consolidated by the inserting thread.
+ 291 (The race can occur in the opposite case with a blocked producer and signalling consumer too.)
+ 292 For them to know that the insert succeeded, they need to win the race for their own waituntil and win the race for the other waituntil.
  308 293 Go solves this problem in their select statement by acquiring the internal locks of all channels before registering the select on the channels.
  309 294 This eliminates the race since no other threads can operate on the blocked channel since its lock will be held.
+ 295
  310 296 This approach is not used in \CFA since the waituntil is polymorphic.
  311 297 Not all types in a waituntil have an internal lock, and when using non-channel types acquiring all the locks incurs extra uneeded overhead.
  312 298 Instead this race is consolidated in \CFA in two phases by having an intermediate pending status value for the race.
- 313 This race case is detectable, and if detected the thread attempting to signal will first race to set the race flag to be pending.
+ 299 This case is detectable, and if detected the thread attempting to signal will first race to set the race flag to be pending.
  314 300 If it succeeds, it then attempts to set the consumer's race flag to its success value.
  315 301 If the producer successfully sets the consumer race flag, then the operation can proceed, if not the signalling thread will set its own race flag back to the initial value.
  … …
  325 311 In \uC and \CFA, their \gls{synch_multiplex} utilities involve both an @and@ and @or@ operator, which make the problem of checking for completion of the statement more difficult.
  326 312
- 327 In the \uC @_Select@ statement, this problem is solved by constructing a tree of the resources, where the internal nodes are operators and the leaves are booleans storing the state of each resource.
- 328 The internal nodes also store the statuses of the two subtrees beneath them.
- 329 When resources become available, their corresponding leaf node status is modified and then percolates up into the internal nodes to update the state of the statement.
+ 313 In the \uC @_Select@ statement, they solve this problem by constructing a tree of the resources, where the internal nodes are operators and the leafs are the resources.
+ 314 The internal nodes also store the status of each of the subtrees beneath them.
+ 315 When resources become available, their status is modified and the status of the leaf nodes percolate into the internal nodes update the state of the statement.
  330 316 Once the root of the tree has both subtrees marked as @true@ then the statement is complete.
- 331 As an optimization, when the internal nodes are updated, their subtrees marked as @true@ are pruned and are not touched again.
- 332 To support statement guards in \uC, the tree prunes a branch if the corresponding guard is false.
+ 317 As an optimization, when the internal nodes are updated, their subtrees marked as @true@ are effectively pruned and are not touched again.
+ 318 To support \uC's @_Select@ statement guards, the tree prunes the branch if the guard is false.
  333 319
  334 320 The \CFA waituntil statement blocks a thread until a set of resources have become available that satisfy the underlying predicate.
  335 321 The waiting condition of the waituntil statement can be represented as a predicate over the resources, joined by the waituntil operators, where a resource is @true@ if it is available, and @false@ otherwise.
  336 322 In \CFA, this representation is used as the mechanism to check if a thread is done waiting on the waituntil.
- 337 Leveraging the compiler, a predicate routine is generated per waituntil that when passed the statuses of the resources, returns @true@ when the waituntil is done, and false otherwise.
- 338 To support guards on the \CFA waituntil statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
- 339
- 340 \uC's @_Select@, supports operators both inside and outside of the clauses.
+ 323 Leveraging the compiler, a routine is generated per waituntil that is passed the statuses of the resources and returns a boolean that is @true@ when the waituntil is done, and false otherwise.
+ 324 To support guards on the \CFA waituntil statement, the status of a resource disabled by a guard is set to ensure that the predicate function behaves as if that resource is no longer part of the predicate.
+ 325
+ 326 In \uC's @_Select@, it supports operators both inside and outside the clauses of their statement.
  341 327 \eg in the following example the code blocks will run once their corresponding predicate inside the round braces is satisfied.
  342 328
  … …
  349 335
  350 336 This is more expressive that the waituntil statement in \CFA.
- 351 In \CFA, since the waituntil statement supports more resources than just futures, implementing operators inside clauses was avoided for a few reasons.
- 352 As a motivating example, suppose \CFA supported operators inside clauses and consider the code snippet in Figure~\ref{f:wu_inside_op}.
+ 337 In \CFA, since the waituntil statement supports more resources than just futures, implmenting operators inside clauses was avoided for a few reasons.
+ 338 As an example, suppose \CFA supported operators inside clauses and consider the code snippet in Figure~\ref{f:wu_inside_op}.
  353 339
  354 340 \begin{figure}
  … …
  363 349
  364 350 If the waituntil in Figure~\ref{f:wu_inside_op} works with the same semantics as described and acquires each lock as it becomes available, it opens itself up to possible deadlocks since it is now holding locks and waiting on other resources.
- 365 Other semantics would be needed to ensure that this operation is safe.
+ 351 As such other semantics would be needed to ensure that this operation is safe.
  366 352 One possibility is to use \CC's @scoped_lock@ approach that was described in Section~\ref{s:DeadlockAvoidance}, however the potential for livelock leaves much to be desired.
  367 353 Another possibility would be to use resource ordering similar to \CFA's @mutex@ statement, but that alone is not sufficient if the resource ordering is not used everywhere.
  … …
  370 356 If all the locks are available, it becomes complex to both respect the ordering of the waituntil in Figure~\ref{f:wu_inside_op} when choosing which code block to run and also respect the lock ordering of @D@, @B@, @C@, @A@ at the same time.
  371 357 One other way this could be implemented is to wait until all resources for a given clause are available before proceeding to acquire them, but this also quickly becomes a poor approach.
- 372 This approach won't work due to TOCTOU issues; it is not possible to ensure that the full set resources are available without holding them all first.
+ 358 This approach won't work due to TOCTOU issues, as it is not possible to ensure that the full set resources are available without holding them all first.
  373 359 Operators inside clauses in \CFA could potentially be implemented with careful circumvention of the problems involved, but it was not deemed an important feature when taking into account the runtime cost that would need to be paid to handle these situations.
  374 360 The problem of operators inside clauses also becomes a difficult issue to handle when supporting channels.
- 375 If internal operators were supported, it would require some way to ensure that channels used with internal operators are modified on if and only if the corresponding code block is run, but that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
+ 361 If internal operators were supported, it would require some way to ensure that channels with internal operators are modified on if and only if the corresponding code block is run, but that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
  376 362
  377 363 \section{Waituntil Performance}
  … …
  380 366 The similar utilities discussed at the start of this chapter in C, Ada, Rust, \CC, and OCaml are either not meaningful or feasible to benchmark against.
  381 367 The select(2) and related utilities in C are not comparable since they are system calls that go into the kernel and operate on file descriptors, whereas the waituntil exists solely in userspace.
- 382 Ada's @select@ only operates on methods, which is done in \CFA via the @waitfor@ utility so it is not meaningful to benchmark against the @waituntil@, which cannot wait on the same resource.
- 383 Rust and \CC only offer a busy-wait based approach which is not comparable to a blocking approach.
- 384 OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, so OCaml @select@ is not benchmarked against Go's @select@ and \CFA's @waituntil@.
- 385 Given the differences in features, polymorphism, and expressibility between @waituntil@ and @select@, and @_Select@, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner.
+ 368 Ada's @select@ only operates on methods, which is done in \CFA via the @waitfor@ utility so it is not feasible to benchmark against the @waituntil@, which cannot wait on the same resource.
+ 369 Rust and \CC only offer a busy-wait based approach which is not meaningly comparable to a blocking approach.
+ 370 OCaml's @select@ waits on channels that are not comparable with \CFA and Go channels, which makes the OCaml @select@ infeasible to compare it with Go's @select@ and \CFA's @waituntil@.
+ 371 Given the differences in features, polymorphism, and expressibility between the waituntil and @select@, and @_Select@, the aim of the microbenchmarking in this chapter is to show that these implementations lie in the same realm of performance, not to pick a winner.
  386 372
  387 373 \subsection{Channel Benchmark}
-
doc/theses/colby_parsons_MMAth/thesis.tex
r4c2e561 r3397eed
  206 206 \input{waituntil}
  207 207
- 208 \input{conclusion}
- 209
  210 208 %----------------------------------------------------------------------
  211 209 % END MATERIAL
-
src/Common/ScopedMap.h
r4c2e561 r3397eed
  199 199 friend class ScopedMap;
  200 200 friend class const_iterator;
- 201 typedef typename MapType::iterator wrapped_iterator;
- 202 typedef typename ScopeList::size_type size_type;
+ 201 typedef typename ScopedMap::MapType::iterator wrapped_iterator;
+ 202 typedef typename ScopedMap::ScopeList scope_list;
+ 203 typedef typename scope_list::size_type size_type;
  203 204
  204 205 /// Checks if this iterator points to a valid item
  … …
  219 220 }
  220 221
- 221 iterator(ScopeList & _scopes, const wrapped_iterator & _it, size_type inLevel)
+ 222 iterator(scope_list & _scopes, const wrapped_iterator & _it, size_type inLevel)
  222 223 : scopes(&_scopes), it(_it), level(inLevel) {}
  223 224 public:
  … …
  265 266
  266 267 private:
- 267 ScopeList *scopes;
+ 268 scope_list *scopes;
  268 269 wrapped_iterator it;
  269 270 size_type level;
-
src/Concurrency/KeywordsNew.cpp
r4c2e561 r3397eed
  534 534 void ConcurrentSueKeyword::addGetRoutines(
  535 535 const ast::ObjectDecl * field, const ast::FunctionDecl * forward ) {
- 536 // Clone the signature and then build the body.
- 537 ast::FunctionDecl * decl = ast::deepCopy( forward );
- 538
  539 536 // Say it is generated at the "same" places as the forward declaration.
- 540 const CodeLocation & location = decl->location;
- 541
- 542 const ast::DeclWithType * param = decl->params.front();
+ 537 const CodeLocation & location = forward->location;
+ 538
+ 539 const ast::DeclWithType * param = forward->params.front();
  543 540 ast::Stmt * stmt = new ast::ReturnStmt( location,
  544 541 new ast::AddressExpr( location,
  … …
  554 551 );
  555 552
+ 553 ast::FunctionDecl * decl = ast::deepCopy( forward );
  556 554 decl->stmts = new ast::CompoundStmt( location, { stmt } );
  557 555 declsToAddAfter.push_back( decl );
-
src/GenPoly/ErasableScopedMap.h
r4c2e561 r3397eed
  57 57 /// Starts a new scope
  58 58 void beginScope() {
- 59 scopes.emplace_back();
+ 59 Scope scope;
+ 60 scopes.push_back(scope);
  60 61 }
  61 62
  … …
  144 145 public std::iterator< std::bidirectional_iterator_tag, value_type > {
  145 146 friend class ErasableScopedMap;
- 146 typedef typename Scope::iterator wrapped_iterator;
- 147 typedef typename ScopeList::size_type size_type;
+ 147 typedef typename std::map< Key, Value >::iterator wrapped_iterator;
+ 148 typedef typename std::vector< std::map< Key, Value > > scope_list;
+ 149 typedef typename scope_list::size_type size_type;
  148 150
  149 151 /// Checks if this iterator points to a valid item
-
src/GenPoly/ScopedSet.h
r4c2e561 r3397eed
  47 47 /// Starts a new scope
  48 48 void beginScope() {
- 49 scopes.emplace_back();
+ 49 Scope scope;
+ 50 scopes.push_back(scope);
  50 51 }
  51 52
  … …
  84 85 iterator findNext( const_iterator &it, const Value &key ) {
  85 86 if ( it.i == 0 ) return end();
- 86 for ( size_type i = it.i - 1; ; --i ) {
+ 87 for ( size_type i = it.i - 1; ; --i ) {
  87 88 typename Scope::iterator val = scopes[i].find( key );
  88 89 if ( val != scopes[i].end() ) return iterator( scopes, val, i );
  … …
  111 112 friend class ScopedSet;
  112 113 friend class const_iterator;
- 113 typedef typename Scope::iterator wrapped_iterator;
- 114 typedef typename ScopeList::size_type size_type;
+ 114 typedef typename std::set< Value >::iterator wrapped_iterator;
+ 115 typedef typename std::vector< std::set< Value > > scope_list;
+ 116 typedef typename scope_list::size_type size_type;
  115 117
  116 118 /// Checks if this iterator points to a valid item
  … …
  131 133 }
  132 134
- 133 iterator(ScopeList const &_scopes, const wrapped_iterator &_it, size_type _i)
+ 135 iterator(scope_list const &_scopes, const wrapped_iterator &_it, size_type _i)
  134 136 : scopes(&_scopes), it(_it), i(_i) {}
  135 137 public:
  … …
  174 176
  175 177 private:
- 176 ScopeList const *scopes;
+ 178 scope_list const *scopes;
  177 179 wrapped_iterator it;
  178 180 size_type i;
  … …
  183 185 public std::iterator< std::bidirectional_iterator_tag, value_type > {
  184 186 friend class ScopedSet;
- 185 typedef typename Scope::iterator wrapped_iterator;
- 186 typedef typename Scope::const_iterator wrapped_const_iterator;
- 187 typedef typename ScopeList::size_type size_type;
+ 187 typedef typename std::set< Value >::iterator wrapped_iterator;
+ 188 typedef typename std::set< Value >::const_iterator wrapped_const_iterator;
+ 189 typedef typename std::vector< std::set< Value > > scope_list;
+ 190 typedef typename scope_list::size_type size_type;
  188 191
  189 192 /// Checks if this iterator points to a valid item
  … …
  204 207 }
  205 208
- 206 const_iterator(ScopeList const &_scopes, const wrapped_const_iterator &_it, size_type _i)
+ 209 const_iterator(scope_list const &_scopes, const wrapped_const_iterator &_it, size_type _i)
  207 210 : scopes(&_scopes), it(_it), i(_i) {}
  208 211 public:
  … …
  252 255
  253 256 private:
- 254 ScopeList const *scopes;
+ 257 scope_list const *scopes;
  255 258 wrapped_const_iterator it;
  256 259 size_type i;
-
src/SymTab/FixFunction.cc
r4c2e561 r3397eed
  109 109
  110 110 const ast::DeclWithType * postvisit( const ast::FunctionDecl * func ) {
- 111 // Cannot handle cases with asserions.
- 112 assert( func->assertions.empty() );
- 113 return new ast::ObjectDecl{
- 114 func->location, func->name, new ast::PointerType( func->type ), nullptr,
+ 111 return new ast::ObjectDecl{
+ 112 func->location, func->name, new ast::PointerType{ func->type }, nullptr,
  115 113 func->storage, func->linkage, nullptr, copy( func->attributes ) };
  116 114 }
  … …
  119 117
  120 118 const ast::Type * postvisit( const ast::ArrayType * array ) {
- 121 return new ast::PointerType{
- 122 array->base, array->dimension, array->isVarLen, array->isStatic,
+ 119 return new ast::PointerType{
+ 120 array->base, array->dimension, array->isVarLen, array->isStatic,
  123 121 array->qualifiers };
  124 122 }