Context Navigation

← Previous Change
Next Change →

waituntil.tex

Timestamp:

Sep 5, 2023, 6:08:28 PM (3 years ago)

Author:

Michael Brooks <mlbrooks@…>

Branches:

master, stuck-waitfor-destruct

Children:

1fc111c

Parents:

737988b (diff), aae9c17 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:

: 1 edited

doc/theses/colby_parsons_MMAth/text/waituntil.tex (modified) (11 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/theses/colby_parsons_MMAth/text/waituntil.tex

-              r737988b
+              refe39bb
 When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
 While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors.
+While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors, which conceptually differ from channels in name only.
 The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
 @select@ blocks until either some subset of file descriptors are available or the timeout expires.
 …
 Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
 Note, a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
+Note that a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
 The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
 Then the appropriate \lstinline[language=ada]{accept} method is run with the called arguments.
 …
 The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing.
 Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second.
 Note, the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.
+Note that the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.
 \section{Waituntil Semantics}
 …
 If the @and@ operator waited for all clauses to be available before running, it is the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements.
 As for normal C expressions, the @and@ operator binds more tightly than the @or@.
+As with normal C expressions, the @and@ operator binds more tightly than the @or@.
 To give an @or@ operator higher precedence, parenthesis are used.
 For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parenthesis.
 …
 \end{cfa}
 If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds.
 Note, the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.
+Note that the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.
 The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of time elapsed since the call started.
 …
 Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 execute its second.
 Any other execution scenario is incorrect for exclusive-or semantics.
 Note, that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread.
+Note that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread.
 The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels.
 …
 The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
 This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@.
+Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passes the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise.
+Leveraging the compiler, a predicate routine is generated per @waituntil@.
+This predicate routine accepts the statuses of the resources being waited on as arguments.
+A resource status is an integer that indicates whether a resource is either not available, available, or has already run its associated code block.
+The predicate routine returns @true@ when the @waituntil@ is done, \ie enough resources have run their associated code blocks to satisfy the @waituntil@'s predicate, and false otherwise.
 To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
 The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values.
 …
 It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run.
 However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
 Ultimately, I did not deemed this feature to be crucial enough when taking into account the complexity and runtime cost.
+Ultimately, this feature was not considered crucial after taking into account the complexity and runtime cost.
 \subsection{The full \lstinline{waituntil} picture}
 …
 The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels.
+Although Unix's select, poll and epoll multiplex over file descriptors, which are effectively channels, these utilities are not included in the benchmarks.
+Reading or writing to a file descriptor requires a system call which is much more expensive than operating on a user-space channel.
+As such, it is infeasible to compare Unix's select, poll and epoll with \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}.
 The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads.
 The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on.
 …
 Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} have similar results when comparing \lstinline[language=Go]{select} and @waituntil@.
 In the AMD benchmarks (left column), the performance is very similar as the number of cores scale.
 The AMD machine has a high-caching contention cost because of its \emph{chicklet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go.
+The AMD machine has a high-caching contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go.
 Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low cores, where Go has significantly better performance, due to an optimization in its scheduler.
 Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}.
 …
 \end{cquote}
 In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
 Interesting, this case may not be as pathological as it seems.
+Interestingly, this case may not be as pathological as it seems.
 If the set of channels belonging to a \lstinline[language=Go]{select} have channels that overlap with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel.
 The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism.

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset efe39bb for doc/theses/colby_parsons_MMAth/text/waituntil.tex

Legend:

doc/theses/colby_parsons_MMAth/text/waituntil.tex

Download in other formats: