Ignore:
Timestamp:
Sep 5, 2023, 6:08:28 PM (3 years ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master, stuck-waitfor-destruct
Children:
1fc111c
Parents:
737988b (diff), aae9c17 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/colby_parsons_MMAth/text/waituntil.tex

    r737988b refe39bb  
    103103
    104104When discussing \gls{synch_multiplex} implementations, the resource being multiplexed is important.
    105 While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors.
     105While CSP waits on channels, the earliest known implementation of synch\-ronous multiplexing is Unix's @select@~\cite{linux:select}, multiplexing over file descriptors, which conceptually differ from channels in name only.
    106106The @select@ system-call is passed three sets of file descriptors (read, write, exceptional) to wait on and an optional timeout.
    107107@select@ blocks until either some subset of file descriptors are available or the timeout expires.
     
    155155
    156156Figure~\ref{f:BB_Ada} shows the outline of a bounded buffer implemented with an Ada task.
    157 Note, a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
     157Note that a task method is associated with the \lstinline[language=ada]{accept} clause of the \lstinline[language=ada]{select} statement, rather than being a separate routine.
    158158The thread executing the loop in the task body blocks at the \lstinline[language=ada]{select} until a call occurs to @insert@ or @remove@.
    159159Then the appropriate \lstinline[language=ada]{accept} method is run with the called arguments.
     
    319319The @waituntil@ statement supports guard clauses, both @or@ and @and@ semantics, and timeout and @else@ for asynchronous multiplexing.
    320320Figure~\ref{f:wu_example} shows a \CFA @waituntil@ usage, which is waiting for either @Lock@ to be available \emph{or} for a value to be read from @Channel@ into @i@ \emph{and} for @Future@ to be fulfilled \emph{or} a timeout of one second.
    321 Note, the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.
     321Note that the expression inside a @waituntil@ clause is evaluated once at the start of the @waituntil@ algorithm.
    322322
    323323\section{Waituntil Semantics}
     
    366366If the @and@ operator waited for all clauses to be available before running, it is the same as just acquiring those resources consecutively by a sequence of @waituntil@ statements.
    367367
    368 As for normal C expressions, the @and@ operator binds more tightly than the @or@.
     368As with normal C expressions, the @and@ operator binds more tightly than the @or@.
    369369To give an @or@ operator higher precedence, parenthesis are used.
    370370For example, the following @waituntil@ unconditionally waits for @C@ and one of either @A@ or @B@, since the @or@ is given higher precedence via parenthesis.
     
    523523\end{cfa}
    524524If only @C2@ is satisfied, \emph{both} timeout code-blocks trigger because 1 second occurs before 3 seconds.
    525 Note, the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.
     525Note that the \CFA @waitfor@ statement only provides a single @timeout@ clause because it only supports @or@ semantics.
    526526
    527527The \lstinline[deletekeywords={timeout}]{timeout} routine is different from UNIX @sleep@, which blocks for the specified duration and returns the amount of time elapsed since the call started.
     
    624624Correspondingly, if thread 2 wins the race and sees that @A@ has a waiting removal, then thread 1 must execute the first clause of its @waituntil@ and thread 2 execute its second.
    625625Any other execution scenario is incorrect for exclusive-or semantics.
    626 Note, that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread.
     626Note that priority execution of multiple satisfied @waituntil@ causes (\ie top to bottom) is not violated because, in this scenario, there is only one satisfied clause for either thread.
    627627
    628628The Go @select@ solves this problem by acquiring all the internal locks of the channels before registering the @select@ on the channels.
     
    715715The waiting condition of the @waituntil@ statement is implemented as the complete predicate over the resources, joined by the @waituntil@ operators, where a resource is @true@ if it is available, and @false@ otherwise.
    716716This complete predicate is used as the mechanism to check if a thread is done waiting on a @waituntil@.
    717 Leveraging the compiler, a predicate routine is generated per @waituntil@ that when passes the statuses of the resources, returns @true@ when the @waituntil@ is done, and false otherwise.
     717Leveraging the compiler, a predicate routine is generated per @waituntil@.
     718This predicate routine accepts the statuses of the resources being waited on as arguments.
     719A resource status is an integer that indicates whether a resource is either not available, available, or has already run its associated code block.
     720The predicate routine returns @true@ when the @waituntil@ is done, \ie enough resources have run their associated code blocks to satisfy the @waituntil@'s predicate, and false otherwise.
    718721To support guards on the \CFA @waituntil@ statement, the status of a resource disabled by a guard is set to a boolean value that ensures that the predicate function behaves as if that resource is no longer part of the predicate.
    719722The generated code allows the predicate that is checked with each iteration to be simplified to not check guard values.
     
    760763It would require some way to ensure channels used with internal operators are modified, if and only if, the corresponding code block is run.
    761764However, that is not feasible due to reasons described in the exclusive-or portion of Section~\ref{s:wu_chans}.
    762 Ultimately, I did not deemed this feature to be crucial enough when taking into account the complexity and runtime cost.
     765Ultimately, this feature was not considered crucial after taking into account the complexity and runtime cost.
    763766
    764767\subsection{The full \lstinline{waituntil} picture}
     
    828831
    829832The channel multiplexing benchmarks compare \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}, where the resource being waited on is a set of channels.
     833Although Unix's select, poll and epoll multiplex over file descriptors, which are effectively channels, these utilities are not included in the benchmarks.
     834Reading or writing to a file descriptor requires a system call which is much more expensive than operating on a user-space channel.
     835As such, it is infeasible to compare Unix's select, poll and epoll with \CFA's @waituntil@ and Go's \lstinline[language=Go]{select}.
    830836The basic structure of the benchmark has the number of cores split evenly between producer and consumer threads, \ie, with 8 cores there are 4 producer and 4 consumer threads.
    831837The number of resource clauses $C$ is also varied across 2, 4, and 8 clauses, where each clause has a different channel that it waits on.
     
    901907Both Figures~\ref{f:select_contend_bench} and~\ref{f:select_spin_bench} have similar results when comparing \lstinline[language=Go]{select} and @waituntil@.
    902908In the AMD benchmarks (left column), the performance is very similar as the number of cores scale.
    903 The AMD machine has a high-caching contention cost because of its \emph{chicklet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go.
     909The AMD machine has a high-caching contention cost because of its \emph{chiplet} L3 cache (\ie many L3 caches servicing a small number of cores), which creates a bottleneck on the channel locks and dominates the shape of the performance curve for both \CFA and Go.
    904910Hence, it is difficult to detect differences in the \gls{synch_multiplex}, except at low cores, where Go has significantly better performance, due to an optimization in its scheduler.
    905911Go heavily optimizes thread handoffs on the local run-queue, which can result in very good performance for low numbers of threads parking/unparking each other~\cite{go:sched}.
     
    939945\end{cquote}
    940946In Go, this setup results in significantly worse performance since @P2@ and @C2@ cannot operate in parallel with @P1@ and @C1@ due to all locks being acquired.
    941 Interesting, this case may not be as pathological as it seems.
     947Interestingly, this case may not be as pathological as it seems.
    942948If the set of channels belonging to a \lstinline[language=Go]{select} have channels that overlap with the set of another \lstinline[language=Go]{select}, these statements lose the ability to operate in parallel.
    943949The implementation in \CFA only holds a single lock at a time, resulting in better locking granularity, and hence, more parallelism.
Note: See TracChangeset for help on using the changeset viewer.