Changeset 9317419


Ignore:
Timestamp:
May 23, 2023, 4:55:30 PM (18 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, ast-experimental, master
Children:
6c15d66, 6ece306, 8463136
Parents:
41639089 (diff), 76e77a4 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc/theses/colby_parsons_MMAth
Files:
1 added
5 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/colby_parsons_MMAth/Makefile

    r41639089 r9317419  
    2222        text/mutex_stmt \
    2323        text/channels \
     24        text/waituntil \
    2425}
    2526
  • doc/theses/colby_parsons_MMAth/glossary.tex

    r41639089 r9317419  
    6060description={An implementation of the actor model.}
    6161}
     62
     63\newglossaryentry{synch_multiplex}
     64{
     65name=synchronous multiplexing,
     66description={synchronization on some subset of a set of resources.}
     67}
  • doc/theses/colby_parsons_MMAth/local.bib

    r41639089 r9317419  
    5555url={http://hdl.handle.net/10012/17617}
    5656}
     57
     58@article{Roscoe88,
     59  title={The laws of occam programming},
     60  author={Roscoe, Andrew William and Hoare, Charles Antony Richard},
     61  journal={Theoretical Computer Science},
     62  volume={60},
     63  number={2},
     64  pages={177--229},
     65  year={1988},
     66  publisher={Elsevier}
     67}
     68
     69@article{Pike84,
     70  title={The UNIX system: The blit: A multiplexed graphics terminal},
     71  author={Pike, Rob},
     72  journal={AT\&T Bell Laboratories Technical Journal},
     73  volume={63},
     74  number={8},
     75  pages={1607--1631},
     76  year={1984},
     77  publisher={Nokia Bell Labs}
     78}
     79
     80@inproceedings{Dice11,
     81  title={Brief announcement: multilane-a concurrent blocking multiset},
     82  author={Dice, David and Otenko, Oleksandr},
     83  booktitle={Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures},
     84  pages={313--314},
     85  year={2011}
     86}
     87
     88@misc{go:chan,
     89  author = "The Go Programming Language",
     90  title = "src/runtime/chan.go",
     91  howpublished = {\href{https://go.dev/src/runtime/chan.go}},
     92  note = "[Online; accessed 23-May-2023]"
     93}
     94
     95@misc{go:select,
     96  author = "The Go Programming Language",
     97  title = "src/runtime/chan.go",
     98  howpublished = {\href{https://go.dev/src/runtime/select.go}},
     99  note = "[Online; accessed 23-May-2023]"
     100}
     101
  • doc/theses/colby_parsons_MMAth/text/channels.tex

    r41639089 r9317419  
    1717Additionally all channel operations in CSP are synchronous (no buffering).
    1818Advanced channels as a programming language feature has been popularized in recent years by the language Go~\cite{Go}, which encourages the use of channels as its fundamental concurrent feature.
    19 It was the popularity of Go channels that lead me to implement them in \CFA.
    20 Neither Go nor \CFA channels have the restrictions in early channel-based concurrent systems.
     19It was the popularity of Go channels that lead to their implemention in \CFA.
     20Neither Go nor \CFA channels have the restrictions of the early channel-based concurrent systems.
    2121
    2222\section{Producer-Consumer Problem}
     
    6262Currently, only the Go programming language provides user-level threading where the primary communication mechanism is channels.
    6363Experiments were conducted that varied the producer-consumer problem algorithm and lock type used inside the channel.
    64 With the exception of non-\gls{fcfs} algorithms, no algorithm or lock usage in the channel implementation was found to be consistently more performant that Go's choice of algorithm and lock implementation.
     64With the exception of non-\gls{fcfs} or non-FIFO algorithms, no algorithm or lock usage in the channel implementation was found to be consistently more performant that Go's choice of algorithm and lock implementation.
     65Performance of channels can be improved by sharding the underlying buffer \cite{Dice11}.
     66In doing so the FIFO property is lost, which is undesireable for user-facing channels.
    6567Therefore, the low-level channel implementation in \CFA is largely copied from the Go implementation, but adapted to the \CFA type and runtime systems.
    6668As such the research contributions added by \CFA's channel implementation lie in the realm of safety and productivity features.
    6769
    68 \PAB{Discuss the Go channel implementation. Need to tie in FIFO buffer and FCFS locking.}
     70The Go channel implementation utilitizes cooperation between threads to achieve good performance~\cite{go:chan}.
     71The cooperation between threads only occurs when producers or consumers need to block due to the buffer being full or empty.
     72In these cases the blocking thread stores their relevant data in a shared location and the signalling thread will complete their operation before waking them.
     73This helps improve performance in a few ways.
     74First, each thread interacting with the channel with only acquire and release the internal channel lock exactly once.
     75This decreases contention on the internal lock, as only entering threads will compete for the lock since signalled threads never reacquire the lock.
     76The other advantage of the cooperation approach is that it eliminates the potential bottleneck of waiting for signalled threads.
     77The property of acquiring/releasing the lock only once can be achieved without cooperation by \Newterm{baton passing} the lock.
     78Baton passing is when one thread acquires a lock but does not release it, and instead signals a thread inside the critical section conceptually "passing" the mutual exclusion to the signalled thread.
     79While baton passing is useful in some algorithms, it results in worse performance than the cooperation approach in channel implementations since all entering threads then need to wait for the blocked thread to reach the front of the ready queue and run before other operations on the channel can proceed.
    6980
    7081In this work, all channel sizes \see{Sections~\ref{s:ChannelSize}} are implemented with bounded buffers.
     
    8899
    89100\subsection{Toggle-able Statistics}
    90 \PAB{Discuss toggle-able statistics.}
    91 
     101As discussed, a channel is a concurrent layer over a bounded buffer.
     102To achieve efficient buffering users should aim for as few blocking operations on a channel as possible.
     103Often to achieve this users may change the buffer size, shard a channel into multiple channels, or tweak the number of producer and consumer threads.
     104Fo users to be able to make informed decisions when tuning channel usage, toggle-able channel statistics are provided.
     105The statistics are toggled at compile time via the @CHAN_STATS@ macro to ensure that they are entirely elided when not used.
     106When statistics are turned on, four counters are maintained per channel, two for producers and two for consumers.
     107The two counters per type of operation track the number of blocking operations and total operations.
     108In the channel destructor the counters are printed out aggregated and also per type of operation.
     109An example use case of the counters follows.
     110A user is buffering information between producer and consumer threads and wants to analyze channel performance.
     111Via the statistics they see that producers block for a large percentage of their operations while consumers do not block often.
     112They then can use this information to adjust their number of producers/consumers or channel size to achieve a larger percentage of non-blocking producer operations, thus increasing their channel throughput.
    92113
    93114\subsection{Deadlock Detection}
    94 \PAB{Discuss deadlock detection.}
     115The deadlock detection in the \CFA channels is fairly basic.
     116It only detects the case where threads are blocked on the channel during deallocation.
     117This case is guaranteed to deadlock since the list holding the blocked thread is internal to the channel and will be deallocated.
     118If a user maintained a separate reference to a thread and unparked it outside the channel they could avoid the deadlock, but would run into other runtime errors since the thread would access channel data after waking that is now deallocated.
     119More robust deadlock detection surrounding channel usage would have to be implemented separate from the channel implementation since it would require knowledge about the threading system and other channel/thread state.
    95120
    96121\subsection{Program Shutdown}
    97 % The other safety and productivity feature of \CFA channels deals with concurrent termination.
    98122Terminating concurrent programs is often one of the most difficult parts of writing concurrent code, particularly if graceful termination is needed.
    99123The difficulty of graceful termination often arises from the usage of synchronization primitives that need to be handled carefully during shutdown.
     
    104128Thus, improperly handled \gls{toctou} issues with channels often result in deadlocks as threads trying to perform the termination may end up unexpectedly blocking in their attempt to help other threads exit the system.
    105129
    106 % C_TODO: add reference to select chapter, add citation to go channels info
    107 \paragraph{Go channels} provide a set of tools to help with concurrent shutdown.
     130\paragraph{Go channels} provide a set of tools to help with concurrent shutdown~\cite{go:chan}.
    108131Channels in Go have a @close@ operation and a \Go{select} statement that both can be used to help threads terminate.
    109 The \Go{select} statement is discussed in \ref{waituntil}, where \CFA's @waituntil@ statement is compared with the Go \Go{select} statement.
     132The \Go{select} statement is discussed in \ref{s:waituntil}, where \CFA's @waituntil@ statement is compared with the Go \Go{select} statement.
    110133
    111134The @close@ operation on a channel in Go changes the state of the channel.
    112135When a channel is closed, sends to the channel panic along with additional calls to @close@.
    113 Receives are handled differently where receivers never block on a closed channel and continue to remove elements from the channel.
     136Receives are handled differently.
     137Receivers (consumers) never block on a closed channel and continue to remove elements from the channel.
    114138Once a channel is empty, receivers can continue to remove elements, but receive the zero-value version of the element type.
    115139To avoid unwanted zero-value elements, Go provides the ability to iterate over a closed channel to remove the remaining elements.
     
    120144
    121145While Go's channel closing semantics are powerful enough to perform any concurrent termination needed by a program, their lack of ease of use leaves much to be desired.
    122 Since both closing and sending panic once a channel is closed, a user often has to synchronize the senders to a channel before the channel can be closed to avoid panics.
     146Since both closing and sending panic once a channel is closed, a user often has to synchronize the senders (producers) before the channel can be closed to avoid panics.
    123147However, in doing so it renders the @close@ operation nearly useless, as the only utilities it provides are the ability to ensure receivers no longer block on the channel and receive zero-valued elements.
    124148This functionality is only useful if the zero-typed element is recognized as a sentinel value, but if another sentinel value is necessary, then @close@ only provides the non-blocking feature.
     
    152176
    153177\section{\CFA / Go channel Examples}
    154 To highlight the differences between \CFA's and Go's close semantics, an example program is presented.
    155 The program is a barrier implemented using two channels shown in Figure~\ref{f:ChannelBarrierTermination}.
     178To highlight the differences between \CFA's and Go's close semantics, three examples will be presented.
     179The first example is a simple shutdown case, where there are producer threads and consumer threads operating on a channel for a fixed duration.
     180Once the duration ends, producers and consumers terminate without worrying about any leftover values in the channel.
     181The second example extends the first example by requiring the channel to be empty upon shutdown.
     182Both the first and second example are shown in Figure~\ref{f:ChannelTermination}.
     183
     184
     185First the Go solutions to these examples shown in Figure~\ref{l:go_chan_term} are discussed.
     186Since some of the elements being passed through the channel are zero-valued, closing the channel in Go does not aid in communicating shutdown.
     187Instead, a different mechanism to communicate with the consumers and producers needs to be used.
     188This use of an additional flag or communication method is common in Go channel shutdown code, since to avoid panics on a channel, the shutdown of a channel often has to be communicated with threads before it occurs.
     189In this example, a flag is used to communicate with producers and another flag is used for consumers.
     190Producers and consumers need separate avenues of communication both so that producers terminate before the channel is closed to avoid panicking, and to avoid the case where all the consumers terminate first, which can result in a deadlock for producers if the channel is full.
     191The producer flag is set first, then after producers terminate the consumer flag is set and the channel is closed.
     192In the second example where all values need to be consumed, the main thread iterates over the closed channel to process any remaining values.
     193
     194
     195In the \CFA solutions in Figure~\ref{l:cfa_chan_term}, shutdown is communicated directly to both producers and consumers via the @close@ call.
     196In the first example where all values do not need to be consumed, both producers and consumers do not handle the resumption and finish once they receive the termination exception.
     197The second \CFA example where all values must be consumed highlights how resumption is used with channel shutdown.
     198The @Producer@ thread-main knows to stop producing when the @insert@ call on a closed channel raises exception @channel_closed@.
     199The @Consumer@ thread-main knows to stop consuming after all elements of a closed channel are removed and the call to @remove@ would block.
     200Hence, the consumer knows the moment the channel closes because a resumption exception is raised, caught, and ignored, and then control returns to @remove@ to return another item from the buffer.
     201Only when the buffer is drained and the call to @remove@ would block, a termination exception is raised to stop consuming.
     202The \CFA semantics allow users to communicate channel shutdown directly through the channel, without having to share extra state between threads.
     203Additionally, when the channel needs to be drained, \CFA provides users with easy options for processing the leftover channel values in the main thread or in the consumer threads.
     204If one wishes to consume the leftover values in the consumer threads in Go, extra synchronization between the main thread and the consumer threads is needed.
     205
     206\begin{figure}
     207\centering
     208
     209\begin{lrbox}{\myboxA}
     210\begin{cfa}[aboveskip=0pt,belowskip=0pt]
     211channel( size_t ) Channel{ ChannelSize };
     212
     213thread Consumer {};
     214void main( Consumer & this ) {
     215    try {
     216        for ( ;; )
     217            remove( Channel );
     218    @} catchResume( channel_closed * ) { @
     219    // handled resume => consume from chan
     220    } catch( channel_closed * ) {
     221        // empty or unhandled resume
     222    }
     223}
     224
     225thread Producer {};
     226void main( Producer & this ) {
     227    size_t count = 0;
     228    try {
     229        for ( ;; )
     230            insert( Channel, count++ );
     231    } catch ( channel_closed * ) {
     232        // unhandled resume or full
     233    }
     234}
     235
     236int main( int argc, char * argv[] ) {
     237    Consumer c[Consumers];
     238    Producer p[Producers];
     239    sleep(Duration`s);
     240    close( Channel );
     241    return 0;
     242}
     243\end{cfa}
     244\end{lrbox}
     245
     246\begin{lrbox}{\myboxB}
     247\begin{cfa}[aboveskip=0pt,belowskip=0pt]
     248var cons_done, prod_done bool = false, false;
     249var prodJoin chan int = make(chan int, Producers)
     250var consJoin chan int = make(chan int, Consumers)
     251
     252func consumer( channel chan uint64 ) {
     253    for {
     254        if cons_done { break }
     255        <-channel
     256    }
     257    consJoin <- 0 // synch with main thd
     258}
     259
     260func producer( channel chan uint64 ) {
     261    var count uint64 = 0
     262    for {
     263        if prod_done { break }
     264        channel <- count++
     265    }
     266    prodJoin <- 0 // synch with main thd
     267}
     268
     269func main() {
     270    channel = make(chan uint64, ChannelSize)
     271    for j := 0; j < Consumers; j++ {
     272        go consumer( channel )
     273    }
     274    for j := 0; j < Producers; j++ {
     275        go producer( channel )
     276    }
     277    time.Sleep(time.Second * Duration)
     278    prod_done = true
     279    for j := 0; j < Producers ; j++ {
     280        <-prodJoin // wait for prods
     281    }
     282    cons_done = true
     283    close(channel) // ensure no cons deadlock
     284    @for elem := range channel { @
     285        // process leftover values
     286    @}@
     287    for j := 0; j < Consumers; j++{
     288        <-consJoin // wait for cons
     289    }
     290}
     291\end{cfa}
     292\end{lrbox}
     293
     294\subfloat[\CFA style]{\label{l:cfa_chan_term}\usebox\myboxA}
     295\hspace*{3pt}
     296\vrule
     297\hspace*{3pt}
     298\subfloat[Go style]{\label{l:go_chan_term}\usebox\myboxB}
     299\caption{Channel Termination Examples 1 and 2. Code specific to example 2 is highlighted.}
     300\label{f:ChannelTermination}
     301\end{figure}
     302
     303The final shutdown example uses channels to implement a barrier.
     304It is shown in Figure~\ref{f:ChannelBarrierTermination}.
     305The problem of implementing a barrier is chosen since threads are both producers and consumers on the barrier-internal channels, which removes the ability to easily synchronize producers before consumers during shutdown.
     306As such, while the shutdown details will be discussed with this problem in mind, they are also applicable to other problems taht have individual threads both producing and consuming from channels.
    156307Both of these examples are implemented using \CFA syntax so that they can be easily compared.
    157308Figure~\ref{l:cfa_chan_bar} uses \CFA-style channel close semantics and Figure~\ref{l:go_chan_bar} uses Go-style close semantics.
    158 In this problem it is infeasible to use the Go @close@ call since all threads are both potentially producers and consumers, causing panics on close to be unavoidable.
     309In this example it is infeasible to use the Go @close@ call since all threads are both potentially producers and consumers, causing panics on close to be unavoidable without complex synchronization.
    159310As such in Figure~\ref{l:go_chan_bar} to implement a flush routine for the buffer, a sentinel value of @-1@ has to be used to indicate to threads that they need to leave the barrier.
    160311This sentinel value has to be checked at two points.
    161312Furthermore, an additional flag @done@ is needed to communicate to threads once they have left the barrier that they are done.
    162 This use of an additional flag or communication method is common in Go channel shutdown code, since to avoid panics on a channel, the shutdown of a channel often has to be communicated with threads before it occurs.
     313
    163314In the \CFA version~\ref{l:cfa_chan_bar}, the barrier shutdown results in an exception being thrown at threads operating on it, which informs the threads that they must terminate.
    164315This avoids the need to use a separate communication method other than the barrier, and avoids extra conditional checks on the fast path of the barrier implementation.
     
    275426\end{figure}
    276427
    277 Listing~\ref{l:cfa_resume} is an example of a channel closing with resumption.
    278 The @Producer@ thread-main knows to stop producing when the @insert@ call on a closed channel raises exception @channel_closed@.
    279 The @Consumer@ thread-main knows to stop consuming after all elements of a closed channel are removed and the call to @remove@ would block.
    280 Hence, the consumer knows the moment the channel closes because a resumption exception is raised, caught, and ignored, and then control returns to @remove@ to return another item from the buffer.
    281 Only when the buffer is drained and the call to @removed@ would block is a termination exception raised to stop consuming.
    282 The same program in Go would require explicit synchronization among producers and consumers by a mechanism outside the channel to ensure all elements are removed before threads terminate.
    283 
    284 \begin{cfa}[caption={\CFA channel resumption usage},label={l:cfa_resume}]
    285 channel( int ) chan{ 128 };
    286 thread Producer {};
    287 void main( Producer & this ) {
    288         @try {@
    289                 for ( i; 0~$@$ )
    290                         insert( chan, i );
    291         @} catch( channel_closed * ) {}@                $\C[3in]{// channel closed}$
    292 }
    293 thread Consumer {};
    294 void main( Consumer & this ) {
    295         size_t runs = 0;
    296         @try {@
    297                 for () {
    298                         int i = remove( chan );
    299                 }
    300         @} catchResume( channel_closed * ) {}@  $\C{// remaining item in buffer \(\Rightarrow\) remove it}$
    301           @catch( channel_closed * ) {}@                $\C{// blocking call to remove \(\Rightarrow\) buffer empty}$
    302 }
    303 int main() {
    304         enum { Processors = 8 };
    305         processor p[Processors - 1];                    $\C{// one processor per thread, have one processor}$
    306         Consumer c[Processors / 2];                             $\C{// share processors}$
    307         Producer p[Processors / 2];
    308         sleep( 10`s );
    309         @close( chan );@                                                $\C{// stop producer and consumer}\CRT$
    310 }
    311 \end{cfa}
    312 
    313428\section{Performance}
    314429
    315430Given that the base implementation of the \CFA channels is very similar to the Go implementation, this section aims to show the performance of the two implementations are comparable.
    316 The microbenchmark for the channel comparison is similar to listing~\ref{l:cfa_resume}, where the number of threads and processors is set from the command line.
     431The microbenchmark for the channel comparison is similar to Figure~\ref{f:ChannelTermination}, where the number of threads and processors is set from the command line.
    317432The processors are divided equally between producers and consumers, with one producer or consumer owning each core.
    318433The number of cores is varied to measure how throughput scales.
  • doc/theses/colby_parsons_MMAth/thesis.tex

    r41639089 r9317419  
    200200\input{actors}
    201201
     202\input{waituntil}
     203
    202204%----------------------------------------------------------------------
    203205% END MATERIAL
Note: See TracChangeset for help on using the changeset viewer.