Changeset 9317419

Timestamp:

May 23, 2023, 4:55:30 PM

Author:

Peter A. Buhr <pabuhr@…>

Branches:

ADT, ast-experimental, master

Children:

6c15d66, 6ece306, 8463136

Parents:

41639089 (diff), 76e77a4 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:

doc/theses/colby_parsons_MMAth

Files:

1 added
5 edited

Makefile (modified) (1 diff)
glossary.tex (modified) (1 diff)
local.bib (modified) (1 diff)
text/channels.tex (modified) (7 diffs)
text/waituntil.tex (added)
thesis.tex (modified) (1 diff)

doc/theses/colby_parsons_MMAth/Makefile

r41639089	r9317419
22	22	text/mutex_stmt \
23	23	text/channels \
	24	text/waituntil \
24	25	}
25	26

doc/theses/colby_parsons_MMAth/glossary.tex

-                      r41639089
+                      r9317419
 description={An implementation of the actor model.}
+}
+\newglossaryentry{synch_multiplex}
+{
+name=synchronous multiplexing,
+description={synchronization on some subset of a set of resources.}
+}

doc/theses/colby_parsons_MMAth/local.bib

-                      r41639089
+                      r9317419
 url={http://hdl.handle.net/10012/17617}
+}
+@article{Roscoe88,
+  title={The laws of occam programming},
+  author={Roscoe, Andrew William and Hoare, Charles Antony Richard},
+  journal={Theoretical Computer Science},
+  volume={60},
+  number={2},
+  pages={177--229},
+  year={1988},
+  publisher={Elsevier}
+}
+@article{Pike84,
+  title={The UNIX system: The blit: A multiplexed graphics terminal},
+  author={Pike, Rob},
+  journal={AT\&T Bell Laboratories Technical Journal},
+  volume={63},
+  number={8},
+  pages={1607--1631},
+  year={1984},
+  publisher={Nokia Bell Labs}
+}
+@inproceedings{Dice11,
+  title={Brief announcement: multilane-a concurrent blocking multiset},
+  author={Dice, David and Otenko, Oleksandr},
+  booktitle={Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures},
+  pages={313--314},
+  year={2011}
+}
+@misc{go:chan,
+  author = "The Go Programming Language",
+  title = "src/runtime/chan.go",
+  howpublished = {\href{https://go.dev/src/runtime/chan.go}},
+  note = "[Online; accessed 23-May-2023]"
+}
+@misc{go:select,
+  author = "The Go Programming Language",
+  title = "src/runtime/chan.go",
+  howpublished = {\href{https://go.dev/src/runtime/select.go}},
+  note = "[Online; accessed 23-May-2023]"
+}

doc/theses/colby_parsons_MMAth/text/channels.tex

-                      r41639089
+                      r9317419
 Additionally all channel operations in CSP are synchronous (no buffering).
 Advanced channels as a programming language feature has been popularized in recent years by the language Go~\cite{Go}, which encourages the use of channels as its fundamental concurrent feature.
 It was the popularity of Go channels that lead me to implement them in \CFA.
 Neither Go nor \CFA channels have the restrictions in early channel-based concurrent systems.
+It was the popularity of Go channels that led to their implementation in \CFA.
+Neither Go nor \CFA channels have the restrictions of the early channel-based concurrent systems.
 \section{Producer-Consumer Problem}
 …
 Currently, only the Go programming language provides user-level threading where the primary communication mechanism is channels.
 Experiments were conducted that varied the producer-consumer problem algorithm and lock type used inside the channel.
+With the exception of non-\gls{fcfs} algorithms, no algorithm or lock usage in the channel implementation was found to be consistently more performant than Go's choice of algorithm and lock implementation.
+With the exception of non-\gls{fcfs} or non-FIFO algorithms, no algorithm or lock usage in the channel implementation was found to be consistently more performant than Go's choice of algorithm and lock implementation.
+Performance of channels can be improved by sharding the underlying buffer~\cite{Dice11}.
+However, sharding loses the FIFO property, which is undesirable for user-facing channels.
 Therefore, the low-level channel implementation in \CFA is largely copied from the Go implementation, but adapted to the \CFA type and runtime systems.
 As such the research contributions added by \CFA's channel implementation lie in the realm of safety and productivity features.
+\PAB{Discuss the Go channel implementation. Need to tie in FIFO buffer and FCFS locking.}
+The Go channel implementation uses cooperation between threads to achieve good performance~\cite{go:chan}.
+This cooperation only occurs when producers or consumers need to block because the buffer is full or empty.
+In these cases, the blocking thread stores its relevant data in a shared location, and the signalling thread completes the blocked thread's operation before waking it.
+This approach improves performance in two ways.
+First, each thread interacting with the channel acquires and releases the internal channel lock exactly once.
+This decreases contention on the internal lock, as only entering threads compete for the lock; signalled threads never reacquire it.
+Second, cooperation eliminates the potential bottleneck of waiting for signalled threads to run before the channel can make progress.
+The property of acquiring/releasing the lock only once can also be achieved without cooperation by \Newterm{baton passing} the lock.
+Baton passing is when one thread acquires a lock but does not release it, and instead signals a thread inside the critical section, conceptually ``passing'' mutual exclusion to the signalled thread.
+While baton passing is useful in some algorithms, it results in worse performance than the cooperation approach in channel implementations, since all entering threads must then wait for the blocked thread to reach the front of the ready queue and run before other operations on the channel can proceed.
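The cooperation scheme described above can be illustrated with a minimal Go sketch. It assumes a mutex-protected slice as the bounded buffer and a per-waiter wakeup channel; the names `coopChan`, `waiter`, `Send` and `Recv` are hypothetical, and this is a simplified illustration of the idea, not the code in `src/runtime/chan.go`. Each operation takes the lock once, and a thread that finds a blocked partner completes that partner's operation (handing over or buffering its value) before waking it, so the woken thread returns without touching the lock again.

```go
package main

import (
	"fmt"
	"sync"
)

// waiter records a blocked goroutine's operation so that the goroutine
// that wakes it can complete the operation on its behalf.
type waiter struct {
	val  int           // value to send, or slot for the received value
	done chan struct{} // closed by the signaller once the operation is complete
}

// coopChan is a hypothetical bounded FIFO channel of ints; a simplified
// sketch of cooperation, not the Go runtime's implementation.
type coopChan struct {
	mu      sync.Mutex
	buf     []int     // FIFO buffer, len(buf) <= size
	size    int       // capacity; 0 means synchronous
	senders []*waiter // blocked senders, FCFS order
	recvers []*waiter // blocked receivers, FCFS order
}

func newCoopChan(size int) *coopChan { return &coopChan{size: size} }

func (c *coopChan) Send(v int) {
	c.mu.Lock()
	if len(c.recvers) > 0 { // buffer empty: hand v straight to a blocked receiver
		w := c.recvers[0]
		c.recvers = c.recvers[1:]
		w.val = v
		c.mu.Unlock()
		close(w.done) // receiver wakes with its operation already done
		return
	}
	if len(c.buf) < c.size {
		c.buf = append(c.buf, v)
		c.mu.Unlock()
		return
	}
	w := &waiter{val: v, done: make(chan struct{})}
	c.senders = append(c.senders, w)
	c.mu.Unlock()
	<-w.done // woken after a receiver has taken v; c.mu is not reacquired
}

func (c *coopChan) Recv() int {
	c.mu.Lock()
	if len(c.buf) > 0 {
		v := c.buf[0]
		c.buf = c.buf[1:]
		if len(c.senders) > 0 { // move a blocked sender's value into the freed slot
			w := c.senders[0]
			c.senders = c.senders[1:]
			c.buf = append(c.buf, w.val)
			c.mu.Unlock()
			close(w.done)
			return v
		}
		c.mu.Unlock()
		return v
	}
	if len(c.senders) > 0 { // only when size == 0: take the value from a sender
		w := c.senders[0]
		c.senders = c.senders[1:]
		c.mu.Unlock()
		close(w.done)
		return w.val
	}
	w := &waiter{done: make(chan struct{})}
	c.recvers = append(c.recvers, w)
	c.mu.Unlock()
	<-w.done // the sender already stored the value in w.val
	return w.val
}

func main() {
	c := newCoopChan(2)
	go func() {
		for i := 0; i < 5; i++ {
			c.Send(i)
		}
	}()
	got := make([]int, 0, 5)
	for i := 0; i < 5; i++ {
		got = append(got, c.Recv())
	}
	fmt.Println(got) // [0 1 2 3 4]
}
```

Baton passing would instead keep the lock held and transfer it to the woken thread, so every later operation on the channel would wait until that thread is scheduled.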
 In this work, all channel sizes \see{Sections~\ref{s:ChannelSize}} are implemented with bounded buffers.
 …
 \subsection{Toggle-able Statistics}
+\PAB{Discuss toggle-able statistics.}
+As discussed, a channel is a concurrent layer over a bounded buffer.
+To achieve efficient buffering, users should aim for as few blocking operations on a channel as possible.
+To achieve this, users often change the buffer size, shard a channel into multiple channels, or tweak the number of producer and consumer threads.
+For users to be able to make informed decisions when tuning channel usage, toggle-able channel statistics are provided.
+The statistics are toggled at compile time via the @CHAN_STATS@ macro so that they are entirely elided when not used.
+When statistics are turned on, four counters are maintained per channel, two for producers and two for consumers.
+The two counters per operation type track the number of blocking operations and the total number of operations.
+In the channel destructor, the counters are printed both in aggregate and per operation type.
+An example use case of the counters follows.
+A user is buffering information between producer and consumer threads and wants to analyze channel performance.
+Via the statistics they see that producers block for a large percentage of their operations while consumers do not block often.
+They can then use this information to adjust their number of producers/consumers or channel size to achieve a larger percentage of non-blocking producer operations, thus increasing their channel throughput.
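The four-counter scheme can be sketched in Go as a wrapper around a built-in channel. This is an illustration under stated assumptions, not the \CFA implementation: the constant `chanStats` is a hypothetical stand-in for the @CHAN_STATS@ macro (Go has no preprocessor, so the sketch relies on the compiler discarding branches guarded by a false constant), an operation is counted as blocking when a non-blocking `select` attempt fails, and `Report` only loosely plays the role of the destructor's output.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// chanStats is a hypothetical stand-in for the CFA CHAN_STATS macro.
// Code guarded by a false constant is dead code and is compiled out.
const chanStats = true

// statChan wraps a buffered Go channel and keeps four counters:
// total and blocking operations, for producers and for consumers.
type statChan[T any] struct {
	ch                  chan T
	prodOps, prodBlocks atomic.Int64
	consOps, consBlocks atomic.Int64
}

func newStatChan[T any](size int) *statChan[T] {
	return &statChan[T]{ch: make(chan T, size)}
}

// Insert counts an operation as blocking when a non-blocking attempt fails.
func (c *statChan[T]) Insert(v T) {
	if chanStats {
		c.prodOps.Add(1)
		select {
		case c.ch <- v:
			return
		default:
			c.prodBlocks.Add(1) // buffer full: fall through to a blocking send
		}
	}
	c.ch <- v
}

// Remove counts an operation as blocking when a non-blocking attempt fails.
func (c *statChan[T]) Remove() T {
	if chanStats {
		c.consOps.Add(1)
		select {
		case v := <-c.ch:
			return v
		default:
			c.consBlocks.Add(1) // buffer empty: fall through to a blocking receive
		}
	}
	return <-c.ch
}

// Report prints the counters in aggregate and per operation type.
func (c *statChan[T]) Report() string {
	p, pb := c.prodOps.Load(), c.prodBlocks.Load()
	r, rb := c.consOps.Load(), c.consBlocks.Load()
	return fmt.Sprintf("all: %d ops, %d blocked; producers: %d ops, %d blocked; consumers: %d ops, %d blocked",
		p+r, pb+rb, p, pb, r, rb)
}

func main() {
	c := newStatChan[int](2)
	c.Insert(1)
	c.Insert(2)
	fmt.Println(c.Remove(), c.Remove()) // 1 2
	fmt.Println(c.Report())
	// all: 4 ops, 0 blocked; producers: 2 ops, 0 blocked; consumers: 2 ops, 0 blocked
}
```

A high ratio of `prodBlocks` to `prodOps` in such a report corresponds to the scenario above, where producers block often and the buffer size or thread counts should be adjusted.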
 \subsection{Deadlock Detection}
+\PAB{Discuss deadlock detection.}
+The deadlock detection in the \CFA channels is fairly basic.
+It only detects the case where threads are blocked on the channel during deallocation.
+This case is guaranteed to deadlock since the list holding the blocked threads is internal to the channel and is deallocated with it.
+If a user maintained a separate reference to a thread and unparked it outside the channel, they could avoid the deadlock, but would run into other runtime errors, since after waking, the thread would access channel data that is now deallocated.
+More robust deadlock detection surrounding channel usage would have to be implemented separately from the channel, since it would require knowledge about the threading system and other channel/thread state.
 \subsection{Program Shutdown}
-% The other safety and productivity feature of \CFA channels deals with concurrent termination.
 Terminating concurrent programs is often one of the most difficult parts of writing concurrent code, particularly if graceful termination is needed.
 The difficulty of graceful termination often arises from the usage of synchronization primitives that need to be handled carefully during shutdown.
 …
 Thus, improperly handled \gls{toctou} issues with channels often result in deadlocks as threads trying to perform the termination may end up unexpectedly blocking in their attempt to help other threads exit the system.
+% C_TODO: add reference to select chapter, add citation to go channels info
+\paragraph{Go channels} provide a set of tools to help with concurrent shutdown.
+\paragraph{Go channels} provide a set of tools to help with concurrent shutdown~\cite{go:chan}.
 Channels in Go have a @close@ operation and a \Go{select} statement that both can be used to help threads terminate.
 The \Go{select} statement is discussed in \ref{waituntil}, where \CFA's @waituntil@ statement is compared with the Go \Go{select} statement.
+The \Go{select} statement is discussed in \ref{s:waituntil}, where \CFA's @waituntil@ statement is compared with the Go \Go{select} statement.
 The @close@ operation on a channel in Go changes the state of the channel.
 When a channel is closed, sends to the channel panic along with additional calls to @close@.
+Receives are handled differently where receivers never block on a closed channel and continue to remove elements from the channel.
+Receives are handled differently.
+Receivers (consumers) never block on a closed channel and continue to remove elements from the channel.
 Once a channel is empty, receivers can continue to remove elements, but receive the zero-value version of the element type.
 To avoid unwanted zero-value elements, Go provides the ability to iterate over a closed channel to remove the remaining elements.
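The Go close semantics described above can be observed directly in a short, self-contained program. The helper names `trySend` and `tryClose` are illustrative; they only convert the resulting panics into booleans so that the program keeps running.

```go
package main

import "fmt"

// trySend reports whether sending v on ch panicked.
func trySend(ch chan int, v int) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	ch <- v
	return false
}

// tryClose reports whether closing ch panicked.
func tryClose(ch chan int) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	close(ch)
	return false
}

func main() {
	ch := make(chan int, 3)
	ch <- 1
	ch <- 2
	close(ch)
	fmt.Println(trySend(ch, 3)) // true: send on a closed channel panics
	fmt.Println(tryClose(ch))   // true: closing a closed channel panics
	v, ok := <-ch
	fmt.Println(v, ok) // 1 true: buffered elements are still delivered
	for v := range ch { // drains the remaining elements, then exits
		fmt.Println(v) // 2
	}
	v, ok = <-ch
	fmt.Println(v, ok) // 0 false: zero value, the receive never blocks
}
```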
 …
 While Go's channel closing semantics are powerful enough to perform any concurrent termination needed by a program, their lack of ease of use leaves much to be desired.
 Since both closing and sending panic once a channel is closed, a user often has to synchronize the senders to a channel before the channel can be closed to avoid panics.
+Since both closing and sending panic once a channel is closed, a user often has to synchronize the senders (producers) before the channel can be closed to avoid panics.
 However, in doing so it renders the @close@ operation nearly useless, as the only utilities it provides are the ability to ensure receivers no longer block on the channel and receive zero-valued elements.
 This functionality is only useful if the zero-typed element is recognized as a sentinel value, but if another sentinel value is necessary, then @close@ only provides the non-blocking feature.
 …
 \section{\CFA / Go channel Examples}
+To highlight the differences between \CFA's and Go's close semantics, an example program is presented.
+The program is a barrier implemented using two channels shown in Figure~\ref{f:ChannelBarrierTermination}.
+To highlight the differences between \CFA's and Go's close semantics, three examples are presented.
+The first example is a simple shutdown case, where there are producer threads and consumer threads operating on a channel for a fixed duration.
+Once the duration ends, producers and consumers terminate without worrying about any leftover values in the channel.
+The second example extends the first example by requiring the channel to be empty upon shutdown.
+Both the first and second examples are shown in Figure~\ref{f:ChannelTermination}.
+First, the Go solutions to these examples, shown in Figure~\ref{l:go_chan_term}, are discussed.
+Since some of the elements being passed through the channel are zero-valued, closing the channel in Go does not aid in communicating shutdown.
+Instead, a different mechanism to communicate with the consumers and producers needs to be used.
+This use of an additional flag or communication method is common in Go channel shutdown code, since to avoid panics on a channel, the shutdown of a channel often has to be communicated with threads before it occurs.
+In this example, a flag is used to communicate with producers and another flag is used for consumers.
+Producers and consumers need separate avenues of communication both so that producers terminate before the channel is closed to avoid panicking, and to avoid the case where all the consumers terminate first, which can result in a deadlock for producers if the channel is full.
+The producer flag is set first, then after producers terminate the consumer flag is set and the channel is closed.
+In the second example where all values need to be consumed, the main thread iterates over the closed channel to process any remaining values.
+In the \CFA solutions in Figure~\ref{l:cfa_chan_term}, shutdown is communicated directly to both producers and consumers via the @close@ call.
+In the first example, where leftover values need not be consumed, neither producers nor consumers handle the resumption, and both finish once they receive the termination exception.
+The second \CFA example where all values must be consumed highlights how resumption is used with channel shutdown.
+The @Producer@ thread-main knows to stop producing when the @insert@ call on a closed channel raises exception @channel_closed@.
+The @Consumer@ thread-main knows to stop consuming after all elements of a closed channel are removed and the call to @remove@ would block.
+Hence, the consumer knows the moment the channel closes because a resumption exception is raised, caught, and ignored, and then control returns to @remove@ to return another item from the buffer.
+Only when the buffer is drained and the call to @remove@ would block is a termination exception raised to stop consuming.
+The \CFA semantics allow users to communicate channel shutdown directly through the channel, without having to share extra state between threads.
+Additionally, when the channel needs to be drained, \CFA provides users with easy options for processing the leftover channel values in the main thread or in the consumer threads.
+If one wishes to consume the leftover values in the consumer threads in Go, extra synchronization between the main thread and the consumer threads is needed.
+\begin{figure}
+\centering
+\begin{lrbox}{\myboxA}
+\begin{cfa}[aboveskip=0pt,belowskip=0pt]
+channel( size_t ) Channel{ ChannelSize };
+thread Consumer {};
+void main( Consumer & this ) {
+    try {
+        for ( ;; )
+            remove( Channel );
+    @} catchResume( channel_closed * ) { @
+    // handled resume => consume from chan
+    } catch( channel_closed * ) {
+        // empty or unhandled resume
+    }
+}
+thread Producer {};
+void main( Producer & this ) {
+    size_t count = 0;
+    try {
+        for ( ;; )
+            insert( Channel, count++ );
+    } catch ( channel_closed * ) {
+        // unhandled resume or full
+    }
+}
+int main( int argc, char * argv[] ) {
+    Consumer c[Consumers];
+    Producer p[Producers];
+    sleep(Duration`s);
+    close( Channel );
+    return 0;
+}
+\end{cfa}
+\end{lrbox}
+\begin{lrbox}{\myboxB}
+\begin{cfa}[aboveskip=0pt,belowskip=0pt]
+var cons_done, prod_done bool = false, false;
+var prodJoin chan int = make(chan int, Producers)
+var consJoin chan int = make(chan int, Consumers)
+func consumer( channel chan uint64 ) {
+    for {
+        if cons_done { break }
+        <-channel
+    }
+    consJoin <- 0 // synch with main thd
+}
+func producer( channel chan uint64 ) {
+    var count uint64 = 0
+    for {
+        if prod_done { break }
+        channel <- count
+        count++
+    }
+    prodJoin <- 0 // synch with main thd
+}
+func main() {
+    channel := make(chan uint64, ChannelSize)
+    for j := 0; j < Consumers; j++ {
+        go consumer( channel )
+    }
+    for j := 0; j < Producers; j++ {
+        go producer( channel )
+    }
+    time.Sleep(time.Second * Duration)
+    prod_done = true
+    for j := 0; j < Producers ; j++ {
+        <-prodJoin // wait for prods
+    }
+    cons_done = true
+    close(channel) // ensure no cons deadlock
+    @for elem := range channel { @
+        // process leftover values
+    @}@
+    for j := 0; j < Consumers; j++{
+        <-consJoin // wait for cons
+    }
+}
+\end{cfa}
+\end{lrbox}
+\subfloat[\CFA style]{\label{l:cfa_chan_term}\usebox\myboxA}
+\hspace*{3pt}
+\vrule
+\hspace*{3pt}
+\subfloat[Go style]{\label{l:go_chan_term}\usebox\myboxB}
+\caption{Channel Termination Examples 1 and 2. Code specific to example 2 is highlighted.}
+\label{f:ChannelTermination}
+\end{figure}
+The final shutdown example uses channels to implement a barrier.
+It is shown in Figure~\ref{f:ChannelBarrierTermination}.
+The problem of implementing a barrier is chosen since threads are both producers and consumers on the barrier-internal channels, which removes the ability to easily synchronize producers before consumers during shutdown.
+As such, while the shutdown details are discussed with this problem in mind, they also apply to other problems that have individual threads both producing and consuming from channels.
 Both of these examples are implemented using \CFA syntax so that they can be easily compared.
 Figure~\ref{l:cfa_chan_bar} uses \CFA-style channel close semantics and Figure~\ref{l:go_chan_bar} uses Go-style close semantics.
 In this problem it is infeasible to use the Go @close@ call since all threads are both potentially producers and consumers, causing panics on close to be unavoidable.
+In this example it is infeasible to use the Go @close@ call since all threads are both potentially producers and consumers, causing panics on close to be unavoidable without complex synchronization.
 As such in Figure~\ref{l:go_chan_bar} to implement a flush routine for the buffer, a sentinel value of @-1@ has to be used to indicate to threads that they need to leave the barrier.
 This sentinel value has to be checked at two points.
 Furthermore, an additional flag @done@ is needed to communicate to threads once they have left the barrier that they are done.
+This use of an additional flag or communication method is common in Go channel shutdown code, since to avoid panics on a channel, the shutdown of a channel often has to be communicated with threads before it occurs.
 In the \CFA version~\ref{l:cfa_chan_bar}, the barrier shutdown results in an exception being thrown at threads operating on it, which informs the threads that they must terminate.
 This avoids the need to use a separate communication method other than the barrier, and avoids extra conditional checks on the fast path of the barrier implementation.
 …
 \end{figure}
-Listing~\ref{l:cfa_resume} is an example of a channel closing with resumption.
-The @Producer@ thread-main knows to stop producing when the @insert@ call on a closed channel raises exception @channel_closed@.
-The @Consumer@ thread-main knows to stop consuming after all elements of a closed channel are removed and the call to @remove@ would block.
-Hence, the consumer knows the moment the channel closes because a resumption exception is raised, caught, and ignored, and then control returns to @remove@ to return another item from the buffer.
-Only when the buffer is drained and the call to @removed@ would block is a termination exception raised to stop consuming.
-The same program in Go would require explicit synchronization among producers and consumers by a mechanism outside the channel to ensure all elements are removed before threads terminate.
-\begin{cfa}[caption={\CFA channel resumption usage},label={l:cfa_resume}]
-channel( int ) chan{ 128 };
-thread Producer {};
-void main( Producer & this ) {
-        @try {@
-                for ( i; 0~$@$ )
-                        insert( chan, i );
-        @} catch( channel_closed * ) {}@                $\C[3in]{// channel closed}$
+}
-thread Consumer {};
-void main( Consumer & this ) {
-        size_t runs = 0;
-        @try {@
-                for () {
-                        int i = remove( chan );
+                }
-        @} catchResume( channel_closed * ) {}@  $\C{// remaining item in buffer \(\Rightarrow\) remove it}$
-          @catch( channel_closed * ) {}@                $\C{// blocking call to remove \(\Rightarrow\) buffer empty}$
+}
-int main() {
-        enum { Processors = 8 };
-        processor p[Processors - 1];                    $\C{// one processor per thread, have one processor}$
-        Consumer c[Processors / 2];                             $\C{// share processors}$
-        Producer p[Processors / 2];
-        sleep( 10`s );
-        @close( chan );@                                                $\C{// stop producer and consumer}\CRT$
+}
-\end{cfa}
 \section{Performance}
 Given that the base implementation of the \CFA channels is very similar to the Go implementation, this section aims to show the performance of the two implementations are comparable.
 The microbenchmark for the channel comparison is similar to listing~\ref{l:cfa_resume}, where the number of threads and processors is set from the command line.
+The microbenchmark for the channel comparison is similar to Figure~\ref{f:ChannelTermination}, where the number of threads and processors is set from the command line.
 The processors are divided equally between producers and consumers, with one producer or consumer owning each core.
 The number of cores is varied to measure how throughput scales.

doc/theses/colby_parsons_MMAth/thesis.tex

r41639089	r9317419
200	200	\input{actors}
201	201
	202	\input{waituntil}
	203
202	204	%----------------------------------------------------------------------
203	205	% END MATERIAL
