Timestamp:
May 29, 2023, 11:44:29 AM (2 years ago)
Author:
JiadaL <j82liang@…>
Branches:
ADT
Children:
fa2c005
Parents:
3a513d89 (diff), 2b78949 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' into ADT

File:
1 edited

  • doc/theses/colby_parsons_MMAth/text/mutex_stmt.tex

    r3a513d89 r044ae62  
    55% ======================================================================
    66
    7 The mutual exclusion problem was introduced by Dijkstra in 1965~\cite{Dijkstra65,Dijkstra65a}.
    8 There are several concurrent processes or threads that communicate by shared variables and from time to time need exclusive access to shared resources.
     7The mutual exclusion problem was introduced by Dijkstra in 1965~\cite{Dijkstra65,Dijkstra65a}:
     8there are several concurrent processes or threads that communicate by shared variables and from time to time need exclusive access to shared resources.
    99A shared resource and code manipulating it form a pairing called a \Newterm{critical section (CS)}, which is a many-to-one relationship;
    1010\eg if multiple files are being written to by multiple threads, only the pairings of simultaneous writes to the same files are CSs.
     
    2626
    2727\section{Monitor}
    28 \CFA provides a high-level locking object, called a \Newterm{monitor}, an elegant, efficient, high-level mechanisms for mutual exclusion and synchronization for shared-memory systems.
    29 First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}, several concurrent programming languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, \uC~\cite{Buhr92a} and Java~\cite{Java}.
     28\CFA provides a high-level locking object, called a \Newterm{monitor}, an elegant and efficient abstraction for providing mutual exclusion and synchronization for shared-memory systems.
 29Monitors were first proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
     30Several concurrent programming languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, \uC~\cite{Buhr92a} and Java~\cite{Java}.
    3031In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as mutex locks or semaphores to manually implement a monitor.
    3132
    3233Figure~\ref{f:AtomicCounter} shows a \CFA and Java monitor implementing an atomic counter.
    3334A \Newterm{monitor} is a programming technique that implicitly binds mutual exclusion to static function scope by call and return.
    34 Lock mutual exclusion, defined by acquire/release calls, is independent of lexical context (analogous to block versus heap storage allocation).
     35In contrast, lock mutual exclusion, defined by acquire/release calls, is independent of lexical context (analogous to block versus heap storage allocation).
    3536Restricting acquire and release points in a monitor eases programming, comprehension, and maintenance, at a slight cost in flexibility and efficiency.
    3637Ultimately, a monitor is implemented using a combination of basic locks and atomic instructions.
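Figure~\ref{f:AtomicCounter} shows the \CFA and Java versions; as a rough hand-built C++ analogue (the class name and structure here are illustrative assumptions, not the thesis code), a monitor-style counter binds the lock acquire to member-function call and the release to return:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hand-built monitor sketch: mutual exclusion is implicitly bound to the
// member-function scope, acquiring the lock on call and releasing on return.
class AtomicCounter {
    mutable std::mutex m;   // monitor lock
    int value = 0;          // shared resource protected by the monitor
  public:
    int inc() {                       // entry protocol on call
        std::scoped_lock guard( m );
        return ++value;
    }                                 // exit protocol on return
    int get() const {
        std::scoped_lock guard( m );
        return value;
    }
};
```

Because acquire/release are tied to call/return, a caller cannot forget the release, which is the comprehension and maintenance benefit described above.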
     
    110111To increase locking flexibility, some languages introduce a mutex statement.
    111112\VRef[Figure]{f:ReadersWriter} shows the outline of a reader/writer lock written as a \CFA monitor and mutex statements.
    112 (The exact lock implement is irrelevant.)
     113(The exact lock implementation is irrelevant.)
    113114The @read@ and @write@ functions are called with a reader/writer lock and any arguments to perform reading or writing.
    114115The @read@ function is not MX because multiple readers can read simultaneously.
     
    271272The @scoped_lock@ uses a deadlock avoidance algorithm where all locks after the first are acquired using @try_lock@ and if any of the lock attempts fail, all acquired locks are released.
    272273This repeats after selecting a new starting point in a cyclic manner until all locks are acquired successfully.
    273 This deadlock avoidance algorithm is shown in Listing~\ref{l:cc_deadlock_avoid}.
     274This deadlock avoidance algorithm is shown in Figure~\ref{f:cc_deadlock_avoid}.
    274275The algorithm is taken directly from the source code of the @<mutex>@ header, with some renaming and comments for clarity.
    275276
    276 \begin{cfa}[caption={\CC \lstinline{scoped_lock} deadlock avoidance algorithm},label={l:cc_deadlock_avoid}]
     277\begin{figure}
     278\begin{cfa}
    277279int first = 0;  // first lock to attempt to lock
    278280do {
     
    291293} while ( ! locks[first].owns_lock() );  $\C{// is first lock held?}$
    292294\end{cfa}
    293 
    294 While the algorithm in \ref{l:cc_deadlock_avoid} successfully avoids deadlock, there is a livelock scenario.
     295\caption{\CC \lstinline{scoped_lock} deadlock avoidance algorithm}
     296\label{f:cc_deadlock_avoid}
     297\end{figure}
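A paraphrase of this avoidance loop as a self-contained C++ sketch (illustrative only, not the actual @<mutex>@ source; the function name and vector-of-pointers interface are assumptions) might look like:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Rotation-based avoidance sketch: block on locks[first], try_lock the rest
// in cyclic order; on any failure, release everything acquired so far and
// restart with the failed lock as the new first.
void lock_all( std::vector<std::mutex *> & locks ) {
    const size_t n = locks.size();
    if ( n == 0 ) return;
    size_t first = 0;                            // first lock to attempt to lock
    for ( ;; ) {
        locks[first]->lock();                    // block on rotating first lock
        size_t failed = n;                       // offset of failed try_lock, if any
        for ( size_t i = 1; i < n; ++i ) {
            if ( ! locks[(first + i) % n]->try_lock() ) { failed = i; break; }
        }
        if ( failed == n ) return;               // all locks held
        for ( size_t i = 0; i < failed; ++i )    // back out every lock acquired
            locks[(first + i) % n]->unlock();
        first = (first + failed) % n;            // restart at the failed lock
    }
}
```

Blocking on the previously contended lock before retrying is what prevents deadlock, but, as discussed next, two threads can still repeatedly undo each other's partial acquisitions, which is the livelock scenario.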
     298
     299While this algorithm successfully avoids deadlock, there is a livelock scenario.
    295300Assume two threads, $A$ and $B$, create a @scoped_lock@ accessing two locks, $L1$ and $L2$.
    296301A livelock can form as follows.
     
    335340\end{cquote}
    336341Comparatively, if the @scoped_lock@ is used and the same locks are acquired elsewhere, there is no concern of the @scoped_lock@ deadlocking, due to its avoidance scheme, but it may livelock.
    337 The convenience and safety of the @mutex@ statement, \eg guaranteed lock release with exceptions, should encourage programmers to always use it for locking, mitigating any deadlock scenario.
     342The convenience and safety of the @mutex@ statement, \eg guaranteed lock release with exceptions, should encourage programmers to always use it for locking, mitigating the deadlock scenarios that arise when manual locking is combined with the @mutex@ statement.
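The guaranteed-release property can be demonstrated with the analogous C++ RAII guard (a minimal sketch; the function name and flag are hypothetical):

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex m;

// Even if the critical section throws, the guard releases the lock as the
// scope unwinds, so an exception cannot leave the lock held.
void update_or_throw( bool fail ) {
    std::scoped_lock guard( m );                    // acquired here
    if ( fail ) throw std::runtime_error( "abort CS" );
    // ... critical section ...
}                                                   // released here, exception or not
```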
    338343
    339344\section{Performance}
     
    345350
    346351The benchmark used to evaluate the avoidance algorithms repeatedly acquires a fixed number of locks in a random order and then releases them.
    347 The pseudo code for the deadlock avoidance benchmark is shown in \VRef[Listing]{l:deadlock_avoid_pseudo}.
     352The pseudocode for the deadlock avoidance benchmark is shown in \VRef[Listing]{l:deadlock_avoid_pseudo}.
    348353To ensure the comparison exercises the implementation of each lock-avoidance algorithm, an identical spinlock is implemented in each language using built-in atomics available in both \CC and \CFA.
    349354The benchmarks are run for a fixed duration of 10 seconds and then terminate.
     
    352357The median is calculated and is plotted alongside the 95\% confidence intervals for each point.
    353358
    354 \begin{cfa}[caption={Deadlock avoidance bendchmark pseudo code},label={l:deadlock_avoid_pseudo}]
    355 
    356 
    357 
    358 $\PAB{// add pseudo code}$
    359 
    360 
     359\begin{cfa}[caption={Deadlock avoidance benchmark pseudocode},label={l:deadlock_avoid_pseudo}]
     360
     361size_t n_locks; $\C{// number of locks}$
     362size_t n_thds; $\C{// number of threads}$
     363size_t n_gens; $\C{// number of random orderings (default 100)}$
     364size_t total = 0; $\C{// global throughput aggregator}$
     365volatile bool done = false; $\C{// termination flag}$
     366
     367test_spinlock locks[n_locks];
     368size_t rands[n_thds][n_locks * n_gens]; $\C{// random ordering per thread}$
     369
     370void main( worker & w ) with(w) { $\C{// thread main}$
     371    size_t count = 0, idx = 0;
     372    while ( !done ) {
     373        idx = (count % (n_locks * n_gens)) * n_locks; $\C{// get start of next sequence}$
     374        mutex(locks[rands[idx + 0]], ..., locks[rands[idx + n_locks - 1]]){} $\C{// lock sequence of locks}$
     375        count++;
     376    }
     377    __atomic_add_fetch(&total, count, __ATOMIC_SEQ_CST); $\C{// atomically add to total}$
     378}
     379
     380int main( int argc, char * argv[] ) {
     381    gen_orders(); $\C{// generate random orderings}$
     382    {
     383        worker w[n_thds];
     384        sleep( 10`s );
     385        done = true;
     386    }
     387    printf( "%lu\n", total );
     388}
    361389
    362390\end{cfa}
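A runnable C++ miniature of this benchmark (a hypothetical translation: a fixed iteration count replaces the 10-second timer for determinism, and @std::scoped_lock@ stands in for the avoidance algorithm under test) might be:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Each thread repeatedly acquires the same three locks in a random order
// (via std::scoped_lock's avoidance algorithm), counts completed critical
// sections, and atomically adds its count to the global aggregator.
std::atomic<size_t> total{ 0 };                     // global throughput aggregator

void worker( std::vector<std::mutex> & locks, unsigned seed, size_t iters ) {
    std::vector<size_t> order{ 0, 1, 2 };
    std::mt19937 rng( seed );
    size_t count = 0;
    for ( size_t i = 0; i < iters; ++i ) {
        std::shuffle( order.begin(), order.end(), rng );  // random ordering
        std::scoped_lock guard( locks[order[0]], locks[order[1]], locks[order[2]] );
        ++count;                                    // inside the critical section
    }
    total += count;
}
```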
     
    374402
    375403Figure~\ref{f:mutex_bench} shows the results of the benchmark experiments.
    376 \PAB{Make the points in the graphs for each line different.
    377 Also, make the text in the graphs larger.}
    378404The baseline results for both languages are mostly comparable, except for the 8-lock results in \ref{f:mutex_bench8_AMD} and \ref{f:mutex_bench8_Intel}, where the \CFA baseline is slightly slower.
    379405The avoidance results for the two languages differ significantly: \CFA's mutex statement achieves throughput that is orders of magnitude higher than \CC's @scoped_lock@.
     
    383409For example, on the AMD machine with 32 threads and 8 locks, the benchmarks would occasionally livelock indefinitely, with no threads making any progress for 3 hours before the experiment was terminated manually.
    384410It is likely that shorter bouts of livelock occurred in many of the experiments, which would explain large confidence intervals for some of the data points in the \CC data.
    385 In Figures~\ref{f:mutex_bench8_AMD} and \ref{f:mutex_bench8_Intel} the mutex statement performs better than the baseline.
    386 At 7 locks and above the mutex statement switches from a hard coded sort to insertion sort.
    387 It is likely that the improvement in throughput compared to baseline is due to the time spent in the insertion sort, which decreases contention on the locks.
     411In Figures~\ref{f:mutex_bench8_AMD} and \ref{f:mutex_bench8_Intel} there is the counter-intuitive result of the mutex statement performing better than the baseline.
     412At 7 locks and above, the mutex statement switches from a hard-coded sort to an insertion sort, which should decrease performance.
     413It is likely the increase in throughput compared to the baseline is due to the time spent in the insertion sort, which decreases contention on the locks.
    388414
    389415\begin{figure}