[0faacb8] | 1 | % ====================================================================== |
---|
| 2 | % ====================================================================== |
---|
| 3 | \chapter{Mutex Statement}\label{s:mutexstmt} |
---|
| 4 | % ====================================================================== |
---|
[512d937c] | 5 | % ====================================================================== |
---|
| 6 | |
---|
[e0e2f02] | 7 | The mutual exclusion problem was introduced by Dijkstra in 1965~\cite{Dijkstra65,Dijkstra65a}: |
---|
| 8 | there are several concurrent processes or threads that communicate by shared variables and from time to time need exclusive access to shared resources. |
---|
[16dff44] | 9 | A shared resource and code manipulating it form a pairing called a \Newterm{critical section (CS)}, which is a many-to-one relationship; |
---|
| 10 | \eg if multiple files are being written to by multiple threads, only the pairings of simultaneous writes to the same files are CSs. |
---|
| 11 | Regions of code where the thread is not interested in the resource are combined into the \Newterm{non-critical section (NCS)}. |
---|
| 12 | |
---|
| 13 | Exclusive access to a resource is provided by \Newterm{mutual exclusion (MX)}. |
---|
| 14 | MX is implemented by some form of \emph{lock}, where the CS is bracketed by lock procedures @acquire@ and @release@. |
---|
| 15 | Threads execute a loop of the form: |
---|
| 16 | \begin{cfa} |
---|
| 17 | loop of $thread$ p: |
---|
| 18 | NCS; |
---|
| 19 | acquire( lock ); CS; release( lock ); // protected critical section with MX |
---|
| 20 | end loop. |
---|
| 21 | \end{cfa} |
---|
| 22 | MX guarantees there is never more than one thread in the CS. |
---|
| 23 | MX must also guarantee eventual progress: when there are competing threads attempting access, eventually some competing thread succeeds, \ie acquires the CS, releases it, and returns to the NCS. |
---|
| 24 | % Lamport \cite[p.~329]{Lam86mx} extends this requirement to the exit protocol. |
---|
| 25 | A stronger constraint is that every thread that calls @acquire@ eventually succeeds after some reasonable bounded time. |
---|
| 26 | |
---|
| 27 | \section{Monitor} |
---|
[2d831a1] | 28 | \CFA provides a high-level locking object, called a \Newterm{monitor}, an elegant and efficient abstraction for providing mutual exclusion and synchronization for shared-memory systems. |
---|
[e0e2f02] | 29 | First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}. |
---|
| 30 | Several concurrent programming languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, \uC~\cite{Buhr92a} and Java~\cite{Java}. |
---|
[16dff44] | 31 | In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as mutex locks or semaphores to manually implement a monitor. |
---|
| 32 | |
---|
| 33 | Figure~\ref{f:AtomicCounter} shows a \CFA and Java monitor implementing an atomic counter. |
---|
| 34 | A \Newterm{monitor} is a programming technique that implicitly binds mutual exclusion to static function scope by call and return. |
---|
[e0e2f02] | 35 | In contrast, lock mutual exclusion, defined by acquire/release calls, is independent of lexical context (analogous to block versus heap storage allocation). |
---|
[16dff44] | 36 | Restricting acquire and release points in a monitor eases programming, comprehension, and maintenance, at a slight cost in flexibility and efficiency. |
---|
| 37 | Ultimately, a monitor is implemented using a combination of basic locks and atomic instructions. |
---|
| 38 | |
---|
| 39 | \begin{figure} |
---|
| 40 | \centering |
---|
| 41 | |
---|
| 42 | \begin{lrbox}{\myboxA} |
---|
| 43 | \begin{cfa}[aboveskip=0pt,belowskip=0pt] |
---|
| 44 | @monitor@ Aint { |
---|
| 45 | int cnt; |
---|
| 46 | }; |
---|
| 47 | int ++?( Aint & @mutex@ m ) { return ++m.cnt; } |
---|
| 48 | int ?=?( Aint & @mutex@ l, int r ) { l.cnt = r; } |
---|
| 49 | int ?=?(int & l, Aint & r) { l = r.cnt; } |
---|
| 50 | |
---|
| 51 | int i = 0, j = 0; |
---|
| 52 | Aint x = { 0 }, y = { 0 }; $\C[1.5in]{// no mutex}$ |
---|
| 53 | ++x; ++y; $\C{// mutex}$ |
---|
| 54 | x = 2; y = i; $\C{// mutex}$ |
---|
| 55 | i = x; j = y; $\C{// no mutex}\CRT$ |
---|
| 56 | \end{cfa} |
---|
| 57 | \end{lrbox} |
---|
| 58 | |
---|
| 59 | \begin{lrbox}{\myboxB} |
---|
| 60 | \begin{java}[aboveskip=0pt,belowskip=0pt] |
---|
| 61 | class Aint { |
---|
| 62 | private int cnt; |
---|
| 63 | public Aint( int init ) { cnt = init; } |
---|
| 64 | @synchronized@ public int inc() { return ++cnt; } |
---|
| 65 | @synchronized@ public void set( int r ) {cnt = r;} |
---|
| 66 | public int get() { return cnt; } |
---|
| 67 | } |
---|
| 68 | int i = 0, j = 0; |
---|
| 69 | Aint x = new Aint( 0 ), y = new Aint( 0 ); |
---|
| 70 | x.inc(); y.inc(); |
---|
| 71 | x.set( 2 ); y.set( i ); |
---|
| 72 | i = x.get(); j = y.get(); |
---|
| 73 | \end{java} |
---|
| 74 | \end{lrbox} |
---|
| 75 | |
---|
| 76 | \subfloat[\CFA]{\label{f:AtomicCounterCFA}\usebox\myboxA} |
---|
| 77 | \hspace*{3pt} |
---|
| 78 | \vrule |
---|
| 79 | \hspace*{3pt} |
---|
| 80 | \subfloat[Java]{\label{f:AtomicCounterJava}\usebox\myboxB} |
---|
| 81 | \caption{Atomic integer counter} |
---|
| 82 | \label{f:AtomicCounter} |
---|
| 83 | \end{figure} |
---|
| 84 | |
---|
| 85 | Like Java, \CFA monitors have \Newterm{multi-acquire} semantics so the thread in the monitor may acquire it multiple times without deadlock, allowing recursion and calling other MX functions. |
---|
| 86 | For robustness, \CFA monitors ensure the monitor lock is released regardless of how an acquiring function ends, normal or exceptional, and returning a shared variable is safe via copying before the lock is released. |
---|
| 87 | Monitor objects can be passed through multiple helper functions without acquiring mutual exclusion, until a designated function associated with the object is called. |
---|
| 88 | \CFA functions are designated MX by one or more pointer/reference parameters having qualifier @mutex@. |
---|
| 89 | Java members are designated MX with \lstinline[language=java]{synchronized}, which applies only to the implicit receiver parameter. |
---|
| 90 | In the example, the increment and setter operations need mutual exclusion, while the read-only getter operation is not MX because reading an integer is atomic. |
---|
| 91 | |
---|
| 92 | As stated, the non-object-oriented nature of \CFA monitors allows a function to acquire multiple mutex objects. |
---|
| 93 | For example, the bank-transfer problem requires locking two bank accounts to safely debit and credit money between accounts. |
---|
| 94 | \begin{cfa} |
---|
| 95 | monitor BankAccount { |
---|
| 96 | int balance; |
---|
| 97 | }; |
---|
| 98 | void deposit( BankAccount & mutex b, int deposit ) with( b ) { |
---|
| 99 | balance += deposit; |
---|
| 100 | } |
---|
| 101 | void transfer( BankAccount & mutex my, BankAccount & mutex your, int me2you ) { |
---|
| 102 | deposit( my, -me2you ); $\C{// debit}$ |
---|
| 103 | deposit( your, me2you ); $\C{// credit}$ |
---|
| 104 | } |
---|
| 105 | \end{cfa} |
---|
| 106 | The \CFA monitor implementation ensures multi-lock acquisition is done in a deadlock-free manner regardless of the number of MX parameters and monitor arguments. |
---|
| 107 | |
---|
| 108 | |
---|
| 109 | \section{\lstinline{mutex} statement} |
---|
| 110 | Restricting implicit lock acquisition to function entry and exit can be awkward for certain problems. |
---|
| 111 | To increase locking flexibility, some languages introduce a mutex statement. |
---|
| 112 | \VRef[Figure]{f:ReadersWriter} shows the outline of a reader/writer lock written as a \CFA monitor and mutex statements. |
---|
[2d831a1] | 113 | (The exact lock implementation is irrelevant.) |
---|
[16dff44] | 114 | The @read@ and @write@ functions are called with a reader/write lock and any arguments to perform reading or writing. |
---|
| 115 | The @read@ function is not MX because multiple readers can read simultaneously. |
---|
| 116 | MX is acquired within @read@ by calling the (nested) helper functions @StartRead@ and @EndRead@ or executing the mutex statements. |
---|
| 117 | Between the calls or statements, reads can execute simultaneous within the body of @read@. |
---|
| 118 | The @write@ function does not require refactoring because writing is a CS. |
---|
| 119 | The mutex-statement version is better because it has fewer names, less argument/parameter passing, and can possibly hold MX for a shorter duration. |
---|
| 120 | |
---|
| 121 | \begin{figure} |
---|
| 122 | \centering |
---|
| 123 | |
---|
| 124 | \begin{lrbox}{\myboxA} |
---|
| 125 | \begin{cfa}[aboveskip=0pt,belowskip=0pt] |
---|
| 126 | monitor RWlock { ... }; |
---|
| 127 | void read( RWlock & rw, ... ) { |
---|
| 128 | void StartRead( RWlock & @mutex@ rw ) { ... } |
---|
| 129 | void EndRead( RWlock & @mutex@ rw ) { ... } |
---|
| 130 | StartRead( rw ); |
---|
| 131 | ... // read without MX |
---|
| 132 | EndRead( rw ); |
---|
| 133 | } |
---|
| 134 | void write( RWlock & @mutex@ rw, ... ) { |
---|
| 135 | ... // write with MX |
---|
| 136 | } |
---|
| 137 | \end{cfa} |
---|
| 138 | \end{lrbox} |
---|
| 139 | |
---|
| 140 | \begin{lrbox}{\myboxB} |
---|
| 141 | \begin{cfa}[aboveskip=0pt,belowskip=0pt] |
---|
| 142 | |
---|
| 143 | void read( RWlock & rw, ... ) { |
---|
| 144 | |
---|
| 145 | |
---|
| 146 | @mutex@( rw ) { ... } |
---|
| 147 | ... // read without MX |
---|
| 148 | @mutex@{ rw ) { ... } |
---|
| 149 | } |
---|
| 150 | void write( RWlock & @mutex@ rw, ... ) { |
---|
| 151 | ... // write with MX |
---|
| 152 | } |
---|
| 153 | \end{cfa} |
---|
| 154 | \end{lrbox} |
---|
| 155 | |
---|
| 156 | \subfloat[monitor]{\label{f:RWmonitor}\usebox\myboxA} |
---|
| 157 | \hspace*{3pt} |
---|
| 158 | \vrule |
---|
| 159 | \hspace*{3pt} |
---|
| 160 | \subfloat[mutex statement]{\label{f:RWmutexstmt}\usebox\myboxB} |
---|
| 161 | \caption{Readers writer problem} |
---|
| 162 | \label{f:ReadersWriter} |
---|
| 163 | \end{figure} |
---|
| 164 | |
---|
| 165 | This work adds a mutex statement to \CFA, but generalizes it beyond implicit monitor locks. |
---|
| 166 | In detail, the mutex statement has a clause and statement block, similar to a conditional or loop statement. |
---|
| 167 | The clause accepts any number of lockable objects (like a \CFA MX function prototype), and locks them for the duration of the statement. |
---|
| 168 | The locks are acquired in a deadlock free manner and released regardless of how control-flow exits the statement. |
---|
| 169 | The mutex statement provides easy lock usage in the common case of lexically wrapping a CS. |
---|
| 170 | Examples of \CFA mutex statement are shown in \VRef[Listing]{l:cfa_mutex_ex}. |
---|
| 171 | |
---|
| 172 | \begin{cfa}[caption={\CFA mutex statement usage},label={l:cfa_mutex_ex}] |
---|
[512d937c] | 173 | owner_lock lock1, lock2, lock3; |
---|
[16dff44] | 174 | @mutex@( lock2, lock3 ) ...; $\C{// inline statement}$ |
---|
| 175 | @mutex@( lock1, lock2, lock3 ) { ... } $\C{// statement block}$ |
---|
| 176 | void transfer( BankAccount & my, BankAccount & your, int me2you ) { |
---|
| 177 | ... // check values, no MX |
---|
| 178 | @mutex@( my, your ) { // MX is shorter duration that function body |
---|
| 179 | deposit( my, -me2you ); $\C{// debit}$ |
---|
| 180 | deposit( your, me2you ); $\C{// credit}$ |
---|
| 181 | } |
---|
[512d937c] | 182 | } |
---|
[9363b1b] | 183 | \end{cfa} |
---|
[512d937c] | 184 | |
---|
| 185 | \section{Other Languages} |
---|
[16dff44] | 186 | There are similar constructs to the mutex statement in other programming languages. |
---|
| 187 | Java has a feature called a synchronized statement, which looks like the \CFA's mutex statement, but only accepts a single object in the clause and only handles monitor locks. |
---|
| 188 | The \CC standard library has a @scoped_lock@, which is also similar to the mutex statement. |
---|
| 189 | The @scoped_lock@ takes any number of locks in its constructor, and acquires them in a deadlock-free manner. |
---|
| 190 | It then releases them when the @scoped_lock@ object is deallocated using \gls{raii}. |
---|
| 191 | An example of \CC @scoped_lock@ is shown in \VRef[Listing]{l:cc_scoped_lock}. |
---|
| 192 | |
---|
| 193 | \begin{cfa}[caption={\CC \lstinline{scoped_lock} usage},label={l:cc_scoped_lock}] |
---|
| 194 | struct BankAccount { |
---|
| 195 | @recursive_mutex m;@ $\C{// must be recursive}$ |
---|
| 196 | int balance = 0; |
---|
| 197 | }; |
---|
| 198 | void deposit( BankAccount & b, int deposit ) { |
---|
| 199 | @scoped_lock lock( b.m );@ $\C{// RAII acquire}$ |
---|
| 200 | b.balance += deposit; |
---|
| 201 | } $\C{// RAII release}$ |
---|
| 202 | void transfer( BankAccount & my, BankAccount & your, int me2you ) { |
---|
| 203 | @scoped_lock lock( my.m, your.m );@ $\C{// RAII acquire}$ |
---|
| 204 | deposit( my, -me2you ); $\C{// debit}$ |
---|
| 205 | deposit( your, me2you ); $\C{// credit}$ |
---|
| 206 | } $\C{// RAII release}$ |
---|
[9363b1b] | 207 | \end{cfa} |
---|
[512d937c] | 208 | |
---|
| 209 | \section{\CFA implementation} |
---|
[16dff44] | 210 | The \CFA mutex statement takes some ideas from both the Java and \CC features. |
---|
| 211 | Like Java, \CFA introduces a new statement rather than building from existing language features. |
---|
| 212 | (\CFA has sufficient language features to mimic \CC RAII locking.) |
---|
| 213 | This syntactic choice makes MX explicit rather than implicit via object declarations. |
---|
| 214 | Hence, it is easier for programmers and language tools to identify MX points in a program, \eg scan for all @mutex@ parameters and statements in a body of code. |
---|
| 215 | Furthermore, concurrent safety is provided across an entire program for the complex operation of acquiring multiple locks in a deadlock-free manner. |
---|
| 216 | Unlike Java, \CFA's mutex statement and \CC's @scoped_lock@ both use parametric polymorphism to allow user defined types to work with this feature. |
---|
| 217 | In this case, the polymorphism allows a locking mechanism to acquire MX over an object without having to know the object internals or what kind of lock it is using. |
---|
| 218 | \CFA's provides and uses this locking trait: |
---|
| 219 | \begin{cfa} |
---|
| 220 | forall( L & | sized(L) ) |
---|
| 221 | trait is_lock { |
---|
| 222 | void lock( L & ); |
---|
| 223 | void unlock( L & ); |
---|
| 224 | }; |
---|
[9363b1b] | 225 | \end{cfa} |
---|
[16dff44] | 226 | \CC @scoped_lock@ has this trait implicitly based on functions accessed in a template. |
---|
| 227 | @scoped_lock@ also requires @try_lock@ because of its technique for deadlock avoidance \see{\VRef{s:DeadlockAvoidance}}. |
---|
[512d937c] | 228 | |
---|
[16dff44] | 229 | The following shows how the @mutex@ statement is used with \CFA streams to eliminate unpredictable results when printing in a concurrent program. |
---|
| 230 | For example, if two threads execute: |
---|
| 231 | \begin{cfa} |
---|
| 232 | thread$\(_1\)$ : sout | "abc" | "def"; |
---|
| 233 | thread$\(_2\)$ : sout | "uvw" | "xyz"; |
---|
| 234 | \end{cfa} |
---|
| 235 | any of the outputs can appear, included a segment fault due to I/O buffer corruption: |
---|
| 236 | \begin{cquote} |
---|
| 237 | \small\tt |
---|
| 238 | \begin{tabular}{@{}l|l|l|l|l@{}} |
---|
| 239 | abc def & abc uvw xyz & uvw abc xyz def & abuvwc dexf & uvw abc def \\ |
---|
| 240 | uvw xyz & def & & yz & xyz |
---|
| 241 | \end{tabular} |
---|
| 242 | \end{cquote} |
---|
| 243 | The stream type for @sout@ is defined to satisfy the @is_lock@ trait, so the @mutex@ statement can be used to lock an output stream while producing output. |
---|
| 244 | From the programmer's perspective, it is sufficient to know an object can be locked and then any necessary MX is easily available via the @mutex@ statement. |
---|
| 245 | This ability improves safety and programmer productivity since it abstracts away the concurrent details. |
---|
| 246 | Hence, a programmer can easily protect cascaded I/O expressions: |
---|
| 247 | \begin{cfa} |
---|
| 248 | thread$\(_1\)$ : mutex( sout ) sout | "abc" | "def"; |
---|
| 249 | thread$\(_2\)$ : mutex( sout ) sout | "uvw" | "xyz"; |
---|
| 250 | \end{cfa} |
---|
| 251 | constraining the output to two different lines in either order: |
---|
| 252 | \begin{cquote} |
---|
| 253 | \small\tt |
---|
| 254 | \begin{tabular}{@{}l|l@{}} |
---|
| 255 | abc def & uvw xyz \\ |
---|
| 256 | uvw xyz & abc def |
---|
| 257 | \end{tabular} |
---|
| 258 | \end{cquote} |
---|
| 259 | where this level of safe nondeterministic output is acceptable. |
---|
| 260 | Alternatively, multiple I/O statements can be protected using the mutex statement block: |
---|
| 261 | \begin{cfa} |
---|
| 262 | mutex( sout ) { // acquire stream lock for sout for block duration |
---|
| 263 | sout | "abc"; |
---|
| 264 | mutex( sout ) sout | "uvw" | "xyz"; // OK because sout lock is recursive |
---|
| 265 | sout | "def"; |
---|
| 266 | } // implicitly release sout lock |
---|
| 267 | \end{cfa} |
---|
| 268 | The inner lock acquire is likely to occur through a function call that does a thread-safe print. |
---|
[512d937c] | 269 | |
---|
[16dff44] | 270 | \section{Deadlock Avoidance}\label{s:DeadlockAvoidance} |
---|
| 271 | The mutex statement uses the deadlock avoidance technique of lock ordering, where the circular-wait condition of a deadlock cannot occur if all locks are acquired in the same order. |
---|
| 272 | The @scoped_lock@ uses a deadlock avoidance algorithm where all locks after the first are acquired using @try_lock@ and if any of the lock attempts fail, all acquired locks are released. |
---|
| 273 | This repeats after selecting a new starting point in a cyclic manner until all locks are acquired successfully. |
---|
[e0e2f02] | 274 | This deadlock avoidance algorithm is shown in Figure~\ref{f:cc_deadlock_avoid}. |
---|
[16dff44] | 275 | The algorithm is taken directly from the source code of the @<mutex>@ header, with some renaming and comments for clarity. |
---|
| 276 | |
---|
[e0e2f02] | 277 | \begin{figure} |
---|
| 278 | \begin{cfa} |
---|
[512d937c] | 279 | int first = 0; // first lock to attempt to lock |
---|
| 280 | do { |
---|
[16dff44] | 281 | // locks is the array of locks to acquire |
---|
| 282 | locks[first].lock(); $\C{// lock first lock}$ |
---|
| 283 | for ( int i = 1; i < Num_Locks; i += 1 ) { $\C{// iterate over rest of locks}$ |
---|
| 284 | const int idx = (first + i) % Num_Locks; |
---|
| 285 | if ( ! locks[idx].try_lock() ) { $\C{// try lock each one}$ |
---|
| 286 | for ( int j = i; j != 0; j -= 1 ) $\C{// release all locks}$ |
---|
| 287 | locks[(first + j - 1) % Num_Locks].unlock(); |
---|
| 288 | first = idx; $\C{// rotate which lock to acquire first}$ |
---|
| 289 | break; |
---|
| 290 | } |
---|
| 291 | } |
---|
[512d937c] | 292 | // if first lock is still held then all have been acquired |
---|
[16dff44] | 293 | } while ( ! locks[first].owns_lock() ); $\C{// is first lock held?}$ |
---|
[9363b1b] | 294 | \end{cfa} |
---|
[e0e2f02] | 295 | \caption{\CC \lstinline{scoped_lock} deadlock avoidance algorithm} |
---|
| 296 | \label{f:cc_deadlock_avoid} |
---|
| 297 | \end{figure} |
---|
[512d937c] | 298 | |
---|
[e0e2f02] | 299 | While this algorithm successfully avoids deadlock, there is a livelock scenario. |
---|
[16dff44] | 300 | Assume two threads, $A$ and $B$, create a @scoped_lock@ accessing two locks, $L1$ and $L2$. |
---|
| 301 | A livelock can form as follows. |
---|
| 302 | Thread $A$ creates a @scoped_lock@ with arguments $L1$, $L2$, and $B$ creates a scoped lock with the lock arguments in the opposite order $L2$, $L1$. |
---|
| 303 | Both threads acquire the first lock in their order and then fail the @try_lock@ since the other lock is held. |
---|
| 304 | Both threads then reset their starting lock to be their second lock and try again. |
---|
| 305 | This time $A$ has order $L2$, $L1$, and $B$ has order $L1$, $L2$, which is identical to the starting setup but with the ordering swapped between threads. |
---|
| 306 | If the threads perform this action in lock-step, they cycle indefinitely without entering the CS, \ie livelock. |
---|
| 307 | Hence, to use @scoped_lock@ safely, a programmer must manually construct and maintain a global ordering of lock arguments passed to @scoped_lock@. |
---|
[512d937c] | 308 | |
---|
[16dff44] | 309 | The lock ordering algorithm used in \CFA mutex functions and statements is deadlock and livelock free. |
---|
| 310 | The algorithm uses the lock memory addresses as keys, sorts the keys, and then acquires the locks in sorted order. |
---|
| 311 | For fewer than 7 locks ($2^3-1$), the sort is unrolled performing the minimum number of compare and swaps for the given number of locks; |
---|
| 312 | for 7 or more locks, insertion sort is used. |
---|
| 313 | Since it is extremely rare to hold more than 6 locks at a time, the algorithm is fast and executes in $O(1)$ time. |
---|
| 314 | Furthermore, lock addresses are unique across program execution, even for dynamically allocated locks, so the algorithm is safe across the entire program execution. |
---|
| 315 | |
---|
| 316 | The downside to the sorting approach is that it is not fully compatible with manual usages of the same locks outside the @mutex@ statement, \ie the lock are acquired without using the @mutex@ statement. |
---|
| 317 | The following scenario is a classic deadlock. |
---|
| 318 | \begin{cquote} |
---|
| 319 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}} |
---|
| 320 | \begin{cfa} |
---|
| 321 | lock L1, L2; // assume &L1 < &L2 |
---|
| 322 | $\textbf{thread\(_1\)}$ |
---|
| 323 | acquire( L2 ); |
---|
| 324 | acquire( L1 ); |
---|
| 325 | CS |
---|
| 326 | release( L1 ); |
---|
| 327 | release( L2 ); |
---|
| 328 | \end{cfa} |
---|
| 329 | & |
---|
| 330 | \begin{cfa} |
---|
| 331 | |
---|
| 332 | $\textbf{thread\(_2\)}$ |
---|
| 333 | mutex( L1, L2 ) { |
---|
| 334 | |
---|
| 335 | CS |
---|
| 336 | |
---|
| 337 | } |
---|
| 338 | \end{cfa} |
---|
| 339 | \end{tabular} |
---|
| 340 | \end{cquote} |
---|
| 341 | Comparatively, if the @scoped_lock@ is used and the same locks are acquired elsewhere, there is no concern of the @scoped_lock@ deadlocking, due to its avoidance scheme, but it may livelock. |
---|
[e0e2f02] | 342 | The convenience and safety of the @mutex@ statement, \eg guaranteed lock release with exceptions, should encourage programmers to always use it for locking, mitigating any deadlock scenario versus combining manual locking with the mutex statement. |
---|
[512d937c] | 343 | |
---|
| 344 | \section{Performance} |
---|
[16dff44] | 345 | Given the two multi-acquisition algorithms in \CC and \CFA, each with differing advantages and disadvantages, it interesting to compare their performance. |
---|
| 346 | Comparison with Java is not possible, since it only takes a single lock. |
---|
| 347 | |
---|
| 348 | The comparison starts with a baseline that acquires the locks directly without a mutex statement or @scoped_lock@ in a fixed ordering and then releases them. |
---|
| 349 | The baseline helps highlight the cost of the deadlock avoidance/prevention algorithms for each implementation. |
---|
| 350 | |
---|
| 351 | The benchmark used to evaluate the avoidance algorithms repeatedly acquires a fixed number of locks in a random order and then releases them. |
---|
[84018e0] | 352 | The pseudocode for the deadlock avoidance benchmark is shown in \VRef[Listing]{l:deadlock_avoid_pseudo}. |
---|
[16dff44] | 353 | To ensure the comparison exercises the implementation of each lock avoidance algorithm, an identical spinlock is implemented in each language using a set of builtin atomics available in both \CC and \CFA. |
---|
| 354 | The benchmarks are run for a fixed duration of 10 seconds and then terminate. |
---|
| 355 | The total number of times the group of locks is acquired is returned for each thread. |
---|
| 356 | Each variation is run 11 times on 2, 4, 8, 16, 24, 32 cores and with 2, 4, and 8 locks being acquired. |
---|
[9a5a2cd] | 357 | The median is calculated and is plotted alongside the 95\% confidence intervals for each point. |
---|
| 358 | |
---|
[84018e0] | 359 | \begin{cfa}[caption={Deadlock avoidance benchmark pseudocode},label={l:deadlock_avoid_pseudo}] |
---|
[16dff44] | 360 | |
---|
[84018e0] | 361 | size_t n_locks; $\C{// number of locks}$ |
---|
| 362 | size_t n_thds; $\C{// number of threads}$ |
---|
| 363 | size_t n_gens; $\C{// number of random orderings (default 100)}$ |
---|
[2d831a1] | 364 | size_t total = 0; $\C{// global throughput aggregator}$ |
---|
| 365 | volatile bool done = false; $\C{// termination flag}$ |
---|
| 366 | |
---|
[84018e0] | 367 | test_spinlock locks[n_locks]; |
---|
| 368 | size_t rands[n_thds][n_locks * n_gens]; $\C{// random ordering per thread}$ |
---|
[2d831a1] | 369 | |
---|
| 370 | void main( worker & w ) with(w) { $\C{// thread main}$ |
---|
| 371 | size_t count = 0, idx = 0; |
---|
| 372 | while ( !done ) { |
---|
[84018e0] | 373 | idx = (count % n_locks * n_gens) * n_locks; $\C{// get start of next sequence}$ |
---|
| 374 | mutex(locks[rands[0]], ..., locks[rands[n_locks - 1]]){} $\C{// lock sequence of locks}$ |
---|
[2d831a1] | 375 | count++; |
---|
| 376 | } |
---|
| 377 | __atomic_add_fetch(&total, count, __ATOMIC_SEQ_CST); $\C{// atomically add to total}$ |
---|
| 378 | } |
---|
[16dff44] | 379 | |
---|
[2d831a1] | 380 | int main( int argc, char * argv[] ) { |
---|
| 381 | gen_orders(); $\C{// generate random orderings}$ |
---|
| 382 | { |
---|
[84018e0] | 383 | worker w[n_thds]; |
---|
[2d831a1] | 384 | sleep( 10`s ); |
---|
| 385 | done = true; |
---|
| 386 | } |
---|
| 387 | printf( "%lu\n", total ); |
---|
| 388 | } |
---|
[16dff44] | 389 | |
---|
| 390 | \end{cfa} |
---|
| 391 | |
---|
| 392 | The performance experiments were run on the following multi-core hardware systems to determine differences across platforms: |
---|
| 393 | \begin{list}{\arabic{enumi}.}{\usecounter{enumi}\topsep=5pt\parsep=5pt\itemsep=0pt} |
---|
| 394 | % sudo dmidecode -t system |
---|
| 395 | \item |
---|
| 396 | Supermicro AS--1123US--TR4 AMD EPYC 7662 64--core socket, hyper-threading $\times$ 2 sockets (256 processing units) 2.0 GHz, TSO memory model, running Linux v5.8.0--55--generic, gcc--10 compiler |
---|
| 397 | \item |
---|
| 398 | Supermicro SYS--6029U--TR4 Intel Xeon Gold 5220R 24--core socket, hyper-threading $\times$ 2 sockets (48 processing units) 2.2GHz, TSO memory model, running Linux v5.8.0--59--generic, gcc--10 compiler |
---|
| 399 | \end{list} |
---|
| 400 | %The hardware architectures are different in threading (multithreading vs hyper), cache structure (MESI or MESIF), NUMA layout (QPI vs HyperTransport), memory model (TSO vs WO), and energy/thermal mechanisms (turbo-boost). |
---|
| 401 | %Software that runs well on one architecture may run poorly or not at all on another. |
---|
| 402 | |
---|
| 403 | Figure~\ref{f:mutex_bench} shows the results of the benchmark experiments. |
---|
| 404 | The baseline results for both languages are mostly comparable, except for the 8 locks results in \ref{f:mutex_bench8_AMD} and \ref{f:mutex_bench8_Intel}, where the \CFA baseline is slightly slower. |
---|
| 405 | The avoidance result for both languages is significantly different, where \CFA's mutex statement achieves throughput that is magnitudes higher than \CC's @scoped_lock@. |
---|
| 406 | The slowdown for @scoped_lock@ is likely due to its deadlock-avoidance implementation. |
---|
| 407 | Since it uses a retry based mechanism, it can take a long time for threads to progress. |
---|
| 408 | Additionally the potential for livelock in the algorithm can result in very little throughput under high contention. |
---|
| 409 | For example, on the AMD machine with 32 threads and 8 locks, the benchmarks would occasionally livelock indefinitely, with no threads making any progress for 3 hours before the experiment was terminated manually. |
---|
| 410 | It is likely that shorter bouts of livelock occurred in many of the experiments, which would explain large confidence intervals for some of the data points in the \CC data. |
---|
[e0e2f02] | 411 | In Figures~\ref{f:mutex_bench8_AMD} and \ref{f:mutex_bench8_Intel} there is the counter-intuitive result of the mutex statement performing better than the baseline. |
---|
| 412 | At 7 locks and above the mutex statement switches from a hard coded sort to insertion sort, which should decrease performance. |
---|
| 413 | It is likely the increase in throughput compared to baseline is due to the delay spent in the insertion sort, which decreases contention on the locks. |
---|
[16dff44] | 414 | |
---|
| 415 | \begin{figure} |
---|
| 416 | \centering |
---|
| 417 | \subfloat[AMD]{ |
---|
| 418 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Aggregate_Lock_2.pgf}} |
---|
| 419 | } |
---|
| 420 | \subfloat[Intel]{ |
---|
| 421 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Aggregate_Lock_2.pgf}} |
---|
| 422 | } |
---|
| 423 | |
---|
| 424 | \subfloat[AMD]{ |
---|
| 425 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Aggregate_Lock_4.pgf}} |
---|
| 426 | } |
---|
| 427 | \subfloat[Intel]{ |
---|
| 428 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Aggregate_Lock_4.pgf}} |
---|
| 429 | } |
---|
| 430 | |
---|
| 431 | \subfloat[AMD]{ |
---|
| 432 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Aggregate_Lock_8.pgf}} |
---|
| 433 | \label{f:mutex_bench8_AMD} |
---|
| 434 | } |
---|
| 435 | \subfloat[Intel]{ |
---|
| 436 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Aggregate_Lock_8.pgf}} |
---|
| 437 | \label{f:mutex_bench8_Intel} |
---|
| 438 | } |
---|
| 439 | \caption{The aggregate lock benchmark comparing \CC \lstinline{scoped_lock} and \CFA mutex statement throughput (higher is better).} |
---|
| 440 | \label{f:mutex_bench} |
---|
| 441 | \end{figure} |
---|
| 442 | |
---|
| 443 | % Local Variables: % |
---|
| 444 | % tab-width: 4 % |
---|
| 445 | % End: % |
---|