1 | % ======================================================================
|
---|
2 | % ======================================================================
|
---|
3 | \chapter{Mutex Statement}\label{s:mutexstmt}
|
---|
4 | % ======================================================================
|
---|
5 | % ======================================================================
|
---|
6 |
|
---|
7 | The mutual exclusion problem was introduced by Dijkstra in 1965~\cite{Dijkstra65,Dijkstra65a}:
|
---|
8 | there are several concurrent processes or threads that communicate by shared variables and from time to time need exclusive access to shared resources.
|
---|
9 | A shared resource and code manipulating it form a pairing called a \Newterm{critical section (CS)}, which is a many-to-one relationship;
|
---|
10 | \eg if multiple files are being written to by multiple threads, only the pairings of simultaneous writes to the same files are CSs.
|
---|
11 | Regions of code where the thread is not interested in the resource are combined into the \Newterm{non-critical section (NCS)}.
|
---|
12 |
|
---|
13 | Exclusive access to a resource is provided by \Newterm{mutual exclusion (MX)}.
|
---|
14 | MX is implemented by some form of \emph{lock}, where the CS is bracketed by lock procedures @acquire@ and @release@.
|
---|
15 | Threads execute a loop of the form:
|
---|
16 | \begin{cfa}
|
---|
17 | loop of $thread$ p:
|
---|
18 | NCS;
|
---|
19 | acquire( lock ); CS; release( lock ); // protected critical section with MX
|
---|
20 | end loop.
|
---|
21 | \end{cfa}
|
---|
22 | MX guarantees there is never more than one thread in the CS.
|
---|
23 | MX must also guarantee eventual progress: when there are competing threads attempting access, eventually some competing thread succeeds, \ie acquires the CS, releases it, and returns to the NCS.
|
---|
24 | % Lamport \cite[p.~329]{Lam86mx} extends this requirement to the exit protocol.
|
---|
25 | A stronger constraint is that every thread that calls @acquire@ eventually succeeds after some reasonable bounded time.
|
---|
26 |
|
---|
27 | \section{Monitor}
|
---|
28 | \CFA provides a high-level locking object, called a \Newterm{monitor}, an elegant and efficient abstraction for providing mutual exclusion and synchronization for shared-memory systems.
|
---|
29 | First proposed by Brinch Hansen~\cite{Hansen73} and later described and extended by C.A.R.~Hoare~\cite{Hoare74}.
|
---|
30 | Several concurrent programming languages provide monitors as an explicit language construct: \eg Concurrent Pascal~\cite{ConcurrentPascal}, Mesa~\cite{Mesa}, Turing~\cite{Turing:old}, Modula-3~\cite{Modula-3}, \uC~\cite{Buhr92a} and Java~\cite{Java}.
|
---|
31 | In addition, operating-system kernels and device drivers have a monitor-like structure, although they often use lower-level primitives such as mutex locks or semaphores to manually implement a monitor.
|
---|
32 |
|
---|
33 | Figure~\ref{f:AtomicCounter} shows a \CFA and Java monitor implementing an atomic counter.
|
---|
34 | A \Newterm{monitor} is a programming technique that implicitly binds mutual exclusion to static function scope by call and return.
|
---|
35 | In contrast, lock mutual exclusion, defined by acquire/release calls, is independent of lexical context (analogous to block versus heap storage allocation).
|
---|
36 | Restricting acquire and release points in a monitor eases programming, comprehension, and maintenance, at a slight cost in flexibility and efficiency.
|
---|
37 | Ultimately, a monitor is implemented using a combination of basic locks and atomic instructions.
|
---|
38 |
|
---|
39 | \begin{figure}
|
---|
40 | \centering
|
---|
41 |
|
---|
42 | \begin{lrbox}{\myboxA}
|
---|
43 | \begin{cfa}[aboveskip=0pt,belowskip=0pt]
|
---|
44 | @monitor@ Aint {
|
---|
45 | int cnt;
|
---|
46 | };
|
---|
47 | int ++?( Aint & @mutex@ m ) { return ++m.cnt; }
|
---|
48 | int ?=?( Aint & @mutex@ l, int r ) { l.cnt = r; }
|
---|
49 | int ?=?(int & l, Aint & r) { l = r.cnt; }
|
---|
50 |
|
---|
51 | int i = 0, j = 0;
|
---|
52 | Aint x = { 0 }, y = { 0 }; $\C[1.5in]{// no mutex}$
|
---|
53 | ++x; ++y; $\C{// mutex}$
|
---|
54 | x = 2; y = i; $\C{// mutex}$
|
---|
55 | i = x; j = y; $\C{// no mutex}\CRT$
|
---|
56 | \end{cfa}
|
---|
57 | \end{lrbox}
|
---|
58 |
|
---|
59 | \begin{lrbox}{\myboxB}
|
---|
60 | \begin{java}[aboveskip=0pt,belowskip=0pt]
|
---|
61 | class Aint {
|
---|
62 | private int cnt;
|
---|
63 | public Aint( int init ) { cnt = init; }
|
---|
64 | @synchronized@ public int inc() { return ++cnt; }
|
---|
65 | @synchronized@ public void set( int r ) {cnt = r;}
|
---|
66 | public int get() { return cnt; }
|
---|
67 | }
|
---|
68 | int i = 0, j = 0;
|
---|
69 | Aint x = new Aint( 0 ), y = new Aint( 0 );
|
---|
70 | x.inc(); y.inc();
|
---|
71 | x.set( 2 ); y.set( i );
|
---|
72 | i = x.get(); j = y.get();
|
---|
73 | \end{java}
|
---|
74 | \end{lrbox}
|
---|
75 |
|
---|
76 | \subfloat[\CFA]{\label{f:AtomicCounterCFA}\usebox\myboxA}
|
---|
77 | \hspace*{3pt}
|
---|
78 | \vrule
|
---|
79 | \hspace*{3pt}
|
---|
80 | \subfloat[Java]{\label{f:AtomicCounterJava}\usebox\myboxB}
|
---|
81 | \caption{Atomic integer counter}
|
---|
82 | \label{f:AtomicCounter}
|
---|
83 | \end{figure}
|
---|
84 |
|
---|
85 | Like Java, \CFA monitors have \Newterm{multi-acquire} semantics so the thread in the monitor may acquire it multiple times without deadlock, allowing recursion and calling other MX functions.
|
---|
86 | For robustness, \CFA monitors ensure the monitor lock is released regardless of how an acquiring function ends, normal or exceptional, and returning a shared variable is safe via copying before the lock is released.
|
---|
87 | Monitor objects can be passed through multiple helper functions without acquiring mutual exclusion, until a designated function associated with the object is called.
|
---|
88 | \CFA functions are designated MX by one or more pointer/reference parameters having qualifier @mutex@.
|
---|
89 | Java members are designated MX with \lstinline[language=java]{synchronized}, which applies only to the implicit receiver parameter.
|
---|
90 | In the example, the increment and setter operations need mutual exclusion, while the read-only getter operation is not MX because reading an integer is atomic.
|
---|
91 |
|
---|
92 | As stated, the non-object-oriented nature of \CFA monitors allows a function to acquire multiple mutex objects.
|
---|
93 | For example, the bank-transfer problem requires locking two bank accounts to safely debit and credit money between accounts.
|
---|
94 | \begin{cfa}
|
---|
95 | monitor BankAccount {
|
---|
96 | int balance;
|
---|
97 | };
|
---|
98 | void deposit( BankAccount & mutex b, int deposit ) with( b ) {
|
---|
99 | balance += deposit;
|
---|
100 | }
|
---|
101 | void transfer( BankAccount & mutex my, BankAccount & mutex your, int me2you ) {
|
---|
102 | deposit( my, -me2you ); $\C{// debit}$
|
---|
103 | deposit( your, me2you ); $\C{// credit}$
|
---|
104 | }
|
---|
105 | \end{cfa}
|
---|
106 | The \CFA monitor implementation ensures multi-lock acquisition is done in a deadlock-free manner regardless of the number of MX parameters and monitor arguments.
|
---|
107 |
|
---|
108 |
|
---|
109 | \section{\lstinline{mutex} statement}
|
---|
110 | Restricting implicit lock acquisition to function entry and exit can be awkward for certain problems.
|
---|
111 | To increase locking flexibility, some languages introduce a mutex statement.
|
---|
112 | \VRef[Figure]{f:ReadersWriter} shows the outline of a reader/writer lock written as a \CFA monitor and mutex statements.
|
---|
113 | (The exact lock implementation is irrelevant.)
|
---|
114 | The @read@ and @write@ functions are called with a reader/write lock and any arguments to perform reading or writing.
|
---|
115 | The @read@ function is not MX because multiple readers can read simultaneously.
|
---|
116 | MX is acquired within @read@ by calling the (nested) helper functions @StartRead@ and @EndRead@ or executing the mutex statements.
|
---|
117 | Between the calls or statements, reads can execute simultaneous within the body of @read@.
|
---|
118 | The @write@ function does not require refactoring because writing is a CS.
|
---|
119 | The mutex-statement version is better because it has fewer names, less argument/parameter passing, and can possibly hold MX for a shorter duration.
|
---|
120 |
|
---|
121 | \begin{figure}
|
---|
122 | \centering
|
---|
123 |
|
---|
124 | \begin{lrbox}{\myboxA}
|
---|
125 | \begin{cfa}[aboveskip=0pt,belowskip=0pt]
|
---|
126 | monitor RWlock { ... };
|
---|
127 | void read( RWlock & rw, ... ) {
|
---|
128 | void StartRead( RWlock & @mutex@ rw ) { ... }
|
---|
129 | void EndRead( RWlock & @mutex@ rw ) { ... }
|
---|
130 | StartRead( rw );
|
---|
131 | ... // read without MX
|
---|
132 | EndRead( rw );
|
---|
133 | }
|
---|
134 | void write( RWlock & @mutex@ rw, ... ) {
|
---|
135 | ... // write with MX
|
---|
136 | }
|
---|
137 | \end{cfa}
|
---|
138 | \end{lrbox}
|
---|
139 |
|
---|
140 | \begin{lrbox}{\myboxB}
|
---|
141 | \begin{cfa}[aboveskip=0pt,belowskip=0pt]
|
---|
142 |
|
---|
143 | void read( RWlock & rw, ... ) {
|
---|
144 |
|
---|
145 |
|
---|
146 | @mutex@( rw ) { ... }
|
---|
147 | ... // read without MX
|
---|
148 | @mutex@{ rw ) { ... }
|
---|
149 | }
|
---|
150 | void write( RWlock & @mutex@ rw, ... ) {
|
---|
151 | ... // write with MX
|
---|
152 | }
|
---|
153 | \end{cfa}
|
---|
154 | \end{lrbox}
|
---|
155 |
|
---|
156 | \subfloat[monitor]{\label{f:RWmonitor}\usebox\myboxA}
|
---|
157 | \hspace*{3pt}
|
---|
158 | \vrule
|
---|
159 | \hspace*{3pt}
|
---|
160 | \subfloat[mutex statement]{\label{f:RWmutexstmt}\usebox\myboxB}
|
---|
161 | \caption{Readers writer problem}
|
---|
162 | \label{f:ReadersWriter}
|
---|
163 | \end{figure}
|
---|
164 |
|
---|
165 | This work adds a mutex statement to \CFA, but generalizes it beyond implicit monitor locks.
|
---|
166 | In detail, the mutex statement has a clause and statement block, similar to a conditional or loop statement.
|
---|
167 | The clause accepts any number of lockable objects (like a \CFA MX function prototype), and locks them for the duration of the statement.
|
---|
168 | The locks are acquired in a deadlock free manner and released regardless of how control-flow exits the statement.
|
---|
169 | The mutex statement provides easy lock usage in the common case of lexically wrapping a CS.
|
---|
170 | Examples of \CFA mutex statement are shown in \VRef[Listing]{l:cfa_mutex_ex}.
|
---|
171 |
|
---|
172 | \begin{cfa}[caption={\CFA mutex statement usage},label={l:cfa_mutex_ex}]
|
---|
173 | owner_lock lock1, lock2, lock3;
|
---|
174 | @mutex@( lock2, lock3 ) ...; $\C{// inline statement}$
|
---|
175 | @mutex@( lock1, lock2, lock3 ) { ... } $\C{// statement block}$
|
---|
176 | void transfer( BankAccount & my, BankAccount & your, int me2you ) {
|
---|
177 | ... // check values, no MX
|
---|
178 | @mutex@( my, your ) { // MX is shorter duration that function body
|
---|
179 | deposit( my, -me2you ); $\C{// debit}$
|
---|
180 | deposit( your, me2you ); $\C{// credit}$
|
---|
181 | }
|
---|
182 | }
|
---|
183 | \end{cfa}
|
---|
184 |
|
---|
185 | \section{Other Languages}
|
---|
186 | There are similar constructs to the mutex statement in other programming languages.
|
---|
187 | Java has a feature called a synchronized statement, which looks like the \CFA's mutex statement, but only accepts a single object in the clause and only handles monitor locks.
|
---|
188 | The \CC standard library has a @scoped_lock@, which is also similar to the mutex statement.
|
---|
189 | The @scoped_lock@ takes any number of locks in its constructor, and acquires them in a deadlock-free manner.
|
---|
190 | It then releases them when the @scoped_lock@ object is deallocated using \gls{raii}.
|
---|
191 | An example of \CC @scoped_lock@ is shown in \VRef[Listing]{l:cc_scoped_lock}.
|
---|
192 |
|
---|
193 | \begin{cfa}[caption={\CC \lstinline{scoped_lock} usage},label={l:cc_scoped_lock}]
|
---|
194 | struct BankAccount {
|
---|
195 | @recursive_mutex m;@ $\C{// must be recursive}$
|
---|
196 | int balance = 0;
|
---|
197 | };
|
---|
198 | void deposit( BankAccount & b, int deposit ) {
|
---|
199 | @scoped_lock lock( b.m );@ $\C{// RAII acquire}$
|
---|
200 | b.balance += deposit;
|
---|
201 | } $\C{// RAII release}$
|
---|
202 | void transfer( BankAccount & my, BankAccount & your, int me2you ) {
|
---|
203 | @scoped_lock lock( my.m, your.m );@ $\C{// RAII acquire}$
|
---|
204 | deposit( my, -me2you ); $\C{// debit}$
|
---|
205 | deposit( your, me2you ); $\C{// credit}$
|
---|
206 | } $\C{// RAII release}$
|
---|
207 | \end{cfa}
|
---|
208 |
|
---|
209 | \section{\CFA implementation}
|
---|
210 | The \CFA mutex statement takes some ideas from both the Java and \CC features.
|
---|
211 | Like Java, \CFA introduces a new statement rather than building from existing language features.
|
---|
212 | (\CFA has sufficient language features to mimic \CC RAII locking.)
|
---|
213 | This syntactic choice makes MX explicit rather than implicit via object declarations.
|
---|
214 | Hence, it is easier for programmers and language tools to identify MX points in a program, \eg scan for all @mutex@ parameters and statements in a body of code.
|
---|
215 | Furthermore, concurrent safety is provided across an entire program for the complex operation of acquiring multiple locks in a deadlock-free manner.
|
---|
216 | Unlike Java, \CFA's mutex statement and \CC's @scoped_lock@ both use parametric polymorphism to allow user defined types to work with this feature.
|
---|
217 | In this case, the polymorphism allows a locking mechanism to acquire MX over an object without having to know the object internals or what kind of lock it is using.
|
---|
218 | \CFA's provides and uses this locking trait:
|
---|
219 | \begin{cfa}
|
---|
220 | forall( L & | sized(L) )
|
---|
221 | trait is_lock {
|
---|
222 | void lock( L & );
|
---|
223 | void unlock( L & );
|
---|
224 | };
|
---|
225 | \end{cfa}
|
---|
226 | \CC @scoped_lock@ has this trait implicitly based on functions accessed in a template.
|
---|
227 | @scoped_lock@ also requires @try_lock@ because of its technique for deadlock avoidance \see{\VRef{s:DeadlockAvoidance}}.
|
---|
228 |
|
---|
229 | The following shows how the @mutex@ statement is used with \CFA streams to eliminate unpredictable results when printing in a concurrent program.
|
---|
230 | For example, if two threads execute:
|
---|
231 | \begin{cfa}
|
---|
232 | thread$\(_1\)$ : sout | "abc" | "def";
|
---|
233 | thread$\(_2\)$ : sout | "uvw" | "xyz";
|
---|
234 | \end{cfa}
|
---|
235 | any of the outputs can appear, included a segment fault due to I/O buffer corruption:
|
---|
236 | \begin{cquote}
|
---|
237 | \small\tt
|
---|
238 | \begin{tabular}{@{}l|l|l|l|l@{}}
|
---|
239 | abc def & abc uvw xyz & uvw abc xyz def & abuvwc dexf & uvw abc def \\
|
---|
240 | uvw xyz & def & & yz & xyz
|
---|
241 | \end{tabular}
|
---|
242 | \end{cquote}
|
---|
243 | The stream type for @sout@ is defined to satisfy the @is_lock@ trait, so the @mutex@ statement can be used to lock an output stream while producing output.
|
---|
244 | From the programmer's perspective, it is sufficient to know an object can be locked and then any necessary MX is easily available via the @mutex@ statement.
|
---|
245 | This ability improves safety and programmer productivity since it abstracts away the concurrent details.
|
---|
246 | Hence, a programmer can easily protect cascaded I/O expressions:
|
---|
247 | \begin{cfa}
|
---|
248 | thread$\(_1\)$ : mutex( sout ) sout | "abc" | "def";
|
---|
249 | thread$\(_2\)$ : mutex( sout ) sout | "uvw" | "xyz";
|
---|
250 | \end{cfa}
|
---|
251 | constraining the output to two different lines in either order:
|
---|
252 | \begin{cquote}
|
---|
253 | \small\tt
|
---|
254 | \begin{tabular}{@{}l|l@{}}
|
---|
255 | abc def & uvw xyz \\
|
---|
256 | uvw xyz & abc def
|
---|
257 | \end{tabular}
|
---|
258 | \end{cquote}
|
---|
259 | where this level of safe nondeterministic output is acceptable.
|
---|
260 | Alternatively, multiple I/O statements can be protected using the mutex statement block:
|
---|
261 | \begin{cfa}
|
---|
262 | mutex( sout ) { // acquire stream lock for sout for block duration
|
---|
263 | sout | "abc";
|
---|
264 | mutex( sout ) sout | "uvw" | "xyz"; // OK because sout lock is recursive
|
---|
265 | sout | "def";
|
---|
266 | } // implicitly release sout lock
|
---|
267 | \end{cfa}
|
---|
268 | The inner lock acquire is likely to occur through a function call that does a thread-safe print.
|
---|
269 |
|
---|
270 | \section{Deadlock Avoidance}\label{s:DeadlockAvoidance}
|
---|
271 | The mutex statement uses the deadlock avoidance technique of lock ordering, where the circular-wait condition of a deadlock cannot occur if all locks are acquired in the same order.
|
---|
272 | The @scoped_lock@ uses a deadlock avoidance algorithm where all locks after the first are acquired using @try_lock@ and if any of the lock attempts fail, all acquired locks are released.
|
---|
273 | This repeats after selecting a new starting point in a cyclic manner until all locks are acquired successfully.
|
---|
274 | This deadlock avoidance algorithm is shown in Figure~\ref{f:cc_deadlock_avoid}.
|
---|
275 | The algorithm is taken directly from the source code of the @<mutex>@ header, with some renaming and comments for clarity.
|
---|
276 |
|
---|
277 | \begin{figure}
|
---|
278 | \begin{cfa}
|
---|
279 | int first = 0; // first lock to attempt to lock
|
---|
280 | do {
|
---|
281 | // locks is the array of locks to acquire
|
---|
282 | locks[first].lock(); $\C{// lock first lock}$
|
---|
283 | for ( int i = 1; i < Num_Locks; i += 1 ) { $\C{// iterate over rest of locks}$
|
---|
284 | const int idx = (first + i) % Num_Locks;
|
---|
285 | if ( ! locks[idx].try_lock() ) { $\C{// try lock each one}$
|
---|
286 | for ( int j = i; j != 0; j -= 1 ) $\C{// release all locks}$
|
---|
287 | locks[(first + j - 1) % Num_Locks].unlock();
|
---|
288 | first = idx; $\C{// rotate which lock to acquire first}$
|
---|
289 | break;
|
---|
290 | }
|
---|
291 | }
|
---|
292 | // if first lock is still held then all have been acquired
|
---|
293 | } while ( ! locks[first].owns_lock() ); $\C{// is first lock held?}$
|
---|
294 | \end{cfa}
|
---|
295 | \caption{\CC \lstinline{scoped_lock} deadlock avoidance algorithm}
|
---|
296 | \label{f:cc_deadlock_avoid}
|
---|
297 | \end{figure}
|
---|
298 |
|
---|
299 | While this algorithm successfully avoids deadlock, there is a livelock scenario.
|
---|
300 | Assume two threads, $A$ and $B$, create a @scoped_lock@ accessing two locks, $L1$ and $L2$.
|
---|
301 | A livelock can form as follows.
|
---|
302 | Thread $A$ creates a @scoped_lock@ with arguments $L1$, $L2$, and $B$ creates a scoped lock with the lock arguments in the opposite order $L2$, $L1$.
|
---|
303 | Both threads acquire the first lock in their order and then fail the @try_lock@ since the other lock is held.
|
---|
304 | Both threads then reset their starting lock to be their second lock and try again.
|
---|
305 | This time $A$ has order $L2$, $L1$, and $B$ has order $L1$, $L2$, which is identical to the starting setup but with the ordering swapped between threads.
|
---|
306 | If the threads perform this action in lock-step, they cycle indefinitely without entering the CS, \ie livelock.
|
---|
307 | Hence, to use @scoped_lock@ safely, a programmer must manually construct and maintain a global ordering of lock arguments passed to @scoped_lock@.
|
---|
308 |
|
---|
309 | The lock ordering algorithm used in \CFA mutex functions and statements is deadlock and livelock free.
|
---|
310 | The algorithm uses the lock memory addresses as keys, sorts the keys, and then acquires the locks in sorted order.
|
---|
311 | For fewer than 7 locks ($2^3-1$), the sort is unrolled performing the minimum number of compare and swaps for the given number of locks;
|
---|
312 | for 7 or more locks, insertion sort is used.
|
---|
313 | Since it is extremely rare to hold more than 6 locks at a time, the algorithm is fast and executes in $O(1)$ time.
|
---|
314 | Furthermore, lock addresses are unique across program execution, even for dynamically allocated locks, so the algorithm is safe across the entire program execution.
|
---|
315 |
|
---|
316 | The downside to the sorting approach is that it is not fully compatible with manual usages of the same locks outside the @mutex@ statement, \ie the lock are acquired without using the @mutex@ statement.
|
---|
317 | The following scenario is a classic deadlock.
|
---|
318 | \begin{cquote}
|
---|
319 | \begin{tabular}{@{}l@{\hspace{30pt}}l@{}}
|
---|
320 | \begin{cfa}
|
---|
321 | lock L1, L2; // assume &L1 < &L2
|
---|
322 | $\textbf{thread\(_1\)}$
|
---|
323 | acquire( L2 );
|
---|
324 | acquire( L1 );
|
---|
325 | CS
|
---|
326 | release( L1 );
|
---|
327 | release( L2 );
|
---|
328 | \end{cfa}
|
---|
329 | &
|
---|
330 | \begin{cfa}
|
---|
331 |
|
---|
332 | $\textbf{thread\(_2\)}$
|
---|
333 | mutex( L1, L2 ) {
|
---|
334 |
|
---|
335 | CS
|
---|
336 |
|
---|
337 | }
|
---|
338 | \end{cfa}
|
---|
339 | \end{tabular}
|
---|
340 | \end{cquote}
|
---|
341 | Comparatively, if the @scoped_lock@ is used and the same locks are acquired elsewhere, there is no concern of the @scoped_lock@ deadlocking, due to its avoidance scheme, but it may livelock.
|
---|
342 | The convenience and safety of the @mutex@ statement, \eg guaranteed lock release with exceptions, should encourage programmers to always use it for locking, mitigating any deadlock scenario versus combining manual locking with the mutex statement.
|
---|
343 |
|
---|
344 | \section{Performance}
|
---|
345 | Given the two multi-acquisition algorithms in \CC and \CFA, each with differing advantages and disadvantages, it interesting to compare their performance.
|
---|
346 | Comparison with Java is not possible, since it only takes a single lock.
|
---|
347 |
|
---|
348 | The comparison starts with a baseline that acquires the locks directly without a mutex statement or @scoped_lock@ in a fixed ordering and then releases them.
|
---|
349 | The baseline helps highlight the cost of the deadlock avoidance/prevention algorithms for each implementation.
|
---|
350 |
|
---|
351 | The benchmark used to evaluate the avoidance algorithms repeatedly acquires a fixed number of locks in a random order and then releases them.
|
---|
352 | The pseudocode for the deadlock avoidance benchmark is shown in \VRef[Listing]{l:deadlock_avoid_pseudo}.
|
---|
353 | To ensure the comparison exercises the implementation of each lock avoidance algorithm, an identical spinlock is implemented in each language using a set of builtin atomics available in both \CC and \CFA.
|
---|
354 | The benchmarks are run for a fixed duration of 10 seconds and then terminate.
|
---|
355 | The total number of times the group of locks is acquired is returned for each thread.
|
---|
356 | Each variation is run 11 times on 2, 4, 8, 16, 24, 32 cores and with 2, 4, and 8 locks being acquired.
|
---|
357 | The median is calculated and is plotted alongside the 95\% confidence intervals for each point.
|
---|
358 |
|
---|
359 | \begin{cfa}[caption={Deadlock avoidance benchmark pseudocode},label={l:deadlock_avoid_pseudo}]
|
---|
360 |
|
---|
361 | size_t n_locks; $\C{// number of locks}$
|
---|
362 | size_t n_thds; $\C{// number of threads}$
|
---|
363 | size_t n_gens; $\C{// number of random orderings (default 100)}$
|
---|
364 | size_t total = 0; $\C{// global throughput aggregator}$
|
---|
365 | volatile bool done = false; $\C{// termination flag}$
|
---|
366 |
|
---|
367 | test_spinlock locks[n_locks];
|
---|
368 | size_t rands[n_thds][n_locks * n_gens]; $\C{// random ordering per thread}$
|
---|
369 |
|
---|
370 | void main( worker & w ) with(w) { $\C{// thread main}$
|
---|
371 | size_t count = 0, idx = 0;
|
---|
372 | while ( !done ) {
|
---|
373 | idx = (count % n_locks * n_gens) * n_locks; $\C{// get start of next sequence}$
|
---|
374 | mutex(locks[rands[0]], ..., locks[rands[n_locks - 1]]){} $\C{// lock sequence of locks}$
|
---|
375 | count++;
|
---|
376 | }
|
---|
377 | __atomic_add_fetch(&total, count, __ATOMIC_SEQ_CST); $\C{// atomically add to total}$
|
---|
378 | }
|
---|
379 |
|
---|
380 | int main( int argc, char * argv[] ) {
|
---|
381 | gen_orders(); $\C{// generate random orderings}$
|
---|
382 | {
|
---|
383 | worker w[n_thds];
|
---|
384 | sleep( 10`s );
|
---|
385 | done = true;
|
---|
386 | }
|
---|
387 | printf( "%lu\n", total );
|
---|
388 | }
|
---|
389 |
|
---|
390 | \end{cfa}
|
---|
391 |
|
---|
392 | The performance experiments were run on the following multi-core hardware systems to determine differences across platforms:
|
---|
393 | \begin{list}{\arabic{enumi}.}{\usecounter{enumi}\topsep=5pt\parsep=5pt\itemsep=0pt}
|
---|
394 | % sudo dmidecode -t system
|
---|
395 | \item
|
---|
396 | Supermicro AS--1123US--TR4 AMD EPYC 7662 64--core socket, hyper-threading $\times$ 2 sockets (256 processing units) 2.0 GHz, TSO memory model, running Linux v5.8.0--55--generic, gcc--10 compiler
|
---|
397 | \item
|
---|
398 | Supermicro SYS--6029U--TR4 Intel Xeon Gold 5220R 24--core socket, hyper-threading $\times$ 2 sockets (48 processing units) 2.2GHz, TSO memory model, running Linux v5.8.0--59--generic, gcc--10 compiler
|
---|
399 | \end{list}
|
---|
400 | %The hardware architectures are different in threading (multithreading vs hyper), cache structure (MESI or MESIF), NUMA layout (QPI vs HyperTransport), memory model (TSO vs WO), and energy/thermal mechanisms (turbo-boost).
|
---|
401 | %Software that runs well on one architecture may run poorly or not at all on another.
|
---|
402 |
|
---|
403 | Figure~\ref{f:mutex_bench} shows the results of the benchmark experiments.
|
---|
404 | The baseline results for both languages are mostly comparable, except for the 8 locks results in \ref{f:mutex_bench8_AMD} and \ref{f:mutex_bench8_Intel}, where the \CFA baseline is slightly slower.
|
---|
405 | The avoidance result for both languages is significantly different, where \CFA's mutex statement achieves throughput that is magnitudes higher than \CC's @scoped_lock@.
|
---|
406 | The slowdown for @scoped_lock@ is likely due to its deadlock-avoidance implementation.
|
---|
407 | Since it uses a retry based mechanism, it can take a long time for threads to progress.
|
---|
408 | Additionally the potential for livelock in the algorithm can result in very little throughput under high contention.
|
---|
409 | For example, on the AMD machine with 32 threads and 8 locks, the benchmarks would occasionally livelock indefinitely, with no threads making any progress for 3 hours before the experiment was terminated manually.
|
---|
410 | It is likely that shorter bouts of livelock occurred in many of the experiments, which would explain large confidence intervals for some of the data points in the \CC data.
|
---|
411 | In Figures~\ref{f:mutex_bench8_AMD} and \ref{f:mutex_bench8_Intel} there is the counter-intuitive result of the mutex statement performing better than the baseline.
|
---|
412 | At 7 locks and above the mutex statement switches from a hard coded sort to insertion sort, which should decrease performance.
|
---|
413 | It is likely the increase in throughput compared to baseline is due to the delay spent in the insertion sort, which decreases contention on the locks.
|
---|
414 |
|
---|
415 | \begin{figure}
|
---|
416 | \centering
|
---|
417 | \subfloat[AMD]{
|
---|
418 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Aggregate_Lock_2.pgf}}
|
---|
419 | }
|
---|
420 | \subfloat[Intel]{
|
---|
421 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Aggregate_Lock_2.pgf}}
|
---|
422 | }
|
---|
423 |
|
---|
424 | \subfloat[AMD]{
|
---|
425 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Aggregate_Lock_4.pgf}}
|
---|
426 | }
|
---|
427 | \subfloat[Intel]{
|
---|
428 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Aggregate_Lock_4.pgf}}
|
---|
429 | }
|
---|
430 |
|
---|
431 | \subfloat[AMD]{
|
---|
432 | \resizebox{0.5\textwidth}{!}{\input{figures/nasus_Aggregate_Lock_8.pgf}}
|
---|
433 | \label{f:mutex_bench8_AMD}
|
---|
434 | }
|
---|
435 | \subfloat[Intel]{
|
---|
436 | \resizebox{0.5\textwidth}{!}{\input{figures/pyke_Aggregate_Lock_8.pgf}}
|
---|
437 | \label{f:mutex_bench8_Intel}
|
---|
438 | }
|
---|
439 | \caption{The aggregate lock benchmark comparing \CC \lstinline{scoped_lock} and \CFA mutex statement throughput (higher is better).}
|
---|
440 | \label{f:mutex_bench}
|
---|
441 | \end{figure}
|
---|
442 |
|
---|
443 | % Local Variables: %
|
---|
444 | % tab-width: 4 %
|
---|
445 | % End: %
|
---|