1 | \chapter{Micro-Benchmarks}\label{microbench} |
---|
2 | |
---|
3 | The first step of evaluation is always to test-out small controlled cases, to ensure that the basics are working properly. |
---|
4 | This sections presents four different experimental setup, evaluating some of the basic features of \CFA's scheduler. |
---|
5 | |
---|
6 | \section{Cycling latency} |
---|
7 | The most basic evaluation of any ready queue is to evaluate the latency needed to push and pop one element from the ready-queue. |
---|
8 | While these two operation also describe a \texttt{yield} operation, many systems use this as the most basic benchmark. |
---|
9 | However, yielding can be treated as a special case, since it also carries the information that the length of the ready queue will not change. |
---|
10 | Not all systems use this information, but those which do may appear to have better performance than they would for disconnected push/pop pairs. |
---|
11 | For this reason, I chose a different first benchmark, which I call the Cycle Benchmark. |
---|
12 | This benchmark arranges many threads into multiple rings of threads. |
---|
13 | Each ring is effectively a circular singly-linked list. |
---|
14 | At runtime, each thread unparks the next thread before parking itself. |
---|
15 | This corresponds to the desired pair of ready queue operations. |
---|
16 | Unparking the next thread requires pushing that thread onto the ready queue and the ensuing park will cause the runtime to pop a thread from the ready-queue. |
---|
17 | Figure~\ref{fig:cycle} shows a visual representation of this arrangement. |
---|
18 | |
---|
19 | The goal of this ring is that the underlying runtime cannot rely on the guarantee that the number of ready threads will stay constant over the duration of the experiment. |
---|
20 | In fact, the total number of threads waiting on the ready is expected to vary a little because of the race between the next thread unparking and the current thread parking. |
---|
21 | The size of the cycle is also decided based on this race: cycles that are too small may see the |
---|
22 | chain of unparks go full circle before the first thread can park. |
---|
23 | While this would not be a correctness problem, every runtime system must handle that race, it could lead to pushes and pops being optimized away. |
---|
24 | Since silently omitting ready-queue operations would throw off the measuring of these operations. |
---|
25 | Therefore the ring of threads must be big enough so the threads have the time to fully park before they are unparked. |
---|
26 | Note that this problem is only present on SMP machines and is significantly mitigated by the fact that there are multiple rings in the system. |
---|
27 | |
---|
28 | \begin{figure} |
---|
29 | \centering |
---|
30 | \input{cycle.pstex_t} |
---|
31 | \caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each thread unparks the next thread in the cycle before parking itself.} |
---|
32 | \label{fig:cycle} |
---|
33 | \end{figure} |
---|
34 | |
---|
35 | \todo{check term ``idle sleep handling''} |
---|
36 | To avoid this benchmark from being dominated by the idle sleep handling, the number of rings is kept at least as high as the number of processors available. |
---|
37 | Beyond this point, adding more rings serves to mitigate even more the idle sleep handling. |
---|
38 | This is to avoid the case where one of the worker threads runs out of work because of the variation on the number of ready threads mentionned above. |
---|
39 | |
---|
40 | The actual benchmark is more complicated to handle termination, but that simply requires using a binary semphore or a channel instead of raw \texttt{park}/\texttt{unpark} and carefully picking the order of the \texttt{P} and \texttt{V} with respect to the loop condition. |
---|
41 | |
---|
42 | \todo{mention where to get the code.} |
---|
43 | |
---|
44 | \section{Yield} |
---|
45 | For completion, I also include the yield benchmark. |
---|
46 | This benchmark is much simpler than the cycle tests, it simply creates many threads that call \texttt{yield}. |
---|
47 | |
---|
48 | \section{Locality} |
---|
49 | |
---|
50 | \section{Transfer} |
---|