source: doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex @ 36a05d7

Last change on this file since 36a05d7 was 36a05d7, checked in by Thierry Delisle <tdelisle@…>, 6 months ago

Started doing some work on the eval portion of my thesis.

  • Property mode set to 100644
File size: 3.5 KB
3The first step of evaluation is always to test-out small controlled cases, to ensure that the basics are working properly.
4This sections presents four different experimental setup, evaluating some of the basic features of \CFA's scheduler.
6\section{Cycling latency}
7The most basic evaluation of any ready queue is to evaluate the latency needed to push and pop one element from the ready-queue.
8While these two operation also describe a \texttt{yield} operation, many systems use this as the most basic benchmark.
9However, yielding can be treated as a special case, since it also carries the information that the length of the ready queue will not change.
10Not all systems use this information, but those which do may appear to have better performance than they would for disconnected push/pop pairs.
11For this reason, I chose a different first benchmark, which I call the Cycle Benchmark.
12This benchmark arranges many threads into multiple rings of threads.
13Each ring is effectively a circular singly-linked list.
14At runtime, each thread unparks the next thread before parking itself.
15This corresponds to the desired pair of ready queue operations.
16Unparking the next thread requires pushing that thread onto the ready queue and the ensuing park will cause the runtime to pop a thread from the ready-queue.
17Figure~\ref{fig:cycle} shows a visual representation of this arrangement.
19The goal of this ring is that the underlying runtime cannot rely on the guarantee that the number of ready threads will stay constant over the duration of the experiment.
20In fact, the total number of threads waiting on the ready is expected to vary a little because of the race between the next thread unparking and the current thread parking.
21The size of the cycle is also decided based on this race: cycles that are too small may see the
22chain of unparks go full circle before the first thread can park.
23While this would not be a correctness problem, every runtime system must handle that race, it could lead to pushes and pops being optimized away.
24Since silently omitting ready-queue operations would throw off the measuring of these operations.
25Therefore the ring of threads must be big enough so the threads have the time to fully park before they are unparked.
26Note that this problem is only present on SMP machines and is significantly mitigated by the fact that there are multiple rings in the system.
29        \centering
30        \input{cycle.pstex_t}
31        \caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each thread unparks the next thread in the cycle before parking itself.}
32        \label{fig:cycle}
35\todo{check term ``idle sleep handling''}
36To avoid this benchmark from being dominated by the idle sleep handling, the number of rings is kept at least as high as the number of processors available.
37Beyond this point, adding more rings serves to mitigate even more the idle sleep handling.
38This is to avoid the case where one of the worker threads runs out of work because of the variation on the number of ready threads mentionned above.
40The actual benchmark is more complicated to handle termination, but that simply requires using a binary semphore or a channel instead of raw \texttt{park}/\texttt{unpark} and carefully picking the order of the \texttt{P} and \texttt{V} with respect to the loop condition.
42\todo{mention where to get the code.}
45For completion, I also include the yield benchmark.
46This benchmark is much simpler than the cycle tests, it simply creates many threads that call \texttt{yield}.
Note: See TracBrowser for help on using the repository browser.