The goal of this chapter is to show, through a series of experiments, that the \CFA scheduler obtains performance equivalent to other, less fair, schedulers.
Note, only the code of the \CFA tests is shown;
all tests in the other systems are functionally identical and available online~\cite{GITHUB:SchedulingBenchmarks}.
 \section{Benchmark Environment}\label{microenv}
 …
 For this reason, I designed a different push/pop benchmark, called \newterm{Cycle Benchmark}.
 This benchmark arranges a number of \ats into a ring, as seen in Figure~\ref{fig:cycle}, where the ring is a circular singly-linked list.
At runtime, each \at unparks the next \at before \glslink{atblock}{parking} itself.
Unparking the next \at pushes that \at onto the ready queue while the ensuing \park leads to a \at being popped from the ready queue.
 \begin{figure}
         \centering
         \input{cycle.pstex_t}
        \caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \at unparks the next \at in the cycle before \glslink{atblock}{parking} itself.}
         \label{fig:cycle}
 \end{figure}
 Therefore, the underlying runtime cannot rely on the number of ready \ats staying constant over the duration of the experiment.
In fact, the total number of \ats waiting on the ready queue is expected to vary because of the race between the next \at \glslink{atsched}{unparking} and the current \at \glslink{atblock}{parking}.
That is, the runtime cannot assume that the current \at parks immediately after unparking the next one.
As well, the size of the cycle must also be chosen with this race in mind, \eg a small cycle may see the chain of unparks go full circle before the first \at parks because of time-slicing or multiple \procs.
 If this happens, the scheduler push and pop are avoided and the results of the experiment are skewed.
(Note, an \unpark is like a V on a semaphore, so the subsequent \park (P) may not block.)
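The semaphore analogy can be made concrete with a minimal sketch built on POSIX mutexes and condition variables; it is not how \CFA or the other runtimes implement \park and \unpark. The names @struct park_token@, @token_init@, @token_park@, @token_unpark@ and @token_pending@ are hypothetical, and the sketch assumes at most one pending \unpark is remembered, as with a binary semaphore.
\begin{lstlisting}[language=C]
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical park/unpark token; at most one pending unpark is kept. */
struct park_token {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool pending;              /* an unpark arrived and is not yet consumed */
};

void token_init(struct park_token *t) {
	assert(t != NULL);
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->cond, NULL);
	t->pending = false;
}

/* Like V on a binary semaphore: record the wakeup, wake a parked waiter. */
void token_unpark(struct park_token *t) {
	pthread_mutex_lock(&t->lock);
	t->pending = true;         /* repeated unparks collapse into one */
	pthread_cond_signal(&t->cond);
	pthread_mutex_unlock(&t->lock);
}

/* Like P: block only while no unpark is pending, then consume it. */
void token_park(struct park_token *t) {
	pthread_mutex_lock(&t->lock);
	while (!t->pending)        /* loop guards against spurious wakeups */
		pthread_cond_wait(&t->cond, &t->lock);
	t->pending = false;
	pthread_mutex_unlock(&t->lock);
}

/* Reads the pending flag under the lock (used to observe the state). */
bool token_pending(struct park_token *t) {
	pthread_mutex_lock(&t->lock);
	bool p = t->pending;
	pthread_mutex_unlock(&t->lock);
	return p;
}
\end{lstlisting}
When @token_unpark@ runs first, the following @token_park@ consumes the pending wakeup and returns without blocking, which is the case where a runtime can skip the ready-queue push and pop.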
Every runtime system must handle this race and cannot optimize away the ready-queue pushes and pops.
To prevent any attempt of silently omitting ready-queue operations, the ring of \ats is made big enough so the \ats have time to fully \park before being unparked again.
 Finally, to further mitigate any underlying push/pop optimizations, especially on SMP machines, multiple rings are created in the experiment.
 Figure~\ref{fig:cycle:code} shows the pseudo code for this benchmark, where each cycle has 5 \ats.
There is additional complexity to handle termination (not shown), which requires a binary semaphore or a channel instead of raw \park/\unpark and carefully picking the order of the @P@ and @V@ with respect to the loop condition.
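A minimal sketch of a single ring follows, assuming POSIX threads stand in for \ats and one counting POSIX semaphore per member stands in for \park/\unpark; it is not the benchmark code. The names @struct ring@, @struct member@, @member_main@ and @run_cycle@ are hypothetical, and a handoff budget, @limit@, replaces the timed run of the real experiment. The sketch shows one way to order termination: the stop flag is checked only after waking from @sem_wait@, and an exiting member posts its successor once more, so a successor still parked in @sem_wait@ wakes and observes the flag.
\begin{lstlisting}[language=C]
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* One ring of the cycle benchmark (hypothetical sketch). */
struct ring {
	atomic_bool stop;          /* set once the handoff budget is spent */
	atomic_long handoffs;      /* completed park/unpark handoffs */
	long limit;                /* stop after this many handoffs */
};

struct member {
	sem_t sem;                 /* stands in for this member's park/unpark */
	struct member *next;       /* successor in the ring */
	struct ring *ring;
};

static void *member_main(void *arg) {
	struct member *self = arg;
	struct ring *r = self->ring;
	for (;;) {
		sem_post(&self->next->sem);        /* unpark the next member (V) */
		sem_wait(&self->sem);              /* park until unparked (P) */
		if (atomic_load(&r->stop)) break;  /* check only after waking */
		if (atomic_fetch_add(&r->handoffs, 1) + 1 >= r->limit)
			atomic_store(&r->stop, true);
	}
	/* Pass the wakeup along so a parked successor also observes stop. */
	sem_post(&self->next->sem);
	return NULL;
}

/* Runs one ring of n members until at least limit handoffs complete. */
long run_cycle(int n, long limit) {
	assert(n > 0 && limit > 0);
	struct ring r = { .limit = limit };
	atomic_init(&r.stop, false);
	atomic_init(&r.handoffs, 0);
	struct member *m = malloc(sizeof *m * (size_t)n);
	pthread_t *tid = malloc(sizeof *tid * (size_t)n);
	assert(m != NULL && tid != NULL);
	for (int i = 0; i < n; i++) {
		sem_init(&m[i].sem, 0, 0);
		m[i].next = &m[(i + 1) % n];
		m[i].ring = &r;
	}
	for (int i = 0; i < n; i++)
		pthread_create(&tid[i], NULL, member_main, &m[i]);
	for (int i = 0; i < n; i++)
		pthread_join(tid[i], NULL);
	for (int i = 0; i < n; i++)
		sem_destroy(&m[i].sem);
	long total = atomic_load(&r.handoffs);
	free(m);
	free(tid);
	return total;
}
\end{lstlisting}
For example, @run_cycle(5, 1000)@ runs a ring of 5 members, matching the cycle size in the pseudo code, and returns at least 1000 handoffs, possibly a few more, since members already past the stop check may still record one. The multiple rings of the experiment would be independent @struct ring@ instances.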
 \begin{figure}
 …
 An interesting aspect to note here is that the runtimes differ in how they handle this situation.
 Indeed, when a \proc unparks a \at that was last run on a different \proc, the \at could be appended to the ready queue of the local \proc or to the ready queue of the remote \proc, which previously ran the \at.
\CFA, Tokio and Go all use the approach of \glslink{atsched}{unparking} to the local \proc, while Libfibre unparks to the remote \proc.
Here, the inherent chaos of the benchmark, combined with its small memory footprint, means neither approach wins over the other.
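The two policies differ only in which ready queue receives the unparked \at. A minimal sketch, assuming one unsynchronized FIFO per \proc and a @last_proc@ field recording where the \at last ran; @struct thread_desc@, @struct ready_queue@, @QCAP@, @rq_push@, @rq_pop@ and @unpark_to@ are hypothetical names, and a real runtime must synchronize these queues since remote \procs push onto them.
\begin{lstlisting}[language=C]
#include <assert.h>
#include <stddef.h>

#define QCAP 16                      /* hypothetical per-queue capacity */

struct thread_desc { int id; int last_proc; };

/* One FIFO ready queue per proc; unsynchronized in this sketch. */
struct ready_queue {
	struct thread_desc *items[QCAP];
	size_t head, len;
};

enum unpark_policy { UNPARK_LOCAL, UNPARK_REMOTE };

static void rq_push(struct ready_queue *q, struct thread_desc *t) {
	assert(q->len < QCAP);
	q->items[(q->head + q->len) % QCAP] = t;
	q->len++;
}

struct thread_desc *rq_pop(struct ready_queue *q) {
	if (q->len == 0)
		return NULL;
	struct thread_desc *t = q->items[q->head];
	q->head = (q->head + 1) % QCAP;
	q->len--;
	return t;
}

/* Makes t ready on behalf of the thread running on self_proc and
   returns the index of the ready queue t was appended to. */
int unpark_to(struct ready_queue queues[], int self_proc,
              struct thread_desc *t, enum unpark_policy pol) {
	int target = (pol == UNPARK_LOCAL) ? self_proc : t->last_proc;
	rq_push(&queues[target], t);
	return target;
}
\end{lstlisting}
With @UNPARK_LOCAL@, the approach described above for \CFA, Tokio and Go, the \at is appended to the caller's queue; with @UNPARK_REMOTE@, the approach described for Libfibre, it returns to the queue of the \proc that last ran it.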
 …
Up to 32 \procs, after which the other runtimes manage to outscale Go.
In conclusion, the objective of this benchmark is to demonstrate that \glslink{atsched}{unparking} \ats from remote \procs does not cause too much contention on the local queues.
Indeed, the fact that most runtimes achieve some scaling across the various \proc counts demonstrates that migrations do not need to be serialized.
Again, these results demonstrate that \CFA achieves satisfactory performance with respect to the other runtimes.
 …
 \section{Locality}
As mentioned in the churn benchmark, when \glslink{atsched}{unparking} a \at, it is possible to \unpark either to the local or to the remote ready-queue.\footnote{
It is also possible to \unpark to a third unrelated ready-queue, but without additional knowledge about the situation, it is likely to degrade performance.}
 The locality experiment includes two variations of the churn benchmark, where a data array is added.
 In both variations, before @V@ing the semaphore, each \at calls a @work@ function which increments random cells inside the data array.
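A minimal sketch of such a @work@ function, assuming a @uint64_t@ array and an xorshift64 generator as a stand-in, since the benchmark's random-number generator is not shown here; the signature is hypothetical. Each increment is a read-modify-write, so every touched cell's cache line must be brought into the cache of the \proc running the \at, which is what makes the placement of the \at matter.
\begin{lstlisting}[language=C]
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift64 step; a stand-in PRNG chosen for this sketch. */
static uint64_t xorshift64(uint64_t *state) {
	uint64_t x = *state;
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Hypothetical work function: increments count randomly chosen cells
   of data, which holds size cells. Each increment is a read-modify-write,
   so the touched cache lines move to the cache of the calling proc. */
void work(uint64_t *data, size_t size, unsigned count, uint64_t *seed) {
	assert(data != NULL && size > 0);
	assert(*seed != 0);                 /* xorshift state must be nonzero */
	for (unsigned i = 0; i < count; i++)
		data[xorshift64(seed) % size] += 1;
}
\end{lstlisting}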
 …
 Figure~\ref{fig:locality:code} shows pseudo code for this benchmark.
The objective here is to highlight the different decisions made by the runtime when \glslink{atsched}{unparking}.
Since each \at unparks a random semaphore, a \at is unlikely to be unparked from the last \proc it ran on.
In the share variation, \glslink{atsched}{unparking} the \at on the local \proc is an appropriate choice since the transferred array was last modified on that \proc.
In the noshare variation, \glslink{atsched}{unparking} the \at on a remote \proc, namely the one it last ran on, is an appropriate choice since its array is not transferred and was last modified there.
The expectation for this benchmark is to see a performance inversion, where runtimes fare notably better in the variation that matches their \glslink{atsched}{unparking} policy.
 This decision should lead to \CFA, Go and Tokio achieving better performance in the share variation while libfibre achieves better performance in noshare.
Indeed, \CFA, Go and Tokio have the default policy of \glslink{atsched}{unparking} \ats on the local \proc, whereas libfibre has the default policy of \glslink{atsched}{unparking} \ats wherever they last ran.
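The expected inversion reduces to comparing two \proc identifiers: the \proc the woken \at resumes on and the \proc that last wrote the array it works on next. A minimal model, assuming that in the share variation the woken \at receives the array just modified by the unparking \at, and in the noshare variation it works on its own array, last modified on the \proc it last ran on; all names are hypothetical, and every cache effect other than this single comparison is ignored.
\begin{lstlisting}[language=C]
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-choice model of the locality benchmark. */
enum variation { NOSHARE, SHARE };
enum unpark_policy { UNPARK_LOCAL, UNPARK_REMOTE };

/* Proc the woken thread resumes on: the unparker's proc or its last one. */
static int resume_proc(enum unpark_policy pol, int waker_proc, int last_proc) {
	return pol == UNPARK_LOCAL ? waker_proc : last_proc;
}

/* Proc that last wrote the array the woken thread works on next. */
static int array_home(enum variation var, int waker_proc, int last_proc) {
	/* share: array handed over by the unparker, just written on waker_proc;
	   noshare: the woken thread's own array, written where it last ran */
	return var == SHARE ? waker_proc : last_proc;
}

/* Models whether the array is likely still in the cache of the proc
   the woken thread resumes on. */
bool array_is_warm(enum variation var, enum unpark_policy pol,
                   int waker_proc, int last_proc) {
	assert(waker_proc >= 0 && last_proc >= 0);
	return resume_proc(pol, waker_proc, last_proc)
	    == array_home(var, waker_proc, last_proc);
}
\end{lstlisting}
Only share with local placement and noshare with remote placement leave the array warm, which matches the expectation that \CFA, Go and Tokio do better in share and libfibre does better in noshare.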
 \begin{figure}
 …
 \vrule
 \hspace{3pt}
\subfloat[Share]{\label{fig:locality:code:T2}\usebox\myboxB}
 \caption[Locality Benchmark : Pseudo Code]{Locality Benchmark : Pseudo Code}
 …
 Looking at the left column on Intel, Figures~\ref{fig:locality:jax:share:ops} and \ref{fig:locality:jax:share:ns} show the results for the share variation.
\CFA and Tokio slightly outperform libfibre, as expected, based on their \at placement approach.
\CFA and Tokio both \unpark locally and do not suffer cache misses on the transferred array.
Libfibre, on the other hand, unparks remotely, so the unparked \at is likely to suffer a cache miss on the shared data.
 Go trails behind in this experiment, presumably for the same reasons that were observable in the churn benchmark.
 …
Indeed, in this case, unparking remotely means the unparked \at is less likely to suffer a cache miss on the array, which leaves the \at data structure and the remote queue as the only likely sources of cache misses.
 Results show both are amortized fairly well in this case.
\CFA and Tokio both \unpark locally and as a result suffer a marginal performance degradation from the cache miss on the array.
The results for the AMD architecture, Figure~\ref{fig:locality:nasus}, are similar to those on the Intel.
 …
 Go still has the same poor performance.
Overall, this benchmark mostly demonstrates the two options available when \glslink{atsched}{unparking} a \at.
 Depending on the workload, either of these options can be the appropriate one.
Since it is prohibitively difficult to dynamically detect which approach is appropriate, all runtimes must choose one of the two and live with the consequences.
