Ignore:
Timestamp:
Sep 7, 2022, 4:12:00 PM (20 months ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, master, pthread-emulation
Children:
e4855f6
Parents:
7a0f798b
Message:

A whole bunch of small changes:
trying to setup a version that I can pass through a spell checker.
Fixing a whole bunch of grammar errors

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex

    r7a0f798b ra44514e  
    66The goal in this chapter is show the \CFA scheduler obtains equivalent performance to other less fair schedulers through the different experiments.
    77Note, only the code of the \CFA tests is shown;
    8 all tests in the other systems are functionally identical and available online~\cite{SchedulingBenchmarks}.
     8all tests in the other systems are functionally identical and available online~\cite{GITHUB:SchedulingBenchmarks}.
    99
    1010\section{Benchmark Environment}\label{microenv}
     
    6363For this reason, I designed a different push/pop benchmark, called \newterm{Cycle Benchmark}.
    6464This benchmark arranges a number of \ats into a ring, as seen in Figure~\ref{fig:cycle}, where the ring is a circular singly-linked list.
    65 At runtime, each \at unparks the next \at before parking itself.
    66 Unparking the next \at pushes that \at onto the ready queue while the ensuing park leads to a \at being popped from the ready queue.
     65At runtime, each \at unparks the next \at before \glslink{atblock}{parking} itself.
     66Unparking the next \at pushes that \at onto the ready queue while the ensuing \park leads to a \at being popped from the ready queue.
    6767
    6868\begin{figure}
    6969        \centering
    7070        \input{cycle.pstex_t}
    71         \caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \at unparks the next \at in the cycle before parking itself.}
     71        \caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \at unparks the next \at in the cycle before \glslink{atblock}{parking} itself.}
    7272        \label{fig:cycle}
    7373\end{figure}
    7474
    7575Therefore, the underlying runtime cannot rely on the number of ready \ats staying constant over the duration of the experiment.
    76 In fact, the total number of \ats waiting on the ready queue is expected to vary because of the race between the next \at unparking and the current \at parking.
     76In fact, the total number of \ats waiting on the ready queue is expected to vary because of the race between the next \at \glslink{atsched}{unparking} and the current \at \glslink{atblock}{parking}.
    7777That is, the runtime cannot anticipate that the current task immediately parks.
    7878As well, the size of the cycle is also decided based on this race, \eg a small cycle may see the chain of unparks go full circle before the first \at parks because of time-slicing or multiple \procs.
    7979If this happens, the scheduler push and pop are avoided and the results of the experiment are skewed.
    80 (Note, an unpark is like a V on a semaphore, so the subsequent park (P) may not block.)
     80(Note, an \unpark is like a V on a semaphore, so the subsequent \park (P) may not block.)
    8181Every runtime system must handle this race and cannot optimized away the ready-queue pushes and pops.
    82 To prevent any attempt of silently omitting ready-queue operations, the ring of \ats is made big enough so the \ats have time to fully park before being unparked again.
     82To prevent any attempt of silently omitting ready-queue operations, the ring of \ats is made big enough so the \ats have time to fully \park before being unparked again.
    8383Finally, to further mitigate any underlying push/pop optimizations, especially on SMP machines, multiple rings are created in the experiment.
    8484
    8585Figure~\ref{fig:cycle:code} shows the pseudo code for this benchmark, where each cycle has 5 \ats.
    86 There is additional complexity to handle termination (not shown), which requires a binary semaphore or a channel instead of raw @park@/@unpark@ and carefully picking the order of the @P@ and @V@ with respect to the loop condition.
     86There is additional complexity to handle termination (not shown), which requires a binary semaphore or a channel instead of raw \park/\unpark and carefully picking the order of the @P@ and @V@ with respect to the loop condition.
    8787
    8888\begin{figure}
     
    416416An interesting aspect to note here is that the runtimes differ in how they handle this situation.
    417417Indeed, when a \proc unparks a \at that was last run on a different \proc, the \at could be appended to the ready queue of the local \proc or to the ready queue of the remote \proc, which previously ran the \at.
    418 \CFA, Tokio and Go all use the approach of unparking to the local \proc, while Libfibre unparks to the remote \proc.
     418\CFA, Tokio and Go all use the approach of \glslink{atsched}{unparking} to the local \proc, while Libfibre unparks to the remote \proc.
    419419In this particular benchmark, the inherent chaos of the benchmark, in addition to small memory footprint, means neither approach wins over the other.
    420420
     
    485485Up to 32 \procs, after which the other runtime manage to outscale Go.
    486486
    487 In conclusion, the objective of this benchmark is to demonstrate that unparking \ats from remote \procs does not cause too much contention on the local queues.
     487In conclusion, the objective of this benchmark is to demonstrate that \glslink{atsched}{unparking} \ats from remote \procs does not cause too much contention on the local queues.
    488488Indeed, the fact that most runtimes achieve some scaling between various \proc count demonstrate migrations do not need to be serialized.
    489489Again these result demonstrate \CFA achieves satisfactory performance with respect to the other runtimes.
     
    491491\section{Locality}
    492492
    493 As mentioned in the churn benchmark, when unparking a \at, it is possible to either unpark to the local or remote ready-queue.\footnote{
    494 It is also possible to unpark to a third unrelated ready-queue, but without additional knowledge about the situation, it is likely to degrade performance.}
     493As mentioned in the churn benchmark, when \glslink{atsched}{unparking} a \at, it is possible to either \unpark to the local or remote ready-queue.\footnote{
     494It is also possible to \unpark to a third unrelated ready-queue, but without additional knowledge about the situation, it is likely to degrade performance.}
    495495The locality experiment includes two variations of the churn benchmark, where a data array is added.
    496496In both variations, before @V@ing the semaphore, each \at calls a @work@ function which increments random cells inside the data array.
     
    499499Figure~\ref{fig:locality:code} shows pseudo code for this benchmark.
    500500
    501 The objective here is to highlight the different decision made by the runtime when unparking.
     501The objective here is to highlight the different decision made by the runtime when \glslink{atsched}{unparking}.
    502502Since each thread unparks a random semaphore, it means that it is unlikely that a \at is unparked from the last \proc it ran on.
    503 In the noshare variation, unparking the \at on the local \proc is an appropriate choice since the data was last modified on that \proc.
    504 In the shared variation, unparking the \at on a remote \proc is an appropriate choice.
    505 
    506 The expectation for this benchmark is to see a performance inversion, where runtimes fare notably better in the variation which matches their unparking policy.
     503In the noshare variation, \glslink{atsched}{unparking} the \at on the local \proc is an appropriate choice since the data was last modified on that \proc.
     504In the shared variation, \glslink{atsched}{unparking} the \at on a remote \proc is an appropriate choice.
     505
     506The expectation for this benchmark is to see a performance inversion, where runtimes fare notably better in the variation which matches their \glslink{atsched}{unparking} policy.
    507507This decision should lead to \CFA, Go and Tokio achieving better performance in the share variation while libfibre achieves better performance in noshare.
    508 Indeed, \CFA, Go and Tokio have the default policy of unparking \ats on the local \proc, where as libfibre has the default policy of unparking \ats wherever they last ran.
     508Indeed, \CFA, Go and Tokio have the default policy of \glslink{atsched}{unparking} \ats on the local \proc, where as libfibre has the default policy of \glslink{atsched}{unparking} \ats wherever they last ran.
    509509
    510510\begin{figure}
     
    554554\vrule
    555555\hspace{3pt}
    556 \subfloat[Share]{\label{fig:locality:code:T1}\usebox\myboxB}
     556\subfloat[Share]{\label{fig:locality:code:T2}\usebox\myboxB}
    557557
    558558\caption[Locality Benchmark : Pseudo Code]{Locality Benchmark : Pseudo Code}
     
    566566Looking at the left column on Intel, Figures~\ref{fig:locality:jax:share:ops} and \ref{fig:locality:jax:share:ns} show the results for the share variation.
    567567\CFA and Tokio slightly outperform libfibre, as expected, based on their \ats placement approach.
    568 \CFA and Tokio both unpark locally and do not suffer cache misses on the transferred array.
     568\CFA and Tokio both \unpark locally and do not suffer cache misses on the transferred array.
    569569Libfibre on the other hand unparks remotely, and as such the unparked \at is likely to miss on the shared data.
    570570Go trails behind in this experiment, presumably for the same reasons that were observable in the churn benchmark.
     
    640640Indeed, in this case, unparking remotely means the unparked \at is less likely to suffer a cache miss on the array, which leaves the \at data structure and the remote queue as the only source of likely cache misses.
    641641Results show both are amortized fairly well in this case.
    642 \CFA and Tokio both unpark locally and as a result suffer a marginal performance degradation from the cache miss on the array.
     642\CFA and Tokio both \unpark locally and as a result suffer a marginal performance degradation from the cache miss on the array.
    643643
    644644Looking at the results for the AMD architecture, Figure~\ref{fig:locality:nasus}, shows results similar to the Intel.
     
    651651Go still has the same poor performance.
    652652
    653 Overall, this benchmark mostly demonstrates the two options available when unparking a \at.
     653Overall, this benchmark mostly demonstrates the two options available when \glslink{atsched}{unparking} a \at.
    654654Depending on the workload, either of these options can be the appropriate one.
    655655Since it is prohibitively difficult to dynamically detect which approach is appropriate, all runtimes much choose one of the two and live with the consequences.
Note: See TracChangeset for help on using the changeset viewer.