- Timestamp:
- Sep 7, 2022, 4:12:00 PM (2 years ago)
- Branches:
- ADT, ast-experimental, master, pthread-emulation
- Children:
- e4855f6
- Parents:
- 7a0f798b
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex
r7a0f798b ra44514e 6 6 The goal in this chapter is show the \CFA scheduler obtains equivalent performance to other less fair schedulers through the different experiments. 7 7 Note, only the code of the \CFA tests is shown; 8 all tests in the other systems are functionally identical and available online~\cite{ SchedulingBenchmarks}.8 all tests in the other systems are functionally identical and available online~\cite{GITHUB:SchedulingBenchmarks}. 9 9 10 10 \section{Benchmark Environment}\label{microenv} … … 63 63 For this reason, I designed a different push/pop benchmark, called \newterm{Cycle Benchmark}. 64 64 This benchmark arranges a number of \ats into a ring, as seen in Figure~\ref{fig:cycle}, where the ring is a circular singly-linked list. 65 At runtime, each \at unparks the next \at before parkingitself.66 Unparking the next \at pushes that \at onto the ready queue while the ensuing park leads to a \at being popped from the ready queue.65 At runtime, each \at unparks the next \at before \glslink{atblock}{parking} itself. 66 Unparking the next \at pushes that \at onto the ready queue while the ensuing \park leads to a \at being popped from the ready queue. 67 67 68 68 \begin{figure} 69 69 \centering 70 70 \input{cycle.pstex_t} 71 \caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \at unparks the next \at in the cycle before parkingitself.}71 \caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \at unparks the next \at in the cycle before \glslink{atblock}{parking} itself.} 72 72 \label{fig:cycle} 73 73 \end{figure} 74 74 75 75 Therefore, the underlying runtime cannot rely on the number of ready \ats staying constant over the duration of the experiment. 76 In fact, the total number of \ats waiting on the ready queue is expected to vary because of the race between the next \at unparking and the current \at parking.76 In fact, the total number of \ats waiting on the ready queue is expected to vary because of the race between the next \at \glslink{atsched}{unparking} and the current \at \glslink{atblock}{parking}. 77 77 That is, the runtime cannot anticipate that the current task immediately parks. 78 78 As well, the size of the cycle is also decided based on this race, \eg a small cycle may see the chain of unparks go full circle before the first \at parks because of time-slicing or multiple \procs. 79 79 If this happens, the scheduler push and pop are avoided and the results of the experiment are skewed. 80 (Note, an unpark is like a V on a semaphore, so the subsequentpark (P) may not block.)80 (Note, an \unpark is like a V on a semaphore, so the subsequent \park (P) may not block.) 81 81 Every runtime system must handle this race and cannot optimized away the ready-queue pushes and pops. 82 To prevent any attempt of silently omitting ready-queue operations, the ring of \ats is made big enough so the \ats have time to fully park before being unparked again.82 To prevent any attempt of silently omitting ready-queue operations, the ring of \ats is made big enough so the \ats have time to fully \park before being unparked again. 83 83 Finally, to further mitigate any underlying push/pop optimizations, especially on SMP machines, multiple rings are created in the experiment. 84 84 85 85 Figure~\ref{fig:cycle:code} shows the pseudo code for this benchmark, where each cycle has 5 \ats. 86 There is additional complexity to handle termination (not shown), which requires a binary semaphore or a channel instead of raw @park@/@unpark@and carefully picking the order of the @P@ and @V@ with respect to the loop condition.86 There is additional complexity to handle termination (not shown), which requires a binary semaphore or a channel instead of raw \park/\unpark and carefully picking the order of the @P@ and @V@ with respect to the loop condition. 87 87 88 88 \begin{figure} … … 416 416 An interesting aspect to note here is that the runtimes differ in how they handle this situation. 417 417 Indeed, when a \proc unparks a \at that was last run on a different \proc, the \at could be appended to the ready queue of the local \proc or to the ready queue of the remote \proc, which previously ran the \at. 418 \CFA, Tokio and Go all use the approach of unparkingto the local \proc, while Libfibre unparks to the remote \proc.418 \CFA, Tokio and Go all use the approach of \glslink{atsched}{unparking} to the local \proc, while Libfibre unparks to the remote \proc. 419 419 In this particular benchmark, the inherent chaos of the benchmark, in addition to small memory footprint, means neither approach wins over the other. 420 420 … … 485 485 Up to 32 \procs, after which the other runtime manage to outscale Go. 486 486 487 In conclusion, the objective of this benchmark is to demonstrate that unparking\ats from remote \procs does not cause too much contention on the local queues.487 In conclusion, the objective of this benchmark is to demonstrate that \glslink{atsched}{unparking} \ats from remote \procs does not cause too much contention on the local queues. 488 488 Indeed, the fact that most runtimes achieve some scaling between various \proc count demonstrate migrations do not need to be serialized. 489 489 Again these result demonstrate \CFA achieves satisfactory performance with respect to the other runtimes. … … 491 491 \section{Locality} 492 492 493 As mentioned in the churn benchmark, when unparking a \at, it is possible to eitherunpark to the local or remote ready-queue.\footnote{494 It is also possible to unpark to a third unrelated ready-queue, but without additional knowledge about the situation, it is likely to degrade performance.}493 As mentioned in the churn benchmark, when \glslink{atsched}{unparking} a \at, it is possible to either \unpark to the local or remote ready-queue.\footnote{ 494 It is also possible to \unpark to a third unrelated ready-queue, but without additional knowledge about the situation, it is likely to degrade performance.} 495 495 The locality experiment includes two variations of the churn benchmark, where a data array is added. 496 496 In both variations, before @V@ing the semaphore, each \at calls a @work@ function which increments random cells inside the data array. … … 499 499 Figure~\ref{fig:locality:code} shows pseudo code for this benchmark. 500 500 501 The objective here is to highlight the different decision made by the runtime when unparking.501 The objective here is to highlight the different decision made by the runtime when \glslink{atsched}{unparking}. 502 502 Since each thread unparks a random semaphore, it means that it is unlikely that a \at is unparked from the last \proc it ran on. 503 In the noshare variation, unparkingthe \at on the local \proc is an appropriate choice since the data was last modified on that \proc.504 In the shared variation, unparkingthe \at on a remote \proc is an appropriate choice.505 506 The expectation for this benchmark is to see a performance inversion, where runtimes fare notably better in the variation which matches their unparkingpolicy.503 In the noshare variation, \glslink{atsched}{unparking} the \at on the local \proc is an appropriate choice since the data was last modified on that \proc. 504 In the shared variation, \glslink{atsched}{unparking} the \at on a remote \proc is an appropriate choice. 505 506 The expectation for this benchmark is to see a performance inversion, where runtimes fare notably better in the variation which matches their \glslink{atsched}{unparking} policy. 507 507 This decision should lead to \CFA, Go and Tokio achieving better performance in the share variation while libfibre achieves better performance in noshare. 508 Indeed, \CFA, Go and Tokio have the default policy of unparking \ats on the local \proc, where as libfibre has the default policy of unparking\ats wherever they last ran.508 Indeed, \CFA, Go and Tokio have the default policy of \glslink{atsched}{unparking} \ats on the local \proc, where as libfibre has the default policy of \glslink{atsched}{unparking} \ats wherever they last ran. 509 509 510 510 \begin{figure} … … 554 554 \vrule 555 555 \hspace{3pt} 556 \subfloat[Share]{\label{fig:locality:code:T 1}\usebox\myboxB}556 \subfloat[Share]{\label{fig:locality:code:T2}\usebox\myboxB} 557 557 558 558 \caption[Locality Benchmark : Pseudo Code]{Locality Benchmark : Pseudo Code} … … 566 566 Looking at the left column on Intel, Figures~\ref{fig:locality:jax:share:ops} and \ref{fig:locality:jax:share:ns} show the results for the share variation. 567 567 \CFA and Tokio slightly outperform libfibre, as expected, based on their \ats placement approach. 568 \CFA and Tokio both unpark locally and do not suffer cache misses on the transferred array.568 \CFA and Tokio both \unpark locally and do not suffer cache misses on the transferred array. 569 569 Libfibre on the other hand unparks remotely, and as such the unparked \at is likely to miss on the shared data. 570 570 Go trails behind in this experiment, presumably for the same reasons that were observable in the churn benchmark. … … 640 640 Indeed, in this case, unparking remotely means the unparked \at is less likely to suffer a cache miss on the array, which leaves the \at data structure and the remote queue as the only source of likely cache misses. 641 641 Results show both are amortized fairly well in this case. 642 \CFA and Tokio both unpark locally and as a result suffer a marginal performance degradation from the cache miss on the array.642 \CFA and Tokio both \unpark locally and as a result suffer a marginal performance degradation from the cache miss on the array. 643 643 644 644 Looking at the results for the AMD architecture, Figure~\ref{fig:locality:nasus}, shows results similar to the Intel. … … 651 651 Go still has the same poor performance. 652 652 653 Overall, this benchmark mostly demonstrates the two options available when unparkinga \at.653 Overall, this benchmark mostly demonstrates the two options available when \glslink{atsched}{unparking} a \at. 654 654 Depending on the workload, either of these options can be the appropriate one. 655 655 Since it is prohibitively difficult to dynamically detect which approach is appropriate, all runtimes much choose one of the two and live with the consequences.
Note: See TracChangeset
for help on using the changeset viewer.