- Timestamp: Sep 6, 2022, 4:05:00 PM (20 months ago)
- Branches: ADT, ast-experimental, master, pthread-emulation
- Children: a44514e
- Parents: 9f99799
- File: 1 edited
Legend:
- Unmodified
- Added
- Removed
doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex
--- doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex (r9f99799)
+++ doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex (r7a0f798b)
@@ -101,7 +101,5 @@
 \caption[Cycle Benchmark : Pseudo Code]{Cycle Benchmark : Pseudo Code}
 \label{fig:cycle:code}
-%\end{figure}ll have a physical key so it's not urgent.
 \bigskip
-%\begin{figure}
 \subfloat[][Throughput, 100 cycles per \proc]{
 \resizebox{0.5\linewidth}{!}{
@@ -401,5 +399,5 @@
 
 Figures~\ref{fig:churn:jax} and Figure~\ref{fig:churn:nasus} show the results for the churn experiment on Intel and AMD, respectively.
-Looking at the left column on Intel, Figures~\ref{fig:churn:jax:ops} and \ref{fig:churn:jax:ns} show the results for 100 \ats for each \proc have, and all runtimes obtain fairly similar throughput for most \proc counts.
+Looking at the left column on Intel, Figures~\ref{fig:churn:jax:ops} and \ref{fig:churn:jax:ns} show the results for 100 \ats for each \proc, and all runtimes obtain fairly similar throughput for most \proc counts.
 \CFA does very well on a single \proc but quickly loses its advantage over the other runtimes.
 As expected, it scales decently up to 48 \procs, drops from 48 to 72 \procs, and then plateaus.
@@ -425,5 +423,4 @@
 Libfibre follows very closely behind with basically the same performance and scaling.
 Tokio maintains effectively the same curve shapes as \CFA and libfibre, but it incurs extra costs for all \proc counts.
-% As a result it is slightly outperformed by \CFA and libfibre.
 While Go maintains overall similar results to the others, it again encounters significant variation at high \proc counts.
 Inexplicably resulting in super-linear scaling for some runs, \ie the scalability curves displays a negative slope.
@@ -497,5 +494,5 @@
 It is also possible to unpark to a third unrelated ready-queue, but without additional knowledge about the situation, it is likely to degrade performance.}
 The locality experiment includes two variations of the churn benchmark, where a data array is added.
-In both variations, before @V@ing the semaphore, each \at increments random cells inside the data array by calling a @work@ function.
+In both variations, before @V@ing the semaphore, each \at calls a @work@ function which increments random cells inside the data array.
 In the noshare variation, the array is not passed on and each thread continuously accesses its private array.
 In the share variation, the array is passed to another thread via the semaphore's shadow-queue (each blocking thread can save a word of user data in its blocking node), transferring ownership of the array to the woken thread.
@@ -506,5 +503,4 @@
 In the noshare variation, unparking the \at on the local \proc is an appropriate choice since the data was last modified on that \proc.
 In the shared variation, unparking the \at on a remote \proc is an appropriate choice.
-\todo{PAB: I changed these sentences around.}
 
 The expectation for this benchmark is to see a performance inversion, where runtimes fare notably better in the variation which matches their unparking policy.
@@ -720,4 +716,6 @@
 This scenario is a harder case to handle because corrective measures must be taken even when work is available.
 Note, runtimes with preemption circumvent this problem by forcing the spinner to yield.
+In \CFA preemption was disabled as it only obfuscates the results.
+I am not aware of a method to disable preemption in Go.
 
 In both variations, the experiment effectively measures how long it takes for all \ats to run once after a given synchronization point.
@@ -763,5 +761,5 @@
 The semaphore variation is denoted ``Park'', where the number of \ats dwindles down as the new leader is acknowledged.
 The yielding variation is denoted ``Yield''.
-The experiment is only run for many \procs, since scaling is not the focus of this experiment.
+The experiment is only run for few and many \procs, since scaling is not the focus of this experiment.
 
 The first two columns show the results for the semaphore variation on Intel.
@@ -771,10 +769,7 @@
 Looking at the next two columns, the results for the yield variation on Intel, the story is very different.
 \CFA achieves better latencies, presumably due to no synchronization with the yield.
-\todo{PAB: what about \CFA preemption? How does that come into play for your scheduler?}
 Go does complete the experiment, but with drastically higher latency:
 latency at 2 \procs is $350\times$ higher than \CFA and $70\times$ higher at 192 \procs.
-This difference is because Go has a classic work-stealing scheduler, but it adds coarse-grain preemption\footnote{
-Preemption is done at the function prolog when the goroutine's stack is increasing;
-whereas \CFA uses fine-grain preemption between any two instructions.}
+This difference is because Go has a classic work-stealing scheduler, but it adds coarse-grain preemption
 , which interrupts the spinning leader after a period.
 Neither Libfibre or Tokio complete the experiment.