Timestamp: Sep 6, 2022, 4:05:00 PM
Author: Thierry Delisle <tdelisle@…>
Branches: ADT, ast-experimental, master, pthread-emulation
Children: a44514e
Parents: 9f99799
Message: Merged Peter's last changes and filled in most of the TODOs

File:
1 edited

  • doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex

    r9f99799 r7a0f798b

    101 101 \caption[Cycle Benchmark : Pseudo Code]{Cycle Benchmark : Pseudo Code}
    102 102 \label{fig:cycle:code}
    103     %\end{figure}ll have a physical key so it's not urgent.
    104 103 \bigskip
    105     %\begin{figure}
    106 104 	\subfloat[][Throughput, 100 cycles per \proc]{
    107 105 		\resizebox{0.5\linewidth}{!}{
     
    401 399
    402 400 Figures~\ref{fig:churn:jax} and \ref{fig:churn:nasus} show the results for the churn experiment on Intel and AMD, respectively.
    403     Looking at the left column on Intel, Figures~\ref{fig:churn:jax:ops} and \ref{fig:churn:jax:ns} show the results for 100 \ats for each \proc have, and all runtimes obtain fairly similar throughput for most \proc counts.
        401 Looking at the left column on Intel, Figures~\ref{fig:churn:jax:ops} and \ref{fig:churn:jax:ns} show the results for 100 \ats for each \proc, and all runtimes obtain fairly similar throughput for most \proc counts.
    404 402 \CFA does very well on a single \proc but quickly loses its advantage over the other runtimes.
    405 403 As expected, it scales decently up to 48 \procs, drops from 48 to 72 \procs, and then plateaus.
     
    425 423 Libfibre follows very closely behind with basically the same performance and scaling.
    426 424 Tokio maintains effectively the same curve shapes as \CFA and libfibre, but it incurs extra costs for all \proc counts.
    427     % As a result it is slightly outperformed by \CFA and libfibre.
    428 425 While Go maintains overall similar results to the others, it again encounters significant variation at high \proc counts,
    429 426 inexplicably resulting in super-linear scaling for some runs, \ie the scalability curve displays a negative slope.
     
    497 494 It is also possible to unpark to a third unrelated ready-queue, but without additional knowledge about the situation, it is likely to degrade performance.}
    498 495 The locality experiment includes two variations of the churn benchmark, where a data array is added.
    499     In both variations, before @V@ing the semaphore, each \at increments random cells inside the data array by calling a @work@ function.
        496 In both variations, before @V@ing the semaphore, each \at calls a @work@ function which increments random cells inside the data array.
    500 497 In the noshare variation, the array is not passed on and each thread continuously accesses its private array.
    501 498 In the share variation, the array is passed to another thread via the semaphore's shadow-queue (each blocking thread can save a word of user data in its blocking node), transferring ownership of the array to the woken thread.
     
    506 503 In the noshare variation, unparking the \at on the local \proc is an appropriate choice since the data was last modified on that \proc.
    507 504 In the share variation, unparking the \at on a remote \proc is an appropriate choice.
    508     \todo{PAB: I changed these sentences around.}
    509 505
    510 506 The expectation for this benchmark is to see a performance inversion, where runtimes fare notably better in the variation which matches their unparking policy.
     
    720 716 This scenario is a harder case to handle because corrective measures must be taken even when work is available.
    721 717 Note, runtimes with preemption circumvent this problem by forcing the spinner to yield.
        718 In \CFA, preemption was disabled because it only obfuscates the results.
        719 I am not aware of a method to disable preemption in Go.
    722 720
    723 721 In both variations, the experiment effectively measures how long it takes for all \ats to run once after a given synchronization point.
     
    763 761 The semaphore variation is denoted ``Park'', where the number of \ats dwindles as the new leader is acknowledged.
    764 762 The yielding variation is denoted ``Yield''.
    765     The experiment is only run for many \procs, since scaling is not the focus of this experiment.
        763 The experiment is only run for few and many \procs, since scaling is not the focus of this experiment.
    766 764
    767 765 The first two columns show the results for the semaphore variation on Intel.
     
    771 769 Looking at the next two columns, the results for the yield variation on Intel, the story is very different.
    772 770 \CFA achieves better latencies, presumably due to no synchronization with the yield.
    773     \todo{PAB: what about \CFA preemption? How does that come into play for your scheduler?}
    774 771 Go does complete the experiment, but with drastically higher latency:
    775 772 latency at 2 \procs is $350\times$ higher than \CFA and $70\times$ higher at 192 \procs.
    776     This difference is because Go has a classic work-stealing scheduler, but it adds coarse-grain preemption\footnote{
    777     Preemption is done at the function prolog when the goroutine's stack is increasing;
    778     whereas \CFA uses fine-grain preemption between any two instructions.}
        773 This difference is because Go has a classic work-stealing scheduler, but it adds coarse-grain preemption
    779 774 , which interrupts the spinning leader after a period.
    780 775 Neither Libfibre nor Tokio completes the experiment.