Timestamp:
May 18, 2022, 3:59:14 PM
Author:
Thierry Delisle <tdelisle@…>
Branches:
master, pthread-emulation, qualifiedEnum
Children:
288927f
Parents:
fa2a3b1
Message:

A whole lot of results and some text section done

File:
1 edited

  • doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex

    rfa2a3b1 r622a358  
\section{Benchmark Environment}
All of these benchmarks are run on two distinct hardware environments: an AMD machine and an Intel machine.

For all benchmarks, \texttt{taskset} is used to limit the experiment to 1 NUMA node with no hyperthreading.
If more \glspl{hthrd} are needed, then 1 NUMA node with hyperthreading is used.
If still more \glspl{hthrd} are needed, then the experiment is limited to as few NUMA nodes as needed.
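One way to read this placement policy is as a fixed order for handing out hardware threads. First come the first hyperthreads of every core in one NUMA node, then the remaining hyperthread siblings of that node, then the next node in the same order. The following Python sketch computes a CPU list for \texttt{taskset -c} under that reading. The machine layout in \texttt{EXAMPLE\_TOPOLOGY} (two nodes of four cores with two hyperthreads each, and its CPU numbering) is hypothetical, as are the helper names. On a real machine, the layout would come from the operating system, e.g., \texttt{lscpu}.

```python
# Hypothetical machine layout: each NUMA node is a list of physical cores,
# each core is a list of hardware-thread (CPU) ids, first sibling first.
EXAMPLE_TOPOLOGY = [
    [[0, 8], [1, 9], [2, 10], [3, 11]],     # NUMA node 0
    [[4, 12], [5, 13], [6, 14], [7, 15]],   # NUMA node 1
]


def placement_order(topology):
    """Order in which hardware threads are handed out."""
    order = []
    for node in topology:
        width = max(len(core) for core in node)
        # First sibling of every core (no hyperthreading), then second siblings, ...
        for sibling in range(width):
            order.extend(core[sibling] for core in node if sibling < len(core))
    return order


def pick_cpus(n, topology):
    """First n hardware threads: one node without hyperthreading, then that node
    with hyperthreading, then as few additional nodes as needed."""
    order = placement_order(topology)
    if n > len(order):
        raise ValueError(f"only {len(order)} hardware threads available")
    return order[:n]


def taskset_list(cpus):
    """Format a CPU list as accepted by `taskset -c`."""
    return ",".join(str(cpu) for cpu in sorted(cpus))
```

With this example layout, 6 \glspl{hthrd} only fit in one node by using hyperthreading. So \texttt{pick\_cpus(6, EXAMPLE\_TOPOLOGY)} returns \texttt{[0, 1, 2, 3, 8, 9]}, which would be run as \texttt{taskset -c 0,1,2,3,8,9} followed by the benchmark command.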

\paragraph{AMD} The AMD machine is a server with two AMD EPYC 7662 CPUs and 256GB of DDR4 RAM.
     
\section{Cycling latency}
\begin{figure}
	\centering
	\input{cycle.pstex_t}
	\caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \gls{at} unparks the next \gls{at} in the cycle before parking itself.}
	\label{fig:cycle}
\end{figure}
The most basic evaluation of any ready queue is the latency needed to push and pop one element from it.
Since these two operations also describe a \texttt{yield} operation, many systems use this as the most basic benchmark.
     
Note that this problem is only present on SMP machines and is significantly mitigated by the fact that there are multiple rings in the system.

To keep this benchmark from being dominated by idle sleep handling, the number of rings is kept at least as high as the number of \glspl{proc} available.
Beyond this point, adding more rings further mitigates the effect of idle sleep handling.
     
The actual benchmark is more complicated in order to handle termination, but this only requires using a binary semaphore or a channel instead of raw \texttt{park}/\texttt{unpark}, and carefully ordering the \texttt{P} and \texttt{V} with respect to the loop condition.
Figure~\ref{fig:cycle:code} shows pseudo code for this benchmark.

\begin{figure}
	\begin{lstlisting}
		Thread.main() {
			count := 0
			for {
				wait()
				this.next.wake()
				count ++
				if must_stop() { break }
			}
			global.count += count
		}
	\end{lstlisting}
	\caption[Cycle Benchmark : Pseudo Code]{Cycle Benchmark : Pseudo Code}
	\label{fig:cycle:code}
\end{figure}
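As an illustration of the termination handling described above, the following Python sketch runs rings of threads that pass a single token around. It uses one \texttt{threading.Semaphore} per thread in place of \texttt{park}/\texttt{unpark}. It is a hypothetical sketch, not the benchmark used in this thesis. The helper names \texttt{cycle\_worker} and \texttt{run\_cycle} are made up. Python threads are kernel threads serialized by the global interpreter lock, so the sketch shows the structure of the benchmark, not its performance. The key ordering decision is where the stop flag is tested. Here it is tested after \texttt{wait()} and before waking the next thread, i.e., while holding the ring's only token. Once the flag is set, the token therefore makes one final lap, during which every thread in the ring sees the flag, passes the token on, and exits.

```python
import threading
import time


def cycle_worker(me, nxt, stop, counts, slot):
    """One thread of a ring; `me` and `nxt` act as binary semaphores (hypothetical helper)."""
    count = 0
    while True:
        me.acquire()             # wait(): block until the previous thread passes the token
        if stop.is_set():        # tested while holding the ring's only token ...
            nxt.release()        # ... and the token is still passed on before exiting
            break
        nxt.release()            # this.next.wake()
        count += 1
    counts[slot] = count         # global.count += count, one slot per thread


def run_cycle(rings, ring_size, duration):
    """Run `rings` rings of `ring_size` threads for about `duration` seconds."""
    stop = threading.Event()
    counts = [0] * (rings * ring_size)
    threads = []
    for r in range(rings):
        sems = [threading.Semaphore(0) for _ in range(ring_size)]
        sems[0].release()        # exactly one token circulates in each ring
        for i in range(ring_size):
            args = (sems[i], sems[(i + 1) % ring_size], stop, counts, r * ring_size + i)
            threads.append(threading.Thread(target=cycle_worker, args=args))
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return sum(counts)
```

For example, \texttt{run\_cycle(2, 5, 0.2)} runs two rings of 5 threads for roughly 0.2 seconds and returns the total number of completed iterations. With a duration of 0, it still terminates, because each thread exits only after passing the token on.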

\subsection{Results}
\begin{figure}
	\subfloat[][Throughput, 100 \ats per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.cycle.jax.ops.pstex_t}
		}
		\label{fig:cycle:jax:ops}
	}
	\subfloat[][Throughput, 1 \at per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.cycle.low.jax.ops.pstex_t}
		}
		\label{fig:cycle:jax:low:ops}
	}

	\subfloat[][Latency, 100 \ats per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.cycle.jax.ns.pstex_t}
		}
		\label{fig:cycle:jax:ns}
	}
	\subfloat[][Latency, 1 \at per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.cycle.low.jax.ns.pstex_t}
		}
		\label{fig:cycle:jax:low:ns}
	}
	\caption[Cycle Benchmark on Intel]{Cycle Benchmark on Intel\smallskip\newline Throughput and latency as a function of \proc count, using 100 cycles per \proc, 5 \ats per cycle.}
	\label{fig:cycle:jax}
\end{figure}
Figure~\ref{fig:cycle:jax} shows the throughput as a function of \proc count, with the following constants:
each run uses 100 cycles per \proc and 5 \ats per cycle.

\todo{results discussion}

\section{Yield}
     
Its only interesting variable is the number of \glspl{at} per \gls{proc}, where ratios close to 1 mean the ready queue(s) could be empty.
This sometimes puts more strain on the idle sleep handling, compared to scenarios where there is clearly plenty of work to be done.
Figure~\ref{fig:yield:code} shows pseudo code for this benchmark; it is the cycle benchmark with the ``wait/wake-next'' pair replaced by a single \texttt{yield}.

\begin{figure}
	\begin{lstlisting}
		Thread.main() {
			count := 0
			for {
				yield()
				count ++
				if must_stop() { break }
			}
			global.count += count
		}
	\end{lstlisting}
	\caption[Yield Benchmark : Pseudo Code]{Yield Benchmark : Pseudo Code}
	\label{fig:yield:code}
\end{figure}
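Replacing the wait/wake pair with a yield in the same skeleton gives a sketch of the yield benchmark. In this hypothetical Python version, \texttt{time.sleep(0)} stands in for \texttt{yield()}, and the helper names are made up. On Unix, \texttt{os.sched\_yield()} is a closer equivalent. Since no thread ever blocks, the stop flag can be tested anywhere in the loop without leaving a thread that is never woken. Python has no notion of \gls{proc}, so the sketch only varies the number of threads and cannot reproduce the \gls{at}-per-\gls{proc} ratio discussed above.

```python
import threading
import time


def yield_worker(stop, counts, slot):
    """One thread of the yield benchmark (hypothetical helper)."""
    count = 0
    while True:
        time.sleep(0)        # stand-in for yield(): give up the CPU but stay runnable
        count += 1
        if stop.is_set():    # must_stop()
            break
    counts[slot] = count     # global.count += count, one slot per thread


def run_yield(n_threads, duration):
    """Run n_threads yielding threads for about `duration` seconds."""
    stop = threading.Event()
    counts = [0] * n_threads
    threads = [threading.Thread(target=yield_worker, args=(stop, counts, i))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    return sum(counts)
```

For instance, \texttt{run\_yield(4, 0.1)} returns the total number of yields performed by 4 threads in roughly 0.1 seconds. Each thread counts at least one iteration before it checks the flag.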

\subsection{Results}
\begin{figure}
	\subfloat[][Throughput, 100 \ats per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.yield.jax.ops.pstex_t}
		}
		\label{fig:yield:jax:ops}
	}
	\subfloat[][Throughput, 1 \at per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.yield.low.jax.ops.pstex_t}
		}
		\label{fig:yield:jax:low:ops}
	}

	\subfloat[][Latency, 100 \ats per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.yield.jax.ns.pstex_t}
		}
		\label{fig:yield:jax:ns}
	}
	\subfloat[][Latency, 1 \at per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.yield.low.jax.ns.pstex_t}
		}
		\label{fig:yield:jax:low:ns}
	}
	\caption[Yield Benchmark on Intel]{Yield Benchmark on Intel\smallskip\newline Throughput and latency as a function of \proc count, using either 100 \ats or 1 \at per \proc.}
	\label{fig:yield:jax}
\end{figure}
Figure~\ref{fig:yield:jax} shows the throughput as a function of \proc count, with the following constant:
each run uses 100 \ats per \proc.

\todo{results discussion}
     
In either case, this benchmark aims to highlight how each scheduler handles these cases, since both cases can lead to performance degradation if they are not handled correctly.

To achieve this, the benchmark uses a fixed-size array of semaphores.
Each \gls{at} picks a random semaphore, \texttt{V}s it to unblock any \at waiting on it, and then \texttt{P}s on the same semaphore.
This creates a flow where \glspl{at} push each other out of the semaphores before being pushed out themselves.
For this benchmark to work, however, the number of \glspl{at} must be greater than or equal to the number of semaphores plus the number of \glspl{proc}.
Note that the nature of these semaphores means the counter can go beyond 1, which could lead to calls to \texttt{P} not blocking.

\todo{code, setup, results}
     
		for {
			r := random() % len(spots)
			spots[r].V()
			spots[r].P()
			count ++
			if must_stop() { break }
     
	}
\end{lstlisting}

\begin{figure}
	\subfloat[][Throughput, 100 \ats per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.churn.jax.ops.pstex_t}
		}
		\label{fig:churn:jax:ops}
	}
	\subfloat[][Throughput, 1 \at per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.churn.low.jax.ops.pstex_t}
		}
		\label{fig:churn:jax:low:ops}
	}

	\subfloat[][Latency, 100 \ats per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.churn.jax.ns.pstex_t}
		}
		\label{fig:churn:jax:ns}
	}
	\subfloat[][Latency, 1 \at per \proc]{
		\resizebox{0.5\linewidth}{!}{
			\input{result.churn.low.jax.ns.pstex_t}
		}
		\label{fig:churn:jax:low:ns}
	}
	\caption[Churn Benchmark on Intel]{\centering Churn Benchmark on Intel\smallskip\newline Throughput and latency of the churn benchmark on the Intel machine. Throughput is the total number of operations per second across all cores. Latency is the duration of each operation.}
	\label{fig:churn:jax}
\end{figure}

\section{Locality}