- Timestamp:
- May 18, 2022, 3:59:14 PM (3 years ago)
- Branches:
- ADT, ast-experimental, master, pthread-emulation, qualifiedEnum
- Children:
- 288927f
- Parents:
- fa2a3b1
- File:
- 1 edited
doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex
--- doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex (rfa2a3b1)
+++ doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex (r622a358)
@@ -6,4 +6,9 @@
 \section{Benchmark Environment}
 All of these benchmarks are run on two distinct hardware environment, an AMD and an INTEL machine.
+
+For all benchmarks, \texttt{taskset} is used to limit the experiment to 1 NUMA Node with no hyper threading.
+If more \glspl{hthrd} are needed, then 1 NUMA Node with hyperthreading is used.
+If still more \glspl{hthrd} are needed then the experiment is limited to as few NUMA Nodes as needed.
+
 
 \paragraph{AMD} The AMD machine is a server with two AMD EPYC 7662 CPUs and 256GB of DDR4 RAM.
@@ -23,4 +28,10 @@
 
 \section{Cycling latency}
+\begin{figure}
+	\centering
+	\input{cycle.pstex_t}
+	\caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \gls{at} unparks the next \gls{at} in the cycle before parking itself.}
+	\label{fig:cycle}
+\end{figure}
 The most basic evaluation of any ready queue is to evaluate the latency needed to push and pop one element from the ready-queue.
 Since these two operation also describe a \texttt{yield} operation, many systems use this as the most basic benchmark.
@@ -42,11 +53,4 @@
 Note that this problem is only present on SMP machines and is significantly mitigated by the fact that there are multiple rings in the system.
 
-\begin{figure}
-	\centering
-	\input{cycle.pstex_t}
-	\caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each \gls{at} unparks the next \gls{at} in the cycle before parking itself.}
-	\label{fig:cycle}
-\end{figure}
-
 To avoid this benchmark from being dominated by the idle sleep handling, the number of rings is kept at least as high as the number of \glspl{proc} available.
 Beyond this point, adding more rings serves to mitigate even more the idle sleep handling.
@@ -54,24 +58,59 @@
 
 The actual benchmark is more complicated to handle termination, but that simply requires using a binary semphore or a channel instead of raw \texttt{park}/\texttt{unpark} and carefully picking the order of the \texttt{P} and \texttt{V} with respect to the loop condition.
-
-\begin{lstlisting}
-	Thread.main() {
-		count := 0
-		for {
-			wait()
-			this.next.wake()
-			count ++
-			if must_stop() { break }
-		}
-		global.count += count
-	}
-\end{lstlisting}
-
-\begin{figure}
-	\centering
-	\input{result.cycle.jax.ops.pstex_t}
-	\vspace*{-10pt}
-	\label{fig:cycle:ns:jax}
-\end{figure}
+Figure~\ref{fig:cycle:code} shows pseudo code for this benchmark.
+
+\begin{figure}
+	\begin{lstlisting}
+		Thread.main() {
+			count := 0
+			for {
+				wait()
+				this.next.wake()
+				count ++
+				if must_stop() { break }
+			}
+			global.count += count
+		}
+	\end{lstlisting}
+	\caption[Cycle Benchmark : Pseudo Code]{Cycle Benchmark : Pseudo Code}
+	\label{fig:cycle:code}
+\end{figure}
+
+
+
+\subsection{Results}
+\begin{figure}
+	\subfloat[][Throughput, 100 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.cycle.jax.ops.pstex_t}
+		}
+		\label{fig:cycle:jax:ops}
+	}
+	\subfloat[][Throughput, 1 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.cycle.low.jax.ops.pstex_t}
+		}
+		\label{fig:cycle:jax:low:ops}
+	}
+
+	\subfloat[][Latency, 100 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.cycle.jax.ns.pstex_t}
+		}
+
+	}
+	\subfloat[][Latency, 1 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.cycle.low.jax.ns.pstex_t}
+		}
+		\label{fig:cycle:jax:low:ns}
+	}
+	\caption[Cycle Benchmark on Intel]{Cycle Benchmark on Intel\smallskip\newline Throughput as a function of \proc count, using 100 cycles per \proc, 5 \ats per cycle.}
+	\label{fig:cycle:jax}
+\end{figure}
+Figure~\ref{fig:cycle:jax} shows the throughput as a function of \proc count, with the following constants:
+Each run uses 100 cycles per \proc, 5 \ats per cycle.
+
+\todo{results discussion}
 
 \section{Yield}
@@ -81,17 +120,56 @@
 Its only interesting variable is the number of \glspl{at} per \glspl{proc}, where ratios close to 1 means the ready queue(s) could be empty.
 This sometimes puts more strain on the idle sleep handling, compared to scenarios where there is clearly plenty of work to be done.
-
-\todo{code, setup, results}
-
-\begin{lstlisting}
-	Thread.main() {
-		count := 0
-		while !stop {
-			yield()
-			count ++
-		}
-		global.count += count
-	}
-\end{lstlisting}
+Figure~\ref{fig:yield:code} shows pseudo code for this benchmark, the ``wait/wake-next'' is simply replaced by a yield.
+
+\begin{figure}
+	\begin{lstlisting}
+		Thread.main() {
+			count := 0
+			for {
+				yield()
+				count ++
+				if must_stop() { break }
+			}
+			global.count += count
+		}
+	\end{lstlisting}
+	\caption[Yield Benchmark : Pseudo Code]{Yield Benchmark : Pseudo Code}
+	\label{fig:yield:code}
+\end{figure}
+
+\subsection{Results}
+\begin{figure}
+	\subfloat[][Throughput, 100 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.yield.jax.ops.pstex_t}
+		}
+		\label{fig:yield:jax:ops}
+	}
+	\subfloat[][Throughput, 1 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.yield.low.jax.ops.pstex_t}
+		}
+		\label{fig:yield:jax:low:ops}
+	}
+
+	\subfloat[][Latency, 100 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.yield.jax.ns.pstex_t}
+		}
+		\label{fig:yield:jax:ns}
+	}
+	\subfloat[][Latency, 1 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.yield.low.jax.ns.pstex_t}
+		}
+		\label{fig:yield:jax:low:ns}
+	}
+	\caption[Yield Benchmark on Intel]{Yield Benchmark on Intel\smallskip\newline Throughput as a function of \proc count, using 1 \ats per \proc.}
+	\label{fig:yield:jax}
+\end{figure}
+Figure~\ref{fig:yield:ops:jax} shows the throughput as a function of \proc count, with the following constants:
+Each run uses 100 \ats per \proc.
+
+\todo{results discussion}
 
 
@@ -105,8 +183,9 @@
 In either case, this benchmark aims to highlight how each scheduler handles these cases, since both cases can lead to performance degradation if they are not handled correctly.
 
-To achieve this the benchmark uses a fixed size array of \newterm{chair}s, where a chair is a data structure that holds a single blocked \gls{at}.
-When a \gls{at} attempts to block on the chair, it must first unblocked the \gls{at} currently blocked on said chair, if any.
-This creates a flow where \glspl{at} push each other out of the chairs before being pushed out themselves.
-For this benchmark to work however, the number of \glspl{at} must be equal or greater to the number of chairs plus the number of \glspl{proc}.
+To achieve this the benchmark uses a fixed size array of semaphores.
+Each \gls{at} picks a random semaphore, \texttt{V}s it to unblock a \at waiting and then \texttt{P}s on the semaphore.
+This creates a flow where \glspl{at} push each other out of the semaphores before being pushed out themselves.
+For this benchmark to work however, the number of \glspl{at} must be equal or greater to the number of semaphores plus the number of \glspl{proc}.
+Note that the nature of these semaphores mean the counter can go beyond 1, which could lead to calls to \texttt{P} not blocking.
 
 \todo{code, setup, results}
@@ -116,7 +195,6 @@
 		for {
 			r := random() % len(spots)
-			next := xchg(spots[r], this)
-			if next { next.wake() }
-			wait()
+			spots[r].V()
+			spots[r].P()
 			count ++
 			if must_stop() { break }
@@ -125,4 +203,34 @@
 	}
 \end{lstlisting}
+
+\begin{figure}
+	\subfloat[][Throughput, 100 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.churn.jax.ops.pstex_t}
+		}
+		\label{fig:churn:jax:ops}
+	}
+	\subfloat[][Throughput, 1 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.churn.low.jax.ops.pstex_t}
+		}
+		\label{fig:churn:jax:low:ops}
+	}
+
+	\subfloat[][Latency, 100 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.churn.jax.ns.pstex_t}
+		}
+
+	}
+	\subfloat[][Latency, 1 \ats per \proc]{
+		\resizebox{0.5\linewidth}{!}{
+			\input{result.churn.low.jax.ns.pstex_t}
+		}
+		\label{fig:churn:jax:low:ns}
+	}
+	\caption[Churn Benchmark on Intel]{\centering Churn Benchmark on Intel\smallskip\newline Throughput and latency of the Churn on the benchmark on the Intel machine. Throughput is the total operation per second across all cores. Latency is the duration of each opeartion.}
+	\label{fig:churn:jax}
+\end{figure}
 
 \section{Locality}
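
The churn text added in this changeset notes that the semaphore counter can go beyond 1, so some calls to `P` may not block. The sketch below illustrates only that effect. It assumes an ordinary counting semaphore and uses Python's `threading.Semaphore`, with `release` standing in for `V` and `acquire` for `P`. The helper name `count_nonblocking_p` is hypothetical and is not part of the thesis benchmark code.

```python
import threading


def count_nonblocking_p(num_v: int, num_p: int) -> int:
    """Issue num_v V operations with no waiter, then attempt num_p P operations.

    Returns how many P attempts succeeded immediately, i.e. would not have
    blocked. This is an illustration only, not the benchmark itself.
    """
    sem = threading.Semaphore(0)  # counter starts at 0, so a P here would block
    for _ in range(num_v):
        sem.release()  # V with nobody waiting: the counter just grows
    passed = 0
    for _ in range(num_p):
        # Non-blocking P: returns True only if the counter was positive.
        if sem.acquire(blocking=False):
            passed += 1
    return passed


if __name__ == "__main__":
    # Two V calls land on the same spot before any P arrives.
    print(count_nonblocking_p(2, 3))
```

Under these assumptions, two `V` calls on the same spot with no waiter leave the counter at 2, so the next two `P` calls return at once and only the third would block. In the benchmark loop, a thread whose `P` does not block presumably continues to its next iteration without being parked.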