- Timestamp: Nov 24, 2022, 3:41:44 PM (17 months ago)
- Branches: ADT, ast-experimental, master
- Children: dacd8e6e
- Parents: 82a90d4
- File: 1 edited
doc/theses/thierry_delisle_PhD/thesis/text/eval_macro.tex
r82a90d4 → rddcaff6

 \chapter{Macro-Benchmarks}\label{macrobench}
-The previous chapter demonstrated the \CFA scheduler achieves its equivalent performance goal in small and controlled \at-scheduling scenarios.
+The previous chapter demonstrated that the \CFA scheduler achieves its equivalent performance goal in small and controlled \at-scheduling scenarios.
 The next step is to demonstrate performance stays true in more realistic and complete scenarios.
-Therefore, this chapter exercises both \at and I/O scheduling using two flavours of web servers that demonstrate \CFA performs competitively compared to web servers used in production environments.
+Therefore, this chapter exercises both \at and I/O scheduling using two flavours of web servers that demonstrate that \CFA performs competitively compared to web servers used in production environments.
 
 Web servers are chosen because they offer fairly simple applications that perform complex I/O, both network and disk, and are useful as standalone products.
…
 As such, these experiments should highlight the overhead due to any \CFA fairness cost in realistic scenarios.
 
+The most obvious performance metric for web servers is throughput.
+This metric generally measures the speed at which the server answers and relatedly how fast clients can send requests before the server can no longer keep up.
+Another popular performance metric is \newterm{tail} latency, which indicates some notion of fairness among requests across the experiment, \ie do some requests wait longer than other requests for service?
+Since many web applications rely on a combination of different queries made in parallel, the latency of the slowest response, \ie tail latency, can dictate a performance perception.
+
 \section{Memcached}
 Memcached~\cite{memcached} is an in-memory key-value store used in many production environments, \eg \cite{atikoglu2012workload}.
…
 Each CPU has 6 cores and 2 \glspl{hthrd} per core, for a total of 24 \glspl{hthrd}.
 \item
+The machine is configured to run each server on 12 dedicated \glspl{hthrd} and uses 6 of the remaining \glspl{hthrd} for the software interrupt handling~\cite{wiki:softirq}, resulting in a maximum CPU utilization of 75\% (18 / 24 \glspl{hthrd}).
+\item
 A CPU has 384 KB, 3 MB and 30 MB of L1, L2 and L3 caches, respectively.
 \item
…
 \item
 For UDP connections, all the threads listen to a single UDP socket for incoming requests.
-Threads that are not currently dealing with another request ignore the incoming packet.
+Threads that are currently dealing with another request ignore the incoming packet.
 One of the remaining, non-busy, threads reads the request and sends the response.
 This implementation can lead to increased CPU \gls{load} as threads wake from sleep to potentially process the request.
…
 \subsection{Throughput} \label{memcd:tput}
 This experiment is done by having the clients establish 15,360 total connections, which persist for the duration of the experiment.
-The clients then send read and write queries with only 3\% writes (updates), attempting to follow a desired query rate, and the server responds to the desired rate as best as possible.
+The clients then send read and write queries with 3\% writes (updates), attempting to follow a desired query rate, and the server responds to the desired rate as best as possible.
 Figure~\ref{fig:memcd:rate:qps} shows the 3 server versions at different client rates, ``Target \underline{Q}ueries \underline{P}er \underline{S}econd'', and the actual rate, ``Actual QPS'', for all three web servers.
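The single-socket UDP scheme in the hunk above (idle threads all block in @recvfrom@ on the same socket, while a thread busy with a request is simply not waiting) can be sketched in plain C with pthreads. This is a hypothetical illustration of the technique, not the thesis server's code; the name @udp_worker@ is invented.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch of the scheme described above: every idle worker blocks in
 * recvfrom() on the *same* UDP socket.  A worker busy with a request is
 * simply not in recvfrom(), so the kernel hands the next datagram to
 * one of the idle workers.  Names are invented for illustration. */
static void *udp_worker(void *arg) {
    int fd = *(int *)arg;              /* shared UDP socket */
    char buf[64];
    struct sockaddr_in peer;
    socklen_t plen = sizeof(peer);
    /* Idle: wait for a request on the shared socket. */
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         (struct sockaddr *)&peer, &plen);
    if (n > 0)  /* "process" the request and send the response back */
        sendto(fd, buf, (size_t)n, 0,
               (const struct sockaddr *)&peer, plen);
    return NULL;
}
```

A server would spawn N such workers with @pthread_create@, all passing the same socket descriptor; the wake-from-sleep cost the text mentions is exactly the kernel waking one of these blocked workers per datagram.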
 
…
 
 \subsection{Tail Latency}
-Another popular performance metric is \newterm{tail} latency, which indicates some notion of fairness among requests across the experiment, \ie do some requests wait longer than other requests for service?
-Since many web applications rely on a combination of different queries made in parallel, the latency of the slowest response, \ie tail latency, can dictate a performance perception.
 Figure~\ref{fig:memcd:rate:tail} shows the 99th percentile latency results for the same Memcached experiment.
 
 Again, each experiment is run 15 times with the median, maximum and minimum plotted with different lines.
-As expected, the latency starts low and increases as the server gets close to saturation, at which point, the latency increases dramatically because the web servers cannot keep up with the connection rate so client requests are disproportionally delayed.
+As expected, the latency starts low and increases as the server gets close to saturation, at which point the latency increases dramatically because the web servers cannot keep up with the connection rate, so client requests are disproportionally delayed.
 Because of this dramatic increase, the Y-axis is presented using a log scale.
 Note that the graph shows the \emph{target} query rate, the actual response rate is given in Figure~\ref{fig:memcd:rate:qps} as this is the same underlying experiment.
…
 web servers servicing dynamic requests, which read from multiple locations and construct a response, are not as interesting since creating the response takes more time and does not exercise the runtime in a meaningfully different way.}
 The static web server experiment compares NGINX~\cite{nginx} with a custom \CFA-based web server developed for this experiment.
-
-\subsection{NGINX threading}
 NGINX is a high-performance, \emph{full-service}, event-driven web server.
 It can handle both static and dynamic web content, as well as serve as a reverse proxy and a load balancer~\cite{reese2008nginx}.
 This wealth of capabilities comes with a variety of potential configurations, dictating available features and performance.
 The NGINX server runs a master process that performs operations such as reading configuration files, binding to ports, and controlling worker processes.
-When running as a static web server, it uses an event-driven architecture to service incoming requests.
+In comparison, the custom \CFA web server was developed specifically with this experiment in mind.
+However, nothing seems to indicate that NGINX suffers from the increased flexibility.
+When tuned for performance, NGINX appears to achieve the performance that the underlying hardware can achieve.
+
+\subsection{NGINX threading}
+When running as a static web server, NGINX uses an event-driven architecture to service incoming requests.
 Incoming connections are assigned a \emph{stackless} HTTP state machine and worker processes can handle thousands of these state machines.
 For the following experiment, NGINX is configured to use @epoll@ to listen for events on these state machines and have each worker process independently accept new connections.
-Because of the realities of Linux, see Subsection~\ref{ononblock}, NGINX also maintains a pool of auxiliary threads to handle blocking \io.
+Because of the realities of Linux (Subsection~\ref{ononblock}), NGINX also maintains a pool of auxiliary threads to handle blocking \io.
 The configuration can set the number of worker processes desired, as well as the size of the auxiliary pool.
 However, for the following experiments, NGINX is configured to let the master process decide the appropriate number of threads.
…
 The computer is booted with only 8 CPUs enabled, which is sufficient to achieve line rate.
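The @epoll@ pattern referred to above (a worker registers file descriptors, then loops on @epoll_wait@ and advances the state machine for each ready descriptor) reduces, at its smallest, to a sketch like this. The function names are invented for illustration and this is not NGINX's actual code; @epoll@ is Linux-specific.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for read-readiness on a fresh epoll instance.
 * Returns the epoll fd, or -1 on error. */
int epoll_watch(int fd) {
    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;
    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = fd;           /* remembered by the kernel, returned on readiness */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Block until a registered fd is ready and return it (or -1 on error).
 * A real worker loops here, pulling batches of events and advancing the
 * per-connection HTTP state machine for each ready descriptor. */
int epoll_next(int epfd) {
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, -1);
    return n == 1 ? ev.data.fd : -1;
}
```

Because @epoll_wait@ only reports readiness, any operation that can still block in the kernel (notably disk reads) escapes this loop, which is why the auxiliary thread pool mentioned in the hunk above is needed.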
 \item
+Both servers are set up with enough parallelism to achieve 100\% CPU utilization, which happens at higher request rates.
+\item
 Each CPU has 64 KB, 256 KiB and 8 MB of L1, L2 and L3 caches respectively.
 \item