Index: doc/theses/thierry_delisle_PhD/thesis/text/eval_macro.tex
===================================================================
--- doc/theses/thierry_delisle_PhD/thesis/text/eval_macro.tex	(revision 5548175b1844129458cdc52ce5b9f999d5f0aa4b)
+++ doc/theses/thierry_delisle_PhD/thesis/text/eval_macro.tex	(revision 4ab54c9896771ff155e449a9652d7b4c90cfa166)
@@ -2,17 +2,17 @@
 The previous chapter demonstrated the \CFA scheduler achieves its equivalent performance goal in small and controlled \at-scheduling scenarios.
 The next step is to demonstrate performance stays true in more realistic and complete scenarios.
-Therefore, this chapter exercises both \at and I/O scheduling using two flavours of webservers that demonstrate \CFA performs competitively with production environments.
-
-Webservers are chosen because they offer fairly simple applications that perform complex I/O, both network and disk, and are useful as standalone products.
-Furthermore, webservers are generally amenable to parallelization since their workloads are mostly homogeneous.
-Therefore, webservers offer a stringent performance benchmark for \CFA.
-Indeed, existing webservers have close to optimal performance, while the homogeneity of the workload means fairness may not be a problem.
+Therefore, this chapter exercises both \at and I/O scheduling using two flavours of web servers that demonstrate \CFA performs competitively compared to web servers used in production environments.
+
+Web servers are chosen because they offer fairly simple applications that perform complex I/O, both network and disk, and are useful as standalone products.
+Furthermore, web servers are generally amenable to parallelization since their workloads are mostly homogeneous.
+Therefore, web servers offer a stringent performance benchmark for \CFA.
+Indeed, existing web servers have close to optimal performance, while the homogeneity of the workload means fairness may not be a problem.
 As such, these experiments should highlight the overhead due to any \CFA fairness cost in realistic scenarios.
 
 \section{Memcached}
 Memcached~\cite{memcached} is an in-memory key-value store used in many production environments, \eg \cite{atikoglu2012workload}.
-In fact, the Memcached server is so popular there exists a full-featured front-end for performance testing, called @mutilate@~\cite{GITHUB:mutilate}.
+The Memcached server is so popular there exists a full-featured front-end for performance testing, called @mutilate@~\cite{GITHUB:mutilate}.
 Experimenting on Memcached allows for a simple test of the \CFA runtime as a whole, exercising the scheduler, the idle-sleep mechanism, as well the \io subsystem for sockets.
-Note, this experiment does not exercise the \io subsystem with regards to disk operations because Memcached is an in-memory server.
+Note that this experiment does not exercise the \io subsystem with regard to disk operations because Memcached is an in-memory server.
 
 \subsection{Benchmark Environment}
@@ -51,6 +51,6 @@
 This implementation can lead to increased CPU \gls{load} as threads wake from sleep to potentially process the request.
 \end{itemize}
-Here, Memcached is based on an event-based webserver architecture~\cite{Pai99Flash}, using \gls{kthrd}ing to run multiple largely independent event engines, and if needed, spinning up additional kernel threads to handle blocking I/O.
-Alternative webserver architecture are:
+Here, Memcached is based on an event-based web server architecture~\cite{Pai99Flash}, using \gls{kthrd}ing to run multiple largely independent event engines, and if needed, spinning up additional kernel threads to handle blocking I/O.
+Alternative web server architectures are:
 \begin{itemize}
 \item
@@ -74,5 +74,5 @@
  \item \emph{vanilla}: the official release of Memcached, version~1.6.9.
  \item \emph{fibre}: a modification of vanilla using the thread-per-connection model on top of the libfibre runtime.
- \item \emph{cfa}: a modification of the fibre webserver that replaces the libfibre runtime with \CFA.
+ \item \emph{cfa}: a modification of the fibre web server that replaces the libfibre runtime with \CFA.
 \end{itemize}
 
@@ -80,17 +80,17 @@
 This experiment is done by having the clients establish 15,360 total connections, which persist for the duration of the experiment.
 The clients then send read and write queries with only 3\% writes (updates), attempting to follow a desired query rate, and the server responds to the desired rate as best as possible.
-Figure~\ref{fig:memcd:rate:qps} shows the 3 server versions at different client rates, ``Target \underline{Q}ueries \underline{P}er \underline{S}econd'', and the actual rate, ``Actual QPS'', for all three webservers.
-
-Like the experimental setup in Chapter~\ref{microbench}, each experiment is run 15 times, and for each client rate, the measured webserver rate is plotted.
+Figure~\ref{fig:memcd:rate:qps} shows, for all three web servers, the actual rate, ``Actual QPS'', achieved at different client rates, ``Target \underline{Q}ueries \underline{P}er \underline{S}econd''.
+
+Like the experimental setup in Chapter~\ref{microbench}, each experiment is run 15 times, and for each client rate, the measured web server rate is plotted.
 The solid line represents the median while the dashed and dotted lines represent the maximum and minimum respectively.
-For rates below 500K queries per seconds, all three webservers match the client rate.
-Beyond 500K, the webservers cannot match the client rate.
-During this interval, vanilla Memcached achieves the highest webserver throughput, with libfibre and \CFA slightly lower but very similar throughput.
-Overall the performance of all three webservers is very similar, especially considering that at 500K the servers have reached saturation, which is discussed more in the next section.
+For rates below 500K queries per second, all three web servers match the client rate.
+Beyond 500K, the web servers cannot match the client rate.
+During this interval, vanilla Memcached achieves the highest web server throughput, with libfibre and \CFA slightly lower but very similar throughput.
+Overall, the performance of all three web servers is very similar, especially considering that at 500K the servers have reached saturation, which is discussed more in the next section.
 
 \begin{figure}
 	\centering
 	\resizebox{0.83\linewidth}{!}{\input{result.memcd.rate.qps.pstex_t}}
-	\caption[Memcached Benchmark: Throughput]{Memcached Benchmark: Throughput\smallskip\newline Desired vs Actual query rate for 15,360 connections. Target QPS is the query rate that the clients are attempting to maintain and Actual QPS is the rate at which the server is able to respond.}
+	\caption[Memcached Benchmark: Throughput]{Memcached Benchmark: Throughput\smallskip\newline Desired vs Actual query rate for 15,360 connections. Target QPS is the query rate that the clients are attempting to maintain and Actual QPS is the rate at which the server can respond.}
 	\label{fig:memcd:rate:qps}
 %\end{figure}
@@ -99,26 +99,26 @@
 	\centering
 	\resizebox{0.83\linewidth}{!}{\input{result.memcd.rate.99th.pstex_t}}
-	\caption[Memcached Benchmark : 99th Percentile Latency]{Memcached Benchmark : 99th Percentile Latency\smallskip\newline 99th Percentile of the response latency as a function of \emph{desired} query rate for 15,360 connections. }
+	\caption[Memcached Benchmark: 99th Percentile Latency]{Memcached Benchmark: 99th Percentile Latency\smallskip\newline 99th Percentile of the response latency as a function of \emph{desired} query rate for 15,360 connections. }
 	\label{fig:memcd:rate:tail}
 \end{figure}
 
 \subsection{Tail Latency}
-Another popular performance metric is \newterm{tail} latency, which indicates some notion of fairness among requests across the experiment, \ie do some requests wait longer than other requests for service.
+Another popular performance metric is \newterm{tail} latency, which indicates some notion of fairness among requests across the experiment, \ie do some requests wait longer than other requests for service?
 Since many web applications rely on a combination of different queries made in parallel, the latency of the slowest response, \ie tail latency, can dictate a performance perception.
 Figure~\ref{fig:memcd:rate:tail} shows the 99th percentile latency results for the same Memcached experiment.
 
 Again, each experiment is run 15 times with the median, maximum and minimum plotted with different lines.
-As expected, the latency starts low and increases as the server gets close to saturation, at which point, the latency increases dramatically because the webservers cannot keep up with the connection rate so client requests are disproportionally delayed.
-Because of this dramatic increase, the Y axis is presented using log scale.
-Note that the graph shows \emph{target} query rate, the actual response rate is given in Figure~\ref{fig:memcd:rate:qps} as this is the same underlying experiment.
-
-For all three servers, the saturation point is reached before 500K queries per second, which is when throughput starts to decline among the webservers.
-In this experiment, all three webservers are much more distinguishable than the throughput experiment.
-Vanilla Memcached achieves the lowest latency until 600K, after which all the webservers are struggling to respond to client requests.
+As expected, the latency starts low and increases as the server gets close to saturation, at which point the latency increases dramatically because the web servers cannot keep up with the connection rate, so client requests are disproportionately delayed.
+Because of this dramatic increase, the Y axis is presented using a log scale.
+Note that the graph shows the \emph{target} query rate; the actual response rate is given in Figure~\ref{fig:memcd:rate:qps} as this is the same underlying experiment.
+
+For all three servers, the saturation point is reached before 500K queries per second, which is when throughput starts to decline among the web servers.
+In this experiment, all three web servers are much more distinguishable than in the throughput experiment.
+Vanilla Memcached achieves the lowest latency until 600K, after which all the web servers are struggling to respond to client requests.
 \CFA begins to decline at 600K, indicating some bottleneck after saturation.
-Overall, all three webservers achieve micro-second latencies and the increases in latency mostly follow each other.
+Overall, all three web servers achieve microsecond latencies and the increases in latency mostly follow each other.
 
 \subsection{Update rate}
-Since Memcached is effectively a simple database, the information that is cached can be written to concurrently by multiple queries.
+Since Memcached is effectively a simple database, the cache information can be written to concurrently by multiple queries.
 And since writes can significantly affect performance, it is interesting to see how varying the update rate affects performance.
 Figure~\ref{fig:memcd:updt} shows the results for the same experiment as the throughput and latency experiment but increasing the update percentage to 5\%, 10\% and 50\%, respectively, versus the original 3\% update percentage.
@@ -167,8 +167,8 @@
 	}
 	\caption[Throughput and Latency results at different update rates (percentage of writes).]{Throughput and Latency results at different update rates (percentage of writes).\smallskip\newline On the left, throughput as Desired vs Actual query rate.
-	Target QPS is the query rate that the clients are attempting to maintain and Actual QPS is the rate at which the server is able to respond.
+	Target QPS is the query rate that the clients are attempting to maintain and Actual QPS is the rate at which the server can respond.
 	On the right, tail latency, \ie 99th Percentile of the response latency as a function of \emph{desired} query rate.
 	For throughput, higher is better, for tail-latency, lower is better.
-	Each series represent 15 independent runs, the dashed lines are maximums of each series while the solid lines are the median and the dotted lines are the minimums.}
+	Each series represents 15 independent runs; the dashed lines are the maximums of each series, the solid lines are the medians, and the dotted lines are the minimums.}
 	All runs have 15,360 client connections.
 	\label{fig:memcd:updt}
@@ -183,32 +183,32 @@
 \section{Static Web-Server}
 The Memcached experiment does not exercise two key aspects of the \io subsystem: accept\-ing new connections and interacting with disks.
-On the other hand, a webserver servicing static web-pages does stress both accepting connections and disk \io by accepting tens of thousands of client requests per second where these requests return static data serviced from the file-system cache or disk.\footnote{
-Webservers servicing dynamic requests, which read from multiple locations and construct a response, are not as interesting since creating the response takes more time and does not exercise the runtime in a meaningfully different way.}
-The static webserver experiment compares NGINX~\cite{nginx} with a custom \CFA-based webserver developed for this experiment.
+On the other hand, a web server servicing static web pages does stress both accepting connections and disk \io by accepting tens of thousands of client requests per second where these requests return static data serviced from the file-system cache or disk.\footnote{
+Web servers servicing dynamic requests, which read from multiple locations and construct a response, are not as interesting since creating the response takes more time and does not exercise the runtime in a meaningfully different way.}
+The static web server experiment compares NGINX~\cite{nginx} with a custom \CFA-based web server developed for this experiment.
 
 \subsection{NGINX threading}
-NGINX is an high-performance, \emph{full-service}, event-driven webserver.
+NGINX is a high-performance, \emph{full-service}, event-driven web server.
 It can handle both static and dynamic web content, as well as serve as a reverse proxy and a load balancer~\cite{reese2008nginx}.
 This wealth of capabilities comes with a variety of potential configurations, dictating available features and performance.
 The NGINX server runs a master process that performs operations such as reading configuration files, binding to ports, and controlling worker processes.
-When running as a static webserver, it uses an event-driven architecture to service incoming requests.
-Incoming connections are assigned a \emph{stackless} HTTP state-machine and worker processes can handle thousands of these state machines.
+When running as a static web server, it uses an event-driven architecture to service incoming requests.
+Incoming connections are assigned a \emph{stackless} HTTP state machine and worker processes can handle thousands of these state machines.
 For the following experiment, NGINX is configured to use @epoll@ to listen for events on these state machines and have each worker process independently accept new connections.
 Because of the realities of Linux, see Subsection~\ref{ononblock}, NGINX also maintains a pool of auxiliary threads to handle blocking \io.
 The configuration can set the number of worker processes desired, as well as the size of the auxiliary pool.
-However, for the following experiments, NGINX is configured to let the master process decided the appropriate number of threads.
-
-\subsection{\CFA webserver}
-The \CFA webserver is a straightforward thread-per-connection webserver, where a fixed number of \ats are created upfront.
+However, for the following experiments, NGINX is configured to let the master process decide the appropriate number of threads.
+
+\subsection{\CFA web server}
+The \CFA web server is a straightforward thread-per-connection web server, where a fixed number of \ats are created upfront.
 Each \at calls @accept@, through @io_uring@, on the listening port and handles the incoming connection once accepted.
 Most of the implementation is fairly straightforward;
 however, the inclusion of file \io found an @io_uring@ problem that required an unfortunate workaround.
 
-Normally, webservers use @sendfile@~\cite{MAN:sendfile} to send files over a socket because it performs a direct move in the kernel from the file-system cache to the NIC, eliminating reading/writing the file into the webserver.
-While @io_uring@ does not support @sendfile@, it does supports @splice@~\cite{MAN:splice}, which is strictly more powerful.
+Normally, web servers use @sendfile@~\cite{MAN:sendfile} to send files over a socket because it performs a direct move in the kernel from the file-system cache to the NIC, eliminating reading/writing the file into the web server.
+While @io_uring@ does not support @sendfile@, it does support @splice@~\cite{MAN:splice}, which is strictly more powerful.
 However, because of how Linux implements file \io, see Subsection~\ref{ononblock}, @io_uring@ must delegate splice calls to worker threads \emph{inside} the kernel.
 As of Linux 5.13, @io_uring@ had no mechanism to restrict the number of worker threads, and therefore, when tens of thousands of splice requests are made, it correspondingly creates tens of thousands of internal \glspl{kthrd}.
 Such a high number of \glspl{kthrd} slows Linux significantly.
-Rather than abandon the experiment, the \CFA webserver was switched to @sendfile@.
+Rather than abandon the experiment, the \CFA web server was switched to @sendfile@.
 
 Starting with \emph{blocking} @sendfile@, \CFA achieves acceptable performance until saturation is reached.
@@ -220,43 +220,43 @@
 The call can block if there is insufficient memory, which can be caused by having too many connections in the CLOSE-WAIT state.\footnote{
 \lstinline{sendfile} can always block even in nonblocking mode if the file to be sent is not in the file-system cache, because Linux does not provide nonblocking disk I/O.}
-This effect results in a negative feedback where more timeouts lead to more @sendfile@ calls running out of resources.
+This effect results in a negative feedback loop where more timeouts lead to more @sendfile@ calls running out of resources.
 
 Normally, this problem is addressed by using @select@/@epoll@ to wait for sockets to have sufficient resources.
-However, since @io_uring@ does not support @sendfile@ but does respects non\-blocking semantics, marking all sockets as non-blocking effectively circumvents the @io_uring@ subsystem entirely:
+However, since @io_uring@ does not support @sendfile@ but does respect non\-blocking semantics, marking all sockets as non-blocking effectively circumvents the @io_uring@ subsystem entirely:
 all calls simply immediately return @EAGAIN@ and all asynchronicity is lost.
 
-Switching all of the \CFA runtime to @epoll@ for this experiment is unrealistic and does not help in the evaluation of the \CFA runtime.
-For this reason, the \CFA webserver sets and resets the @O_NONBLOCK@ flag before and after any calls to @sendfile@.
+Switching the entire \CFA runtime to @epoll@ for this experiment is unrealistic and does not help in the evaluation of the \CFA runtime.
+For this reason, the \CFA web server sets and resets the @O_NONBLOCK@ flag before and after any calls to @sendfile@.
 However, when the nonblocking @sendfile@ returns @EAGAIN@, the \CFA server cannot block the \at because its I/O subsystem uses @io_uring@.
-Therefore, the \at spins performing the @sendfile@, yields if the call returns @EAGAIN@, and retries in these cases.
+Therefore, the \at spins on @sendfile@: it performs the call and, if the call returns @EAGAIN@, yields and retries.
 
 Interestingly, Linux 5.15 @io_uring@ introduces the ability to limit the number of worker threads that are created through the @IORING_REGISTER_IOWQ_MAX_WORKERS@ option.
 Presumably, this limit would prevent the explosion of \glspl{kthrd}, which justified using @sendfile@ over @io_uring@ and @splice@.
 However, recall from Section~\ref{iouring} that @io_uring@ maintains two pools of workers: bounded workers and unbounded workers.
-For a webserver, the unbounded workers should handle accepts and reads on socket, and the bounded workers should handle reading files from disk.
-This setup allows fine-grained control over the number of workers needed for each operation type and presumably lead to good performance.
+For a web server, the unbounded workers should handle accepts and reads on sockets, and the bounded workers should handle reading files from disk.
+This setup allows fine-grained control over the number of workers needed for each operation type and presumably leads to good performance.
 
 However, @io_uring@ must contend with another reality of Linux: the versatility of @splice@.
 Indeed, @splice@ can be used both for reading and writing to or from any type of file descriptor.
 This generality makes it ambiguous which pool @io_uring@ should delegate @splice@ calls to.
-In the case of splicing from a socket to pipe, @splice@ behaves like an unbounded operation, but when splicing from a regular file to a pipe, @splice@ becomes a bounded operation.
+In the case of splicing from a socket to a pipe, @splice@ behaves like an unbounded operation, but when splicing from a regular file to a pipe, @splice@ becomes a bounded operation.
 To make things more complicated, @splice@ can read from a pipe and write to a regular file.
 In this case, the read is an unbounded operation but the write is a bounded one.
 This leaves @io_uring@ in a difficult situation where it can be very difficult to delegate splice operations to the appropriate type of worker.
 Since there is little or no context available to @io_uring@, it seems to always delegate @splice@ operations to the unbounded workers.
-This decision is unfortunate for this specific experiment, since it prevents the webserver from limiting the number of parallel calls to @splice@ without affecting the performance of @read@ or @accept@.
+This decision is unfortunate for this specific experiment since it prevents the web server from limiting the number of parallel calls to @splice@ without affecting the performance of @read@ or @accept@.
 For this reason, the @sendfile@ approach described above is still the most performant solution in Linux 5.15.
 
 One possible workaround is to create more @io_uring@ instances so @splice@ operations can be issued to a different instance than the @read@ and @accept@ operations.
 However, I do not believe this solution is appropriate in general;
-it simply replaces my current webserver hack with a different, equivalent hack.
+it simply replaces my current web server hack with a different, equivalent hack.
 
 \subsection{Benchmark Environment}
-Unlike the Memcached experiment, the webserver experiment is run on a heterogeneous environment.
+Unlike the Memcached experiment, the web server experiment is run in a heterogeneous environment.
 \begin{itemize}
 \item
 The server runs Ubuntu 20.04.4 LTS on top of Linux Kernel 5.13.0-52.
 \item
-The server computer has four AMD Opteron(tm) Processor 6380 with 16 cores running at 2.5GHz, for a total of 64 \glspl{hthrd}.
+The server computer has four AMD Opteron\texttrademark{} 6380 processors, each with 16 cores running at 2.5GHz, for a total of 64 \glspl{hthrd}.
 \item
 The computer is booted with only 8 CPUs enabled, which is sufficient to achieve line rate.
@@ -273,15 +273,15 @@
 It has two 2.8 GHz Xeon CPUs, and four one-gigabit Ethernet cards.
 \item
-Network routing is performed by a HP 2530 10 Gigabit Ethernet switch.
+Network routing is performed by an HP 2530 10 Gigabit Ethernet switch.
 \item
 A client machine runs two copies of the workload generator.
 \end{itemize}
 The clients and network are sufficiently provisioned to drive the server to saturation and beyond.
-Hence, any server effects are attributable solely to the runtime system and webserver.
-Finally, without restricting the server hardware resources, it is impossible to determine if a runtime system or the webserver using it has any specific design restrictions, \eg using space to reduce time.
-Trying to determine these restriction with large numbers of processors or memory simply means running equally large experiments, which takes longer and are harder to set up.
+Hence, any server effects are attributable solely to the runtime system and web server.
+Finally, without restricting the server hardware resources, it is impossible to determine if a runtime system or the web server using it has any specific design restrictions, \eg using space to reduce time.
+Trying to determine these restrictions with large numbers of processors or large amounts of memory simply means running equally large experiments, which take longer and are harder to set up.
 
 \subsection{Throughput}
-To measure webserver throughput, the server computer is loaded with 21,600 files, sharded across 650 directories, occupying about 2.2GB of disk, distributed over the server's RAID-5 4-drives to achieve high throughput for disk I/O.
+To measure web server throughput, the server computer is loaded with 21,600 files, sharded across 650 directories, occupying about 2.2GB of disk, distributed over the server's 4-drive RAID-5 array to achieve high throughput for disk I/O.
 The clients run httperf~\cite{httperf} to request a set of static files.
 The httperf load-generator is used with session files to simulate a large number of users and to implement a partially open-loop system.
@@ -290,11 +290,11 @@
 The experiments are run with 16 clients, each running a copy of httperf (one copy per CPU), requiring a set of 16 log files with requests conforming to a Zipf distribution.
 This distribution is representative of users accessing static data through a web browser.
-Each request reads a file name from its trace, establishes a connection, performs an HTTP get-request for the file name, receive the file data, close the connection, and repeat the process.
+Each request reads a file name from its trace, establishes a connection, performs an HTTP get-request for the file name, receives the file data, closes the connection, and repeats the process.
 Some trace elements have multiple file names that are read across a persistent connection.
-A client times-out if the server does not complete a request within 10 seconds.
+A client times out if the server does not complete a request within 10 seconds.
 
 An experiment consists of running a server with request rates ranging from 10,000 to 70,000 requests per second;
 each rate takes about 5 minutes to complete.
-There is 20 seconds idle time between rates and between experiments to allow connections in the TIME-WAIT state to clear.
+There are 20 seconds of idle time between rates and between experiments to allow connections in the TIME-WAIT state to clear.
 Server throughput is measured both at peak and after saturation (\ie after peak).
 Peak indicates the level of client requests the server can handle and after peak indicates if a server degrades gracefully.
@@ -328,5 +328,5 @@
 		\label{fig:swbsrv:err}
 	}
-	\caption[Static Webserver Benchmark : Throughput]{Static Webserver Benchmark : Throughput\smallskip\newline Throughput vs request rate for short lived connections connections.}
+	\caption[Static Web Server Benchmark: Throughput]{Static Web Server Benchmark: Throughput\smallskip\newline Throughput vs request rate for short-lived connections.}
 	\label{fig:swbsrv}
 \end{figure}
@@ -334,18 +334,18 @@
 Figure~\ref{fig:swbsrv} shows the results comparing \CFA to NGINX in terms of throughput.
 These results are fairly straightforward.
-Both servers achieve the same throughput until around 57,500 requests per seconds.
+Both servers achieve the same throughput until around 57,500 requests per second.
 Since the clients are asking for the same files, the fact that the throughput matches exactly is expected as long as both servers are able to serve the request rate.
 Once the saturation point is reached, both servers are still very close.
 NGINX achieves slightly better throughput.
 However, Figure~\ref{fig:swbsrv:err} shows the rate of errors, a gross approximation of tail latency, where \CFA achieves notably fewer errors once the servers reach saturation.
-This suggests \CFA is slightly fairer with less throughput, while NGINX sacrifice fairness for more throughput.
-This experiment demonstrate that the \CFA webserver is able to match the performance of NGINX up-to and beyond the saturation point of the machine.
+This suggests \CFA is slightly fairer with less throughput, while NGINX sacrifices fairness for more throughput.
+This experiment demonstrates that the \CFA web server is able to match the performance of NGINX up to and beyond the saturation point of the machine.
 
 \subsection{Disk Operations}
-With 25GB of memory, the entire experimental file-set plus the webserver and OS fit in memory.
+With 25GB of memory, the entire experimental file-set plus the web server and OS fit in memory.
 If memory is constrained, the OS must evict files from the file cache, which causes @sendfile@ to read from disk.\footnote{
 For the in-memory experiments, the file-system cache was warmed by running an experiment three times before measuring started to ensure all files are in the file-system cache.}
-Webservers can behave very differently once file I/O begins and increases.
-Hence, prior work~\cite{Harji10} suggests running both kinds of experiments to test overall webserver performance.
+Web servers can behave very differently once file I/O begins and increases.
+Hence, prior work~\cite{Harji10} suggests running both kinds of experiments to test overall web server performance.
 
 However, after reducing memory to 2.5GB, the problem with @splice@ and @io_uring@ rears its ugly head again.
@@ -355,5 +355,5 @@
 The only solution for the bounded side is to spawn more threads and let these handle the blocking.
 
-Supporting this case in the webserver would require creating more \procs or creating a dedicated thread-pool.
+Supporting this case in the web server would require creating more \procs or creating a dedicated thread pool.
 However, I felt this kind of modification moves too far away from my goal of evaluating the \CFA runtime, \ie it begins writing another runtime system;
 hence, I decided to forgo experiments on low-memory performance.