Index: doc/theses/thierry_delisle_PhD/thesis/text/conclusion.tex
===================================================================
--- doc/theses/thierry_delisle_PhD/thesis/text/conclusion.tex	(revision 9d67a6d7a3470b54ccdcc83446eeb2925561d394)
+++ doc/theses/thierry_delisle_PhD/thesis/text/conclusion.tex	(revision 7e5da64c14b3c725edda822f5a1f3e139c9f86d6)
@@ -1,5 +1,5 @@
 \chapter{Conclusion}\label{conclusion}
 
-Building the \CFA runtime has been an extremely challenging project.
+Building the \CFA runtime has been a challenging project.
 The work was divided between high-level concurrency design and a user-level threading runtime (Masters thesis), and low-level support of the user-level runtime using OS kernel-threading and its (multiple) I/O subsystems (Ph.D. thesis).
 Because I am the main developer for both components of this project, there is strong continuity across the design and implementation.
@@ -7,35 +7,25 @@
 
 I believed my Masters work would provide the background to make the Ph.D work reasonably straightforward.
-What I discovered is that interacting with kernel locking, threading, and I/O in the UNIX (Linux) operating system is extremely difficult.
-There are multiple concurrency aspects in UNIX that are poorly designed, not only for user-level threading but also for kernel-level threading.
-Basically, UNIX-based OSs are not very concurrency friendly.
+However, in doing so I discovered two unexpected challenges.
+First, while modern symmetric multiprocessing CPUs have significant penalties for communicating across cores, state-of-the-art work-stealing schedulers are very effective at avoiding the need for communication in many common workloads.
+This leaves very little room for adding fairness guarantees without notable performance penalties.
+Second, kernel locking, threading, and I/O in the Linux operating system offer very little flexibility of use.
+There are multiple concurrency aspects in Linux that require carefully following a strict procedure in order to achieve acceptable performance.
 To be fair, many of these concurrency aspects were designed 30-40 years ago, when there were few multi-processor computers and concurrency knowledge was just developing.
-It is unfortunately so little has changed in the intervening years.
+It is unfortunate that little has changed in the intervening years.
 
 Also, my decision to use @io_uring@ was both a positive and negative.
-The positive is that @io_uring@ supports the panoply of I/O mechanisms in UNIX;
-hence, the \CFA runtime uses one I/O mechanism to provide non-blocking I/O, rather than using @select@ to handle TTY I/O, @epoll@ to handle network I/O, and some unknown to handle disk I/O.
-It is unclear to me how I would have merged all these different I/O mechanisms into a coherent scheduling implementation.
+The positive is that @io_uring@ supports the panoply of I/O mechanisms in Linux;
+hence, the \CFA runtime uses one I/O mechanism to provide non-blocking I/O, rather than using @select@ to handle TTY I/O, @epoll@ to handle network I/O, and managing a thread pool to handle disk I/O.
+Merging all these different I/O mechanisms into a coherent scheduling implementation would require much more work than is presented in this thesis, as well as detailed knowledge of the I/O mechanisms in Linux.
 The negative is that @io_uring@ is new and developing.
 As a result, there is limited documentation, few places to find usage examples, and multiple errors that required workarounds.
 Given what I now know about @io_uring@, I would say it is insufficiently coupled with the Linux kernel to properly handle non-blocking I/O.
-Specifically, spinning up internal kernel threads to handle blocking scenarios is what developers already do outside of the kernel.
+It does not seem to reach deeply into the kernel's handling of \io, and as such it must contend with the same realities as users of @epoll@.
+Specifically, in cases where @O_NONBLOCK@ behaves as desired, operations must still be retried.
+To preserve the illusion of asynchronicity, this requires delegating operations to kernel threads.
+This is also true of cases where @O_NONBLOCK@ does not prevent blocking.
+Spinning up internal kernel threads to handle blocking scenarios is what developers already do outside of the kernel, and managing these threads adds a significant burden to the system.
 Nonblocking I/O should not be handled in this way.
-
-% While \gls{uthrding} is an old idea, going back to the first multi-processor computers, it was largely set aside in the 1990s because of the complexity in making it work well between applications and the operating system.
-% Unfortunately,, a large amount of that complexity still exists, making \gls{uthrding} a difficult task for library and programming-language implementers.
-% For example, the introduction of thread-local storage and its usage in many C libraries causes the serially reusable problem~\cite{SeriallyReusable} for all \gls{uthrding} implementers.
-% Specifically, if a \gls{kthrd} is preempted, it always restarts with the same thread-local storage;
-% when a user thread is preempted, it can be restarted on another \gls{kthrd}, accessing the new thread-local storage, or worse, the previous thread-local storage.
-% The latter case causes failures when an operation using the thread-local storage is assumed to be atomic at the kernel-threading level.
-% If library implementers always used the pthreads interface to access threads, locks, and thread-local storage, language runtimes can interpose a \gls{uthrding} version of pthreads, switching the kind of threads, locks, and storage, and hence, make it safe.
-% However, C libraries are currently filled with direct declarations of thread-local storage and low-level atomic instructions.
-% In essence, every library developer is inventing their own threading mechanisms to solve their unique problem, independent from any standardize approaches.
-% This state of affairs explains why concurrency is such a mess.
-% 
-% To address the concurrency mess, new programming languages integrate threading into the language rather than using the operating-system supplied interface.
-% The reason is that many sequential code-optimizations invalid correctly written concurrent programs.
-% While providing safe concurrent-code generation is essential, the underlying language runtime is still free to implement threading using kernel or user/kernel threading, \eg Java versus Go.
-% In either case, the language/runtime manages all the details to simplify concurrency and increase safety.
 
 \section{Goals}
@@ -85,5 +75,5 @@
 \section{Future Work}
 While the \CFA runtime achieves a better compromise, in term of performance and fairness, than other schedulers, I believe further improvements can be made to reduce or eliminate the few cases where performance does deteriorate.
-Fundamentally, achieving performance and starvation freedom will always be opposing goals even outside of scheduling algorithms.
+Fundamentally, performance and starvation freedom will always be goals with opposing needs, even outside of scheduling algorithms.
 
 \subsection{Idle Sleep}
@@ -101,5 +91,5 @@
 The correctness of that hand-shake is critical when the last \proc goes to sleep but could be relaxed when several \procs are awake.
 \item
-Furthermore, organizing the sleeping \procs as a LIDO stack makes sense to keep cold \procs as cold as possible, but it might be more appropriate to attempt to keep cold CPU sockets instead.
+Furthermore, organizing the sleeping \procs as a LIFO stack makes sense to keep cold \procs as cold as possible, but it might be more appropriate to attempt to keep cold CPU sockets instead.
 \end{itemize}
 However, using these techniques would require significant investigation.
