Index: doc/theses/thierry_delisle_PhD/thesis/text/intro.tex
===================================================================
--- doc/theses/thierry_delisle_PhD/thesis/text/intro.tex	(revision 1f201238257561a4db08fd69c2087b10464b4c6f)
+++ doc/theses/thierry_delisle_PhD/thesis/text/intro.tex	(revision 6e8d4461c0898ee99ea0f5ebe70e6c3df581d161)
@@ -1,6 +1,91 @@
-\chapter*{Introduction}\label{intro}
+\chapter{Introduction}\label{intro}
 \todo{A proper intro}
 
-The C programming language~\cite{C11}
+Any shared system needs scheduling.
+Computer systems share multiple resources across many threads of execution, even on single user computers like a laptop.
+Scheduling occurs at discrete points when there are transitions in a system.
+For example, a thread cycles through the following transitions during its execution.
+\begin{center}
+\input{executionStates.pstex_t}
+\end{center}
+These \newterm{state transition}s are initiated in response to events (\Index{interrupt}s):
+\begin{itemize}
+\item
+entering the system (new $\rightarrow$ ready)
+\item
+timer alarm for preemption (running $\rightarrow$ ready)
+\item
+long term delay versus spinning (running $\rightarrow$ blocked)
+\item
+blocking ends, \eg network or I/O completion (blocked $\rightarrow$ ready)
+\item
+normal completion or error, \eg segmentation fault (running $\rightarrow$ halted)
+\end{itemize}
+Key to scheduling is that a thread cannot bypass the ``ready'' state during a transition, so the scheduler maintains complete control of the system.
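+
+As a concrete illustration, the following C sketch models these states and event-driven transitions;
+the state and event names are illustrative only and not those of any particular runtime.
+\begin{verbatim}
+// Minimal sketch of thread states and event-driven transitions.
+// Names are illustrative, not from a specific runtime.
+enum State { NEW, READY, RUNNING, BLOCKED, HALTED };
+enum Event { ADMIT, PREEMPT, BLOCK, UNBLOCK, COMPLETE };
+
+// Every transition toward execution passes through READY,
+// so the scheduler retains control of processor assignment.
+enum State transition( enum State s, enum Event e ) {
+    switch ( e ) {
+      case ADMIT:    return s == NEW     ? READY   : s;
+      case PREEMPT:  return s == RUNNING ? READY   : s;
+      case BLOCK:    return s == RUNNING ? BLOCKED : s;
+      case UNBLOCK:  return s == BLOCKED ? READY   : s;
+      case COMPLETE: return s == RUNNING ? HALTED  : s;
+    }
+    return s;
+}
+\end{verbatim}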
+
+More formally, in a computer system with multiple processors and work units, the problem of mapping work onto processors efficiently is called \newterm{scheduling}.
+These systems are normally \newterm{open}, meaning new work arrives from an external source or is spawned by an existing work unit.
+Scheduling is a zero-sum game as computer processors normally have a fixed, maximum number of cycles per unit time.
+(Frequency scaling and turbo boost add a degree of complexity that can be ignored in this discussion without loss of generality.)
+
+On a computer system, the scheduler takes a sequence of work requests in the form of threads and attempts to complete the work, subject to performance objectives, such as resource utilization.
+A general-purpose dynamic-scheduler for an open system cannot anticipate future work requests, so its performance is rarely optimal.
+Even with complete knowledge of arrival order and work, computing an optimal solution is NP-hard (\eg bin packing).
+However, schedulers do take advantage of regularities in work patterns to produce excellent solutions.
+Nevertheless, schedulers are a series of compromises, occasionally with some static or dynamic tuning parameters to enhance specific patterns.
+
+When the workload exceeds the capacity of the processors, \ie work cannot be executed immediately, it is placed on a queue for subsequent service, called a \newterm{ready queue}.
+(In real life, a work unit (person) can \newterm{balk} and leave the system rather than wait.)
+Ready queues organize threads for scheduling, which indirectly organizes the work to be performed.
+The structure of ready queues forms a spectrum.
+At one end is the single-queue multi-server (SQMS) and at the other end is the multi-queue multi-server (MQMS).
+\begin{center}
+\begin{tabular}{l|l}
+\multicolumn{1}{c|}{\textbf{SQMS}} & \multicolumn{1}{c}{\textbf{MQMS}} \\
+\hline
+\raisebox{0.5\totalheight}{\input{SQMS.pstex_t}} & \input{MQMSG.pstex_t}
+\end{tabular}
+\end{center}
+(While \newterm{pipeline} organizations also exist, \ie chaining schedulers together, they do not provide an advantage in this context.)
+
+The three major optimization criteria for a scheduler are:
+\begin{enumerate}[leftmargin=*]
+\item
+\newterm{load balancing}: available work is distributed so no processor is idle when work is available.
+
+\noindent
+Eventual progress for each work unit is often an important consideration, \ie no starvation.
+\item
+\newterm{affinity}: processors access state through a complex memory hierarchy, so it is advantageous to keep a work unit's state on a single or closely bound set of processors.
+
+\noindent
+Essentially, all multi-processor computers have non-uniform memory access (NUMA), with one or more quantized steps to access data at different levels in the memory hierarchy.
+When a system has a large number of independently executing threads, affinity is impossible because of \newterm{thread churn}.
+That is, threads must be scheduled on multiple processors to obtain high processor utilization because the number of threads $\ggg$ processors.
+
+\item
+\newterm{contention}: safe access of shared objects by multiple processors requires mutual exclusion in the form of locking.\footnote{
+Lock-free data-structures are still a form of locking because all forms of locking invoke delays.}
+
+\noindent
+Locking cost and latency increase significantly with the number of processors accessing a shared object.
+\end{enumerate}
+
+SQMS has perfect load-balancing but poor affinity and high contention among the processors, because of the single queue.
+MQMS has poor load-balancing but perfect affinity and no contention, because each processor has its own queue.
+Between these two extremes is a host of options, \eg adding an optional global, shared ready-queue to MQMS.
+Significant research effort has also examined load sharing/stealing among queues, where work units are pushed/pulled to/from other ready queues when a queue becomes too long or too short, respectively.
+These approaches achieve better load-balancing at the cost of affinity and contention.
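+
+To make this spectrum concrete, the following C sketch contrasts the two extreme queue organizations and a stealing operation;
+the types, names, and processor count are hypothetical simplifications, not the design used in this work.
+\begin{verbatim}
+#include <pthread.h>
+#define NPROC 4                         // hypothetical processor count
+
+typedef struct thread_desc {            // hypothetical thread descriptor
+    struct thread_desc * next;
+} thread_desc;
+typedef struct { thread_desc * head; } queue;
+
+// SQMS: one shared queue gives perfect load balancing, but all
+// processors contend on the single lock and affinity is lost.
+struct sqms { pthread_mutex_t lock; queue ready; };
+
+// MQMS: a queue per processor gives affinity without contention,
+// but the queues can become unbalanced.
+struct mqms { queue ready[NPROC]; };
+
+// Stealing restores some load balancing: an idle processor pulls
+// work from a victim's queue (per-queue synchronization elided).
+thread_desc * steal( struct mqms * s, unsigned int victim ) {
+    queue * q = &s->ready[victim];
+    thread_desc * t = q->head;
+    if ( t != NULL ) q->head = t->next; // unlink stolen thread
+    return t;
+}
+\end{verbatim}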
+
+Note, a computer system is often lightly loaded (or has no load);
+any scheduler can handle this situation, \ie all schedulers are equal in this context.
+As the workload increases, there is a point where some schedulers begin to perform better than others based on the above criteria.
+A poorer scheduler might saturate, \eg from contention on shared data-structures, and be unable to assign available work to idle processors, \ie processors are spinning when work is available.
+
+
+\section{\CFA programming language}
+
+\todo{Brief discussion on \CFA features used in the thesis.}
 
 The \CFA programming language~\cite{cfa:frontpage,cfa:typesystem} extends the C programming language by adding modern safety and productivity features, while maintaining backwards compatibility. Among its productivity features, \CFA supports user-level threading~\cite{Delisle21} allowing programmers to write modern concurrent and parallel programs.
@@ -9,2 +94,31 @@
 
 As a research project, this work builds exclusively on newer versions of the Linux operating-system and gcc/clang compilers. While \CFA is released, supporting older versions of Linux ($<$~Ubuntu 16.04) and gcc/clang compilers ($<$~gcc 6.0) is not a goal of this work.
+
+
+\section{Contributions}
+\label{s:Contributions}
+
+This work provides the following contributions in the area of user-level scheduling in an advanced programming-language runtime-system:
+\begin{enumerate}[leftmargin=*]
+\item
+Guarantee no thread or set of threads can conspire to prevent other threads from executing.
+\begin{itemize}
+\item
+There must exist some form of preemption, \eg a timer interrupt, to prevent CPU-bound threads from gaining exclusive access to one or more processors (a sketch of such a timer appears after this list).
+\item
+There must be a scheduler guarantee that threads are executed after CPU-bound threads are preempted.
+Otherwise, CPU-bound threads can be immediately rescheduled, negating the preemption.
+\end{itemize}
+Hence, the runtime/scheduler provides a notion of fairness that is impossible for a programmer to violate.
+\item
+Provide comprehensive scheduling across all forms of preemption and blocking, where threads move from the running to ready or blocked states.
+
+Once a thread stops running, the processor executing it must be rescheduled, if possible.
+Knowing which threads are in the ready state for scheduling is difficult because blocking operations complete asynchronously and with poor notification.
+\item
+Efficiently deal with unbalanced workloads among processors.
+
+Simpler to 
+\item
+Efficiently deal with idle processors when there is less work than the available computing capacity.
+\end{enumerate}
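+
+As a closing note on the first contribution, a common preemption trigger on POSIX systems is a recurring interval timer whose signal forces the running thread back to the ready state.
+The following C sketch shows only this OS-level trigger;
+the scheduler hook it calls is hypothetical and is not the \CFA runtime's actual mechanism.
+\begin{verbatim}
+#include <signal.h>
+#include <sys/time.h>
+
+// Hypothetical hook into the scheduler: move the interrupted
+// thread to the ready queue and select another thread to run.
+extern void yield_to_scheduler( void );
+
+static void alarm_handler( int sig ) {
+    (void)sig;
+    yield_to_scheduler();               // running -> ready
+}
+
+// Arm a recurring 10ms timer; SIGALRM now preempts CPU-bound code.
+void start_preemption( void ) {
+    struct sigaction sa = { .sa_handler = alarm_handler };
+    sigaction( SIGALRM, &sa, NULL );
+    struct itimerval tv = {
+        .it_interval = { .tv_sec = 0, .tv_usec = 10000 },
+        .it_value    = { .tv_sec = 0, .tv_usec = 10000 },
+    };
+    setitimer( ITIMER_REAL, &tv, NULL );
+}
+\end{verbatim}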
