Index: doc/theses/thierry_delisle_PhD/thesis/text/eval_macro.tex
===================================================================
--- doc/theses/thierry_delisle_PhD/thesis/text/eval_macro.tex	(revision 36a05d7487d931cd7a2061e8d8591e698f2c4f83)
+++ doc/theses/thierry_delisle_PhD/thesis/text/eval_macro.tex	(revision 36a05d7487d931cd7a2061e8d8591e698f2c4f83)
@@ -0,0 +1,15 @@
+\chapter{Macro-Benchmarks}\label{macrobench}
+
+\section{Static Web-Server}
+
+In Memory Plain Text
+
+Networked Plain Text
+
+Networked ZIPF
+
+\section{Memcached}
+
+In Memory
+
+Networked
Index: doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex
===================================================================
--- doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex	(revision 36a05d7487d931cd7a2061e8d8591e698f2c4f83)
+++ doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex	(revision 36a05d7487d931cd7a2061e8d8591e698f2c4f83)
@@ -0,0 +1,50 @@
+\chapter{Micro-Benchmarks}\label{microbench}
+
+The first step of any evaluation is to test small, controlled cases to ensure that the basics work properly.
+This chapter presents four different experimental setups, each evaluating some of the basic features of \CFA's scheduler.
+
+\section{Cycling latency}
+The most basic evaluation of any ready queue is to measure the latency needed to push and pop one element from the ready queue.
+Since these two operations also describe a \texttt{yield} operation, many systems use yielding as the most basic benchmark.
+However, yielding can be treated as a special case, since it also carries the information that the length of the ready queue will not change.
+Not all systems use this information, but those that do may appear to have better performance than they would for disconnected push/pop pairs.
+For this reason, I chose a different first benchmark, which I call the Cycle Benchmark.
+This benchmark arranges many threads into multiple rings of threads.
+Each ring is effectively a circular singly-linked list.
+At runtime, each thread unparks the next thread before parking itself.
+This corresponds to the desired pair of ready queue operations.
+Unparking the next thread requires pushing that thread onto the ready queue, and the ensuing park causes the runtime to pop a thread from the ready queue.
+Figure~\ref{fig:cycle} shows a visual representation of this arrangement.
+
+The goal of this arrangement is that the underlying runtime cannot rely on the number of ready threads staying constant over the duration of the experiment.
+In fact, the total number of threads waiting on the ready queue is expected to vary a little because of the race between the next thread unparking and the current thread parking.
+The size of the cycle is also decided based on this race: cycles that are too small may see the
+chain of unparks go full circle before the first thread can park.
+While this is not a correctness problem, since every runtime system must handle that race, it could lead to pushes and pops being optimized away.
+Silently omitting ready-queue operations would throw off the measurement of these operations.
+Therefore, the ring of threads must be big enough that the threads have time to fully park before they are unparked.
+Note that this problem is only present on SMP machines and is significantly mitigated by the fact that there are multiple rings in the system.
+
+\begin{figure}
+	\centering
+	\input{cycle.pstex_t}
+	\caption[Cycle benchmark]{Cycle benchmark\smallskip\newline Each thread unparks the next thread in the cycle before parking itself.}
+	\label{fig:cycle}
+\end{figure}
+
+\todo{check term ``idle sleep handling''}
+To prevent this benchmark from being dominated by idle sleep handling, the number of rings is kept at least as high as the number of processors available.
+Beyond this point, adding more rings further mitigates the effect of idle sleep handling.
+This avoids the case where one of the worker threads runs out of work because of the variation in the number of ready threads mentioned above.
+
+The actual benchmark is more complicated because it must handle termination, but that simply requires using a binary semaphore or a channel instead of raw \texttt{park}/\texttt{unpark} and carefully picking the order of the \texttt{P} and \texttt{V} with respect to the loop condition.
+
+\todo{mention where to get the code.}
+
+\section{Yield}
+For completeness, I also include the yield benchmark.
+This benchmark is much simpler than the cycle benchmark: it simply creates many threads that call \texttt{yield}.
+
+\section{Locality}
+
+\section{Transfer}