Index: doc/theses/thierry_delisle_PhD/comp_II/Makefile
===================================================================
--- doc/theses/thierry_delisle_PhD/comp_II/Makefile	(revision bbdb0c643e5e1d39beac9cc933345506b44d7ff5)
+++ doc/theses/thierry_delisle_PhD/comp_II/Makefile	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
@@ -22,5 +22,7 @@
 	emptybit \
 	emptytree \
+	emptytls \
 	resize \
+	system \
 }
 
Index: doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex
===================================================================
--- doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex	(revision bbdb0c643e5e1d39beac9cc933345506b44d7ff5)
+++ doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
@@ -131,8 +131,7 @@
 \begin{figure}
 	\begin{center}
-%		{\resizebox{0.8\textwidth}{!}{\input{base}}}
 		\input{base}
 	\end{center}
-	\caption{Relaxed FIFO list at the base of the scheduler: an array of strictly FIFO lists. }
+	\caption{Relaxed FIFO list at the base of the scheduler: an array of strictly FIFO lists. The timestamp is in all nodes and array cells.}
 	\label{fig:base}
 \end{figure}
@@ -140,5 +139,4 @@
 \begin{figure}
 	\begin{center}
-%		{\resizebox{0.8\textwidth}{!}{\input{empty}}}
 		\input{empty}
 	\end{center}
@@ -199,27 +197,44 @@
 Finally, a third approach is to use dense information, similar to the bitmap, but have each thread keep its own independent copy of it. While this approach can offer good scalability \emph{and} low latency, the liveness of the information can become a problem. In the simple cases, local copies of which underlying queues are empty can become stale and end up not being useful for the pop operation. A more serious problem is that reliable information is necessary for some parts of this algorithm to be correct. As mentioned in this section, processors must know \emph{reliably} whether the list is empty or not to decide if they can return \texttt{NULL} or if they must keep looking during a pop operation. Section~\ref{sec:sleep} discusses another case where reliable information is required for the algorithm to be correct.
 
+\begin{figure}
+	\begin{center}
+		{\resizebox{0.8\textwidth}{!}{\input{emptytls}}}
+	\end{center}
+	\caption{``More empty'' queue with an added per-processor bitmask to indicate which array cells have items.}
+	\label{fig:emptytls}
+\end{figure}
+
 There is a fundamental tradeoff among these approaches. Dense global information about empty underlying queues helps zero-contention cases at the cost of high-contention cases. Sparse global information helps high-contention cases but increases latency in zero-contention cases, to read and ``aggregate'' the information\footnote{Hierarchical structures, e.g., a binary search tree, effectively aggregate information, but following pointer chains means learning the information at each node. Similarly, other sparse schemes would need to read multiple cachelines to acquire all the information needed.}. Finally, dense local information has both the advantage of low latency in zero-contention cases and scalability in high-contention cases; however, the information can become stale, making it difficult to use to ensure correctness. The fact that these solutions have these fundamental limits suggests to me that a better solution combines these properties in interesting ways. The lock discussed in Section~\ref{sec:resize} also allows for solutions that adapt to the number of processors, which could also prove useful.
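As a concrete illustration of the dense-global-information approach, the following is a minimal sketch, not the prototype's actual code, of a 64-bit bitmask tracking which underlying queues are non-empty, with a count-trailing-zeros builtin used to find a candidate queue; all names here are illustrative.

```c
#include <stdatomic.h>
#include <stdint.h>

// Hypothetical sketch: bit i set means underlying queue i is (probably)
// non-empty.  Push sets the bit, pop clears it when the queue drains.
static _Atomic uint64_t mask;

static void mark_nonempty(unsigned q) {
	atomic_fetch_or(&mask, UINT64_C(1) << q);
}

static void mark_empty(unsigned q) {
	atomic_fetch_and(&mask, ~(UINT64_C(1) << q));
}

// Returns the index of a non-empty queue in this snapshot, or -1 if the
// snapshot is empty.  Note the snapshot can be stale by the time it is used.
static int find_nonempty(void) {
	uint64_t m = atomic_load(&mask);
	return m ? __builtin_ctzll(m) : -1;
}
```

Note that every `mark_*` call is an atomic read-modify-write on a shared cacheline, which is exactly the high-contention cost the tradeoff above describes.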
 
 \paragraph{Objectives and Existing Work}
 
-How much scalability is actually needed is highly debatable, libfibre\cit has compared favorably to other schedulers in webserver tests\cit and uses a single atomic counter in its scheduling algorithm similarly to the proposed bitmask. As such, the single atomic instruction on a shared cacheline may be sufficiently performant.
-
-I have built a prototype of this ready-queue (including the bitmask and BMI2 usage, but not the sharded bitmask) and ran performance experiments on it but it is difficult to compare this prototype to a thread scheduler as the prototype is used as a data-queue. I have also integrated this prototype into the \CFA runtime, but have not yet created performance experiments to compare results. I believe that the bitmask approach is currently one of the larger risks of the proposal, early tests lead me to believe it may work but it is not clear that the contention problem can be overcome. The worst-case scenario is a case where the number of processors and the number of ready threads are similar, yet scheduling events are very frequent. Fewer threads should lead to the Idle Sleep mechanism discussed in Section~\ref{sec:sleep} to reduce contention while having many threads ready leads to optimal performance. It is difficult to evaluate the likeliness of this worst-case scenario in real workloads. I believe, frequent scheduling events suggest a more ``bursty'' workload where new work is finely divided among many threads which race to completion. This type of workload would only see a peek of contention close to the end of the work, but no sustained contention. Very fine-grained pipelines are less ``bursty'', these may lead to more sustained contention. However, they could also easily benefit from a direct hand-off strategy which would circumvent the problem entirely.
+How much scalability is actually needed is highly debatable: \emph{libfibre}\cit has compared favorably to other schedulers in webserver tests\cit and uses a single atomic counter in its scheduling algorithm, similar to the proposed bitmask. As such, a single atomic instruction on a shared cacheline may be sufficiently performant.
+
+I have built a prototype of this ready-queue in the shape of a data-queue, i.e., nodes on the queue are structures with a single \texttt{int} and the intrusive data fields. Using this prototype I ran preliminary performance experiments, which confirm the expected performance in Table~\ref{tab:perfcases}. However, these experiments only offer a hint at the actual performance of the scheduler, since threads perform more complex operations than simple integer nodes, e.g., threads are not independent of each other; when a thread blocks, some other thread must intervene to wake it.
+
+I have also integrated this prototype into the \CFA runtime, but have not yet created performance experiments to compare results, as creating one-to-one comparisons with the prototype will be complex.
 
 \subsection{Dynamic Resizing} \label{sec:resize}
-The \CFA runtime system currently handles dynamically adding and removing processors from clusters at any time. Since this is part of the existing design, the proposed scheduler must also support this behaviour. However, dynamicly resizing the clusters is considered a rare event associated with setup, teardown and major configuration changes. This assumptions is made both in the design of the proposed scheduler as well as in the original design of the \CFA runtime system. As such, the proposed scheduler must honor the correctness of these behaviour but does not have any performance objectives with regards to resizing a cluster. How long adding or removing processors take and how much this disrupts the performance of other threads is considered a secondary concern since it should be amortized over long period of times. However, as mentionned in Section~\ref{sec:queue}, contention on the underlying queues can have a direct impact on performance, the number of underlying queues must therefore be adjusted as the number of processors grows or shrinks. Since the underlying queues are stored in a dense array, changing the number of queues requires resizing the array and therefore moving it. This can introduce memory reclamation problems if not done correctly.
-
-\begin{figure}
-	\begin{center}
-%		{\resizebox{0.8\textwidth}{!}{\input{resize}}}
+\begin{figure}
+	\begin{center}
+		\input{system}
+	\end{center}
+	\caption{Global structure of the \CFA runtime system.}
+	\label{fig:system}
+\end{figure}
+
+The \CFA runtime system groups processors together as clusters. Threads on a cluster are always scheduled on one of the processors of the cluster. Currently, the runtime handles dynamically adding and removing processors from clusters at any time. Since this is part of the existing design, the proposed scheduler must also support this behaviour. However, dynamically resizing a cluster is considered a rare event associated with setup, teardown and major configuration changes. This assumption is made both in the design of the proposed scheduler as well as in the original design of the \CFA runtime system. As such, the proposed scheduler must honor the correctness of this behaviour but does not have any performance objectives with regard to resizing a cluster. How long adding or removing processors takes and how much this disrupts the performance of other threads are considered secondary concerns, since they should be amortized over long periods of time. However, as mentioned in Section~\ref{sec:queue}, contention on the underlying queues can have a direct impact on performance. The number of underlying queues must therefore be adjusted as the number of processors grows or shrinks. Since the underlying queues are stored in a dense array, changing the number of queues requires resizing the array, and expanding the array requires moving it. This can introduce memory-reclamation problems if not done correctly.
+
+\begin{figure}
+	\begin{center}
 		\input{resize}
 	\end{center}
-	\caption{Copy of data structure shown in Figure~\ref{fig:base}. The cells of the array can be modified concurrently but resizing the array, which requires moving it, is not safe to do concurrently. This can also be true of the accompanying data structures used to find non-empty queues.}
+	\caption{Copy of the data structure shown in Figure~\ref{fig:base}.}
 	\label{fig:base2}
 \end{figure}
 
-It is important to note that how the array is used in this case. While the array cells are modified by every push and pop operation, the array itself, i.e., the pointer that would change when resized, is only read during these operations. Therefore the use is this pointer can be described as frequent reads and in frequent writes. This description effectively matches with the description of a Reader-Writer lock, infrequent but invasive updates among frequent read operations. In the case of the Ready-Queue described above, read operations are operations that push or pop from the ready-queue but do not invalidate any references to the ready queue data structures. Writes on the other-hand would add or remove inner queues, invalidating references to the array of inner queues in the process. Therefore, the current proposed approach to this problem is the add a per-cluster Reader Writer lock around the ready queue to prevent restructuring of the ready-queue data structure while threads are being pushed or popped.
-
-There are possible alternatives to the Reader Writer lock solution. This problem is effectively a memory reclamation problem and as such there is a large body of research on the subject. However, the RWlock solution is simple and can be leveraged to solve other problems (e.g. processor ordering and memory reclamation of threads) which makes it an attractive solution.
+It is important to note how the array is used in this case. While the array cells are modified by every push and pop operation, the array itself, i.e., the pointer that would change when resized, is only read during these operations. Therefore the use of this pointer can be described as frequent reads and infrequent writes, which effectively matches the description of a Reader-Writer lock: infrequent but invasive updates among frequent read operations. In the case of the ready-queue described above, read operations are operations that push or pop from the ready-queue but do not invalidate any references to the ready-queue data structures. Writes, on the other hand, would add or remove inner queues, invalidating references to the array of inner queues in the process. Therefore, the current proposed approach to this problem is to add a per-cluster Reader-Writer lock around the ready-queue to prevent restructuring of the ready-queue data structure while threads are being pushed or popped.
+
+There are possible alternatives to the Reader Writer lock solution. This problem is effectively a memory reclamation problem and as such there is a large body of research on the subject\cit. However, the RWlock solution is simple and can be leveraged to solve other problems (e.g., processor ordering and memory reclamation of threads), which makes it an attractive solution.
 
 \paragraph{Objectives and Existing Work}
@@ -227,25 +242,25 @@
 
 \subsection{Idle Sleep} \label{sec:sleep}
-As mentionned above, idle sleep is the process of putting processors to sleep while they do not have threads to execute. In this context, processors are kernel-threads and sleeping refers to asking the kernel to block a thread. This can be achieved with either thread synchronization operations like pthread\_cond\_wait or using signal operations like sigsuspend. The goal of putting idle processors to sleep is two-fold, it reduces energy consumption in cases where more idle kernel-threads translate to idle hardware threads, and reduces contention on the ready queue, since the otherwise idle processors generally contend trying to pop items from the queue. Since energy efficiency is a growing concern in many industry sectors\cit, there is not real need to solve the contention problem without using idle sleep.
-
-Support for idle sleep broadly involves calling the operating system to block the kernel thread but also handling the race between the sleeping and the waking up, and handling which kernel thread should sleep or wake-up.
-
-When a processor decides to sleep, there is a race that occurs between it signalling that it will go to sleep (so other processors can find sleeping processors) and actually blocking the kernel thread. This is equivalent to the classic problem of missing signals when using condition variables, the ``sleepy'' processor indicates that it will sleep but has not yet gone to sleep, if another processor attempts to wake it up, the waking-up operation may claim nothing needs to be done and the signal will have been missed. In cases where threads are scheduled from processors on the current cluster, loosing signals is not necessarily critical, because at least some processors on the cluster are awake. Individual processors always finish scheduling threads before looking for new work, which means that the last processor to go to sleep cannot miss threads scheduled from inside the cluster (if they do, that demonstrates the ready-queue is not linearizable). However, this guarantee does not hold if threads are scheduled from outside the cluster, either due to an external event like timers and I/O, or due to a thread migrating from a different cluster. In this case, missed signals can lead to the cluster deadlocking where it should not\footnote{Clusters ``should'' never deadlock, but for this proposal, cases where \CFA users \emph{actually} wrote \CFA code that leads to a deadlock it is considered as a deadlock that ``should'' happen. }. Therefore, it is important that the scheduling of threads include a mechanism where signals \emph{cannot} be missed. For performance reasons, it can be advantageous to have a secondary mechanism that allows signals to be missed in cases where it cannot lead to a deadlock. 
To be safe, this process must include a ``handshake'' where it is guaranteed that either~: the sleepy processor notices that a thread was scheduled after it signalled its intent to block or code scheduling threads sees the intent to sleep before scheduling and be able to wake-up the processor. This matter is complicated by the fact that pthread offers few tools to implement this solution and offers no guarantee of ordering of threads waking up for most of these tools.
-
-Another issues is trying to avoid kernel sleeping and waking frequently. A possible partial solution is to order the processors so that the one which most recently went to sleep is woken up. This allows other sleeping processors to reach deeper sleep state (when these are available) while keeping ``hot'' processors warmer. Note that while this generally means organising the processors in a stack, I believe that the unique index provided by the ReaderWriter lock can be reused to strictly order the waking order of processors, causing a LIFO like waking order. While a strict LIFO stack is probably better, using the processor index could prove useful and offer a sufficiently LIFO ordering.
-
-Finally, another important aspect of Idle Sleep is when should processors make the decision to sleep and when it is appropriate for sleeping processors to be woken up. Processors that are unnecessarily awake lead to unnecessary contention and power consumption, while too many sleeping processors can lead to sub-optimal throughput. Furthermore, transitions from sleeping to awake and vice-versa also add unnecessary latency. There is already a wealth of research on the subject and I do not plan to implement a novel idea for the Idle Sleep heuristic in this project.
+As mentioned, idle sleep is the process of putting processors to sleep when they have no threads to execute. In this context, processors are kernel-threads and sleeping refers to asking the kernel to block a thread, which can be achieved with either thread synchronization operations like \texttt{pthread\_cond\_wait} or signal operations like \texttt{sigsuspend}. The goal of putting idle processors to sleep is two-fold: it reduces energy consumption in cases where more idle kernel-threads translate to idle hardware threads, and it reduces contention on the ready queue, since the otherwise idle processors generally contend trying to pop items from the queue.
+
+Support for idle sleep broadly involves calling the operating system to block the kernel thread, handling the race between the blocking thread and the waking thread, and deciding which kernel thread should sleep or wake up.
+
+When a processor decides to sleep, there is a race that occurs between it signalling that it is going to sleep (so other processors can find sleeping processors) and actually blocking the kernel thread. This is equivalent to the classic problem of missing signals when using condition variables: the ``sleepy'' processor indicates that it will sleep but has not yet gone to sleep; when another processor attempts to wake it up, the waking-up operation may claim nothing needs to be done and the signal is missed. In cases where threads are scheduled from processors on the current cluster, losing signals is not necessarily critical, because at least some processors on the cluster are awake and may eventually check the ready queue again. Individual processors always finish scheduling threads before looking for new work, which means that the last processor to go to sleep cannot miss threads scheduled from inside the cluster (if they do, that demonstrates the ready-queue is not linearizable). However, this guarantee does not hold if threads are scheduled from outside the cluster, either due to an external event like timers and I/O, or due to a thread migrating from a different cluster. In this case, missed signals can lead to the cluster deadlocking where it should not\footnote{Clusters ``should'' never deadlock, but for this proposal, cases where \CFA users \emph{actually} write \CFA code that leads to a deadlock are considered deadlocks that ``should'' happen. }. Therefore, it is important that the scheduling of threads includes a mechanism where signals \emph{cannot} be missed. For performance reasons, it can be advantageous to have a secondary mechanism that allows signals to be missed in cases where it cannot lead to a deadlock. To be safe, this process must include a ``handshake'' where it is guaranteed that either the sleepy processor notices that a thread is scheduled after it signals its intent to block, or the code scheduling threads sees the intent to sleep before scheduling and is able to wake up the processor. This matter is complicated by the fact that pthreads offers few tools to implement this solution and offers no guarantee on the order in which threads wake up for most of these tools.
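The handshake above can be sketched with plain pthreads primitives; the names and structure are illustrative, not the runtime's actual mechanism. The key is that the sleeper publishes its intent and re-checks for work under the same mutex the waker uses, so the signal cannot be missed.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static bool intent_to_sleep = false;
static bool work_available  = false;

void processor_sleep(void) {
	pthread_mutex_lock(&lock);
	intent_to_sleep = true;            // publish intent to block
	while (!work_available)            // re-check for work after publishing
		pthread_cond_wait(&wake, &lock);
	intent_to_sleep = false;
	pthread_mutex_unlock(&lock);
}

void schedule_thread(void) {
	pthread_mutex_lock(&lock);
	work_available = true;             // publish the new work
	if (intent_to_sleep)
		pthread_cond_signal(&wake);    // held under the same mutex, so the
	pthread_mutex_unlock(&lock);       // sleeper either sees the work or
}                                      // receives the signal -- never neither
```

Either interleaving is safe: if the sleeper checks first, it blocks and is signalled; if the scheduler runs first, the sleeper sees `work_available` and never blocks.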
+
+Another issue is trying to avoid kernel threads sleeping and waking frequently. A possible partial solution is to order the processors so that the one that most recently went to sleep is woken up first. This allows other sleeping processors to reach deeper sleep states (when these are available) while keeping ``hot'' processors warmer. Note that while this generally means organising the processors in a stack, I believe that the unique index provided by the Reader-Writer lock can be reused to strictly order the waking of processors, causing a LIFO-like waking order. While a strict LIFO stack is probably better, using the processor index could prove useful and offer a sufficiently LIFO-like ordering.
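The strict-LIFO variant amounts to an intrusive stack of idle processors; the following is a single-threaded sketch with illustrative names, where a real implementation would need synchronization (or the Reader-Writer lock index, as discussed).

```c
#include <stddef.h>

typedef struct proc {
	struct proc * next;
	int id;
} proc_t;

static proc_t * idle_top = NULL;

void idle_push(proc_t * p) {   // processor going to sleep
	p->next  = idle_top;
	idle_top = p;
}

proc_t * idle_pop(void) {      // wake the most recently slept ("hot") one
	proc_t * p = idle_top;
	if (p) idle_top = p->next;
	return p;
}
```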
+
+Finally, another important aspect of Idle Sleep is when processors should make the decision to sleep and when it is appropriate for sleeping processors to be woken up. Processors that are unnecessarily unblocked lead to unnecessary contention and power consumption, while too many sleeping processors can lead to sub-optimal throughput. Furthermore, transitions from sleeping to awake and vice versa also add unnecessary latency. There is already a wealth of research on the subject and I may use an existing approach for the Idle Sleep heuristic in this project.
 
 \subsection{Asynchronous I/O}
-The final aspect of this proposal is asynchronous I/O. Without it, user threads that execute I/O operations will block the underlying kernel thread. This leads to poor throughput, it would be preferrable to block the user-thread and reuse the underlying kernel-thread to run other ready threads. This requires intercepting the user-threads' calls to I/O operations, redirecting them to an asynchronous I/O interface and handling the multiplexing between the synchronous and asynchronous API. As such, these are the three components needed to implemented to support asynchronous I/O : an OS abstraction layer over the asynchronous interface, an event-engine to (de)multiplex the operations and a synchronous interface for users to use. None of these components currently exist in \CFA and I will need to build all three for this project.
+The final aspect of this proposal is asynchronous I/O. Without it, user threads that execute I/O operations block the underlying kernel thread, which leads to poor throughput. It would be preferable to block the user-thread and reuse the underlying kernel-thread to run other ready threads. This approach requires intercepting the user-threads' calls to I/O operations, redirecting them to an asynchronous I/O interface, and handling the multiplexing between the synchronous and asynchronous APIs. As such, three components are needed to implement support for asynchronous I/O: an OS abstraction layer over the asynchronous interface, an event-engine to (de)multiplex the operations, and a synchronous interface for users. None of these components currently exist in \CFA and I will need to build all three for this project.
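The three layers can be sketched as follows. This is purely illustrative: the names are hypothetical, and the "engine" is stubbed with a blocking `read`, where the real design would submit the request to an asynchronous interface, park the user-thread, and resume it on completion.

```c
#include <unistd.h>

// Hypothetical request record passed from the synchronous interface down
// through the event-engine to the OS abstraction layer.
typedef struct {
	int fd;
	void * buf;
	size_t len;
	ssize_t result;
} io_req_t;

static void engine_submit_and_wait(io_req_t * req) {
	// Stub: stands in for "submit to the asynchronous interface, park the
	// user-thread, wake it when the completion is demultiplexed back".
	req->result = read(req->fd, req->buf, req->len);
}

// Synchronous interface matching POSIX read(), so existing C code could be
// converted transparently.
ssize_t cfa_read(int fd, void * buf, size_t len) {
	io_req_t req = { fd, buf, len, -1 };
	engine_submit_and_wait(&req);
	return req.result;
}
```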
 
 \paragraph{OS Abstraction}
-One of the fundamental part of converting blocking I/O operations into non-blocking ones is having an underlying asynchronous I/O interface to direct the I/O operations. While there exists many different APIs for asynchronous I/O, it is not part of this proposal to create a novel API, simply to use an existing one that is sufficient. uC++ uses the \texttt{select} as its interface, which handles ttys, pipes and sockets, but not disk. It entails significant complexity and is being replaced which make it a less interesting alternative. Another interface which is becoming popular recently\cit is \texttt{epoll}, which is supposed to be cheaper than \texttt{select}. However, epoll also does not handle file system and seems to have problem to linux pipes and \texttt{TTY}s\cit. A very recent alternative that must still be investigated is \texttt{io\_uring}. It claims to address some of the issues with \texttt{epoll} but is too recent to be confident that it does. Finally, a popular cross-platform alternative is \texttt{libuv}, which offers asynchronous sockets and asynchronous file system operations (among other features). However, as a full-featured library it includes much more than what is needed and could conflict with other features of \CFA unless significant efforts are made to merge them together.
+One fundamental part of converting blocking I/O operations into non-blocking ones is having an underlying asynchronous I/O interface to direct the I/O operations. While there exist many different APIs for asynchronous I/O, it is not part of this proposal to create a novel API; I plan to use an existing one that is sufficient. uC++ uses \texttt{select} as its interface, which handles ttys, pipes and sockets, but not disk. It entails significant complexity and is being replaced, which makes it a less interesting alternative. Another popular interface is \texttt{epoll}\cit, which is supposed to be cheaper than \texttt{select}. However, \texttt{epoll} also does not handle the file system and seems to have problems with Linux pipes and \texttt{TTY}s\cit. A very recent alternative that must still be investigated is \texttt{io\_uring}. It claims to address some of the issues with \texttt{epoll} but is too recent to be confident that it does. Finally, a popular cross-platform alternative is \texttt{libuv}, which offers asynchronous sockets and asynchronous file system operations (among other features). However, as a full-featured library it includes much more than what is needed and could conflict with other features of \CFA unless significant effort is made to merge them together.
 
 \paragraph{Event-Engine}
-Laying on top of the asynchronous interface layer is the event-engine. This engine is responsible for multiplexing (batching) the synchronous I/O requests into an asynchronous I/O request and demultiplexing the results onto appropriate blocked threads. This can be straightforward for the simple cases, but can become quite complex. Decisions that will need to be made include : whether to poll from a seperate kernel thread or a regularly scheduled user thread, what should be the ordering used when results satisfy many requests, how to handle threads waiting for multiple operations, etc.
+Laying on top of the asynchronous interface layer is the event-engine. This engine is responsible for multiplexing (batching) the synchronous I/O requests into asynchronous I/O requests and demultiplexing the results onto the appropriate blocked threads. This step can be straightforward in simple cases, but can become quite complex. Decisions that need to be made include: whether to poll from a separate kernel thread or from a regularly scheduled user thread, what ordering to use when results satisfy many requests, how to handle threads waiting for multiple operations, etc.
 
 \paragraph{Interface}
-Finally, for these components to be available, it is necessary to expose them through a synchronous interface. This can be a novel interface but it is preferrable to attempt to intercept the existing POSIX interface in order to be compatible with existing code. This allows C programs written using this interface to be transparently converted to \CFA with minimal effeort. Where this is not applicable, a novel interface will be created to fill the gaps.
+Finally, for these components to be available, it is necessary to expose them through a synchronous interface. The interface can be novel, but it is preferable to match the existing POSIX interface in order to be compatible with existing code. Matching allows C programs written using this interface to be transparently converted to \CFA with minimal effort. Where this is not applicable, a novel interface will be created to fill the gaps.
 
 
Index: doc/theses/thierry_delisle_PhD/comp_II/img/base.fig
===================================================================
--- doc/theses/thierry_delisle_PhD/comp_II/img/base.fig	(revision bbdb0c643e5e1d39beac9cc933345506b44d7ff5)
+++ doc/theses/thierry_delisle_PhD/comp_II/img/base.fig	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
@@ -1,7 +1,7 @@
-#FIG 3.2  Produced by xfig version 3.2.5c
+#FIG 3.2  Produced by xfig version 3.2.7a
 Landscape
 Center
 Inches
-Letter  
+Letter
 100.00
 Single
@@ -109,6 +109,15 @@
 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
 	 2400 3900 7200 3900 7200 4500 2400 4500 2400 3900
-4 2 0 50 -1 0 12 0.0000 2 180 660 2100 4200 Array of\001
-4 2 0 50 -1 0 12 0.0000 2 165 600 2100 4425 Queues\001
-4 2 0 50 -1 0 12 0.0000 2 135 645 2100 3075 Threads\001
-4 2 0 50 -1 0 12 0.0000 2 180 525 2100 2850 Ready\001
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 2454 2520 2957 2520
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 2454 3417 2957 3417
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 2400 4350 3000 4350
+4 2 0 50 -1 0 12 0.0000 2 165 720 2100 4200 Array of\001
+4 2 0 50 -1 0 12 0.0000 2 150 540 2100 4425 Queues\001
+4 2 0 50 -1 0 12 0.0000 2 135 630 2100 3075 Threads\001
+4 2 0 50 -1 0 12 0.0000 2 165 450 2100 2850 Ready\001
+4 0 0 50 -1 0 11 0.0000 2 135 180 2595 3561 TS\001
+4 0 0 50 -1 0 11 0.0000 2 135 180 2595 2665 TS\001
+4 0 0 50 -1 0 11 0.0000 2 135 180 2595 4479 TS\001
Index: doc/theses/thierry_delisle_PhD/comp_II/img/emptytls.fig
===================================================================
--- doc/theses/thierry_delisle_PhD/comp_II/img/emptytls.fig	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
+++ doc/theses/thierry_delisle_PhD/comp_II/img/emptytls.fig	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
@@ -0,0 +1,63 @@
+#FIG 3.2  Produced by xfig version 3.2.7a
+Landscape
+Center
+Inches
+Letter
+100.00
+Single
+-2
+1200 2
+6 4800 3075 5400 4200
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5100 4200 5100 3600
+2 3 0 1 0 7 50 -1 -1 0.000 0 0 0 0 0 7
+	 5400 3335 5250 3075 4950 3075 4800 3335 4950 3595 5250 3595
+	 5400 3335
+-6
+6 2400 2175 3000 3375
+2 3 0 1 0 7 50 -1 -1 0.000 0 0 0 0 0 7
+	 3000 2435 2850 2175 2550 2175 2400 2435 2550 2695 2850 2695
+	 3000 2435
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 2700 3375 2700 2700
+-6
+6 2400 3075 3000 4200
+2 3 0 1 0 7 50 -1 -1 0.000 0 0 0 0 0 7
+	 3000 3335 2850 3075 2550 3075 2400 3335 2550 3595 2850 3595
+	 3000 3335
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 2700 4200 2700 3600
+-6
+6 6750 4125 7050 4275
+1 3 0 1 0 0 50 -1 20 0.000 1 0.0000 6825 4200 20 20 6825 4200 6845 4200
+1 3 0 1 0 0 50 -1 20 0.000 1 0.0000 6900 4200 20 20 6900 4200 6920 4200
+1 3 0 1 0 0 50 -1 20 0.000 1 0.0000 6975 4200 20 20 6975 4200 6995 4200
+-6
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 3000 3900 3000 4500
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 3600 3900 3600 4500
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 4200 3900 4200 4500
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 4800 3900 4800 4500
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 5400 3900 5400 4500
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 6000 3900 6000 4500
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
+	 6600 3900 6600 4500
+2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
+	 2400 3900 7200 3900 7200 4500 2400 4500 2400 3900
+4 2 0 50 -1 0 12 0.0000 2 165 720 2100 4200 Array of\001
+4 2 0 50 -1 0 12 0.0000 2 150 540 2100 4425 Queues\001
+4 2 0 50 -1 0 12 0.0000 2 135 630 2100 3075 Threads\001
+4 2 0 50 -1 0 12 0.0000 2 165 450 2100 2850 Ready\001
+4 0 0 50 -1 5 14 0.0000 2 165 1080 2400 5100 [1000100...]\001
+4 0 0 50 -1 5 14 0.0000 2 165 1080 4425 5100 [1000100...]\001
+4 0 0 50 -1 5 14 0.0000 2 165 1080 6450 5100 [1000100...]\001
+4 0 0 50 -1 0 13 0.0000 2 135 630 1500 5175 Bitmask\001
+4 0 0 50 -1 0 13 0.0000 2 135 1080 1050 4950 Thread-Local\001
Index: doc/theses/thierry_delisle_PhD/comp_II/img/resize.fig
===================================================================
--- doc/theses/thierry_delisle_PhD/comp_II/img/resize.fig	(revision bbdb0c643e5e1d39beac9cc933345506b44d7ff5)
+++ doc/theses/thierry_delisle_PhD/comp_II/img/resize.fig	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
@@ -1,7 +1,7 @@
-#FIG 3.2  Produced by xfig version 3.2.5c
+#FIG 3.2  Produced by xfig version 3.2.7a
 Landscape
 Center
 Inches
-Letter  
+Letter
 100.00
 Single
@@ -98,7 +98,7 @@
 	1 1 1.00 45.00 90.00
 	 7500 4200 7950 4200
-4 0 0 50 -1 0 12 0.0000 2 135 915 7500 3825 Grows with\001
-4 0 0 50 -1 0 12 0.0000 2 135 840 7500 4050 additional\001
-4 0 0 50 -1 0 12 0.0000 2 135 840 7500 4425 processors\001
+4 0 0 50 -1 0 12 0.0000 2 135 900 7500 3825 Grows with\001
+4 0 0 50 -1 0 12 0.0000 2 135 900 7500 4050 additional\001
+4 0 0 50 -1 0 12 0.0000 2 120 900 7500 4425 processors\001
 -6
 2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 2
@@ -118,6 +118,13 @@
 2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
 	 2400 3900 7200 3900 7200 4500 2400 4500 2400 3900
-4 2 0 50 -1 0 12 0.0000 2 180 660 2100 4200 Array of\001
-4 2 0 50 -1 0 12 0.0000 2 165 600 2100 4425 Queues\001
-4 2 0 50 -1 0 12 0.0000 2 135 645 2100 3075 Threads\001
-4 2 0 50 -1 0 12 0.0000 2 180 525 2100 2850 Ready\001
+2 2 0 1 0 7 50 -1 -1 0.000 0 0 -1 0 0 5
+	 3600 5100 4500 5100 4500 6300 3600 6300 3600 5100
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 3
+	1 1 1.00 60.00 120.00
+	 3750 5550 2400 5550 2400 4500
+4 2 0 50 -1 0 12 0.0000 2 165 720 2100 4200 Array of\001
+4 2 0 50 -1 0 12 0.0000 2 150 540 2100 4425 Queues\001
+4 2 0 50 -1 0 12 0.0000 2 135 630 2100 3075 Threads\001
+4 2 0 50 -1 0 12 0.0000 2 165 450 2100 2850 Ready\001
+4 0 0 50 -1 0 13 0.0000 2 135 1980 3600 5025 Cluster Data Structure\001
+4 0 0 50 -1 0 13 0.0000 2 165 1170 2400 5775 Array Pointer\001
Index: doc/theses/thierry_delisle_PhD/comp_II/img/system.fig
===================================================================
--- doc/theses/thierry_delisle_PhD/comp_II/img/system.fig	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
+++ doc/theses/thierry_delisle_PhD/comp_II/img/system.fig	(revision f100a839520c4f0c1a242b3868d214831ef643f9)
@@ -0,0 +1,171 @@
+#FIG 3.2  Produced by xfig version 3.2.5c
+Landscape
+Center
+Inches
+Letter  
+100.00
+Single
+-2
+1200 2
+6 3855 2775 4155 2925
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 3930 2850 30 30 3930 2850 3960 2880
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4035 2850 30 30 4035 2850 4065 2880
+-6
+6 4755 3525 5055 3675
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4830 3600 30 30 4830 3600 4860 3630
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 4935 3600 30 30 4935 3600 4965 3630
+-6
+6 4650 2775 4950 2925
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4725 2850 15 15 4725 2850 4740 2865
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4800 2850 15 15 4800 2850 4815 2865
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4875 2850 15 15 4875 2850 4890 2865
+-6
+6 3225 2400 3525 2550
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3300 2475 15 15 3300 2475 3315 2490
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3375 2475 15 15 3375 2475 3390 2490
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 3450 2475 15 15 3450 2475 3465 2490
+-6
+6 5475 3450 5625 3750
+1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 5550 3525 15 15 5550 3525 5535 3540
+1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 5550 3600 15 15 5550 3600 5535 3615
+1 3 0 1 -1 -1 0 0 20 0.000 1 4.7120 5550 3675 15 15 5550 3675 5535 3690
+-6
+6 4275 3525 4575 3675
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4350 3600 15 15 4350 3600 4365 3615
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4425 3600 15 15 4425 3600 4440 3615
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4500 3600 15 15 4500 3600 4515 3615
+-6
+6 3225 4125 4650 4425
+6 4350 4200 4650 4350
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4425 4275 15 15 4425 4275 4440 4290
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4500 4275 15 15 4500 4275 4515 4290
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 4575 4275 15 15 4575 4275 4590 4290
+-6
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3450 4275 225 150 3450 4275 3675 4425
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4050 4275 225 150 4050 4275 4275 4425
+-6
+6 6675 4125 7500 4425
+6 7200 4200 7500 4350
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 7275 4275 15 15 7275 4275 7290 4290
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 7350 4275 15 15 7350 4275 7365 4290
+1 3 0 1 -1 -1 0 0 20 0.000 1 0.0000 7425 4275 15 15 7425 4275 7440 4290
+-6
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 6900 4275 225 150 6900 4275 7125 4425
+-6
+6 6675 3525 8025 3975
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 6675 3750 6975 3750
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 7125 3750 7350 3750
+2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
+	 7800 3975 7800 3525 7350 3525 7350 3975 7800 3975
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 7800 3750 8025 3750
+-6
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5550 2625 150 150 5550 2625 5700 2625
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5550 3225 150 150 5550 3225 5700 3225
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 5550 3975 150 150 5550 3975 5700 3975
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3525 2850 150 150 3525 2850 3675 2850
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4200 2475 150 150 4200 2475 4350 2475
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4425 2850 150 150 4425 2850 4575 2850
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4650 2475 150 150 4650 2475 4800 2475
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3975 3600 150 150 3975 3600 4125 3600
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 3525 3600 30 30 3525 3600 3555 3630
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3750 2475 150 150 3750 2475 3900 2625
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4875 3600 150 150 4875 3600 5025 3750
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3975 2850 150 150 3975 2850 4125 2850
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 7200 2775 150 150 7200 2775 7350 2775
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 2250 4830 30 30 2250 4830 2280 4860
+1 3 0 1 0 0 0 0 0 0.000 1 0.0000 7200 2775 30 30 7200 2775 7230 2805
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3525 3600 150 150 3525 3600 3675 3600
+1 3 0 1 -1 -1 0 0 -1 0.000 1 0.0000 3875 4800 100 100 3875 4800 3975 4800
+1 1 0 1 -1 -1 0 0 -1 0.000 1 0.0000 4650 4800 150 75 4650 4800 4800 4875
+2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
+	 2400 4200 2400 3750 1950 3750 1950 4200 2400 4200
+2 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5
+	 6300 4500 6300 1800 3000 1800 3000 4500 6300 4500
+2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
+	 5775 2850 5775 2400 5325 2400 5325 2850 5775 2850
+2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
+	 5775 4200 5775 3750 5325 3750 5325 4200 5775 4200
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5175 3975 5325 3975
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5175 3225 5325 3225
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5175 2625 5325 2625
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5775 3975 5925 3975
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5775 3225 5925 3225
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5775 2625 5925 2625
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
+	 5175 3975 5175 2625
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5925 3975 5925 2025
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5925 3750 6225 3750
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 3450 2625 3225 2625
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 3
+	1 1 1.00 45.00 90.00
+	 5925 2025 4200 2025 4200 2250
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
+	 3225 2625 3225 3600
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 3075 3600 3375 3600
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 3675 3600 3825 3600
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 4125 3600 4275 3600
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 4575 3600 4725 3600
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 5025 3600 5175 3600
+2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
+	 5775 3450 5775 3000 5325 3000 5325 3450 5775 3450
+2 2 1 1 -1 -1 0 0 -1 4.000 0 0 0 0 0 5
+	 8100 4500 8100 1800 6600 1800 6600 4500 8100 4500
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 7050 2775 6825 2775
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 0 0 2
+	 6825 2775 6825 3750
+2 1 0 1 -1 -1 0 0 -1 0.000 0 0 -1 1 0 4
+	1 1 1.00 45.00 90.00
+	 7875 3750 7875 2325 7200 2325 7200 2550
+2 2 0 1 -1 -1 0 0 -1 0.000 0 0 0 0 0 5
+	 5850 4950 5850 4725 5625 4725 5625 4950 5850 4950
+2 2 1 1 -1 -1 0 0 -1 3.000 0 0 0 0 0 5
+	 6975 4950 6750 4950 6750 4725 6975 4725 6975 4950
+4 1 -1 0 0 0 10 0.0000 2 105 720 5550 4425 Processors\001
+4 1 -1 0 0 0 10 0.0000 2 120 1005 4200 3225 Blocked Tasks\001
+4 1 -1 0 0 0 10 0.0000 2 150 870 4200 3975 Ready Tasks\001
+4 1 -1 0 0 0 10 0.0000 2 135 1095 7350 1725 Other Cluster(s)\001
+4 1 -1 0 0 0 10 0.0000 2 105 840 4650 1725 User Cluster\001
+4 1 -1 0 0 0 10 0.0000 2 150 615 2175 3675 Manager\001
+4 1 -1 0 0 0 10 0.0000 2 105 990 2175 3525 Discrete-event\001
+4 1 -1 0 0 0 10 0.0000 2 135 795 2175 4350 preemption\001
+4 0 -1 0 0 0 10 0.0000 2 150 1290 2325 4875 generator/coroutine\001
+4 0 -1 0 0 0 10 0.0000 2 120 270 4050 4875 task\001
+4 0 -1 0 0 0 10 0.0000 2 105 450 7050 4875 cluster\001
+4 0 -1 0 0 0 10 0.0000 2 105 660 5925 4875 processor\001
+4 0 -1 0 0 0 10 0.0000 2 105 555 4875 4875 monitor\001
