\chapter{Scheduling Core}\label{core}

Before discussing scheduling in general, where it is important to address systems that are changing states, this document discusses scheduling in a somewhat ideal scenario, where the system has reached a steady state. For this purpose, a steady state is loosely defined as a state where there are always \glspl{thrd} ready to run and the system has the resources necessary to accomplish the work. In short, the system is neither overloaded nor underloaded.

I believe it is important to discuss the steady state first because it is the easiest case to handle and, relatedly, the case in which the best performance is to be expected. As such, when the system is either overloaded or underloaded, a common approach is to try to adapt the system to the new load and return to the steady state. Flaws in steady-state scheduling therefore tend to be pervasive in all states.

\section{Design Goals}
As with most of the design decisions behind \CFA, the main goal is to match the expectations of the programmer, according to their probable mental model. To match these expectations, the design must offer programmers sufficient guarantees so that, as long as they respect the mental model, the system also respects it.

For threading, a simple and common mental model is the ``Ideal multi-tasking CPU'':

\begin{displayquote}[Linux CFS\cit{https://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt}]
	{[The]} ``Ideal multi-tasking CPU'' is a (non-existent :-)) CPU that has 100\% physical power and which can run each task at precise equal speed, in parallel, each at [an equal fraction of the] speed. For example: if there are 2 tasks running, then it runs each at 50\% physical power --- i.e., actually in parallel.
\end{displayquote}

Applied to threads, this model states that every ready \gls{thrd} immediately runs in parallel with all other ready \glspl{thrd}. While a strict implementation of this model is not feasible, programmers still have expectations about scheduling that come from this model.

In general, the expectation at the center of this model is that ready \glspl{thrd} do not interfere with each other but simply share the hardware. This makes it easier to reason about threading because ready \glspl{thrd} can be taken in isolation and the effect of the scheduler can be virtually ignored. This expectation of \gls{thrd} independence means the scheduler is expected to offer two guarantees:
\begin{enumerate}
	\item A fairness guarantee: a \gls{thrd} that is ready to run is not prevented from doing so by another thread.
	\item A performance guarantee: a \gls{thrd} that wants to start or stop running is not slowed down by other threads wanting to do the same.
\end{enumerate}

It is important to note that these guarantees are expected only up to a point. \Glspl{thrd} that are ready to run should not be prevented from doing so, but they still need to share a limited amount of hardware. Therefore, the guarantee is considered respected if a \gls{thrd} gets access to a \emph{fair share} of the hardware, even if that share is very small.

Similarly, for the performance guarantee, the lack of interference between threads is only relevant up to a point. Ideally, the cost of running and blocking would be constant regardless of contention, but the guarantee is considered satisfied if the cost is not \emph{too high} with or without contention. How much cost is acceptable is obviously highly variable. For this document, the performance experimentation attempts to show that the cost of scheduling is not a major factor in application performance. This demonstration can be made by comparing applications built in \CFA to applications built with other languages or other models. If the performance of an application built in \CFA is not meaningfully different from one built with a different runtime, then the scheduler has a negligible impact on performance, \ie its impact can be ignored. Recall from a few paragraphs ago that the expectation of programmers is that the impact of the scheduler can be ignored. Therefore, if the cost of scheduling is not a significant portion of the runtime of several different applications, I consider the guarantee achieved.

\todo{This paragraph should be moved later}
% The next step is then to decide what is considered a \emph{fair share}, \ie what metric is used to measure fairness. Since \CFA is intended to allow numerous short lived threads, I decided to avoid total CPU time as the measure of fairness. Total CPU time inherently favors new \glspl{thrd} over older ones which isn't necessarily a good thing. Instead, fairness is measured in terms of opportunities to run. This metric is more appropriate for a mix of short and long lived \glspl{thrd}.

\section{Design}
While avoiding the pitfalls of Feedback Scheduling is fairly easy, since scheduling does not innately require feedback, avoiding prioritization of \glspl{thrd} is more difficult because of implicit priorities; see Subsection~\ref{priority}. A strictly \glsxtrshort{fifo} rea

\subsection{Sharding}

\begin{figure}
	\begin{center}
		\input{base.pstex_t}
	\end{center}
	\caption{Relaxed FIFO list}
	\label{fig:base}
	List at the base of the scheduler: an array of strictly FIFO lists.
	The timestamp is in all nodes and cell arrays.
\end{figure}
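
The structure in Figure~\ref{fig:base} can be sketched in C as follows. This is only an illustrative, single-threaded sketch and not the \CFA runtime's implementation: every name (\texttt{relaxed\_fifo}, \texttt{rf\_push}, \texttt{rf\_pop\_older}, \texttt{NQUEUES}) is hypothetical, the logical clock stands in for a real timestamp source, and using the cell timestamps to prefer the older of two candidate cells is an assumption about what the timestamps are for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none are taken from the CFA runtime. */
enum { NQUEUES = 4 };

struct node {
	struct node *next;
	uint64_t     ts;   /* timestamp assigned when the node is pushed */
	int          id;
};

struct fifo {              /* one array cell: a strictly FIFO list */
	struct node *head, *tail;
	uint64_t     ts;   /* timestamp of the current head, UINT64_MAX when empty */
};

struct relaxed_fifo {
	struct fifo cells[NQUEUES];
	uint64_t    clock; /* logical clock, an assumption replacing a real time source */
};

static void rf_init(struct relaxed_fifo *q) {
	for (size_t i = 0; i < NQUEUES; i++) {
		q->cells[i].head = q->cells[i].tail = NULL;
		q->cells[i].ts = UINT64_MAX;
	}
	q->clock = 0;
}

/* Append to the chosen cell; each cell on its own stays strictly FIFO. */
static void rf_push(struct relaxed_fifo *q, size_t cell, struct node *n) {
	struct fifo *f = &q->cells[cell];
	n->next = NULL;
	n->ts = q->clock++;
	if (f->tail) f->tail->next = n;
	else { f->head = n; f->ts = n->ts; }
	f->tail = n;
}

/* Remove the head of one cell, or return NULL if the cell is empty. */
static struct node *rf_pop(struct relaxed_fifo *q, size_t cell) {
	struct fifo *f = &q->cells[cell];
	struct node *n = f->head;
	if (!n) return NULL;
	f->head = n->next;
	if (f->head) f->ts = f->head->ts;
	else { f->tail = NULL; f->ts = UINT64_MAX; }
	return n;
}

/* Assumed policy: of two candidate cells, pop from the one whose head is older. */
static struct node *rf_pop_older(struct relaxed_fifo *q, size_t a, size_t b) {
	size_t pick = q->cells[a].ts <= q->cells[b].ts ? a : b;
	return rf_pop(q, pick);
}
```

In this sketch the caller chooses the cells, for example at random, so ordering is strict within a cell but only approximate across the whole array. Synchronization is deliberately left out.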

\subsection{Finding threads}
Once threads have been distributed onto multiple queues, identifying which queues are empty and which are not can become a problem.
Indeed, if the number of \glspl{thrd} does not far exceed the number of queues, it is probable that several of these queues are empty.
Figure~\ref{fig:empty} shows an example with 2 \glspl{thrd} running on 8 queues, where the chance of picking an empty queue is 75\% per pick, meaning two random picks yield a \gls{thrd} less than half the time.
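
As a rough check on this figure, assume (the text does not state this) that the two \glspl{thrd} sit in distinct queues and that each pick selects one of the $q$ queues independently and uniformly at random. With $n$ non-empty queues, the probability that at least one of $k$ picks finds a \gls{thrd} is
\[
	P_k = 1 - \left(1 - \frac{n}{q}\right)^{k},
	\qquad
	P_2 = 1 - \left(1 - \frac{2}{8}\right)^{2} = \frac{7}{16} \approx 44\%,
\]
so under these assumptions two picks succeed slightly less than half the time.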

\begin{figure}
	\begin{center}
		\input{empty.pstex_t}
	\end{center}
	\caption{``More empty'' Relaxed FIFO list}
	\label{fig:empty}
	Emptier state of the queue: the array contains many empty cells, that is, strictly FIFO lists containing no elements.
\end{figure}

This can lead to performance problems since picks that do not yield a \gls{thrd} are not useful and do not necessarily help make more informed guesses.

Solutions to this problem can take many forms, but they ultimately all have to encode where the threads are in some form. My results show that the density and locality of this encoding are generally the dominating factors in these schemes.
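
To make ``encoding where the threads are'' concrete, one possible encoding is a bitmask with one bit per queue. The sketch below is purely illustrative and is not claimed to be the scheme used or evaluated in this work: the names \texttt{queue\_mask}, \texttt{mask\_set}, \texttt{mask\_clear} and \texttt{mask\_first} are hypothetical, it assumes at most 64 queues, and it ignores concurrent updates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: bit i is set when queue i is non-empty (at most 64 queues). */
typedef uint64_t queue_mask;

/* Record that queue i now holds at least one thread. */
static queue_mask mask_set(queue_mask m, unsigned i) {
	return m | (UINT64_C(1) << i);
}

/* Record that queue i has become empty. */
static queue_mask mask_clear(queue_mask m, unsigned i) {
	return m & ~(UINT64_C(1) << i);
}

/* Index of the lowest-numbered non-empty queue, or -1 if all queues are empty. */
static int mask_first(queue_mask m) {
	for (int i = 0; i < 64; i++)
		if (m & (UINT64_C(1) << i)) return i;
	return -1;
}
```

With such a mask, a reader can skip empty queues without touching them, at the price of keeping the mask consistent with the queues on every push and pop.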

\paragraph{Dense Information}