# Changeset 0a945fd for doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex

Timestamp:
Sep 22, 2020, 1:02:45 PM
Branches:
arm-eh, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
305cd5c
Parents:
7a80113
Message:

Minor fixes in compII

File:
1 edited

\paragraph{Efficiency}
Finally, efficient usage of CPU resources is also an important requirement and is discussed in depth towards the end of the proposal.
\newterm{Efficiency} means avoiding using CPU cycles when there are no threads to run (to conserve energy), and conversely, using as many available CPU cycles as possible when the workload can benefit from it.
Balancing these two states is where the complexity lies.
The \CFA scheduler should be efficient with respect to the underlying (shared) computer.
\begin{enumerate}
\item Threads live long enough for useful feedback information to be gathered.
\item Threads belong to multiple users so fairness across users is important.
\end{enumerate}
Security concerns mean more precise and robust fairness metrics must be used to guarantee fairness across processes created by users as well as threads created within a process.
In the case of the \CFA scheduler, every thread runs in the same user space and is controlled by the same user.
Fairness across users is therefore a given and it is then possible to safely ignore the possibility that threads are malevolent.
This approach allows for a much simpler fairness metric, and in this proposal, \emph{fairness} is defined as:
\begin{quote}

\subsection{Schedulers without feedback or priorities}
This proposal conjectures that it is possible to construct a default scheduler for the \CFA runtime that offers good scalability and a simple fairness guarantee that is easy for programmers to reason about.
The simplest fairness guarantee is FIFO ordering, \ie threads scheduled first run first.
However, enforcing FIFO ordering generally conflicts with scalability across multiple processors because of the additional synchronization.
Thankfully, strict FIFO is not needed for sufficient fairness.

\input{empty.pstex_t}
\end{center}
\caption{Underloaded relaxed FIFO list where the array contains many empty cells.}
\label{fig:empty}
\end{figure}

In an underloaded system, several of the queues are empty, so selecting a random queue for popping is less likely to yield a successful selection and more attempts are needed, resulting in a performance degradation.
Figure~\ref{fig:empty} shows an example with fewer elements, where the chance of getting an empty queue is 5/7 per pick, meaning two random picks yield an item only half the time.
Since the ready queue is not empty, the pop operation \emph{must} find an element before returning and therefore must retry.
\end{center}
\vspace*{-5pt}
\caption{Underloaded queue with added bitmask to indicate which array cells have items.}
\label{fig:emptybit}
\begin{center}
\end{center}
\vspace*{-5pt}
\caption{Underloaded queue with added binary search tree to indicate which array cells have items.}
\label{fig:emptytree}
\begin{center}
\end{center}
\vspace*{-5pt}
\caption{Underloaded queue with added per-processor bitmask to indicate which array cells have items.}
\label{fig:emptytls}
\end{figure}

Figure~\ref{fig:emptytree} shows an approach using a hierarchical tree data-structure to reduce contention and has been shown to work in similar cases~\cite{ellen2007snzi}.
However, this approach may lead to poorer performance in Table~\ref{tab:perfcases} case~B due to the inherent pointer-chasing cost and the already low contention cost in that case.
Figure~\ref{fig:emptytls} shows an approach using dense information, similar to the bitmap, but has each thread keep its own independent copy of it.
While this approach can offer good scalability \emph{and} low latency, the timeliness of the information can become a problem.
In the simple cases, local copies can become stale and end up not being useful for the pop operation.
A more serious problem is that reliable information is necessary for some parts of this algorithm to be correct.
As mentioned in this section, processors must know \emph{reliably} whether the list is empty or not to decide if they can return \texttt{NULL} or if they must keep looking during a pop operation.

A very recent alternative that I am investigating is $io_uring$~\cite{io_uring}.
It claims to address some of the issues with $epoll$ and my early investigation suggests that the claim is accurate.
$io_uring$ uses a much more general approach where system calls are registered to a queue and later executed by the kernel, rather than relying on system calls to support returning an error instead of blocking.
I believe this approach allows for fewer problems, \eg the manpage for $open$~\cite{open} states:
\begin{quote}