Changeset 0a945fd for doc/theses


Ignore:
Timestamp:
Sep 22, 2020, 1:02:45 PM (13 months ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
arm-eh, jacob/cs343-translation, master, new-ast-unique-expr
Children:
305cd5c
Parents:
7a80113
Message:

Minor fixes in compII

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/thierry_delisle_PhD/comp_II/comp_II.tex

    r7a80113 r0a945fd  
    138138
    139139\paragraph{Efficiency} Finally, efficient usage of CPU resources is also an important requirement and is discussed in depth towards the end of the proposal.
    140 \newterm{Efficiency} means avoiding using CPU cycles when there are no threads to run (conserve energy/heat), and conversely, using as many available CPU cycles when the workload can benefit from it.
     140\newterm{Efficiency} means avoiding using CPU cycles when there are no threads to run (to conserve energy), and conversely, using as many available CPU cycles when the workload can benefit from it.
    141141Balancing these two states is where the complexity lies.
    142142The \CFA scheduler should be efficient with respect to the underlying (shared) computer.
     
    150150\begin{enumerate}
    151151        \item Threads live long enough for useful feedback information to be gathered.
    152         \item Threads belong to multiple users so fairness across users is largely invisible.
     152        \item Threads belong to multiple users so fairness across users is important.
    153153\end{enumerate}
    154154
     
    162162Security concerns mean more precise and robust fairness metrics must be used to guarantee fairness across processes created by users as well as threads created within a process.
    163163In the case of the \CFA scheduler, every thread runs in the same user space and is controlled by the same user.
    164 Fairness across threads is therefore a given and it is then possible to safely ignore the possibility that threads are malevolent.
     164Fairness across users is therefore a given and it is then possible to safely ignore the possibility that threads are malevolent.
    165165This approach allows for a much simpler fairness metric, and in this proposal, \emph{fairness} is defined as:
    166166\begin{quote}
     
    193193\subsection{Schedulers without feedback or priorities}
    194194This proposal conjectures that it is possible to construct a default scheduler for the \CFA runtime that offers good scalability and a simple fairness guarantee that is easy for programmers to reason about.
    195 The simplest fairness guarantee is FIFO ordering, \ie threads scheduled first come first.
     195The simplest fairness guarantee is FIFO ordering, \ie threads scheduled first run first.
    196196However, enforcing FIFO ordering generally conflicts with scalability across multiple processors because of the additional synchronization.
    197197Thankfully, strict FIFO is not needed for sufficient fairness.
     
    236236                \input{empty.pstex_t}
    237237        \end{center}
    238         \caption{Unloaded relaxed FIFO list where the array contains many empty cells.}
     238        \caption{Underloaded relaxed FIFO list where the array contains many empty cells.}
    239239        \label{fig:empty}
    240240\end{figure}
    241241
    242 In an unloaded system, several of the queues are empty, so selecting a random queue for popping is less likely to yield a successful selection and more attempts are needed, resulting in a performance degradation.
     242In an underloaded system, several of the queues are empty, so selecting a random queue for popping is less likely to yield a successful selection and more attempts are needed, resulting in a performance degradation.
    243243Figure~\ref{fig:empty} shows an example with fewer elements, where the chances of getting an empty queue is 5/7 per pick, meaning two random picks yield an item only half the time.
    244244Since the ready queue is not empty, the pop operation \emph{must} find an element before returning and therefore must retry.
     
    288288        \end{center}
    289289        \vspace*{-5pt}
    290         \caption{Unloaded queue with added bitmask to indicate which array cells have items.}
     290        \caption{Underloaded queue with added bitmask to indicate which array cells have items.}
    291291        \label{fig:emptybit}
    292292        \begin{center}
     
    294294        \end{center}
    295295        \vspace*{-5pt}
    296         \caption{Unloaded queue with added binary search tree indicate which array cells have items.}
     296        \caption{Underloaded queue with added binary search tree indicate which array cells have items.}
    297297        \label{fig:emptytree}
    298298        \begin{center}
     
    300300        \end{center}
    301301        \vspace*{-5pt}
    302         \caption{Unloaded queue with added per processor bitmask to indicate which array cells have items.}
     302        \caption{Underloaded queue with added per processor bitmask to indicate which array cells have items.}
    303303        \label{fig:emptytls}
    304304\end{figure}
    305305
    306 Figure~\ref{fig:emptytree} shows an approach using a hierarchical tree data-structure to reduce contention and has been shown to work in similar cases~\cite{ellen2007snzi}\footnote{This particular paper seems to be patented in the US.
    307 How does that affect \CFA? Can I use it in my work?}.
     306Figure~\ref{fig:emptytree} shows an approach using a hierarchical tree data-structure to reduce contention and has been shown to work in similar cases~\cite{ellen2007snzi}.
    308307However, this approach may lead to poorer performance in Table~\ref{tab:perfcases} case~B due to the inherent pointer chasing cost and already low contention cost in that case.
    309308
    310309Figure~\ref{fig:emptytls} shows an approach using dense information, similar to the bitmap, but have each thread keep its own independent copy of it.
    311310While this approach can offer good scalability \emph{and} low latency, the liveliness of the information can become a problem.
    312 In the simple cases, local copies with empty underlying queues can become stale and end-up not being useful for the pop operation.
     311In the simple cases, local copies can become stale and end-up not being useful for the pop operation.
    313312A more serious problem is that reliable information is necessary for some parts of this algorithm to be correct.
    314313As mentioned in this section, processors must know \emph{reliably} whether the list is empty or not to decide if they can return \texttt{NULL} or if they must keep looking during a pop operation.
     
    458457A very recent alternative that I am investigating is $io_uring$~\cite{io_uring}.
    459458It claims to address some of the issues with $epoll$ and my early investigating suggests that the claim is accurate.
    460 $io_uring$ uses a much more general approach where system calls are registered to a queue and later executed by the kernel, rather than relying on system calls to subsequently wait for changes on file descriptors or return an error.
     459$io_uring$ uses a much more general approach where system calls are registered to a queue and later executed by the kernel, rather than relying on system calls to support returning an error instead of blocking.
    461460I believe this approach allows for fewer problems, \eg the manpage for $open$~\cite{open} states:
    462461\begin{quote}
Note: See TracChangeset for help on using the changeset viewer.