Changeset d489da8


Ignore:
Timestamp:
Sep 19, 2022, 11:22:32 AM (2 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, master, pthread-emulation
Children:
4520b77e, aa144c5a
Parents:
4407b7e
Message:

core final read

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/thierry_delisle_PhD/thesis/text/core.tex

    r4407b7e rd489da8  
    1313To match expectations, the design must offer the programmer sufficient guarantees so that, as long as they respect the execution mental model, the system also respects this model.
    1414
    15 For threading, a simple and common execution mental model is the ``ideal multitasking CPU'' :
     15For threading, a simple and common execution mental model is the ``ideal multitasking CPU'':
    1616
    1717\begin{displayquote}[Linux CFS\cite{MAN:linux/cfs}]
     
    120120On the other hand, work-stealing schedulers only attempt to do load-balancing when a \gls{proc} runs out of work.
    121121This means that the scheduler never balances unfair loads unless they result in a \gls{proc} running out of work.
    122 Chapter~\ref{microbench} shows that pathological cases work stealing can lead to indefinite starvation.
     122Chapter~\ref{microbench} shows that, in pathological cases, work stealing can lead to indefinite starvation.
    123123
    124124Based on these observations, the conclusion is that a \emph{perfect} scheduler should behave similarly to work-stealing in the steady-state case, but load balance proactively when the need arises.
     
    126126\subsection{Relaxed-FIFO}
    127127A different scheduling approach is to create a ``relaxed-FIFO'' queue, as in \cite{alistarh2018relaxed}.
    128 This approach forgoes any ownership between \gls{proc} and sub-queue, and simply creates a pool of ready queues from which \glspl{proc} pick.
     128This approach forgoes any ownership between \gls{proc} and sub-queue, and simply creates a pool of sub-queues from which \glspl{proc} pick.
    129129Scheduling is performed as follows:
    130130\begin{itemize}
     
    134134Timestamps are added to each element of a sub-queue.
    135135\item
    136 A \gls{proc} randomly tests ready queues until it has acquired one or two queues.
     136A \gls{proc} randomly tests sub-queues until it has acquired one or two queues.
    137137\item
    138138If two queues are acquired, the older of the two \ats is dequeued from the front of the acquired queues.
     
    254254
    255255With these additions to work stealing, scheduling can be made as fair as the relaxed-FIFO approach, avoiding the majority of unnecessary migrations.
    256 Unfortunately, the work to achieve fairness has a performance cost, especially when the workload is inherently fair, and hence, there is only short-term or no starvation.
     256Unfortunately, the work to achieve fairness has a performance cost, especially when the workload is inherently fair, and hence, there is only short-term unfairness or no starvation.
    257257The problem is that the constant polling, \ie reads, of remote sub-queues generally entails cache misses because the TSs are constantly being updated, \ie, writes.
    258258To make things worst, remote sub-queues that are very active, \ie \ats are frequently enqueued and dequeued from them, lead to higher chances that polling will incur a cache-miss.
     
    342342\subsection{Topological Work Stealing}
    343343\label{s:TopologicalWorkStealing}
    344 Therefore, the approach used in the \CFA scheduler is to have per-\proc sub-queues, but have an explicit data structure to track which cache substructure each sub-queue is tied to.
     344The approach used in the \CFA scheduler is to have per-\proc sub-queues, but have an explicit data structure to track which cache substructure each sub-queue is tied to.
    345345This tracking requires some finesse because reading this data structure must lead to fewer cache misses than not having the data structure in the first place.
    346346A key element, however, is that, like the timestamps for helping, reading the cache instance mapping only needs to give the correct result \emph{often enough}.
Note: See TracChangeset for help on using the changeset viewer.