Ignore:
Timestamp:
Jul 18, 2022, 8:06:18 AM (22 months ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, ast-experimental, master, pthread-emulation, qualifiedEnum
Children:
6a896b0, d677355
Parents:
4f3807d
Message:

proofread chapter text/io.tex, and updates in other chapaters

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/thierry_delisle_PhD/thesis/text/practice.tex

    r4f3807d r847bb6f  
    1515Programmers are mostly expected to resize clusters on startup or teardown.
    1616Therefore dynamically changing the number of \procs is an appropriate moment to allocate or free resources to match the new state.
    17 As such all internal arrays that are sized based on the number of \procs need to be \texttt{realloc}ed.
     17As such all internal arrays that are sized based on the number of \procs need to be @realloc@ed.
    1818This also means that any references into these arrays, pointers or indexes, may need to be fixed when shrinking\footnote{Indexes may still need fixing when shrinkingbecause some indexes are expected to refer to dense contiguous resources and there is no guarantee the resource being removed has the highest index.}.
    1919
     
    107107First some data structure needs to keep track of all \procs that are in idle sleep.
    108108Because of idle sleep can be spurious, this data structure has strict performance requirements in addition to the strict correctness requirements.
    109 Next, some tool must be used to block kernel threads \glspl{kthrd}, \eg \texttt{pthread\_cond\_wait}, pthread semaphores.
     109Next, some tool must be used to block kernel threads \glspl{kthrd}, \eg @pthread_cond_wait@, pthread semaphores.
    110110The complexity here is to support \at parking and unparking, timers, \io operations and all other \CFA features with minimal complexity.
    111111Finally, idle sleep also includes a heuristic to determine the appropriate number of \procs to be in idle sleep an any given time.
     
    117117In terms of blocking a \gls{kthrd} until some event occurs the linux kernel has many available options:
    118118
    119 \paragraph{\texttt{pthread\_mutex}/\texttt{pthread\_cond}}
    120 The most classic option is to use some combination of \texttt{pthread\_mutex} and \texttt{pthread\_cond}.
    121 These serve as straight forward mutual exclusion and synchronization tools and allow a \gls{kthrd} to wait on a \texttt{pthread\_cond} until signalled.
    122 While this approach is generally perfectly appropriate for \glspl{kthrd} waiting after eachother, \io operations do not signal \texttt{pthread\_cond}s.
    123 For \io results to wake a \proc waiting on a \texttt{pthread\_cond} means that a different \glspl{kthrd} must be woken up first, and then the \proc can be signalled.
    124 
    125 \subsection{\texttt{io\_uring} and Epoll}
    126 An alternative is to flip the problem on its head and block waiting for \io, using \texttt{io\_uring} or even \texttt{epoll}.
     119\paragraph{\lstinline{pthread_mutex}/\lstinline{pthread_cond}}
     120The most classic option is to use some combination of @pthread_mutex@ and @pthread_cond@.
     121These serve as straight forward mutual exclusion and synchronization tools and allow a \gls{kthrd} to wait on a @pthread_cond@ until signalled.
     122While this approach is generally perfectly appropriate for \glspl{kthrd} waiting after eachother, \io operations do not signal @pthread_cond@s.
     123For \io results to wake a \proc waiting on a @pthread_cond@ means that a different \glspl{kthrd} must be woken up first, and then the \proc can be signalled.
     124
     125\subsection{\lstinline{io_uring} and Epoll}
     126An alternative is to flip the problem on its head and block waiting for \io, using @io_uring@ or even @epoll@.
    127127This creates the inverse situation, where \io operations directly wake sleeping \procs but waking \proc from a running \gls{kthrd} must use an indirect scheme.
    128128This generally takes the form of creating a file descriptor, \eg, a dummy file, a pipe or an event fd, and using that file descriptor when \procs need to wake eachother.
     
    132132\subsection{Event FDs}
    133133Another interesting approach is to use an event file descriptor\cit{eventfd}.
    134 This is a Linux feature that is a file descriptor that behaves like \io, \ie, uses \texttt{read} and \texttt{write}, but also behaves like a semaphore.
     134This is a Linux feature that is a file descriptor that behaves like \io, \ie, uses @read@ and @write@, but also behaves like a semaphore.
    135135Indeed, all read and writes must use 64bits large values\footnote{On 64-bit Linux, a 32-bit Linux would use 32 bits values.}.
    136 Writes add their values to the buffer, that is arithmetic addition and not buffer append, and reads zero out the buffer and return the buffer values so far\footnote{This is without the \texttt{EFD\_SEMAPHORE} flag. This flags changes the behavior of \texttt{read} but is not needed for this work.}.
     136Writes add their values to the buffer, that is arithmetic addition and not buffer append, and reads zero out the buffer and return the buffer values so far\footnote{
     137This is without the \lstinline{EFD_SEMAPHORE} flag. This flags changes the behavior of \lstinline{read} but is not needed for this work.}.
    137138If a read is made while the buffer is already 0, the read blocks until a non-0 value is added.
    138 What makes this feature particularly interesting is that \texttt{io\_uring} supports the \texttt{IORING\_REGISTER\_EVENTFD} command, to register an event fd to a particular instance.
    139 Once that instance is registered, any \io completion will result in \texttt{io\_uring} writing to the event FD.
     139What makes this feature particularly interesting is that @io_uring@ supports the @IORING_REGISTER_EVENTFD@ command, to register an event fd to a particular instance.
     140Once that instance is registered, any \io completion will result in @io\_uring@ writing to the event FD.
    140141This means that a \proc waiting on the event FD can be \emph{directly} woken up by either other \procs or incomming \io.
    141142
     
    172173This means that whichever entity removes idle \procs from the sleeper list must be able to do so in any order.
    173174Using a simple lock over this data structure makes the removal much simpler than using a lock-free data structure.
    174 The notification process then simply needs to wake-up the desired idle \proc, using \texttt{pthread\_cond\_signal}, \texttt{write} on an fd, etc., and the \proc will handle the rest.
     175The notification process then simply needs to wake-up the desired idle \proc, using @pthread_cond_signal@, @write@ on an fd, etc., and the \proc will handle the rest.
    175176
    176177\subsection{Reducing Latency}
     
    190191The contention is mostly due to the lock on the list needing to be held to get to the head \proc.
    191192That lock can be contended by \procs attempting to go to sleep, \procs waking or notification attempts.
    192 The contentention from the \procs attempting to go to sleep can be mitigated slightly by using \texttt{try\_acquire} instead, so the \procs simply continue searching for \ats if the lock is held.
     193The contentention from the \procs attempting to go to sleep can be mitigated slightly by using @try\_acquire@ instead, so the \procs simply continue searching for \ats if the lock is held.
    193194This trick cannot be used for waking \procs since they are not in a state where they can run \ats.
    194195However, it is worth nothing that notification does not strictly require accessing the list or the head \proc.
    195196Therefore, contention can be reduced notably by having notifiers avoid the lock entirely and adding a pointer to the event fd of the first idle \proc, as in Figure~\ref{fig:idle2}.
    196 To avoid contention between the notifiers, instead of simply reading the atomic pointer, notifiers atomically exchange it to \texttt{null} so only only notifier will contend on the system call.
     197To avoid contention between the notifiers, instead of simply reading the atomic pointer, notifiers atomically exchange it to @null@ so only only notifier will contend on the system call.
    197198
    198199\begin{figure}
     
    206207This can be done by adding what is effectively a benaphore\cit{benaphore} in front of the event fd.
    207208A simple three state flag is added beside the event fd to avoid unnecessary system calls, as shown in Figure~\ref{fig:idle:state}.
    208 The flag starts in state \texttt{SEARCH}, while the \proc is searching for \ats to run.
    209 The \proc then confirms the sleep by atomically swaping the state to \texttt{SLEEP}.
    210 If the previous state was still \texttt{SEARCH}, then the \proc does read the event fd.
    211 Meanwhile, notifiers atomically exchange the state to \texttt{AWAKE} state.
    212 if the previous state was \texttt{SLEEP}, then the notifier must write to the event fd.
     209The flag starts in state @SEARCH@, while the \proc is searching for \ats to run.
     210The \proc then confirms the sleep by atomically swaping the state to @SLEEP@.
     211If the previous state was still @SEARCH@, then the \proc does read the event fd.
     212Meanwhile, notifiers atomically exchange the state to @AWAKE@ state.
     213if the previous state was @SLEEP@, then the notifier must write to the event fd.
    213214However, if the notify arrives almost immediately after the \proc marks itself idle, then both reads and writes on the event fd can be omitted, which reduces latency notably.
    214215This leads to the final data structure shown in Figure~\ref{fig:idle}.
Note: See TracChangeset for help on using the changeset viewer.