Ignore:
Timestamp:
Apr 13, 2022, 1:08:44 PM (3 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, enum, master, pthread-emulation, qualifiedEnum
Children:
4ec9513
Parents:
3112733
Message:

Some more writing, mostly pushing to have it on other machines

Location:
doc/theses/thierry_delisle_PhD/thesis
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/thierry_delisle_PhD/thesis/local.bib

    r3112733 r6b06abe  
    685685  note = "[Online; accessed 9-February-2021]"
    686686}
     687
     688@misc{wiki:rcu,
     689  author = "{Wikipedia contributors}",
     690  title = "Read-copy-update --- {W}ikipedia{,} The Free Encyclopedia",
     691  year = "2022",
     692  url = "https://en.wikipedia.org/wiki/Linear_congruential_generator",
     693  note = "[Online; accessed 12-April-2022]"
     694}
     695
     696@misc{wiki:rwlock,
     697  author = "{Wikipedia contributors}",
     698  title = "Readers-writer lock --- {W}ikipedia{,} The Free Encyclopedia",
     699  year = "2021",
     700  url = "https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock",
     701  note = "[Online; accessed 12-April-2022]"
     702}
  • doc/theses/thierry_delisle_PhD/thesis/text/practice.tex

    r3112733 r6b06abe  
    22The scheduling algorithm discribed in Chapter~\ref{core} addresses scheduling in a stable state.
    33However, it does not address problems that occur when the system changes state.
    4 Indeed the \CFA runtime, supports expanding and shrinking the number of KTHREAD\_place \todo{add kthrd to glossary}, both manually and, to some extent automatically.
     4Indeed the \CFA runtime, supports expanding and shrinking the number of \procs, both manually and, to some extent, automatically.
    55This entails that the scheduling algorithm must support these transitions.
    66
    7 \section{Resizing}
     7More precise \CFA supports adding \procs using the RAII object @processor@.
     8These objects can be created at any time and can be destroyed at any time.
     9They are normally create as automatic stack variables, but this is not a requirement.
     10
     11The consequence is that the scheduler and \io subsystems must support \procs comming in and out of existence.
     12
     13\section{Manual Resizing}
     14The consequence of dynamically changing the number of \procs is that all internal arrays that are sized based on the number of \procs neede to be \texttt{realloc}ed.
     15This also means that any references into these arrays, pointers or indexes, may need to be fixed when shrinking\footnote{Indexes may still need fixing because there is no guarantee the \proc causing the shrink had the highest index. Therefore indexes need to be reassigned to preserve contiguous indexes.}.
     16
     17There are no performance requirements, within reason, for resizing since this is usually considered as part of setup and teardown.
     18However, this operation has strict correctness requirements since shrinking and idle sleep can easily lead to deadlocks.
     19It should also avoid as much as possible any effect on performance when the number of \procs remain constant.
     20This later requirement prehibits simple solutions, like simply adding a global lock to these arrays.
     21
     22\subsection{Read-Copy-Update}
     23One solution is to use the Read-Copy-Update\cite{wiki:rcu} pattern.
     24In this pattern, resizing is done by creating a copy of the internal data strucures, updating the copy with the desired changes, and then attempt an Idiana Jones Switch to replace the original witht the copy.
     25This approach potentially has the advantage that it may not need any synchronization to do the switch.
     26The switch definitely implies a race where \procs could still use the previous, original, data structure after the copy was switched in.
     27The important question then becomes whether or not this race can be recovered from.
     28If the changes that arrived late can be transferred from the original to the copy then this solution works.
     29
     30For linked-lists, dequeing is somewhat of a problem.
     31Dequeing from the original will not necessarily update the copy which could lead to multiple \procs dequeing the same \at.
     32Fixing this requires making the array contain pointers to subqueues rather than the subqueues themselves.
     33
     34Another challenge is that the original must be kept until all \procs have witnessed the change.
     35This is a straight forward memory reclamation challenge but it does mean that every operation will need \emph{some} form of synchronization.
     36If each of these operation does need synchronization then it is possible a simpler solution achieves the same performance.
     37Because in addition to the classic challenge of memory reclamation, transferring the original data to the copy before reclaiming it poses additional challenges.
     38Especially merging subqueues while having a minimal impact on fairness and locality.
     39
     40\subsection{Read-Writer Lock}
     41A simpler approach would be to use a \newterm{Readers-Writer Lock}\cite{wiki:rwlock} where the resizing requires acquiring the lock as a writer while simply enqueing/dequeing \ats requires acquiring the lock as a reader.
     42Using a Readers-Writer lock solves the problem of dynamically resizing and leaves the challenge of finding or building a lock with sufficient good read-side performance.
     43Since this is not a very complex challenge and an ad-hoc solution is perfectly acceptable, building a Readers-Writer lock was the path taken.
     44
     45To maximize reader scalability, the readers should not contend with eachother when attempting to acquire and release the critical sections.
     46This effectively requires that each reader have its own piece of memory to mark as locked and unlocked.
     47Reades then acquire the lock wait for writers to finish the critical section and then acquire their local spinlocks.
     48Writers acquire the global lock, so writers have mutual exclusion among themselves, and then acquires each of the local reader locks.
     49Acquiring all the local locks guarantees mutual exclusion between the readers and the writer, while the wait on the read side prevents readers from continously starving the writer.
     50\todo{reference listings}
     51
     52\begin{lstlisting}
     53void read_lock() {
     54        // Step 1 : make sure no writers in
     55        while write_lock { Pause(); }
     56
     57        // May need fence here
     58
     59        // Step 2 : acquire our local lock
     60        while atomic_xchg( tls.lock ) {
     61                Pause();
     62        }
     63}
     64
     65void read_unlock() {
     66        tls.lock = false;
     67}
     68\end{lstlisting}
     69
     70\begin{lstlisting}
     71void write_lock()  {
     72        // Step 1 : lock global lock
     73        while atomic_xchg( write_lock ) {
     74                Pause();
     75        }
     76
     77        // Step 2 : lock per-proc locks
     78        for t in all_tls {
     79                while atomic_xchg( t.lock ) {
     80                        Pause();
     81                }
     82        }
     83}
     84
     85void write_unlock() {
     86        // Step 1 : release local locks
     87        for t in all_tls {
     88                t.lock = false;
     89        }
     90
     91        // Step 2 : release global lock
     92        write_lock = false;
     93}
     94\end{lstlisting}
    895
    996\section{Idle-Sleep}
     97
     98\subsection{Tracking Sleepers}
     99
     100\subsection{Event FDs}
     101
     102\subsection{Epoll}
     103
     104\subsection{\texttt{io\_uring}}
     105
     106\subsection{Reducing Latency}
Note: See TracChangeset for help on using the changeset viewer.