Changeset b9e2b87

- Timestamp: Jul 21, 2022, 8:26:59 AM (2 years ago)
- Branches: ADT, ast-experimental, master, pthread-emulation, qualifiedEnum
- Children: 0809c4e, 18070ee
- Parents: 6bf35d1 (diff), e6662f5 (diff)
- Files: 1 edited

Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
doc/theses/thierry_delisle_PhD/thesis/text/practice.tex
r6bf35d1 → rb9e2b87

@@ -24 +24 @@
  Therefore, dynamically changing the number of \procs is an appropriate moment to allocate or free resources to match the new state.
  As such, all internal scheduling arrays that are sized based on the number of \procs need to be @realloc@ed.
- This requirement also means any references into these arrays, \eg pointers or indexes, may need to be updated if elements are moved for compaction or any other reason.
- % \footnote{Indexes may still need fixing when shrinking because some indexes are expected to refer to dense contiguous resources and there is no guarantee the resource being removed has the highest index.}
+ This requirement also means any references into these arrays, \eg pointers or indexes, may need to be updated if elements are moved for compaction or for any other reason.

  There are no performance requirements, within reason, for resizing since it is expected to be rare.

@@ -34 +33 @@
  \subsection{Read-Copy-Update}
  One solution is to use the Read-Copy-Update pattern~\cite{wiki:rcu}.
- In this pattern, resizing is done by creating a copy of the internal data structures (\eg see Figure~\ref{fig:base-ts2}), updating the copy with the desired changes, and then attempt an Indiana Jones Switch to replace the original with the copy.
+ In this pattern, resizing is done by creating a copy of the internal data structures, \eg see Figure~\ref{fig:base-ts2}, updating the copy with the desired changes, and then attempting an Indiana Jones Switch to replace the original with the copy.
  This approach has the advantage that it may not need any synchronization to do the switch.
  However, there is a race where \procs still use the original data structure after the copy is switched.

@@ -100 +99 @@
  \section{Idle-Sleep}
  While manual resizing of \procs is expected to be rare, the number of \ats can vary significantly over an application's lifetime, which means there are times when there are too few or too many \procs.
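The Read-Copy-Update switch described above can be sketched with a single atomic pointer: the resizer builds an updated copy and publishes it with one store, while readers only ever load the pointer. This is a minimal illustration, not the thesis's implementation; the `ready_queues` structure and function names are hypothetical, and the sketch deliberately leaks the old copy, since reclaiming it safely is exactly the race the text goes on to discuss.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

// Hypothetical per-proc scheduling data, sized by the number of procs.
struct ready_queues {
    size_t nprocs;
    int    queues[];    // one (dummy) queue handle per proc
};

static _Atomic(struct ready_queues *) current;  // the live structure

// Readers simply load the current pointer; no locks on the fast path.
static struct ready_queues * rq_acquire(void) {
    return atomic_load_explicit(&current, memory_order_acquire);
}

// Resize by copying, updating the copy, then switching the pointer.
// NOTE: the old copy is never freed here -- readers may still hold it,
// which is the RCU reclamation problem the text describes.
static void rq_resize(size_t nprocs) {
    struct ready_queues * old  = rq_acquire();
    struct ready_queues * next =
        malloc(sizeof(struct ready_queues) + nprocs * sizeof(int));
    next->nprocs = nprocs;
    size_t keep = 0;
    if (old) keep = old->nprocs < nprocs ? old->nprocs : nprocs;
    if (keep) memcpy(next->queues, old->queues, keep * sizeof(int));
    for (size_t i = keep; i < nprocs; i++) next->queues[i] = 0;
    atomic_store_explicit(&current, next, memory_order_release); // the switch
}
```

The single release store is the "Indiana Jones Switch": readers see either the old or the new structure, never a partially updated one.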
@@ -102 +101 @@
- For this work, it is the programer's responsibility to manually create \procs, so if there a too few \procs, the application must address this issue.
+ For this work, it is the programmer's responsibility to manually create \procs, so if there are too few \procs, the application must address this issue.
  This leaves too many \procs when there are not enough \ats for all the \procs to be useful.
  These idle \procs cannot be removed because their lifetime is controlled by the application, and only the application knows when the number of \ats may increase or decrease.
- While idle \procs can spin until work appears, this approach wastes the processor (from other applications), energy and heat.
+ While idle \procs can spin until work appears, this approach wastes energy, unnecessarily produces heat, and prevents other applications from using the processor.
  Therefore, idle \procs are put into an idle state, called \newterm{Idle-Sleep}, where the \gls{kthrd} is blocked until the scheduler deems it is needed.

@@ -129 +128 @@
  This generally takes the form of creating a file descriptor, \eg, dummy file, pipe, or event fd, and using that file descriptor when \procs need to wake each other.
  This leads to additional complexity because there can be a race between these artificial \io and genuine \io operations.
- If not handled correctly, this can lead to artificial files getting delaying too long behind genuine files, resulting in longer latency.
+ If not handled correctly, this can lead to artificial files getting delayed too long behind genuine files, resulting in longer latency.

  \subsection{Event FDs}

@@ -148 +147 @@
  As a result, improper handling of this race leads to all \procs going to sleep when there are ready \ats and the system deadlocks.

+ The handshake closing the race is done with both the notifier and the idle \proc executing two ordered steps.
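The wake-up mechanism sketched above, a file descriptor that \procs use to wake each other, maps directly onto Linux's @eventfd@. The following minimal sketch (function names are illustrative, not from the thesis) shows the waker and sleeper sides in isolation; in the real runtime the read side is multiplexed with genuine \io, which is where the race described above arises.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <assert.h>

// Create the fd an idle proc sleeps on; the internal counter starts at 0,
// so a read blocks until some notifier writes.
static int make_idle_fd(void) {
    return eventfd(0, 0);
}

// Waker side: a single 8-byte write increments the counter and wakes
// any proc blocked in read().
static void notify(int idle_fd) {
    uint64_t one = 1;
    ssize_t r = write(idle_fd, &one, sizeof(one));
    (void)r;
}

// Idle side: the blocking read returns the accumulated counter and
// resets it -- one read consumes *all* previous writes.
static uint64_t sleep_until_notified(int idle_fd) {
    uint64_t val = 0;
    ssize_t r = read(idle_fd, &val, sizeof(val));
    (void)r;
    return val;
}
```

The consume-all-writes behaviour of the read is what later lets multiple notifiers behave as if only one of them had written.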
@@ -150 +150 @@
+ The notifier first makes sure the newly ready \at is visible to \procs searching for \ats, and then attempts to notify an idle \proc.
+ On the other side, \procs make themselves visible as idle \procs and then search for any \ats they may have missed.
+ Unlike regular work-stealing, this search must be exhaustive to make sure that no pre-existing \at is missed.
+ These steps from both sides guarantee that if the search misses a newly ready \at, then the notifier sees at least one idle \proc.
+ Conversely, if the notifier does not see any idle \proc, then a \proc is guaranteed to find the new \at in its exhaustive search.
+
  Furthermore, the ``Race-to-Idle'' approach means that there may be contention on the data structure tracking sleepers.
  Contention can be tolerated for \procs attempting to sleep or wake-up because these \procs are not doing useful work, and therefore, not contributing to overall performance.

@@ -154 +160 @@
  \subsection{Sleepers List}
  Each cluster maintains a list of idle \procs, organized as a stack.
- This ordering allows \procs at the head of the list to stay constantly active and those at the tail to stay in idle sleep for extended period of times.
+ This ordering allows \procs at the tail to stay in idle sleep for extended periods of time, while those at the head of the list wake up for bursts of activity.
  Because of unbalanced performance requirements, the algorithm tracking sleepers is designed to have idle \procs handle as much of the work as possible.
  The idle \procs maintain the stack of sleepers among themselves and notifying a sleeping \proc takes as little work as possible.

@@ -184 +190 @@
  The contention from the \procs attempting to go to sleep can be mitigated slightly by using @try_acquire@, so the \procs simply busy wait again searching for \ats if the lock is held.
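The two-sided, two-step handshake described above can be modelled with two atomic variables: the notifier publishes the \at before looking for sleepers, and the idle \proc registers as idle before searching. This toy model (a single \at slot and an idle count, names invented for illustration) captures only the ordering argument, not the real scheduler's data structures; sequentially-consistent atomics stand in for whatever fences the real code uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

static _Atomic bool ready_at;    // "newly ready at is visible"
static _Atomic int  idle_procs;  // count of "procs visible as idle"

// Notifier: step 1, make the at visible; step 2, look for an idle proc.
static bool notifier(void) {
    atomic_store(&ready_at, true);        // publish the at FIRST
    return atomic_load(&idle_procs) > 0;  // then try to find a sleeper
}

// Idle proc: step 1, become visible as idle; step 2, exhaustive search.
static bool idle_proc(void) {
    atomic_fetch_add(&idle_procs, 1);                // visible as idle FIRST
    bool found = atomic_exchange(&ready_at, false);  // then search (claim the at)
    if (found) atomic_fetch_sub(&idle_procs, 1);     // found work: no longer idle
    return found;
}
```

Because each side performs its publish step before its check step, either the search finds the \at or the notifier finds an idle \proc; the deadlock where both miss cannot occur.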
@@ -185 +191 @@
  This trick cannot be used when waking \procs since the waker needs to return immediately to what it was doing.

  Interestingly, general notification, \ie waking any idle processor versus a specific one, does not strictly require modifying the list.
  Here, contention can be reduced notably by having notifiers avoid the lock entirely by adding a pointer to the event @fd@ of the first idle \proc, as in Figure~\ref{fig:idle2}.
  To avoid contention among notifiers, notifiers atomically exchange it to @NULL@ so only one notifier contends on the system call.
- \todo{Expand explanation of how a notification works.}
+ The first notifier succeeds the atomic exchange and obtains the @fd@ of an idle \proc.
+ The notifier then writes to the @fd@ normally, which wakes a \proc.
+ The woken \proc can then update the atomic pointer while it is updating the head of the list.
+ Notifiers that obtained a @NULL@ in the exchange simply move on, knowing that another notifier is already waking a \proc.
+ This behaviour is equivalent to having multiple notifiers write to the @fd@, since reads consume all previous writes.

  \begin{figure}[t]

@@ -196 +207 @@
  \end{figure}

- The next optimization is to avoid the latency of the event @fd@, which can be done by adding what is effectively a benaphore\cit{benaphore} in front of the event @fd@.
+ The next optimization is to avoid the latency of the event @fd@, which can be done by adding what is effectively a binary benaphore\cit{benaphore} in front of the event @fd@.
  A simple three state flag is added beside the event @fd@ to avoid unnecessary system calls, as shown in Figure~\ref{fig:idle:state}.
- In Topological Work Stealing (see Section~\ref{s:TopologicalWorkStealing}), a \proc without \ats begins searching by setting the state flag to @SEARCH@.
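The lock-free general notification described above, where notifiers race to claim the first idle \proc's event @fd@, reduces to one atomic exchange. In this sketch the actual @fd@ write is stubbed out with a counter so the claiming logic can be shown on its own; the names are illustrative, not the thesis's.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

// Published by the head idle proc; NULL means no claimable sleeper.
static _Atomic(int *) first_idle_fd;

static int writes_issued;  // instrumentation standing in for the fd write

static void write_to_fd(int * fd) {  // stub for the eventfd write
    (void)fd;
    writes_issued++;
}

// General notification: claim the fd with one exchange. The winner pays
// for the system call; losers see NULL and rely on the winner's write,
// which is sound because one fd read consumes all previous writes.
static void notify_any(void) {
    int * fd = atomic_exchange(&first_idle_fd, NULL);
    if (fd) write_to_fd(fd);
}
```

Only one of any number of concurrent notifiers ever reaches the system call, which is exactly the contention reduction the text claims.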
- If no \ats can be found to steal, the \proc then confirms it is going to sleep by atomically swapping the state to @SLEEP@.
+ A \proc begins its idle sleep by adding itself to the idle list before searching for an \at.
+ In the process of adding itself to the idle list, it sets the state flag to @SEARCH@.
+ If no \ats can be found during the search, the \proc then confirms it is going to sleep by atomically swapping the state to @SLEEP@.
  If the previous state is still @SEARCH@, then the \proc does read the event @fd@.
  Meanwhile, notifiers atomically exchange the state to @AWAKE@.
  If the previous state is @SLEEP@, then the notifier must write to the event @fd@.
- However, if the notify arrives almost immediately after the \proc marks itself sleeping (idle), then both reads and writes on the event @fd@ can be omitted, which reduces latency notably.
+ However, if the notify arrives almost immediately after the \proc marks itself idle, then both reads and writes on the event @fd@ can be omitted, which reduces latency notably.
  These extensions lead to the final data structure shown in Figure~\ref{fig:idle}.
- \todo{You never talk about the Beaphore. What is its purpose and when is it used?}

  \begin{figure}
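The three-state benaphore in front of the event @fd@ can be sketched as the state machine below. The @fd@ read and write are replaced by counters so the transitions can be exercised directly; this is an interpretation of the protocol as described, with invented names, and the real code's handling of a lost race (a notify arriving between @SEARCH@ and @SLEEP@) may differ in detail.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

enum idle_state { AWAKE, SEARCH, SLEEP };

static _Atomic int state = AWAKE;
static int fd_reads, fd_writes;  // stand-ins for the event fd system calls

// Idle proc side: register on the idle list (SEARCH), search, then
// confirm sleep with an atomic swap to SLEEP.
static void proc_idle(void) {
    atomic_store(&state, SEARCH);   // added to the idle list, now searching
    bool found_work = false;        // the exhaustive search is elided here
    if (found_work) { atomic_store(&state, AWAKE); return; }
    int prev = atomic_exchange(&state, SLEEP);
    if (prev == SEARCH)
        fd_reads++;                 // really: blocking read of the event fd
    else
        atomic_store(&state, AWAKE);// a notify raced in: skip the read
}

// Notifier side: swap to AWAKE; only pay for the fd write if the proc
// had actually committed to sleeping.
static void notify_one(void) {
    int prev = atomic_exchange(&state, AWAKE);
    if (prev == SLEEP)
        fd_writes++;                // really: write to the event fd
}
```

When the notify lands in the @SEARCH@ window, neither side touches the @fd@, which is the latency saving the text attributes to the benaphore.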