Changeset d93ea1d for doc/theses/thierry_delisle_PhD/thesis
- Timestamp:
- Aug 15, 2022, 5:06:43 PM (2 years ago)
- Branches:
- ADT, ast-experimental, master, pthread-emulation
- Children:
- e116db3
- Parents:
- 8bee858
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/thierry_delisle_PhD/thesis/text/conclusion.tex
r8bee858 rd93ea1d 19 19 20 20 \subsection{Idle Sleep} 21 An interesting 21 A difficult challenge that was not fully address in this thesis is idle-sleep. 22 While a correct and somewhat low-cost idle-sleep mechanism was presented, several of the benchmarks show notable performance degradation when too few \ats are present in the system. 23 The idle sleep mechanism could therefore benefit from a reduction of spurious cases of sleeping. 24 Furthermore, this thesis did not present any heuristic for when \procs should be put to sleep and when \procs should be woken up. 25 It is especially worth noting that relaxed timestamps and topology aware helping lead to notable improvements in performance. 26 Neither of these techniques were used for the idle sleep mechanism. 27 28 There are opportunities where these techniques could be use: 29 The mechanism uses a hand-shake between notification and sleep to ensure that no \at is missed. 30 The correctness of that hand-shake is cirtical when the last \proc goes to sleep but could be relaxed when several \procs are awake. 31 Furthermore, organizing the sleeping \procs as a LIDO stack makes sense to keep cold \procs as cold as possible, but it might be more appropriate to attempt to keep cold CPU sockets instead. 32 33 However, using these techniques could require significant investigation. 34 For example, keeping a CPU socket cold might be appropriate for power consumption reasons but can affect overall memory bandwith. 35 The balance between these is not necessarily obvious. 22 36 23 37 \subsection{Hardware} 38 One challenge that needed to be overcome for this thesis was that the modern x86-64 has very few tools to implement fairness. 39 \Glspl{proc} attempting to help eachother inherently cause cache-coherence traffic. 40 However, as mentioned in Section~\ref{helping}, relaxed requirements mean this traffic is not necessarily productive. 41 In cases like this one, there is an opportunity to improve performance by extending the hardware. 42 43 Many different extensions would be suitable here. 44 For example, when attempting to read remote timestamps when deciding to whether or not to help, it could be useful to allow cancelling the remote read if it will lead to significant latency. 45 If the latency is due to a recent cache invalidation, it is unlikely that the timestamp is old and that helping will be needed. 46 As such, simply moving on without the result is likely to be acceptable. 47 Another option would be to attempt to read multiple memory addresses and only wait for \emph{one of} these reads to retire. 48 This would have a similar effect, where cache-lines with more traffic would be waited on less often. 49 In both of these examples, some care would probably be needed to make sure that the reads to an address \emph{sometimes} retire. 50 51 Note that this is similar to the feature \newterm{Hardware Transactional Memory}~\cite{HTM}, which allows groups of instructions to be aborted and rolled-back if they encounter memory conflicts when being retired. 52 However, I believe this feature is generally aimed at large groups of instructions. 53 A more fine-grained approach may be more amenable to carefully picking which aspects of an algorithm require exact correctness and which do not.
Note: See TracChangeset
for help on using the changeset viewer.