Timestamp:
Aug 15, 2022, 5:06:43 PM (2 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, master, pthread-emulation
Children:
e116db3
Parents:
8bee858
Message:

Filled in something for the conclusion that is kind of complete

File:
1 edited

  • doc/theses/thierry_delisle_PhD/thesis/text/conclusion.tex

    r8bee858 rd93ea1d  
    1919
    2020\subsection{Idle Sleep}
    21 An interesting
     21A difficult challenge that was not fully addressed in this thesis is idle sleep.
     22While a correct and somewhat low-cost idle-sleep mechanism was presented, several of the benchmarks show notable performance degradation when too few \ats are present in the system.
     23The idle sleep mechanism could therefore benefit from a reduction of spurious cases of sleeping.
     24Furthermore, this thesis did not present any heuristic for when \procs should be put to sleep and when \procs should be woken up.
     25It is especially worth noting that relaxed timestamps and topology-aware helping lead to notable improvements in performance.
     26Neither of these techniques was used for the idle sleep mechanism.
     27
     28There are opportunities where these techniques could be used:
     29The mechanism uses a hand-shake between notification and sleep to ensure that no \at is missed.
     30The correctness of that hand-shake is critical when the last \proc goes to sleep but could be relaxed when several \procs are awake.
     31Furthermore, organizing the sleeping \procs as a LIFO stack makes sense to keep cold \procs as cold as possible, but it might be more appropriate to attempt to keep CPU sockets cold instead.
     32
     33However, using these techniques could require significant investigation.
     34For example, keeping a CPU socket cold might be appropriate for power consumption reasons but can affect overall memory bandwidth.
     35The balance between these is not necessarily obvious.
    2236
    2337\subsection{Hardware}
     38One challenge that needed to be overcome for this thesis was that modern x86-64 hardware offers very few tools to implement fairness.
     39\Glspl{proc} attempting to help each other inherently cause cache-coherence traffic.
     40However, as mentioned in Section~\ref{helping}, relaxed requirements mean this traffic is not necessarily productive.
     41In cases like this one, there is an opportunity to improve performance by extending the hardware.
     42
     43Many different extensions would be suitable here.
     44For example, when reading remote timestamps to decide whether or not to help, it could be useful to allow cancelling the remote read if it will lead to significant latency.
     45If the latency is due to a recent cache invalidation, it is unlikely that the timestamp is old and that helping will be needed.
     46As such, simply moving on without the result is likely to be acceptable.
     47Another option would be to attempt to read multiple memory addresses and only wait for \emph{one of} these reads to retire.
     48This would have a similar effect, where cache-lines with more traffic would be waited on less often.
     49In both of these examples, some care would probably be needed to make sure that the reads to an address \emph{sometimes} retire.
     50
     51Note that this is similar to the feature \newterm{Hardware Transactional Memory}~\cite{HTM}, which allows groups of instructions to be aborted and rolled back if they encounter memory conflicts when being retired.
     52However, I believe this feature is generally aimed at large groups of instructions.
     53A more fine-grained approach may be more amenable to carefully picking which aspects of an algorithm require exact correctness and which do not.