\chapter{Conclusion}\label{conclusion}

\Gls{uthrding} is a widely used concurrency model, and adopting it is a natural choice for \CFA.

An important aspect of this approach to threading is how threads are scheduled.
As \CFA aims to increase the productivity and safety of C while maintaining its performance, so too should its threading runtime achieve these goals.
For scheduling, productivity and safety manifest as the removal of pitfalls from efficient use of the threading runtime.
This thesis contributes to this goal by presenting a low-latency scheduler that offers improved starvation prevention compared to other state-of-the-art schedulers.
It presents a core algorithm (Chapter~\ref{core}) that provides increased fairness through helping (Section~\ref{helping}), as well as optimizations that virtually remove the cost of this fairness (Section~\ref{relaxedtimes}).
Building upon the fundamental scheduling algorithm, an implementation of user-level \io blocking is presented (Chapter~\ref{io}), which achieves the same balance of performance and fairness as the scheduler itself.
From these core algorithms, a low-latency idle-sleep mechanism is presented (Chapter~\ref{practice}), which allows the \CFA runtime to remain viable for workloads that do not consistently saturate the system.
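
To give a concrete flavour of how helping and relaxed timestamps interact, the following sketch shows the general shape of the fairness check a \proc might perform before taking local work.
It is illustrative only: the names and the threshold constant are hypothetical, and the actual \CFA implementation differs in its details.

\begin{cfa}
// Illustrative sketch only; names and constants are hypothetical.
struct ready_queue {
	volatile unsigned long long oldest_ts;   // timestamp of the oldest waiting thread
	// ... queue storage ...
};
enum { HELP_THRESHOLD = 1000 };   // hypothetical age difference that triggers helping

// Help a remote queue only when its oldest work is sufficiently older than the local
// work; the timestamps are read with plain, unordered loads, so a slightly stale value
// merely delays helping rather than breaking correctness.
bool should_help( struct ready_queue * local, struct ready_queue * remote, unsigned long long now ) {
	unsigned long long rts = remote->oldest_ts;
	unsigned long long lts = local->oldest_ts;
	return (now - rts) > (now - lts) + HELP_THRESHOLD;
}
\end{cfa}

The point of the relaxation is that this check tolerates stale timestamps, so fairness can be maintained with cheap reads on the common path.
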
\section{Future Work}
While the \CFA runtime achieves a better compromise between performance and fairness than other schedulers, I believe further improvements can reduce the number of cases where performance deteriorates.
Furthermore, I believe that achieving performance and starvation freedom simultaneously is generally a challenge, even outside of scheduling algorithms.
\subsection{Idle Sleep}
A difficult challenge that was not fully addressed in this thesis is idle sleep.
While a correct and somewhat low-cost idle-sleep mechanism was presented, several of the benchmarks show notable performance degradation when too few \ats are present in the system.
The idle-sleep mechanism could therefore benefit from a reduction in spurious cases of sleeping.
Furthermore, this thesis did not present any heuristic for when \procs should be put to sleep and when they should be woken up.
It is especially worth noting that relaxed timestamps and topology-aware helping lead to notable improvements in performance, yet neither of these techniques was used for the idle-sleep mechanism.

There are opportunities where these techniques could be used.
The mechanism uses a hand-shake between notification and sleep to ensure that no \at is missed.
The correctness of that hand-shake is critical when the last \proc goes to sleep, but it could be relaxed when several \procs are awake.
Furthermore, organizing the sleeping \procs as a LIFO stack makes sense to keep cold \procs as cold as possible, but it might be more appropriate to attempt to keep entire CPU sockets cold instead.
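
To make the shape of this hand-shake concrete, the following sketch shows one way the sleep and notification paths can be ordered so that no \at is missed.
It is illustrative only: the helper names are hypothetical, error handling is elided, and the actual \CFA runtime is more involved.

\begin{cfa}
#include <unistd.h>   // read, write
volatile int idle_count;   // processors currently advertising intent to sleep
int idle_fd;               // event file-descriptor a sleeping processor parks on

extern bool ready_queue_nonempty( void );   // hypothetical: provided by the scheduler

// Sleeper side: advertise first, re-check for work, only then block.
void try_sleep( void ) {
	__atomic_add_fetch( &idle_count, 1, __ATOMIC_SEQ_CST );   // 1. advertise intent to sleep
	if ( ready_queue_nonempty() ) {                           // 2. re-check: work may have arrived
		__atomic_sub_fetch( &idle_count, 1, __ATOMIC_SEQ_CST );   //    before the advertisement
		return;                                               //    so resume searching instead
	}
	unsigned long long val;
	read( idle_fd, &val, sizeof(val) );                       // 3. block until a notification arrives
	__atomic_sub_fetch( &idle_count, 1, __ATOMIC_SEQ_CST );
}

// Waker side: the ready-queue push happens before this call; only then wake a sleeper.
void notify_one( void ) {
	if ( __atomic_load_n( &idle_count, __ATOMIC_SEQ_CST ) > 0 ) {
		unsigned long long one = 1;
		write( idle_fd, &one, sizeof(one) );                  // wake at most one parked processor
	}
}
\end{cfa}

The essential property is that a \proc only blocks after its intent to sleep is visible; hence, a notification issued after the ready-queue push either finds a sleeper to wake or the would-be sleeper finds the new work during its re-check.
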
However, using these techniques could require significant investigation.
For example, keeping a CPU socket cold might be appropriate for power-consumption reasons but can affect overall memory bandwidth.
The balance between these concerns is not necessarily obvious.
\subsection{Hardware}
One challenge that needed to be overcome for this thesis was that modern x86-64 processors offer very few tools for implementing fairness.
\Glspl{proc} attempting to help each other inherently cause cache-coherence traffic.
However, as mentioned in Section~\ref{helping}, relaxed requirements mean this traffic is not necessarily productive.
In cases like this one, there is an opportunity to improve performance by extending the hardware.

Many different extensions would be suitable here.
For example, when attempting to read remote timestamps while deciding whether or not to help, it could be useful to allow cancelling the remote read if it will lead to significant latency.
If the latency is due to a recent cache invalidation, it is unlikely that the timestamp is old and that helping is needed.
As such, simply moving on without the result is likely to be acceptable.
Another option would be to attempt to read multiple memory addresses and only wait for \emph{one of} these reads to retire.
This would have a similar effect, where cache lines with more traffic would be waited on less often.
In both of these examples, some care would probably be needed to make sure that reads to an address \emph{sometimes} retire.
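
To illustrate the first idea, the following sketch revisits a helping-style check using an imagined primitive that abandons a load rather than stalling on a contended cache line.
The primitive is purely hypothetical: it corresponds to no existing instruction or compiler intrinsic, and the surrounding names are equally illustrative.

\begin{cfa}
// Purely hypothetical sketch: load_or_give_up is an imagined hardware primitive.
// Imagined semantics: try to load *addr; if the cache line is contended and the load
// would stall beyond some latency budget, return false instead of waiting.
extern bool load_or_give_up( volatile unsigned long long * addr, unsigned long long * out );

// An abandoned read implies the cache line was written recently, so the remote
// timestamp is probably fresh and helping is probably not needed.
bool should_help( volatile unsigned long long * remote_ts, unsigned long long now, unsigned long long threshold ) {
	unsigned long long ts;
	if ( ! load_or_give_up( remote_ts, &ts ) ) return false;   // give up: assume the work is fresh
	return (now - ts) > threshold;                             // sufficiently old work: go help
}
\end{cfa}

A primitive of this kind would let a \proc bound the latency of its fairness checks without abandoning helping altogether.
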
Note that this is similar to the \newterm{Hardware Transactional Memory}~\cite{HTM} feature, which allows groups of instructions to be aborted and rolled back if they encounter memory conflicts when being retired.
However, I believe this feature is generally aimed at large groups of instructions.
A more fine-grained approach may be more amenable to carefully picking which aspects of an algorithm require exact correctness and which do not.