Changeset 7e5da64


Timestamp:
Aug 31, 2022, 4:09:49 PM
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, master, pthread-emulation
Children:
f403c46
Parents:
9d67a6d
Message:

Reworked conclusion except Goals section

File:
1 edited

  • doc/theses/thierry_delisle_PhD/thesis/text/conclusion.tex

    r9d67a6d r7e5da64  
    11\chapter{Conclusion}\label{conclusion}
    22
    3 Building the \CFA runtime has been an extremely challenging project.
     3Building the \CFA runtime has been a challenging project.
    44The work was divided between high-level concurrency design and a user-level threading runtime (Masters thesis), and low-level support of the user-level runtime using OS kernel-threading and its (multiple) I/O subsystems (Ph.D. thesis).
    55Because I am the main developer for both components of this project, there is strong continuity across the design and implementation.
     
    77
    88I believed my Masters work would provide the background to make the Ph.D. work reasonably straightforward.
    9 What I discovered is that interacting with kernel locking, threading, and I/O in the UNIX (Linux) operating system is extremely difficult.
    10 There are multiple concurrency aspects in UNIX that are poorly designed, not only for user-level threading but also for kernel-level threading.
    11 Basically, UNIX-based OSs are not very concurrency friendly.
     9However, in doing so I discovered two unexpected challenges.
     10First, while modern symmetric multiprocessing CPUs have significant penalties for communicating across cores, state-of-the-art work-stealing schedulers are very effective at avoiding the need for communication in many common workloads.
     11This leaves very little space for adding fairness guarantees without notable performance penalties.
     12Second, the kernel locking, threading, and I/O in the Linux operating system offer very little flexibility of use.
     13There are multiple concurrency aspects in Linux that require carefully following a strict procedure in order to achieve acceptable performance.
    1214To be fair, many of these concurrency aspects were designed 30-40 years ago, when there were few multi-processor computers and concurrency knowledge was just developing.
    13 It is unfortunately so little has changed in the intervening years.
     15It is unfortunate that little has changed in the intervening years.
    1416
    1517Also, my decision to use @io_uring@ was both a positive and negative.
    16 The positive is that @io_uring@ supports the panoply of I/O mechanisms in UNIX;
    17 hence, the \CFA runtime uses one I/O mechanism to provide non-blocking I/O, rather than using @select@ to handle TTY I/O, @epoll@ to handle network I/O, and some unknown to handle disk I/O.
    18 It is unclear to me how I would have merged all these different I/O mechanisms into a coherent scheduling implementation.
     18The positive is that @io_uring@ supports the panoply of I/O mechanisms in Linux;
     19hence, the \CFA runtime uses one I/O mechanism to provide non-blocking I/O, rather than using @select@ to handle TTY I/O, @epoll@ to handle network I/O, and managing a thread pool to handle disk I/O.
     20Merging all these different I/O mechanisms into a coherent scheduling implementation would require much more work than is present in this thesis, as well as detailed knowledge of the I/O mechanisms in Linux.
    1921The negative is that @io_uring@ is new and developing.
    2022As a result, there is limited documentation, few places to find usage examples, and multiple errors that required workarounds.
    2123Given what I now know about @io_uring@, I would say it is insufficiently coupled with the Linux kernel to properly handle non-blocking I/O.
    22 Specifically, spinning up internal kernel threads to handle blocking scenarios is what developers already do outside of the kernel.
     24It does not seem to reach deep into the kernel's handling of \io, and as such it must contend with the same realities that users of @epoll@ must contend with.
     25Specifically, in cases where @O_NONBLOCK@ behaves as desired, operations must still be retried.
     26To preserve the illusion of asynchronicity, this requires delegating operations to kernel threads.
     27This is also true of cases where @O_NONBLOCK@ does not prevent blocking.
     28Spinning up internal kernel threads to handle blocking scenarios is what developers already do outside of the kernel, and managing these threads adds a significant burden to the system.
    2329Nonblocking I/O should not be handled in this way.
    24 
    25 % While \gls{uthrding} is an old idea, going back to the first multi-processor computers, it was largely set aside in the 1990s because of the complexity in making it work well between applications and the operating system.
    26 % Unfortunately,, a large amount of that complexity still exists, making \gls{uthrding} a difficult task for library and programming-language implementers.
    27 % For example, the introduction of thread-local storage and its usage in many C libraries causes the serially reusable problem~\cite{SeriallyReusable} for all \gls{uthrding} implementers.
    28 % Specifically, if a \gls{kthrd} is preempted, it always restarts with the same thread-local storage;
    29 % when a user thread is preempted, it can be restarted on another \gls{kthrd}, accessing the new thread-local storage, or worse, the previous thread-local storage.
    30 % The latter case causes failures when an operation using the thread-local storage is assumed to be atomic at the kernel-threading level.
    31 % If library implementers always used the pthreads interface to access threads, locks, and thread-local storage, language runtimes can interpose a \gls{uthrding} version of pthreads, switching the kind of threads, locks, and storage, and hence, make it safe.
    32 % However, C libraries are currently filled with direct declarations of thread-local storage and low-level atomic instructions.
    33 % In essence, every library developer is inventing their own threading mechanisms to solve their unique problem, independent from any standardize approaches.
    34 % This state of affairs explains why concurrency is such a mess.
    35 %
    36 % To address the concurrency mess, new programming languages integrate threading into the language rather than using the operating-system supplied interface.
    37 % The reason is that many sequential code-optimizations invalid correctly written concurrent programs.
    38 % While providing safe concurrent-code generation is essential, the underlying language runtime is still free to implement threading using kernel or user/kernel threading, \eg Java versus Go.
    39 % In either case, the language/runtime manages all the details to simplify concurrency and increase safety.
    4030
    4131\section{Goals}
     
    8575\section{Future Work}
    8676While the \CFA runtime achieves a better compromise, in terms of performance and fairness, than other schedulers, I believe further improvements can be made to reduce or eliminate the few cases where performance does deteriorate.
    87 Fundamentally, achieving performance and starvation freedom will always be opposing goals even outside of scheduling algorithms.
     77Fundamentally, achieving performance and starvation freedom will always be goals with opposing needs even outside of scheduling algorithms.
    8878
    8979\subsection{Idle Sleep}
     
    10191The correctness of that hand-shake is critical when the last \proc goes to sleep but could be relaxed when several \procs are awake.
    10292\item
    103 Furthermore, organizing the sleeping \procs as a LIDO stack makes sense to keep cold \procs as cold as possible, but it might be more appropriate to attempt to keep cold CPU sockets instead.
     93Furthermore, organizing the sleeping \procs as a LIFO stack makes sense to keep cold \procs as cold as possible, but it might be more appropriate to attempt to keep cold CPU sockets instead.
    10494\end{itemize}
    10595However, using these techniques would require significant investigation.