Timestamp:
Nov 24, 2022, 3:41:44 PM
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, master
Children:
dacd8e6e
Parents:
82a90d4
Message:

Last corrections to my thesis... hopefully

File:
1 edited

  • doc/theses/thierry_delisle_PhD/thesis/text/intro.tex

    r82a90d4 rddcaff6  
     \chapter{Introduction}\label{intro}

    -\Gls{uthrding} (M:N) is gaining popularity over kernel-level threading (1:1) in many programming languages.
    +\Gls{uthrding} (M:N) is gaining popularity over kernel-level threading (1:1) in many programming languages, like Go~\cite{GITHUB:go}, Java's Project Loom~\cite{MAN:project-loom} and Kotlin~\cite{MAN:kotlin}.
     The user threading approach is often a better mechanism to express complex concurrent applications by efficiently running 10,000+ threads on multicore systems.
     Indeed, over-partitioning into small work units with user threading significantly eases load bal\-ancing, while simultaneously providing advanced synchronization and mutual exclusion capabilities.
     To manage these high levels of concurrency, the underlying runtime must efficiently schedule many user threads across a few kernel threads;
    -which begs the question of how many kernel threads are needed and should the number be dynamically reevaluated.
    +which raises the question of how many kernel threads are needed and whether the number should be dynamically reevaluated.
     Furthermore, scheduling must prevent kernel threads from blocking, otherwise user-thread parallelism drops.
    -When user-threading parallelism does drop, how and when should idle kernel-threads be put to sleep to avoid wasting CPU resources.
    +When user-threading parallelism does drop, how and when should idle kernel-threads be put to sleep to avoid wasting CPU resources?
     Finally, the scheduling system must provide fairness to prevent a user thread from monopolizing a kernel thread;
     otherwise, other user threads can experience short/long term starvation or kernel threads can deadlock waiting for events to occur on busy kernel threads.
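As a concrete illustration of the M:N model discussed above, the following is a minimal Go sketch (Go being one of the user-threading runtimes the revision cites); the helper name `parallelSum` and the 10,000-goroutine over-partitioning count are our own illustrative choices, not code from the thesis:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// parallelSum over-partitions the work into n tiny user threads
// (goroutines), which Go's M:N runtime multiplexes onto a small,
// fixed set of kernel threads.
func parallelSum(n int) int64 {
	var sum int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			atomic.AddInt64(&sum, v) // trivially small work unit
		}(int64(i))
	}
	wg.Wait()
	return sum
}

func main() {
	// N kernel threads (GOMAXPROCS) service M = 10,000 user threads.
	fmt.Println("kernel threads:", runtime.GOMAXPROCS(0))
	fmt.Println("sum:", parallelSum(10000)) // 0+1+...+9999 = 49995000
}
```

Because each work unit is tiny and there are far more units than cores, the runtime can balance load by simply handing the next ready goroutine to whichever kernel thread is idle.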
     
     Fairness is handled through preemption and/or ad hoc solutions, which leads to coarse-grained fairness with some pathological cases.

    -After examining, testing and selecting specific approaches to these scheduling issues, a completely new scheduler was created and tested in the \CFA (C-for-all) user-threading runtime system.
    +After examining, testing and selecting specific approaches to these scheduling issues, a new scheduler was created and tested in the \CFA (C-for-all) user-threading runtime system.
     The goal of the new scheduler is to offer increased safety and productivity without sacrificing performance.
    -The quality of the new scheduler is demonstrated by comparing it with other user-threading work-stealing schedulers with, the aim of showing equivalent or better performance while offering better fairness.
    +The quality of the new scheduler is demonstrated by comparing it with other user-threading work-stealing schedulers, with the aim of showing equivalent or better performance while offering better fairness.

     Chapter~\ref{intro} defines scheduling and its general goals.
     
     Computer systems share multiple resources across many threads of execution, even on single-user computers like laptops or smartphones.
     On a computer system with multiple processors and work units (routines, coroutines, threads, programs, \etc), there exists the problem of mapping many different kinds of work units onto many different kinds of processors efficiently, called \newterm{scheduling}.
    -Scheduling systems are normally \newterm{open}, meaning new work arrives from an external source or is randomly spawned from an existing work unit.
    +Scheduling systems are normally \newterm{open}, meaning new work arrives from an external source or is randomly spawned from an existing work unit\footnote{
    +Open systems contrast with \newterm{closed} systems, where work never arrives from external sources.
    +This definition is an extension of open/closed systems in the field of thermodynamics.
    +}.
    +
     In general, work units without threads, like routines and coroutines, are self-scheduling, while work units with threads, like tasks and programs, are scheduled.
     For scheduled work-units, a scheduler takes a sequence of threads and attempts to run them to completion, subject to shared resource restrictions and utilization.
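The definition above, a scheduler taking a sequence of threads and attempting to run them to completion, can be sketched as a toy FIFO, run-to-completion loop. Everything here (the choice of Go, the `runAll` name, the string-returning work units) is an illustrative assumption, not code from the thesis:

```go
package main

import "fmt"

// runAll is a toy scheduler: it takes a sequence of scheduled work
// units and runs each to completion in FIFO order. Real schedulers
// add preemption, blocking, and resource constraints on top of this.
func runAll(ready []func() string) []string {
	var done []string
	for len(ready) > 0 {
		next := ready[0] // pick the head of the ready queue
		ready = ready[1:]
		done = append(done, next()) // run to completion, no preemption
	}
	return done
}

func main() {
	units := []func() string{
		func() string { return "A" },
		func() string { return "B" },
		func() string { return "C" },
	}
	fmt.Println(runAll(units)) // [A B C]
}
```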
     
     \end{enumerate}

    -Scheduling is a zero-sum game as computer processors normally have a fixed, maximum number of cycles per unit time.\footnote{
    +At a high level, scheduling is considered a zero-sum game as computer processors normally have a fixed, maximum number of cycles per unit time.\footnote{
     Frequency scaling and turbo-boost add a degree of complexity that can be ignored in this discussion without loss of generality.}
    -Hence, schedulers are a series of compromises, occasionally with some static or dynamic tuning parameters to enhance specific workload patterns.
    +This almost invariably leads to schedulers needing to pick some \ats over others, opening the door to fairness problems.
    +However, at a lower level, schedulers can make inefficient or incorrect decisions leading to strictly worse outcomes than what the theoretical zero-sum game suggests.
    +Since it can be difficult to avoid these poor decisions, schedulers are generally a series of compromises, occasionally with some static or dynamic tuning parameters to enhance specific workload patterns.
     For example, SQMS has perfect load-balancing but poor affinity and high contention by the processors, because of the single queue.
     In contrast, MQMS has poor load-balancing but perfect affinity and no contention, because each processor has its own queue.
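The SQMS/MQMS trade-off described above can be sketched in a few lines of Go. The type names are hypothetical, and the sketch deliberately omits helping or work stealing, so the MQMS variant exhibits exactly the load-balancing failure the text mentions:

```go
package main

import (
	"fmt"
	"sync"
)

// task is a placeholder for a user-thread descriptor.
type task struct{ id int }

// sqms: a Single Queue, Multi Server ready queue. Perfect load
// balancing (any processor can take any task), but every processor
// contends on the same lock, and affinity is poor.
type sqms struct {
	mu sync.Mutex
	q  []task
}

func (s *sqms) push(t task) { s.mu.Lock(); s.q = append(s.q, t); s.mu.Unlock() }
func (s *sqms) pop() (task, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.q) == 0 {
		return task{}, false
	}
	t := s.q[0]
	s.q = s.q[1:]
	return t, true
}

// mqms: a Multi Queue, Multi Server ready queue, one queue per
// processor. No contention and good affinity, but without helping or
// stealing a processor with an empty queue idles while work waits
// elsewhere.
type mqms struct{ queues [][]task }

func newMQMS(procs int) *mqms          { return &mqms{queues: make([][]task, procs)} }
func (m *mqms) push(proc int, t task)  { m.queues[proc] = append(m.queues[proc], t) }
func (m *mqms) pop(proc int) (task, bool) {
	q := m.queues[proc]
	if len(q) == 0 {
		return task{}, false // load imbalance: this processor starves
	}
	m.queues[proc] = q[1:]
	return q[0], true
}

func main() {
	s := &sqms{}
	s.push(task{id: 1})
	t, _ := s.pop()
	fmt.Println("SQMS popped", t.id)

	m := newMQMS(2)
	m.push(0, task{id: 2})
	_, ok := m.pop(1) // queue 1 is empty even though queue 0 has work
	fmt.Println("MQMS proc 1 found work:", ok)
}
```

The fairness mechanism contributed by the thesis can be read as adding a "helping" path to the MQMS shape so that the starving processor above eventually finds the waiting task.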
     
     Specifically, this work concentrates on advanced thread and \glsxtrshort{io} scheduling.
     Prior to this work, the \CFA runtime used a strict SQMS \gls{rQ} and provided no nonblocking \glsxtrshort{io} capabilities at the user-thread level.\footnote{
    -C/\CC only support \glsxtrshort{io} capabilities at the kernel level, which means many \io operations block \glspl{kthrd} reducing parallelism at the user level.}
    +C/\CC only support \glsxtrshort{io} capabilities at the kernel level, which means many \io operations block \glspl{kthrd}, reducing parallelism at the user level.}

     Since \CFA attempts to improve the safety and productivity of C, the new scheduler presented in this thesis attempts to achieve the same goals.
    -More specifically, safety and productivity for scheduling mean supporting a wide range of workloads so that programmers can rely on progress guarantees (safety) and more easily achieve acceptable performance (productivity).
    +More specifically, safety and productivity for scheduling mean supporting a wide range of workloads, so that programmers can rely on progress guarantees (safety) and more easily achieve acceptable performance (productivity).
     The new scheduler also includes support for implicit nonblocking \io, allowing applications to have more user-threads blocking on \io operations than there are \glspl{kthrd}.
     To complete the scheduler, an idle sleep mechanism is implemented that significantly reduces wasted CPU cycles, which are then available outside the application.

    -As a research project, this work builds exclusively on newer versions of the Linux operating system and gcc/clang compilers.
    +As a research project, this work runs exclusively on newer versions of the Linux operating system and gcc/clang compilers.
     The new scheduler implementation uses several optimizations to successfully balance the cost of fairness against performance;
     some of these optimizations rely on interesting hardware optimizations only present on modern CPUs.
     
     An algorithm for load-balancing and idle sleep of processors, including NUMA awareness.
     \item
    -A mechanism for adding fairness on top of MQMS algorithm through helping, used both for scalable scheduling algorithm and the user-level \glsxtrshort{io}.
    +A mechanism for adding fairness on top of the MQMS algorithm through helping, used both for the scalable scheduling algorithm and the user-level \glsxtrshort{io}.
     \item
     An optimization of the helping mechanism for load balancing to reduce scheduling costs.