Changeset a44514e for doc/theses/thierry_delisle_PhD/thesis/text/intro.tex
- Timestamp: Sep 7, 2022, 4:12:00 PM
- Branches: ADT, ast-experimental, master, pthread-emulation
- Children: e4855f6
- Parents: 7a0f798b
- Files: 1 edited
doc/theses/thierry_delisle_PhD/thesis/text/intro.tex
(r7a0f798b → ra44514e)

   3   \Gls{uthrding} (M:N) is gaining popularity over kernel-level threading (1:1) in many programming languages.
-  4   The user threading approach is often a better mechanism to express complex concurrent applications by efficiently running 10,000+ threads on multi-core systems.
+  4   The user threading approach is often a better mechanism to express complex concurrent applications by efficiently running 10,000+ threads on multicore systems.
   5   Indeed, over-partitioning into small work-units with user threading significantly eases load bal\-ancing, while simultaneously providing advanced synchronization and mutual exclusion capabilities.
   6   To manage these high levels of concurrency, the underlying runtime must efficiently schedule many user threads across a few kernel threads;
-  7   which begs of the question of how many kernel threads are needed and should the number be dynamically reevaluated.
+  7   which begs the question of how many kernel threads are needed and should the number be dynamically reevaluated.
   8   Furthermore, scheduling must prevent kernel threads from blocking, otherwise user-thread parallelism drops.
   9   When user-threading parallelism does drop, how and when should idle kernel-threads be put to sleep to avoid wasting CPU resources.
   …
  13   This thesis analyses multiple scheduler systems, where each system attempts to fulfill the necessary requirements for \gls{uthrding}.
  14   The predominant technique for managing high levels of concurrency is sharding the ready-queue with one queue per kernel-thread and using some form of work stealing/sharing to dynamically rebalance workload shifts.
- 15   Preventing kernel blocking is accomplish by transforming kernel locks and I/O operations into user-level operations that do not block the kernel thread or spin up new kernel threads to manage the blocking.
+ 15   Preventing kernel blocking is accomplished by transforming kernel locks and I/O operations into user-level operations that do not block the kernel thread or spin up new kernel threads to manage the blocking.
  16   Fairness is handled through preemption and/or ad-hoc solutions, which leads to coarse-grained fairness with some pathological cases.
  17
  18   After examining, testing and selecting specific approaches to these scheduling issues, a completely new scheduler was created and tested in the \CFA (C-for-all) user-threading runtime-system.
  19   The goal of the new scheduler is to offer increased safety and productivity without sacrificing performance.
- 20   The quality of the new scheduler is demonstrated by comparing it with other user-threading work-stealing schedulers with the aim of showing equivalent or better performance while offering better fairness.
+ 20   The quality of the new scheduler is demonstrated by comparing it with other user-threading work-stealing schedulers, with the aim of showing equivalent or better performance while offering better fairness.
  21
  22   Chapter~\ref{intro} defines scheduling and its general goals.
  23   Chapter~\ref{existing} discusses how scheduler implementations attempt to achieve these goals, but all implementations optimize some workloads better than others.
- 24   Chapter~\ref{cfaruntime} presents the relevant aspects of the \CFA runtime system that have a significant affect on the new scheduler design and implementation.
+ 24   Chapter~\ref{cfaruntime} presents the relevant aspects of the \CFA runtime system that have a significant effect on the new scheduler design and implementation.
  25   Chapter~\ref{core} analyses different scheduler approaches, while looking for scheduler mechanisms that provide both performance and fairness.
  26   Chapter~\ref{userio} covers the complex mechanisms that must be used to achieve nonblocking I/O to prevent the blocking of \glspl{kthrd}.
  …
  31   \section{Scheduling}\label{sched}
  32   Computer systems share multiple resources across many threads of execution, even on single-user computers like laptops or smartphones.
- 33   On a computer system with multiple processors and work units (routines, coroutines, threads, programs, \etc), there exists the problem of mapping many different kinds of work units onto many different kinds of processors in an efficient manner, called \newterm{scheduling}.
+ 33   On a computer system with multiple processors and work units (routines, coroutines, threads, programs, \etc), there exists the problem of mapping many different kinds of work units onto many different kinds of processors efficiently, called \newterm{scheduling}.
  34   Scheduling systems are normally \newterm{open}, meaning new work arrives from an external source or is randomly spawned from an existing work unit.
  35   In general, work units without threads, like routines and coroutines, are self-scheduling, while work units with threads, like tasks and programs, are scheduled.
  …
  39   However, optimal solutions are often not required: schedulers often produce excellent solutions, without needing optimality, by taking advantage of regularities in work patterns.
  40
- 41   Scheduling occurs at discreet points when there are transitions in a system.
- 42   For example, a thread cycles through the following transitions during its execution.
+ 41   Scheduling occurs at discrete points when there are transitions in a system.
+ 42   For example, a \at cycles through the following transitions during its execution.
  43   \begin{center}
  44   \input{executionStates.pstex_t}
  …
  49   entering the system (new $\rightarrow$ ready)
  50   \item
- 51   scheduler assigns a thread to a computing resource, \eg CPU (ready $\rightarrow$ running)
+ 51   scheduler assigns a \at to a computing resource, \eg CPU (ready $\rightarrow$ running)
  52   \item
  53   timer alarm for preemption (running $\rightarrow$ ready)
  …
  59   normal completion or error, \eg segment fault (running $\rightarrow$ halted)
  60   \end{itemize}
- 61   Key to scheduling is that a thread cannot bypass the ``ready'' state during a transition so the scheduler maintains complete control of the system, \ie no self-scheduling among threads.
+ 61   Key to scheduling is that a \at cannot bypass the ``ready'' state during a transition so the scheduler maintains complete control of the system, \ie no self-scheduling among threads.
  62
  63   When the workload exceeds the capacity of the processors, \ie work cannot be executed immediately, it is placed on a queue for subsequent service, called a \newterm{ready queue}.
  …
  71   \end{tabular}
  72   \end{center}
- 73   Beyond these two schedulers are a host of options, \eg adding an global shared queue to MQMS or adding multiple private queues with distinc characteristics.
+ 73   Beyond these two schedulers are a host of options, \eg adding a global shared queue to MQMS or adding multiple private queues with distinct characteristics.
  74
  75   Once there are multiple resources and ready queues, a scheduler is faced with three major optimization criteria:
  …
  86   Essentially, all multi-processor computers have non-uniform memory access (NUMA), with one or more quantized steps to access data at different levels in the memory hierarchy.
  87   When a system has a large number of independently executing threads, affinity becomes difficult because of \newterm{thread churn}.
-  88  That is, threads must be scheduled on different processors to obtain high processors utilization because the number of threads $\ggg$ processors.
+  88  That is, threads must be scheduled on different processors to obtain high processor utilization because the number of threads $\ggg$ processors.
   89
   90  \item
   …
  118  More specifically, safety and productivity for scheduling means supporting a wide range of workloads so that programmers can rely on progress guarantees (safety) and more easily achieve acceptable performance (productivity).
  119  The new scheduler also includes support for implicit nonblocking \io, allowing applications to have more user-threads blocking on \io operations than there are \glspl{kthrd}.
- 120  To complete the scheduler, an idle sleep mechanism is implemented that significantly reduces wasted CPU cycles, which are then available outside of the application.
+ 120  To complete the scheduler, an idle sleep mechanism is implemented that significantly reduces wasted CPU cycles, which are then available outside the application.
  121
  122  As a research project, this work builds exclusively on newer versions of the Linux operating-system and gcc/clang compilers.