Changeset fc96890 for doc/theses/thierry_delisle_PhD/thesis/text/intro.tex
- Timestamp: Sep 13, 2022, 3:07:25 PM
- Branches: ADT, ast-experimental, master, pthread-emulation
- Children: 3606fe4
- Parents: 1c334d1
- File: 1 edited
Legend: unmodified lines show both line numbers, removed lines are marked with -, added lines are marked with +, and … marks omitted unmodified lines.
doc/theses/thierry_delisle_PhD/thesis/text/intro.tex
r1c334d1  rfc96890

  3   3   \Gls{uthrding} (M:N) is gaining popularity over kernel-level threading (1:1) in many programming languages.
  4   4   The user threading approach is often a better mechanism to express complex concurrent applications by efficiently running 10,000+ threads on multicore systems.
  5       - Indeed, over-partitioning into small work-units with user threading significantly eases load bal\-ancing, while simultaneously providing advanced synchronization and mutual exclusion capabilities.
      5   + Indeed, over-partitioning into small work units with user threading significantly eases load bal\-ancing, while simultaneously providing advanced synchronization and mutual exclusion capabilities.
  6   6   To manage these high levels of concurrency, the underlying runtime must efficiently schedule many user threads across a few kernel threads;
  7   7   which begs the question of how many kernel threads are needed and should the number be dynamically reevaluated.
  …
 11  11   otherwise, other user threads can experience short/long term starvation or kernel threads can deadlock waiting for events to occur on busy kernel threads.
 12  12
 13       - This thesis analyses multiple scheduler systems, where each system attempts to fulfill the requirements for \gls{uthrding}.
 14       - The predominant technique for managing high levels of concurrency is sharding the ready queue with one queue per kernel-thread and using some form of work stealing/sharing to dynamically rebalance workload shifts.
     13   + This thesis analyzes multiple scheduler systems, where each system attempts to fulfill the requirements for \gls{uthrding}.
     14   + The predominant technique for managing high levels of concurrency is sharding the ready queue with one queue per kernel thread and using some form of work stealing/sharing to dynamically rebalance workload shifts.
 15  15   Preventing kernel blocking is accomplished by transforming kernel locks and I/O operations into user-level operations that do not block the kernel thread or spin up new kernel threads to manage the blocking.
 16       - Fairness is handled through preemption and/or ad-hoc solutions, which leads to coarse-grained fairness with some pathological cases.
     16   + Fairness is handled through preemption and/or ad hoc solutions, which leads to coarse-grained fairness with some pathological cases.
 17  17
 18       - After examining, testing and selecting specific approaches to these scheduling issues, a completely new scheduler was created and tested in the \CFA (C-for-all) user-threading runtime-system.
     18   + After examining, testing and selecting specific approaches to these scheduling issues, a completely new scheduler was created and tested in the \CFA (C-for-all) user-threading runtime system.
 19  19   The goal of the new scheduler is to offer increased safety and productivity without sacrificing performance.
 20  20   The quality of the new scheduler is demonstrated by comparing it with other user-threading work-stealing schedulers with, the aim of showing equivalent or better performance while offering better fairness.
  …
 35  35   In general, work units without threads, like routines and coroutines, are self-scheduling, while work units with threads, like tasks and programs, are scheduled.
 36  36   For scheduled work-units, a scheduler takes a sequence of threads and attempts to run them to completion, subject to shared resource restrictions and utilization.
 37       - A general-purpose dynamic-scheduler for an open system cannot anticipate work requests, so its performance is rarely optimal.
     37   + In an open system, a general-purpose dynamic scheduler cannot anticipate work requests, so its performance is rarely optimal.
 38  38   Even with complete knowledge of arrival order and work, creating an optimal solution is a bin packing problem~\cite{wiki:binpak}.
 39  39   However, optimal solutions are often not required: schedulers often produce excellent solutions, without needing optimality, by taking advantage of regularities in work patterns.
  …
 53  53   timer alarm for preemption (running $\rightarrow$ ready)
 54  54   \item
 55       - long term delay versus spinning (running $\rightarrow$ blocked)
     55   + long-term delay versus spinning (running $\rightarrow$ blocked)
 56  56   \item
 57  57   completion of delay, \eg network or I/O completion (blocked $\rightarrow$ ready)
  …
 84  84
 85  85   \noindent
 86       - Essentially, all multi-processor computers have non-uniform memory access (NUMA), with one or more quantized steps to access data at different levels in the memory hierarchy.
     86   + Essentially, all multiprocessor computers have non-uniform memory access (NUMA), with one or more quantized steps to access data at different levels in the memory hierarchy.
 87  87   When a system has a large number of independently executing threads, affinity becomes difficult because of \newterm{thread churn}.
 88  88   That is, threads must be scheduled on different processors to obtain high processor utilization because the number of threads $\ggg$ processors.
  …
120 120   To complete the scheduler, an idle sleep mechanism is implemented that significantly reduces wasted CPU cycles, which are then available outside the application.
121 121
122       - As a research project, this work builds exclusively on newer versions of the Linux operating-system and gcc/clang compilers.
    122   + As a research project, this work builds exclusively on newer versions of the Linux operating system and gcc/clang compilers.
123 123   The new scheduler implementation uses several optimizations to successfully balance the cost of fairness against performance;
124 124   some of these optimizations rely on interesting hardware optimizations only present on modern CPUs.
125       - The \io implementation is based on the @io_uring@ kernel-interface, a recent addition to the Linux kernel, because it purports to handle nonblocking \emph{file} and network \io.
    125   + The \io implementation is based on the @io_uring@ kernel interface, a recent addition to the Linux kernel, because it purports to handle nonblocking \emph{file} and network \io.
126 126   This decision allowed an interesting performance and fairness comparison with other threading systems using @select@, @epoll@, \etc.
127 127   While the current \CFA release supports older versions of Linux ($\ge$~Ubuntu 16.04) and gcc/clang compilers ($\ge$~gcc 6.0), it is not the purpose of this project to find workarounds in these older systems to provide backwards compatibility.
  …
129 129
130 130   \section{Contributions}\label{s:Contributions}
131       - This work provides the following scheduling contributions for advanced \gls{uthrding} runtime-systems:
    131   + This work provides the following scheduling contributions for advanced \gls{uthrding} runtime systems:
132 132   \begin{enumerate}[leftmargin=*]
133 133   \item
  …
140 140   A mechanism for adding fairness on top of MQMS algorithm through helping, used both for scalable scheduling algorithm and the user-level \glsxtrshort{io}.
141 141   \item
142       - An optimization of the helping-mechanism for load balancing to reduce scheduling costs.
    142   + An optimization of the helping mechanism for load balancing to reduce scheduling costs.
143 143   \item
144 144   An optimization for the alternative relaxed-list for load balancing to reduce scheduling costs in embarrassingly parallel cases.
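The changed sentence at lines 13-14 above names the predominant technique: shard the ready queue, one queue per kernel thread, and rebalance through work stealing/sharing. As a reading aid only, here is a minimal C sketch of that idea. It is not the \CFA scheduler; every identifier in it (ready_queue_t, ready_push, ready_next, NWORKERS, QCAP) is invented for this note, it uses a coarse mutex per shard where a production runtime would use lock-free structures, and full-queue checks are omitted.

/* Hypothetical sketch of a sharded ready queue with a stealing fallback.
 * Not the \CFA runtime; illustrative only.  Requires GCC/Clang for the
 * range-designated initializer.  Compile: cc sketch.c -pthread */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define QCAP 256                      /* power of two so index wrap stays consistent */

typedef struct { void (*fn)( void * ); void * arg; } task_t;

typedef struct {
	pthread_mutex_t lock;             /* coarse lock; real schedulers use lock-free queues */
	task_t items[QCAP];
	unsigned head, tail;              /* tail - head == queued tasks; full-queue check omitted */
} ready_queue_t;

static ready_queue_t queues[NWORKERS] = {
	[0 ... NWORKERS - 1] = { .lock = PTHREAD_MUTEX_INITIALIZER }
};

static int try_pop( ready_queue_t * q, task_t * out ) {
	pthread_mutex_lock( &q->lock );
	int ok = q->head != q->tail;
	if ( ok ) *out = q->items[q->head++ % QCAP];
	pthread_mutex_unlock( &q->lock );
	return ok;
}

void ready_push( int worker, task_t t ) {           /* enqueue on one shard */
	ready_queue_t * q = &queues[worker];
	pthread_mutex_lock( &q->lock );
	q->items[q->tail++ % QCAP] = t;
	pthread_mutex_unlock( &q->lock );
}

int ready_next( int self, task_t * out ) {          /* local pop, then steal */
	if ( try_pop( &queues[self], out ) ) return 1;
	for ( int i = 1; i < NWORKERS; i += 1 )          /* scan sibling shards */
		if ( try_pop( &queues[(self + i) % NWORKERS], out ) ) return 1;
	return 0;                                        /* nothing runnable anywhere */
}

static void hello( void * arg ) { printf( "task %ld\n", (long)arg ); }

int main( void ) {
	ready_push( 2, (task_t){ hello, (void *)1L } );  /* work lands on shard 2 */
	task_t t;
	while ( ready_next( 0, &t ) ) t.fn( t.arg );     /* worker 0 finds it by stealing */
	return 0;
}

The point of the sketch is the fallback loop in ready_next: a worker touches other shards only when its own queue is empty, which is what lets over-partitioned work units rebalance across kernel threads.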
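Line 125 of the new text bases the \io implementation on the @io_uring@ kernel interface. For readers unfamiliar with it, the snippet below is a minimal liburing example (submit one read, then wait for its completion) showing the submit/complete model that sentence refers to; it is a hedged illustration, not the \CFA runtime's \io subsystem. The file path and buffer size are arbitrary demo choices, and the call to io_uring_wait_cqe blocks the calling kernel thread, whereas the runtime described in the diff instead keeps kernel threads running other user threads while an operation is outstanding.

/* Minimal liburing demo: one read through io_uring.  Illustrative only.
 * Build: cc demo.c -luring   (requires liburing and a recent kernel) */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main( void ) {
	struct io_uring ring;
	if ( io_uring_queue_init( 8, &ring, 0 ) < 0 ) { perror( "io_uring_queue_init" ); return 1; }

	int fd = open( "/etc/hostname", O_RDONLY );               /* any readable file works */
	if ( fd < 0 ) { perror( "open" ); return 1; }

	char buf[4096];
	struct io_uring_sqe * sqe = io_uring_get_sqe( &ring );    /* grab a submission slot */
	io_uring_prep_read( sqe, fd, buf, sizeof(buf), 0 );       /* describe a read at offset 0 */
	io_uring_submit( &ring );                                 /* hand the request to the kernel */

	struct io_uring_cqe * cqe;
	io_uring_wait_cqe( &ring, &cqe );                         /* block until the completion arrives */
	printf( "read returned %d\n", cqe->res );                 /* res is the byte count or -errno */
	io_uring_cqe_seen( &ring, cqe );                          /* mark the completion consumed */

	io_uring_queue_exit( &ring );
	close( fd );
	return 0;
}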