Changeset 06bdba4
- Timestamp:
- Jun 30, 2022, 11:33:20 AM (2 years ago)
- Branches:
- ADT, ast-experimental, master, pthread-emulation, qualifiedEnum
- Children:
- 25404c7
- Parents:
- fc2c57a9 (diff), adf03a6 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - Files:
-
- 2 added
- 12 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/thierry_delisle_PhD/thesis/local.bib
rfc2c57a9 r06bdba4 701 701 note = "[Online; accessed 12-April-2022]" 702 702 } 703 @misc{wiki:binpak, 704 author = "{Wikipedia contributors}", 705 title = "Bin packing problem --- {W}ikipedia{,} The Free Encyclopedia", 706 year = "2022", 707 url = "https://en.wikipedia.org/wiki/Bin_packing_problem", 708 note = "[Online; accessed 29-June-2022]" 709 } 703 710 704 711 % RMR notes : -
doc/theses/thierry_delisle_PhD/thesis/text/existing.tex
rfc2c57a9 r06bdba4 14 14 15 15 \section{Naming Convention} 16 Scheduling has been studied by various communities concentrating on different incarnation of the same problems. As a result, there is no standard naming conventions for scheduling that is respected across these communities. This document uses the term \newterm{\Gls{at}} to refer to the abstract objects being scheduled and the term \newterm{\Gls{proc}} to refer to the concrete objects executing these \glspl{at}.16 Scheduling has been studied by various communities concentrating on different incarnation of the same problems. As a result, there are no standard naming conventions for scheduling that is respected across these communities. This document uses the term \newterm{\Gls{at}} to refer to the abstract objects being scheduled and the term \newterm{\Gls{proc}} to refer to the concrete objects executing these \ats. 17 17 18 18 \section{Static Scheduling} 19 \newterm{Static schedulers} require \ gls{at}dependencies and costs be explicitly and exhaustively specified prior to scheduling.19 \newterm{Static schedulers} require \ats dependencies and costs be explicitly and exhaustively specified prior to scheduling. 20 20 The scheduler then processes this input ahead of time and produces a \newterm{schedule} the system follows during execution. 21 21 This approach is popular in real-time systems since the need for strong guarantees justifies the cost of determining and supplying this information. 22 In general, static schedulers are less relevant to this project because they require input from the programmers that aprogramming language does not have as part of its concurrency semantic.22 In general, static schedulers are less relevant to this project because they require input from the programmers that the programming language does not have as part of its concurrency semantic. 23 23 Specifying this information explicitly adds a significant burden to the programmer and reduces flexibility. 24 24 For this reason, the \CFA scheduler does not require this information. 25 25 26 26 \section{Dynamic Scheduling} 27 \newterm{Dynamic schedulers} determine \gls{at} dependencies and costs (if at all) during scheduling. 28 % Schedulers that support this detection at runtime are referred to as \newterm{Dynamic Schedulers}. 29 Hence, unlike static scheduling, \gls{at} dependencies are conditional and detected at runtime. This detection takes the form of observing new \gls{at}(s) in the system and determining dependencies from their behaviour, including suspending or halting a \gls{at} that dynamically detects unfulfilled dependencies. Furthermore, each \gls{at} has the responsibility of adding dependent \glspl{at} back into the system once dependencies are fulfilled. As a consequence, the scheduler often has an incomplete view of the system, seeing only \glspl{at} with no pending dependencies. 27 \newterm{Dynamic schedulers} determine \ats dependencies and costs during scheduling, if at all. 28 Hence, unlike static scheduling, \ats dependencies are conditional and detected at runtime. This detection takes the form of observing new \ats(s) in the system and determining dependencies from their behaviour, including suspending or halting a \ats that dynamically detects unfulfilled dependencies. 29 Furthermore, each \ats has the responsibility of adding dependent \ats back into the system once dependencies are fulfilled. 30 As a consequence, the scheduler often has an incomplete view of the system, seeing only \ats with no pending dependencies. 30 31 31 32 \subsection{Explicitly Informed Dynamic Schedulers} 32 While dynamic schedulers may not have an exhaustive list of dependencies for a \gls{at}, some information may be available about each \gls{at}, \eg expected duration, required resources, relative importance, \etc. When available, a scheduler can then use this information to direct the scheduling decisions. \cit{Examples of schedulers with more information} However, most programmers do not determine or even \emph{predict} this information; 33 at best, the scheduler has only some imprecision information provided by the programmer, \eg, indicating a \glspl{at} takes approximately 3--7 seconds to complete, rather than exactly 5 seconds. Providing this kind of information is a significant programmer burden especially if the the information does not scale with the number of \glspl{at} and their complexity. For example, providing an exhaustive list of files read by 5 \glspl{at} is an easier requirement then providing an exhaustive list of memory addresses accessed by 10,000 independent \glspl{at}. 34 35 Since the goal of this thesis is to provide an \emph{informed} scheduler as a replacement for \CFA's existing \emph{uninformed} scheduler, explicitly informed schedulers are less relevant to this project. Nevertheless, some strategies are worth mentioning. 33 While dynamic schedulers may not have an exhaustive list of dependencies for a \ats, some information may be available about each \ats, \eg expected duration, required resources, relative importance, \etc. 34 When available, a scheduler can then use this information to direct the scheduling decisions. \cit{Examples of schedulers with more information} 35 However, most programmers do not determine or even \emph{predict} this information; 36 at best, the scheduler has only some imprecise information provided by the programmer, \eg, indicating a \ats takes approximately 3--7 seconds to complete, rather than exactly 5 seconds. 37 Providing this kind of information is a significant programmer burden especially if the information does not scale with the number of \ats and their complexity. 38 For example, providing an exhaustive list of files read by 5 \ats is an easier requirement then providing an exhaustive list of memory addresses accessed by 10,000 independent \ats. 39 40 Since the goal of this thesis is to provide a scheduler as a replacement for \CFA's existing \emph{uninformed} scheduler, explicitly informed schedulers are less relevant to this project. Nevertheless, some strategies are worth mentioning. 36 41 37 42 \subsubsection{Priority Scheduling} 38 Common information used by schedulers to direct their algorithm is priorities. Each task is given a priority and higher-priority \glspl{at} are preferred to lower-priority ones. The simplest priority scheduling algorithm is to require that every \gls{at} have a distinct pre-established priority and always run the available \gls{at} with the highest priority. Asking programmers to provide an exhaustive set of unique priorities can be prohibitive when the system has a large number of \glspl{at}. It can therefore be desirable for schedulers to support \glspl{at} with identical priorities and/or automatically setting and adjusting priorities for \glspl{at}. The most common adopting some variant on priorities with overlaps and dynamic priority adjustments. For example, Microsoft Windows uses a pair of priorities 43 Common information used by schedulers to direct their algorithm is priorities. 44 Each \ats is given a priority and higher-priority \ats are preferred to lower-priority ones. 45 The simplest priority scheduling algorithm is to require that every \ats have a distinct pre-established priority and always run the available \ats with the highest priority. 46 Asking programmers to provide an exhaustive set of unique priorities can be prohibitive when the system has a large number of \ats. 47 It can therefore be desirable for schedulers to support \ats with identical priorities and/or automatically setting and adjusting priorities for \ats. 48 Most common operating systems use some variant on priorities with overlaps and dynamic priority adjustments. 49 For example, Microsoft Windows uses a pair of priorities 39 50 \cit{https://docs.microsoft.com/en-us/windows/win32/procthread/scheduling-priorities,https://docs.microsoft.com/en-us/windows/win32/taskschd/taskschedulerschema-priority-settingstype-element}, one specified by users out of ten possible options and one adjusted by the system. 40 51 41 52 \subsection{Uninformed and Self-Informed Dynamic Schedulers} 42 Several scheduling algorithms do not require programmers to provide additional information on each \ gls{at}, and instead make scheduling decisions based solely on internal state and/or information implicitly gathered by the scheduler.53 Several scheduling algorithms do not require programmers to provide additional information on each \ats, and instead make scheduling decisions based solely on internal state and/or information implicitly gathered by the scheduler. 43 54 44 55 45 56 \subsubsection{Feedback Scheduling} 46 As mentioned, schedulers may also gather information about each \glspl{at} to direct their decisions. This design effectively moves the scheduler into the realm of \newterm{Control Theory}~\cite{wiki:controltheory}. This information gathering does not generally involve programmers, and as such, does not increase programmer burden the same way explicitly provided information may. However, some feedback schedulers do allow programmers to offer additional information on certain \glspl{at}, in order to direct scheduling decisions. The important distinction being whether or not the scheduler can function without this additional information. 57 As mentioned, schedulers may also gather information about each \ats to direct their decisions. 58 This design effectively moves the scheduler into the realm of \newterm{Control Theory}~\cite{wiki:controltheory}. 59 This information gathering does not generally involve programmers, and as such, does not increase programmer burden the same way explicitly provided information may. 60 However, some feedback schedulers do allow programmers to offer additional information on certain \ats, in order to direct scheduling decisions. 61 The important distinction being whether or not the scheduler can function without this additional information. 47 62 48 63 49 64 \section{Work Stealing}\label{existing:workstealing} 50 One of the most popular scheduling algorithm in practice (see~\ref{existing:prod}) is work stealing. This idea, introduce by \cite{DBLP:conf/fpca/BurtonS81}, effectively has each worker process its local \glspl{at} first, but allows the possibility for other workers to steal local \glspl{at} if they run out of \glspl{at}. \cite{DBLP:conf/focs/Blumofe94} introduced the more familiar incarnation of this, where each workers has a queue of \glspl{at} and workers without \glspl{at} steal \glspl{at} from random workers. (The Burton and Sleep algorithm had trees of \glspl{at} and steal only among neighbours). Blumofe and Leiserson also prove worst case space and time requirements for well-structured computations. 65 One of the most popular scheduling algorithm in practice (see~\ref{existing:prod}) is work stealing. 66 This idea, introduce by \cite{DBLP:conf/fpca/BurtonS81}, effectively has each worker process its local \ats first, but allows the possibility for other workers to steal local \ats if they run out of \ats. 67 \cite{DBLP:conf/focs/Blumofe94} introduced the more familiar incarnation of this, where each workers has a queue of \ats and workers without \ats steal \ats from random workers\footnote{The Burton and Sleep algorithm had trees of \ats and steal only among neighbours.}. 68 Blumofe and Leiserson also prove worst case space and time requirements for well-structured computations. 51 69 52 70 Many variations of this algorithm have been proposed over the years~\cite{DBLP:journals/ijpp/YangH18}, both optimizations of existing implementations and approaches that account for new metrics. 53 71 54 \paragraph{Granularity} A significant portion of early work-stealing research concentrated on \newterm{Implicit Parallelism}~\cite{wiki:implicitpar}. Since the system is responsible for splitting the work, granularity is a challenge that cannot be left to programmers (as opposed to \newterm{Explicit Parallelism}\cite{wiki:explicitpar} where the burden can be left to programmers). In general, fine granularity is better for load balancing and coarse granularity reduces communication overhead. The best performance generally means finding a middle ground between the two. Several methods can be employed, but I believe these are less relevant for threads, which are generally explicit and more coarse grained. 55 56 \paragraph{Task Placement} Since modern computers rely heavily on cache hierarchies\cit{Do I need a citation for this}, migrating \glspl{at} from one core to another can be . \cite{DBLP:journals/tpds/SquillanteL93} 72 \paragraph{Granularity} A significant portion of early work-stealing research concentrated on \newterm{Implicit Parallelism}~\cite{wiki:implicitpar}. 73 Since the system is responsible for splitting the work, granularity is a challenge that cannot be left to programmers, as opposed to \newterm{Explicit Parallelism}\cite{wiki:explicitpar} where the burden can be left to programmers. 74 In general, fine granularity is better for load balancing and coarse granularity reduces communication overhead. 75 The best performance generally means finding a middle ground between the two. 76 Several methods can be employed, but I believe these are less relevant for threads, which are generally explicit and more coarse grained. 77 78 \paragraph{Task Placement} Since modern computers rely heavily on cache hierarchies\cit{Do I need a citation for this}, migrating \ats from one core to another can be . \cite{DBLP:journals/tpds/SquillanteL93} 57 79 58 80 \todo{The survey is not great on this subject} … … 61 83 62 84 \subsection{Theoretical Results} 63 There is also a large body of research on the theoretical aspects of work stealing. These evaluate, for example, the cost of migration~\cite{DBLP:conf/sigmetrics/SquillanteN91,DBLP:journals/pe/EagerLZ86}, how affinity affects performance~\cite{DBLP:journals/tpds/SquillanteL93,DBLP:journals/mst/AcarBB02,DBLP:journals/ipl/SuksompongLS16} and theoretical models for heterogeneous systems~\cite{DBLP:journals/jpdc/MirchandaneyTS90,DBLP:journals/mst/BenderR02,DBLP:conf/sigmetrics/GastG10}. \cite{DBLP:journals/jacm/BlellochGM99} examines the space bounds of work stealing and \cite{DBLP:journals/siamcomp/BerenbrinkFG03} shows that for under-loaded systems, the scheduler completes its computations in finite time, \ie is \newterm{stable}. Others show that work stealing is applicable to various scheduling contexts~\cite{DBLP:journals/mst/AroraBP01,DBLP:journals/anor/TchiboukdjianGT13,DBLP:conf/isaac/TchiboukdjianGTRB10,DBLP:conf/ppopp/AgrawalLS10,DBLP:conf/spaa/AgrawalFLSSU14}. \cite{DBLP:conf/ipps/ColeR13} also studied how randomized work-stealing affects false sharing among \glspl{at}. 64 65 However, as \cite{DBLP:journals/ijpp/YangH18} highlights, it is worth mentioning that this theoretical research has mainly focused on ``fully-strict'' computations, \ie workloads that can be fully represented with a direct acyclic graph. It is unclear how well these distributions represent workloads in real world scenarios. 85 There is also a large body of research on the theoretical aspects of work stealing. These evaluate, for example, the cost of migration~\cite{DBLP:conf/sigmetrics/SquillanteN91,DBLP:journals/pe/EagerLZ86}, how affinity affects performance~\cite{DBLP:journals/tpds/SquillanteL93,DBLP:journals/mst/AcarBB02,DBLP:journals/ipl/SuksompongLS16} and theoretical models for heterogeneous systems~\cite{DBLP:journals/jpdc/MirchandaneyTS90,DBLP:journals/mst/BenderR02,DBLP:conf/sigmetrics/GastG10}. 86 \cite{DBLP:journals/jacm/BlellochGM99} examines the space bounds of work stealing and \cite{DBLP:journals/siamcomp/BerenbrinkFG03} shows that for under-loaded systems, the scheduler completes its computations in finite time, \ie is \newterm{stable}. 87 Others show that work stealing is applicable to various scheduling contexts~\cite{DBLP:journals/mst/AroraBP01,DBLP:journals/anor/TchiboukdjianGT13,DBLP:conf/isaac/TchiboukdjianGTRB10,DBLP:conf/ppopp/AgrawalLS10,DBLP:conf/spaa/AgrawalFLSSU14}. 88 \cite{DBLP:conf/ipps/ColeR13} also studied how randomized work-stealing affects false sharing among \ats. 89 90 However, as \cite{DBLP:journals/ijpp/YangH18} highlights, it is worth mentioning that this theoretical research has mainly focused on ``fully-strict'' computations, \ie workloads that can be fully represented with a direct acyclic graph. 91 It is unclear how well these distributions represent workloads in real world scenarios. 66 92 67 93 \section{Preemption} 68 One last aspect of scheduling is preemption since many schedulers rely on it for some of their guarantees. Preemption is the idea of interrupting \glspl{at} that have been running too long, effectively injecting suspend points into the application. There are multiple techniques to achieve this effect but they all aim to guarantee suspend points in a \gls{at} are never further apart than some fixed duration. While this helps schedulers guarantee that no \glspl{at} unfairly monopolizes a worker, preemption can effectively added to any scheduler. Therefore, the only interesting aspect of preemption for the design of scheduling is whether or not to require it. 94 One last aspect of scheduling is preemption since many schedulers rely on it for some of their guarantees. 95 Preemption is the idea of interrupting \ats that have been running too long, effectively injecting suspend points into the application. 96 There are multiple techniques to achieve this effect but they all aim to guarantee that the suspend points in a \ats are never further apart than some fixed duration. 97 While this helps schedulers guarantee that no \ats unfairly monopolizes a worker, preemption can effectively be added to any scheduler. 98 Therefore, the only interesting aspect of preemption for the design of scheduling is whether or not to require it. 69 99 70 100 \section{Production Schedulers}\label{existing:prod} 71 This section presents a quick overview of several current schedulers. While these schedulers do not necessarily represent the most recent advances in scheduling, they are what is generally accessible to programmers. As such, I believe these schedulers are at least as relevant as those presented in published work. Schedulers that operate in kernel space and in user space are considered, as both can offer relevant insight for this project. However, real-time schedulers as not considered, as these have constraints that are much stricter than what is needed for this project. 101 This section presents a quick overview of several current schedulers. 102 While these schedulers do not necessarily represent the most recent advances in scheduling, they are what is generally accessible to programmers. 103 As such, I believe these schedulers are at least as relevant as those presented in published work. 104 Schedulers that operate in kernel space and in user space are considered, as both can offer relevant insight for this project. 105 However, real-time schedulers are not considered, as these have constraints that are much stricter than what is needed for this project. 72 106 73 107 \subsection{Operating System Schedulers} 74 Operating System Schedulers tend to be fairly complex as they generally support some amount of real-time, aim to balance interactive and non-interactive \glspl{at} and support multiple users sharing hardware without requiring these users to cooperate. Here are more details on a few schedulers used in the common operating systems: Linux, FreeBSD, Microsoft Windows and Apple's OS X. The information is less complete for operating systems with closed source. 108 Operating System Schedulers tend to be fairly complex as they generally support some amount of real-time, aim to balance interactive and non-interactive \ats and support multiple users sharing hardware without requiring these users to cooperate. 109 Here are more details on a few schedulers used in the common operating systems: Linux, FreeBSD, Microsoft Windows and Apple's OS X. 110 The information is less complete for operating systems with closed source. 75 111 76 112 \paragraph{Linux's CFS} 77 The default scheduler used by Linux (the Completely Fair Scheduler)~\cite{MAN:linux/cfs,MAN:linux/cfs2} is a feedback scheduler based on CPU time. For each processor, it constructs a Red-Black tree of \glspl{at} waiting to run, ordering them by the amount of CPU time used. The \gls{at} that has used the least CPU time is scheduled. It also supports the concept of \newterm{Nice values}, which are effectively multiplicative factors on the CPU time used. The ordering of \glspl{at} is also affected by a group based notion of fairness, where \glspl{at} belonging to groups having used less CPU time are preferred to \glspl{at} belonging to groups having used more CPU time. Linux achieves load-balancing by regularly monitoring the system state~\cite{MAN:linux/cfs/balancing} and using some heuristic on the load (currently CPU time used in the last millisecond plus a decayed version of the previous time slots~\cite{MAN:linux/cfs/pelt}.). 78 79 \cite{DBLP:conf/eurosys/LoziLFGQF16} shows that Linux's CFS also does work stealing to balance the workload of each processors, but the paper argues this aspect can be improved significantly. The issues highlighted stem from Linux's need to support fairness across \glspl{at} \emph{and} across users\footnote{Enforcing fairness across users means that given two users, one with a single \gls{at} and the other with one thousand \glspl{at}, the user with a single \gls{at} does not receive one thousandth of the CPU time.}, increasing the complexity. 80 81 Linux also offers a FIFO scheduler, a real-time scheduler, which runs the highest-priority \gls{at}, and a round-robin scheduler, which is an extension of the FIFO-scheduler that adds fixed time slices. \cite{MAN:linux/sched} 113 The default scheduler used by Linux, the Completely Fair Scheduler~\cite{MAN:linux/cfs,MAN:linux/cfs2}, is a feedback scheduler based on CPU time. 114 For each processor, it constructs a Red-Black tree of \ats waiting to run, ordering them by the amount of CPU time used. 115 The \ats that has used the least CPU time is scheduled. 116 It also supports the concept of \newterm{Nice values}, which are effectively multiplicative factors on the CPU time used. 117 The ordering of \ats is also affected by a group based notion of fairness, where \ats belonging to groups having used less CPU time are preferred to \ats belonging to groups having used more CPU time. 118 Linux achieves load-balancing by regularly monitoring the system state~\cite{MAN:linux/cfs/balancing} and using some heuristic on the load, currently CPU time used in the last millisecond plus a decayed version of the previous time slots~\cite{MAN:linux/cfs/pelt}. 119 120 \cite{DBLP:conf/eurosys/LoziLFGQF16} shows that Linux's CFS also does work stealing to balance the workload of each processors, but the paper argues this aspect can be improved significantly. 121 The issues highlighted stem from Linux's need to support fairness across \ats \emph{and} across users\footnote{Enforcing fairness across users means that given two users, one with a single \ats and the other with one thousand \ats, the user with a single \ats does not receive one thousandth of the CPU time.}, increasing the complexity. 122 123 Linux also offers a FIFO scheduler, a real-time scheduler, which runs the highest-priority \ats, and a round-robin scheduler, which is an extension of the FIFO-scheduler that adds fixed time slices. \cite{MAN:linux/sched} 82 124 83 125 \paragraph{FreeBSD} 84 The ULE scheduler used in FreeBSD\cite{DBLP:conf/bsdcon/Roberson03} is a feedback scheduler similar to Linux's CFS. It uses different data structures and heuristics but also schedules according to some combination of CPU time used and niceness values. It also periodically balances the load of the system (according to a different heuristic), but uses a simpler work stealing approach. 126 The ULE scheduler used in FreeBSD\cite{DBLP:conf/bsdcon/Roberson03} is a feedback scheduler similar to Linux's CFS. 127 It uses different data structures and heuristics but also schedules according to some combination of CPU time used and niceness values. 128 It also periodically balances the load of the system (according to a different heuristic), but uses a simpler work stealing approach. 85 129 86 130 \paragraph{Windows(OS)} 87 Microsoft's Operating System's Scheduler~\cite{MAN:windows/scheduler} is a feedback scheduler with priorities. It supports 32 levels of priorities, some of which are reserved for real-time and privileged applications. It schedules \glspl{at} based on the highest priorities (lowest number) and how much CPU time each \gls{at} has used. The scheduler may also temporarily adjust priorities after certain effects like the completion of I/O requests. 131 Microsoft's Operating System's Scheduler~\cite{MAN:windows/scheduler} is a feedback scheduler with priorities. 132 It supports 32 levels of priorities, some of which are reserved for real-time and privileged applications. 133 It schedules \ats based on the highest priorities (lowest number) and how much CPU time each \ats has used. 134 The scheduler may also temporarily adjust priorities after certain effects like the completion of I/O requests. 88 135 89 136 \todo{load balancing} … … 102 149 103 150 \subsection{User-Level Schedulers} 104 By comparison, user level schedulers tend to be simpler, gathering fewer metrics and avoid complex notions of fairness. Part of the simplicity is due to the fact that all \ glspl{at}have the same user, and therefore cooperation is both feasible and probable.151 By comparison, user level schedulers tend to be simpler, gathering fewer metrics and avoid complex notions of fairness. Part of the simplicity is due to the fact that all \ats have the same user, and therefore cooperation is both feasible and probable. 105 152 106 153 \paragraph{Go}\label{GoSafePoint} 107 Go's scheduler uses a randomized work-stealing algorithm that has a global run-queue (\emph{GRQ}) and each processor (\emph{P}) has both a fixed-size run-queue (\emph{LRQ}) and a high-priority next ``chair'' holding a single element~\cite{GITHUB:go,YTUBE:go}. Preemption is present, but only at safe-points,~\cit{https://go.dev/src/runtime/preempt.go} which are inserted detection code at various frequent access boundaries. 154 Go's scheduler uses a randomized work-stealing algorithm that has a global run-queue (\emph{GRQ}) and each processor (\emph{P}) has both a fixed-size run-queue (\emph{LRQ}) and a high-priority next ``chair'' holding a single element~\cite{GITHUB:go,YTUBE:go}. 155 Preemption is present, but only at safe-points,~\cit{https://go.dev/src/runtime/preempt.go} which are inserted detection code at various frequent access boundaries. 108 156 109 157 The algorithm is as follows : … … 119 167 120 168 \paragraph{Erlang} 121 Erlang is a functional language that supports concurrency in the form of processes: threads that share no data. It uses a kind of round-robin scheduler, with a mix of work sharing and stealing to achieve load balancing~\cite{:erlang}, where under-loaded workers steal from other workers, but overloaded workers also push work to other workers. This migration logic is directed by monitoring logic that evaluates the load a few times per seconds. 169 Erlang is a functional language that supports concurrency in the form of processes: threads that share no data. 170 It uses a kind of round-robin scheduler, with a mix of work sharing and stealing to achieve load balancing~\cite{:erlang}, where under-loaded workers steal from other workers, but overloaded workers also push work to other workers. 171 This migration logic is directed by monitoring logic that evaluates the load a few times per seconds. 122 172 123 173 \paragraph{Intel\textregistered ~Threading Building Blocks} 124 \newterm{Thread Building Blocks} (TBB) is Intel's task parallelism \cite{wiki:taskparallel} framework. It runs \newterm{jobs}, which are uninterruptable \glspl{at} that must always run to completion, on a pool of worker threads. TBB's scheduler is a variation of randomized work-stealing that also supports higher-priority graph-like dependencies~\cite{MAN:tbb/scheduler}. It schedules \glspl{at} as follows (where \textit{t} is the last \gls{at} completed): 174 \newterm{Thread Building Blocks} (TBB) is Intel's task parallelism \cite{wiki:taskparallel} framework. 175 It runs \newterm{jobs}, which are uninterruptable \ats that must always run to completion, on a pool of worker threads. 176 TBB's scheduler is a variation of randomized work-stealing that also supports higher-priority graph-like dependencies~\cite{MAN:tbb/scheduler}. 177 It schedules \ats as follows (where \textit{t} is the last \ats completed): 125 178 \begin{displayquote} 126 179 \begin{enumerate} … … 139 192 140 193 \paragraph{Quasar/Project Loom} 141 Java has two projects, Quasar~\cite{MAN:quasar} and Project Loom~\cite{MAN:project-loom}\footnote{It is unclear if these are distinct projects.}, that are attempting to introduce lightweight thread\-ing in the form of Fibers. Both projects seem to be based on the \texttt{ForkJoinPool} in Java, which appears to be a simple incarnation of randomized work-stealing~\cite{MAN:java/fork-join}. 194 Java has two projects, Quasar~\cite{MAN:quasar} and Project Loom~\cite{MAN:project-loom}\footnote{It is unclear if these are distinct projects.}, that are attempting to introduce lightweight thread\-ing in the form of Fibers. 195 Both projects seem to be based on the \texttt{ForkJoinPool} in Java, which appears to be a simple incarnation of randomized work-stealing~\cite{MAN:java/fork-join}. 142 196 143 197 \paragraph{Grand Central Dispatch} 144 An Apple\cit{Official GCD source} API that offers task parallelism~\cite{wiki:taskparallel}. Its distinctive aspect is multiple ``Dispatch Queues'', some of which are created by programmers. Each queue has its own local ordering guarantees, \eg \glspl{at} on queue $A$ are executed in \emph{FIFO} order. 198 An Apple\cit{Official GCD source} API that offers task parallelism~\cite{wiki:taskparallel}. 199 Its distinctive aspect is multiple ``Dispatch Queues'', some of which are created by programmers. 200 Each queue has its own local ordering guarantees, \eg \ats on queue $A$ are executed in \emph{FIFO} order. 145 201 146 202 \todo{load balancing and scheduling} … … 148 204 % http://web.archive.org/web/20090920043909/http://images.apple.com/macosx/technology/docs/GrandCentral_TB_brief_20090903.pdf 149 205 150 In terms of semantics, the Dispatch Queues seem to be very similar in semantics to Intel\textregistered ~TBB \texttt{execute()} and predecessor semantics. % Where it would be possible to convert from one to the other.206 In terms of semantics, the Dispatch Queues seem to be very similar to Intel\textregistered ~TBB \texttt{execute()} and predecessor semantics. 151 207 152 208 \paragraph{LibFibre} 153 LibFibre~\cite{DBLP:journals/pomacs/KarstenB20} is a light-weight user-level threading framework developed at the University of Waterloo. Similarly to Go, it uses a variation of work stealing with a global queue that is higher priority than stealing. Unlike Go, it does not have the high-priority next ``chair'' and does not use randomized work-stealing. 209 LibFibre~\cite{DBLP:journals/pomacs/KarstenB20} is a light-weight user-level threading framework developed at the University of Waterloo. 210 Similarly to Go, it uses a variation of work stealing with a global queue that is higher priority than stealing. 211 Unlike Go, it does not have the high-priority next ``chair'' and does not use randomized work-stealing. -
doc/theses/thierry_delisle_PhD/thesis/text/intro.tex
rfc2c57a9 r06bdba4 1 1 \chapter{Introduction}\label{intro} 2 \ todo{A proper intro}2 \section{\CFA programming language} 3 3 4 Any shared system needs scheduling. 5 Computer systems share multiple resources across many threads of execution, even on single user computers like a laptop. 4 The \CFA programming language~\cite{cfa:frontpage,cfa:typesystem} extends the C programming language by adding modern safety and productivity features, while maintaining backwards compatibility. 5 Among its productivity features, \CFA supports user-level threading~\cite{Delisle21} allowing programmers to write modern concurrent and parallel programs. 6 My previous master's thesis on concurrent in \CFA focused on features and interfaces. 7 This Ph.D.\ thesis focuses on performance, introducing \glsxtrshort{api} changes only when required by performance considerations. 8 Specifically, this work concentrates on scheduling and \glsxtrshort{io}. 9 Prior to this work, the \CFA runtime used a strict \glsxtrshort{fifo} \gls{rQ} and no \glsxtrshort{io} capabilities at the user-thread level\footnote{C supports \glsxtrshort{io} capabilities at the kernel level, which means blocking operations block kernel threads where blocking user-level threads whould be more appropriate for \CFA.}. 10 11 As a research project, this work builds exclusively on newer versions of the Linux operating-system and gcc/clang compilers. 12 While \CFA is released, supporting older versions of Linux ($<$~Ubuntu 16.04) and gcc/clang compilers ($<$~gcc 6.0) is not a goal of this work. 13 14 \section{Scheduling} 15 Computer systems share multiple resources across many threads of execution, even on single user computers like laptops or smartphones. 16 On a computer system with multiple processors and work units, there exists the problem of mapping work onto processors in an efficient manner, called \newterm{scheduling}. 17 These systems are normally \newterm{open}, meaning new work arrives from an external source or is spawned from an existing work unit. 18 On a computer system, the scheduler takes a sequence of work requests in the form of threads and attempts to complete the work, subject to performance objectives, such as resource utilization. 19 A general-purpose dynamic-scheduler for an open system cannot anticipate future work requests, so its performance is rarely optimal. 20 With complete knowledge of arrive order and work, creating an optimal solution still effectively needs solving the bin packing problem\cite{wiki:binpak}. 21 However, optimal solutions are often not required. 22 Schedulers do produce excellent solutions, whitout needing optimality, by taking advantage of regularities in work patterns. 23 6 24 Scheduling occurs at discreet points when there are transitions in a system. 7 25 For example, a thread cycles through the following transitions during its execution. … … 21 39 \item 22 40 normal completion or error, \ie segment fault (running $\rightarrow$ halted) 41 \item 42 scheduler assigns a thread to a resource (ready $\rightarrow$ running) 23 43 \end{itemize} 24 44 Key to scheduling is that a thread cannot bypass the ``ready'' state during a transition so the scheduler maintains complete control of the system. 25 45 26 In detail, in a computer system with multiple processors and work units, there exists the problem of mapping work onto processors in an efficient manner, called \newterm{scheduling}.27 These systems are normally \newterm{open}, meaning new work arrives from an external source or spawned from an existing work unit.28 Scheduling is a zero-sum game as computer processors normally have a fixed, maximum number of cycles per unit time.29 (Frequency scaling and turbot boost add a degree of complexity that can be ignored in this discussion without loss of generality.)30 31 On a computer system, the scheduler takes a sequence of work requests in the form of threads and attempts to complete the work, subject to performance objectives, such as resource utilization.32 A general-purpose dynamic-scheduler for an open system cannot anticipate future work requests, so its performance is rarely optimal.33 Even with complete knowledge of arrive order and work, an optimal solution is NP (bin packing).34 However, schedulers do take advantage of regularities in work patterns to produce excellent solutions.35 Nevertheless, schedulers are a series of compromises, occasionally with some static or dynamic tuning parameters to enhance specific patterns.36 37 46 When the workload exceeds the capacity of the processors, \ie work cannot be executed immediately, it is placed on a queue for subsequent service, called a \newterm{ready queue}. 38 (In real-life, a work unit (person) can \newterm{balk}, and leave the system rather than wait.)39 47 Ready queues organize threads for scheduling, which indirectly organizes the work to be performed. 40 The structure of ready queues forms a spectrum.41 At one end is the single-queue multi-server (SIMS) and at the other end is the multi-queue multi-server (MOMS).48 The structure of ready queues can take many different forms. 49 Where simple examples include single-queue multi-server (SQMS) and the multi-queue multi-server (MQMS). 42 50 \begin{center} 43 51 \begin{tabular}{l|l} 44 \multicolumn{1}{c|}{\textbf{S IMS}} & \multicolumn{1}{c}{\textbf{MOMS}} \\52 \multicolumn{1}{c|}{\textbf{SQMS}} & \multicolumn{1}{c}{\textbf{MQMS}} \\ 45 53 \hline 46 54 \raisebox{0.5\totalheight}{\input{SQMS.pstex_t}} & \input{MQMSG.pstex_t} 47 55 \end{tabular} 48 56 \end{center} 49 (While \newterm{pipeline} organizations also exist, \ie chaining schedulers together, they do not provide an advantage in this context.) 57 Beyond these two schedulers are a host of options, \ie adding an optional global, shared queue to MQMS. 50 58 51 59 The three major optimization criteria for a scheduler are: … … 60 68 61 69 \noindent 62 Essentially, all multi-processor computers have non-uniform memory access (NUM B), with one or more quantized steps to access data at different levels (steps)in the memory hierarchy.63 When a system has a large number of independently executing threads, affinity is impossiblebecause of \newterm{thread churn}.70 Essentially, all multi-processor computers have non-uniform memory access (NUMA), with one or more quantized steps to access data at different levels in the memory hierarchy. 71 When a system has a large number of independently executing threads, affinity becomes difficult because of \newterm{thread churn}. 64 72 That is, threads must be scheduled on multiple processors to obtain high processors utilization because the number of threads $\ggg$ processors. 65 73 66 74 \item 67 \newterm{contention}: safe access of shared objects by multiple processors requires mutual exclusion in the form oflocking\footnote{68 Lock-free data-structures is still locking because all forms of locking invoke delays.}75 \newterm{contention}: safe access of shared objects by multiple processors requires mutual exclusion in some form, generally locking\footnote{ 76 Lock-free data-structures do not involve locking but incurr similar costs to achieve mutual exclusion.} 69 77 70 78 \noindent 71 Lockingcost and latency increases significantly with the number of processors accessing a shared object.79 Mutual exclusion cost and latency increases significantly with the number of processors accessing a shared object. 72 80 \end{enumerate} 73 81 74 SIMS has perfect load-balancing but poor affinity and high contention by the processors, because of the single queue. 75 MOMS has poor load-balancing but perfect affinity and no contention, because each processor has its own queue. 76 Between these two schedulers are a host of options, \ie adding an optional global, shared ready-queue to MOMS. 82 Nevertheless, schedulers are a series of compromises, occasionally with some static or dynamic tuning parameters to enhance specific patterns. 83 Scheduling is a zero-sum game as computer processors normally have a fixed, maximum number of cycles per unit time\footnote{Frequency scaling and turbot boost add a degree of complexity that can be ignored in this discussion without loss of generality.}. 84 SQMS has perfect load-balancing but poor affinity and high contention by the processors, because of the single queue. 85 MQMS has poor load-balancing but perfect affinity and no contention, because each processor has its own queue. 86 77 87 Significant research effort has also looked at load sharing/stealing among queues, when a ready queue is too long or short, respectively. 78 88 These approaches attempt to perform better load-balancing at the cost of affinity and contention. 79 89 Load sharing/stealing schedulers attempt to push/pull work units to/from other ready queues 80 90 81 Note , a computer system is often lightly loaded (or has no load);82 any scheduler can handle this situation, \ie all schedulers are equal in this context.83 As the workload increases, there is a point where some schedulers begin to perform better than others based on the above criteria.84 A poorer scheduler might saturate for some reason and not be able to assign available work to idle processors, \ie processors are spinning when work is available.91 Note however that while any change comes at a cost, hence the zero-sum game, not all compromises are necessarily equivalent. 92 Some schedulers can perform very well only in very specific workload scenarios, others might offer acceptable performance but be applicable to a wider range of workloads. 93 Since \CFA attempts to improve the safety and productivity of C, the scheduler presented in this thesis attempts to achieve the same goals. 94 More specifically, safety and productivity for scheduling means supporting a wide range of workloads so that programmers can rely on progress guarantees (safety) and more easily achieve acceptable performance (productivity). 85 95 86 96 87 \section{\CFA programming language} 88 89 \todo{Brief discussion on \CFA features used in the thesis.} 90 91 The \CFA programming language~\cite{cfa:frontpage,cfa:typesystem} extends the C programming language by adding modern safety and productivity features, while maintaining backwards compatibility. Among its productivity features, \CFA supports user-level threading~\cite{Delisle21} allowing programmers to write modern concurrent and parallel programs. 92 My previous master's thesis on concurrent in \CFA focused on features and interfaces. 93 This Ph.D.\ thesis focuses on performance, introducing \glsxtrshort{api} changes only when required by performance considerations. Specifically, this work concentrates on scheduling and \glsxtrshort{io}. Prior to this work, the \CFA runtime used a strict \glsxtrshort{fifo} \gls{rQ} and no non-blocking I/O capabilities at the user-thread level. 94 95 As a research project, this work builds exclusively on newer versions of the Linux operating-system and gcc/clang compilers. While \CFA is released, supporting older versions of Linux ($<$~Ubuntu 16.04) and gcc/clang compilers ($<$~gcc 6.0) is not a goal of this work. 96 97 98 \section{Contributions} 99 \label{s:Contributions} 100 97 \section{Contributions}\label{s:Contributions} 101 98 This work provides the following contributions in the area of user-level scheduling in an advanced programming-language runtime-system: 102 99 \begin{enumerate}[leftmargin=*] 103 100 \item 104 Guarantee no thread or set of threads can conspire to prevent other threads from executing. 105 \begin{itemize} 101 A scalable scheduling algorithm that offers progress guarantees. 106 102 \item 107 There must exist some form of preemption to prevent CPU-bound threads from gaining exclusive access to one or more processors.103 An algorithm for load-balancing and idle sleep of processors, including NUMA awareness. 108 104 \item 109 There must be a scheduler guarantee that threads are executed after CPU-bound threads are preempted. 110 Otherwise, CPU-bound threads can immediately rescheduled, negating the preemption. 111 \end{itemize} 112 Hence, the runtime/scheduler provides a notion of fairness that is impossible for a programmer to violate. 113 \item 114 Provide comprehensive scheduling across all forms of preemption and blocking, where threads move from the running to ready or blocked states. 115 116 Once a thread stops running, the processor executing it must be rescheduled, if possible. 117 Knowing what threads are in the ready state for scheduling is difficult because blocking operations complete asynchronous and with poor notification. 118 \item 119 Efficiently deal with unbalanced workloads among processors. 120 121 Simpler to 122 \item 123 Efficiently deal with idle processors when there is less work than the available computing capacity. 105 Support for user-level \glsxtrshort{io} capabilities based on Linux's \texttt{io\_uring}. 124 106 \end{enumerate} -
doc/theses/thierry_delisle_PhD/thesis/text/runtime.tex
rfc2c57a9 r06bdba4 4 4 \section{C Threading} 5 5 6 \Celeven introduced threading features, such the @_Thread_local@ storage class, and libraries @stdatomic.h@ and @threads.h@. Interestingly, almost a decade after the \Celeven standard, the most recent versions of gcc, clang, and msvc do not support the \Celeven include @threads.h@, indicating no interest in the C11 concurrency approach (possibly because of the recent effort to add concurrency to \CC). While the \Celeven standard does not state a threading model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}, as for \CC. This model uses \glspl{kthrd} to achieve parallelism and concurrency. In this model, every thread of computation maps to an object in the kernel. The kernel then has the responsibility of managing these threads, \eg creating, scheduling, blocking. A consequence of this approach is that the kernel has a perfect view of every thread executing in the system\footnote{This is not completely true due to primitives like \lstinline|futex|es, which have a significant portion of their logic in user space.}. 6 \Celeven introduced threading features, such the @_Thread_local@ storage class, and libraries @stdatomic.h@ and @threads.h@. 7 Interestingly, almost a decade after the \Celeven standard, the most recent versions of gcc, clang, and msvc do not support the \Celeven include @threads.h@, indicating no interest in the C11 concurrency approach (possibly because of the recent effort to add concurrency to \CC). 8 While the \Celeven standard does not state a threading model, the historical association with pthreads suggests implementations would adopt kernel-level threading (1:1)~\cite{ThreadModel}, as for \CC. 9 This model uses \glspl{kthrd} to achieve parallelism and concurrency. In this model, every thread of computation maps to an object in the kernel. 10 The kernel then has the responsibility of managing these threads, \eg creating, scheduling, blocking. 11 A consequence of this approach is that the kernel has a perfect view of every thread executing in the system\footnote{This is not completely true due to primitives like \lstinline|futex|es, which have a significant portion of their logic in user space.}. 7 12 8 13 \section{M:N Threading}\label{prev:model} … … 10 15 Threading in \CFA is based on \Gls{uthrding}, where \glspl{thrd} are the representation of a unit of work. As such, \CFA programmers should expect these units to be fairly inexpensive, \ie programmers should be able to create a large number of \glspl{thrd} and switch among \glspl{thrd} liberally without many concerns for performance. 11 16 12 The \CFA M:N threading models is implemented using many user-level threads mapped onto fewer \glspl{kthrd}. The user-level threads have the same semantic meaning as a \glspl{kthrd} in the 1:1 model: they represent an independent thread of execution with its own stack. The difference is that user-level threads do not have a corresponding object in the kernel; they are handled by the runtime in user space and scheduled onto \glspl{kthrd}, referred to as \glspl{proc} in this document. \Glspl{proc} run a \gls{thrd} until it context switches out, it then chooses a different \gls{thrd} to run. 17 The \CFA M:N threading models is implemented using many user-level threads mapped onto fewer \glspl{kthrd}. 18 The user-level threads have the same semantic meaning as a \glspl{kthrd} in the 1:1 model: they represent an independent thread of execution with its own stack. 19 The difference is that user-level threads do not have a corresponding object in the kernel; they are handled by the runtime in user space and scheduled onto \glspl{kthrd}, referred to as \glspl{proc} in this document. \Glspl{proc} run a \gls{thrd} until it context switches out, it then chooses a different \gls{thrd} to run. 13 20 14 21 \section{Clusters} 15 \CFA allows the option to group user-level threading, in the form of clusters. Both \glspl{thrd} and \glspl{proc} belong to a specific cluster. \Glspl{thrd} are only scheduled onto \glspl{proc} in the same cluster and scheduling is done independently of other clusters. Figure~\ref{fig:system} shows an overview of the \CFA runtime, which allows programmers to tightly control parallelism. It also opens the door to handling effects like NUMA, by pinning clusters to a specific NUMA node\footnote{This capability is not currently implemented in \CFA, but the only hurdle left is creating a generic interface for CPU masks.}. 22 \CFA allows the option to group user-level threading, in the form of clusters. 23 Both \glspl{thrd} and \glspl{proc} belong to a specific cluster. 24 \Glspl{thrd} are only scheduled onto \glspl{proc} in the same cluster and scheduling is done independently of other clusters. 25 Figure~\ref{fig:system} shows an overview of the \CFA runtime, which allows programmers to tightly control parallelism. 26 It also opens the door to handling effects like NUMA, by pinning clusters to a specific NUMA node\footnote{This capability is not currently implemented in \CFA, but the only hurdle left is creating a generic interface for CPU masks.}. 16 27 17 28 \begin{figure} … … 30 41 31 42 \begin{quote} 32 Given a simple network program with 2 \glspl{thrd} and a single \gls{proc}, one \gls{thrd} sends network requests to a server and the other \gls{thrd} waits for a response from the server. If the second \gls{thrd} races ahead, it may wait for responses to requests that have not been sent yet. In theory, this should not be a problem, even if the second \gls{thrd} waits, because the first \gls{thrd} is still ready to run and should be able to get CPU time to send the request. With M:N threading, while the first \gls{thrd} is ready, the lone \gls{proc} \emph{cannot} run the first \gls{thrd} if it is blocked in the \glsxtrshort{io} operation of the second \gls{thrd}. If this happen, the system is in a synchronization deadlock\footnote{In this example, the deadlock could be resolved if the server sends unprompted messages to the client. However, this solution is neither general nor appropriate even in this simple case.}. 43 Given a simple network program with 2 \glspl{thrd} and a single \gls{proc}, one \gls{thrd} sends network requests to a server and the other \gls{thrd} waits for a response from the server. 44 If the second \gls{thrd} races ahead, it may wait for responses to requests that have not been sent yet. 45 In theory, this should not be a problem, even if the second \gls{thrd} waits, because the first \gls{thrd} is still ready to run and should be able to get CPU time to send the request. 46 With M:N threading, while the first \gls{thrd} is ready, the lone \gls{proc} \emph{cannot} run the first \gls{thrd} if it is blocked in the \glsxtrshort{io} operation of the second \gls{thrd}. 47 If this happen, the system is in a synchronization deadlock\footnote{In this example, the deadlock could be resolved if the server sends unprompted messages to the client. 48 However, this solution is neither general nor appropriate even in this simple case.}. 33 49 \end{quote} 34 50 35 Therefore, one of the objective of this work is to introduce \emph{User-Level \glsxtrshort{io}}, which like \glslink{uthrding}{User-Level \emph{Threading}}, blocks \glspl{thrd} rather than \glspl{proc} when doing \glsxtrshort{io} operations. This feature entails multiplexing the \glsxtrshort{io} operations of many \glspl{thrd} onto fewer \glspl{proc}. The multiplexing requires a single \gls{proc} to execute multiple \glsxtrshort{io} operations in parallel. This requirement cannot be done with operations that block \glspl{proc}, \ie \glspl{kthrd}, since the first operation would prevent starting new operations for its blocking duration. Executing \glsxtrshort{io} operations in parallel requires \emph{asynchronous} \glsxtrshort{io}, sometimes referred to as \emph{non-blocking}, since the \gls{kthrd} does not block. 51 Therefore, one of the objective of this work is to introduce \emph{User-Level \glsxtrshort{io}}, which like \glslink{uthrding}{User-Level \emph{Threading}}, blocks \glspl{thrd} rather than \glspl{proc} when doing \glsxtrshort{io} ope rations. 52 This feature entails multiplexing the \glsxtrshort{io} operations of many \glspl{thrd} onto fewer \glspl{proc}. 53 The multiplexing requires a single \gls{proc} to execute multiple \glsxtrshort{io} operations in parallel. 54 This requirement cannot be done with operations that block \glspl{proc}, \ie \glspl{kthrd}, since the first operation would prevent starting new operations for its blocking duration. 55 Executing \glsxtrshort{io} operations in parallel requires \emph{asynchronous} \glsxtrshort{io}, sometimes referred to as \emph{non-blocking}, since the \gls{kthrd} does not block. 36 56 37 57 \section{Interoperating with C} … … 49 69 \item Introducing safe-point code (see Go~page~\pageref{GoSafePoint}) can have a significant impact on general performance. 50 70 \end{enumerate} 51 Because of these consequences, this work does not attempt to ``sandbox'' calls to C. Therefore, it is possible calls to an unknown library function can block a \gls{kthrd} leading to deadlocks in \CFA's M:N threading model, which would not occur in a traditional 1:1 threading model. Currently, all M:N thread systems interacting with UNIX without sandboxing suffer from this problem but manage to work very well in the majority of applications. Therefore, a complete solution to this problem is outside the scope of this thesis.\footnote{\CFA does provide a pthreads emulation, so any library function using embedded pthreads locks are redirected to \CFA user-level locks. This capability further reduces the chances of blocking a \gls{kthrd}.} 71 Because of these consequences, this work does not attempt to ``sandbox'' calls to C. 72 Therefore, it is possible calls to an unknown library function can block a \gls{kthrd} leading to deadlocks in \CFA's M:N threading model, which would not occur in a traditional 1:1 threading model. 73 Currently, all M:N thread systems interacting with UNIX without sandboxing suffer from this problem but manage to work very well in the majority of applications. 74 Therefore, a complete solution to this problem is outside the scope of this thesis.\footnote{\CFA does provide a pthreads emulation, so any library function using embedded pthreads locks are redirected to \CFA user-level locks. This capability further reduces the chances of blocking a \gls{kthrd}.} -
libcfa/src/concurrency/locks.hfa
rfc2c57a9 r06bdba4 533 533 #endif 534 534 lock( lock, node ); 535 while( held) Pause();536 held = true;535 while(__atomic_load_n(&held, __ATOMIC_SEQ_CST)) Pause(); 536 __atomic_store_n(&held, true, __ATOMIC_SEQ_CST); 537 537 unlock( lock, node ); 538 538 #ifdef __CFA_DEBUG__ … … 545 545 owner = 0p; 546 546 #endif 547 held = false;547 __atomic_store_n(&held, false, __ATOMIC_SEQ_CST); 548 548 } 549 549 … … 586 586 #endif 587 587 lock( lock ); 588 while( held) Pause();589 held = true;588 while(__atomic_load_n(&held, __ATOMIC_SEQ_CST)) Pause(); 589 __atomic_store_n(&held, true, __ATOMIC_RELEASE); 590 590 unlock( lock ); 591 591 #ifdef __CFA_DEBUG__ … … 598 598 owner = 0p; 599 599 #endif 600 held = false;600 __atomic_store_n(&held, false, __ATOMIC_RELEASE); 601 601 } 602 602 -
src/AST/Decl.hpp
rfc2c57a9 r06bdba4 316 316 EnumDecl( const CodeLocation& loc, const std::string& name, 317 317 std::vector<ptr<Attribute>>&& attrs = {}, Linkage::Spec linkage = Linkage::Cforall, Type * base = nullptr, 318 318 std::unordered_map< std::string, long long > enumValues = std::unordered_map< std::string, long long >() ) 319 319 : AggregateDecl( loc, name, std::move(attrs), linkage ), base(base), enumValues(enumValues) {} 320 320 -
src/AST/Inspect.cpp
rfc2c57a9 r06bdba4 5 5 // file "LICENCE" distributed with Cforall. 6 6 // 7 // Node.hpp --7 // Inspect.cpp -- Helpers to get information from the AST. 8 8 // 9 9 // Author : Thierry Delisle 10 10 // Created On : Fri Jun 24 13:16:31 2022 11 // Last Modified By : 12 // Last Modified On : 13 // Update Count : 11 // Last Modified By : Andrew Beach 12 // Last Modified On : Mon Jun 27 15:35:00 2022 13 // Update Count : 1 14 14 // 15 15 … … 21 21 22 22 namespace ast { 23 bool structHasFlexibleArray( const ast::StructDecl * decl ) { 24 if(decl->members.size() == 0) return false; 25 const auto & last = *decl->members.rbegin(); 26 auto lastd = last.as<ast::DeclWithType>(); 27 if(!lastd) return false; // I don't know what this is possible, but it might be. 28 auto atype = dynamic_cast<const ast::ArrayType *>(lastd->get_type()); 29 if(!atype) return false; 30 return !atype->isVarLen && !atype->dimension; 31 } 32 }; 23 24 bool structHasFlexibleArray( const ast::StructDecl * decl ) { 25 if(decl->members.size() == 0) return false; 26 const auto & last = *decl->members.rbegin(); 27 auto lastd = last.as<ast::DeclWithType>(); 28 // I don't know what this is possible, but it might be. 29 if(!lastd) return false; 30 auto atype = dynamic_cast<const ast::ArrayType *>(lastd->get_type()); 31 if(!atype) return false; 32 return !atype->isVarLen && !atype->dimension; 33 } 34 35 } // namespace ast -
src/AST/Inspect.hpp
rfc2c57a9 r06bdba4 5 5 // file "LICENCE" distributed with Cforall. 6 6 // 7 // Node.hpp --7 // Inspect.hpp -- Helpers to get information from the AST. 8 8 // 9 9 // Author : Thierry Delisle 10 10 // Created On : Fri Jun 24 13:16:31 2022 11 // Last Modified By : 12 // Last Modified On : 13 // Update Count : 11 // Last Modified By : Andrew Beach 12 // Last Modified On : Mon Jun 27 15:35:00 2022 13 // Update Count : 1 14 14 // 15 15 … … 17 17 18 18 namespace ast { 19 bool structHasFlexibleArray( const ast::StructDecl * ); 19 20 // Does the structure end in a flexable array declaration? 21 bool structHasFlexibleArray( const ast::StructDecl * ); 22 20 23 } -
src/SymTab/Validate.cc
rfc2c57a9 r06bdba4 312 312 Stats::Heap::newPass("validate-B"); 313 313 Stats::Time::BlockGuard guard("validate-B"); 314 //linkReferenceToTypes( translationUnit );314 linkReferenceToTypes( translationUnit ); // Must happen before auto-gen, because it uses the sized flag. 315 315 mutateAll( translationUnit, fixQual ); // must happen after LinkReferenceToTypes_old, because aggregate members are accessed 316 316 HoistStruct::hoistStruct( translationUnit ); -
src/Validate/module.mk
rfc2c57a9 r06bdba4 41 41 Validate/LabelAddressFixer.cpp \ 42 42 Validate/LabelAddressFixer.hpp \ 43 Validate/LinkReferenceToTypes.cpp \ 44 Validate/LinkReferenceToTypes.hpp \ 43 45 Validate/NoIdSymbolTable.hpp \ 44 46 Validate/ReturnCheck.cpp \ -
src/main.cc
rfc2c57a9 r06bdba4 85 85 #include "Validate/InitializerLength.hpp" // for setLengthFromInitializer 86 86 #include "Validate/LabelAddressFixer.hpp" // for fixLabelAddresses 87 #include "Validate/LinkReferenceToTypes.hpp" // for linkReferenceToTypes 87 88 #include "Validate/ReturnCheck.hpp" // for checkReturnStatements 88 89 #include "Virtual/ExpandCasts.h" // for expandCasts … … 333 334 PASS( "Validate-A", SymTab::validate_A( translationUnit ) ); 334 335 335 // Must happen before auto-gen, because it uses the sized flag.336 PASS( "Link Reference To Types", SymTab::linkReferenceToTypes( translationUnit ) );337 338 336 CodeTools::fillLocations( translationUnit ); 339 337 … … 348 346 349 347 forceFillCodeLocations( transUnit ); 348 349 // Must happen before auto-gen, because it uses the sized flag. 350 PASS( "Link Reference To Types", Validate::linkReferenceToTypes( transUnit ) ); 350 351 351 352 // Must happen after Link References To Types, -
tests/enum_tests/structEnum.cfa
rfc2c57a9 r06bdba4 24 24 int main() { 25 25 printf("%d %c\n", apple.x, apple.y); 26 // Failed; enumInstType is now not a real type and not instantiated. 26 // Failed; enumInstType is now not a real type and not instantiated. 27 27 // Not sure if we want that 28 28 // printf("%d %c\n", second.x, second.y); 29 29 return 0; 30 30 } 31 32 33
Note: See TracChangeset
for help on using the changeset viewer.