Changeset 43aec9e


Timestamp:
Jun 27, 2022, 5:18:01 PM
Author:
Thierry Delisle <tdelisle@…>
Branches:
master, pthread-emulation
Children:
72e76fd
Parents:
7b71402
Message:

Merged some of peter's changes

File:
1 edited

  • doc/theses/thierry_delisle_PhD/thesis/text/existing.tex

    r7b71402 r43aec9e  
    1414
    1515\section{Naming Convention}
    16 Scheduling has been studied by various communities concentrating on different incarnation of the same problems. As a result, there is no standard naming conventions for scheduling that is respected across these communities. This document uses the term \newterm{\Gls{at}} to refer to the abstract objects being scheduled and the term \newterm{\Gls{proc}} to refer to the concrete objects executing these \glspl{at}.
     16Scheduling has been studied by various communities concentrating on different incarnations of the same problems. As a result, there are no standard naming conventions for scheduling that are respected across these communities. This document uses the term \newterm{\Gls{at}} to refer to the abstract objects being scheduled and the term \newterm{\Gls{proc}} to refer to the concrete objects executing these \glspl{at}.
    1717
    1818\section{Static Scheduling}
     
    2020The scheduler then processes this input ahead of time and produces a \newterm{schedule} the system follows during execution.
    2121This approach is popular in real-time systems since the need for strong guarantees justifies the cost of determining and supplying this information.
    22 In general, static schedulers are less relevant to this project because they require input from the programmers that a programming language does not have as part of its concurrency semantic.
     22In general, static schedulers are less relevant to this project because they require input from the programmers that the programming language does not have as part of its concurrency semantics.
    2323Specifying this information explicitly adds a significant burden to the programmer and reduces flexibility.
    2424For this reason, the \CFA scheduler does not require this information.
    2525
    2626\section{Dynamic Scheduling}
    27 \newterm{Dynamic schedulers} determine \gls{at} dependencies and costs (if at all) during scheduling.
    28 % Schedulers that support this detection at runtime are referred to as \newterm{Dynamic Schedulers}.
    29 Hence, unlike static scheduling, \gls{at} dependencies are conditional and detected at runtime. This detection takes the form of observing new \gls{at}(s) in the system and determining dependencies from their behaviour, including suspending or halting a \gls{at} that dynamically detects unfulfilled dependencies. Furthermore, each \gls{at} has the responsibility of adding dependent \glspl{at} back into the system once dependencies are fulfilled. As a consequence, the scheduler often has an incomplete view of the system, seeing only \glspl{at} with no pending dependencies.
     27\newterm{Dynamic schedulers} determine \gls{at} dependencies and costs during scheduling, if at all.
     28Hence, unlike static scheduling, \gls{at} dependencies are conditional and detected at runtime. This detection takes the form of observing new \gls{at}(s) in the system and determining dependencies from their behaviour, including suspending or halting a \gls{at} that dynamically detects unfulfilled dependencies.
     29Furthermore, each \gls{at} has the responsibility of adding dependent \glspl{at} back into the system once dependencies are fulfilled.
     30As a consequence, the scheduler often has an incomplete view of the system, seeing only \glspl{at} with no pending dependencies.
    3031
    3132\subsection{Explicitly Informed Dynamic Schedulers}
    32 While dynamic schedulers may not have an exhaustive list of dependencies for a \gls{at}, some information may be available about each \gls{at}, \eg expected duration, required resources, relative importance, \etc. When available, a scheduler can then use this information to direct the scheduling decisions. \cit{Examples of schedulers with more information} However, most programmers do not determine or even \emph{predict} this information;
    33 at best, the scheduler has only some imprecision information provided by the programmer, \eg, indicating a \glspl{at} takes approximately 3--7 seconds to complete, rather than exactly 5 seconds. Providing this kind of information is a significant programmer burden especially if the the information does not scale with the number of \glspl{at} and their complexity. For example, providing an exhaustive list of files read by 5 \glspl{at} is an easier requirement then providing an exhaustive list of memory addresses accessed by 10,000 independent \glspl{at}.
    34 
    35 Since the goal of this thesis is to provide an \emph{informed} scheduler as a replacement for \CFA's existing \emph{uninformed} scheduler, explicitly informed schedulers are less relevant to this project. Nevertheless, some strategies are worth mentioning.
     33While dynamic schedulers may not have an exhaustive list of dependencies for a \gls{at}, some information may be available about each \gls{at}, \eg expected duration, required resources, relative importance, \etc.
     34When available, a scheduler can then use this information to direct the scheduling decisions. \cit{Examples of schedulers with more information}
     35However, most programmers do not determine or even \emph{predict} this information;
     36at best, the scheduler has only some imprecise information provided by the programmer, \eg, indicating a \gls{at} takes approximately 3--7 seconds to complete, rather than exactly 5 seconds.
     37Providing this kind of information is a significant programmer burden especially if the information does not scale with the number of \glspl{at} and their complexity.
     38For example, providing an exhaustive list of files read by 5 \glspl{at} is an easier requirement than providing an exhaustive list of memory addresses accessed by 10,000 independent \glspl{at}.
     39
     40Since the goal of this thesis is to provide a scheduler as a replacement for \CFA's existing \emph{uninformed} scheduler, explicitly informed schedulers are less relevant to this project. Nevertheless, some strategies are worth mentioning.
    3641
    3742\subsubsection{Priority Scheduling}
    38 Common information used by schedulers to direct their algorithm is priorities. Each task is given a priority and higher-priority \glspl{at} are preferred to lower-priority ones. The simplest priority scheduling algorithm is to require that every \gls{at} have a distinct pre-established priority and always run the available \gls{at} with the highest priority. Asking programmers to provide an exhaustive set of unique priorities can be prohibitive when the system has a large number of \glspl{at}. It can therefore be desirable for schedulers to support \glspl{at} with identical priorities and/or automatically setting and adjusting priorities for \glspl{at}. The most common adopting some variant on priorities with overlaps and dynamic priority adjustments. For example, Microsoft Windows uses a pair of priorities
     43Priorities are a common kind of information used by schedulers to direct their algorithms.
     44Each \gls{at} is given a priority and higher-priority \glspl{at} are preferred to lower-priority ones.
     45The simplest priority scheduling algorithm is to require that every \gls{at} have a distinct pre-established priority and always run the available \gls{at} with the highest priority.
     46Asking programmers to provide an exhaustive set of unique priorities can be prohibitive when the system has a large number of \glspl{at}.
     47It can therefore be desirable for schedulers to support \glspl{at} with identical priorities and/or to automatically set and adjust priorities for \glspl{at}.
     48Most common operating systems use some variant on priorities with overlaps and dynamic priority adjustments.
     49For example, Microsoft Windows uses a pair of priorities
    3950\cit{https://docs.microsoft.com/en-us/windows/win32/procthread/scheduling-priorities,https://docs.microsoft.com/en-us/windows/win32/taskschd/taskschedulerschema-priority-settingstype-element}, one specified by users out of ten possible options and one adjusted by the system.
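As a minimal sketch of priority scheduling with overlapping priorities, the following Python fragment always runs the ready thread with the highest priority and falls back to arrival order when priorities are equal. The class name, the thread names and the convention that a larger number means a higher priority are assumptions made for illustration; they are not taken from any of the systems cited here.

```python
import heapq
from itertools import count


class PriorityScheduler:
    """Run the ready thread with the highest priority; FIFO among equals."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier arrivals run first

    def ready(self, name, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))

    def next(self):
        # Return the name of the next thread to run, or None if nothing is ready.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


sched = PriorityScheduler()
for name, prio in [("logger", 1), ("ui", 5), ("audio", 5), ("backup", 0)]:
    sched.ready(name, prio)
order = [sched.next() for _ in range(4)]
print(order)
```

Allowing ties removes the need for an exhaustive set of unique priorities, at the cost of needing a secondary rule (here arrival order) among equal-priority threads.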
    4051
     
    4455
    4556\subsubsection{Feedback Scheduling}
    46 As mentioned, schedulers may also gather information about each \glspl{at} to direct their decisions. This design effectively moves the scheduler into the realm of \newterm{Control Theory}~\cite{wiki:controltheory}. This information gathering does not generally involve programmers, and as such, does not increase programmer burden the same way explicitly provided information may. However, some feedback schedulers do allow programmers to offer additional information on certain \glspl{at}, in order to direct scheduling decisions. The important distinction being whether or not the scheduler can function without this additional information.
     57As mentioned, schedulers may also gather information about each \gls{at} to direct their decisions.
     58This design effectively moves the scheduler into the realm of \newterm{Control Theory}~\cite{wiki:controltheory}.
     59This information gathering does not generally involve programmers, and as such, does not increase programmer burden the same way explicitly provided information may.
     60However, some feedback schedulers do allow programmers to offer additional information on certain \glspl{at}, in order to direct scheduling decisions.
     61The important distinction is whether or not the scheduler can function without this additional information.
    4762
    4863
    4964\section{Work Stealing}\label{existing:workstealing}
    50 One of the most popular scheduling algorithm in practice (see~\ref{existing:prod}) is work stealing. This idea, introduce by \cite{DBLP:conf/fpca/BurtonS81}, effectively has each worker process its local \glspl{at} first, but allows the possibility for other workers to steal local \glspl{at} if they run out of \glspl{at}. \cite{DBLP:conf/focs/Blumofe94} introduced the more familiar incarnation of this, where each workers has a queue of \glspl{at} and workers without \glspl{at} steal \glspl{at} from random workers. (The Burton and Sleep algorithm had trees of \glspl{at} and steal only among neighbours). Blumofe and Leiserson also prove worst case space and time requirements for well-structured computations.
     65One of the most popular scheduling algorithms in practice (see~\ref{existing:prod}) is work stealing.
     66This idea, introduced by \cite{DBLP:conf/fpca/BurtonS81}, effectively has each worker process its local \glspl{at} first, but allows the possibility for other workers to steal local \glspl{at} if they run out of \glspl{at}.
     67\cite{DBLP:conf/focs/Blumofe94} introduced the more familiar incarnation of this, where each worker has a queue of \glspl{at} and workers without \glspl{at} steal \glspl{at} from random workers\footnote{The Burton and Sleep algorithm had trees of \glspl{at} and stole only among neighbours.}.
     68Blumofe and Leiserson also prove worst-case space and time requirements for well-structured computations.
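The queue-based variant described above can be sketched as a small single-threaded simulation in Python. This is only an illustration under stated assumptions: the \texttt{Worker} class and \texttt{run} function are hypothetical names, the round-by-round loop stands in for real parallel workers, and having the owner and the thief take from opposite ends of the queue is a common convention assumed here rather than something stated in the text.

```python
import random
from collections import deque


class Worker:
    def __init__(self, ident):
        self.ident = ident
        self.queue = deque()  # local queue of ready threads
        self.done = []        # threads this worker ended up running


def run(workers, rng):
    """Simulate the workers round by round until every queue is empty."""
    while any(w.queue for w in workers):
        for w in workers:
            if w.queue:
                task = w.queue.pop()           # owner takes from one end
            else:
                victim = rng.choice(workers)   # idle worker picks a random victim
                if victim is w or not victim.queue:
                    continue                   # failed steal, retry next round
                task = victim.queue.popleft()  # thief takes from the other end
            w.done.append(task)


workers = [Worker(i) for i in range(4)]
workers[0].queue.extend(range(20))  # all work starts on a single worker
run(workers, random.Random(0))
for w in workers:
    print(w.ident, w.done)
```

Every thread is run exactly once regardless of how the steals fall; only the distribution across workers depends on the random choices.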
    5169
    5270Many variations of this algorithm have been proposed over the years~\cite{DBLP:journals/ijpp/YangH18}, both optimizations of existing implementations and approaches that account for new metrics.
    5371
    54 \paragraph{Granularity} A significant portion of early work-stealing research concentrated on \newterm{Implicit Parallelism}~\cite{wiki:implicitpar}. Since the system is responsible for splitting the work, granularity is a challenge that cannot be left to programmers (as opposed to \newterm{Explicit Parallelism}\cite{wiki:explicitpar} where the burden can be left to programmers). In general, fine granularity is better for load balancing and coarse granularity reduces communication overhead. The best performance generally means finding a middle ground between the two. Several methods can be employed, but I believe these are less relevant for threads, which are generally explicit and more coarse grained.
     72\paragraph{Granularity} A significant portion of early work-stealing research concentrated on \newterm{Implicit Parallelism}~\cite{wiki:implicitpar}.
     73Since the system is responsible for splitting the work, granularity is a challenge that cannot be left to programmers, as opposed to \newterm{Explicit Parallelism}~\cite{wiki:explicitpar} where the burden can be left to programmers.
     74In general, fine granularity is better for load balancing and coarse granularity reduces communication overhead.
     75The best performance generally means finding a middle ground between the two.
     76Several methods can be employed, but I believe these are less relevant for threads, which are generally explicit and more coarse-grained.
    5577
    5678\paragraph{Task Placement} Since modern computers rely heavily on cache hierarchies\cit{Do I need a citation for this}, migrating \glspl{at} from one core to another can be .  \cite{DBLP:journals/tpds/SquillanteL93}
     
    6183
    6284\subsection{Theoretical Results}
    63 There is also a large body of research on the theoretical aspects of work stealing. These evaluate, for example, the cost of migration~\cite{DBLP:conf/sigmetrics/SquillanteN91,DBLP:journals/pe/EagerLZ86}, how affinity affects performance~\cite{DBLP:journals/tpds/SquillanteL93,DBLP:journals/mst/AcarBB02,DBLP:journals/ipl/SuksompongLS16} and theoretical models for heterogeneous systems~\cite{DBLP:journals/jpdc/MirchandaneyTS90,DBLP:journals/mst/BenderR02,DBLP:conf/sigmetrics/GastG10}. \cite{DBLP:journals/jacm/BlellochGM99} examines the space bounds of work stealing and \cite{DBLP:journals/siamcomp/BerenbrinkFG03} shows that for under-loaded systems, the scheduler completes its computations in finite time, \ie is \newterm{stable}. Others show that work stealing is applicable to various scheduling contexts~\cite{DBLP:journals/mst/AroraBP01,DBLP:journals/anor/TchiboukdjianGT13,DBLP:conf/isaac/TchiboukdjianGTRB10,DBLP:conf/ppopp/AgrawalLS10,DBLP:conf/spaa/AgrawalFLSSU14}. \cite{DBLP:conf/ipps/ColeR13} also studied how randomized work-stealing affects false sharing among \glspl{at}.
    64 
    65 However, as \cite{DBLP:journals/ijpp/YangH18} highlights, it is worth mentioning that this theoretical research has mainly focused on ``fully-strict'' computations, \ie workloads that can be fully represented with a direct acyclic graph. It is unclear how well these distributions represent workloads in real world scenarios.
     85There is also a large body of research on the theoretical aspects of work stealing. These evaluate, for example, the cost of migration~\cite{DBLP:conf/sigmetrics/SquillanteN91,DBLP:journals/pe/EagerLZ86}, how affinity affects performance~\cite{DBLP:journals/tpds/SquillanteL93,DBLP:journals/mst/AcarBB02,DBLP:journals/ipl/SuksompongLS16} and theoretical models for heterogeneous systems~\cite{DBLP:journals/jpdc/MirchandaneyTS90,DBLP:journals/mst/BenderR02,DBLP:conf/sigmetrics/GastG10}.
     86\cite{DBLP:journals/jacm/BlellochGM99} examines the space bounds of work stealing and \cite{DBLP:journals/siamcomp/BerenbrinkFG03} shows that for under-loaded systems, the scheduler completes its computations in finite time, \ie is \newterm{stable}.
     87Others show that work stealing is applicable to various scheduling contexts~\cite{DBLP:journals/mst/AroraBP01,DBLP:journals/anor/TchiboukdjianGT13,DBLP:conf/isaac/TchiboukdjianGTRB10,DBLP:conf/ppopp/AgrawalLS10,DBLP:conf/spaa/AgrawalFLSSU14}.
     88\cite{DBLP:conf/ipps/ColeR13} also studied how randomized work-stealing affects false sharing among \glspl{at}.
     89
     90However, as \cite{DBLP:journals/ijpp/YangH18} highlights, it is worth mentioning that this theoretical research has mainly focused on ``fully-strict'' computations, \ie workloads that can be fully represented with a directed acyclic graph.
     91It is unclear how well these distributions represent workloads in real-world scenarios.
    6692
    6793\section{Preemption}
    68 One last aspect of scheduling is preemption since many schedulers rely on it for some of their guarantees. Preemption is the idea of interrupting \glspl{at} that have been running too long, effectively injecting suspend points into the application. There are multiple techniques to achieve this effect but they all aim to guarantee suspend points in a \gls{at} are never further apart than some fixed duration. While this helps schedulers guarantee that no \glspl{at} unfairly monopolizes a worker, preemption can effectively added to any scheduler. Therefore, the only interesting aspect of preemption for the design of scheduling is whether or not to require it.
     94One last aspect of scheduling is preemption since many schedulers rely on it for some of their guarantees.
     95Preemption is the idea of interrupting \glspl{at} that have been running too long, effectively injecting suspend points into the application.
     96There are multiple techniques to achieve this effect but they all aim to guarantee that the suspend points in a \gls{at} are never further apart than some fixed duration.
     97While this helps schedulers guarantee that no \gls{at} unfairly monopolizes a worker, preemption can effectively be added to any scheduler.
     98Therefore, the only interesting aspect of preemption for the design of scheduling is whether or not to require it.
    6999
    70100\section{Production Schedulers}\label{existing:prod}
    71 This section presents a quick overview of several current schedulers. While these schedulers do not necessarily represent the most recent advances in scheduling, they are what is generally accessible to programmers. As such, I believe these schedulers are at least as relevant as those presented in published work. Schedulers that operate in kernel space and in user space are considered, as both can offer relevant insight for this project. However, real-time schedulers as not considered, as these have constraints that are much stricter than what is needed for this project.
     101This section presents a quick overview of several current schedulers.
     102While these schedulers do not necessarily represent the most recent advances in scheduling, they are what is generally accessible to programmers.
     103As such, I believe these schedulers are at least as relevant as those presented in published work.
     104Schedulers that operate in kernel space and in user space are considered, as both can offer relevant insight for this project.
     105However, real-time schedulers are not considered, as these have constraints that are much stricter than what is needed for this project.
    72106
    73107\subsection{Operating System Schedulers}
    74 Operating System Schedulers tend to be fairly complex as they generally support some amount of real-time, aim to balance interactive and non-interactive \glspl{at} and support multiple users sharing hardware without requiring these users to cooperate. Here are more details on a few schedulers used in the common operating systems: Linux, FreeBSD, Microsoft Windows and Apple's OS X. The information is less complete for operating systems with closed source.
     108Operating System Schedulers tend to be fairly complex as they generally support some amount of real-time, aim to balance interactive and non-interactive \glspl{at} and support multiple users sharing hardware without requiring these users to cooperate.
     109Here are more details on a few schedulers used in the common operating systems: Linux, FreeBSD, Microsoft Windows and Apple's OS X.
     110The information is less complete for operating systems with closed source.
    75111
    76112\paragraph{Linux's CFS}
    77 The default scheduler used by Linux (the Completely Fair Scheduler)~\cite{MAN:linux/cfs,MAN:linux/cfs2} is a feedback scheduler based on CPU time. For each processor, it constructs a Red-Black tree of \glspl{at} waiting to run, ordering them by the amount of CPU time used. The \gls{at} that has used the least CPU time is scheduled. It also supports the concept of \newterm{Nice values}, which are effectively multiplicative factors on the CPU time used. The ordering of \glspl{at} is also affected by a group based notion of fairness, where \glspl{at} belonging to groups having used less CPU time are preferred to \glspl{at} belonging to groups having used more CPU time. Linux achieves load-balancing by regularly monitoring the system state~\cite{MAN:linux/cfs/balancing} and using some heuristic on the load (currently CPU time used in the last millisecond plus a decayed version of the previous time slots~\cite{MAN:linux/cfs/pelt}.).
    78 
    79 \cite{DBLP:conf/eurosys/LoziLFGQF16} shows that Linux's CFS also does work stealing to balance the workload of each processors, but the paper argues this aspect can be improved significantly. The issues highlighted stem from Linux's need to support fairness across \glspl{at} \emph{and} across users\footnote{Enforcing fairness across users means that given two users, one with a single \gls{at} and the other with one thousand \glspl{at}, the user with a single \gls{at} does not receive one thousandth of the CPU time.}, increasing the complexity.
     113The default scheduler used by Linux, the Completely Fair Scheduler~\cite{MAN:linux/cfs,MAN:linux/cfs2}, is a feedback scheduler based on CPU time.
     114For each processor, it constructs a Red-Black tree of \glspl{at} waiting to run, ordering them by the amount of CPU time used.
     115The \gls{at} that has used the least CPU time is scheduled.
     116It also supports the concept of \newterm{Nice values}, which are effectively multiplicative factors on the CPU time used.
     117The ordering of \glspl{at} is also affected by a group-based notion of fairness, where \glspl{at} belonging to groups having used less CPU time are preferred to \glspl{at} belonging to groups having used more CPU time.
     118Linux achieves load-balancing by regularly monitoring the system state~\cite{MAN:linux/cfs/balancing} and using some heuristic on the load, currently CPU time used in the last millisecond plus a decayed version of the previous time slots~\cite{MAN:linux/cfs/pelt}.
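The core selection rule described above, running the thread that has used the least CPU time with nice values acting as multiplicative factors, can be sketched as follows. This is not Linux's implementation: a binary heap from Python's \texttt{heapq} is swapped in for the Red-Black tree, the class and method names are hypothetical, and the weights are made-up factors rather than the kernel's actual nice-to-weight mapping.

```python
import heapq


class FairQueue:
    """Always run the thread with the least weighted CPU time."""

    def __init__(self):
        self._heap = []    # entries are (weighted CPU time used, name)
        self._weight = {}  # hypothetical multiplicative factor per thread

    def add(self, name, weight=1.0, used=0.0):
        self._weight[name] = weight
        heapq.heappush(self._heap, (used, name))

    def run_next(self, slice_ms):
        # Pick the thread that has used the least weighted CPU time,
        # charge it for the slice, and put it back in the queue.
        used, name = heapq.heappop(self._heap)
        used += slice_ms * self._weight[name]
        heapq.heappush(self._heap, (used, name))
        return name


rq = FairQueue()
rq.add("a", weight=1.0)
rq.add("b", weight=2.0)  # assumed factor: "b" is charged double for its CPU time
trace = [rq.run_next(10) for _ in range(6)]
print(trace)
```

Because "b" is charged twice as much per slice, it is selected half as often as "a" over time, which is the effect a multiplicative factor on CPU time is meant to have.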
     119
     120\cite{DBLP:conf/eurosys/LoziLFGQF16} shows that Linux's CFS also does work stealing to balance the workload of each processor, but the paper argues this aspect can be improved significantly.
     121The issues highlighted stem from Linux's need to support fairness across \glspl{at} \emph{and} across users\footnote{Enforcing fairness across users means that given two users, one with a single \gls{at} and the other with one thousand \glspl{at}, the user with a single \gls{at} does not receive one thousandth of the CPU time.}, increasing the complexity.
    80122
    81123Linux also offers a FIFO scheduler, a real-time scheduler, which runs the highest-priority \gls{at}, and a round-robin scheduler, which is an extension of the FIFO-scheduler that adds fixed time slices. \cite{MAN:linux/sched}
    82124
    83125\paragraph{FreeBSD}
    84 The ULE scheduler used in FreeBSD\cite{DBLP:conf/bsdcon/Roberson03} is a feedback scheduler similar to Linux's CFS. It uses different data structures and heuristics but also schedules according to some combination of CPU time used and niceness values. It also periodically balances the load of the system (according to a different heuristic), but uses a simpler work stealing approach.
     126The ULE scheduler used in FreeBSD\cite{DBLP:conf/bsdcon/Roberson03} is a feedback scheduler similar to Linux's CFS.
     127It uses different data structures and heuristics but also schedules according to some combination of CPU time used and niceness values.
     128It also periodically balances the load of the system (according to a different heuristic), but uses a simpler work stealing approach.
    85129
    86130\paragraph{Windows(OS)}
    87 Microsoft's Operating System's Scheduler~\cite{MAN:windows/scheduler} is a feedback scheduler with priorities. It supports 32 levels of priorities, some of which are reserved for real-time and privileged applications. It schedules \glspl{at} based on the highest priorities (lowest number) and how much CPU time each \gls{at} has used. The scheduler may also temporarily adjust priorities after certain effects like the completion of I/O requests.
     131Microsoft's Operating System's Scheduler~\cite{MAN:windows/scheduler} is a feedback scheduler with priorities.
     132It supports 32 levels of priorities, some of which are reserved for real-time and privileged applications.
     133It schedules \glspl{at} based on the highest priorities (highest number) and how much CPU time each \gls{at} has used.
     134The scheduler may also temporarily adjust priorities after certain effects like the completion of I/O requests.
    88135
    89136\todo{load balancing}
     
    105152
    106153\paragraph{Go}\label{GoSafePoint}
    107 Go's scheduler uses a randomized work-stealing algorithm that has a global run-queue (\emph{GRQ}) and each processor (\emph{P}) has both a fixed-size run-queue (\emph{LRQ}) and a high-priority next ``chair'' holding a single element~\cite{GITHUB:go,YTUBE:go}. Preemption is present, but only at safe-points,~\cit{https://go.dev/src/runtime/preempt.go} which are inserted detection code at various frequent access boundaries.
     154Go's scheduler uses a randomized work-stealing algorithm that has a global run-queue (\emph{GRQ}) and each processor (\emph{P}) has both a fixed-size run-queue (\emph{LRQ}) and a high-priority next ``chair'' holding a single element~\cite{GITHUB:go,YTUBE:go}.
     155Preemption is present, but only at safe-points~\cit{https://go.dev/src/runtime/preempt.go}, which are detection code inserted at various frequent access boundaries.
    108156
    109157The algorithm is as follows :
     
    119167
    120168\paragraph{Erlang}
    121 Erlang is a functional language that supports concurrency in the form of processes: threads that share no data. It uses a kind of round-robin scheduler, with a mix of work sharing and stealing to achieve load balancing~\cite{:erlang}, where under-loaded workers steal from other workers, but overloaded workers also push work to other workers. This migration logic is directed by monitoring logic that evaluates the load a few times per seconds.
     169Erlang is a functional language that supports concurrency in the form of processes: threads that share no data.
     170It uses a kind of round-robin scheduler, with a mix of work sharing and stealing to achieve load balancing~\cite{:erlang}, where under-loaded workers steal from other workers, but overloaded workers also push work to other workers.
     171This migration logic is directed by monitoring logic that evaluates the load a few times per second.
    122172
    123173\paragraph{Intel\textregistered ~Threading Building Blocks}
    124 \newterm{Thread Building Blocks} (TBB) is Intel's task parallelism \cite{wiki:taskparallel} framework. It runs \newterm{jobs}, which are uninterruptable \glspl{at} that must always run to completion, on a pool of worker threads. TBB's scheduler is a variation of randomized work-stealing that also supports higher-priority graph-like dependencies~\cite{MAN:tbb/scheduler}. It schedules \glspl{at} as follows (where \textit{t} is the last \gls{at} completed):
     174\newterm{Threading Building Blocks} (TBB) is Intel's task parallelism \cite{wiki:taskparallel} framework.
     175It runs \newterm{jobs}, which are uninterruptible \glspl{at} that must always run to completion, on a pool of worker threads.
     176TBB's scheduler is a variation of randomized work-stealing that also supports higher-priority graph-like dependencies~\cite{MAN:tbb/scheduler}.
     177It schedules \glspl{at} as follows (where \textit{t} is the last \gls{at} completed):
    125178\begin{displayquote}
    126179        \begin{enumerate}
     
    139192
    140193\paragraph{Quasar/Project Loom}
    141 Java has two projects, Quasar~\cite{MAN:quasar} and Project Loom~\cite{MAN:project-loom}\footnote{It is unclear if these are distinct projects.}, that are attempting to introduce lightweight thread\-ing in the form of Fibers. Both projects seem to be based on the \texttt{ForkJoinPool} in Java, which appears to be a simple incarnation of randomized work-stealing~\cite{MAN:java/fork-join}.
     194Java has two projects, Quasar~\cite{MAN:quasar} and Project Loom~\cite{MAN:project-loom}\footnote{It is unclear if these are distinct projects.}, that are attempting to introduce lightweight thread\-ing in the form of Fibers.
     195Both projects seem to be based on the \texttt{ForkJoinPool} in Java, which appears to be a simple incarnation of randomized work-stealing~\cite{MAN:java/fork-join}.
    142196
    143197\paragraph{Grand Central Dispatch}
    144 An Apple\cit{Official GCD source} API that offers task parallelism~\cite{wiki:taskparallel}. Its distinctive aspect is multiple ``Dispatch Queues'', some of which are created by programmers.  Each queue has its own local ordering guarantees, \eg \glspl{at} on queue $A$ are executed in \emph{FIFO} order.
     198Grand Central Dispatch is an Apple\cit{Official GCD source} API that offers task parallelism~\cite{wiki:taskparallel}.
     199Its distinctive aspect is multiple ``Dispatch Queues'', some of which are created by programmers.
     200Each queue has its own local ordering guarantees, \eg \glspl{at} on queue $A$ are executed in \emph{FIFO} order.
    145201
    146202\todo{load balancing and scheduling}
     
    148204% http://web.archive.org/web/20090920043909/http://images.apple.com/macosx/technology/docs/GrandCentral_TB_brief_20090903.pdf
    149205
    150 In terms of semantics, the Dispatch Queues seem to be very similar in semantics to Intel\textregistered ~TBB \texttt{execute()} and predecessor semantics. % Where it would be possible to convert from one to the other.
     206In terms of semantics, the Dispatch Queues seem to be very similar to Intel\textregistered ~TBB \texttt{execute()} and predecessor semantics.
    151207
    152208\paragraph{LibFibre}
    153 LibFibre~\cite{DBLP:journals/pomacs/KarstenB20} is a light-weight user-level threading framework developed at the University of Waterloo. Similarly to Go, it uses a variation of work stealing with a global queue that is higher priority than stealing. Unlike Go, it does not have the high-priority next ``chair'' and does not use randomized work-stealing.
     209LibFibre~\cite{DBLP:journals/pomacs/KarstenB20} is a light-weight user-level threading framework developed at the University of Waterloo.
     210Similarly to Go, it uses a variation of work stealing with a global queue that is higher priority than stealing.
     211Unlike Go, it does not have the high-priority next ``chair'' and does not use randomized work-stealing.