Ignore:
Timestamp:
Nov 29, 2017, 4:33:46 PM (5 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, resolv-new, with_gc
Children:
f0743a7
Parents:
9d48a17
Message:

Ran Antidoe 9 spell checker on my thesis

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/concurrency/text/parallelism.tex

    r9d48a17 r6090518  
    1010
    1111\section{Paradigms}
    12 \subsection{User-level threads}
    13 A direct improvement on the \gls{kthread} approach is to use \glspl{uthread}. These threads offer most of the same features that the operating system already provide but can be used on a much larger scale. This approach is the most powerful solution as it allows all the features of multi-threading, while removing several of the more expensive costs of kernel threads. The down side is that almost none of the low-level threading problems are hidden; users still have to think about data races, deadlocks and synchronization issues. These issues can be somewhat alleviated by a concurrency toolkit with strong guarantees but the parallelism toolkit offers very little to reduce complexity in itself.
     12\subsection{User-Level Threads}
     13A direct improvement on the \gls{kthread} approach is to use \glspl{uthread}. These threads offer most of the same features that the operating system already provides but can be used on a much larger scale. This approach is the most powerful solution as it allows all the features of multithreading, while removing several of the more expensive costs of kernel threads. The downside is that almost none of the low-level threading problems are hidden; users still have to think about data races, deadlocks and synchronization issues. These issues can be somewhat alleviated by a concurrency toolkit with strong guarantees but the parallelism toolkit offers very little to reduce complexity in itself.
    1414
    1515Examples of languages that support \glspl{uthread} are Erlang~\cite{Erlang} and \uC~\cite{uC++book}.
    1616
    17 \subsection{Fibers : user-level threads without preemption} \label{fibers}
    18 A popular variant of \glspl{uthread} is what is often referred to as \glspl{fiber}. However, \glspl{fiber} do not present meaningful semantical differences with \glspl{uthread}. The significant difference between \glspl{uthread} and \glspl{fiber} is the lack of \gls{preemption} in the latter. Advocates of \glspl{fiber} list their high performance and ease of implementation as majors strengths but the performance difference between \glspl{uthread} and \glspl{fiber} is controversial, and the ease of implementation, while true, is a weak argument in the context of language design. Therefore this proposal largely ignores fibers.
     17\subsection{Fibers : User-Level Threads Without Preemption} \label{fibers}
     18A popular variant of \glspl{uthread} is what is often referred to as \glspl{fiber}. However, \glspl{fiber} do not present meaningful semantic differences with \glspl{uthread}. The significant difference between \glspl{uthread} and \glspl{fiber} is the lack of \gls{preemption} in the latter. Advocates of \glspl{fiber} list their high performance and ease of implementation as major strengths but the performance difference between \glspl{uthread} and \glspl{fiber} is controversial, and the ease of implementation, while true, is a weak argument in the context of language design. Therefore this proposal largely ignores fibers.
    1919
    2020An example of a language that uses fibers is Go~\cite{Go}
    2121
    22 \subsection{Jobs and thread pools}
    23 An approach on the opposite end of the spectrum is to base parallelism on \glspl{pool}. Indeed, \glspl{pool} offer limited flexibility but at the benefit of a simpler user interface. In \gls{pool} based systems, users express parallelism as units of work, called jobs, and a dependency graph (either explicit or implicit) that tie them together. This approach means users need not worry about concurrency but significantly limit the interaction that can occur among jobs. Indeed, any \gls{job} that blocks also blocks the underlying worker, which effectively means the CPU utilization, and therefore throughput, suffers noticeably. It can be argued that a solution to this problem is to use more workers than available cores. However, unless the number of jobs and the number of workers are comparable, having a significant amount of blocked jobs always results in idles cores.
     22\subsection{Jobs and Thread Pools}
     23An approach on the opposite end of the spectrum is to base parallelism on \glspl{pool}. Indeed, \glspl{pool} offer limited flexibility but at the benefit of a simpler user interface. In \gls{pool} based systems, users express parallelism as units of work, called jobs, and a dependency graph (either explicit or implicit) that ties them together. This approach means users need not worry about concurrency but significantly limit the interaction that can occur among jobs. Indeed, any \gls{job} that blocks also block the underlying worker, which effectively means the CPU utilization, and therefore throughput, suffers noticeably. It can be argued that a solution to this problem is to use more workers than available cores. However, unless the number of jobs and the number of workers are comparable, having a significant number of blocked jobs always results in idles cores.
    2424
    2525The gold standard of this implementation is Intel's TBB library~\cite{TBB}.
    2626
    27 \subsection{Paradigm performance}
    28 While the choice between the three paradigms listed above may have significant performance implication, it is difficult to pin-down the performance implications of choosing a model at the language level. Indeed, in many situations one of these paradigms may show better performance but it all strongly depends on the workload. Having a large amount of mostly independent units of work to execute almost guarantees that the \gls{pool} based system has the best performance thanks to the lower memory overhead (i.e., no thread stack per job). However, interactions among jobs can easily exacerbate contention. User-level threads allow fine-grain context switching, which results in better resource utilization, but a context switch is more expensive and the extra control means users need to tweak more variables to get the desired performance. Finally, if the units of uninterrupted work are large enough the paradigm choice is largely amortized by the actual work done.
     27\subsection{Paradigm Performance}
     28While the choice between the three paradigms listed above may have significant performance implication, it is difficult to pin down the performance implications of choosing a model at the language level. Indeed, in many situations one of these paradigms may show better performance but it all strongly depends on the workload. Having a large amount of mostly independent units of work to execute almost guarantees that the \gls{pool}-based system has the best performance thanks to the lower memory overhead (i.e., no thread stack per job). However, interactions among jobs can easily exacerbate contention. User-level threads allow fine-grain context switching, which results in better resource utilization, but a context switch is more expensive and the extra control means users need to tweak more variables to get the desired performance. Finally, if the units of uninterrupted work are large enough the paradigm choice is largely amortized by the actual work done.
    2929
    3030\section{The \protect\CFA\ Kernel : Processors, Clusters and Threads}\label{kernel}
     
    3333\Glspl{cfacluster} have not been fully implemented in the context of this thesis, currently \CFA only supports one \gls{cfacluster}, the initial one.
    3434
    35 \subsection{Future Work: Machine setup}\label{machine}
     35\subsection{Future Work: Machine Setup}\label{machine}
    3636While this was not done in the context of this thesis, another important aspect of clusters is affinity. While many common desktop and laptop PCs have homogeneous CPUs, other devices often have more heterogeneous setups. For example, a system using \acrshort{numa} configurations may benefit from users being able to tie clusters and\/or kernel threads to certain CPU cores. OS support for CPU affinity is now common~\cite{affinityLinux, affinityWindows, affinityFreebsd, affinityNetbsd, affinityMacosx} which means it is both possible and desirable for \CFA to offer an abstraction mechanism for portable CPU affinity.
    3737
    3838\subsection{Paradigms}\label{cfaparadigms}
    39 Given these building blocks, it is possible to reproduce all three of the popular paradigms. Indeed, \glspl{uthread} is the default paradigm in \CFA. However, disabling \gls{preemption} on the \gls{cfacluster} means \glspl{cfathread} effectively become \glspl{fiber}. Since several \glspl{cfacluster} with different scheduling policy can coexist in the same application, this allows \glspl{fiber} and \glspl{uthread} to coexist in the runtime of an application. Finally, it is possible to build executors for thread pools from \glspl{uthread} or \glspl{fiber}, which includes specialize jobs like actors~\cite{Actors}.
     39Given these building blocks, it is possible to reproduce all three of the popular paradigms. Indeed, \glspl{uthread} is the default paradigm in \CFA. However, disabling \gls{preemption} on the \gls{cfacluster} means \glspl{cfathread} effectively become \glspl{fiber}. Since several \glspl{cfacluster} with different scheduling policy can coexist in the same application, this allows \glspl{fiber} and \glspl{uthread} to coexist in the runtime of an application. Finally, it is possible to build executors for thread pools from \glspl{uthread} or \glspl{fiber}, which includes specialized jobs like actors~\cite{Actors}.
Note: See TracChangeset for help on using the changeset viewer.