Changes in / [8f9cc50:30b65d8]


Ignore:
Files:
6 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/concurrency/concurrency.tex

    r8f9cc50 r30b65d8  
    673673% #       #     # #     # #     # ####### ####### ####### ####### ###  #####  #     #
    674674\section{Parallelism}
    675 Historically, computer performance was about processor speeds and instructions count. However, with heat dissipation being a direct consequence of speed increase, parallelism has become the new source for increased performance~\cite{Sutter05, Sutter05b}. In this decade, it is not longer reasonnable to create high-performance application without caring about parallelism. Indeed, parallelism is an important aspect of performance and more specifically throughput and hardware utilization. The lowest level approach of parallelism is to use \glspl{kthread} in combination with semantics like \code{fork}, \code{join}, etc. However, since these have significant costs and limitations \glspl{kthread} are now mostly used as an implementation tool rather than a user oriented one. There are several alternatives to solve these issues that all have strengths and weaknesses. While there are many variations of the presented paradigms, most of these variations do not actually change the guarantees or the semantics, they simply move costs in order to achieve better performance for certain workloads.
     675Historically, computer performance was about processor speeds and instructions count. However, with heat dissipation being an ever growing challenge, parallelism has become the new source of greatest performance~\cite{Sutter05, Sutter05b}. In this decade, it is not longer reasonnable to create high-performance application without caring about parallelism. Indeed, parallelism is an important aspect of performance and more specifically throughput and hardware utilization. The lowest level approach of parallelism is to use \glspl{kthread}. However since these have significant costs and limitations \glspl{kthread} are now mostly used as an implementation tool rather than a user oriented one. There are several alternatives to solve these issues which all have strengths and weaknesses.
    676676
    677677\subsection{User-level threads}
    678 A direct improvement on the \gls{kthread} approach is to use \glspl{uthread}. These threads offer most of the same features that the operating system already provide but can be used on a much larger scale. This approach is the most powerfull solution as it allows all the features of multi-threading while removing several of the more expensives costs of using kernel threads. The down side is that almost none of the low-level threading problems are hidden, users still have to think about data races, deadlocks and synchronization issues. These issues can be somewhat alleviated by a concurrency toolkit with strong garantees but the parallelism toolkit offers very little to reduce complexity in itself.
    679 
    680 Examples of languages that support are Erlang~\cite{Erlang} and \uC~\cite{uC++book}.
     678A direct improvement on the \gls{kthread} approach is to use \glspl{uthread}. These threads offer most of the same features that the operating system already provide but can be used on a much larger scale. This is the most powerfull solution as it allows all the features of multi-threading while removing several of the more expensives costs of using kernel threads. The down side is that almost none of the low-level threading complexities are hidden, users still have to think about data races, deadlocks and synchronization issues. This can be somewhat alleviated by a concurrency toolkit with strong garantees but the parallelism toolkit offers very little to reduce complexity in itself.
     679
     680Examples of languages that support are Java~\cite{Java}, Haskell~\cite{Haskell} and \uC~\cite{uC++book}.
     681
     682\subsection{Jobs and thread pools}
     683The approach on the opposite end of the spectrum is to base parallelism on \glspl{job}. Indeed, \glspl{job} offer limited flexibility but at the benefit of a simpler user interface. In \gls{job} based systems users express parallelism as units of work and the dependency graph (either explicit or implicit) that tie them together. This means users need not to worry about concurrency but significantly limits the interaction that can occur between different jobs. Indeed, any \gls{job} that blocks also blocks the underlying \gls{kthread}, this effectively mean the CPU utilization, and therefore throughput, will suffer noticeably.
     684The golden standard of this implementation is Intel's TBB library~\cite{TBB}.
    681685
    682686\subsection{Fibers : user-level threads without preemption}
    683 In the middle of the flexibility versus complexity spectrum lay \glspl{fiber} which offer \glspl{uthread} without the complexity of preemption by using cooperative scheduling. On a single core machine this means users need not worry about concurrency. On multi-core machines, while concurrency is still a concern, it is only a problem for fibers across cores but not while on the same core. This extra guarantee plus the fact that creating and destroying fibers are implicit synchronizing points means preventing mutable shared ressources still leaves many control flow options. However, multi-core processors can still execute fibers in parallel. This means users either need to worry about mutual exclusion, deadlocks and race conditions, or limit themselves to subset of concurrency primitives, raising the complexity in both cases. In this aspect, fibers can be seen as a more powerfull alternative to \glspl{job}.
    684 
     687Finally, in the middle of the flexibility versus complexity spectrum lay \glspl{fiber} which offer \glspl{uthread} without the complexity of preemption. This means users don't have to worry about other \glspl{fiber} suddenly executing between two instructions which signficantly reduces complexity. However, any call to IO or other concurrency primitives can lead to context switches. Furthermore, users can also block \glspl{fiber} in the middle of their execution without blocking a full processor core. This means users still have to worry about mutual exclusion, deadlocks and race conditions in their code, raising the complexity significantly.
    685688An example of a language that uses fibers is Go~\cite{Go}
    686689
    687 \subsection{Jobs and thread pools}
    688 The approach on the opposite end of the spectrum is to base parallelism on \glspl{pool}. Indeed, \glspl{pool} offer limited flexibility but at the benefit of a simpler user interface. In \gls{pool} based systems, users express parallelism as units of work and the dependency graph (either explicit or implicit) that tie them together. This approach means users need not worry about concurrency but significantly limits the interaction that can occur between different jobs. Indeed, any \gls{job} that blocks also blocks the underlying worker, which effectively means the CPU utilization, and therefore throughput, suffers noticeably. It can be argued that a solution to this problem is to use more workers than available cores. However, unless the number of jobs and the number of workers are comparable, having a significant amount of blocked jobs will always results in idles cores.
    689 
    690 The gold standard of this implementation is Intel's TBB library~\cite{TBB}.
    691 
    692690\subsection{Paradigm performance}
    693 While the choice between the three paradigms listed above may have significant performance implication, it is difficult to pindown the performance implications of chosing a model at the language level. Indeed, in many situations one of these paradigms may show better performance but it all strongly depends on the workload. Having a large amount of mostly independent units of work to execute almost guarantess that the \gls{pool} based system has the best performance thanks to the lower memory overhead. However, interactions between jobs can easily exacerbate contention. User-level threads may allows fine grain context switching which may result in better resource utilisation but context switches will be more expansive and it is also harder for users to get perfect tunning. As with every example, fibers sit somewhat in the middle of the spectrum. Furthermore, if the units of uninterrupted work are large enough the paradigm choice will be largely amorticised by the actual work done.
     691While the choice between the three paradigms listed above may have significant performance implication, it is difficult to pin the performance implications of chosing a model at the language level. Indeed, in many situations one of these paradigms will show better performance but it all strongly depends on the usage. Having mostly indepent units of work to execute almost guarantess that the \gls{job} based system will have the best performance. However, add interactions between jobs and the processor utilisation might suffer. User-level threads may allow maximum ressource utilisation but context switches will be more expansive and it is also harder for users to get perfect tunning. As with every example, fibers sit somewhat in the middle of the spectrum. Furthermore, if the units of uninterrupted work are large enough the paradigm choice will be largely amorticised by the actual work done.
    694692
    695693%  #####  #######    #          ####### ######  ######
     
    708706\multicolumn{1}{ c| }{} & Has a stack & Preemptive \\
    709707\hline
    710 \Glspl{pool} & X & X \\
     708\Glspl{job} & X & X \\
    711709\hline
    712710\Glspl{fiber} & \checkmark & X \\
     
    717715\end{center}
    718716
    719 This table is missing several variations (for example jobs on \glspl{uthread} or \glspl{fiber}), but these variation affect mostly performance and do not effect the guarantees as the presented paradigm do.
    720 
    721717As shown in section \ref{cfaparadigms} these different blocks being available in \CFA it is trivial to reproduce any of these paradigm.
    722718
     
    730726
    731727\subsection{Thread Interface}
    732 The basic building blocks of \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread} and as such offer a flexible and lightweight threading interface (lightweight comparatievely to \glspl{kthread}). A thread can be declared using a struct declaration with prefix \code{thread} as follows :
     728The basic building blocks of \CFA are \glspl{cfathread}. By default these are implemented as \glspl{uthread} and as such offer a flexible and lightweight threading interface (lightweight comparatievely to \glspl{kthread}). A thread can be declared using a struct declaration prefix with the \code{thread} as follows :
    733729
    734730\begin{lstlisting}
     
    736732\end{lstlisting}
    737733
    738 Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use some function pointer representation as the interface of threads (for example : \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, this proposal considers that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is definitely a special routine in \CFA, the existing syntax for declaring routines with unordinary name can be extended, i.e. operator overloading. As such the \code{main} routine of a thread can be defined as :
     734Obviously, for this thread implementation to be usefull it must run some user code. Several other threading interfaces use some function pointer representation as the interface of threads (for example : \Csharp~\cite{Csharp} and Scala~\cite{Scala}). However, we consider that statically tying a \code{main} routine to a thread superseeds this approach. Since the \code{main} routine is definetely a special routine in \CFA, we can reuse the existing syntax for declaring routines with unordinary name, i.e. operator overloading. As such the \code{main} routine of a thread can be defined as such :
    739735\begin{lstlisting}
    740736        thread struct foo {};
     
    784780}
    785781\end{lstlisting}
    786 This semantic has several advantages over explicit semantics : typesafety is guaranteed, a thread is always started and stopped exaclty once and users cannot make any progamming errors. Furthermore it naturally follows the memory allocation semantics, which means users do not need to learn multiple semantics.
     782This semantic has several advantages over explicit semantics : typesafety is guaranteed, any thread will always be started and stopped exaclty once and users can't make any progamming errors. Furthermore it naturally follows the memory allocation semantics which means users don't need to learn multiple semantics.
    787783
    788784These semantics also naturally scale to multiple threads meaning basic synchronisation is very simple :
     
    807803
    808804                //Wait for the 10 threads to finish
    809         }
    810 
    811         void bar() {
    812                 MyThread* thrds = new MyThread[10];
    813                 //Start 10 threads at the beginning of the scope
    814 
    815                 DoStuff();
    816 
    817                 //Wait for the 10 threads to finish
    818                 delete MyThread;
    819805        }
    820806\end{lstlisting}
  • doc/proposals/concurrency/glossary.tex

    r8f9cc50 r30b65d8  
    5252}
    5353
    54 \longnewglossaryentry{pool}
    55 {name={thread-pool}}
    56 {
    57 Group of homogeneuous threads that loop executing units of works after another.
    58 
    59 \textit{Synonyms : }
    60 }
    61 
    6254\longnewglossaryentry{cfacluster}
    6355{name={cluster}}
  • doc/proposals/concurrency/version

    r8f9cc50 r30b65d8  
    1 0.5.150
     10.5.146
  • src/ResolvExpr/AlternativeFinder.cc

    r8f9cc50 r30b65d8  
    10041004                                        AssertionSet needAssertions, haveAssertions;
    10051005                                        Alternative newAlt( 0, third->env, first->cost + second->cost + third->cost );
    1006                                         Type* commonType = nullptr;
     1006                                        Type* commonType;
    10071007                                        if ( unify( second->expr->get_result(), third->expr->get_result(), newAlt.env, needAssertions, haveAssertions, openVars, indexer, commonType ) ) {
    10081008                                                ConditionalExpr *newExpr = new ConditionalExpr( first->expr->clone(), second->expr->clone(), third->expr->clone() );
  • src/ResolvExpr/Resolver.cc

    r8f9cc50 r30b65d8  
    7272                Type * functionReturn = nullptr;
    7373                Type *initContext = nullptr;
     74                Type *switchType = nullptr;
    7475                bool inEnumDecl = false;
    7576        };
  • src/Tuples/TupleExpansion.cc

    r8f9cc50 r30b65d8  
    3232namespace Tuples {
    3333        namespace {
    34                 class MemberTupleExpander final : public Mutator {
     34                class MemberTupleExpander : public Mutator {
    3535                public:
    3636                        typedef Mutator Parent;
    37                         using Parent::mutate;
    38 
    39                         virtual Expression * mutate( UntypedMemberExpr * memberExpr ) override;
    40                 };
    41 
    42                 class UniqueExprExpander final : public GenPoly::DeclMutator {
     37                        virtual Expression * mutate( UntypedMemberExpr * memberExpr );
     38                };
     39
     40                class UniqueExprExpander : public GenPoly::DeclMutator {
    4341                public:
    4442                        typedef GenPoly::DeclMutator Parent;
    45                         using Parent::mutate;
    46 
    47                         virtual Expression * mutate( UniqueExpr * unqExpr ) override;
     43
     44                        virtual Expression * mutate( UniqueExpr * unqExpr );
    4845
    4946                        std::map< int, Expression * > decls; // not vector, because order added may not be increasing order
     
    5956                public:
    6057                        typedef Mutator Parent;
    61                         using Parent::mutate;
    62 
    6358                        virtual Expression * mutate( TupleAssignExpr * tupleExpr );
    6459                };
     
    6762                  public:
    6863                        typedef GenPoly::DeclMutator Parent;
    69                         using Parent::mutate;
    70 
    71                         virtual Type * mutate( TupleType * tupleType ) override;
    72 
    73                         virtual CompoundStmt * mutate( CompoundStmt * stmt ) override {
     64
     65                        virtual Type * mutate( TupleType * tupleType );
     66
     67                        virtual CompoundStmt * mutate( CompoundStmt * stmt ) {
    7468                                typeMap.beginScope();
    7569                                stmt = Parent::mutate( stmt );
     
    8175                };
    8276
    83                 class TupleIndexExpander final : public Mutator {
     77                class TupleIndexExpander : public Mutator {
    8478                public:
    8579                        typedef Mutator Parent;
    86                         using Parent::mutate;
    87 
    88                         virtual Expression * mutate( TupleIndexExpr * tupleExpr ) override;
    89                 };
    90 
    91                 class TupleExprExpander final : public Mutator {
     80                        virtual Expression * mutate( TupleIndexExpr * tupleExpr );
     81                };
     82
     83                class TupleExprExpander : public Mutator {
    9284                public:
    9385                        typedef Mutator Parent;
    94                         using Parent::mutate;
    95                        
    96                         virtual Expression * mutate( TupleExpr * tupleExpr ) override;
     86                        virtual Expression * mutate( TupleExpr * tupleExpr );
    9787                };
    9888        }
     
    328318// compile-command: "make install" //
    329319// End: //
     320
Note: See TracChangeset for help on using the changeset viewer.