Timestamp:
Oct 5, 2016, 11:47:40 AM (8 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
a7976d79
Parents:
7e10773
Message:

-added pl bibliography
-renamed citations to local bib
-added glossary
-started working on parallelism

Location:
doc/proposals/concurrency
Files:
1 added
2 edited
1 moved

  • doc/proposals/concurrency/Makefile

    r7e10773 rc69adb7  
    11## Define the appropriate configuration variables.
    22
    3 TeXLIB = .:../../LaTeXmacros:../../LaTeXmacros/listings:../../LaTeXmacros/enumitem:
     3TeXLIB = .:../../LaTeXmacros:../../LaTeXmacros/listings:../../LaTeXmacros/enumitem:~/bibliographies:
    44LaTeX  = TEXINPUTS=${TeXLIB} && export TEXINPUTS && latex -halt-on-error
    55BibTeX = BIBINPUTS=${TeXLIB} && export BIBINPUTS && bibtex
     
    3232
    3333clean :
    34         rm -f *.bbl *.aux *.dvi *.idx *.ilg *.ind *.brf *.out *.log *.toc *.blg *.pstex_t *.cf \
     34        rm -f *.bbl *.aux *.dvi *.idx *.ilg *.ind *.brf *.out *.log *.toc *.blg *.pstex_t *.cf *.glg *.glo *.gls *.ist \
    3535                ${FIGURES} ${PICTURES} ${PROGRAMS} ${GRAPHS} ${basename ${DOCUMENT}}.ps ${DOCUMENT}
    3636
     
    5454        -${BibTeX} ${basename $@}
    5555        # Make index from *.aux entries and input index at end of document
    56         #makeindex -s ../../LaTeXmacros/indexstyle ${basename $@}.idx
     56        makeglossaries ${basename $@}
    5757        #${LaTeX} ${basename $@}.tex
    5858        # Run again to get index title into table of contents
    5959        ${LaTeX} ${basename $@}.tex
     60        ${LaTeX} ${basename $@}.tex
     61
    6062
    6163predefined :
  • doc/proposals/concurrency/concurrency.tex

    r7e10773 rc69adb7  
    11% requires tex packages: texlive-base texlive-latex-base tex-common texlive-humanities texlive-latex-extra texlive-fonts-recommended
    22
    3 % inline code ©...© (copyright symbol) emacs: C-q M-)
    4 % red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
    5 % blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
    6 % green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
    7 % LaTex escape §...§ (section symbol) emacs: C-q M-'
    8 % keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
     3% inline code ©...© (copyright symbol) emacs: C-q M-)
     4% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
     5% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
     6% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
     7% LaTeX escape §...§ (section symbol) emacs: C-q M-'
     8% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
    99% math escape $...$ (dollar symbol)
    1010
     
    2424\usepackage{graphicx}
    2525\usepackage{tabularx}
     26\usepackage{glossaries}
    2627\usepackage{varioref}                                                           % extended references
    2728\usepackage{inconsolata}
     
    3637\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
    3738\usepackage{breakurl}
     39
    3840\renewcommand{\UrlFont}{\small\sf}
    3941
     
    5759\newcommand{\code}[1]{\lstinline{#1}}
    5860
     61\input{glossary}
    5962
    6063\newsavebox{\LstBox}
     
    8184Indeed, for highly productive parallel programming, high-level approaches are much more popular\cite{HPP:Study}. Examples include task-based parallelism, message passing and implicit threading.
    8285
    83 There are actually two problems that need to be solved in the design of the concurrency for a language. Which concurrency tools are available to the users and which parallelism tools are available. While these two concepts are often seen together, they are in fact distinct concepts that require different sorts of tools\cite{Myths}. Concurrency tools need to handle mutual exclusion and synchronization while parallelism tools are more about performance, cost and ressource utilisation.
     86There are actually two problems that need to be solved in the design of concurrency for a language: which concurrency tools and which parallelism tools are made available to users. While these two concepts are often seen together, they are in fact distinct concepts that require different sorts of tools\cite{Buhr05a}. Concurrency tools need to handle mutual exclusion and synchronization, while parallelism tools are more about performance, cost and resource utilisation.
    8487
    8588\section{Concurrency}
    86 Several tool can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state completely (Erlang, Haskel, Akka (Scala))\cit. In the paradigms, interaction between concurrent objects rely on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages these approaches entail a clear distinction between concurrent and non concurrent paradigms. Which in turns mean that programmers need to learn two sets of designs patterns in order to be effective at their jobs. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However for productivity reasons it is desireable to have a higher-level construct to be the core concurrency paradigm\cite{HPP:Study}. This paper proposes Monitors\cit as the core concurrency construct.
    87 
    88 Finally, an approach that is worth mentionning because it is gaining in popularity is transactionnal memory\cit. However, the performance and feature set is currently too restrictive to be possible to add such a paradigm to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA.
     89Several tools can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state completely (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cit). In these paradigms, interaction between concurrent objects relies on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages these approaches entail a clear distinction between concurrent and non-concurrent paradigms, which in turn means that programmers need to learn two sets of design patterns in order to be effective at their jobs. Approaches based on shared memory are more closely related to non-concurrent paradigms, since they often rely on non-concurrent constructs like routine calls and objects. At a lower level, these can be implemented as locks and atomic operations. However, for productivity reasons it is desirable to have a higher-level construct as the core concurrency paradigm\cite{HPP:Study}. This paper proposes monitors\cit as the core concurrency construct.
     90
     91Finally, an approach that is worth mentioning because it is gaining in popularity is transactional memory\cite{Dice10}. However, the performance and feature set is currently too restrictive to add such a paradigm to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA.
    8992
    9093\section{Monitors}
    91 A monitor is a set of routines that ensure mutual exclusion when accessing shared state. This concept is generally associated with Object-Oriented Languages like Java\cit or \uC\cite{uCPP:Book} but does not strictly require OOP semantics. The only requirements is to be able to declare a handle to a shared object and a set of routines that act on it :
     94A monitor is a set of routines that ensure mutual exclusion when accessing shared state. This concept is generally associated with object-oriented languages like Java\cite{Java} or \uC\cite{uC++book}, but does not strictly require OOP semantics. The only requirement is the ability to declare a handle to a shared object and a set of routines that act on it:
    9295\begin{lstlisting}
    9396        typedef /*some monitor type*/ monitor;
     
    442445Regardless of the option chosen for wait semantics, signal must be symmetrical. In all cases, signal only needs a single parameter, the condition variable that needs to be signalled. However, \code{signal} needs to be called from the same monitor(s) as the call to \code{wait}; otherwise, mutual exclusion cannot be properly transferred back to the waiting monitor.
    443446
     447Finally, an additional semantic which can be very useful is the \code{signalBlock} routine. This routine behaves like signal for all of the semantics discussed above, but with the subtlety that mutual exclusion is transferred to the waiting task immediately, rather than waiting for the end of the critical section.
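To illustrate the difference, the following is a minimal sketch, assuming the monitor and condition syntax introduced above (the buffer type and routine bodies are hypothetical):
\begin{lstlisting}
	mutex struct buffer { condition not_empty; /* ... */ };

	void put(buffer & mutex b, int elem) {
		// ... insert elem into the buffer ...
		// signal     : the waiter resumes only after put exits the critical section
		// signalBlock: mutual exclusion transfers to the waiter here, and put
		//              resumes once the waiter releases the monitor
		signalBlock(b.not_empty);
	}
\end{lstlisting}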
     448
    444449\subsection{External scheduling} \label{extsched}
    445 \textbf{\large{Work in progress...}}
    446 As one might expect, the alternative to Internal scheduling is to use external scheduling instead. The goal of external scheduling is to be able to have the same scheduling power as internal scheduling without the requirement that any thread can acquire the monitor lock. This method is somewhat more robust to deadlocks since one of the threads keeps a relatively tight control on scheduling. External scheduling can generally be done either in terms of control flow (see \uC) or in terms of data (see Go). Of course, both of these paradigms have their own strenghts and weaknesses but for this project control flow semantics where chosen to stay consistent with the reset of the languages semantics. Two challenges specific to \CFA arise when trying to add external scheduling which is loose object definitions and multi-monitor routines.
     450As one might expect, the alternative to internal scheduling is to use external scheduling instead. This method is somewhat more robust to deadlocks, since one of the threads keeps relatively tight control on scheduling. Indeed, as the following examples demonstrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occurring. External scheduling can generally be done either in terms of control flow (see \uC) or in terms of data (see Go). Of course, both of these paradigms have their own strengths and weaknesses, but for this project control flow semantics were chosen to stay consistent with the rest of the language's semantics. Two challenges specific to \CFA arise when trying to add external scheduling: loose object definitions and multi-monitor routines. The following example shows a simple use of \code{accept} versus \code{wait}/\code{signal} and its advantages.
     451
     452\begin{center}
     453\begin{tabular}{|c|c|}
     454Internal Scheduling & External Scheduling \\
     455\hline
     456\begin{lstlisting}
     457        _Monitor blarg {
     458                condition c;
     459        public:
     460                void f();
     461                void g() { signal(c); }
     462                void h() { wait(c); }
     463        private:
     464        };
     465\end{lstlisting}&\begin{lstlisting}
     466        _Monitor blarg {
     467
     468        public:
     469                void f();
     470                void g();
     471                void h() { _Accept(g); }
     472        private:
     473        };
     474\end{lstlisting}
     475\end{tabular}
     476\end{center}
     477
     478In the case of internal scheduling, the call to \code{wait} only guarantees that \code{g} was the last routine to access the monitor. This entails that routine \code{f} may have acquired mutual exclusion several times while routine \code{h} was waiting. On the other hand, external scheduling guarantees that while routine \code{h} was waiting, no routine other than \code{g} could acquire the monitor.
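Concretely, the difference can be seen from the callers' side; the following is a hypothetical interleaving, not output from an implementation:
\begin{lstlisting}
	blarg b;
	// Thread 1 calls b.h() and blocks.
	// Internal scheduling: threads may call b.f() or b.g() while h waits;
	//   when wait(c) returns, h only knows g was the last routine to run.
	// External scheduling: while h waits in _Accept(g), calls to b.f() are
	//   delayed; only a call to b.g() can acquire the monitor.
\end{lstlisting}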
    447479
    448480\subsubsection{Loose object definitions}
    449 In \uC monitor definitions include an exhaustive list of monitor operations :
    450 \begin{lstlisting}
    451         _Monitor blarg {
    452         public:
    453                 void f() { _Accept(g); }
    454                 void g();
    455         private:
    456         }
    457 \end{lstlisting}
    458 
    459 Since \CFA is not an object oriented it becomes much more difficult to implement but also much less clear for the user :
     481In \uC, monitor definitions include an exhaustive list of monitor operations. Since \CFA is not object oriented, this becomes much more difficult to implement and also much less clear for the user:
    460482
    461483\begin{lstlisting}
    462484        mutex struct A {};
    463485
    464         void f(A & mutex a) { accept(g); }
     486        void f(A & mutex a);
    465487        void g(A & mutex a);
     488        void h(A & mutex a) { accept(g); }
    466489\end{lstlisting}
    467490
     
    556579To support multi-monitor external scheduling means that some kind of entry queues must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues, which means they must be stored inside at least one of the monitors that will be acquired. This in turn adds the requirement of a systematic algorithm for disambiguating which queue is relevant, regardless of user ordering. The proposed algorithm is to fall back on monitor lock ordering and specify that the monitor that is acquired first holds the relevant entry queue. This assumes that the lock acquiring order is static for the lifetime of all concerned objects, but that is a reasonable constraint. This algorithm choice has two consequences: the entry queue of the highest priority monitor is no longer a true FIFO queue, and the queue of the lowest priority monitor is both required and probably unused. The queue can no longer be a FIFO queue because, instead of simply containing the waiting threads in order of arrival, it also contains the second mutex. Therefore, another thread with the same highest priority monitor but a different lowest priority monitor may arrive first but enter the critical section after a thread with the correct pairing. Secondly, since it may not be known at compile time which monitor will be the lowest priority monitor, every monitor needs to have the correct queues, even though it is probable that half the multi-monitor queues will go unused for the entire duration of the program.
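As a sketch, assuming the hypothetical monitors \code{A} and \code{B} below and a static lock ordering that acquires \code{A} before \code{B}:
\begin{lstlisting}
	mutex struct A {};
	mutex struct B {};

	void f(A & mutex a, B & mutex b);
	void g(A & mutex a, B & mutex b) { accept(f); }
	// A is acquired first under the lock ordering, so the entry queue
	// for f lives in A; the corresponding queue in B exists but goes
	// unused for this particular pairing.
\end{lstlisting}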
    557580
    558 \section{Parrallelism}
    559 
    560 \section{Tasks}
    561 
    562 \section{Naming}
     581\subsection{Other concurrency tools}
     582
     583\section{Parallelism}
     584Historically, computer performance was about processor speeds and instruction counts. However, with heat dissipation being an ever growing challenge, parallelism has become the new source of performance gains\cite{Sutter05, Sutter05b}. In this decade, it is no longer reasonable to create a high-performance application without caring about parallelism. Indeed, parallelism is an important aspect of performance, more specifically throughput and hardware utilization. The lowest-level approach to parallelism is to use \glspl{kthread}. However, since these have significant costs and limitations, \glspl{kthread} are now mostly used as an implementation tool rather than a user-oriented one. There are several alternatives that solve these issues, all of which have strengths and weaknesses.
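For reference, the direct \gls{kthread} approach looks like the following in C, using standard POSIX threads (a minimal sketch; error handling omitted):
\begin{lstlisting}
	#include <pthread.h>

	void* worker(void* arg) {
		// ... work to run in parallel ...
		return NULL;
	}

	int main() {
		pthread_t t;
		pthread_create(&t, NULL, worker, NULL);	// creates a kernel-level thread
		// ... main continues concurrently ...
		pthread_join(t, NULL);			// wait for the thread to finish
		return 0;
	}
\end{lstlisting}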
     585
     586\subsection{User-level threads}
     587A direct improvement on the \gls{kthread} approach is to use \glspl{uthread}. These threads offer most of the same features that the operating system already provides, but can be used on a much larger scale. This is the most powerful solution, as it allows all the features of multi-threading while removing several of the larger costs of kernel threads. The downside is that almost none of the low-level threading complexities are hidden: users still have to think about data races, deadlocks and synchronization issues. This can be somewhat alleviated by a concurrency toolkit with strong guarantees, but the parallelism toolkit itself offers very little to reduce complexity.
     588
     589Examples of languages that support \glspl{uthread} are Java\cite{Java}, Haskell\cite{Haskell} and \uC\cite{uC++book}.
     590
     591\subsection{Jobs and thread pools}
     592The opposite approach is to base parallelism on \glspl{job}. Indeed, \glspl{job} offer limited flexibility, but with the benefit of a simpler user interface. In \gls{job}-based systems, users express parallelism as units of work and the dependency graph (either explicit or implicit) that ties them together. This means users need not worry about concurrency, but it significantly limits the interaction that can occur between different jobs. Indeed, any \gls{job} that blocks also blocks the underlying \gls{kthread}, which effectively means CPU utilization, and therefore throughput, will suffer noticeably. The gold standard of this implementation is Intel's TBB library\cite{TBB}.
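As a minimal sketch of the \gls{job} style, using a hypothetical \code{pool}/\code{submit}/\code{join} interface rather than any existing library's API:
\begin{lstlisting}
	void big_sum(int* a, int* b, int* out, size_t length) {
		pool p;
		// the two halves are independent units of work, so no
		// synchronization between jobs is required
		submit(p, [=](){ for(size_t i = 0;        i < length/2; ++i) out[i] = a[i] + b[i]; });
		submit(p, [=](){ for(size_t i = length/2; i < length;   ++i) out[i] = a[i] + b[i]; });
		join(p);	// wait for all submitted jobs to complete
	}
\end{lstlisting}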
     593
     594\subsection{Fibers : user-level threads without preemption}
     595Finally, in the middle of the flexibility versus complexity spectrum lie \glspl{fiber}, which offer \glspl{uthread} without the complexity of preemption. This means users do not have to worry about other \glspl{fiber} suddenly executing between two instructions, which significantly reduces complexity. However, any call to IO or other concurrency primitives can lead to context switches. Furthermore, users can also block \glspl{fiber} in the middle of their execution without blocking a full processor core. This means users still have to worry about mutual exclusion, deadlocks and race conditions in their code, raising the complexity significantly. An example of a language based on \glspl{fiber} is Go\cite{Go}.
     597
     598\subsection{Paradigm performance}
     599While the choice between the three paradigms listed above can have significant performance implications, it is difficult to pin down the performance implications of choosing a model at the language level. Indeed, in many situations one of these paradigms shows better performance, but it all depends on the usage.
     600Having mostly independent units of work to execute almost guarantees that the \gls{job}-based system will have the best performance. However, add interactions between jobs and processor utilisation might suffer. User-level threads may allow maximum resource utilisation, but context switches will be more expensive and it is also harder for users to get the tuning right. As noted above, fibers sit somewhere in the middle of the spectrum.
     601
     602\section{Parallelism in \CFA}
     603As a system-level language, \CFA should offer both performance and flexibility as its primary goals, with simplicity and user-friendliness being a secondary concern. Therefore, the core of parallelism in \CFA should prioritize power and efficiency.
     604
     605\subsection{Kernel core}\label{kernel}
     606At the ro
     607\subsubsection{Threads}
     608\CFA threads have all the characteristics of
     609
     610\subsection{High-level options}\label{tasks}
     611
     612\subsubsection{Thread interface}
     613constructors destructors
     614        initializer lists
     615monitors
     616
     617\subsubsection{Futures}
     618
     619\subsubsection{Implicit threading}
     620Finally, simpler applications can benefit greatly from having implicit parallelism, that is, parallelism that does not rely on the user writing concurrent code. This type of parallelism can be achieved both at the language level and at the system level.
     621
     622\begin{center}
     623\begin{tabular}[t]{|c|c|c|}
     624Sequential & System Parallel & Language Parallel \\
     625\begin{lstlisting}
     626void big_sum(int* a, int* b,
     627                 int* out,
     628                 size_t length)
     629{
     630        for(size_t i = 0; i < length; ++i) {
     631                out[i] = a[i] + b[i];
     632        }
     633}
     634
     635
     636
     637
     638
     639int a[10000];
     640int b[10000];
     641int c[10000];
     642//... fill in a and b ...
     643big_sum(a, b, c, 10000);
     644\end{lstlisting} &\begin{lstlisting}
     645void big_sum(int* a, int* b,
     646                 int* out,
     647                 size_t length)
     648{
     649        range ar(a, a + length);
     650        range br(b, b + length);
     651        range outr(out, out + length);
     652        parfor( ar, br, outr,
     653        [](int* ai, int* bi, int* oi) {
     654                *oi = *ai + *bi;
     655        });
     656}
     657
     658int a[10000];
     659int b[10000];
     660int c[10000];
     661//... fill in a and b ...
     662big_sum(a, b, c, 10000);
     663\end{lstlisting}&\begin{lstlisting}
     664void big_sum(int* a, int* b,
     665                 int* out,
     666                 size_t length)
     667{
     668        for (ai, bi, oi) in (a, b, out) {
     669                oi = ai + bi;
     670        }
     671}
     672
     673
     674
     675
     676
     677int a[10000];
     678int b[10000];
     679int c[10000];
     680//... fill in a and b ...
     681big_sum(a, b, c, 10000);
     682\end{lstlisting}
     683\end{tabular}
     684\end{center}
     685
     686\subsection{Machine setup}\label{machine}
     687Threads are all well and good, but we still need some OS support to fully utilize available hardware.
     688
     689\textbf{\large{Work in progress...}} Do we need something beyond specifying the number of kernel threads?
    563690
    564691\section{Future work}
     692Concurrency and parallelism are still very active fields that strongly benefit from hardware advances. As such, certain features that are not mature enough in their current state could become relevant during the lifetime of \CFA.
     693\subsection{Transactions}
    565694
    566695\section*{Acknowledgements}
    567696
    568 
    569 
     697\clearpage
     698\printglossary
     699
     700\clearpage
    570701\bibliographystyle{plain}
    571 \bibliography{citations}
     702\bibliography{pl,local}
    572703
    573704
  • doc/proposals/concurrency/local.bib

    r7e10773 rc69adb7  
    2222
    2323
    24 @mastersthesis{Bilson:CFA,
    25     keywords            = {Cforall, Overloading, Polymorphism},
    26     author      = {Richard C. Bilson},
    27     title       = {Implementing Overloading and Polymorphism in Cforall},
    28     school              = "University of Waterloo",
    29     year                = "2003"
    30 }
    31 
    3224@article{HPP:Study,
    3325        keywords        = {Parallel, Productivity},
    3426        author  = {Lorin Hochstein and Jeff Carver and Forrest Shull and Sima Asgari and Victor Basili and Jeffrey K. Hollingsworth and Marvin V. Zelkowitz },
    3527        title   = {Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers},
    36 }
    37 @article{CFA:Refrat,
    38     keywords    = {Cforall, refrat},
    39     author              = {Glen Ditchfield},
    40     title               = {Cforall Reference Manual and Rationale},
    41     month               = jan,
    42     year                = 2003
    4328}
    4429
     
    5035}
    5136
    52 @article{Myths,
    53         author  = {Peter A. Buhr and Ashif S. Harji},
    54         title   = {Concurrent Urban Legends},
    55         year            = 2005
     37@article{TBB,
     38        keywords        = {Intel, TBB},
     39        title   = {Intel Thread Building Blocks},
    5640}
    57 
    58 @article{uCPP:Book,
    59     keywords            = {uC++, manual, book},
    60     author      = {Peter A. Buhr},
    61     title       = {Understanding Control Flow with Concurrent Programming using $\mu${C}{\kern-.1em\hbox{\large\texttt{+\kern-.25em+}}}},
    62     month       = aug,
    63     year        = 2014
    64 }
    65 
    66 @techreport{ISO:Ada,
    67     type                = {International Standard},
    68     key                 = {ISO/IEC 8652:1995},
    69     year                = {1995},
    70     title               = {Ada},
    71     volume              = {1995},
    72     institution         = {International Organization for Standardization}
    73 }