Changeset c69adb7

Timestamp:

Oct 5, 2016, 11:47:40 AM (8 years ago)

Author:

Thierry Delisle <tdelisle@…>

Branches:

ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc

Children:

Parents:

Message:

-added pl bibliography
-rename citations to local bib
-added glossary
-started working on parallelism

Location:

doc/proposals/concurrency

Files:

1 added
2 edited
1 moved

Makefile (modified) (3 diffs)
concurrency.tex (modified) (7 diffs)
glossary.tex (added)
local.bib (moved) (moved from doc/proposals/concurrency/citations.bib) (2 diffs)

doc/proposals/concurrency/Makefile

-                      r7e10773
+                      rc69adb7
 ## Define the appropriate configuration variables.
 TeXLIB = .:../../LaTeXmacros:../../LaTeXmacros/listings:../../LaTeXmacros/enumitem:
+TeXLIB = .:../../LaTeXmacros:../../LaTeXmacros/listings:../../LaTeXmacros/enumitem:~/bibliographies:
 LaTeX  = TEXINPUTS=${TeXLIB} && export TEXINPUTS && latex -halt-on-error
 BibTeX = BIBINPUTS=${TeXLIB} && export BIBINPUTS && bibtex
 …
 clean :
         rm -f *.bbl *.aux *.dvi *.idx *.ilg *.ind *.brf *.out *.log *.toc *.blg *.pstex_t *.cf \
+        rm -f *.bbl *.aux *.dvi *.idx *.ilg *.ind *.brf *.out *.log *.toc *.blg *.pstex_t *.cf *.glg *.glo *.gls *.ist \
                 ${FIGURES} ${PICTURES} ${PROGRAMS} ${GRAPHS} ${basename ${DOCUMENT}}.ps ${DOCUMENT}
 …
         -${BibTeX} ${basename $@}
         # Make index from *.aux entries and input index at end of document
         #makeindex -s ../../LaTeXmacros/indexstyle ${basename $@}.idx
+        makeglossaries ${basename $@}
         #${LaTeX} ${basename $@}.tex
         # Run again to get index title into table of contents
         ${LaTeX} ${basename $@}.tex
+        ${LaTeX} ${basename $@}.tex
 predefined :

doc/proposals/concurrency/concurrency.tex

-                      r7e10773
+                      rc69adb7
 % requires tex packages: texlive-base texlive-latex-base tex-common texlive-humanities texlive-latex-extra texlive-fonts-recommended
 % inline code ©...© (copyright symbol) emacs: C-q M-)
 % red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
 % blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
 % green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
 % LaTex escape §...§ (section symbol) emacs: C-q M-'
 % keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
+% inline code ©...© (copyright symbol) emacs: C-q M-)
+% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
+% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
+% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
+% LaTex escape §...§ (section symbol) emacs: C-q M-'
+% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
 % math escape $...$ (dollar symbol)
 …
 \usepackage{graphicx}
 \usepackage{tabularx}
+\usepackage{glossaries}
 \usepackage{varioref}                                                           % extended references
 \usepackage{inconsolata}
 …
 \usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
 \usepackage{breakurl}
 \renewcommand{\UrlFont}{\small\sf}
 …
 \newcommand{\code}[1]{\lstinline{#1}}
+\input{glossary}
 \newsavebox{\LstBox}
 …
 Indeed, for highly productive parallel programming, high-level approaches are much more popular\cite{HPP:Study}. Examples are task-based parallelism, message passing, and implicit threading.
 There are actually two problems that need to be solved in the design of concurrency for a language: which concurrency tools are available to users and which parallelism tools are available. While these two concepts are often seen together, they are in fact distinct concepts that require different sorts of tools\cite{Myths}. Concurrency tools need to handle mutual exclusion and synchronization, while parallelism tools are more about performance, cost and resource utilisation.
+There are actually two problems that need to be solved in the design of concurrency for a language: which concurrency tools are available to users and which parallelism tools are available. While these two concepts are often seen together, they are in fact distinct concepts that require different sorts of tools\cite{Buhr05a}. Concurrency tools need to handle mutual exclusion and synchronization, while parallelism tools are more about performance, cost and resource utilisation.
 \section{Concurrency}
 Several tools can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state completely (Erlang, Haskell, Akka (Scala))\cit. In these paradigms, interaction between concurrent objects relies on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages these approaches entail a clear distinction between concurrent and non-concurrent paradigms, which in turn means that programmers need to learn two sets of design patterns in order to be effective at their jobs. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desirable to have a higher-level construct as the core concurrency paradigm\cite{HPP:Study}. This paper proposes Monitors\cit as the core concurrency construct.
 Finally, an approach that is worth mentioning because it is gaining in popularity is transactional memory\cit. However, its performance and feature set are currently too restrictive for such a paradigm to be added to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA.
+Several tools can be used to solve concurrency challenges. Since these challenges always appear with the use of mutable shared state, some languages and libraries simply disallow mutable shared state completely (Erlang\cite{Erlang}, Haskell\cite{Haskell}, Akka (Scala)\cit). In these paradigms, interaction between concurrent objects relies on message passing or other paradigms that often closely relate to networking concepts. However, in imperative or OO languages these approaches entail a clear distinction between concurrent and non-concurrent paradigms, which in turn means that programmers need to learn two sets of design patterns in order to be effective at their jobs. Approaches based on shared memory are more closely related to non-concurrent paradigms since they often rely on non-concurrent constructs like routine calls and objects. At a lower level these can be implemented as locks and atomic operations. However, for productivity reasons it is desirable to have a higher-level construct as the core concurrency paradigm\cite{HPP:Study}. This paper proposes Monitors\cit as the core concurrency construct.
+Finally, an approach that is worth mentioning because it is gaining in popularity is transactional memory\cite{Dice10}. However, its performance and feature set are currently too restrictive for such a paradigm to be added to a language like C or \CC\cit, which is why it was rejected as the core paradigm for concurrency in \CFA.
 \section{Monitors}
 A monitor is a set of routines that ensure mutual exclusion when accessing shared state. This concept is generally associated with object-oriented languages like Java\cit or \uC\cite{uCPP:Book} but does not strictly require OOP semantics. The only requirement is to be able to declare a handle to a shared object and a set of routines that act on it:
+A monitor is a set of routines that ensure mutual exclusion when accessing shared state. This concept is generally associated with object-oriented languages like Java\cite{Java} or \uC\cite{uC++book} but does not strictly require OOP semantics. The only requirement is to be able to declare a handle to a shared object and a set of routines that act on it:
 \begin{lstlisting}
         typedef /*some monitor type*/ monitor;
 …
 Regardless of the option chosen for wait semantics, signal must be symmetrical. In all cases, signal only needs a single parameter, the condition variable that needs to be signalled. However, \code{signal} needs to be called from the same monitor(s) as the call to \code{wait}. Otherwise, mutual exclusion cannot be properly transferred back to the waiting monitor.
+Finally, an additional semantic which can be very useful is the \code{signalBlock} routine. This routine behaves like signal for all of the semantics discussed above, but with the subtlety that mutual exclusion is transferred to the waiting task immediately rather than waiting for the end of the critical section.
 \subsection{External scheduling} \label{extsched}
+\textbf{\large{Work in progress...}}
+As one might expect, the alternative to internal scheduling is external scheduling. The goal of external scheduling is to have the same scheduling power as internal scheduling without the requirement that any thread can acquire the monitor lock. This method is somewhat more robust to deadlocks since one of the threads keeps relatively tight control over scheduling. External scheduling can generally be done either in terms of control flow (see \uC) or in terms of data (see Go). Of course, both of these paradigms have their own strengths and weaknesses, but for this project control-flow semantics were chosen to stay consistent with the rest of the language's semantics. Two challenges specific to \CFA arise when trying to add external scheduling: loose object definitions and multi-monitor routines.
+As one might expect, the alternative to internal scheduling is external scheduling. This method is somewhat more robust to deadlocks since one of the threads keeps relatively tight control over scheduling. Indeed, as the following examples will demonstrate, external scheduling allows users to wait for events from other threads without the concern of unrelated events occurring. External scheduling can generally be done either in terms of control flow (see \uC) or in terms of data (see Go). Of course, both of these paradigms have their own strengths and weaknesses, but for this project control-flow semantics were chosen to stay consistent with the rest of the language's semantics. Two challenges specific to \CFA arise when trying to add external scheduling: loose object definitions and multi-monitor routines. The following example shows a simple use of \code{accept} versus \code{wait}/\code{signal} and its advantages.
+\begin{center}
+\begin{tabular}{|c|c|}
+Internal Scheduling & External Scheduling \\
+\hline
+\begin{lstlisting}
+        _Monitor blarg {
+                condition c;
+        public:
+                void f();
+                void g() { signal(c); }
+                void h() { wait(c); }
+        private:
+        }
+\end{lstlisting}&\begin{lstlisting}
+        _Monitor blarg {
+        public:
+                void f();
+                void g();
+                void h() { _Accept(g); }
+        private:
+        }
+\end{lstlisting}
+\end{tabular}
+\end{center}
+In the case of internal scheduling, the call to \code{wait} only guarantees that \code{g} was the last routine to access the monitor. This entails that the routine \code{f} may have acquired mutual exclusion several times while routine \code{h} was waiting. On the other hand, external scheduling guarantees that while routine \code{h} was waiting, no routine other than \code{g} could acquire the monitor.
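The external-scheduling guarantee described above can be approximated with standard C++ primitives. The following is only a sketch under stated assumptions: `ExternalBlarg`, `is_accepting`, `f_count` and `demo_external` are hypothetical names introduced for illustration, and this is not how \uC or \CFA implement `_Accept`. While `h` is accepting, callers of `f` are held at the entry and only `g` may enter the monitor.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical emulation of the external-scheduling column with standard C++.
class ExternalBlarg {
    std::mutex m;
    std::condition_variable entry;     // f() callers are held here during an accept
    std::condition_variable accepted;  // h() waits here for a call to g()
    bool accepting_g = false;
    bool g_ran = false;
    int f_calls = 0;
public:
    void f() {
        std::unique_lock<std::mutex> lock(m);
        // f() may not enter while h() is accepting g().
        entry.wait(lock, [this] { return !accepting_g; });
        ++f_calls;
    }
    void g() {
        std::lock_guard<std::mutex> lock(m);
        g_ran = true;
        accepted.notify_all();
    }
    // Emulates `_Accept(g)`; returns how many f() calls ran while h() waited.
    int h() {
        std::unique_lock<std::mutex> lock(m);
        const int before = f_calls;
        g_ran = false;       // only a g() call made during the accept counts
        accepting_g = true;
        accepted.wait(lock, [this] { return g_ran; });
        accepting_g = false;
        entry.notify_all();  // let blocked f() callers proceed
        return f_calls - before;
    }
    bool is_accepting() { std::lock_guard<std::mutex> lock(m); return accepting_g; }
    int f_count() { std::lock_guard<std::mutex> lock(m); return f_calls; }
};

// Driver: f() is attempted while h() is accepting and has to wait its turn.
int demo_external() {
    ExternalBlarg mon;
    int slipped = -1;
    std::thread th([&] { slipped = mon.h(); });
    while (!mon.is_accepting()) std::this_thread::yield();
    std::thread tf([&] { mon.f(); });
    mon.g();
    th.join();
    tf.join();
    assert(mon.f_count() == 1);  // f() did run, but only after h() finished
    return slipped;              // number of f() calls observed by h()
}
```

With a plain condition variable (the internal-scheduling column), nothing stops `f` from acquiring the monitor between the call to `wait` and the matching `signal`, which is the difference the paragraph above describes.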
 \subsubsection{Loose object definitions}
+In \uC monitor definitions include an exhaustive list of monitor operations :
+\begin{lstlisting}
+        _Monitor blarg {
+        public:
+                void f() { _Accept(g); }
+                void g();
+        private:
+        }
+\end{lstlisting}
+Since \CFA is not object oriented, this becomes much more difficult to implement but also much less clear for the user:
+In \uC, monitor definitions include an exhaustive list of monitor operations. Since \CFA is not object oriented, this becomes much more difficult to implement but also much less clear for the user:
 \begin{lstlisting}
         mutex struct A {};
         void f(A & mutex a) { accept(g); }
+        void f(A & mutex a);
         void g(A & mutex a);
+        void h(A & mutex a) { accept(g); }
 \end{lstlisting}
 …
 To support multi-monitor external scheduling, some kind of entry queue must be used that is aware of both monitors. However, acceptable routines must be aware of the entry queues, which means they must be stored inside at least one of the monitors that will be acquired. This in turn adds the requirement for a systematic algorithm for disambiguating which queue is relevant regardless of user ordering. The proposed algorithm is to fall back on monitor lock ordering and specify that the monitor that is acquired first is the lock with the relevant entry queue. This assumes that the lock acquiring order is static for the lifetime of all concerned objects, but that is a reasonable constraint. This algorithm choice has two consequences: the queue of the highest priority monitor is no longer a true FIFO queue, and the queue of the lowest priority monitor is both required and probably unused. The queue can no longer be a FIFO queue because, instead of simply containing the waiting threads in order of arrival, it also contains the second mutex. Therefore, another thread with the same highest priority monitor but a different lowest priority monitor may arrive first but enter the critical section after a thread with the correct pairing. Secondly, since it may not be known at compile time which monitor will be the lowest priority monitor, every monitor needs to have the correct queues even though it is probable that half the multi-monitor queues will go unused for the entire duration of the program.
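The lock-ordering rule above can be sketched as follows. This is a minimal illustration, assuming that the static acquisition order is derived from object addresses (the text does not say how the order is chosen) and that the two monitors are distinct objects. `Monitor`, `acquisition_order` and `with_both` are hypothetical names.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical monitor: the entry queue would live next to the lock.
struct Monitor {
    std::mutex lock;
};

// Returns the two monitors in acquisition order; `.first` is acquired first
// and is therefore the one whose entry queue is relevant.
// Assumption: the order is the address order, which is static for the
// lifetime of both objects.
inline std::pair<Monitor*, Monitor*> acquisition_order(Monitor& a, Monitor& b) {
    // std::less provides a total order over pointers.
    if (std::less<Monitor*>{}(&a, &b)) return {&a, &b};
    return {&b, &a};
}

// Acquires both monitors in the canonical order, whatever order the caller
// wrote them in, and hands the queue-owning monitor to `body`.
// Precondition: `a` and `b` are distinct monitors.
template <typename F>
void with_both(Monitor& a, Monitor& b, F&& body) {
    std::pair<Monitor*, Monitor*> order = acquisition_order(a, b);
    std::lock_guard<std::mutex> first(order.first->lock);
    std::lock_guard<std::mutex> second(order.second->lock);
    body(*order.first);
}
```

Because the order does not depend on how the user lists the monitors, `with_both(x, y, ...)` and `with_both(y, x, ...)` pick the same entry queue and cannot deadlock against each other.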
+\section{Parrallelism}
+\section{Tasks}
+\section{Naming}
+\subsection{Other concurrency tools}
+\section{Parallelism}
+Historically, computer performance was about processor speeds and instruction counts. However, with heat dissipation being an ever-growing challenge, parallelism has become the new source of greatest performance \cite{Sutter05, Sutter05b}. In this decade, it is no longer reasonable to create high-performance applications without caring about parallelism. Indeed, parallelism is an important aspect of performance, and more specifically of throughput and hardware utilization. The lowest-level approach to parallelism is to use \glspl{kthread}. However, since these have significant costs and limitations, \glspl{kthread} are now mostly used as an implementation tool rather than a user-oriented one. There are several alternatives to solve these issues, all of which have strengths and weaknesses.
+\subsection{User-level threads}
+A direct improvement on the \gls{kthread} approach is to use \glspl{uthread}. These threads offer most of the same features that the operating system already provides but can be used on a much larger scale. This is the most powerful solution as it allows all the features of multi-threading while removing several of the more expensive costs of using kernel threads. The downside is that almost none of the low-level threading complexities are hidden; users still have to think about data races, deadlocks and synchronization issues. This can be somewhat alleviated by a concurrency toolkit with strong guarantees, but the parallelism toolkit offers very little to reduce complexity in itself.
+Examples of languages that support \glspl{uthread} are Java\cite{Java}, Haskell\cite{Haskell} and \uC\cite{uC++book}.
+\subsection{Jobs and thread pools}
+The opposite approach is to base parallelism on \glspl{job}. Indeed, \glspl{job} offer limited flexibility but with the benefit of a simpler user interface. In \gls{job}-based systems, users express parallelism as units of work and the dependency graph (either explicit or implicit) that ties them together. This means users need not worry about concurrency, but it significantly limits the interaction that can occur between different jobs. Indeed, any \gls{job} that blocks also blocks the underlying \gls{kthread}, which effectively means that CPU utilization, and therefore throughput, will suffer noticeably. The gold standard of this implementation is Intel's TBB library\cite{TBB}.
+\subsection{Fibers : user-level threads without preemption}
+Finally, in the middle of the flexibility versus complexity spectrum lie \glspl{fiber}, which offer \glspl{uthread} without the complexity of preemption. This means users don't have to worry about other \glspl{fiber} suddenly executing between two instructions, which significantly reduces complexity. However, any call to IO or other concurrency primitives can lead to context switches. Furthermore, users can also block \glspl{fiber} in the middle of their execution without blocking a full processor core. This means users still have to worry about mutual exclusion, deadlocks and race conditions in their code, raising the complexity significantly.
+\cite{Go}
+\subsection{Paradigm performance}
+While the choice between the three paradigms listed above can have significant performance implications, it is difficult to pin down the performance implications of choosing a model at the language level. Indeed, in many situations one of these paradigms will show better performance, but it all depends on the usage.
+Having mostly independent units of work to execute almost guarantees that the \gls{job}-based system will have the best performance. However, add interactions between jobs and the processor utilisation might suffer. User-level threads may allow maximum resource utilisation, but context switches will be more expensive and it is also harder for users to get perfect tuning. As with every example, fibers sit somewhat in the middle of the spectrum.
+\section{Parallelism in \CFA}
+As a system-level language, \CFA should offer both performance and flexibility as its primary goals, with simplicity and user-friendliness being a secondary concern. Therefore, the core of parallelism in \CFA should prioritize power and efficiency.
+\subsection{Kernel core}\label{kernel}
+At the ro
+\subsubsection{Threads}
+\CFA threads have all the characteristics of
+\subsection{High-level options}\label{tasks}
+\subsubsection{Thread interface}
+constructors destructors
+        initializer lists
+monitors
+\subsubsection{Futures}
+\subsubsection{Implicit threading}
+Finally, simpler applications can benefit greatly from having implicit parallelism. That is, parallelism that does not rely on the user to write concurrency. This type of parallelism can be achieved both at the language level and at the system level.
+\begin{center}
+\begin{tabular}[t]{|c|c|c|}
+Sequential & System Parallel & Language Parallel \\
+\begin{lstlisting}
+void big_sum(int* a, int* b,
+                 int* out,
+                 size_t length)
+{
+        for(size_t i = 0; i < length; ++i ) {
+                out[i] = a[i] + b[i];
+        }
+}
+int a[10000];
+int b[10000];
+int c[10000];
+//... fill in a and b ...
+big_sum(a, b, c, 10000);
+\end{lstlisting} &\begin{lstlisting}
+void big_sum(int* a, int* b,
+                 int* out,
+                 size_t length)
+{
+        range ar(a, a + length);
+        range br(b, b + length);
+        range outr(out, out + length);
+        parfor( ar, br, outr,
+        [](int* ai, int* bi, int* oi) {
+                *oi = *ai + *bi;
+        });
+}
+int a[10000];
+int b[10000];
+int c[10000];
+//... fill in a and b ...
+big_sum(a, b, c, 10000);
+\end{lstlisting}&\begin{lstlisting}
+void big_sum(int* a, int* b,
+                 int* out,
+                 size_t length)
+{
+        for (ai, bi, oi) in (a, b, out) {
+                oi = ai + bi;
+        }
+}
+int a[10000];
+int b[10000];
+int c[10000];
+//... fill in a and b ...
+big_sum(a, b, c, 10000);
+\end{lstlisting}
+\end{tabular}
+\end{center}
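For comparison with the table above, the following sketch shows roughly what a system-level parallel loop has to do underneath. It is only an illustration in standard C++: `big_sum_parallel` and its `nthreads` parameter are hypothetical, the work is split into equal contiguous chunks with one kernel thread per chunk, and a real runtime would reuse threads rather than create them per call.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Element-wise sum of a and b into out, split across up to nthreads threads.
void big_sum_parallel(const int* a, const int* b, int* out,
                      std::size_t length, unsigned nthreads = 4) {
    if (nthreads == 0) nthreads = 1;
    // Ceiling division so every element belongs to exactly one chunk.
    const std::size_t chunk = (length + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = std::min(length, t * chunk);
        const std::size_t end = std::min(length, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        // Chunks are disjoint, so no mutual exclusion is needed.
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) out[i] = a[i] + b[i];
        });
    }
    for (auto& w : workers) w.join();
}
```

The caller's code stays identical to the sequential column, for example `big_sum_parallel(a, b, c, 10000);`, which is the point of implicit threading: the user writes no concurrency.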
+\subsection{Machine setup}\label{machine}
+Threads are all well and good, but we still need some OS support to fully utilize available hardware.
+\textbf{\large{Work in progress...}} Do we need something beyond specifying the number of kernel threads?
 \section{Future work}
+Concurrency and parallelism are still a very active field that strongly benefits from hardware advances. As such, certain features that aren't necessarily mature enough in their current state could become relevant in the lifetime of \CFA.
+\subsection{Transactions}
 \section*{Acknowledgements}
+\clearpage
+\printglossary
+\clearpage
 \bibliographystyle{plain}
 \bibliography{citations}
+\bibliography{pl,local}

doc/proposals/concurrency/local.bib

-                      r7e10773
+                      rc69adb7
-@mastersthesis{Bilson:CFA,
-    keywords            = {Cforall, Overloading, Polymorphism},
-    author      = {Richard C. Bilson},
-    title       = {Implementing Overloading and Polymorphism in Cforall},
-    school              = "University of Waterloo",
-    year                = "2003"
+}
 @article{HPP:Study,
         keywords        = {Parallel, Productivity},
         author  = {Lorin Hochstein and Jeff Carver and Forrest Shull and Sima Asgari and Victor Basili and Jeffrey K. Hollingsworth and Marvin V. Zelkowitz },
         title   = {Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers},
+}
-@article{CFA:Refrat,
-    keywords    = {Cforall, refrat},
-    author              = {Glen Ditchfield},
-    title               = {Cforall Reference Manual and Rationale},
-    month               = jan,
-    year                = 2003
+}
 …
+}
+@article{Myths,
+        author  = {Peter A. Buhr and Ashif S. Harji},
+        title   = {Concurrent Urban Legends},
+        year            = 2005
+@article{TBB,
+        keywords        = {Intel, TBB},
+        title   = {Intel Thread Building Blocks},
+}
-@article{uCPP:Book,
-    keywords            = {uC++, manual, book},
-    author      = {Peter A. Buhr},
-    title       = {Understanding Control Flow with Concurrent Programming using $\mu${C}{\kern-.1em\hbox{\large\texttt{+\kern-.25em+}}}},
-    month       = aug,
-    year        = 2014
+}
-@techreport{ISO:Ada,
-    type                = {International Standard},
-    key                 = {ISO/IEC 8652:1995},
-    year                = {1995},
-    title               = {Ada},
-    volume              = {1995},
-    institution         = {International Organization for Standardization}
+}
