Ignore:
Timestamp:
Nov 17, 2017, 10:56:16 AM (6 years ago)
Author:
Rob Schluntz <rschlunt@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
cdbfab0
Parents:
f5c3b6c (diff), b7f8cb4 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' into fix-bug-51

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/proposals/concurrency/text/future.tex

    rf5c3b6c r0fe4e62  
    55% ======================================================================
    66
    7 Concurrency and parallelism is still a very active field that strongly benefits from hardware advances. As such certain features that aren't necessarily mature enough in their current state could become relevant in the lifetime of \CFA.
    8 \section{Non-Blocking IO}
     7\section{Flexible Scheduling} \label{futur:sched}
     8An important part of concurrency is scheduling. Different scheduling algorithm can affact peformance (both in terms of average and variation). However, no single scheduler is optimal for all workloads and therefore there is value in being able to change the scheduler for given programs. One solution is to offer various tweaking options to users, allowing the scheduler to be adjusted the to requirements of the workload. However, in order to be truly flexible, it would be interesting to allow users to add arbitrary data and arbirary scheduling algorithms to the scheduler. For example, a web server could attach Type-of-Service information to threads and have a ``ToS aware'' scheduling algorithm tailored to this specific web server. This path of flexible schedulers will be explored for \CFA.
     9
     10\section{Non-Blocking IO} \label{futur:nbio}
     11While most of the parallelism tools
     12However, many modern workloads are not bound on computation but on IO operations, an common case being webservers and XaaS (anything as a service). These type of workloads often require significant engineering around amortising costs of blocking IO operations. While improving throughtput of these operations is outside what \CFA can do as a language, it can help users to make better use of the CPU time otherwise spent waiting on IO operations. The current trend is to use asynchronous programming using tools like callbacks and/or futurs and promises\cite. However, while these are valid solutions, they lead to code that is harder to read and maintain because it is much less linear
     13
     14\section{Other concurrency tools} \label{futur:tools}
     15While monitors offer a flexible and powerful concurent core for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package. Example of such tools can include simple locks and condition variables, futures and promises\cite{promises}, and executors. These additional features are useful when monitors offer a level of abstraction which is indaquate for certain tasks.
     16
     17\section{Implicit threading} \label{futur:implcit}
     18Simpler applications can benefit greatly from having implicit parallelism. That is, parallelism that does not rely on the user to write concurrency. This type of parallelism can be achieved both at the language level and at the library level. The cannonical example of implcit parallelism is parallel for loops, which are the simplest example of a divide and conquer algorithm\cite{uC++book}. Listing \ref{lst:parfor} shows three different code examples that accomplish pointwise sums of large arrays. Note that none of these example explicitly declare any concurrency or parallelism objects.
     19
     20\begin{figure}
     21\begin{center}
     22\begin{tabular}[t]{|c|c|c|}
     23Sequential & Library Parallel & Language Parallel \\
     24\begin{cfacode}[tabsize=3]
     25void big_sum(
     26        int* a, int* b,
     27        int* o,
     28        size_t len)
     29{
     30        for(
     31                int i = 0;
     32                i < len;
     33                ++i )
     34        {
     35                o[i]=a[i]+b[i];
     36        }
     37}
    938
    1039
    11 \section{Other concurrency tools}
    1240
    1341
    14 \section{Implicit threading}
    15 % Finally, simpler applications can benefit greatly from having implicit parallelism. That is, parallelism that does not rely on the user to write concurrency. This type of parallelism can be achieved both at the language level and at the system level.
    16 %
    17 % \begin{center}
    18 % \begin{tabular}[t]{|c|c|c|}
    19 % Sequential & System Parallel & Language Parallel \\
    20 % \begin{lstlisting}
    21 % void big_sum(int* a, int* b,
    22 %                int* out,
    23 %                size_t length)
    24 % {
    25 %       for(int i = 0; i < length; ++i ) {
    26 %               out[i] = a[i] + b[i];
    27 %       }
    28 % }
    29 %
    30 %
    31 %
    32 %
    33 %
    34 % int* a[10000];
    35 % int* b[10000];
    36 % int* c[10000];
    37 % //... fill in a and b ...
    38 % big_sum(a, b, c, 10000);
    39 % \end{lstlisting} &\begin{lstlisting}
    40 % void big_sum(int* a, int* b,
    41 %                int* out,
    42 %                size_t length)
    43 % {
    44 %       range ar(a, a + length);
    45 %       range br(b, b + length);
    46 %       range or(out, out + length);
    47 %       parfor( ai, bi, oi,
    48 %       [](int* ai, int* bi, int* oi) {
    49 %               oi = ai + bi;
    50 %       });
    51 % }
    52 %
    53 % int* a[10000];
    54 % int* b[10000];
    55 % int* c[10000];
    56 % //... fill in a and b ...
    57 % big_sum(a, b, c, 10000);
    58 % \end{lstlisting}&\begin{lstlisting}
    59 % void big_sum(int* a, int* b,
    60 %                int* out,
    61 %                size_t length)
    62 % {
    63 %       for (ai, bi, oi) in (a, b, out) {
    64 %               oi = ai + bi;
    65 %       }
    66 % }
    67 %
    68 %
    69 %
    70 %
    71 %
    72 % int* a[10000];
    73 % int* b[10000];
    74 % int* c[10000];
    75 % //... fill in a and b ...
    76 % big_sum(a, b, c, 10000);
    77 % \end{lstlisting}
    78 % \end{tabular}
    79 % \end{center}
    80 %
     42
     43int* a[10000];
     44int* b[10000];
     45int* c[10000];
     46//... fill in a & b
     47big_sum(a,b,c,10000);
     48\end{cfacode} &\begin{cfacode}[tabsize=3]
     49void big_sum(
     50        int* a, int* b,
     51        int* o,
     52        size_t len)
     53{
     54        range ar(a, a+len);
     55        range br(b, b+len);
     56        range or(o, o+len);
     57        parfor( ai, bi, oi,
     58        [](     int* ai,
     59                int* bi,
     60                int* oi)
     61        {
     62                oi=ai+bi;
     63        });
     64}
    8165
    8266
    83 \section{Multiple Paradigms}
     67int* a[10000];
     68int* b[10000];
     69int* c[10000];
     70//... fill in a & b
     71big_sum(a,b,c,10000);
     72\end{cfacode}&\begin{cfacode}[tabsize=3]
     73void big_sum(
     74        int* a, int* b,
     75        int* o,
     76        size_t len)
     77{
     78        parfor (ai,bi,oi)
     79            in (a, b, o )
     80        {
     81                oi = ai + bi;
     82        }
     83}
    8484
    8585
    86 \section{Transactions}
     86
     87
     88
     89
     90
     91int* a[10000];
     92int* b[10000];
     93int* c[10000];
     94//... fill in a & b
     95big_sum(a,b,c,10000);
     96\end{cfacode}
     97\end{tabular}
     98\end{center}
     99\caption{For loop to sum numbers: Sequential, using library parallelism and language parallelism.}
     100\label{lst:parfor}
     101\end{figure}
     102
     103Implicit parallelism is a general solution and therefore has its limitations. However, it is a quick and simple approach to parallelism which may very well be sufficient for smaller applications and reduces the amount of boiler-plate that is needed to start benefiting from parallelism in modern CPUs.
     104
     105
Note: See TracChangeset for help on using the changeset viewer.