source: doc/proposals/concurrency/text/future.tex @ 8a62d04

Last change on this file since 8a62d04 was cf966b5, checked in by Thierry Delisle <tdelisle@…>, 6 years ago

Results need to be updated but otherwise, tentative final draft

\chapter{Conclusion}
As mentioned in the introduction, this thesis provides a minimal concurrency \acrshort{api} that is simple, efficient and usable as the basis for higher-level features. The approach presented is based on a lightweight thread system for parallelism, which sits on top of clusters of processors. This M:N model is judged to be both more efficient and more flexible for users. Furthermore, this document introduces monitors as the main concurrency tool for users. This thesis also offers a novel approach that allows multiple monitors to be used at once without running into the Nested Monitor Problem~\cite{Lister77}. It also offers a full implementation of the concurrency runtime written entirely in \CFA, effectively the largest \CFA code base to date.


% ======================================================================
% ======================================================================
\section{Future Work}
% ======================================================================
% ======================================================================

\subsection{Performance} \label{futur:perf}
This thesis presents a first implementation of the \CFA runtime. Therefore, there is still significant work to do to improve performance. Many of the data structures and algorithms will change in the future to more efficient versions. For example, in \CFA the number of monitors in a single \gls{bulk-acq} is bounded only by the stack size, which is probably unnecessarily generous. Limiting this number may help increase performance. However, it is not obvious that the benefit would be significant.

\subsection{Flexible Scheduling} \label{futur:sched}
An important part of concurrency is scheduling. Different scheduling algorithms can affect performance, both in terms of average and variation. However, no single scheduler is optimal for all workloads, and therefore there is value in being able to change the scheduler for a given program. One solution is to offer various tuning options to users, allowing the scheduler to be adjusted to the requirements of the workload. However, to be truly flexible, it would be interesting to allow users to add arbitrary data and arbitrary scheduling algorithms to the scheduler. For example, a web server could attach Type-of-Service information to threads and have a ``ToS aware'' scheduling algorithm tailored to this specific web server. This path of flexible schedulers will be explored for \CFA.
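
As a rough illustration of what a user-extensible scheduler might look like, the following plain C sketch attaches a Type-of-Service field to each thread descriptor and lets the program supply its own selection policy through a function pointer. All names (\texttt{thread\_desc}, \texttt{sched\_policy}, \texttt{pick\_fifo}, \texttt{pick\_tos}, \texttt{schedule\_next}) are assumptions made for this example; none of them is part of the \CFA runtime.

\begin{cfacode}
#include <stddef.h>

/* Hypothetical thread descriptor carrying user-defined scheduling data. */
struct thread_desc {
	int id;		/* thread identifier */
	int tos;	/* Type-of-Service level, larger means more urgent */
};

/* A policy returns the index of the next thread to run in the ready list. */
typedef size_t (*sched_policy)(const struct thread_desc * ready, size_t n);

/* Default policy: first come, first served. */
static size_t pick_fifo(const struct thread_desc * ready, size_t n) {
	(void)ready; (void)n;
	return 0;
}

/* ToS-aware policy: highest ToS first, earliest arrival breaks ties. */
static size_t pick_tos(const struct thread_desc * ready, size_t n) {
	size_t best = 0;
	for (size_t i = 1; i < n; ++i) {
		if (ready[i].tos > ready[best].tos) best = i;
	}
	return best;
}

/* Remove and return the thread chosen by the policy; *n must be at least 1. */
static struct thread_desc schedule_next(struct thread_desc * ready, size_t * n,
		sched_policy policy) {
	size_t k = policy(ready, *n);
	struct thread_desc chosen = ready[k];
	/* Shift the remaining entries down so arrival order is preserved. */
	for (size_t i = k + 1; i < *n; ++i) ready[i - 1] = ready[i];
	*n -= 1;
	return chosen;
}
\end{cfacode}

With a ready list holding threads 1, 2 and 3 whose ToS levels are 0, 5 and 5, \texttt{pick\_tos} selects thread 2 first, whereas \texttt{pick\_fifo} would select thread 1. A real scheduler would also have to deal with concurrent access to the ready list and with multiple processors, which this sketch deliberately ignores.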

\subsection{Non-Blocking IO} \label{futur:nbio}
While most of the parallelism tools are aimed at data parallelism and control-flow parallelism, many modern workloads are bound not by computation but by IO operations, a common case being web servers and XaaS (anything as a service). These types of workloads often require significant engineering to amortize the cost of blocking IO operations. At its core, Non-Blocking IO is an operating-system-level feature that allows queuing IO operations (e.g., network operations) and registering for notifications instead of waiting for requests to complete. In this context, the role of the language is to make Non-Blocking IO easily available and with low overhead. The current trend is to use asynchronous programming with tools like callbacks and/or futures and promises, which can be seen in frameworks like Node.js~\cite{NodeJs} for JavaScript, Spring MVC~\cite{SpringMVC} for Java and Django~\cite{Django} for Python. However, while these are valid solutions, they lead to code that is harder to read and maintain because it is much less linear.
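
The operating-system mechanism described above can be sketched in plain C on a POSIX system. The example below uses the standard calls \texttt{fcntl}, \texttt{poll}, \texttt{pipe}, \texttt{read} and \texttt{write}; the helper names \texttt{set\_nonblocking}, \texttt{wait\_readable} and \texttt{demo\_nonblocking\_pipe} are hypothetical and exist only for this illustration. It shows one possible low-level mechanism, not how \CFA would expose Non-Blocking IO.

\begin{cfacode}
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put a file descriptor in non-blocking mode; returns 0 on success. */
static int set_nonblocking(int fd) {
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags == -1) return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Ask the OS whether fd is readable, waiting at most timeout_ms.
   Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
	struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
	int r = poll(&p, 1, timeout_ms);
	if (r <= 0) return r;
	return (p.revents & POLLIN) ? 1 : -1;
}

/* Demonstration on a pipe: returns 0 if every step behaves as described.
   Error paths do not close the descriptors, to keep the sketch short. */
static int demo_nonblocking_pipe(void) {
	int fds[2];
	char buf[8];
	if (pipe(fds) == -1) return -1;
	if (set_nonblocking(fds[0]) == -1) return -1;

	/* Nothing written yet: read returns immediately instead of blocking. */
	if (read(fds[0], buf, sizeof buf) != -1) return -1;
	if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;
	if (wait_readable(fds[0], 0) != 0) return -1;	/* no notification yet */

	/* Once data arrives, the notification fires and the read succeeds. */
	if (write(fds[1], "ping", 4) != 4) return -1;
	if (wait_readable(fds[0], 1000) != 1) return -1;
	if (read(fds[0], buf, sizeof buf) != 4) return -1;

	close(fds[0]);
	close(fds[1]);
	return 0;
}
\end{cfacode}

Even in this small example the control flow is split between issuing the operation and reacting to its completion, which is the loss of linearity mentioned above.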

\subsection{Other Concurrency Tools} \label{futur:tools}
While monitors offer a flexible and powerful concurrent core for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package. Examples of such tools include simple locks and condition variables, futures and promises~\cite{promises}, executors and actors. These additional features are useful when monitors offer a level of abstraction that is inadequate for certain tasks.
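
To make one of these tools concrete, here is a minimal sketch of a single-assignment future written in plain C on top of POSIX threads. The type \texttt{future\_int} and the functions \texttt{future\_init}, \texttt{future\_fulfil}, \texttt{future\_get}, \texttt{produce\_answer} and \texttt{demo\_future} are hypothetical names chosen for this example and do not describe an existing or planned \CFA interface.

\begin{cfacode}
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-assignment future holding an int. */
struct future_int {
	pthread_mutex_t lock;
	pthread_cond_t ready;
	bool fulfilled;
	int value;
};

static void future_init(struct future_int * f) {
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->ready, NULL);
	f->fulfilled = false;
	f->value = 0;
}

/* Producer side (the "promise"): publish the value and wake all waiters. */
static void future_fulfil(struct future_int * f, int v) {
	pthread_mutex_lock(&f->lock);
	f->value = v;
	f->fulfilled = true;
	pthread_cond_broadcast(&f->ready);
	pthread_mutex_unlock(&f->lock);
}

/* Consumer side: block until the value is available, then return it. */
static int future_get(struct future_int * f) {
	pthread_mutex_lock(&f->lock);
	while (!f->fulfilled) pthread_cond_wait(&f->ready, &f->lock);
	int v = f->value;
	pthread_mutex_unlock(&f->lock);
	return v;
}

/* Worker that computes a result and fulfils the future passed as argument. */
static void * produce_answer(void * arg) {
	future_fulfil((struct future_int *)arg, 6 * 7);
	return NULL;
}

/* Start the worker, wait on the future and return the delivered value. */
static int demo_future(void) {
	struct future_int f;
	pthread_t worker;
	future_init(&f);
	if (pthread_create(&worker, NULL, produce_answer, &f) != 0) return -1;
	int v = future_get(&f);
	pthread_join(worker, NULL);
	pthread_cond_destroy(&f.ready);
	pthread_mutex_destroy(&f.lock);
	return v;
}
\end{cfacode}

The lock and condition variable here play the role that a monitor would play in \CFA, which suggests that such tools could plausibly be layered on top of the monitor core rather than built separately.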

\subsection{Implicit Threading} \label{futur:implcit}
Simpler applications can benefit greatly from having implicit parallelism, that is, parallelism that does not rely on the user to write concurrency. This type of parallelism can be achieved both at the language level and at the library level. The canonical example of implicit parallelism is parallel for loops, which are the simplest example of a divide and conquer algorithm~\cite{uC++book}. Table~\ref{lst:parfor} shows three different code examples that accomplish point-wise sums of large arrays. Note that none of these examples explicitly declares any concurrency or parallelism objects.

\begin{table}
\begin{center}
\begin{tabular}[t]{|c|c|c|}
Sequential & Library Parallel & Language Parallel \\
\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	for(
		int i = 0;
		i < len;
		++i )
	{
		o[i]=a[i]+b[i];
	}
}





int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode} &\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	range ar(a, a+len);
	range br(b, b+len);
	range or(o, o+len);
	parfor( ar, br, or,
	[](	int* ai,
		int* bi,
		int* oi)
	{
		*oi=*ai+*bi;
	});
}


int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}&\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	parfor (ai,bi,oi)
	    in (a, b, o )
	{
		oi = ai + bi;
	}
}







int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}
\end{tabular}
\end{center}
\caption{For loop to sum numbers: Sequential, using library parallelism and language parallelism.}
\label{lst:parfor}
\end{table}
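
For comparison with the hypothetical \texttt{parfor} forms in the table, the following plain C sketch shows one way a library could hide the threads behind an ordinary function call, here using POSIX threads. The fixed worker count \texttt{PARFOR\_WORKERS}, the \texttt{sum\_chunk} structure and the \texttt{sum\_worker} helper are assumptions made for this illustration; a real library would more likely reuse a pool of workers than create threads on every call.

\begin{cfacode}
#include <pthread.h>
#include <stddef.h>

enum { PARFOR_WORKERS = 4 };	/* hypothetical fixed worker count */

/* Half-open index range [lo, hi) handled by one worker. */
struct sum_chunk {
	const int * a;
	const int * b;
	int * o;
	size_t lo, hi;
};

/* Sum the elements of one chunk. */
static void * sum_worker(void * arg) {
	struct sum_chunk * c = arg;
	for (size_t i = c->lo; i < c->hi; ++i) c->o[i] = c->a[i] + c->b[i];
	return NULL;
}

/* Point-wise sum; the caller sees a plain function call and no threads. */
static void big_sum(const int * a, const int * b, int * o, size_t len) {
	pthread_t tid[PARFOR_WORKERS];
	struct sum_chunk chunk[PARFOR_WORKERS];
	int started[PARFOR_WORKERS];
	size_t per = (len + PARFOR_WORKERS - 1) / PARFOR_WORKERS;	/* ceiling division */

	for (int w = 0; w < PARFOR_WORKERS; ++w) {
		size_t lo = (size_t)w * per;
		size_t hi = lo + per;
		if (lo > len) lo = len;
		if (hi > len) hi = len;
		chunk[w] = (struct sum_chunk){ a, b, o, lo, hi };
		started[w] = pthread_create(&tid[w], NULL, sum_worker, &chunk[w]) == 0;
		if (!started[w]) sum_worker(&chunk[w]);	/* fall back to running inline */
	}
	/* Wait for every worker that was actually started. */
	for (int w = 0; w < PARFOR_WORKERS; ++w) {
		if (started[w]) pthread_join(tid[w], NULL);
	}
}
\end{cfacode}

Because each worker writes to a disjoint slice of the output array, no locking is needed, which is what makes this pattern a natural candidate for implicit parallelism.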

Implicit parallelism is a restrictive solution and therefore has its limitations. However, it is a quick and simple approach to parallelism, which may very well be sufficient for smaller applications, and it reduces the amount of boilerplate needed to start benefiting from parallelism on modern CPUs.
