% Source: doc/proposals/concurrency/text/future.tex @ cae28da
% Last change: cae28da, checked in by Thierry Delisle <tdelisle@…> ("Finished reviewing thesis")
\chapter{Conclusion}
This thesis has achieved a minimal concurrency \acrshort{api} that is simple, efficient and usable as the basis for higher-level features. The approach presented is based on a lightweight thread-system for parallelism, which sits on top of clusters of processors. This M:N model is judged to be both more efficient and more flexible for users. Furthermore, this document introduces monitors as the main concurrency tool for users. This thesis also offers a novel approach that allows multiple monitors to be accessed simultaneously without running into the Nested Monitor Problem~\cite{Lister77}. Finally, it offers a full implementation of the concurrency runtime written entirely in \CFA, effectively the largest \CFA code base to date.


% ======================================================================
% ======================================================================
\section{Future Work}
% ======================================================================
% ======================================================================

\subsection{Performance} \label{futur:perf}
This thesis presents a first implementation of the \CFA concurrency runtime. Therefore, there is still significant work to do to improve performance. Many of the data structures and algorithms may change in the future to more efficient versions. For example, the number of monitors in a single \gls{bulk-acq} is bounded only by the stack size, which is probably unnecessarily generous. Limiting this number may help increase performance, although it is not obvious that the benefit would be significant.

\subsection{Flexible Scheduling} \label{futur:sched}
An important part of concurrency is scheduling. Different scheduling algorithms can affect performance, both in terms of average and variation. However, no single scheduler is optimal for all workloads, and therefore there is value in being able to change the scheduler for a given program. One solution is to offer various tuning options to users, allowing the scheduler to be adjusted to the requirements of the workload. However, to be truly flexible, it would be interesting to allow users to add arbitrary data and arbitrary scheduling algorithms. For example, a web server could attach Type-of-Service information to threads and have a ``ToS aware'' scheduling algorithm tailored to this specific web server. This path of flexible schedulers will be explored for \CFA.
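
The idea of attaching user data to threads and supplying a user-defined scheduling algorithm can be sketched as follows. This is only an illustration written in C++, not a \CFA design: the names \texttt{Task}, \texttt{Policy}, \texttt{ReadyQueue}, \texttt{fifo}, \texttt{tos\_aware} and \texttt{run\_order} are hypothetical, and the convention that a larger \texttt{tos} value means a more urgent thread is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical thread descriptor. `tos` stands in for the arbitrary user
// data mentioned in the text; it is not part of any existing runtime API.
struct Task {
    std::string name;
    int tos;          // Type-of-Service class; larger means more urgent (assumption)
    std::size_t seq;  // arrival order, used to break ties
};

// A policy answers: "should task a run after task b?"
using Policy = std::function<bool(const Task&, const Task&)>;

// Ready queue whose ordering is supplied by the user.
class ReadyQueue {
public:
    explicit ReadyQueue(Policy runs_after) : queue_(std::move(runs_after)) {}
    void push(std::string name, int tos) {
        queue_.push(Task{std::move(name), tos, next_seq_++});
    }
    Task pop() {
        Task next = queue_.top();
        queue_.pop();
        return next;
    }
    bool empty() const { return queue_.empty(); }

private:
    std::priority_queue<Task, std::vector<Task>, Policy> queue_;
    std::size_t next_seq_ = 0;
};

// Default policy: first come, first served.
inline bool fifo(const Task& a, const Task& b) { return a.seq > b.seq; }

// "ToS aware" policy: higher class first, arrival order within a class.
inline bool tos_aware(const Task& a, const Task& b) {
    if (a.tos != b.tos) return a.tos < b.tos;
    return a.seq > b.seq;
}

// Queue four example requests and report the order in which they would run.
inline std::vector<std::string> run_order(Policy policy) {
    ReadyQueue ready(std::move(policy));
    ready.push("static-page", 0);
    ready.push("video-stream", 2);
    ready.push("api-call", 1);
    ready.push("voice-call", 2);
    std::vector<std::string> order;
    while (!ready.empty()) order.push_back(ready.pop().name);
    return order;
}
```

With \texttt{fifo}, the four requests run in arrival order; with \texttt{tos\_aware}, the two class-2 requests run first. The point of the sketch is that the queue itself is unchanged and only the policy differs.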

\subsection{Non-Blocking I/O} \label{futur:nbio}
While most of the parallelism tools are aimed at data parallelism and control-flow parallelism, many modern workloads are bound not by computation but by I/O operations, a common case being web servers and XaaS (anything as a service). These types of workloads often require significant engineering to amortize the cost of blocking I/O operations. At its core, non-blocking I/O is an operating-system-level feature that allows queuing I/O operations (e.g., network operations) and registering for notifications instead of waiting for requests to complete. In this context, the role of the language is to make non-blocking I/O easily available and with low overhead. The current trend is to use asynchronous programming with tools like callbacks and/or futures and promises, which can be seen in frameworks like Node.js~\cite{NodeJs} for JavaScript, Spring MVC~\cite{SpringMVC} for Java and Django~\cite{Django} for Python. However, while these are valid solutions, they lead to code that is harder to read and maintain because it is much less linear.
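
The operating-system mechanism described above can be illustrated with a minimal sketch, assuming a POSIX system. It uses a pipe in place of a network connection and the standard \texttt{fcntl}, \texttt{read}, \texttt{write} and \texttt{poll} calls; the helper names \texttt{set\_nonblocking}, \texttt{try\_read}, \texttt{wait\_readable} and \texttt{nonblocking\_pipe\_demo} are hypothetical and do not correspond to any \CFA interface.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <string>
#include <unistd.h>

// Put a descriptor in non-blocking mode; returns false on failure.
inline bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Try to read without waiting. Returns true and fills `out` only when data
// was available; returns false if the call would block, fails, or hits EOF.
inline bool try_read(int fd, std::string& out) {
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n <= 0) return false;
    out.assign(buf, static_cast<std::size_t>(n));
    return true;
}

// Register interest in readability and wait for the notification,
// for at most `timeout_ms` milliseconds.
inline bool wait_readable(int fd, int timeout_ms) {
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;
    return poll(&p, 1, timeout_ms) == 1 && (p.revents & POLLIN) != 0;
}

// Walk through the sequence: a read that would block, then a notified read.
inline std::string nonblocking_pipe_demo() {
    int fds[2];
    if (pipe(fds) != 0) return "pipe-failed";
    if (!set_nonblocking(fds[0])) return "fcntl-failed";

    std::string log, data;
    // Nothing has been written yet, so the read returns immediately.
    log += try_read(fds[0], data) ? "data;" : "would-block;";

    // A runtime would run other user threads here instead of waiting.
    if (write(fds[1], "ping", 4) != 4) return "write-failed";

    // The kernel reports readiness; the read now succeeds without blocking.
    if (wait_readable(fds[0], 1000) && try_read(fds[0], data)) log += data;

    close(fds[0]);
    close(fds[1]);
    return log;
}
```

In this sketch the caller still polls explicitly. The goal stated above is for the language runtime to hide this step, so that user code keeps a linear, blocking style while the runtime schedules other threads during the wait.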

\subsection{Other Concurrency Tools} \label{futur:tools}
While monitors offer a flexible and powerful concurrent core for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package. Examples of such tools include simple locks and condition variables, futures and promises~\cite{promises}, executors and actors. These additional features are useful when monitors offer a level of abstraction that is inadequate for certain tasks.
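
As an illustration of one of these tools, the following minimal sketch shows a future/promise pair using the C++ standard library (\texttt{std::promise}, \texttt{std::future} and \texttt{std::thread}). It only shows the programming model; it is not a proposal for how \CFA would expose futures, and the function name \texttt{sum\_in\_background} is hypothetical.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Compute a sum on another thread and hand the result back through a future.
inline int sum_in_background(const std::vector<int>& values) {
    std::promise<int> promise;                       // write end, owned by the producer
    std::future<int> future = promise.get_future();  // read end, kept by the consumer

    std::thread producer([&values, p = std::move(promise)]() mutable {
        p.set_value(std::accumulate(values.begin(), values.end(), 0));
    });

    // The consumer is free to do unrelated work here; get() blocks only
    // if the value has not been produced yet.
    int result = future.get();
    producer.join();
    return result;
}
```

For example, \texttt{sum\_in\_background(\{1, 2, 3, 4\})} returns 10. Unlike a monitor, the future synchronizes on a single value that is written exactly once.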

\subsection{Implicit Threading} \label{futur:implcit}
Simpler applications can benefit greatly from implicit parallelism, that is, parallelism that does not rely on the user to write concurrency. This type of parallelism can be achieved both at the language level and at the library level. The canonical example of implicit parallelism is the parallel for loop, which is the simplest example of a divide-and-conquer algorithm~\cite{uC++book}. Table~\ref{lst:parfor} shows three different code examples that accomplish point-wise sums of large arrays. Note that none of these examples explicitly declare any concurrency or parallelism objects.

\begin{table}
\begin{center}
\begin{tabular}[t]{|c|c|c|}
Sequential & Library Parallel & Language Parallel \\
\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	for(
		size_t i = 0;
		i < len;
		++i )
	{
		o[i]=a[i]+b[i];
	}
}





int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode} &\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	range ar(a, a+len);
	range br(b, b+len);
	range or(o, o+len);
	parfor( ar, br, or,
	[](	int* ai,
		int* bi,
		int* oi)
	{
		*oi=*ai+*bi;
	});
}


int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}&\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	parfor (ai,bi,oi)
	    in (a, b, o )
	{
		oi = ai + bi;
	}
}







int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}
\end{tabular}
\end{center}
\caption{For loop to sum numbers: Sequential, using library parallelism and language parallelism.}
\label{lst:parfor}
\end{table}

Implicit parallelism is a restrictive solution and therefore has its limitations. However, it is a quick and simple approach to parallelism, which may well be sufficient for smaller applications, and it reduces the amount of boilerplate needed to start benefiting from parallelism in modern CPUs.
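
To make the library-level approach concrete, the following is a minimal runnable sketch in C++ of what a \texttt{parfor} helper could look like. It is an assumption-laden illustration, not the interface proposed for \CFA: the index-based signature, the fixed default of four workers and the static chunking are all choices made for this example, and it uses kernel threads through \texttt{std::thread} rather than the user-level threads described in this thesis.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical library-level parallel for: calls body(i) for every i in
// [0, len), with the index range split into contiguous chunks, one per worker.
template <typename Body>
void parfor(std::size_t len, Body body, unsigned workers = 4) {
    workers = std::max(1u, workers);
    const std::size_t chunk = (len + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < len; begin += chunk) {
        const std::size_t end = std::min(len, begin + chunk);
        pool.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& t : pool) t.join();  // wait for every chunk to finish
}

// Point-wise sum; the caller declares no thread or lock objects.
inline void big_sum(const int* a, const int* b, int* o, std::size_t len) {
    parfor(len, [=](std::size_t i) { o[i] = a[i] + b[i]; });
}
```

Each index is written by exactly one worker, so the loop body needs no synchronization. This independence between iterations is the restriction that makes the implicit approach simple, and it is also the source of its limitations.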