source: doc/proposals/concurrency/text/future.tex @ cf966b5

Last change on this file since cf966b5 was cf966b5, checked in by Thierry Delisle <tdelisle@…>, 7 years ago
Results need to be updated but otherwise, tentative final draft
\chapter{Conclusion}
As mentioned in the introduction, this thesis provides a minimal concurrency \acrshort{api} that is simple, efficient and usable as the basis for higher-level features. The approach presented is based on a lightweight thread system for parallelism, which sits on top of clusters of processors. This M:N model is judged to be both more efficient and more flexible for users. Furthermore, this document introduces monitors as the main concurrency tool for users. This thesis also offers a novel approach that allows using multiple monitors at once without running into the Nested Monitor Problem~\cite{Lister77}. It also offers a full implementation of the concurrency runtime written entirely in \CFA, effectively the largest \CFA code base to date.


% ======================================================================
% ======================================================================
\section{Future Work}
% ======================================================================
% ======================================================================

\subsection{Performance} \label{futur:perf}
This thesis presents a first implementation of the \CFA runtime. Therefore, there is still significant work to do to improve performance. Many of the data structures and algorithms will change in the future to more efficient versions. For example, in \CFA the number of monitors in a single \gls{bulk-acq} is bounded only by the stack size, which is probably unnecessarily generous. Limiting this number may help increase performance; however, it is not obvious that the benefit would be significant.

\subsection{Flexible Scheduling} \label{futur:sched}
An important part of concurrency is scheduling. Different scheduling algorithms can affect performance (both in terms of average and variation). However, no single scheduler is optimal for all workloads, and therefore there is value in being able to change the scheduler for given programs. One solution is to offer various tweaking options to users, allowing the scheduler to be adjusted to the requirements of the workload. However, in order to be truly flexible, it would be interesting to allow users to add arbitrary data and arbitrary scheduling algorithms to the scheduler. For example, a web server could attach Type-of-Service information to threads and have a ``ToS aware'' scheduling algorithm tailored to this specific web server. This path of flexible schedulers will be explored for \CFA.
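
To make the idea of user-supplied data and user-supplied policies concrete, the following is a minimal sketch in plain C rather than \CFA. Every name in it (\texttt{thread\_desc}, \texttt{sched\_policy}, \texttt{sched\_push}, \texttt{sched\_pop}, \texttt{tos\_before}) is hypothetical and is not part of the \CFA runtime; it only illustrates how a ready queue could defer its ordering decision to a callback that inspects opaque per-thread data such as a Type-of-Service class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread descriptor: `user_data` stands in for the arbitrary
   per-thread data mentioned in the text (here, a Type-of-Service class). */
typedef struct {
	int   id;
	void *user_data;
} thread_desc;

/* User-supplied policy: returns nonzero when `a` should run before `b`. */
typedef int (*sched_policy)(const thread_desc *a, const thread_desc *b);

#define READY_MAX 64

typedef struct {
	thread_desc *ready[READY_MAX];
	size_t       count;
	sched_policy before;
} scheduler;

void sched_init(scheduler *s, sched_policy policy) {
	s->count  = 0;
	s->before = policy;
}

/* Add a thread to the ready set; returns 0 on success, -1 if the set is full. */
int sched_push(scheduler *s, thread_desc *t) {
	if (s->count == READY_MAX) return -1;
	s->ready[s->count++] = t;
	return 0;
}

/* Remove and return the thread the policy ranks first, or NULL if none is ready.
   A linear scan keeps the sketch short; a real scheduler would use a better structure. */
thread_desc *sched_pop(scheduler *s) {
	if (s->count == 0) return NULL;
	size_t best = 0;
	for (size_t i = 1; i < s->count; ++i)
		if (s->before(s->ready[i], s->ready[best])) best = i;
	thread_desc *t = s->ready[best];
	s->ready[best] = s->ready[--s->count];
	return t;
}

/* Example policy (an assumption for illustration): a lower ToS value is more urgent. */
int tos_before(const thread_desc *a, const thread_desc *b) {
	assert(a->user_data && b->user_data);
	return *(const int *)a->user_data < *(const int *)b->user_data;
}
```

Under these assumptions, swapping \texttt{tos\_before} for another function changes the scheduling policy without touching the queue itself, which is the kind of flexibility this subsection argues for.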

\subsection{Non-Blocking IO} \label{futur:nbio}
While most of the parallelism tools are aimed at data parallelism and control-flow parallelism, many modern workloads are bound not by computation but by IO operations, a common case being web servers and XaaS (anything as a service). These types of workloads often require significant engineering to amortize the cost of blocking IO operations. At its core, Non-Blocking IO is an operating-system-level feature that allows queuing IO operations (e.g., network operations) and registering for notifications instead of waiting for requests to complete. In this context, the role of the language is to make Non-Blocking IO easily available and with low overhead. The current trend is to use asynchronous programming with tools like callbacks and/or futures and promises, which can be seen in frameworks like Node.js~\cite{NodeJs} for JavaScript, Spring MVC~\cite{SpringMVC} for Java and Django~\cite{Django} for Python. However, while these are valid solutions, they lead to code that is harder to read and maintain because it is much less linear.
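
As an illustration of the operating-system facility described above, the sketch below uses the POSIX calls \texttt{fcntl}, \texttt{read} and \texttt{poll}. It is plain C, not \CFA, and the helper names \texttt{set\_nonblocking}, \texttt{try\_read} and \texttt{wait\_readable} are hypothetical. The choice of \texttt{poll} is also an assumption: this chapter does not commit to a particular notification mechanism.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put a descriptor in non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd) {
	assert(fd >= 0);
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags == -1) return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read without waiting.
   Returns the number of bytes read, 0 at end of file, -1 on error,
   or -2 when no data is available yet. */
long try_read(int fd, char *buf, size_t len) {
	ssize_t n = read(fd, buf, len);
	if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return -2;
	return (long)n;
}

/* Register interest in readability and wait at most timeout_ms milliseconds.
   Returns 1 if the descriptor is readable (or closed by the peer),
   0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms) {
	struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
	int r = poll(&p, 1, timeout_ms);
	if (r <= 0) return r;
	return (p.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

A runtime built along these lines could block only the user-level thread that called \texttt{try\_read} and got \texttt{-2}, then resume it once \texttt{wait\_readable} reports the descriptor as ready, so that user code stays linear.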

\subsection{Other concurrency tools} \label{futur:tools}
While monitors offer a flexible and powerful concurrent core for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package. Examples of such tools include simple locks and condition variables, futures and promises~\cite{promises}, executors and actors. These additional features are useful when monitors offer a level of abstraction that is inadequate for certain tasks.
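
To show how small one of these tools can be, here is a sketch of a single-assignment future for an \texttt{int}, built on POSIX threads, a mutex and a condition variable. It is written in plain C, not \CFA, and the names \texttt{future\_int}, \texttt{future\_fulfil}, \texttt{future\_get} and \texttt{produce\_answer} are hypothetical. It does not reflect any planned \CFA interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* A one-shot container for a value that is produced later. */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t  ready;
	bool            fulfilled;
	int             value;
} future_int;

void future_init(future_int *f) {
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->ready, NULL);
	f->fulfilled = false;
	f->value     = 0;
}

/* Producer side (the "promise"): store the value once and wake all waiters. */
void future_fulfil(future_int *f, int value) {
	pthread_mutex_lock(&f->lock);
	assert(!f->fulfilled);          /* a future may only be fulfilled once */
	f->value     = value;
	f->fulfilled = true;
	pthread_cond_broadcast(&f->ready);
	pthread_mutex_unlock(&f->lock);
}

/* Consumer side: block until the value is available, then return it. */
int future_get(future_int *f) {
	pthread_mutex_lock(&f->lock);
	while (!f->fulfilled)           /* loop guards against spurious wake-ups */
		pthread_cond_wait(&f->ready, &f->lock);
	int v = f->value;
	pthread_mutex_unlock(&f->lock);
	return v;
}

void future_destroy(future_int *f) {
	pthread_cond_destroy(&f->ready);
	pthread_mutex_destroy(&f->lock);
}

/* Hypothetical worker used for illustration: fulfils the future with 42. */
void *produce_answer(void *arg) {
	future_fulfil((future_int *)arg, 42);
	return NULL;
}
```

The mutex and condition variable pair plays the role that a monitor would play in \CFA. The future is useful precisely when the caller wants a value-oriented abstraction rather than mutual exclusion over shared state.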

\subsection{Implicit threading} \label{futur:implcit}
Simpler applications can benefit greatly from having implicit parallelism, that is, parallelism that does not rely on the user to write concurrent code. This type of parallelism can be achieved both at the language level and at the library level. The canonical example of implicit parallelism is parallel for loops, which are the simplest example of a divide and conquer algorithm~\cite{uC++book}. Table~\ref{lst:parfor} shows three different code examples that accomplish point-wise sums of large arrays. Note that none of these examples explicitly declares any concurrency or parallelism objects.

\begin{table}
\begin{center}
\begin{tabular}[t]{|c|c|c|}
Sequential & Library Parallel & Language Parallel \\
\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	for(
		int i = 0;
		i < len;
		++i )
	{
		o[i]=a[i]+b[i];
	}
}





int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode} &\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	range ar(a, a+len);
	range br(b, b+len);
	range or(o, o+len);
	parfor( ar, br, or,
		[]( int* ai,
			int* bi,
			int* oi)
		{
			*oi=*ai+*bi;
		});
}


int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}&\begin{cfacode}[tabsize=3]
void big_sum(
	int* a, int* b,
	int* o,
	size_t len)
{
	parfor (ai,bi,oi)
		in (a, b, o )
	{
		oi = ai + bi;
	}
}







int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}
\end{tabular}
\end{center}
\caption{For loop to sum numbers: Sequential, using library parallelism and language parallelism.}
\label{lst:parfor}
\end{table}

Implicit parallelism is a restrictive solution and therefore has its limitations. However, it is a quick and simple approach to parallelism, which may very well be sufficient for smaller applications and reduces the amount of boilerplate needed to start benefiting from parallelism in modern CPUs.
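
For comparison with an existing tool, the same point-wise sum can be written in plain C with an OpenMP directive. This is only an analogy to the language-level style in Table~\ref{lst:parfor}: OpenMP is a separate, existing standard and not the \CFA feature proposed here. The sketch assumes a compiler with OpenMP support (for example, GCC or Clang with \texttt{-fopenmp}); without that flag the pragma is ignored and the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Point-wise sum: o[i] = a[i] + b[i] for every i in [0, len).
   With OpenMP enabled, the iterations are divided among threads by the
   compiler and runtime; no thread or lock object appears in user code. */
void big_sum(const int *a, const int *b, int *o, size_t len) {
	assert(a && b && o);
	#pragma omp parallel for
	for (long i = 0; i < (long)len; ++i) {
		o[i] = a[i] + b[i];
	}
}
```

Each iteration writes a distinct element of \texttt{o} and reads only from \texttt{a} and \texttt{b}, so the iterations are independent. That independence is what makes this loop safe to parallelize implicitly.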