source: doc/theses/thierry_delisle_MMath/text/future.tex @ 95cb63b


Last change on this file since 95cb63b was 67982887, checked in by Peter A. Buhr <pabuhr@…>, 6 years ago
specialize thesis directory-names
\chapter{Conclusion}
This thesis has achieved a minimal concurrency \acrshort{api} that is simple, efficient and usable as the basis for higher-level features. The approach presented is based on a lightweight thread system for parallelism, which sits on top of clusters of processors. This M:N model is judged to be both more efficient and more flexible for users. Furthermore, this document introduces monitors as the main concurrency tool for users. This thesis also offers a novel approach allowing multiple monitors to be accessed simultaneously without running into the Nested Monitor Problem~\cite{Lister77}. Finally, it offers a full implementation of the concurrency runtime written entirely in \CFA, which is effectively the largest \CFA code base to date.


% ======================================================================
% ======================================================================
\section{Future Work}
% ======================================================================
% ======================================================================

\subsection{Performance} \label{futur:perf}
This thesis presents a first implementation of the \CFA concurrency runtime; therefore, there is still significant work to be done to improve performance. Many of the data structures and algorithms may be replaced in the future by more efficient versions. For example, the number of monitors in a single \gls{bulk-acq} is bounded only by the stack size, which is probably unnecessarily generous. Limiting this number might help increase performance, although it is not obvious that the benefit would be significant.
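
As a rough illustration of what such a limit could look like, the following plain-C sketch, which is not the \CFA runtime's implementation, bounds a bulk acquire by a hypothetical compile-time constant \texttt{MAX\_BULK}, so the sorted copy of the requested locks fits in a small fixed-size array. It assumes each monitor is guarded by a POSIX mutex and acquires the mutexes in address order, a standard way to avoid deadlock when several locks are taken together; the names \texttt{bulk\_acquire} and \texttt{bulk\_release} are hypothetical.

\begin{cfacode}
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical compile-time bound on the number of monitors in one bulk acquire. */
#define MAX_BULK 8

/* qsort comparator: order mutex pointers by address. */
static int cmp_addr(const void * l, const void * r) {
   uintptr_t a = (uintptr_t)*(pthread_mutex_t * const *)l;
   uintptr_t b = (uintptr_t)*(pthread_mutex_t * const *)r;
   return (a > b) - (a < b);
}

/* Copy the requested locks into the fixed-size buffer 'held', sort them by
   address, drop duplicates, then lock them in that global order.
   Returns the number of distinct locks held. */
static size_t bulk_acquire(pthread_mutex_t * req[], size_t count,
                           pthread_mutex_t * held[MAX_BULK]) {
   assert(count <= MAX_BULK);   /* enforce the fixed bound instead of relying on stack size */
   for (size_t i = 0; i < count; i++) held[i] = req[i];
   qsort(held, count, sizeof held[0], cmp_addr);
   size_t n = 0;
   for (size_t i = 0; i < count; i++)   /* same monitor requested twice: lock it once */
      if (n == 0 || held[n - 1] != held[i]) held[n++] = held[i];
   for (size_t i = 0; i < n; i++) pthread_mutex_lock(held[i]);
   return n;
}

/* Release in reverse acquisition order. */
static void bulk_release(pthread_mutex_t * held[], size_t n) {
   for (size_t i = n; i > 0; i--) pthread_mutex_unlock(held[i - 1]);
}
\end{cfacode}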

\subsection{Flexible Scheduling} \label{futur:sched}
An important part of concurrency is scheduling. Different scheduling algorithms can affect performance, both on average and in its variation. However, no single scheduler is optimal for all workloads, so there is value in being able to change the scheduler for a given program. One solution is to offer various tuning options, allowing users to adjust the scheduler to the requirements of the workload. However, to be truly flexible, it would be valuable to let users attach arbitrary data to threads and supply arbitrary scheduling algorithms. For example, a web server could attach Type-of-Service information to threads and use a ``ToS aware'' scheduling algorithm tailored to this specific web server. This path of flexible schedulers will be explored for \CFA.
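
One possible shape for such an extension point is sketched below in plain C. It assumes a hypothetical interface in which the runtime only ever calls a scheduler's \texttt{push} and \texttt{pop} hooks and each task carries a user-defined \texttt{tos} field; none of these names belong to \CFA, and locking is omitted for brevity. Swapping the FIFO policy for the ToS-aware one changes the dispatch order without touching the code that creates or blocks tasks.

\begin{cfacode}
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-level task carrying scheduler-specific user data. */
struct task {
   int id;
   int tos;              /* user-attached Type-of-Service class; higher runs first */
   struct task * next;   /* intrusive link used by the ready queue */
};

/* Hypothetical pluggable scheduler: the runtime only calls push and pop. */
struct scheduler {
   void (*push)(struct scheduler * self, struct task * t);
   struct task * (*pop)(struct scheduler * self);
   struct task * head;   /* ready queue owned by the policy */
};

/* Default policy: plain FIFO. */
static void fifo_push(struct scheduler * s, struct task * t) {
   struct task ** p = &s->head;
   while (*p) p = &(*p)->next;
   t->next = NULL;
   *p = t;
}

/* ToS-aware policy: keep the queue sorted by descending tos, FIFO within a class. */
static void tos_push(struct scheduler * s, struct task * t) {
   struct task ** p = &s->head;
   while (*p && (*p)->tos >= t->tos) p = &(*p)->next;
   t->next = *p;
   *p = t;
}

/* Both policies dequeue from the front. */
static struct task * list_pop(struct scheduler * s) {
   struct task * t = s->head;
   if (t) s->head = t->next;
   return t;
}

/* Drain the ready queue, recording the order in which tasks would be dispatched. */
static size_t dispatch_order(struct scheduler * s, int order[], size_t max) {
   size_t n = 0;
   for (struct task * t = s->pop(s); t != NULL; t = s->pop(s)) {
      assert(n < max);
      order[n++] = t->id;
   }
   return n;
}
\end{cfacode}

With the ToS-aware policy, tasks pushed in the order 1, 2, 3, 4 with ToS classes 0, 5, 0, 5 are dispatched in the order 2, 4, 1, 3, whereas the FIFO policy dispatches them in arrival order.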

\subsection{Non-Blocking I/O} \label{futur:nbio}
While most parallelism tools are aimed at data parallelism and control-flow parallelism, many modern workloads are bound not by computation but by I/O operations, a common case being web servers and XaaS (anything as a service). These types of workloads often require significant engineering to amortize the cost of blocking I/O operations. At its core, non-blocking I/O is an operating-system-level feature that allows queuing I/O operations (e.g., network operations) and registering for notifications instead of waiting for requests to complete. In this context, the role of the language is to make non-blocking I/O easily available with low overhead. The current trend is asynchronous programming with tools such as callbacks and/or futures and promises, as seen in frameworks like Node.js~\cite{NodeJs} for JavaScript, Spring MVC~\cite{SpringMVC} for Java and Django~\cite{Django} for Python. However, while these are valid solutions, they lead to code that is harder to read and maintain because its control flow is much less linear.
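
The underlying operating-system mechanism can be illustrated with a short POSIX sketch in plain C, using only \texttt{pipe}, \texttt{fcntl}, \texttt{read}, \texttt{write} and \texttt{poll}. A read on a descriptor in non-blocking mode fails immediately with \texttt{EAGAIN} (or \texttt{EWOULDBLOCK}) instead of waiting, and \texttt{poll} reports when the descriptor becomes ready. The helper names are hypothetical, and a \CFA runtime would more likely park the user-level thread and let a dedicated poller watch many descriptors at once; that integration is not shown.

\begin{cfacode}
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode. */
static int set_nonblocking(int fd) {
   int flags = fcntl(fd, F_GETFL, 0);
   if (flags < 0) return -1;
   return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Ask the kernel whether fd is readable, waiting at most timeout_ms.
   A runtime would instead park the user-level thread here and let one
   poller watch many descriptors. */
static int wait_readable(int fd, int timeout_ms) {
   struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
   int rc = poll(&p, 1, timeout_ms);
   return rc > 0 && (p.revents & POLLIN);
}

/* Walk through the pattern on a pipe; returns the byte received, or -1. */
static int nonblocking_roundtrip(void) {
   int fds[2];
   if (pipe(fds) != 0) return -1;
   if (set_nonblocking(fds[0]) != 0) { close(fds[0]); close(fds[1]); return -1; }
   char c = 0;
   /* 1. No data yet: the read fails immediately instead of blocking. */
   ssize_t r = read(fds[0], &c, 1);
   assert(r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
   /* 2. Query readiness without waiting (0 ms timeout): nothing is pending. */
   int ready = wait_readable(fds[0], 0);
   assert(!ready);
   /* 3. Data arrives (here, written by ourselves) and readiness is reported. */
   int result = -1;
   if (write(fds[1], "x", 1) == 1 && wait_readable(fds[0], 1000))
      result = read(fds[0], &c, 1) == 1 ? c : -1;
   close(fds[0]);
   close(fds[1]);
   return result;
}
\end{cfacode}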

\subsection{Other Concurrency Tools} \label{futur:tools}
While monitors offer a flexible and powerful concurrency core for \CFA, other concurrency tools are also necessary for a complete multi-paradigm concurrency package. Examples of such tools include simple locks and condition variables, futures and promises~\cite{promises}, executors and actors. These additional features are useful when monitors offer a level of abstraction that is inadequate for certain tasks.
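
As an example of how small such a tool can be, the following plain-C sketch implements a one-shot future for an \texttt{int} using a POSIX mutex and condition variable: the producer fulfils the promise exactly once, and any number of consumers block in \texttt{future\_get} until the value is published. The type and function names, including the example producer \texttt{answer\_producer}, are hypothetical and do not describe an existing \CFA library.

\begin{cfacode}
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot future holding an int; the producer side acts as the promise. */
struct future_int {
   pthread_mutex_t lock;
   pthread_cond_t  ready_cv;
   bool ready;
   int value;
};

static void future_init(struct future_int * f) {
   pthread_mutex_init(&f->lock, NULL);
   pthread_cond_init(&f->ready_cv, NULL);
   f->ready = false;
   f->value = 0;
}

/* Promise side: publish the value once and wake every waiter. */
static void future_fulfil(struct future_int * f, int v) {
   pthread_mutex_lock(&f->lock);
   assert(!f->ready);                    /* one-shot: fulfilling twice is a bug */
   f->value = v;
   f->ready = true;
   pthread_cond_broadcast(&f->ready_cv);
   pthread_mutex_unlock(&f->lock);
}

/* Consumer side: block until the value is available, then return it. */
static int future_get(struct future_int * f) {
   pthread_mutex_lock(&f->lock);
   while (!f->ready)                     /* loop guards against spurious wake-ups */
      pthread_cond_wait(&f->ready_cv, &f->lock);
   int v = f->value;
   pthread_mutex_unlock(&f->lock);
   return v;
}

static void future_destroy(struct future_int * f) {
   pthread_cond_destroy(&f->ready_cv);
   pthread_mutex_destroy(&f->lock);
}

/* Example producer thread: computes a result and fulfils the promise. */
static void * answer_producer(void * arg) {
   future_fulfil((struct future_int *)arg, 6 * 7);
   return NULL;
}
\end{cfacode}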

\subsection{Implicit Threading} \label{futur:implcit}
Simpler applications can benefit greatly from implicit parallelism, that is, parallelism that does not rely on the user to write concurrency. This type of parallelism can be achieved both at the language level and at the library level. The canonical example of implicit parallelism is the parallel for loop, which is the simplest example of a divide-and-conquer algorithm~\cite{uC++book}. Table~\ref{lst:parfor} shows three different code examples that compute the point-wise sum of two large arrays. Note that none of these examples explicitly declares any concurrency or parallelism objects.

\begin{table}
\begin{center}
\begin{tabular}[t]{|c|c|c|}
Sequential & Library Parallel & Language Parallel \\
\begin{cfacode}[tabsize=3]
void big_sum(
   int* a, int* b,
   int* o,
   size_t len)
{
   for(
      size_t i = 0;
      i < len;
      ++i )
   {
      o[i] = a[i] + b[i];
   }
}





int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode} &\begin{cfacode}[tabsize=3]
void big_sum(
   int* a, int* b,
   int* o,
   size_t len)
{
   range ar(a, a+len);
   range br(b, b+len);
   range outr(o, o+len);
   parfor( ar, br, outr,
      []( int* ai,
         int* bi,
         int* oi)
      {
         *oi = *ai + *bi;
      });
}


int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}&\begin{cfacode}[tabsize=3]
void big_sum(
   int* a, int* b,
   int* o,
   size_t len)
{
   parfor (ai,bi,oi)
   in (a, b, o)
   {
      oi = ai + bi;
   }
}







int a[10000];
int b[10000];
int c[10000];
//... fill in a & b
big_sum(a,b,c,10000);
\end{cfacode}
\end{tabular}
\end{center}
\caption{For loop computing the point-wise sum of two arrays: sequential, using library parallelism, and using language parallelism.}
\label{lst:parfor}
\end{table}
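
The library column of Table~\ref{lst:parfor} can be approximated in plain C with POSIX threads. The sketch below assumes a hypothetical \texttt{parfor\_range} function that splits the index range $[0, \mathit{len})$ into contiguous chunks, runs each chunk on its own thread and joins them before returning, so the caller still writes no explicit thread code. Because each index is written by exactly one thread, \texttt{big\_sum} needs no locking; the worker count of four is an arbitrary choice for illustration.

\begin{cfacode}
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical library API: body(i, env) is applied to every i in [0, len). */
typedef void (*parfor_body)(size_t i, void * env);

struct chunk { size_t lo, hi; parfor_body body; void * env; };

/* Thread entry point: run the loop body over one contiguous chunk. */
static void * run_chunk(void * arg) {
   struct chunk * c = arg;
   for (size_t i = c->lo; i < c->hi; i++) c->body(i, c->env);
   return NULL;
}

#define PARFOR_MAX_WORKERS 64

/* Split [0, len) into 'workers' chunks, run them in parallel, then join. */
static void parfor_range(size_t len, size_t workers, parfor_body body, void * env) {
   if (workers == 0) workers = 1;
   if (workers > PARFOR_MAX_WORKERS) workers = PARFOR_MAX_WORKERS;
   pthread_t tids[PARFOR_MAX_WORKERS];
   struct chunk chunks[PARFOR_MAX_WORKERS];
   bool started[PARFOR_MAX_WORKERS];
   size_t per = len / workers, extra = len % workers, lo = 0;
   for (size_t w = 0; w < workers; w++) {
      size_t n = per + (w < extra ? 1 : 0);   /* first 'extra' chunks take one more index */
      chunks[w] = (struct chunk){ lo, lo + n, body, env };
      lo += n;
      started[w] = pthread_create(&tids[w], NULL, run_chunk, &chunks[w]) == 0;
      if (!started[w]) run_chunk(&chunks[w]);   /* fall back to running the chunk inline */
   }
   assert(lo == len);
   for (size_t w = 0; w < workers; w++)
      if (started[w]) pthread_join(tids[w], NULL);
}

/* The point-wise sum from the table, expressed with the sketch: no explicit threads. */
struct sum_env { const int * a; const int * b; int * o; };

static void sum_body(size_t i, void * env) {
   struct sum_env * e = env;
   e->o[i] = e->a[i] + e->b[i];
}

static void big_sum(const int * a, const int * b, int * o, size_t len) {
   struct sum_env e = { a, b, o };
   parfor_range(len, 4, sum_body, &e);   /* 4 workers: arbitrary illustrative choice */
}
\end{cfacode}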

Implicit parallelism is a restrictive solution and therefore has its limitations. However, it is a quick and simple approach to parallelism, which may very well be sufficient for smaller applications, and it reduces the amount of boilerplate needed to start benefiting from the parallelism of modern CPUs.