Context Navigation

source: doc/theses/fangren_yu_COOP_S20/Report.tex @ d1864da

ADTarm-ehast-experimentalenumforall-pointer-decayjacob/cs343-translationnew-ast-unique-exprpthread-emulationqualifiedEnum

Last change on this file since d1864da was d1864da, checked in by Peter A. Buhr <pabuhr@…>, 4 years ago
continue latex version of Fangren's co-op report
Property mode set to `100644`
File size: 15.2 KB

Line
1	\documentclass[twoside,12pt]{article}
2
3	%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
4
5	% Latex packages used in the document.
6	\usepackage{fullpage,times,comment}
7	\usepackage{epic,eepic}
8	\usepackage{upquote} % switch curled `'" to straight
9	\usepackage{calc}
10	\usepackage{varioref} % extended references
11	\usepackage[labelformat=simple,aboveskip=0pt,farskip=0pt]{subfig}
12	\renewcommand{\thesubfigure}{\alph{subfigure})}
13	\usepackage{latexsym} % \Box glyph
14	\usepackage{mathptmx} % better math font with "times"
15	\usepackage[usenames]{color}
16	\input{common} % common CFA document macros
17	\lstset{
18	escapechar=\$, % LaTeX escape in CFA code
19	moredelim=**[is][\color{red}]{`}{`},
20	}% lstset
21	\lstMakeShortInline@%
22	\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
23	\usepackage{breakurl}
24
25	\usepackage[pagewise]{lineno}
26	\renewcommand{\linenumberfont}{\scriptsize\sffamily}
27
28	% Default underscore is too low and wide. Cannot use lstlisting "literate" as replacing underscore
29	% removes it as a variable-name character so keywords in variables are highlighted. MUST APPEAR
30	% AFTER HYPERREF.
31	\renewcommand{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.075ex}}}
32
33	\setlength{\topmargin}{-0.45in} % move running title into header
34	\setlength{\headsep}{0.25in}
35
36	%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
37
38	\CFAStyle % use default CFA format-style
39	\lstnewenvironment{C++}[1][] % use C++ style
40	{\lstset{language=C++,moredelim=**[is][\protect\color{red}]{`}{`},#1}}
41	{}
42
43	%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
44
45	\setcounter{secnumdepth}{3} % number subsubsections
46	\setcounter{tocdepth}{3} % subsubsections in table of contents
47	\makeindex
48
49	%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
50
51	\title{\Huge
52	cfa-cc Developer's Reference
53	}% title
54
55	\author{\LARGE
56	Fangren Yu
57	}% author
58
59	\date{
60	\today
61	}% date
62
63	%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
64
65	\begin{document}
66	\pagestyle{headings}
67	% changed after setting pagestyle
68	\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
69	\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
70	\pagenumbering{roman}
71	\linenumbers % comment out to turn off line numbering
72
73	\maketitle
74	\pdfbookmark[1]{Contents}{section}
75	\tableofcontents
76
77	\clearpage
78	\thispagestyle{plain}
79	\pagenumbering{arabic}
80
81
82	\section{Overview}
83
84	cfa-cc is the reference compiler for the Cforall programming language, which is a non-
85	object-oriented extension to C.
86	Cforall attempts to introduce productive modern programming language features to C
87	while maintaining as much backward-compatibility as possible, so that most existing C
88	programs can seamlessly work with Cforall.
89
90	Since the Cforall project was dated back to the early 2000s, and only restarted in the past
91	few years, there is a significant amount of legacy code in the current compiler codebase,
92	with little proper documentation available. This becomes a difficulty while developing new
93	features based on the previous implementations, and especially while diagnosing
94	problems.
95
96	Currently, the Cforall team is also facing another problem: bad compiler performance. For
97	the development of a new programming language, writing a standard library is an
98	important part. The incompetence of the compiler causes building the library files to take
99	tens of minutes, making iterative development and testing almost impossible. There is
100	ongoing effort to rewrite the core data structure of the compiler to overcome the
101	performance issue, but many bugs may appear during the work, and lack of documentation
102	makes debugging extremely difficult.
103
104	This developer's reference will be continuously improved and eventually cover the
105	compiler codebase. For now, the focus is mainly on the parts being rewritten, and also the
106	performance bottleneck, namely the resolution algorithm. It is aimed to provide new
107	developers to the project enough guidance and clarify the purposes and behavior of certain
108	functions which are not mentioned in the previous Cforall research papers.
109
110
111	\section{Compiler Framework}
112
113	\subsection{AST Representation}
114
115	Source code input is first transformed into abstract syntax tree (AST) representation by the
116	parser before analyzed by the compiler.
117
118	There are 4 major categories of AST nodes used by the compiler, along with some derived
119	structures.
120
121	\subsubsection{Declaration nodes}
122
123	A declaration node represents either of:
124	\begin{itemize}
125	\item
126	Type declaration: struct, union, typedef or type parameter (see Appendix A.3)
127	\item
128	Variable declaration
129	\item
130	Function declaration
131	\end{itemize}
132	Declarations are introduced by standard C declarations, with the usual scoping rules.
133	In addition, declarations can also be introduced by the forall clause (which is the origin
134	of Cforall's name):
135	\begin{cfa}
136	forall (<$\emph{TypeParameterList}$> \| <$\emph{AssertionList}$>)
137	$\emph{declaration}$
138	\end{cfa}
139	Type parameters in Cforall are similar to \CC template type parameters. The Cforall
140	declaration
141	\begin{cfa}
142	forall (dtype T) ...
143	\end{cfa}
144	behaves similarly as the \CC template declaration
145	\begin{C++}
146	template <typename T> ...
147	\end{C++}
148
149	Assertions are a distinctive feature of Cforall: contrary to the \CC template where
150	arbitrary functions and operators can be used in a template definition, in a Cforall
151	parametric function, operations on parameterized types must be declared in assertions.
152
153	Consider the following \CC template:
154	\begin{C++}
155	template <typename T> int foo(T t) {
156	return bar(t) + baz(t);
157	}
158	\end{C++}
159	Unless bar and baz are also parametric functions taking any argument type, they must be
160	declared in the assertions, or otherwise the code will not compile:
161	\begin{cfa}
162	forall (dtype T \| { int bar(T); int baz(t); }) int foo (T t) {
163	return bar(t) + baz(t);
164	}
165	\end{cfa}
166	Assertions are written using the usual function declaration syntax. The scope of type
167	parameters and assertions is the following declaration.
168
169	\subsubsection{Type nodes}
170
171	A type node represents the type of an object or expression.
172	Named types reference the corresponding type declarations. The type of a function is its
173	function pointer type (same as standard C).
174	With the addition of type parameters, named types may contain a list of parameter values
175	(actual parameter types).
176
177	\subsubsection{Statement nodes}
178
179	Statement nodes represent the statements in the program, including basic expression
180	statements, control flows and blocks.
181	Local declarations (within a block statement) are represented as declaration statements.
182
183	\subsubsection{Expression nodes}
184
185	Some expressions are represented differently in the compiler before and after resolution
186	stage:
187	\begin{itemize}
188	\item
189	Name expressions: NameExpr pre-resolution, VariableExpr post-resolution
190	\item
191	Member expressions: UntypedMemberExpr pre-resolution, MemberExpr post-resolution
192	\item
193	Function call expressions (including overloadable operators): UntypedExpr pre-resolution, ApplicationExpr post-resolution
194	\end{itemize}
195	The pre-resolution representations contain only the symbols. Post-resolution results link
196	them to the actual variable and function declarations.
197
198
199	\subsection{Compilation Passes}
200
201	Compilation steps are implemented as passes, which follows a general structural recursion
202	pattern on the syntax tree.
203
204	The basic work flow of compilation passes follows preorder and postorder traversal on
205	tree data structure, implemented with visitor pattern, and can be loosely described with
206	the following pseudocode:
207	\begin{cfa}
208	Pass::visit (node_t node) {
209	previsit(node);
210	if (visit_children)
211	for each child of node:
212	child.accept(this);
213	postvisit(node);
214	}
215	\end{cfa}
216	Operations in previsit() happen in preorder (top to bottom) and operations in
217	postvisit() happen in postorder (bottom to top). The precise order of recursive
218	operations on child nodes can be found in @Common/PassVisitor.impl.h@ (old) and
219	@AST/Pass.impl.hpp@ (new).
220	Implementations of compilation passes need to follow certain conventions:
221	\begin{itemize}
222	\item
223	Passes \textbf{should not} directly override the visit method (Non-virtual Interface
224	principle); if a pass desires different recursion behavior, it should set
225	@visit_children@ to false and perform recursive calls manually within previsit or
226	postvisit procedures. To enable this option, inherit from @WithShortCircuiting@ mixin.
227	\item
228	previsit may mutate the node but \textbf{must not} change the node type or return null.
229	\item
230	postvisit may mutate the node, reconstruct it to a different node type, or delete it by
231	returning null.
232	\item
233	If the previsit or postvisit method is not defined for a node type, the step is skipped.
234	If the return type is declared as void, the original node is returned by default. These
235	behaviors are controlled by template specialization rules; see
236	@Common/PassVisitor.proto.h@ (old) and @AST/Pass.proto.hpp@ (new) for details.
237	\end{itemize}
238	Other useful mixin classes for compilation passes include:
239	\begin{itemize}
240	\item
241	WithGuards allows saving values of variables and restore automatically upon exiting
242	the current node.
243	\item
244	WithVisitorRef creates a wrapped entity of current pass (the actual argument
245	passed to recursive calls internally) for explicit recursion, usually used together
246	with WithShortCircuiting.
247	\item
248	WithSymbolTable gives a managed symbol table with built-in scoping rule handling
249	(e.g. on entering and exiting a block statement)
250	\end{itemize}
251	\textbf{NOTE}: If a pass extends the functionality of another existing pass, due to \CC overloading
252	resolution rules, it \textbf{must} explicitly introduce the inherited previsit and postvisit procedures
253	to its own scope, or otherwise they will not be picked up by template resolution:
254	\begin{cfa}
255	class Pass2: public Pass1 {
256	using Pass1::previsit;
257	using Pass1::postvisit;
258	// new procedures
259	}
260	\end{cfa}
261
262
263	\subsection{Data Structure Change WIP (new-ast)}
264
265	It has been observed that excessive copying of syntax tree structures accounts for a
266	majority of computation cost and significantly slows down the compiler. In the previous
267	implementation of the syntax tree, every internal node has a unique parent; therefore all
268	copies are required to duplicate everything down to the bottom. A new, experimental
269	re-implementation of the syntax tree (source under directory AST/ hereby referred to as
270	``new-ast'') attempts to overcome this issue with a functional approach that allows sharing
271	of common sub-structures and only makes copies when necessary.
272
273	The core of new-ast is a customized implementation of smart pointers, similar to
274	@std::shared_ptr@ and @std::weak_ptr@ in C++ standard library. Reference counting is
275	used to detect sharing and allows optimization. For a purely functional (a.k.a. immutable)
276	data structure, all mutations are modelled by shallow copies along the path of mutation.
277	With reference counting optimization, unique nodes are allowed to be mutated in place.
278	This however, may potentially introduce some complications and bugs; a few issues are
279	discussed near the end of this section.
280
281	\subsubsection{Source: AST/Node.hpp}
282
283	class @ast::Node@ is the base class of all new-ast node classes, which implements
284	reference counting mechanism. Two different counters are recorded: ``strong'' reference
285	count for number of nodes semantically owning it; ``weak'' reference count for number of
286	nodes holding a mere reference and only need to observe changes.
287	class @ast::ptr_base@ is the smart pointer implementation and also takes care of
288	resource management.
289
290	Direct access through the smart pointer is read-only. A mutable access should be obtained
291	by calling shallowCopy or mutate as below.
292
293	Currently, the weak pointers are only used to reference declaration nodes from a named
294	type, or a variable expression. Since declaration nodes are intended to denote unique
295	entities in the program, weak pointers always point to unique (unshared) nodes. This may
296	change in the future, and weak references to shared nodes may introduce some problems;
297	see mutate function below.
298
299	All node classes should always use smart pointers in the structure and should not use raw
300	pointers.
301
302	\begin{C++}
303	void ast::Node::increment(ref_type ref)
304	\end{C++}
305	Increments this node's strong or weak reference count.
306	\begin{C++}
307	void ast::Node::decrement(ref_type ref, bool do_delete = true)
308	\end{C++}
309	Decrements this node's strong or weak reference count. If strong reference count reaches
310	zero, the node is deleted by default.
311	\textbf{NOTE}: Setting @do_delete@ to false may result in a detached node. Subsequent code should
312	manually delete the node or assign it to a strong pointer to prevent memory leak.
313	Reference counting functions are internally called by @ast::ptr_base@.
314	\begin{C++}
315	template<typename node_t>
316	node_t * shallowCopy(const node_t * node)
317	\end{C++}
318	Returns a mutable, shallow copy of node: all child pointers are pointing to the same child
319	nodes.
320	\begin{C++}
321	template<typename node_t>
322	node_t * mutate(const node_t * node)
323	\end{C++}
324	If node is unique (strong reference count is 1), returns a mutable pointer to the same node.
325	Otherwise, returns shallowCopy(node).
326	It is an error to mutate a shared node that is weak-referenced. Currently this does not
327	happen. The problem may appear once weak pointers to shared nodes (e.g. expression
328	nodes) are used; special care will be needed.
329
330	\textbf{NOTE}: This naive uniqueness check may not be sufficient in some cases. A discussion of the
331	issue is presented at the end of this section.
332	\begin{C++}
333	template<typename node_t, typename parent_t, typename field_t, typename assn_t>
334	const node_t * mutate_field(const node_t * node, field_t parent_t::*field, assn_t && val)
335	\end{C++}
336	\begin{C++}
337	template<typename node_t, typename parent_t, typename coll_t, typename ind_t,
338	typename field_t>
339	const node_t * mutate_field_index(const node_t * node, coll_t parent_t::* field, ind_t i,
340	field_t && val)
341	\end{C++}
342	Helpers for mutating a field on a node using pointer to member (creates shallow copy
343	when necessary).
344
345	\subsubsection{Issue: Undetected sharing}
346
347	The @mutate@ behavior described above has a problem: deeper shared nodes may be
348	mistakenly considered as unique. \VRef[Figure]{f:DeepNodeSharing} shows how the problem could arise:
349	\begin{figure}
350	\centering
351	\input{DeepNodeSharing}
352	\caption{Deep sharing of nodes}
353	\label{f:DeepNodeSharing}
354	\end{figure}
355	Suppose that we are working on the tree rooted at P1, which
356	is logically the chain P1-A-B and P2 is irrelevant, and then
357	mutate(B) is called. The algorithm considers B as unique since
358	it is only directly owned by A. However, the other tree P2-A-B
359	indirectly shares the node B and is therefore wrongly mutated.
360
361	To partly address this problem, if the mutation is called higher up the tree, a chain
362	mutation helper can be used:
363
364	\subsubsection{Source: AST/Chain.hpp}
365
366	\begin{C++}
367	template<typename node_t, Node::ref_type ref_t>
368	auto chain_mutate(ptr_base<node_t, ref_t> & base)
369	\end{C++}
370	This function returns a chain mutator handle which takes pointer-to-member to go down
371	the tree while creating shallow copies as necessary; see @struct _chain_mutator@ in the
372	source code for details.
373
374	\bibliographystyle{plain}
375	\bibliography{pl}
376
377
378	\end{document}
379
380	% Local Variables: %
381	% tab-width: 4 %
382	% fill-column: 100 %
383	% compile-command: "make" %
384	% End: %

Note: See TracBrowser for help on using the repository browser.

Download in other formats: