source: doc/theses/fangren_yu_COOP_S20/Report.tex @ ae45007

arm-ehjacob/cs343-translationnew-ast-unique-expr
Last change on this file since ae45007 was ae45007, checked in by Peter A. Buhr <pabuhr@…>, 15 months ago

start latex version of Fangren's co-op report

  • Property mode set to 100644
File size: 12.3 KB
Line 
1\documentclass[twoside,12pt]{article}
2
3%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
4
5% Latex packages used in the document.
6\usepackage{fullpage,times,comment}
7\usepackage{epic,eepic}
8\usepackage{upquote}                                                                    % switch curled `'" to straight
9\usepackage{calc}
10\usepackage{varioref}                                                                   % extended references
11\usepackage[labelformat=simple,aboveskip=0pt,farskip=0pt]{subfig}
12\renewcommand{\thesubfigure}{\alph{subfigure})}
13\usepackage{latexsym}                                   % \Box glyph
14\usepackage{mathptmx}                                   % better math font with "times"
15\usepackage[usenames]{color}
16\input{common}                                          % common CFA document macros
17\lstset{
18escapechar=\$,                                                                                  % LaTeX escape in CFA code
19moredelim=**[is][\color{red}]{`}{`},
20}% lstset
21\lstMakeShortInline@%
22\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
23\usepackage{breakurl}
24
25\usepackage[pagewise]{lineno}
26\renewcommand{\linenumberfont}{\scriptsize\sffamily}
27
28% Default underscore is too low and wide. Cannot use lstlisting "literate" as replacing underscore
29% removes it as a variable-name character so keywords in variables are highlighted. MUST APPEAR
30% AFTER HYPERREF.
31\renewcommand{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.075ex}}}
32
33\setlength{\topmargin}{-0.45in}                                                 % move running title into header
34\setlength{\headsep}{0.25in}
35
36%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
37
38\CFAStyle                                                                                               % use default CFA format-style
39\lstnewenvironment{C++}[1][]                            % use C++ style
40{\lstset{language=C++,moredelim=**[is][\protect\color{red}]{`}{`},#1}}
41{}
42
43%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
44
45\setcounter{secnumdepth}{3}                             % number subsubsections
46\setcounter{tocdepth}{3}                                % subsubsections in table of contents
47\makeindex
48
49%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
50
51\title{\Huge
52cfa-cc Developer's Reference
53}% title
54
55\author{\LARGE
56Fangren Yu
57}% author
58
59\date{
60\today
61}% date
62
63%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
64
65\begin{document}
66\pagestyle{headings}
67% changed after setting pagestyle
68\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
69\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
70\pagenumbering{roman}
71\linenumbers                                            % comment out to turn off line numbering
72
73\maketitle
74\pdfbookmark[1]{Contents}{section}
75\tableofcontents
76
77\clearpage
78\thispagestyle{plain}
79\pagenumbering{arabic}
80
81
82\section{Overview}
83
84cfa-cc is the reference compiler for the Cforall programming language, which is a non-
85object-oriented extension to C.
86Cforall attempts to introduce productive modern programming language features to C
87while maintaining as much backward-compatibility as possible, so that most existing C
88programs can seamlessly work with Cforall.
89
90Since the Cforall project was dated back to the early 2000s, and only restarted in the past
91few years, there is a significant amount of legacy code in the current compiler codebase,
92with little proper documentation available. This becomes a difficulty while developing new
93features based on the previous implementations, and especially while diagnosing
94problems.
95
96Currently, the Cforall team is also facing another problem: bad compiler performance. For
97the development of a new programming language, writing a standard library is an
98important part. The incompetence of the compiler causes building the library files to take
99tens of minutes, making iterative development and testing almost impossible. There is
100ongoing effort to rewrite the core data structure of the compiler to overcome the
101performance issue, but many bugs may appear during the work, and lack of documentation
102makes debugging extremely difficult.
103
104This developer's reference will be continuously improved and eventually cover the
105compiler codebase. For now, the focus is mainly on the parts being rewritten, and also the
106performance bottleneck, namely the resolution algorithm. It is aimed to provide new
107developers to the project enough guidance and clarify the purposes and behavior of certain
108functions which are not mentioned in the previous Cforall research papers.
109
110
111\section{Compiler Framework}
112
113\subsection{AST Representation}
114
115Source code input is first transformed into abstract syntax tree (AST) representation by the
116parser before analyzed by the compiler.
117
118There are 4 major categories of AST nodes used by the compiler, along with some derived
119structures.
120
121\paragraph{Declaration nodes} ~
122
123\noindent
124A declaration node represents either of:
125\begin{itemize}
126\item
127Type declaration: struct, union, typedef or type parameter (see Appendix A.3)
128\item
129Variable declaration
130\item
131Function declaration
132\end{itemize}
133Declarations are introduced by standard C declarations, with the usual scoping rules.
134In addition, declarations can also be introduced by the forall clause (which is the origin
135of Cforall's name):
136\begin{cfa}
137forall (<$\emph{TypeParameterList}$> | <$\emph{AssertionList}$>)
138        $\emph{declaration}$
139\end{cfa}
140Type parameters in Cforall are similar to \CC template type parameters. The Cforall
141declaration
142\begin{cfa}
143forall (dtype T) ...
144\end{cfa}
145behaves similarly as the \CC template declaration
146\begin{C++}
147template <typename T> ...
148\end{C++}
149
150Assertions are a distinctive feature of Cforall: contrary to the \CC template where
151arbitrary functions and operators can be used in a template definition, in a Cforall
152parametric function, operations on parameterized types must be declared in assertions.
153
154Consider the following \CC template:
155\begin{C++}
156template <typename T> int foo(T t) {
157        return bar(t) + baz(t);
158}
159\end{C++}
160Unless bar and baz are also parametric functions taking any argument type, they must be
161declared in the assertions, or otherwise the code will not compile:
162\begin{cfa}
163forall (dtype T | { int bar(T); int baz(t); }) int foo (T t) {
164        return bar(t) + baz(t);
165}
166\end{cfa}
167Assertions are written using the usual function declaration syntax. The scope of type
168parameters and assertions is the following declaration.
169
170\paragraph{Type nodes} ~
171
172\noindent
173A type node represents the type of an object or expression.
174Named types reference the corresponding type declarations. The type of a function is its
175function pointer type (same as standard C).
176With the addition of type parameters, named types may contain a list of parameter values
177(actual parameter types).
178
179\paragraph{Statement nodes} ~
180
181\noindent
182Statement nodes represent the statements in the program, including basic expression
183statements, control flows and blocks.
184Local declarations (within a block statement) are represented as declaration statements.
185
186\paragraph{Expression nodes} ~
187
188\noindent
189Some expressions are represented differently in the compiler before and after resolution
190stage:
191\begin{itemize}
192\item
193Name expressions: NameExpr pre-resolution, VariableExpr post-resolution
194\item
195Member expressions: UntypedMemberExpr pre-resolution, MemberExpr post-resolution
196\item
197Function call expressions (including overloadable operators): UntypedExpr pre-resolution, ApplicationExpr post-resolution
198\end{itemize}
199The pre-resolution representations contain only the symbols. Post-resolution results link
200them to the actual variable and function declarations.
201
202
203\subsection{Compilation Passes}
204
205Compilation steps are implemented as passes, which follows a general structural recursion
206pattern on the syntax tree.
207
208The basic work flow of compilation passes follows preorder and postorder traversal on
209tree data structure, implemented with visitor pattern, and can be loosely described with
210the following pseudocode:
211\begin{cfa}
212Pass::visit (node_t node) {
213        previsit(node);
214        if (visit_children)
215                for each child of node:
216                        child.accept(this);
217        postvisit(node);
218}
219\end{cfa}
220Operations in previsit() happen in preorder (top to bottom) and operations in
221postvisit() happen in postorder (bottom to top). The precise order of recursive
222operations on child nodes can be found in @Common/PassVisitor.impl.h@ (old) and
223@AST/Pass.impl.hpp@ (new).
224Implementations of compilation passes need to follow certain conventions:
225\begin{itemize}
226\item
227Passes \textbf{should not} directly override the visit method (Non-virtual Interface
228principle); if a pass desires different recursion behavior, it should set
229@visit_children@ to false and perform recursive calls manually within previsit or
230postvisit procedures. To enable this option, inherit from @WithShortCircuiting@ mixin.
231\item
232previsit may mutate the node but \textbf{must not} change the node type or return null.
233\item
234postvisit may mutate the node, reconstruct it to a different node type, or delete it by
235returning null.
236\item
237If the previsit or postvisit method is not defined for a node type, the step is skipped.
238If the return type is declared as void, the original node is returned by default. These
239behaviors are controlled by template specialization rules; see
240@Common/PassVisitor.proto.h@ (old) and @AST/Pass.proto.hpp@ (new) for details.
241\end{itemize}
242Other useful mixin classes for compilation passes include:
243\begin{itemize}
244\item
245WithGuards allows saving values of variables and restore automatically upon exiting
246the current node.
247\item
248WithVisitorRef creates a wrapped entity of current pass (the actual argument
249passed to recursive calls internally) for explicit recursion, usually used together
250with WithShortCircuiting.
251\item
252WithSymbolTable gives a managed symbol table with built-in scoping rule handling
253(e.g. on entering and exiting a block statement)
254\end{itemize}
255\textbf{NOTE}: If a pass extends the functionality of another existing pass, due to \CC overloading
256resolution rules, it \textbf{must} explicitly introduce the inherited previsit and postvisit procedures
257to its own scope, or otherwise they will not be picked up by template resolution:
258\begin{cfa}
259class Pass2: public Pass1 {
260        using Pass1::previsit;
261        using Pass1::postvisit;
262        // new procedures
263}
264\end{cfa}
265
266
267\subsection{Data Structure Change WIP (new-ast)}
268
269It has been observed that excessive copying of syntax tree structures accounts for a
270majority of computation cost and significantly slows down the compiler. In the previous
271implementation of the syntax tree, every internal node has a unique parent; therefore all
272copies are required to duplicate everything down to the bottom. A new, experimental
273re-implementation of the syntax tree (source under directory AST/ hereby referred to as
274``new-ast'') attempts to overcome this issue with a functional approach that allows sharing
275of common sub-structures and only makes copies when necessary.
276
277The core of new-ast is a customized implementation of smart pointers, similar to
278@std::shared_ptr@ and @std::weak_ptr@ in C++ standard library. Reference counting is
279used to detect sharing and allows optimization. For a purely functional (a.k.a. immutable)
280data structure, all mutations are modelled by shallow copies along the path of mutation.
281With reference counting optimization, unique nodes are allowed to be mutated in place.
282This however, may potentially introduce some complications and bugs; a few issues are
283discussed near the end of this section.
284
285\paragraph{Source: AST/Node.hpp} ~
286
287\noindent
288class @ast::Node@ is the base class of all new-ast node classes, which implements
289reference counting mechanism. Two different counters are recorded: ``strong'' reference
290count for number of nodes semantically owning it; ``weak'' reference count for number of
291nodes holding a mere reference and only need to observe changes.
292class @ast::ptr_base@ is the smart pointer implementation and also takes care of
293resource management.
294
295Direct access through the smart pointer is read-only. A mutable access should be obtained
296by calling shallowCopy or mutate as below.
297
298Currently, the weak pointers are only used to reference declaration nodes from a named
299type, or a variable expression. Since declaration nodes are intended to denote unique
300entities in the program, weak pointers always point to unique (unshared) nodes. This may
301change in the future, and weak references to shared nodes may introduce some problems;
302see mutate function below.
303
304All node classes should always use smart pointers in the structure and should not use raw
305pointers.
306
307\bibliographystyle{plain}
308\bibliography{pl}
309
310
311\end{document}
312
313% Local Variables: %
314% tab-width: 4 %
315% fill-column: 100 %
316% compile-command: "make" %
317% End: %
Note: See TracBrowser for help on using the repository browser.