source: doc/aaron_comp_II/comp_II.tex @ ef3b335

aaron-thesisarm-ehcleanup-dtorsctordeferred_resndemanglerjacob/cs343-translationjenkins-sandboxmemorynew-astnew-ast-unique-exprnew-envno_listpersistent-indexerresolv-newwith_gc
Last change on this file since ef3b335 was ef3b335, checked in by Aaron Moss <a3moss@…>, 5 years ago

Minor edits to comp II draft

  • Property mode set to 100644
File size: 20.2 KB
Line 
1% inline code ©...© (copyright symbol) emacs: C-q M-)
2% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
3% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
4% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
5% LaTex escape §...§ (section symbol) emacs: C-q M-'
6% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
7% math escape $...$ (dollar symbol)
8
9\documentclass[twoside,11pt]{article}
10
11%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
12
13% Latex packages used in the document (copied from CFA user manual).
14\usepackage[T1]{fontenc}                                % allow Latin1 (extended ASCII) characters
15\usepackage{textcomp}
16\usepackage[latin1]{inputenc}
17\usepackage{fullpage,times,comment}
18\usepackage{epic,eepic}
19\usepackage{upquote}                                                                    % switch curled `'" to straight
20\usepackage{calc}
21\usepackage{xspace}
22\usepackage{graphicx}
23\usepackage{varioref}                                                                   % extended references
24\usepackage{listings}                                                                   % format program code
25\usepackage[flushmargin]{footmisc}                                              % support label/reference in footnote
26\usepackage{latexsym}                                   % \Box glyph
27\usepackage{mathptmx}                                   % better math font with "times"
28\usepackage[usenames]{color}
29\usepackage[pagewise]{lineno}
30\renewcommand{\linenumberfont}{\scriptsize\sffamily}
31\input{common}                                          % bespoke macros used in the document
32\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
33\usepackage{breakurl}
34\renewcommand{\UrlFont}{\small\sf}
35
36\setlength{\topmargin}{-0.45in}                                                 % move running title into header
37\setlength{\headsep}{0.25in}
38
39%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
40
41\newsavebox{\LstBox}
42
43%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
44
45\title{\Huge
46\vspace*{1in}
47Efficient Type Resolution in \CFA \\
48\vspace*{0.25in}
49\huge
50PhD Comprehensive II Research Proposal
51\vspace*{1in}
52}
53
54\author{\huge
55\vspace*{0.1in}
56Aaron Moss \\
57\Large Cheriton School of Computer Science \\
58\Large University of Waterloo
59}
60
61\date{
62\today
63}
64
65%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
66
67\begin{document}
68\pagestyle{headings}
69% changed after setting pagestyle
70\renewcommand{\sectionmark}[1]{\markboth{\thesection\quad #1}{\thesection\quad #1}}
71\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
72\pagenumbering{roman}
73\linenumbers                                            % comment out to turn off line numbering
74
75\maketitle
76\thispagestyle{empty}
77
78\clearpage
79\thispagestyle{plain}
80\pdfbookmark[1]{Contents}{section}
81\tableofcontents
82
83\clearpage
84\thispagestyle{plain}
85\pagenumbering{arabic}
86
87\section{Introduction}
88
89\CFA\footnote{Pronounced ``C-for-all'', and written \CFA, CFA, or \CFL.} is an evolutionary modernization of the C programming language currently being designed and built at the University of Waterloo by a team led by Peter Buhr.
90Features added to C by \CFA include name overloading, user-defined operators, parametric-polymorphic routines, and type constructors and destructors, among others.
91These features make \CFA significantly more powerful and expressive than C, but impose a significant compile-time cost to implement, particularly in the expression resolver, which must evaluate the typing rules of a much more complex type system.
92The primary goal of this proposed research project is to develop a sufficiently performant expression resolution algorithm, experimentally validate its performance, and integrate it into \Index*{CFA-CC}, the \CFA reference compiler.
93Secondary goals of this project include the development of various new language features for \CFA; parametric-polymorphic (``generic'') types have already been designed and implemented, and reference types and user-defined conversions are under design consideration.
94The experimental performance-testing architecture for resolution algorithms will also be used to determine the compile-time cost of adding such new features to the \CFA type system.
95
96\section{\CFA}
97
98To make the scope of the proposed expression resolution problem more explicit, it is necessary to define the features of both C and \CFA (both current and proposed) which affect this algorithm.
99In some cases the interactions of multiple features make expression resolution a significantly more complex problem than any individual feature would; in others a feature which does not by itself add any complexity to expression resolution will trigger previously rare edge cases much more frequently.
100
101\subsection{Polymorphic Functions}
102The most significant feature \CFA adds is parametric-polymorphic functions.
103Such functions are written using a ©forall© clause, the feature that gave the language its name:
104\begin{lstlisting}
105forall(otype T)
106T identity(T x) {
107    return x;
108}
109
110int forty_two = identity(42); // T is bound to int, forty_two == 42
111\end{lstlisting}
112The ©identity© function above can be applied to any complete object type (or ``©otype©'').
113The type variable ©T© is transformed into a set of additional implicit parameters to ©identity© which encode sufficient information about ©T© to create and return a variable of that type.
114The current \CFA implementation passes the size and alignment of the type represented by an ©otype© parameter, as well as an assignment operator, constructor, copy constructor \& destructor.
115
116Since bare polymorphic types do not provide a great range of available operations, \CFA also provides a \emph{type assertion} mechanism to provide further information about a type:
117\begin{lstlisting}
118forall(otype T | { T twice(T); })
119T four_times(T x) {
120    return twice( twice(x) );
121}
122
123double twice(double d) { return d * 2.0; } // (1)
124
125double magic = four_times(10.5); // T is bound to double, uses (1) to satisfy type assertion
126\end{lstlisting}
127These type assertions may be either variable or function declarations which depend on a polymorphic type variable.
128©four_times© can only be called with an argument for which there exists a function named ©twice© that can take that argument and return another value of the same type; a pointer to the appropriate ©twice© function will be passed as an additional implicit parameter to the call to ©four_times©.
129
130Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions.
131For instance, ©twice© could have been define as below, using the \CFA syntax for operator overloading:
132\begin{lstlisting}
133forall(otype S | { S ?+?(S, S); })
134S twice(S x) { return x + x; }  // (2)
135\end{lstlisting} 
136This version of ©twice© will work for any type ©S© that has an addition operator defined for it, and it could have been used to satisfy the type assertion on ©four_times©.
137The compiler accomplishes this by creating a wrapper function calling ©twice // (2)© with ©S© bound to ©double©, then providing this wrapper function to ©four_times©\footnote{©twice // (2)© could have had a type parameter named ©T©; \CFA specifies a renaming the type parameters, which would avoid the name conflict with the parameter ©T© of ©four_times©.}.
138
139Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope.
140If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that function will be examined as a candidate for its own type assertion unboundedly repeatedly.
141To avoid infinite loops, the current \Index*{CFA-CC} compiler imposes a fixed limit on the possible depth of recursion, similar to that employed by most \Index*[C++]{\CC} compilers for template expansion; this restriction means that there are some semantically well-typed expressions which cannot be resolved by {CFA-CC}.
142One area of potential improvement this project proposes to investigate is the possibility of using the compiler's knowledge of the current set of declarations to make a more precise judgement of when further type assertion satisfaction recursion will not produce a well-typed expression.
143
144\subsection{Name Overloading}
145In C, no more than one function or variable in the same scope may share the same name, and function or variable declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration. 
146This makes finding the proper declaration to match to a function application or variable expression a simple matter of symbol table lookup, which can be easily and efficiently implemented.
147\CFA, on the other hand, allows overloading of variable and function names, so long as the overloaded declarations do not have the same type, avoiding the multiplication of function names for different types common in the C standard library, as in the following example:
148\begin{lstlisting}
149int three = 3;
150double three = 3.0;
151
152int thrice(int i) { return i * three; } // uses int three
153double thrice(double d) { return d * three; } // uses double three
154
155// thrice(three); // ERROR: ambiguous
156int nine = thrice(three);    // uses int thrice and three, based on return type
157double nine = thrice(three); // uses double thrice and three, based on return type
158\end{lstlisting}
159
160The presence of name overloading in \CFA means that simple table lookup is not sufficient to match identifiers to declarations, and a type matching algorithm must be part of expression resolution.
161
162\subsection{Implicit Conversions}
163In addition to the multiple interpretations of an expression produced by name overloading, \CFA also supports all of the implicit conversions present in C, producing further candidate interpretations for expressions.
164C does not have a traditionally-defined inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'' define which of the built-in types are implicitly convertable to which other types, and the relative cost of any pair of such conversions from a single source type.
165\CFA adds to the usual arithmetic conversions rules for determining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg ©int© to ©char©, but more expensive than any \emph{safe} (widening) conversion, \eg ©int© to ©double©.
166The expression resolution problem, then, is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
167Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
168
169\subsubsection{User-generated Implicit Conversions}
170One possible additional feature to \CFA included in this research proposal is \emph{user-generated implicit conversions}.
171Such a conversion system should be simple for user programmers to utilize, and fit naturally with the existing design of implicit conversions in C; ideally it would also be sufficiently powerful to encode C's usual arithmetic conversions itself, so that \CFA only has one set of rules for conversions.
172
173Glen Ditchfield \textbf{TODO CITE} has laid out a framework for using polymorphic conversion constructor functions to create a directed acyclic graph (DAG) of conversions.
174A monomorphic variant of these functions can be used to mark a conversion arc in the DAG as only usable as the final step in a conversion.
175With these two types of conversion arcs, separate DAGs can be created for the safe and the unsafe conversions, and conversion cost can be represented as path length through the DAG.
176Open research questions on this topic include whether a conversion graph can be generated that represents each allowable conversion in C with a unique minimal-length path, such that the path lengths accurately represent the relative costs of the conversions, whether such a graph representation can be usefully augmented to include user-defined types as well as built-in types, and whether the graph can be efficiently represented and included in the expression resolver.
177
178\subsection{Constructors \& Destructors}
179Rob Shluntz, a current member of the \CFA research team, has added constructors and destructors to \CFA.
180Each type has an overridable default-generated zero-argument constructor, copy constructor, assignment operator, and destructor; for struct types these functions each call their equivalents on each field of the struct.
181This affects expression resolution because an ©otype© type variable ©T© implicitly adds four type assertions, one for each of these four functions, so assertion resolution is pervasive in \CFA polymorphic functions, even those without any explicit type assertions.
182
183\subsection{Generic Types}
184The author has added a generic type capability to \CFA, designed to efficiently and naturally integrate with \CFA's existing polymorphic functions.
185A generic type can be declared by placing a ©forall© specifier on a struct or union declaration, and instantiated using a parenthesized list of types after the type name:
186\begin{lstlisting}
187forall(otype R, otype S) struct pair {
188    R first;
189    S second;
190};
191
192forall(otype T)
193T value( pair(const char*, T) *p ) { return p->second; }
194
195pair(const char*, int) p = { "magic", 42 };
196int magic = value( &p );
197\end{lstlisting}
198For \emph{concrete} generic types, that is, those where none of the type parameters depend on polymorphic type variables (like ©pair(const char*, int)© above), the struct is essentially template expanded to a new struct type; for \emph{polymorphic} generic types (such as ©pair(const char*, T)© above), member access is handled by a runtime calculation of the field offset, based on the size and alignment information of the polymorphic parameter type.
199The default-generated constructors, destructor \& assignment operator for a generic type are polymorphic functions with the same list of type parameters as the generic type definition.
200
201Aside from giving users the ability to create more parameterized types than just the built-in pointer, array \& function types, the combination of generic types with polymorphic functions and implicit conversions makes the edge case where a polymorphic function can match its own assertions much more common, as follows:
202\begin{itemize} 
203\item Given an expression in an untyped context, such as a top-level function call with no assignment of return values, apply a polymorphic implicit conversion to the expression that can produce multiple types (the built-in conversion from ©void*© to any other pointer type is one, but not the only).
204\item When attempting to use a generic type with ©otype© parameters (such as ©box© above) for the result type of the expression, the resolver will also need to decide what type to use for the ©otype© parameters on the constructors and related functions, and will have no constraints on what they may be.
205\item Attempting to match some yet-to-be-determined specialization of the generic type to this ©otype© parameter will create a recursive case of the default constructor, \etc matching their own type assertions, creating an unboundedly deep nesting of the generic type inside itself.
206\end{itemize}
207As discussed above, any \CFA expression resolver must handle this possible infinite recursion somehow, but the combination of generic types with other language features makes this particular edge case occur somewhat frequently in user code.
208
209\subsection{Tuple Types}
210\CFA adds \emph{tuple types} to C, a facility for referring to multiple values with a single identifier.
211A variable may name a tuple, and a function may return one.
212Particularly relevantly for resolution, a tuple may be automatically \emph{destructured} into a list of values, as in the ©swap© function below:
213\begin{lstlisting}
214[char, char] x = [ '!', '?' ];
215int x = 42;
216
217forall(otype T) [T, T] swap( T a, T b ) { return [b, a]; }
218
219x = swap( x ); // destructure [char, char] x into two elements of parameter list
220// ^ can't use int x for parameter, not enough arguments to swap
221\end{lstlisting}
222Tuple destructuring means that the mapping from the position of a subexpression in the argument list to the position of a paramter in the function declaration is not straightforward, as some arguments may be expandable to different numbers of parameters, like ©x© above.
223
224\subsection{Reference Types}
225The author, in collaboration with the rest of the \CFA research team, has been designing \emph{reference types} for \CFA.
226Given some type ©T©, a ©T&© (``reference to ©T©'') is essentially an automatically dereferenced pointer; with these semantics most of the C standard's discussions of lvalues can be expressed in terms of references instead, with the benefit of being able to express the difference between the reference and non-reference version of a type in user code.
227References preserve C's existing qualifier-dropping lvalue-to-rvalue conversion (\eg a ©const volatile int&© can be implicitly converted to a bare ©int©); the reference proposal also adds a rvalue-to-lvalue conversion to \CFA, implemented by storing the value in a new compiler-generated temporary and passing a reference to the temporary.
228These two conversions can chain, producing a qualifier-dropping conversion for references, for instance converting a reference to a ©const int© into a reference to a non-©const int© by copying the originally refered to value into a fresh temporary and taking a reference to this temporary.
229These reference conversions may also chain with the other implicit type conversions.
230The main implication of this for expression resolution is the multiplication of available implicit conversions, though in a restricted context that may be able to be treated efficiently as a special case.
231
232\subsection{Literal Types}
233Another proposal currently under consideration for the \CFA type system is assigning special types to the literal values ©0© and ©1©.%, say ©zero_t© and ©one_t©.
234Implicit conversions from these types would allow ©0© and ©1© to be considered as values of many different types, depending on context, allowing expression desugarings like ©if ( x ) {}© $\Rightarrow$ ©if ( x != 0 ) {}© to be implemented efficiently and precicely.
235This is a generalization of C's existing behaviour of treating ©0© as either an integer zero or a null pointer constant, and treating either of those values as boolean false.
236The main implication for expression resolution is that the frequently encountered expressions ©0© and ©1© may have a significant number of valid interpretations.
237
238\subsection{Deleted Function Declarations}
239One final proposal for \CFA with an impact on the expression resolver is \emph{deleted function declarations}; in \CCeleven, a function declaration can be deleted as below:
240\begin{lstlisting}
241int somefn(char) = delete;
242\end{lstlisting}
243To add a similar feature to \CFA would involve including the deleted function declarations in expression resolution along with the normal declarations, but producing a compiler error if the deleted function was the best resolution.
244How conflicts should be handled between resolution of an expression to both a deleted and a non-deleted function is a small but open research question.
245
246\section{Expression Resolution}
247% TODO cite Baker, Cormack, etc.
248
249\section{Completion Timeline}
250The following is a preliminary estimate of the time necessary to complete the major components of this research project:
251\begin{center}
252\begin{tabular}{ | r @{--} l | p{4in} | }
253\hline       May 2015 & April 2016   & Project familiarization and generic types design \& implementation. \\
254\hline       May 2016 & April 2017   & Design \& implement prototype resolver and run performance experiments. \\
255\hline       May 2017 & August 2017  & Integrate new language features and best-performing resolver prototype into {CFA-CC}. \\
256\hline September 2017 & January 2018 & Thesis writing \& defense. \\
257\hline
258\end{tabular}
259\end{center}
260
261\section{Conclusion}
262
263\newpage
264
265\bibliographystyle{plain}
266\bibliography{cfa}
267
268
269\addcontentsline{toc}{section}{\indexname} % add index name to table of contents
270\begin{theindex}
271Italic page numbers give the location of the main entry for the referenced term.
272Plain page numbers denote uses of the indexed term.
273Entries for grammar non-terminals are italicized.
274A typewriter font is used for grammar terminals and program identifiers.
275\indexspace
276\input{comp_II.ind}
277\end{theindex}
278
279\end{document}
Note: See TracBrowser for help on using the repository browser.