% Source: doc/theses/aaron_moss_PhD/comp_II/comp_II.tex@72b1800
% Last change on this file since 72b1800 was 67982887, checked in by Peter A. Buhr <pabuhr@…>, 4 years ago:
% specialize thesis directory-names

\documentclass[twoside,11pt]{article}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Latex packages used in the document (copied from CFA user manual).
\usepackage[T1]{fontenc}                % allow Latin1 (extended ASCII) characters
\usepackage{textcomp}
\usepackage[latin1]{inputenc}
\usepackage{fullpage,times,comment}
\usepackage{epic,eepic}
\usepackage{upquote}                    % switch curled '" to straight
\usepackage{calc}
\usepackage{xspace}
\usepackage{graphicx}
\usepackage{varioref}                   % extended references
\usepackage{listings}                   % format program code
\usepackage[flushmargin]{footmisc}      % support label/reference in footnote
\usepackage{latexsym}                   % \Box glyph
\usepackage{mathptmx}                   % better math font with "times"
\usepackage[usenames]{color}
\input{common}                          % bespoke macros used in the document
\usepackage{breakurl}
\renewcommand{\UrlFont}{\small\sf}

\usepackage[pagewise]{lineno}
\renewcommand{\linenumberfont}{\scriptsize\sffamily}

% Default underscore is too low and wide. Cannot use lstlisting "literate" as replacing underscore
% removes it as a variable-name character so keywords in variables are highlighted. MUST APPEAR
% AFTER HYPERREF.
\renewcommand{\textunderscore}{\leavevmode\makebox[1.2ex][c]{\rule{1ex}{0.075ex}}}

\setlength{\topmargin}{-0.45in}         % move running title into header

\CFAStyle                               % use default CFA format-style

% red highlighting ®...® (registered trademark symbol) emacs: C-q M-.
% blue highlighting ß...ß (sharp s symbol) emacs: C-q M-_
% green highlighting ¢...¢ (cent symbol) emacs: C-q M-"
% LaTex escape §...§ (section symbol) emacs: C-q M-'
% keyword escape ¶...¶ (pilcrow symbol) emacs: C-q M-^
% math escape $...$ (dollar symbol)

\usepackage{caption}
\usepackage{subcaption}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newsavebox{\LstBox}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{
\Huge \vspace*{1in} Efficient Type Resolution in \CFA \\
\huge \vspace*{0.25in} PhD Comprehensive II Research Proposal
\vspace*{1in}
}

\author{
\huge Aaron Moss \\
\Large \vspace*{0.1in} \texttt{a3moss@uwaterloo.ca} \\
\Large Cheriton School of Computer Science \\
\Large University of Waterloo
}

\date{
\today
}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newcommand{\bigO}[1]{O\!\left( #1 \right)}

\begin{document}
% changed after setting pagestyle
\pagenumbering{roman}
\linenumbers                            % comment out to turn off line numbering

\maketitle
\thispagestyle{empty}

\clearpage
\thispagestyle{plain}
\pdfbookmark[1]{Contents}{section}
\tableofcontents

\clearpage
\thispagestyle{plain}
\pagenumbering{arabic}

\section{Introduction}

\CFA\footnote{Pronounced ``C-for-all'', and written \CFA or \CFL.} is an evolutionary modernization of the C programming language currently being designed and built at the University of Waterloo by a team led by Peter Buhr.
\CFA both fixes existing design problems and adds multiple new features to C, including name overloading, user-defined operators, parametric-polymorphic routines, and type constructors and destructors, among others.
The new features make \CFA more powerful and expressive than C, but impose a compile-time cost, particularly in the expression resolver, which must evaluate the typing rules of a significantly more complex type-system.

The primary goal of this research project is to develop a sufficiently performant expression resolution algorithm, experimentally validate its performance, and integrate it into CFA, the \CFA reference compiler.
Secondary goals of this project include the development of various new language features for \CFA: parametric-polymorphic (``generic'') types have already been designed and implemented, and reference types and user-defined conversions are under design consideration.
An experimental performance-testing architecture for resolution algorithms is under development to determine the relative performance of different expression resolution algorithms, as well as the compile-time cost of adding various new features to the \CFA type-system.
More broadly, this research should provide valuable data for implementers of compilers for other programming languages with similarly powerful static type-systems.

\section{\CFA}

To make the scope of the proposed expression resolution problem more explicit, it is necessary to define the features of both C and \CFA (both current and proposed) that affect this algorithm.
In some cases the interactions of multiple features make expression resolution a significantly more complex problem than any individual feature would; in other cases a feature that does not by itself add any complexity to expression resolution triggers previously rare edge cases more frequently.

It is important to note that \CFA is not an object-oriented language.
\CFA does have a system of (possibly implicit) type conversions derived from C's type conversions; while these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
In particular, \CFA has no concept of ``subclass'', and thus no need to integrate an inheritance-based form of polymorphism with its parametric and overloading-based polymorphism.
The graph structure of the \CFA type conversions is also markedly different from an inheritance graph; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance graphs.

\subsection{Polymorphic Functions}
The most significant feature \CFA adds is parametric-polymorphic functions.
Such functions are written using a ©forall© clause (which gives the language its name):
\begin{lstlisting}
®forall(otype T)®
T identity(T x) {
    return x;
}

int forty_two = identity(42); // T is bound to int, forty_two == 42
\end{lstlisting}
The ©identity© function above can be applied to any complete object type (or ``©otype©'').
The current \CFA implementation passes the size and alignment of the type represented by an ©otype© parameter, as well as an assignment operator, constructor, copy constructor and destructor.
Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls.
Determining if packaging all polymorphic arguments to a function into a virtual function table would reduce the runtime overhead of polymorphic calls is an open research question.

Since bare polymorphic types do not provide a great range of available operations, \CFA provides a \emph{type assertion} mechanism to provide further information about a type:
\begin{lstlisting}
forall(otype T ®| { T twice(T); }®)
T four_times(T x) {
    return twice( twice(x) );
}

double twice(double d) { return d * 2.0; } // (1)

double magic = four_times(10.5); // T is bound to double, uses (1) to satisfy type assertion
\end{lstlisting}
These type assertions may be either variable or function declarations that depend on a polymorphic type variable.

Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions; for instance, ©twice© could instead have been defined polymorphically, as below:
\begin{lstlisting}
forall(otype S | { ®S ?+?(S, S);® })
S twice(S x) { return x + x; }  // (2)
\end{lstlisting}

Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration \emph{in the current scope}.
If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as that function may be examined as a candidate for its own type assertion an unbounded number of times.
To avoid infinite loops, the current CFA compiler imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC compilers for template expansion; this restriction means that there are some semantically well-typed expressions that cannot be resolved by CFA.
One area of potential improvement this project proposes to investigate is the possibility of using the compiler's knowledge of the current set of declarations to more precisely determine when further type assertion satisfaction recursion does not produce a well-typed expression.
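
The effect of a fixed recursion limit can be illustrated with a small Python model.
This is a sketch under strong assumptions, not the resolver's implementation: assertions are identified by name only (types are ignored), ©declarations© maps each assertion name to the candidates that could satisfy it, each candidate is written as the list of assertions it requires in turn, and the names ©satisfy© and ©depth_limit© are hypothetical.

```python
def satisfy(assertion, declarations, depth_limit, depth=0):
    """Return True if `assertion` can be satisfied within `depth_limit` levels.

    Hypothetical model: `declarations` maps an assertion name to a list of
    candidates; each candidate is the list of assertion names it requires.
    """
    if depth > depth_limit:
        return False  # give up, as a fixed recursion limit would
    for requirements in declarations.get(assertion, []):
        # a candidate works only if all of its own assertions are satisfiable
        if all(satisfy(r, declarations, depth_limit, depth + 1)
               for r in requirements):
            return True
    return False


# A polymorphic function that is a candidate for its own assertion:
# without the limit this recursion would never terminate.
self_referential = {"f": [["f"]]}

# A satisfiable chain three declarations deep.
chain = {"a0": [["a1"]], "a1": [["a2"]], "a2": [[]]}
```

In this model ©satisfy("a0", chain, 2)© succeeds, while ©satisfy("a0", chain, 1)© fails even though the chain is satisfiable, mirroring the well-typed expressions that a fixed limit rejects; ©satisfy("f", self_referential, 10)© terminates with failure instead of looping.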

\subsubsection{Traits}
\CFA provides \emph{traits} as a means to name a group of type assertions, as in the example below:
\begin{lstlisting}
®trait has_magnitude(otype T)® {
    bool ?<?(T, T);        // comparison operator for T
    T -?(T);               // negation operator for T
    void ?{}(T*, zero_t);  // constructor from 0 literal
};

forall(otype M | has_magnitude(M))
M abs( M m ) {
    M zero = 0;  // uses zero_t constructor from trait
    return m < zero ? -m : m;
}

forall(otype M | has_magnitude(M))
M max_magnitude( M a, M b ) {
    return abs(a) < abs(b) ? b : a;
}
\end{lstlisting}

Semantically, traits are simply named lists of type assertions, but they may be used for many of the same purposes that interfaces in Java or abstract base classes in \CC are used for.
Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form of structural inheritance, similar to implementation of an interface in Go, as opposed to the nominal inheritance model of Java and \CC.
Nominal inheritance can be simulated with traits using marker variables or functions:
\begin{lstlisting}
trait nominal(otype T) {
    ®T is_nominal;®
};

int is_nominal;  // int now satisfies the nominal trait
{
    char is_nominal; // char satisfies the nominal trait
}
// char no longer satisfies the nominal trait here
\end{lstlisting}

Traits, however, are significantly more powerful than nominal-inheritance interfaces; firstly, due to the scoping rules of the declarations that satisfy a trait's type assertions, a type may not satisfy a trait everywhere that the type is declared, as with ©char© and the ©nominal© trait above.
Secondly, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
\begin{lstlisting}
trait pointer_like(®otype Ptr, otype El®) {
    lvalue El *?(Ptr); // Ptr can be dereferenced into a modifiable value of type El
}

struct list {
    int value;
    list *next;  // may omit "struct" on type names
};

typedef list *list_iterator;

lvalue int *?( list_iterator it ) {
    return it->value;
}
\end{lstlisting}

In the example above, ©(list_iterator, int)© satisfies ©pointer_like© through the user-defined dereference operator, and ©(list_iterator, list)© also satisfies ©pointer_like© through the built-in dereference operator for pointers.
While a nominal-inheritance system with associated types could model one of those two relationships by making ©El© an associated type of ©Ptr© in the ©pointer_like© implementation, few such systems could model both relationships simultaneously.

The flexibility of \CFA's implicit trait-satisfaction mechanism provides programmers with a great deal of power, but also blocks some optimization approaches for expression resolution.
The ability of types to begin to or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgements difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
On the other hand, the addition of a nominal inheritance mechanism to \CFA's type system or replacement of \CFA's trait satisfaction system with a more object-oriented inheritance model and investigation of possible expression resolution optimizations for such a system may be an interesting avenue of further research.

\subsection{Name Overloading}
In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding ©struct©, ©union©, and ©enum© tags, one holding labels, one holding typedef names, variable, function, and enumerator identifiers, and one for each ©struct© or ©union© type holding the field names.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
This restriction makes finding the proper declaration to match to a variable expression or function application a simple matter of symbol-table lookup, which can be easily and efficiently implemented.
\CFA, on the other hand, allows overloading of variable and function names, so long as the overloaded declarations do not have the same type, avoiding the multiplication of variable and function names for different types common in the C standard library, as in the following example:
\begin{lstlisting}
#include <limits.h>  // INT_MAX
#include <float.h>   // DBL_MAX

int max(int a, int b) { return a < b ? b : a; }  // (1)
double max(double a, double b) { return a < b ? b : a; }  // (2)

int max = INT_MAX;     // (3)
double max = DBL_MAX;  // (4)

max(7, -max);   // uses (1) and (3), by matching int type of the constant 7
max(max, 3.14); // uses (2) and (4), by matching double type of the constant 3.14

max(max, -max);  // ERROR: ambiguous
int m = max(max, -max); // uses (1) once and (3) twice, by matching return type
\end{lstlisting}

The presence of name overloading in \CFA means that simple table lookup is insufficient to match identifiers to declarations, and a type matching algorithm must be part of expression resolution.
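
To make the difference from single-result table lookup concrete, the following Python sketch enumerates the interpretations of ©max(max, -max)© from the listing above.
It is a toy model under stated assumptions, not the \CFA resolver: expressions are encoded as tuples, implicit conversions are ignored, and the names ©VARIABLES©, ©FUNCTIONS© and ©interpretations© are hypothetical.

```python
# Hypothetical toy model of the declarations named "max" in the listing above.
VARIABLES = {"max": ["int", "double"]}                      # (3) and (4)
FUNCTIONS = {"max": [(("int", "int"), "int"),               # (1)
                     (("double", "double"), "double")]}     # (2)


def interpretations(expr):
    """Return the result type of every valid interpretation of `expr`.

    Expressions are tuples: ("var", name), ("neg", operand) or
    ("call", name, arg, ...). Implicit conversions are deliberately ignored.
    """
    kind = expr[0]
    if kind == "var":
        return list(VARIABLES[expr[1]])
    if kind == "neg":
        return interpretations(expr[1])  # negation keeps the operand type
    if kind == "call":
        arg_types = [interpretations(arg) for arg in expr[2:]]
        return [ret for params, ret in FUNCTIONS[expr[1]]
                if len(params) == len(arg_types)
                and all(p in types for p, types in zip(params, arg_types))]
    raise ValueError("unknown expression kind: " + kind)


call = ("call", "max", ("var", "max"), ("neg", ("var", "max")))
candidates = interpretations(call)               # two interpretations: ambiguous
as_int = [t for t in candidates if t == "int"]   # context `int m = ...` keeps one
```

The bare call has both an ©int© and a ©double© interpretation, so it is ambiguous on its own, while the ©int© context of the initializer leaves exactly one.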

\subsection{Implicit Conversions}
In addition to the multiple interpretations of an expression produced by name overloading, \CFA must support all of the implicit conversions present in C for backward compatibility, producing further candidate interpretations for expressions.
C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'' define which of the built-in types are implicitly convertible to which other types, and the relative cost of any pair of such conversions from a single source type.

The expression resolution problem, then, is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
For instance, in the example in the previous subsection, ©max(max, -max)© cannot be unambiguously resolved, but ©int m = max(max, -max)© has a single minimal-cost resolution.

\subsubsection{User-generated Implicit Conversions}
One possible additional feature to \CFA included in this research proposal is \emph{user-generated implicit conversions}.
Such a conversion system should be simple for programmers to utilize, and fit naturally with the existing design of implicit conversions in C; ideally it would also be sufficiently powerful to encode C's usual arithmetic conversions itself, so that \CFA only has one set of rules for conversions.

Ditchfield~\cite{Ditchfield:conversions} laid out a framework for using polymorphic-conversion-constructor functions to create a directed acyclic graph (DAG) of conversions.
A monomorphic variant of these functions can be used to mark a conversion arc in the DAG as only usable as the final step in a conversion.
With these two types of conversion arcs, separate DAGs can be created for the safe and the unsafe conversions, and conversion cost can be represented as the length of the shortest path through the DAG from one type to another.
\begin{figure}[h]
\centering
\includegraphics{conversion_dag}
\caption{A portion of the implicit conversion DAG for built-in types.}\label{fig:conv_dag}
\end{figure}
As can be seen in Figure~\ref{fig:conv_dag}, there are either safe or unsafe paths between each of the arithmetic types listed; the ``final'' arcs are important both to avoid creating cycles in the signed-unsigned conversions, and to disambiguate potential diamond conversions (\eg, if the ©int© to ©unsigned int© conversion were not marked final there would be two length-two paths from ©int© to ©unsigned long©, making it impossible to choose which one; however, since the ©unsigned int© to ©unsigned long© arc cannot be traversed after the final ©int© to ©unsigned int© arc, there is a single unambiguous conversion path from ©int© to ©unsigned long©).
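
The role of final arcs in the diamond case can be sketched in Python as a breadth-first search over conversion paths.
The four-type graph below is a hypothetical fragment chosen to reproduce the diamond described above; its arcs and their final markings are assumptions rather than a transcription of the figure, and ©shortest_paths© is an illustrative name, not part of any existing implementation.

```python
from collections import deque

# Hypothetical fragment of a conversion graph (not read from the figure).
# Each arc is (target, is_final); a final arc may only end a conversion path.
ARCS = {
    "int": [("long", False), ("unsigned int", True)],
    "unsigned int": [("unsigned long", False)],
    "long": [("unsigned long", True)],
    "unsigned long": [],
}


def shortest_paths(arcs, src, dst):
    """Return every minimal-length conversion path from src to dst."""
    found = []
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if found and len(path) > len(found[0]):
            break  # breadth-first order: every remaining path is longer
        if path[-1] == dst:
            found.append(path)
            continue
        for target, is_final in arcs[path[-1]]:
            if is_final and target != dst:
                continue  # a final arc is legal only as the last step
            queue.append(path + [target])
    return found


# Same fragment with the int -> unsigned int arc no longer marked final.
LOOSE = {**ARCS, "int": [("long", False), ("unsigned int", False)]}
```

With the final marking in place, ©shortest_paths(ARCS, "int", "unsigned long")© yields the single path through ©long©; with the marking removed (©LOOSE©), two length-two paths are found and the conversion is ambiguous. The conversion cost in this model is the number of arcs on the path.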

Open research questions on this topic include:
\begin{itemize}
\item Can a conversion graph be generated that represents each allowable conversion in C with a unique minimal-length path such that the path lengths accurately represent the relative costs of the conversions?
\item Can such a graph representation be usefully augmented to include user-defined types as well as built-in types?
\item Can the graph be efficiently represented and used in the expression resolver?
\end{itemize}

\subsection{Constructors and Destructors}
Rob Schluntz, a current member of the \CFA research team, has added constructors and destructors to \CFA.
Each type has an overridable default-generated zero-argument constructor, copy constructor, assignment operator, and destructor.
This feature affects expression resolution because an ©otype© type variable ©T© implicitly adds four type assertions, one for each of these four functions, so assertion resolution is pervasive in \CFA polymorphic functions, even those without any explicit type assertions.
The following example shows the implicitly-generated code in green:
\begin{lstlisting}
struct kv {
    int key;
    char *value;
};

¢void ?{}(kv *this) {  // default constructor
    ?{}(&(this->key));  // call recursively on members
    ?{}(&(this->value));
}
void ?{}(kv *this, kv that) {  // copy constructor
    ?{}(&(this->key), that.key);
    ?{}(&(this->value), that.value);
}
kv ?=?(kv *this, kv that) {  // assignment operator
    ?=?(&(this->key), that.key);
    ?=?(&(this->value), that.value);
    return *this;
}
void ^?{}(kv *this) {  // destructor
    ^?{}(&(this->key));
    ^?{}(&(this->value));
}¢

forall(otype T ¢| { void ?{}(T*); void ?{}(T*, T); T ?=?(T*, T); void ^?{}(T*); }¢)
void foo(T);
\end{lstlisting}

\subsection{Generic Types}
I have already added a generic type capability to \CFA, designed to efficiently and naturally integrate with \CFA's existing polymorphic functions.
\begin{lstlisting}
forall(otype R, otype S) struct pair {
    R first;
    S second;
};

forall(otype T)
T value( pair(const char*, T) *p ) { return p->second; }

pair(const char*, int) p = { "magic", 42 };
int magic = value( &p );
\end{lstlisting}
For \emph{concrete} generic types, that is, those where none of the type parameters depend on polymorphic type variables (like ©pair(const char*, int)© above), the struct is essentially template expanded to a new struct type; for \emph{polymorphic} generic types (such as ©pair(const char*, T)© above), member access is handled by a runtime calculation of the field offset, based on the size and alignment information of the polymorphic parameter type.
The default-generated constructors, destructor and assignment operator for a generic type are polymorphic functions with the same list of type parameters as the generic type definition.
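
A runtime field-offset calculation of the kind mentioned above can be sketched as follows.
This sketch assumes the conventional C layout rule (each field is placed at the next multiple of its alignment); the source does not show \CFA's actual layout code, the sizes are those of a typical LP64 platform, and ©field_offsets© is a hypothetical name.

```python
def field_offsets(fields):
    """Return the byte offset of each field, given (size, alignment) pairs."""
    offsets, offset = [], 0
    for size, align in fields:
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets.append(offset)
        offset += size
    return offsets


# Assumed LP64 sizes: pointer = (8, 8), int = (4, 4), char = (1, 1).
pair_ptr_int = field_offsets([(8, 8), (4, 4)])    # pair(const char*, int)
pair_char_int = field_offsets([(1, 1), (4, 4)])   # pair(char, int)
```

Because the offset of ©second© depends on the size and alignment of ©first©, a polymorphic function such as ©value© can only compute it from the size and alignment values passed for its type parameters.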

Aside from giving users the ability to create more parameterized types than just the built-in pointer, array and function types, the combination of generic types with polymorphic functions and implicit conversions makes the edge case where the resolver may enter an infinite loop much more common, as in the following code example:
\begin{lstlisting}
forall(otype T) struct box { T x; };

void f(void*); // (1)

forall(otype S)
void f(box(S)* b) { // (2)
    f(®(void*)0®);
}
\end{lstlisting}
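
The loop in the resolver happens as follows:
\begin{itemize}
\item Since ©void*© converts implicitly to any pointer type, the highlighted expression can be interpreted either as a ©void*©, matching ©f© (1), or as a ©box(S)*© for some type ©S©, matching ©f© (2).
\item To cost the ©box(S)*© interpretation, the resolver must find a type for ©S© that satisfies the four implicit ©otype© assertions; because nothing constrains ©S©, one candidate is ©box(S2)© for some type ©S2©.
\item The default-generated constructors, destructor and assignment operator of ©box© are themselves polymorphic functions whose type parameter carries the same four assertions, so ©S2© must be resolved in turn, and one candidate is ©box(S3)© for some type ©S3©.
\item The previous step repeats until stopped, with four times as much work performed at each step.
\end{itemize}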
This problem can occur in any resolution context where a polymorphic function that can satisfy its own type assertions is required for a possible interpretation of an expression with no constraints on its type, and is thus not limited to combinations of generic types with ©void*© conversions.
However, constructors for generic types often satisfy their own assertions and a polymorphic conversion such as the ©void*© conversion to a polymorphic variable is a common way to create an expression with no constraints on its type.
As discussed above, the \CFA expression resolver must handle this possible infinite recursion somehow, and it occurs fairly naturally in code like the above that uses generic types.

\subsection{Tuple Types}
\CFA adds \emph{tuple types} to C, a syntactic facility for referring to lists of values anonymously or with a single identifier.
An identifier may name a tuple, and a function may return one.
Of particular relevance to resolution, a tuple may be implicitly \emph{destructured} into a list of values, as in the call to ©swap©:
\begin{lstlisting}
[char, char] x = [ '!', '?' ];  // (1)
int x = 42;  // (2)

forall(otype T) [T, T] swap( T a, T b ) { return [b, a]; }  // (3)

x = swap( x ); // destructure [char, char] x into two elements of parameter list
// cannot use int x for parameter, not enough arguments to swap

void swap( int, char, char ); // (4)

swap( x, x ); // resolved as (4) on (2) and (1)
// (3) on (2) and (2) is close, but the polymorphic binding makes it not minimal-cost
\end{lstlisting}
Tuple destructuring means that the mapping from the position of a subexpression in the argument list to the position of a parameter in the function declaration is not straightforward, as some arguments may be expandable to different numbers of parameters, like ©x© above.
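
The loss of a positional argument-to-parameter mapping can be seen by flattening every combination of interpretations of the arguments in ©swap( x, x )©.
The Python sketch below is illustrative only: it models each interpretation of ©x© as a flat list of element types, and the names ©X_INTERPRETATIONS© and ©flattened_argument_lists© are hypothetical.

```python
from itertools import product

# The two declarations named x in the listing above, as flat type lists.
X_INTERPRETATIONS = [["char", "char"], ["int"]]   # (1) and (2)


def flattened_argument_lists(per_argument):
    """Yield one flat list of types per combination of interpretations."""
    for combination in product(*per_argument):
        yield [t for interpretation in combination for t in interpretation]


# swap( x, x ): two arguments, each with two interpretations.
lists = list(flattened_argument_lists([X_INTERPRETATIONS, X_INTERPRETATIONS]))
matches_4 = [types for types in lists if types == ["int", "char", "char"]]
```

The two written arguments expand to flat lists of four, three, three and two elements; only the combination of (2) followed by (1) lines up with the three parameters of declaration (4).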

\subsection{Reference Types}
I have been designing \emph{reference types} for \CFA, in collaboration with the rest of the \CFA research team.
Given some type ©T©, a ©T&© (``reference to ©T©'') is essentially an automatically dereferenced pointer; with these semantics most of the C standard's discussions of lvalues can be expressed in terms of references instead, with the benefit of being able to express the difference between the reference and non-reference version of a type in user code.
The reference proposal also adds an rvalue-to-lvalue conversion to \CFA, implemented by storing the value in a new compiler-generated temporary and passing a reference to the temporary.
These two conversions can chain, producing a qualifier-dropping conversion for references, for instance converting a reference to a ©const int© into a reference to a non-©const int© by copying the originally referred-to value into a fresh temporary and taking a reference to this temporary, as in:
\begin{lstlisting}
const int magic = 42;

void inc_print( int& x ) { printf("%d\n", ++x); }

inc_print( magic ); // legal; implicitly generated code in green below:

¢int tmp = magic;¢ // to safely strip const-qualifier
¢inc_print( tmp );¢ // tmp is incremented, magic is unchanged
\end{lstlisting}
These reference conversions may also chain with the other implicit type-conversions.
The main implication of the reference conversions for expression resolution is the multiplication of available implicit conversions, though, given their restricted context, reference conversions may be treatable efficiently as a special case of implicit conversions.

\subsection{Special Literal Types}
Another proposal under consideration gives the literals ©0© and ©1© their own special types (such as ©zero_t©, used in the ©has_magnitude© trait above).
Implicit conversions from these types allow ©0© and ©1© to be considered as values of many different types, depending on context, allowing expression desugarings like ©if ( x ) {}© $\Rightarrow$ ©if ( x != 0 ) {}© to be implemented efficiently and precisely.
This approach is a generalization of C's existing behaviour of treating ©0© as either an integer zero or a null pointer constant, and treating either of those values as boolean false.
The main implication for expression resolution is that the frequently encountered expressions ©0© and ©1© may have a large number of valid interpretations.

\subsection{Deleted Function Declarations}
One final proposal for \CFA with an impact on the expression resolver is \emph{deleted function declarations}; in \CCeleven, a function declaration can be deleted as below:
\begin{lstlisting}
int somefn(char) = delete;
\end{lstlisting}
This feature is typically used in \CCeleven to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC-specific rules about private members and implicitly-generated functions; the deleted-function form is both clearer and less verbose.}, or to forbid some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers. A similar effect can be produced on an ad-hoc basis at the appropriate call sites through use of casts to determine the function type. In both cases, the deleted-function form is clearer and more concise.}.
Adding a similar feature to \CFA involves including the deleted function declarations in expression resolution along with the normal declarations, but producing a compiler error if the deleted function is the best resolution.
How conflicts should be handled between resolution of an expression to both a deleted and a non-deleted function is a small but open research question.

\section{Expression Resolution}
\subsection{Analysis}
The expression resolution problem is determining an optimal match between some combination of argument interpretations and the parameter list of some overloaded instance of a function; the argument interpretations are produced by recursive invocations of expression resolution, where the base case is zero-argument functions (which are, for purposes of this discussion, semantically equivalent to named variables or constant literal expressions).
Assume that the matching between a function's parameter list and a combination of argument interpretations can be done in $\bigO{p^k}$ time, where $p$ is the number of parameters and $k$ is some positive number.
If there are $\bigO{i}$ valid interpretations for each subexpression, there are $\bigO{i}$ candidate functions and $\bigO{i^p}$ possible argument combinations for each expression, so a single recursive call to expression resolution takes $\bigO{i^{p+1} \cdot p^k}$ time if it must compare all combinations, or $\bigO{i(p+1) \cdot p^k}$ time if argument-parameter matches can be chosen independently of each other.
Given these bounds, resolution of a single top-level expression tree of depth $d$ takes $\bigO{i^{p+1} \cdot p^{k \cdot d}}$ time under full-combination matching, or $\bigO{i(p+1) \cdot p^{k \cdot d}}$ time for independent-parameter matching\footnote{A call tree has leaves at depth $\bigO{d}$, and each internal node has $\bigO{p}$ fan-out, producing $\bigO{p^d}$ total recursive calls.}.
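
As a purely illustrative instance of the per-call bounds above (the values are arbitrary, not measurements), take $i = 4$ interpretations per subexpression, $p = 3$ parameters, and linear-time matching ($k = 1$):
\[
i^{p+1} \cdot p^k = 4^{4} \cdot 3 = 768
\qquad \mbox{versus} \qquad
i(p+1) \cdot p^k = 4 \cdot 4 \cdot 3 = 48 .
\]
In this instance full-combination matching performs sixteen times as much work per call as independent matching; in general the ratio between the two bounds is $i^p / (p+1)$, which grows rapidly with $i$.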

Expression resolution is somewhat unavoidably exponential in $d$, the depth of the expression tree, and if arguments cannot be matched to parameters independently of each other, expression resolution is also exponential in $p$.
However, both $d$ and $p$ are fixed by the programmer, and generally bounded by reasonably small constants.
$k$, on the other hand, is mostly dependent on the representation of types in the system and the efficiency of type assertion checking; if a candidate argument combination can be compared to a function parameter list in linear time in the length of the list (\ie $k = 1$), then the $p^{k \cdot d}$ factor is linear in the input size of the source code for the expression, otherwise the resolution algorithm exhibits super-linear performance scaling on code containing more-deeply nested expressions.
The number of valid interpretations of any subexpression, $i$, is bounded by the number of types in the system, which is possibly infinite, though practical resolution algorithms for \CFA must be able to place some finite bound on $i$, possibly at the expense of type-system completeness.

\subsection{Expression Costs}
The expression resolution problem involves minimization of a cost function; loosely defined, this cost function is the number of implicit conversions in the top-level expression interpretation.
With more specificity, the \emph{cost} of a particular expression interpretation is a lexicographically-ordered tuple, where each element of the tuple corresponds to a particular kind of conversion.
In \CFA today, cost is a three-tuple including the number of unsafe conversions, the number of polymorphic parameter bindings, and the number of safe conversions.
These counts include conversions used in subexpression interpretations, as well as those necessary to satisfy the type assertions of any polymorphic functions included in the interpretation.

\begin{lstlisting}
void f(char, long);  // $f_1$ - cost (2, 0, 1)
forall(otype T) void f(T, long); // $f_2$ - cost (0, 1, 1)
void f(long, long); // $f_{3a}$ - cost (0, 0, 2)
void f(int, float); // $f_{3b}$ - cost (0, 0, 2)
void f(int, long);  // $f_4$ - cost (0, 0, 1)

f(7, 11);
\end{lstlisting}

In the example above, the expression resolves to $f_4$.
$f_1$ has an unsafe conversion (from ©int© to ©char©), and is thus the highest cost, followed by $f_2$, which has a polymorphic binding (from ©int© to ©T©).
None of $f_{3a}$, $f_{3b}$, or $f_4$ matches exactly with the type of the call expression (©void (*)(int, int)©), each involving safe conversions, but in this case $f_4$ is cheaper than $f_{3a}$, because it converts fewer arguments, and is also cheaper than $f_{3b}$, because ©long© is a closer match for ©int© than ©float© is.
If the declaration of $f_4$ were missing, the expression would be ambiguous, because the two single-step ©int©-to-©long© conversions in $f_{3a}$ cost the same as the one double-step ©int©-to-©float© conversion in $f_{3b}$.
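
The lexicographic comparison and the ambiguity case can be sketched in Python, whose tuples already compare lexicographically.
The cost values are copied from the listing above; the function name ©cheapest© and the dictionary encoding are hypothetical, and the sketch only selects among precomputed costs rather than computing them.

```python
# Cost tuples from the listing above: (unsafe, polymorphic, safe).
COSTS = {
    "f1": (2, 0, 1),
    "f2": (0, 1, 1),
    "f3a": (0, 0, 2),
    "f3b": (0, 0, 2),
    "f4": (0, 0, 1),
}


def cheapest(costs):
    """Return the unique minimal-cost candidate; raise if it is not unique."""
    best = min(costs.values())  # Python compares tuples lexicographically
    winners = sorted(name for name, cost in costs.items() if cost == best)
    if len(winners) != 1:
        raise ValueError("ambiguous: " + ", ".join(winners))
    return winners[0]


without_f4 = {name: cost for name, cost in COSTS.items() if name != "f4"}
```

Here ©cheapest(COSTS)© selects ©f4©, while ©cheapest(without_f4)© raises an error naming ©f3a© and ©f3b©, matching the ambiguity described above.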
440
441In the course of this project I may modify the cost tuple,\footnote{I have considered adding an element to distinguish between cast expressions used as conversions and those used as type ascriptions, and another element to differentiate interpretations based on closer qualifier matches. The existing costing of polymorphic functions could also be made more precice than a bare count of parameter bindings.} but the essential nature of the cost calculation should remain the same.
442
443\subsection{Objectives}
The research goal of this project is to develop a performant expression resolver for \CFA; this analysis suggests three primary areas of investigation to accomplish that end.
The first area of investigation is efficient argument-parameter matching; Bilson~\cite{Bilson03} mentions significant optimization opportunities available in the current literature to improve on the existing \CFA compiler.
%TODO: look up and lit review
The second area of investigation is minimizing dependencies between argument-parameter matches; the current \CFA compiler attempts to match entire argument combinations against functions at once, potentially attempting to match the same argument against the same parameter multiple times.
Whether the feature set of \CFA admits an expression resolution algorithm where arguments can be matched to parameters independently of other arguments in the same function application is an area of open research; polymorphic type parameters produce enough cross-argument dependencies that the problem is not trivial.
If cross-argument resolution dependencies cannot be completely eliminated, effective caching strategies to reduce duplicated work between equivalent argument-parameter matches in different combinations may mitigate the asymptotic deficits of the whole-combination matching approach.
The final area of investigation is heuristics and algorithmic approaches to reduce the number of argument interpretations considered in the common case; if argument-parameter matches cannot be made independent, even small reductions in $i$ should yield significant reductions in the $i^{p+1}$ resolver runtime factor.
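As a purely arithmetic illustration of this last point (the values of $i$ and $p$ here are chosen arbitrarily and are not measurements), fixing $p = 3$ and reducing $i$ from $4$ to $3$ gives
\[ \frac{3^{p+1}}{4^{p+1}} = \frac{3^{4}}{4^{4}} = \frac{81}{256} \approx 0.32, \]
so a 25\% reduction in $i$ removes roughly two-thirds of this factor.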
451
The discussion below presents a number of largely orthogonal axes for expression resolution algorithm design to be investigated, noting prior work where applicable.
Some of the proposed improvements to the expression resolution algorithm are based on heuristics rather than asymptotically superior algorithms.
However, programmers often employ idioms and other programming patterns to reduce the mental burden of producing correct code; if the compiler can identify and exploit these patterns, the resulting reduction in expression resolution time for common, idiomatic expressions should lower total compilation time, even for code that includes difficult-to-resolve expressions that push the expression resolver to its theoretical worst case.
454
455\subsection{Argument-Parameter Matching}
456The first axis for consideration is the argument-parameter matching direction --- whether the type matching for a candidate function to a set of candidate arguments is directed by the argument types or the parameter types.
For programming languages without implicit conversions, argument-parameter matching is essentially the entirety of the expression resolution problem, and is generally referred to as ``overload resolution'' in the literature.
458All expression-resolution algorithms form a DAG of interpretations, some explicitly, some implicitly; in this DAG, arcs point from function-call interpretations to argument interpretations, as in Figure~\ref{fig:res_dag}:
\begin{figure}[h]
\centering
\begin{subfigure}[h]{2in}
\begin{lstlisting}
int *p;  // $p_i$
char *p; // $p_c$

double *f(int*, int*); // $f_d$
char *f(char*, int*); // $f_c$

f( f( p, p ), p );
\end{lstlisting}
\end{subfigure}~\begin{subfigure}[h]{2in}
\includegraphics{resolution_dag}
\end{subfigure}
\caption{Resolution DAG for a simple expression. Functions that do not have a valid argument matching are covered with an \textsf{X}.}\label{fig:res_dag}
\end{figure}
476
477Note that some interpretations may be part of more than one super-interpretation, as with the second $p_i$ in the bottom row, while some valid subexpression interpretations, like $f_d$ in the middle row, are not used in any interpretation of their superexpression.
478
479\subsubsection{Argument-directed (Bottom-up)}
480Baker's algorithm for expression resolution~\cite{Baker82} pre-computes argument candidates, from the leaves of the expression tree up.
481For each candidate function, Baker attempts to match argument types to parameter types in sequence, failing if any parameter cannot be matched.
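As a rough illustration, argument-directed matching on the expression from Figure~\ref{fig:res_dag} can be sketched in Python.
This is a simplified sketch rather than Baker's published algorithm: types are compared by exact string equality, every matching combination is enumerated instead of being flagged as ambiguous, and the names ©DECLS© and ©resolve_bottom_up© are hypothetical.

```python
from itertools import product

# Declarations from the figure: name -> list of (label, parameter types, result type).
DECLS = {
    "p": [("p_i", (), "int*"), ("p_c", (), "char*")],
    "f": [("f_d", ("int*", "int*"), "double*"),
          ("f_c", ("char*", "int*"), "char*")],
}

def resolve_bottom_up(expr):
    """expr is (name, [subexprs]); returns a list of (result type, interpretation)."""
    name, args = expr
    arg_sets = [resolve_bottom_up(a) for a in args]   # resolve the leaves first
    results = []
    for label, params, ret in DECLS[name]:
        if len(params) != len(args):
            continue
        chosen = []
        # Match each argument position in sequence, failing on the first miss.
        for param, candidates in zip(params, arg_sets):
            matches = [text for ty, text in candidates if ty == param]
            if not matches:
                break
            chosen.append(matches)
        else:
            for combo in product(*chosen):
                text = f"{label}({', '.join(combo)})" if combo else label
                results.append((ret, text))
    return results

P = ("p", [])
EXPR = ("f", [("f", [P, P]), P])          # f( f( p, p ), p )
result = resolve_bottom_up(EXPR)
```

In this sketch the inner call yields both an $f_d$ and an $f_c$ interpretation, but only the $f_c$ interpretation can be consumed by the outer call.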
482
483Bilson~\cite{Bilson03} similarly pre-computes argument candidates in the original \CFA compiler, but then explicitly enumerates all possible argument combinations for a multi-parameter function; these argument combinations are matched to the parameter types of the candidate function as a unit rather than individual arguments.
484This approach is less efficient than Baker's approach, as the same argument may be compared to the same parameter many times, but allows a more straightforward handling of polymorphic type-binding and multiple return-types.
485It is possible the efficiency losses here relative to Baker could be significantly reduced by keeping a memoized cache of argument-parameter type comparisons and reading previously-seen argument-parameter matches from this cache rather than recomputing them.
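The memoization idea can be sketched in Python as follows.
This is only an illustration under stated assumptions: ©matches© stands in for a potentially expensive type comparison and is reduced to exact equality, ©match_combinations© is a hypothetical name for whole-combination matching, and the counter ©calls© exists only to show how many comparisons are actually computed.

```python
from functools import lru_cache
from itertools import product

calls = 0  # number of comparisons actually computed (cache misses)

@lru_cache(maxsize=None)
def matches(arg_type, param_type):
    """Stand-in for a possibly expensive type comparison (here, exact equality)."""
    global calls
    calls += 1
    return arg_type == param_type

def match_combinations(arg_sets, params):
    """Test whole argument combinations as a unit, consulting the cache."""
    return [combo for combo in product(*arg_sets)
            if all(matches(a, p) for a, p in zip(combo, params))]

# Three arguments, each with two candidate interpretations: 8 combinations.
arg_sets = [["int*", "char*"]] * 3
valid = match_combinations(arg_sets, ("int*", "int*", "int*"))
```

Although eight combinations are enumerated, only two distinct argument-parameter type pairs occur, so only two comparisons are computed; the remainder are read from the cache.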
486
487\subsubsection{Parameter-directed (Top-down)}
Unlike Baker and Bilson, Cormack's algorithm~\cite{Cormack81} requests argument candidates that match the type of each parameter of each candidate function, from the top-level expression down; memoization of these requests is presented as an optimization.
As presented, this algorithm requires the result of the expression to have a known type; an algorithm based on Cormack's could reasonably request a candidate set of any return type, though such a set may be quite large.
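A parameter-directed request with memoization might look like the following Python sketch, again using the declarations from Figure~\ref{fig:res_dag}.
It is a simplification rather than Cormack's algorithm as published: types are compared by exact equality, an argument with more than one matching interpretation is simply rejected, and ©DECLS© and ©request© are hypothetical names.

```python
from functools import lru_cache

# Declarations from the figure: name -> list of (label, parameter types, result type).
DECLS = {
    "p": [("p_i", (), "int*"), ("p_c", (), "char*")],
    "f": [("f_d", ("int*", "int*"), "double*"),
          ("f_c", ("char*", "int*"), "char*")],
}

P = ("p", ())
EXPR = ("f", (("f", (P, P)), P))   # f( f( p, p ), p ) as nested hashable tuples

@lru_cache(maxsize=None)           # memoize repeated (expression, type) requests
def request(expr, wanted):
    """Return interpretations of expr whose result type is exactly `wanted`."""
    name, args = expr
    found = []
    for label, params, ret in DECLS[name]:
        if ret != wanted or len(params) != len(args):
            continue
        # Ask each argument for interpretations of the corresponding parameter type.
        per_arg = [request(a, p) for a, p in zip(args, params)]
        if all(len(c) == 1 for c in per_arg):
            inner = ", ".join(c[0] for c in per_arg)
            found.append(f"{label}({inner})" if args else label)
    return tuple(found)

as_char_ptr = request(EXPR, "char*")
as_double_ptr = request(EXPR, "double*")
```

The request for ©p© as ©int*© is issued by both the inner and the outer call, so the second request is answered from the cache.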
490
491\subsubsection{Hybrid}
492This proposal includes the investigation of hybrid top-down/bottom-up argument-parameter matching.
493A reasonable hybrid approach might take a top-down approach when the expression to be matched has a fixed type, and a bottom-up approach in untyped contexts.
This approach may involve switching from one matching direction to the other at different levels of the expression tree.
For instance, in:
\begin{lstlisting}
forall(otype T)
int f(T x);  // (1)

void* f(char y);  // (2)

int x = f( f( '!' ) );
\end{lstlisting}
505
Deciding when to switch between bottom-up and top-down resolution to minimize wasted work in a hybrid algorithm is a necessarily heuristic process, and finding good heuristics for which subexpressions to switch matching strategies on is an open question.
507One reasonable approach might be to set a threshold $t$ for the number of candidate functions, and to use top-down resolution for any subexpression with fewer than $t$ candidate functions, to minimize the number of unmatchable argument interpretations computed, but to use bottom-up resolution for any subexpression with at least $t$ candidate functions, to reduce duplication in argument interpretation computation between the different candidate functions.
508
509Ganzinger and Ripken~\cite{Ganzinger80} propose an approach (later refined by Pennello~\etal~\cite{Pennello80}) that uses a top-down filtering pass followed by a bottom-up filtering pass to reduce the number of candidate interpretations; they prove that for the Ada programming language a small number of such iterations is sufficient to converge to a solution for the expression resolution problem.
510Persch~\etal~\cite{PW:overload} developed a similar two-pass approach where the bottom-up pass is followed by the top-down pass.
511These algorithms differ from the hybrid approach under investigation in that they take multiple passes over the expression tree to yield a solution, and that they also apply both filtering heuristics to all expression nodes; \CFA's polymorphic functions and implicit conversions make the approach of filtering out invalid types taken by all of these algorithms infeasible.
512
513\subsubsection{Common Subexpression Caching}
514With any of these argument-parameter approaches, it may be a useful optimization to cache the resolution results for common subexpressions; in Figure~\ref{fig:res_dag} this optimization would result in the list of interpretations $[p_c, p_i]$ for ©p© only being calculated once, and re-used for each of the three instances of ©p©.
515
516\subsection{Implicit Conversion Application}
517With the exception of Bilson, the authors mentioned above do not account for implicit conversions in their algorithms\footnote{Baker does briefly comment on an approach for handling implicit conversions, but does not provide an implementable algorithm.}; all assume that there is at most one valid interpretation of a given expression for each distinct type.
518Integrating implicit conversion handling into the presented argument-parameter matching algorithms thus provides some choice of implementation approach.
519
Inference of polymorphic type variables can be considered a form of implicit conversion application, where monomorphic types are implicitly converted to instances of some polymorphic type\footnote{This ``conversion'' may not be implemented in any explicit way at runtime, but does need to be handled by the expression resolver as an inexact match between argument and parameter types.}.
This form of implicit conversion is particularly common in functional languages; Haskell's type classes~\cite{typeclass} are a particularly well-studied variant of this inference.
However, type classes arguably do not allow name overloading, as (at least in the Haskell implementation) identifiers belonging to type classes may not be overloaded in any other context than an implementation of that type class; this provides a single (possibly polymorphic) interpretation of any identifier, simplifying the expression resolution problem relative to \CFA.
523\CC~\cite{ANSI98:C++} includes both name overloading and implicit conversions in its expression resolution specification, though unlike \CFA it does complete type-checking on a generated monomorphization of template functions, where \CFA simply checks a list of type constraints.
524The upcoming Concepts standard~\cite{C++concepts} defines a system of type constraints similar in principle to \CFA's.
525Cormack and Wright~\cite{Cormack90} present an algorithm that integrates overload resolution with a polymorphic type inference approach very similar to \CFA's.
526However, their algorithm does not account for implicit conversions other than polymorphic type binding and their discussion of their overload resolution algorithm is not sufficiently detailed to classify it with the other argument-parameter matching approaches\footnote{Their overload resolution algorithm is possibly a variant of Ganzinger and Ripken~\cite{Ganzinger80} or Pennello~\etal~\cite{Pennello80}, modified to allow for polymorphic type binding.}.
527
528\subsubsection{On Parameters}
529Bilson does account for implicit conversions in his algorithm, but it is unclear if the approach is optimal.
530His algorithm integrates checking for valid implicit conversions into the argument-parameter-matching step, essentially trading more expensive matching for a smaller number of argument interpretations.
531This approach may result in the same subexpression being checked for a type match with the same type multiple times, though again memoization may mitigate this cost; however, this approach does not generate implicit conversions that are not useful to match the containing function.
532
533\subsubsection{On Arguments}
534Another approach is to generate a set of possible implicit conversions for each set of interpretations of a given argument.
This approach has the benefit of detecting ambiguous interpretations of arguments at the level of the argument rather than its containing call; it also never finds more than one interpretation of the argument with a given type, and re-uses calculation of implicit conversions between function candidates.
536On the other hand, this approach may unnecessarily generate argument interpretations that never match any parameter, wasting work.
537Furthermore, in the presence of tuple types, this approach may lead to a combinatorial explosion of argument interpretations considered, unless the tuple can be considered as a sequence of elements rather than a unified whole.
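The ``on arguments'' approach can be sketched in Python as an expansion of each argument's interpretations along a conversion chain.
The chain ©SAFE_STEP© is an assumption made for illustration (chosen so that ©int©-to-©long© is one step and ©int©-to-©float© is two, as in the earlier cost example) and is not \CFA's actual conversion table; costs are reduced to a single integer, and ©expand© is a hypothetical name.

```python
# Assumed chain of safe conversions, one cost unit per step (illustrative only).
SAFE_STEP = {"int": "long", "long": "float", "float": "double"}

def expand(interpretations):
    """Map each reachable type to (minimal cost, labels achieving that cost).

    `interpretations` is a list of (label, type, cost) triples for one argument.
    More than one label for a type marks an ambiguity detected at the argument.
    """
    best = {}
    for label, ty, cost in interpretations:
        while ty is not None:
            if ty not in best or cost < best[ty][0]:
                best[ty] = (cost, [label])
            elif cost == best[ty][0]:
                best[ty][1].append(label)
            ty = SAFE_STEP.get(ty)     # None once the chain is exhausted
            cost += 1
    return best

table = expand([("x_i", "int", 0), ("x_l", "long", 0)])
clash = expand([("a", "int", 0), ("b", "long", 1)])
```

In ©table© every type has a single cheapest source, whereas in ©clash© the ©long© interpretation is reachable at equal cost from both ©a© and ©b©, so the ambiguity is visible before any parameter is consulted.
Note that ©double© is generated in both cases whether or not any candidate function accepts it, which is the wasted work discussed above.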
538
539\subsection{Candidate Set Generation}
540All the algorithms discussed to this point generate the complete set of candidate argument interpretations before attempting to match the containing function-call expression.
541However, given that the top-level expression interpretation that is ultimately chosen is the minimal-cost valid interpretation, any consideration of non-minimal-cost interpretations is wasted work.
542Under the assumption that programmers generally write function calls with relatively low-cost interpretations, a possible work-saving heuristic is to generate only the lowest-cost argument interpretations first, attempt to find a valid top-level interpretation using them, and only if that fails generate the next higher-cost argument interpretations.
543
544\subsubsection{Eager}
545Within the eager approach taken by the existing top-down and bottom-up algorithms, there are still variants to explore.
Cormack and Baker do not account for implicit conversions, and thus do not account for the possibility of multiple valid interpretations with distinct costs; Bilson, on the other hand, sorts the list of interpretations to aid in finding minimal-cost interpretations.
547Sorting the lists of argument or function call interpretations by cost at some point during resolution may provide useful opportunities to short-circuit expression evaluation when a minimal-cost interpretation is found, though it is unclear if this short-circuiting behaviour justifies the cost of the sort.
548
549\subsubsection{Lazy}
550In the presence of implicit conversions, many argument interpretations may match a given parameter by application of an appropriate implicit conversion.
However, if programmers actually use relatively few implicit conversions, then the ``on arguments'' approach to implicit conversions generates a large number of high-cost interpretations that may never be used.
Even if the ``on parameters'' approach to implicit conversions is used, eager generation of interpretations spends extra time attempting possibly expensive polymorphic or conversion-based matches in cases where an exact monomorphic interpretation exists.
553
The essence of the lazy approach to candidate set generation is to wrap the matching algorithm into the element generator of a lazy list, generating only as many elements at a time as are needed to ensure the next-smallest-cost interpretation has been generated.
555Assuming argument interpretations are provided to the parameter matching algorithm in sorted order, a sorted list of function call interpretations can be produced by generating combinations of arguments sorted by total cost\footnote{I have already developed a lazy $n$-way combination generation algorithm to perform this task.}, then generating function call interpretations in the order suggested by this list.
556The function call interpretation chosen may have costs of its own, for instance polymorphic type binding, so in some cases a number of argument combinations (any combination whose marginal cost does not exceed the cost of the function call interpretation itself) may need to be considered to determine the next-smallest-cost function call interpretation.
557Ideally, this candidate generation approach leads to very few unused candidates being generated (in the expected case where the programmer has, in fact, provided a validly-typable program), but it is an open question whether or not the overheads of lazy generation exceed the benefit produced from considering fewer interpretations.
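One possible way to generate argument combinations lazily in order of total cost is a heap-based frontier search, sketched here in Python.
This is a generic technique offered for illustration and is not claimed to be the combination generator mentioned in the footnote; costs are reduced to single integers rather than tuples, and ©combos_by_cost© is a hypothetical name.

```python
import heapq

def combos_by_cost(arg_lists):
    """Lazily yield (total cost, combination) in non-decreasing total cost.

    Each element of arg_lists is a list of (cost, label) pairs sorted by cost.
    """
    if any(not lst for lst in arg_lists):
        return
    start = (0,) * len(arg_lists)
    heap = [(sum(lst[0][0] for lst in arg_lists), start)]
    seen = {start}
    while heap:
        total, idx = heapq.heappop(heap)
        yield total, tuple(lst[i][1] for lst, i in zip(arg_lists, idx))
        # Successors advance exactly one argument to its next-cheapest interpretation.
        for k, lst in enumerate(arg_lists):
            if idx[k] + 1 < len(lst):
                nxt = idx[:k] + (idx[k] + 1,) + idx[k + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    step = lst[idx[k] + 1][0] - lst[idx[k]][0]
                    heapq.heappush(heap, (total + step, nxt))

a = [(0, "a0"), (2, "a1")]
b = [(0, "b0"), (1, "b1"), (5, "b2")]
gen = combos_by_cost([a, b])
first = next(gen)                     # only the cheapest combination is built
totals = [t for t, _ in combos_by_cost([a, b])]
```

A consumer that stops after the first valid function call interpretation never forces the remaining combinations to be constructed.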
558
559\subsubsection{Stepwise Lazy}
As a compromise between the trade-offs of the eager and lazy approaches, I also propose to investigate a ``stepwise lazy'' approach, where all the interpretations for some ``step'' are eagerly generated, then the interpretations in the later steps are only generated on demand.
561Under this approach the \CFA resolver could, for instance, try expression interpretations in the following order:
562\begin{enumerate}
563\item Interpretations with no polymorphic type binding or implicit conversions.
564\item Interpretations containing no polymorphic type binding and at least one safe implicit conversion.
565\item Interpretations containing polymorphic type binding, but only safe implicit conversions.
566\item Interpretations containing at least one unsafe implicit conversion.
567\end{enumerate}
568If a valid expression interpretation is found in one step, it is guaranteed to be lower-cost than any interpretation in a later step (by the structure of \CFA interpretation costs), so no further steps need be considered.
569This approach may save significant amounts of work, especially given that the first steps avoid potentially expensive handling of implicit conversions and type assertion satisfaction entirely, and cover a large proportion of common monomorphic code.
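The four steps listed above can be expressed as a classification of the (unsafe, polymorphic, safe) cost tuple, as in the following Python sketch.
The names ©step_of©, ©resolve_stepwise©, and the callback ©generate© are hypothetical, the candidate costs are taken from the earlier cost example, and ambiguity checking within a step is omitted.

```python
def step_of(cost):
    """Classify an (unsafe, poly, safe) cost tuple into one of the four steps."""
    unsafe, poly, safe = cost
    if unsafe > 0:
        return 4
    if poly > 0:
        return 3
    if safe > 0:
        return 2
    return 1

def resolve_stepwise(generate):
    """`generate(step)` is a hypothetical callback producing one step's candidates."""
    for step in (1, 2, 3, 4):
        found = generate(step)         # later steps are never generated on success
        if found:
            return step, min(found, key=lambda c: c[1])
    return None

COSTS = {"f1": (2, 0, 1), "f2": (0, 1, 1), "f3a": (0, 0, 2),
         "f3b": (0, 0, 2), "f4": (0, 0, 1)}
requested = []                         # records which steps were actually generated

def generate(step):
    requested.append(step)
    return [(n, c) for n, c in COSTS.items() if step_of(c) == step]

result = resolve_stepwise(generate)
```

For these candidates the search stops at the second step with $f_4$, and the polymorphic and unsafe candidates are never generated.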
570
571%\subsection{Parameter-Directed}
572%\textbf{TODO: Richard's algorithm isn't Baker (Cormack?), disentangle from this section \ldots}.
573%The expression resolution algorithm used by the existing iteration of CFA is based on Baker's\cite{Baker82} algorithm for overload resolution in Ada.
574%The essential idea of this algorithm is to first find the possible interpretations of the most deeply nested subexpressions, then to use these interpretations to recursively generate valid interpretations of their superexpressions.
575%To simplify matters, the only expressions considered in this discussion of the algorithm are function application and literal expressions; other expression types can generally be considered to be variants of one of these for the purposes of the resolver, \eg variables are essentially zero-argument functions.
576%If we consider expressions as graph nodes with arcs connecting them to their subexpressions, these expressions form a DAG, generated by the algorithm from the bottom up.
577%Literal expressions are represented by leaf nodes, annotated with the type of the expression, while a function application will have a reference to the function declaration chosen, as well as arcs to the interpretation nodes for its argument expressions; functions are annotated with their return type (or types, in the case of multiple return values).
578%
579%\textbf{TODO: Figure}
580%
581%Baker's algorithm was designed to account for name overloading; Richard Bilson\cite{Bilson03} extended this algorithm to also handle polymorphic functions, implicit conversions and multiple return types when designing the original \CFA compiler.
582%The core of the algorithm is a function which Baker refers to as $gen\_calls$.
583%$gen\_calls$ takes as arguments the name of a function $f$ and a list containing the set of possible subexpression interpretations $S_j$ for each argument of the function and returns a set of possible interpretations of calling that function on those arguments.
584%The subexpression interpretations are generally either singleton sets generated by the single valid interpretation of a literal expression, or the results of a previous call to $gen\_calls$.
585%If there are no valid interpretations of an expression, the set returned by $gen\_calls$ will be empty, at which point resolution can cease, since each subexpression must have at least one valid interpretation to produce an interpretation of the whole expression.
586%On the other hand, if for some type $T$ there is more than one valid interpretation of an expression with type $T$, all interpretations of that expression with type $T$ can be collapsed into a single \emph{ambiguous expression} of type $T$, since the only way to disambiguate expressions is by their return types.
587%If a subexpression interpretation is ambiguous, than any expression interpretation containing it will also be ambiguous.
588%In the variant of this algorithm including implicit conversions, the interpretation of an expression as type $T$ is ambiguous only if there is more than one \emph{minimal-cost} interpretation of the expression as type $T$, as cheaper expressions are always chosen in preference to more expensive ones.
589%
590%Given this description of the behaviour of $gen\_calls$, its implementation is quite straightforward: for each function declaration $f_i$ matching the name of the function, consider each of the parameter types $p_j$ of $f_i$, attempting to match the type of an element of $S_j$ to $p_j$ (this may include checking of implicit conversions).
591%If no such element can be found, there is no valid interpretation of the expression using $f_i$, while if more than one such (minimal-cost) element is found than an ambiguous interpretation with the result type of $f_i$ is produced.
592%In the \CFA variant, which includes polymorphic functions, it is possible that a single polymorphic function definition $f_i$ can produce multiple valid interpretations by different choices of type variable bindings; these interpretations are unambiguous so long as the return type of $f_i$ is different for each type binding.
593%If all the parameters $p_j$ of $f_i$ can be uniquely matched to a candidate interpretation, then a valid interpretation based on $f_i$ and those $p_j$ is produced.
594%$gen\_calls$ collects the produced interpretations for each $f_i$ and returns them; a top level expression is invalid if this list is empty, ambiguous if there is more than one (minimal-cost) result, or if this single result is ambiguous, and valid otherwise.
595%
596%In this implementation, resolution of a single top-level expression takes time $O(\ldots)$, where \ldots. \textbf{TODO:} \textit{Look at 2.3.1 in Richard's thesis when working out complexity; I think he does get the Baker algorithm wrong on combinations though, maybe\ldots}
597%
598%\textbf{TODO: Basic Lit Review} \textit{Look at 2.4 in Richard's thesis for any possible more-recent citations of Baker\ldots} \textit{Look back at Baker's related work for other papers that look similar to what you're doing, then check their citations as well\ldots} \textit{Look at Richard's citations in 2.3.2 w.r.t. type data structures\ldots}
599%\textit{CormackWright90 seems to describe a solution for the same problem, mostly focused on how to find the implicit parameters}
600
601\section{Proposal}
602Baker~\cite{Baker82} discussed various expression resolution algorithms that can handle name overloading, but left experimental comparison of those algorithms to future work; Bilson~\cite{Bilson03} described one extension of Baker's algorithm to handle implicit conversions, but did not fully explore the space of algorithmic approaches to handle both overloaded names and implicit conversions.
603This project is intended to experimentally test a number of expression resolution algorithms that are powerful enough to handle the \CFA type-system, including both name overloading and implicit conversions.
604This comparison closes Baker's open research question, as well as potentially improving Bilson's \CFA compiler.
605
606Rather than testing all of these algorithms in-place in the \CFA compiler, a resolver prototype is being developed that acts on a simplified input language encapsulating the essential details of the \CFA type-system\footnote{Note this simplified input language is not a usable programming language.}.
607Multiple variants of this resolver prototype will be implemented, each encapsulating a different expression resolution variant, sharing as much code as feasible.
These variants will be instrumented to test runtime performance, and run on a variety of input files; the input files may be generated programmatically or from existing code in \CFA or similar languages.
These experimental results should make it possible to determine the algorithm likely to be most performant in practical use, and replace \CFA's existing expression resolver.
610
611The experimental results will also provide some empirical sense of the compile-time cost of various language features by comparing the results of the most performant resolver variant that supports a feature with the most performant resolver variant that does not support that feature, a useful capability to guide language design.
612As an example, there are currently multiple open proposals for how implicit conversions should interact with polymorphic type binding in \CFA, each with distinct levels of expressive power; if the resolver prototype is modified to support each proposal, the optimal algorithm for each proposal can be compared, providing an empirical demonstration of the trade-off between expressive power and compiler runtime.
613
614This proposed project should provide valuable data on how to implement a performant compiler for programming languages such as \CFA with powerful static type-systems, specifically targeting the feature interaction between name overloading and implicit conversions.
615This work is not limited in applicability to \CFA, but may also be useful for supporting efficient compilation of the upcoming Concepts standard~\cite{C++concepts} for \CC template constraints, for instance.
616
617\appendix
618\section{Completion Timeline}
619The following is a preliminary estimate of the time necessary to complete the major components of this research project:
\begin{center}
\begin{tabular}{ | r @{--} l | p{4in} | }
\hline       May 2015 & April 2016   & Project familiarization and generic types design and implementation. \\
\hline       May 2016 & April 2017   & Design and implement resolver prototype and run performance experiments. \\
\hline       May 2017 & August 2017  & Integrate new language features and best-performing resolver prototype into CFA. \\
\hline September 2017 & January 2018 & Thesis writing and defense. \\
\hline
\end{tabular}
\end{center}
629
\bibliographystyle{plain}
\bibliography{pl}
633