Context Navigation

source: doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex @ 3b40801b

ADTaaron-thesisarm-ehast-experimentalcleanup-dtorsenumforall-pointer-decayjacob/cs343-translationjenkins-sandboxnew-astnew-ast-unique-exprpthread-emulationqualifiedEnum

Last change on this file since 3b40801b was 3b40801b, checked in by Aaron Moss <a3moss@…>, 5 years ago
thesis: add changebars
Property mode set to `100644`
File size: 52.5 KB

Line
1	\chapter{Resolution Algorithms}
2	\label{resolution-chap}
3
4	The main task of the \CFACC{} type-checker is \emph{expression resolution}: determining which declarations the identifiers in each expression correspond to.
5	Resolution is a straightforward task in C, as no simultaneously-visible declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
6	A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
7	To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
8	\cbstart
9	This conversion cost is summed over all subexpression interpretations in the interpretation of a top-level expression.
10	\cbend
11	Hence, the expression resolution problem is to find the unique minimal-cost interpretation for an expression, reporting an error if no such unique interpretation exists.
12
13	\section{Expression Resolution}
14
15	\cbstart
16	The expression resolution pass in \CFACC{} must traverse the input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates.
17	Once the set of valid interpretations for the top-level expression has been found, the expression resolver selects the unique minimal-cost candidate or reports an error.
18
19	The expression resolution problem in \CFA{} is more difficult than the analogous problems in C or \CC{}.
20	As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
21	The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions.
22	One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters which only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function).
23	Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost.
24	Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}.
25	Additionally, until the introduction of concepts in \CCtwenty{} \cite{C++Concepts}, \CC{} expression resolution has no analogue to \CFA{} assertion satisfaction checking, a further complication for a \CFA{} compiler.
26	The precise definition of \CFA{} expression resolution in this section further expands on the challenges of this problem.
27	\cbend
28
29	\subsection{Type Unification}
30
31	The polymorphism features of \CFA{} require binding of concrete types to polymorphic type variables.
32	Briefly, \CFACC{} keeps a mapping from type variables to the concrete types they are bound to as an auxiliary data structure during expression resolution; Chapter~\ref{env-chap} describes this \emph{environment} data structure in more detail.
33	A \emph{unification} algorithm is used to simultaneously check two types for equivalence with respect to the substitutions in an environment and update that environment.
34	Essentially, unification recursively traverses the structure of both types, checking them for equivalence, and when it encounters a type variable, it replaces it with the concrete type it is bound to; if the type variable has not yet been bound, the unification algorithm assigns the equivalent type as the bound type of the variable, after performing various consistency checks.
35	Ditchfield~\cite{Ditchfield92} and Bilson~\cite{Bilson03} describe the semantics of \CFA{} unification in more detail.
36
37	\subsection{Conversion Cost} \label{conv-cost-sec}
38
39	\cbstart
40	\CFA{}, like C, allows inexact matches between the type of function parameters and function call arguments.
41	Both languages insert \emph{implicit conversions} in these situations to produce an exact type match, and \CFA{} also uses the relative \emph{cost} of different conversions to select among overloaded function candidates.
42	\cbend
43	C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to apply define one implicitly.
44	The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
45	Since for backward-compatibility purposes the conversion costs of \CFA{} must produce an equivalent result to these common type rules, it is appropriate to summarize \cite[\S{}6.3.1.8]{C11} here:
46
47	\begin{itemize}
48	\item If either operand is a floating-point type, the common type is the size of the largest floating-point type. If either operand is !_Complex!, the common type is also !_Complex!.
49	\item If both operands are of integral type, the common type has the same size\footnote{Technically, the C standard defines a notion of \emph{rank} in \cite[\S{}6.3.1.1]{C11}, a distinct value for each \lstinline{signed} and \lstinline{unsigned} pair; integral types of the same size thus may have distinct ranks. For instance, though \lstinline{int} and \lstinline{long} may have the same size, \lstinline{long} always has greater rank. The standard-defined types are declared to have greater rank than any types of the same size added as compiler extensions.} as the larger type.
50	\item If the operands have opposite signedness, the common type is !signed! if the !signed! operand is strictly larger, or !unsigned! otherwise. If the operands have the same signedness, the common type shares it.
51	\end{itemize}
52
53	Beginning with the work of Bilson~\cite{Bilson03}, \CFA{} defines a \emph{conversion cost} for each function call in a way that generalizes C's conversion rules.
54	Loosely defined, the conversion cost counts the implicit conversions utilized by an interpretation.
55	With more specificity, the cost is a lexicographically-ordered tuple, where each element corresponds to a particular kind of conversion.
56	In Bilson's design, conversion cost is a 3-tuple, $(unsafe, poly, safe)$, where $unsafe$ is the count of unsafe (narrowing) conversions, $poly$ is the count of polymorphic type bindings, and $safe$ is the sum of the degree of safe (widening) conversions.
57	Degree of safe conversion is calculated as path weight in a directed graph of safe conversions between types; Bilson's version of this graph is in Figure~\ref{bilson-conv-fig}.
58	The safe conversion graph is designed such that the common type $c$ of two types $u$ and $v$ is compatible with the C standard definitions from \cite[\S{}6.3.1.8]{C11} and can be calculated as the unique type minimizing the sum of the path weights of $\overrightarrow{uc}$ and $\overrightarrow{vc}$.
59	The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters, where the interpretation with the minimum total cost will be selected:
60
61	\begin{cfa}
62	void f$\(_1\)$(char, long); $\C{// (1,0,1)}$
63	void f$\(_2\)$(short, long); $\C{// (1,0,1)}$
64	forall(otype T) void f$\(_3\)$(T, long); $\C{// (0,1,1)}$
65	void f$\(_4\)$(long, long); $\C{// (0,0,2)}$
66	void f$\(_5\)$(int, unsigned long); $\C{// (0,0,2)}$
67	void f$\(_6\)$(int, long); $\C{// (0,0,1)}$
68	\end{cfa}
69
70	Note that safe and unsafe conversions are handled differently; \CFA{} counts distance of safe conversions (\eg{} !int! to !long! is cheaper than !int! to !unsigned long!), while only counting the number of unsafe conversions (\eg{} !int! to !char! and !int! to !short! both have unsafe cost 1, as in !f!$_1$ and !f!$_2$ above).
71	These costs are summed over the parameters in a call; in the example above, the cost of the two !int! to !long! conversions for !f!$_4$ sum equal to the one !int! to !unsigned long! conversion for !f!$_5$.
72
73	\begin{figure}
74	\centering
75	\begin{subfigure}[h]{3in}
76	\includegraphics{figures/bilson-conv-graph}
77	\caption{Bilson} \label{bilson-conv-fig}
78	\end{subfigure}~\begin{subfigure}[h]{3in}
79	\includegraphics{figures/extended-conv-graph}
80	\caption{Extended} \label{extended-conv-fig}
81	\end{subfigure}
82	% \includegraphics{figures/safe-conv-graph}
83	\caption[Safe conversion graphs.]{Safe conversion graphs. In both graphs, plain arcs have cost $safe = 1, sign = 0$ while dashed sign-conversion arcs have cost $safe = 1, sign = 1$. As per \cite[\S{}6.3.1.8]{C11}, types promote to types of the same signedness with greater rank, from \lstinline{signed} to \lstinline{unsigned} with the same rank, and from \lstinline{unsigned} to \lstinline{signed} with greater size. The arc from \lstinline{unsigned long} to \lstinline{long long} (highlighted in red in \ref{bilson-conv-fig}) is deliberately omitted in \ref{extended-conv-fig}, as on the presented system \lstinline{sizeof(long) == sizeof(long long)}.}
84	\label{safe-conv-fig}
85	\end{figure}
86
87	As part of adding reference types to \CFA{} (see Section~\ref{type-features-sec}), Schluntz added a new $reference$ element to the cost tuple, which counts the number of implicit reference-to-rvalue conversions performed so that candidate interpretations can be distinguished by how closely they match the nesting of reference types; since references are meant to act almost indistinguishably from lvalues, this $reference$ element is the least significant in the lexicographic comparison of cost tuples.
88
89	I also refined the \CFA{} cost model as part of this thesis work.
90	Bilson's \CFA{} cost model includes the cost of polymorphic type bindings from a function's type assertions in the $poly$ element of the cost tuple; this has the effect of making more-constrained functions more expensive than less-constrained functions, as in the following example, based on differing iterator types:
91
92	\begin{cfa}
93	forall(dtype T \| { T& ++?(T&); }) T& advance$\(_1\)$(T& i, int n);
94	forall(dtype T \| { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$\(_2\)$(T& i, int n);
95	\end{cfa}
96
97	In resolving a call to !advance!, the binding to the !T&! parameter in the assertions is added to the $poly$ cost in Bilson's model.
98	However, type assertions actually make a function \emph{less} polymorphic, and as such functions with more type assertions should be preferred in type resolution.
99	In the example above, if the meaning of !advance! is ``increment !i! !n! times'', !advance!$_1$ requires an !n!-iteration loop, while !advance!$_2$ can be implemented more efficiently with the !?+=?! operator; as such, !advance!$_2$ should be chosen over !advance!$_1$ whenever its added constraint can be satisfied.
100	Accordingly, a $specialization$ element is now included in the \CFA{} cost tuple, the values of which are always negative.
101	Each type assertion subtracts 1 from $specialization$, so that more-constrained functions cost less, and thus are chosen over less-constrained functions, all else being equal.
102	A more sophisticated design would define a partial order over sets of type assertions by set inclusion (\ie{} one function would only cost less than another if it had a strict superset of assertions, rather than just more total assertions), but I did not judge the added complexity of computing and testing this order to be worth the gain in specificity.
103
104	I also incorporated an unimplemented aspect of Ditchfield's earlier cost model.
105	In the example below, adapted from \cite[p.89]{Ditchfield92}, Bilson's cost model only distinguished between the first two cases by accounting extra cost for the extra set of !otype! parameters, which, as discussed above, is not a desirable solution:
106
107	\begin{cfa}
108	forall(otype T, otype U) void f$\(_1\)$(T, U); $\C[3.25in]{// polymorphic}$
109	forall(otype T) void f$\(_2\)$(T, T); $\C[3.25in]{// less polymorphic}$
110	forall(otype T) void f$\(_3\)$(T, int); $\C[3.25in]{// even less polymorphic}$
111	forall(otype T) void f$\(_4\)$(T*, int); $\C[3.25in]{// least polymorphic}$
112	\end{cfa}
113
114	The new cost model accounts for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
115	In the example above, !f!$_1$ has $var = 2$, while the others have $var = 1$.
116
117	The new cost model also accounts for a nuance unhandled by Ditchfield or Bilson, in that it makes the more specific !f!$_4$ cheaper than the more generic !f!$_3$; !f!$_4$ is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications.
118	In the modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element, which is shared with the count of assertions due to their common nature as constraints on polymorphic type bindings.
119	Thus, all else equal, if both a binding to !T! and a binding to !T! are available, the model chooses the more specific !T! binding with $specialization = -1$.
120	This process is recursive, such that !T**! has $specialization = -2$.
121	This calculation works similarly for generic types, \eg{} !box(T)! also has specialization cost $-1$.
122	For multi-argument generic types, the least-specialized polymorphic parameter sets the specialization cost, \eg{} the specialization cost of !pair(T, S*)! is $-1$ (from !T!) rather than $-2$ (from !S!).
123	Specialization cost is not counted on the return type list; since $specialization$ is a property of the function declaration, a lower specialization cost prioritizes one declaration over another.
124	User programmers can choose between functions with varying parameter lists by adjusting the arguments, but the same is not true in general of varying return types\footnote{In particular, as described in Section~\ref{expr-cost-sec}, cast expressions take the cheapest valid and convertable interpretation of the argument expression, and expressions are resolved as a cast to \lstinline{void}. As a result of this, including return types in the $specialization$ cost means that a function with return type \lstinline{T*} for some polymorphic type \lstinline{T} would \emph{always} be chosen over a function with the same parameter types returning \lstinline{void}, even for \lstinline{void} contexts, an unacceptably counter-intuitive result.}, so the return types are omitted from the $specialization$ element.
125	Since both $vars$ and $specialization$ are properties of the declaration rather than any particular interpretation, they are prioritized less than the interpretation-specific conversion costs from Bilson's original 3-tuple.
126
127	A final refinement I have made to the \CFA{} cost model is with regard to choosing among arithmetic conversions.
128	The C standard \cite[\S{}6.3.1.8]{C11} states that the common type of !int! and !unsigned int! is !unsigned int! and that the common type of !int! and !long! is !long!, but does not provide guidance for making a choice among conversions.
129	Bilson's \CFACC{} uses conversion costs based off Figure~\ref{bilson-conv-fig}.
130	However, Bilson's design results in inconsistent and somewhat surprising costs, with conversion to the next-larger same-sign type generally (but not always) double the cost of conversion to the !unsigned! type of the same size.
131	In the redesign, for consistency with the approach of the usual arithmetic conversions, which select a common type primarily based on size, but secondarily on sign, arcs in the new graph are annotated with whether they represent a sign change, and such sign changes are summed in a new $sign$ cost element that lexicographically succeeds $safe$.
132	This means that sign conversions are approximately the same cost as widening conversions, but slightly more expensive (as opposed to less expensive in Bilson's graph), so maintaining the same signedness is consistently favoured.
133	This refined conversion graph is shown in Figure~\ref{extended-conv-fig}.
134
135	With these modifications, the current \CFA{} cost tuple is as follows:
136
137	\begin{equation*}
138	(unsafe, poly, safe, sign, vars, specialization, reference)
139	\end{equation*}
140
141	\subsection{Expression Cost} \label{expr-cost-sec}
142
143	The mapping from \CFA{} expressions to cost tuples is described by Bilson in \cite{Bilson03}, and remains effectively unchanged with the exception of the refinements to the cost tuple described above.
144	Nonetheless, some salient details are repeated here for the sake of completeness.
145
146	On a theoretical level, the resolver treats most expressions as if they were function calls.
147	Operators in \CFA{} (both those existing in C and added features like constructors) are all modelled as function calls.
148	In terms of the core argument-parameter matching algorithm, overloaded variables are handled the same as zero-argument function calls, aside from a different pool of candidate declarations and setup for different code generation.
149	Similarly, an aggregate member expression !a.m! can be modelled as a unary function !m! that takes one argument of the aggregate type.
150	Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{ \cbstart Struct literals (\eg{} \lstinline\|(S)\{ 1, 2, 3 \}\| for some struct \lstinline{S} \cbend ) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}.
151
152	Since most expressions can be treated as function calls, nested function calls are the primary component of complexity in expression resolution.
153	Each function call has an \emph{identifier} that must match the name of the corresponding declaration, and a possibly-empty list of \emph{arguments}.
154	These arguments may be function call expressions themselves, producing a tree of function-call expressions to resolve, where the leaf expressions are generally nullary functions, variable expressions, or literals.
155	A single instance of expression resolution consists of matching declarations to all the identifiers in the expression tree of a top-level expression, along with inserting any conversions and satisfying all assertions necessary for that matching.
156	The cost of a function-call expression is the sum of the conversion costs of each argument type to the corresponding parameter and the total cost of each subexpression, recursively calculated.
157	\CFA{} expression resolution must produce either the unique lowest-cost interpretation of the top-level expression, or an appropriate error message if none exists.
158	The cost model of \CFA{} precludes a greedy bottom-up resolution pass, as constraints and costs introduced by calls higher in the expression tree can change the interpretation of those lower in the tree, as in the following example:
159
160	\begin{cfa}
161	void f(int);
162	double g$\(_1\)$(int);
163	int g$\(_2\)$(long);
164
165	f( g(42) );
166	\end{cfa}
167
168	Considered independently, !g!$_1$!(42)! is the cheapest interpretation of !g(42)!, with cost $(0,0,0,0,0,0,0)$ since the argument type is an exact match.
169	However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0,0)$ for !f( g!$_1$!(42) )!.
170	If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen.
171	Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C or \CC{}, where each expression can be resolved given only the resolution of its immediate subexpressions.
172
173	If there are no valid interpretations of the top-level expression, expression resolution fails and must produce an appropriate error message.
174	If any subexpression has no valid interpretations, the process can be short-circuited and the error produced at that time.
175	If there are multiple valid interpretations of a top-level expression, ties are broken based on the conversion cost, calculated as above.
176	If there are multiple minimal-cost valid interpretations of a top-level expression, that expression is said to be \emph{ambiguous}, and an error must be produced.
177	Multiple minimal-cost interpretations of a subexpression do not necessarily imply an ambiguous top-level expression, however, as the subexpression interpretations may be disambiguated based on their return type or by selecting a more-expensive interpretation of that subexpression to reduce the overall expression cost, as in the example above.
178
179	The \CFA{} resolver uses type assertions to filter out otherwise-valid subexpression interpretations.
180	An interpretation can only be selected if all the type assertions in the !forall! clause on the corresponding declaration can be satisfied with a unique minimal-cost set of satisfying declarations.
181	Type assertion satisfaction is tested by performing type unification on the type of the assertion and the type of the declaration satisfying the assertion.
182	That is, a declaration that satisfies a type assertion must have the same name and type as the assertion after applying the substitutions in the type environment.
183	Assertion-satisfying declarations may be polymorphic functions with assertions of their own that must be satisfied recursively.
184	This recursive assertion satisfaction has the potential to introduce infinite loops into the type resolution algorithm, a situation which \CFACC{} avoids by imposing a hard limit on the depth of recursive assertion satisfaction (currently 4); this approach is also taken by \CC{} to prevent infinite recursion in template expansion, and has proven to be effective and not unduly restrictive of the expressive power of \CFA{}.
185
186	Cast expressions must be treated somewhat differently than functions for backwards compatibility purposes with C.
187	In C, cast expressions can serve two purposes, \emph{conversion} (\eg{} !(int)3.14!), which semantically converts a value to another value in a different type with a different bit representation, or \emph{coercion} (\eg{} !void* p; (int*)p;!), which assigns a different type to the same bit value.
188	C provides a set of built-in conversions and coercions, and user programmers are able to force a coercion over a conversion if desired by casting pointers.
189	The overloading features in \CFA{} introduce a third cast semantic, \emph{ascription} (\eg{} !int x; double x; (int)x;!), which selects the overload that most-closely matches the cast type.
190	However, since ascription does not exist in C due to the lack of overloadable identifiers, if a cast argument has an unambiguous interpretation as a conversion argument then it must be interpreted as such, even if the ascription interpretation would have a lower overall cost.
191	This is demonstrated in the following example, adapted from the C standard library:
192
193	\begin{cfa}
194	unsigned long long x;
195	(unsigned)(x >> 32);
196	\end{cfa}
197
198	In C semantics, this example is unambiguously upcasting !32! to !unsigned long long!, performing the shift, then downcasting the result to !unsigned!, at total cost $(1,0,3,1,0,0,0)$.
199	If ascription were allowed to be a first-class interpretation of a cast expression, it would be cheaper to select the !unsigned! interpretation of !?>>?! by downcasting !x! to !unsigned! and upcasting !32! to !unsigned!, at a total cost of $(1,0,1,1,0,0,0)$.
200	However, this break from C semantics is not backwards compatibile, so to maintain C compatibility, the \CFA{} resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to !unsigned long long! in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.
201	For example, in !int x; double x; (int)x;!, both declarations have zero-cost interpretations as !x!, but the !int x! interpretation is cheaper to cast to !int!, and is thus selected.
202	Thus, in contrast to the lazy resolution of nested function-call expressions discussed above, where final interpretations for each subexpression are not chosen until the top-level expression is reached, cast expressions introduce eager resolution of their argument subexpressions, as if that argument was itself a top-level expression.
203
204	\section{Resolution Algorithms}
205
206	\CFA{} expression resolution is not, in general, polynomial in the size of the input expression, as shown in Section~\ref{resn-analysis-sec}.
207	While this theoretical result is daunting, its implications can be mitigated in practice.
208	\CFACC{} does not solve one instance of expression resolution in the course of compiling a program, but rather thousands; therefore, if the worst case of expression resolution is sufficiently rare, worst-case instances can be amortized by more-common easy instances for an acceptable overall runtime, as shown in Section~\ref{instance-expr-sec}.
209	Secondly, while a programmer \emph{can} deliberately generate a program designed for inefficient compilation\footnote{See for instance \cite{Haberman16}, which generates arbitrarily large \CC{} template expansions from a fixed-size source file.}, source code tends to follow common patterns.
210	Programmers generally do not want to run the full compiler algorithm in their heads, and as such keep mental shortcuts in the form of language idioms.
211	If the compiler can be tuned to handle idiomatic code more efficiently, then the reduction in runtime for idiomatic (but otherwise difficult) resolution instances can make a significant difference in total compiler runtime.
212
213	\subsection{Worst-case Analysis} \label{resn-analysis-sec}
214
215	Expression resolution has a number of components that contribute to its runtime, including argument-parameter type unification, recursive traversal of the expression tree, and satisfaction of type assertions.
216
217	If the bound type for a type variable can be looked up or mutated in constant time (as asserted in Table~\ref{env-bounds-table}), then the runtime of the unification algorithm to match an argument to a parameter is usually proportional to the complexity of the types being unified.
218	In C, complexity of type representation is bounded by the most-complex type explicitly written in a declaration, effectively a small constant; in \CFA{}, however, polymorphism can generate more-complex types:
219
220	\begin{cfa}
221	forall(otype T) pair(T) wrap(T x, T y);
222
223	wrap(wrap(wrap(1, 2), wrap(3, 4)), wrap(wrap(5, 6), wrap(7, 8)));
224	\end{cfa}
225
226	To resolve the outermost !wrap!, the resolver must check that !pair(pair(int))! unifies with itself, but at three levels of nesting, !pair(pair(int))! is more complex than either !pair(T)! or !T!, the types in the declaration of !wrap!.
227	Accordingly, the cost of a single argument-parameter unification is $O(d)$, where $d$ is the depth of the expression tree, and the cost of argument-parameter unification for a single candidate for a given function call expression is $O(pd)$, where $p$ is the number of parameters.
228	This bound does not, however, account for the higher costs of unifying two polymorphic type variables, which may in the worst case result in a recursive unification of all type variables in the expression (as discussed in Chapter~\ref{env-chap}).
229	Since this recursive unification reduces the number of type variables, it may happen at most once, for an added $O(p^d)$ cost for a top-level expression with $O(p^d)$ type variables.
230
231	Implicit conversions are also checked in argument-parameter matching, but the cost of checking for the existence of an implicit conversion is again proportional to the complexity of the type, $O(d)$.
232	Polymorphism also introduces a potential expense here; for a monomorphic function there is only one potential implicit conversion from argument type to parameter type, while if the parameter type is an unbound polymorphic type-variable then any implicit conversion from the argument type could potentially be considered a valid binding for that type variable.
233	\CFA{}, however, requires exact matches for the bound type of polymorphic parameters, removing this problem.
234	An interesting question for future work is whether loosening this requirement incurs a significant compiler runtime cost in practice; preliminary results from the prototype system described in Chapter~\ref{expr-chap} suggest it does not.
235
236	Considering the recursive traversal of the expression tree, polymorphism again greatly expands the worst-case runtime.
237	Let $i$ be the number of candidate declarations for each function call; if all of these candidates are monomorphic, then there are no more than $i$ unambiguous interpretations of the subexpression rooted at that function call.
238	Ambiguous minimal-cost subexpression interpretations may also be collapsed into a single \emph{ambiguous interpretation}, as the presence of such a subexpression interpretation in the final solution is an error condition.
239	One safe pruning operation during expression resolution is to discard all subexpression interpretations with greater-than-minimal cost for their return type, as such interpretations cannot beat the minimal-cost interpretation with their return type for the overall optimal solution.
240	As such, with no polymorphism, each declaration can generate no more than one minimal-cost interpretation with its return type, so the number of possible subexpression interpretations is $O(i)$ (note that in C, which lacks overloading, $i \leq 1$).
241	With polymorphism, however, a single declaration (like !wrap! above) can have many concrete return types after type variable substitution, and could in principle have a different concrete return type for each combination of argument interpretations.
242	Calculated recursively, the bound on the total number of candidate interpretations is $O(i^{p^d})$, each with a distinct type.
243
244	Given these calculations of number of subexpression interpretations and matching costs, the upper bound on runtime for generating candidates for a single subexpression $d$ levels up from the leaves is $O( i^{p^d} \cdot pd )$.
245	Since there are $O(p^d)$ subexpressions in a single top-level expression, the total worst-case cost of argument-parameter matching with the overloading and polymorphism features of \CFA{} is $O( i^{p^d} \cdot pd \cdot p^d )$.
246	Since the size of the expression is $O(p^d)$, letting $n = p^d$ this simplifies to $O(i^n \cdot n^2)$
247
248	This bound does not yet account for the cost of assertion satisfaction, however.
249	\CFA{} uses type unification on the assertion type and the candidate declaration type to test assertion satisfaction; this unification calculation has cost proportional to the complexity of the declaration type after substitution of bound type variables; as discussed above, this cost is $O(d)$.
250	If there are $O(a)$ type assertions on each declaration, there are $O(i)$ candidates to satisfy each assertion, for a total of $O(ai)$ candidates to check for each declaration.
251	However, each assertion candidate may generate another $O(a)$ assertions, recursively until the assertion recursion limit $r$ is reached, for a total cost of $O((ai)^r \cdot d)$.
252	Now, $a$ and $i$ are properties of the set of declarations in scope, while $r$ is defined by the language spec, so $(ai)^r$ is essentially a constant for purposes of expression resolution, albeit a very large one.
253	It is not uncommon in \CFA{} to have functions with dozens of assertions, and common function names (\eg{} !?{}!, the constructor) can have hundreds of overloads.
254
255	It is clear that assertion satisfaction costs can be very large, and in fact a method for heuristically reducing these costs is one of the key contributions of this thesis, but it should be noted that the worst-case analysis is a particularly poor match for actual code in the case of assertions.
256	It is reasonable to assume that most code compiles without errors, as an actively-developed project is compiled many times, generally with relatively few new errors introduced between compiles.
257	However, the worst-case bound for assertion satisfaction is based on recursive assertion satisfaction calls exceeding the limit, which is an error case.
258	In practice, then, the depth of recursive assertion satisfaction should be bounded by a small constant for error-free code, which accounts for the vast majority of problem instances.
259
260	Similarly, uses of polymorphism like those that generate the $O(d)$ bound on unification or the $O(i^{p^d})$ bound on number of candidates are rare, but not completely absent.
261	This analysis points to type unification, argument-parameter matching, and assertion satisfaction as potentially costly elements of expression resolution, and thus profitable targets for algorithmic investigation.
262	Type unification is discussed in Chapter~\ref{env-chap}, while the other aspects are covered below.
263
264	\subsection{Argument-Parameter Matching} \label{arg-parm-matching-sec}
265
266	Pruning possible interpretations as early as possible is one way to reduce the real-world cost of expression resolution, provided that a sufficient proportion of interpretations are pruned to pay for the cost of the pruning algorithm.
267	One opportunity for interpretation pruning is by the argument-parameter type matching, but the literature \cite{Baker82,Bilson03,Cormack81,Ganzinger80,Pennello80,PW:overload} provides no clear answers on whether candidate functions should be chosen according to their available arguments, or whether argument resolution should be driven by the available function candidates.
268	For programming languages without implicit conversions, argument-parameter matching is essentially the entirety of the expression resolution problem, and is generally referred to as ``overload resolution'' in the literature.
269	All expression-resolution algorithms form a DAG of interpretations, some explicitly, some implicitly; in this DAG, arcs point from function-call interpretations to argument interpretations, as in Figure~\ref{res-dag-fig}
270
271	\begin{figure}[h]
272	\centering
273	\begin{subfigure}[h]{3in}
274	\begin{cfa}
275	char *p$\(_1\)$;
276	int *p$\(_2\)$;
277
278	char f$\(_1\)$(char, int*);
279	double f$\(_2\)$(int, int*);
280
281	f$\(_A\)$( f$\(_B\)$( p$\(_A\)$, p$\(_B\)$ ), p$\(_C\)$ );
282	\end{cfa}
283	\end{subfigure}~\begin{subfigure}[h]{2.5in}
284	\includegraphics{figures/resolution-dag}
285	\end{subfigure}
286	\caption[Resolution DAG for a simple expression.]{Resolution DAG for a simple expression, annotated with explanatory subscripts. Functions that do not have a valid argument matching are covered with an \textsf{X}.} \label{res-dag-fig}
287	\end{figure}
288
289	Note that some interpretations may be part of more than one super-interpretation, as with the !p!$_2$ interpretation of !p!$_B$, while some valid subexpression interpretations, like the !f!$_2$ interpretation of !f!$_B$, are not used in any interpretation of their superexpression.
290
291	Overload resolution was first seriously considered in the development of compilers for the Ada programming language, with different algorithms making various numbers of passes over the expression DAG, these passes being either top-down or bottom-up.
292	Baker's algorithm~\cite{Baker82} takes a single pass from the leaves of the expression tree up, pre-computing argument candidates at each step.
293	For each candidate function, Baker attempts to match argument types to parameter types in sequence, failing if any parameter cannot be matched.
294
295	Bilson~\cite{Bilson03} similarly pre-computes argument-candidates in a single bottom-up pass in the original \CFACC{}, but then explicitly enumerates all possible argument combinations for a multi-parameter function.
296	These argument combinations are matched to the parameter types of the candidate function as a unit rather than individual arguments.
297	Bilson's approach is less efficient than Baker's, as the same argument may be compared to the same parameter many times, but does allow a more straightforward handling of polymorphic type-binding and tuple-typed expressions.
298
299	Unlike Baker and Bilson, Cormack's algorithm~\cite{Cormack81} requests argument candidates that match the type of each parameter of each candidate function, in a single pass from the top-level expression down; memoization of these requests is presented as an optimization.
300	As presented, this algorithm requires the parameter to have a known type, which is a poor fit for polymorphic type parameters in \CFA{}.
301	Cormack's algorithm can be modified to request argument interpretations of \emph{any} type when provided an unbound parameter type, but this eliminates any pruning gains that could be provided by the algorithm.
302
303	Ganzinger and Ripken~\cite{Ganzinger80} propose an approach (later refined by Pennello~\etal{}~\cite{Pennello80}) that uses a top-down filtering pass followed by a bottom-up filtering pass to reduce the number of candidate interpretations; they prove that a small number of such iterations is sufficient to converge to a solution for the overload resolution problem in the Ada programming language.
304	Persch~\etal{}~\cite{PW:overload} developed a similar two-pass approach where the bottom-up pass is followed by the top-down pass.
305	These approaches differ from Baker, Bilson, or Cormack in that they take multiple passes over the expression tree to yield a solution by applying filtering heuristics to all expression nodes.
306	This approach of filtering out invalid types is unsuited to \CFA{} expression resolution, however, due to the presence of polymorphic functions and implicit conversions.
307
308	\cbstart
309	Some other language designs solve the matching problem by forcing a bottom-up order.
310	\CC{}, for instance, defines its overload-selection algorithm in terms of a partial order between function overloads given a fixed list of argument candidates, implying that the arguments must be selected before the function.
311	This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call which is polymorphic in its return type, and these annotations are often redundant.
312	\CFA{}, by contrast, saves programmers from redundant annotations with its richer inference:
313
314	\begin{cfa}
315	forall(dtype T \| sized(T)) T* malloc();
316
317	int* p = malloc(); $\C{// Infers T = int from left-hand side}$
318	\end{cfa}
319	\cbend
320
321	Baker~\cite{Baker82} left empirical comparison of different overload resolution algorithms to future work; Bilson~\cite{Bilson03} described an extension of Baker's algorithm to handle implicit conversions and polymorphism, but did not further explore the space of algorithmic approaches to handle both overloaded names and implicit conversions.
322	This thesis closes that gap in the literature by performing performance comparisons of both top-down and bottom-up expression resolution algorithms, with results reported in Chapter~\ref{expr-chap}.
323
324	\subsection{Assertion Satisfaction} \label{assn-sat-sec}
325
326	The assertion satisfaction algorithm designed by Bilson~\cite{Bilson03} for the original \CFACC{} is the most-relevant prior work to this project.
327	Before accepting any subexpression candidate, Bilson first checks that that candidate's assertions can all be resolved; this is necessary due to Bilson's addition of assertion satisfaction costs to candidate costs (discussed in Section~\ref{conv-cost-sec}).
328	If this subexpression interpretation ends up not being used in the final resolution, then the (sometimes substantial) work of checking its assertions ends up wasted.
329	Bilson's assertion checking function recurses on two lists, !need! and !newNeed!, the current declaration's assertion set and those implied by the assertion-satisfying declarations, respectively, as detailed in the pseudo-code below (ancillary aspects of the algorithm are omitted for clarity):
330
331	\begin{cfa}
332	List(List(Declaration)) checkAssertions(
333	List(Assertion) need, List(Assertion) newNeed, List(Declaration) have,
334	Environment env ) {
335	if ( is_empty(need) ) {
336	if ( is_empty( newNeed ) ) return { have };
337	else return checkAssertions( newNeed, {}, have, env );
338	}
339
340	Assertion a = head(need);
341	Type adjType = substitute( a.type, env );
342	List(Declaration) candidates = decls_matching( a.name );
343	List(List(Declaration)) alternatives = {}
344	for ( Declaration c : candidates ) {
345	Environment newEnv = env;
346	if ( unify( adjType, c.type, newEnv ) ) {
347	append( alternatives,
348	checkAssertions(
349	tail(need), append(newNeed, c.need), append(have, c), newEnv ) );
350	}
351	}
352	return alternatives;
353	}
354	\end{cfa}
355
356	One shortcoming of this approach is that if an earlier assertion has multiple valid candidates, later assertions may be checked many times due to the structure of the recursion.
357	Satisfying declarations for assertions are not completely independent of each other, since the unification process may produce new type bindings in the environment, and these bindings may not be compatible between independently-checked assertions.
358	Nonetheless, with the environment data-structures discussed in Chapter~\ref{env-chap}, I have found it more efficient to produce a list of possibly-satisfying declarations for each assertion once, then check their respective environments for mutual compatibility when combining lists of assertions together.
359
360	Another improvement I have made to the assertion resolution scheme in \CFACC{} is to consider all assertion-satisfying combinations at one level of recursion before attempting to recursively satisfy any !newNeed! assertions.
361	Monomorphic functions are cheaper than polymorphic functions for assertion satisfaction because they are an exact match for the environment-adjusted assertion type, whereas polymorphic functions require an extra type binding.
362	Thus, if there is any mutually-compatible set of assertion-satisfying declarations that does not include any polymorphic functions (and associated recursive assertions), then the optimal set of assertions does not require any recursive !newNeed! satisfaction.
363	More generally, due to the \CFA{} cost-model changes I introduced in Section~\ref{conv-cost-sec}, the conversion cost of an assertion-satisfying declaration is no longer dependent on the conversion cost of its own assertions.
364	As such, all sets of mutually-compatible assertion-satisfying declarations can be sorted by their summed conversion costs, and the recursive !newNeed! satisfaction pass is required only to check the feasibility of the minimal-cost sets.
365	This optimization significantly reduces wasted work relative to Bilson's approach, as well as avoiding generation of deeply-recursive assertion sets, for a significant performance improvement relative to Bilson's \CFACC{}.
366
367	Making the conversion cost of an interpretation independent of the cost of satisfying its assertions has further benefits.
368	Bilson's algorithm checks assertions for all subexpression interpretations immediately, including those that are not ultimately used; I have developed a \emph{deferred} variant of assertion checking that waits until a top-level interpretation has been generated to check any assertions.
369	If the assertions of the minimal-cost top-level interpretation cannot be satisfied then the next-most-minimal-cost interpretation's assertions are checked, and so forth until a minimal-cost satisfiable interpretation (or ambiguous set thereof) is found, or no top-level interpretations are found to have satisfiable assertions.
370	In the common case where the code actually does compile, this saves the work of checking assertions for ultimately-rejected interpretations, though it does rule out some pruning opportunities for subinterpretations with unsatisfiable assertions or which are more expensive than a minimal-cost polymorphic function with the same return type.
371	The experimental results in Chapter~\ref{expr-chap} indicate that this is a worthwhile trade-off.
372
373	Optimizing assertion satisfaction for common idioms has also proved effective in \CFA{}; the code below is an unexceptional print statement derived from the \CFA{} test suite that nonetheless is a very difficult instance of expression resolution:
374
375	\begin{cfa}
376	sout \| "one" \| 1 \| "two" \| 2 \| "three" \| 3 \| "four" \| 4 \| "five" \| 5 \| "six" \| 6
377	\| "seven" \| 7 \| "eight" \| 8 \| "nine" \| 9 \| "ten" \| 10 \| "end" \| nl \| nl;
378	\end{cfa}
379
380	The first thing that makes this expression so difficult is that it is 23 levels deep; Section~\ref{resn-analysis-sec} indicates that the worst-case bounds on expression resolution are exponential in expression depth.
381	Secondly, the !?\|?! operator is significantly overloaded in \CFA{} --- there are 74 such operators in the \CFA{} standard library, and while 9 are arithmetic operators inherited from C, the rest are polymorphic I/O operators that look similar to:
382
383	\begin{cfa}
384	forall( dtype ostype \| ostream( ostype ) )
385	ostype& ?\|? ( ostype&, int );
386	\end{cfa}
387
388	Note that !ostream! is a trait with 25 type assertions, and that the output operators for the other arithmetic types are also valid for the !int!-type parameters due to implicit conversions.
389	On this instance, deferred assertion satisfaction saves wasted work checking assertions on the wrong output operators, but does nothing about the 23 repeated checks of the 25 assertions to determine that !ofstream! (the type of !sout!) satisfies !ostream!.
390
391	To solve this problem, I have developed a \emph{cached} variant of assertion checking.
392	During the course of checking the assertions of a single top-level expression, the results are cached for each assertion checked.
393	The search key for this cache is the assertion declaration with its type variables substituted according to the type environment to distinguish satisfaction of the same assertion for different types.
394	This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
395
396	\cbstart
397	One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgements between top-level expressions.
398	This approach would be difficult to correctly implement in a \CFA{} compiler, due to the lack of a closed set of operations for a given type.
399	New declarations related to a particular type can be introduced in any lexical scope in \CFA{}, and these added declarations may cause an assertion that was previously satisfiable to fail due to an introduced ambiguity.
400	Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgement depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types.
401	Given these concerns, correctly invalidating a cross-expression assertion satisfaction cache for \CFA{} is a non-trivial problem, and the overhead of such an approach may possibly outweigh any benefits from such caching.
402	\cbend
403
404	The assertion satisfaction aspect of \CFA{} expression resolution bears some similarity to satisfiability problems from logic, and as such other languages with similar trait and assertion mechanisms make use of logic-program solvers in their compilers.
405	For instance, Matsakis~\cite{Matsakis17} and the Rust team have developed a PROLOG-based engine to check satisfaction of Rust traits.
406	The combination of the assertion satisfaction elements of the problem with the conversion-cost model of \CFA{} makes this logic-solver approach difficult to apply in \CFACC{}, however.
407	Expressing assertion resolution as a satisfiability problem ignores the cost optimization aspect, which is necessary to decide among what are often many possible satisfying assignments of declarations to assertions.
408	On the other hand, the deeply-recursive nature of the satisfiability problem makes it difficult to adapt to optimizing solver approaches such as linear programming.
409	To maintain a well-defined programming language, any optimization algorithm used must provide an exact (rather than approximate) solution; this constraint also rules out a whole class of approximately-optimal generalized solvers.
410	As such, I opted to continue Bilson's approach of designing a bespoke solver for \CFA{} assertion satisfaction, rather than attempting to re-express the problem in some more general formalism.
411
412	\section{Conclusion \& Future Work} \label{resn-conclusion-sec}
413
414	As the results in Chapter~\ref{expr-chap} show, the algorithmic approaches I have developed for \CFA{} expression resolution are sufficient to build a practically-performant \CFA{} compiler.
415	This work may also be of use to other compiler construction projects, notably to members of the \CC{} community as they implement the new Concepts \cite{C++Concepts} standard, which includes type assertions similar to those used in \CFA{}, as well as the C-derived implicit conversion system already present in \CC{}.
416
417	I have experimented with using expression resolution rather than type unification to check assertion satisfaction; this variant of the expression resolution problem should be investigated further in future work.
418	This approach is more flexible than type unification, allowing for conversions to be applied to functions to satisfy assertions.
419	Anecdotally, this flexibility matches user-programmer expectations better, as small type differences (\eg{} the presence or absence of a reference type, or the usual conversion from !int! to !long!) no longer break assertion satisfaction.
420	Practically, the resolver prototype discussed in Chapter~\ref{expr-chap} uses this model of assertion satisfaction, with no apparent deficit in performance; the generated expressions that are resolved to satisfy the assertions are easier than the general case because they never have nested subexpressions, which eliminates much of the theoretical differences between unification and resolution.
421	The main challenge to implement this approach in \CFACC{} is applying the implicit conversions generated by the resolution process in the code-generation for the thunk functions that \CFACC{} uses to pass type assertions to their requesting functions with the proper signatures.
422
423	One \CFA{} feature that could be added to improve the ergonomics of overload selection is an \emph{ascription cast}; as discussed in Section~\ref{expr-cost-sec}, the semantics of the C cast operator are to choose the cheapest argument interpretation which is convertable to the target type, using the conversion cost as a tie-breaker.
424	An ascription cast would reverse these priorities, choosing the argument interpretation with the cheapest conversion to the target type, only using interpretation cost to break ties\footnote{A possible stricter semantics would be to select the cheapest interpretation with a zero-cost conversion to the target type, reporting a compiler error otherwise.}.
425	This would allow ascription casts to the desired return type to be used for overload selection:
426
427	\begin{cfa}
428	int f$\(_1\)$(int);
429	int f$\(_2\)$(double);
430	int g$\(_1\)$(int);
431	double g$\(_2\)$(long);
432
433	f((double)42); $\C[4.5in]{// select f\(_2\) by argument cast}$
434	(as double)g(42); $\C[4.5in]{// select g\(_2\) by return ascription cast}$
435	(double)g(42); $\C[4.5in]{// select g\(_1\) NOT g\(_2\) because of parameter conversion cost}$
436	\end{cfa}
437
438	Though performance of the existing resolution algorithms is promising, some further optimizations do present themselves.
439	The refined cost model discussed in Section~\ref{conv-cost-sec} is more expressive, but requires more than twice as many fields; it may be fruitful to investigate more tightly-packed in-memory representations of the cost-tuple, as well as comparison operations that require fewer instructions than a full lexicographic comparison.
440	Integer or vector operations on a more-packed representation may prove effective, though dealing with the negative-valued $specialization$ field may require some effort.
441
442	Parallelization of various phases of expression resolution may also be useful.
443	The algorithmic variants I have introduced for both argument-parameter matching and assertion satisfaction are essentially divide-and-conquer algorithms, which solve subproblem instances for each argument or assertion, respectively, then check mutual compatibility of the solutions.
444	While the checks for mutual compatibility are naturally more serial, there may be some benefit to parallel resolution of the subproblem instances.
445
446	The resolver prototype built for this project and described in Chapter~\ref{expr-chap} would be a suitable vehicle for many of these further experiments, and thus a technical contribution of continuing utility.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: