# Changeset c1f3d1a8 for doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

Ignore:
Timestamp:
Apr 20, 2019, 6:20:13 PM (3 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr
Children:
2834e99
Parents:
e71272a
Message:

thesis: more motivation for expression resolution problem, per Gregor

File:
1 edited

### Legend:

Unmodified
 re71272a \label{resolution-chap} % consider using "satisfaction" throughout when talking about assertions % "valid" instead of "feasible" interpretations The main task of the \CFACC{} type-checker is \emph{expression resolution}: determining which declarations the identifiers in each expression correspond to. Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier. Resolution is a straightforward task in C, as no simultaneously-visible declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier. A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations. To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations. \section{Expression Resolution} The expression resolution pass in \CFACC{} must traverse the input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates. Once the set of valid interpretations for the top-level expression has been found, the expression resolver selects the unique minimal-cost candidate or reports an error. The expression resolution problem in \CFA{} is more difficult than the analogous problems in C or \CC{}. As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution. The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions. One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters which only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function). Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost. Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}. Additionally, until the introduction of concepts in \CCtwenty{} \cite{C++Concepts}, \CC{} expression resolution has no analogue to \CFA{} assertion satisfaction checking, a further  complication for a \CFA{} compiler. The precise definition of \CFA{} expression resolution in this section further expands on the challenges of this problem. \subsection{Type Unification} \subsection{Conversion Cost} \label{conv-cost-sec} C does not have an explicit cost model for implicit conversions, but the usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly. \CFA{}, like C, allows inexact matches between the type of function parameters and function call arguments. Both languages insert \emph{implicit conversions} in these situations to produce an exact type match, and \CFA{} also uses the relative \emph{cost} of different conversions to select among overloaded function candidates. C does not have an explicit cost model for implicit conversions, but the usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to apply define one implicitly. The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators. Since for backward-compatibility purposes the conversion costs of \CFA{} must produce an equivalent result to these common type rules, it is appropriate to summarize \cite[\S{}6.3.1.8]{C11} here: Degree of safe conversion is calculated as path weight in a directed graph of safe conversions between types; Bilson's version and the current version of this graph are in Figures~\ref{bilson-conv-fig} and~\ref{extended-conv-fig}, respectively. The safe conversion graph is designed such that the common type $c$ of two types $u$ and $v$ is compatible with the C standard definitions from \cite[\S{}6.3.1.8]{C11} and can be calculated as the unique type minimizing the sum of the path weights of $\overrightarrow{uc}$ and $\overrightarrow{vc}$. The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters: The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters, where the interpretation with the minimum total cost will be selected: \begin{cfa} \begin{cfa} forall(dtype T | { T& ++?(T&); }) T& advance$$$1$$$(T& i, int n); forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$$$2$$$(T& i, int n); forall(dtype T | { T& ++?(T&); }) T& advance$$$_1$$$(T& i, int n); forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$$$_2$$$(T& i, int n); \end{cfa} The new cost model accounts for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function. In the example above, !f!$_1$ has $var = 2$, while the others have $var = 1$. The new cost model also accounts for a nuance unhandled by Ditchfield or Bilson, in that it makes the more specific !f!$_4$ cheaper than the more generic !f!$_3$; !f!$_4$ is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications. In the modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element, which is shared with the count of assertions due to their common nature as constraints on polymorphic type bindings. Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, the model chooses the more specific !T*! binding with $specialization = -1$. However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0,0)$ for !f( g!$_1$!(42) )!. If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen. Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C, where each expression can be resolved given only the resolution of its immediate subexpressions. Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C or \CC{}, where each expression can be resolved given only the resolution of its immediate subexpressions. If there are no valid interpretations of the top-level expression, expression resolution fails and must produce an appropriate error message.