# Changeset f845e80 for doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

Ignore:
Timestamp:
Apr 25, 2019, 2:23:48 PM (3 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr
Children:
98b4b12
Parents:
69c37cc
Message:

thesis: apply round 2 revisions and strip change bars

File:
1 edited

### Legend:

Unmodified
 r69c37cc A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations. To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations. \cbstart This conversion cost is summed over all subexpression interpretations in the interpretation of a top-level expression. \cbend Hence, the expression resolution problem is to find the unique minimal-cost interpretation for an expression, reporting an error if no such unique interpretation exists. \section{Expression Resolution} \cbstart The expression resolution pass in \CFACC{} must traverse the input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates. Once the set of valid interpretations for the top-level expression has been found, the expression resolver selects the unique minimal-cost candidate or reports an error. The expression resolution pass in \CFACC{} must traverse an input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates. Once the set of valid interpretations for the top-level expression is found, the expression resolver selects the unique minimal-cost candidate or reports an error. The expression resolution problem in \CFA{} is more difficult than the analogous problems in C or \CC{}. As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution. As mentioned above, the lack of name overloading in C (except for built-in operators) makes its resolution problem substantially easier. A comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution. The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions. One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters that only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function). One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template parameters that only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context(and in fact does not allow explicit specification of type parameters to a function), as in the following example: \begin{cfa} forall(dtype T) T& deref(T*); $\C{// dereferences pointer}$ forall(otype T) T* def(); $\C{// new heap-allocated default-initialized value}$ int& i = deref( def() ); \end{cfa} In this example, the \CFA{} compiler infers the type arguments of !deref! and !def! from the !int&! type of !i!; \CC{}, by contrast, requires a type parameter on !def!\footnote{The type parameter of \lstinline{deref} can be inferred from its argument.}, \ie{} !deref( def() )!. Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost. Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}. Additionally, until the introduction of concepts in \CCtwenty{} \cite{C++Concepts}, \CC{} expression resolution has no analogue to \CFA{} assertion satisfaction checking, a further  complication for a \CFA{} compiler. The precise definition of \CFA{} expression resolution in this section further expands on the challenges of this problem. \cbend \subsection{Type Unification} \subsection{Conversion Cost} \label{conv-cost-sec} \cbstart \CFA{}, like C, allows inexact matches between the type of function parameters and function call arguments. Both languages insert \emph{implicit conversions} in these situations to produce an exact type match, and \CFA{} also uses the relative \emph{cost} of different conversions to select among overloaded function candidates. \cbend C does not have an explicit cost model for implicit conversions, but the usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to apply define one implicitly. The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators. In terms of the core argument-parameter matching algorithm, overloaded variables are handled the same as zero-argument function calls, aside from a different pool of candidate declarations and setup for different code generation. Similarly, an aggregate member expression !a.m! can be modelled as a unary function !m! that takes one argument of the aggregate type. Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{ \cbstart Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S} \cbend ) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}. Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S}) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}. Since most expressions can be treated as function calls, nested function calls are the primary component of complexity in expression resolution. This approach of filtering out invalid types is unsuited to \CFA{} expression resolution, however, due to the presence of polymorphic functions and implicit conversions. \cbstart Some other language designs solve the matching problem by forcing a bottom-up order. \CC{}, for instance, defines its overload-selection algorithm in terms of a partial order between function overloads given a fixed list of argument candidates, implying that the arguments must be selected before the function. This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call that is polymorphic in its return type, and these annotations are often redundant. \CFA{}, by contrast, saves programmers from redundant annotations with its richer inference: This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call that is polymorphic in its return type, and these annotations are often redundant: \begin{C++} template T* malloc() { /* ... */ } int* p = malloc(); $\C{// T = int must be explicitly supplied}$ \end{C++} \CFA{} saves programmers from redundant annotations with its richer inference: \begin{cfa} int* p = malloc(); $\C{// Infers T = int from left-hand side}$ \end{cfa} \cbend Baker~\cite{Baker82} left empirical comparison of different overload resolution algorithms to future work; Bilson~\cite{Bilson03} described an extension of Baker's algorithm to handle implicit conversions and polymorphism, but did not further explore the space of algorithmic approaches to handle both overloaded names and implicit conversions. This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key. \cbstart One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgments between top-level expressions. One superficially-promising optimization, which I did not pursue, is caching assertion-satisfaction judgments among top-level expressions. This approach would be difficult to correctly implement in a \CFA{} compiler, due to the lack of a closed set of operations for a given type. New declarations related to a particular type can be introduced in any lexical scope in \CFA{}, and these added declarations may cause an assertion that was previously satisfiable to fail due to an introduced ambiguity. Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgment depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types. Given these concerns, correctly invalidating a cross-expression assertion satisfaction cache for \CFA{} is a non-trivial problem, and the overhead of such an approach may possibly outweigh any benefits from such caching. \cbend The assertion satisfaction aspect of \CFA{} expression resolution bears some similarity to satisfiability problems from logic, and as such other languages with similar trait and assertion mechanisms make use of logic-program solvers in their compilers. The combination of the assertion satisfaction elements of the problem with the conversion-cost model of \CFA{} makes this logic-solver approach difficult to apply in \CFACC{}, however. Expressing assertion resolution as a satisfiability problem ignores the cost optimization aspect, which is necessary to decide among what are often many possible satisfying assignments of declarations to assertions. \cbstartx (MaxSAT solvers \cite{Morgado13}, which allow weights on solutions to satisfiability problems, may be a productive avenue for future investigation.) \cbendx On the other hand, the deeply-recursive nature of the satisfiability problem makes it difficult to adapt to optimizing solver approaches such as linear programming. To maintain a well-defined programming language, any optimization algorithm used must provide an exact (rather than approximate) solution; this constraint also rules out a whole class of approximately-optimal generalized solvers.