Ignore:
Timestamp:
Apr 25, 2019, 2:23:48 PM (3 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr
Children:
98b4b12
Parents:
69c37cc
Message:

thesis: apply round 2 revisions and strip change bars

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    r69c37cc rf845e80  
    66A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
    77To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
    8 \cbstart
    98This conversion cost is summed over all subexpression interpretations in the interpretation of a top-level expression.
    10 \cbend
    119Hence, the expression resolution problem is to find the unique minimal-cost interpretation for an expression, reporting an error if no such unique interpretation exists.
    1210
    1311\section{Expression Resolution}
    1412
    15 \cbstart
    16 The expression resolution pass in \CFACC{} must traverse the input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates.
    17 Once the set of valid interpretations for the top-level expression has been found, the expression resolver selects the unique minimal-cost candidate or reports an error.
     13The expression resolution pass in \CFACC{} must traverse an input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates.
     14Once the set of valid interpretations for the top-level expression is found, the expression resolver selects the unique minimal-cost candidate or reports an error.
    1815
    1916The expression resolution problem in \CFA{} is more difficult than the analogous problems in C or \CC{}.
    20 As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
     17As mentioned above, the lack of name overloading in C (except for built-in operators) makes its resolution problem substantially easier.
     18A comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
    2119The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions.
    22 One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters that only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function).
     20One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template parameters that only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context(and in fact does not allow explicit specification of type parameters to a function), as in the following example:
     21
     22\begin{cfa}
     23forall(dtype T) T& deref(T*); $\C{// dereferences pointer}$
     24forall(otype T) T* def(); $\C{// new heap-allocated default-initialized value}$
     25
     26int& i = deref( def() );
     27\end{cfa}
     28
     29In this example, the \CFA{} compiler infers the type arguments of !deref! and !def! from the !int&! type of !i!; \CC{}, by contrast, requires a type parameter on !def!\footnote{The type parameter of \lstinline{deref} can be inferred from its argument.}, \ie{} !deref( def<int>() )!.
    2330Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost.
    2431Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}.
    2532Additionally, until the introduction of concepts in \CCtwenty{} \cite{C++Concepts}, \CC{} expression resolution has no analogue to \CFA{} assertion satisfaction checking, a further  complication for a \CFA{} compiler.
    2633The precise definition of \CFA{} expression resolution in this section further expands on the challenges of this problem.
    27 \cbend
    2834
    2935\subsection{Type Unification}
     
    3743\subsection{Conversion Cost} \label{conv-cost-sec}
    3844
    39 \cbstart
    4045\CFA{}, like C, allows inexact matches between the type of function parameters and function call arguments.
    4146Both languages insert \emph{implicit conversions} in these situations to produce an exact type match, and \CFA{} also uses the relative \emph{cost} of different conversions to select among overloaded function candidates.
    42 \cbend
    4347C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to apply define one implicitly.
    4448The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
     
    148152In terms of the core argument-parameter matching algorithm, overloaded variables are handled the same as zero-argument function calls, aside from a different pool of candidate declarations and setup for different code generation.
    149153Similarly, an aggregate member expression !a.m! can be modelled as a unary function !m! that takes one argument of the aggregate type.
    150 Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{ \cbstart Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S} \cbend ) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}.
     154Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types: !42! is !int!, !"hello"! is !char*!, \etc{}\footnote{Struct literals (\eg{} \lstinline|(S)\{ 1, 2, 3 \}| for some struct \lstinline{S}) are a somewhat special case, as they are known to be of type \lstinline{S}, but require resolution of the implied constructor call described in Section~\ref{ctor-sec}.}.
    151155
    152156Since most expressions can be treated as function calls, nested function calls are the primary component of complexity in expression resolution.
     
    306310This approach of filtering out invalid types is unsuited to \CFA{} expression resolution, however, due to the presence of polymorphic functions and implicit conversions.
    307311
    308 \cbstart
    309312Some other language designs solve the matching problem by forcing a bottom-up order.
    310313\CC{}, for instance, defines its overload-selection algorithm in terms of a partial order between function overloads given a fixed list of argument candidates, implying that the arguments must be selected before the function.
    311 This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call that is polymorphic in its return type, and these annotations are often redundant.
    312 \CFA{}, by contrast, saves programmers from redundant annotations with its richer inference:
     314This design choice improves worst-case expression resolution time by only propagating a single candidate for each subexpression, but type annotations must be provided for any function call that is polymorphic in its return type, and these annotations are often redundant:
     315
     316\begin{C++}
     317template<typename T> T* malloc() { /* ... */ }
     318
     319int* p = malloc<int>(); $\C{// T = int must be explicitly supplied}$
     320\end{C++}
     321
     322\CFA{} saves programmers from redundant annotations with its richer inference:
    313323
    314324\begin{cfa}
     
    317327int* p = malloc(); $\C{// Infers T = int from left-hand side}$
    318328\end{cfa}
    319 \cbend
    320329
    321330Baker~\cite{Baker82} left empirical comparison of different overload resolution algorithms to future work; Bilson~\cite{Bilson03} described an extension of Baker's algorithm to handle implicit conversions and polymorphism, but did not further explore the space of algorithmic approaches to handle both overloaded names and implicit conversions.
     
    394403This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
    395404
    396 \cbstart
    397 One superficially-promising optimization which I did not pursue is caching assertion-satisfaction judgments between top-level expressions.
     405One superficially-promising optimization, which I did not pursue, is caching assertion-satisfaction judgments among top-level expressions.
    398406This approach would be difficult to correctly implement in a \CFA{} compiler, due to the lack of a closed set of operations for a given type.
    399407New declarations related to a particular type can be introduced in any lexical scope in \CFA{}, and these added declarations may cause an assertion that was previously satisfiable to fail due to an introduced ambiguity.
    400408Furthermore, given the recursive nature of assertion satisfaction and the possibility of this satisfaction judgment depending on an inferred type, an added declaration may break satisfaction of an assertion with a different name and that operates on different types.
    401409Given these concerns, correctly invalidating a cross-expression assertion satisfaction cache for \CFA{} is a non-trivial problem, and the overhead of such an approach may possibly outweigh any benefits from such caching.
    402 \cbend
    403410
    404411The assertion satisfaction aspect of \CFA{} expression resolution bears some similarity to satisfiability problems from logic, and as such other languages with similar trait and assertion mechanisms make use of logic-program solvers in their compilers.
     
    406413The combination of the assertion satisfaction elements of the problem with the conversion-cost model of \CFA{} makes this logic-solver approach difficult to apply in \CFACC{}, however.
    407414Expressing assertion resolution as a satisfiability problem ignores the cost optimization aspect, which is necessary to decide among what are often many possible satisfying assignments of declarations to assertions.
    408 \cbstartx
    409415(MaxSAT solvers \cite{Morgado13}, which allow weights on solutions to satisfiability problems, may be a productive avenue for future investigation.)
    410 \cbendx
    411416On the other hand, the deeply-recursive nature of the satisfiability problem makes it difficult to adapt to optimizing solver approaches such as linear programming.
    412417To maintain a well-defined programming language, any optimization algorithm used must provide an exact (rather than approximate) solution; this constraint also rules out a whole class of approximately-optimal generalized solvers.
Note: See TracChangeset for help on using the changeset viewer.