Ignore:
Timestamp:
Apr 20, 2019, 6:20:13 PM (3 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr
Children:
2834e99
Parents:
e71272a
Message:

thesis: more motivation for expression resolution problem, per Gregor

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    re71272a rc1f3d1a8  
    22\label{resolution-chap}
    33
    4 % consider using "satisfaction" throughout when talking about assertions
    5 % "valid" instead of "feasible" interpretations
    6 
    74The main task of the \CFACC{} type-checker is \emph{expression resolution}: determining which declarations the identifiers in each expression correspond to.
    8 Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
     5Resolution is a straightforward task in C, as no simultaneously-visible declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
    96A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
    107To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
     
    129
    1310\section{Expression Resolution}
     11
     12The expression resolution pass in \CFACC{} must traverse the input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates.
     13Once the set of valid interpretations for the top-level expression has been found, the expression resolver selects the unique minimal-cost candidate or reports an error.
     14
     15The expression resolution problem in \CFA{} is more difficult than the analogous problems in C or \CC{}.
     16As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
     17The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions.
     18One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters which only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function).
     19Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost.
     20Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}.
     21Additionally, until the introduction of concepts in \CCtwenty{} \cite{C++Concepts}, \CC{} expression resolution has no analogue to \CFA{} assertion satisfaction checking, a further  complication for a \CFA{} compiler.
     22The precise definition of \CFA{} expression resolution in this section further expands on the challenges of this problem.
    1423
    1524\subsection{Type Unification}
     
    2332\subsection{Conversion Cost} \label{conv-cost-sec}
    2433
    25 C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly.
     34\CFA{}, like C, allows inexact matches between the type of function parameters and function call arguments.
     35Both languages insert \emph{implicit conversions} in these situations to produce an exact type match, and \CFA{} also uses the relative \emph{cost} of different conversions to select among overloaded function candidates.
     36C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to apply define one implicitly.
    2637The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
    2738Since for backward-compatibility purposes the conversion costs of \CFA{} must produce an equivalent result to these common type rules, it is appropriate to summarize \cite[\S{}6.3.1.8]{C11} here:
     
    3950Degree of safe conversion is calculated as path weight in a directed graph of safe conversions between types; Bilson's version and the current version of this graph are in Figures~\ref{bilson-conv-fig} and~\ref{extended-conv-fig}, respectively.
    4051The safe conversion graph is designed such that the common type $c$ of two types $u$ and $v$ is compatible with the C standard definitions from \cite[\S{}6.3.1.8]{C11} and can be calculated as the unique type minimizing the sum of the path weights of $\overrightarrow{uc}$ and $\overrightarrow{vc}$.
    41 The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters:
     52The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters, where the interpretation with the minimum total cost will be selected:
    4253
    4354\begin{cfa}
     
    7384
    7485\begin{cfa}
    75 forall(dtype T | { T& ++?(T&); }) T& advance$\(1\)$(T& i, int n);
    76 forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$\(2\)$(T& i, int n);
     86forall(dtype T | { T& ++?(T&); }) T& advance$\(_1\)$(T& i, int n);
     87forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$\(_2\)$(T& i, int n);
    7788\end{cfa}
    7889
     
    96107The new cost model accounts for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
    97108In the example above, !f!$_1$ has $var = 2$, while the others have $var = 1$.
     109
    98110The new cost model also accounts for a nuance unhandled by Ditchfield or Bilson, in that it makes the more specific !f!$_4$ cheaper than the more generic !f!$_3$; !f!$_4$ is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications.
    99 
    100111In the modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element, which is shared with the count of assertions due to their common nature as constraints on polymorphic type bindings.
    101112Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, the model chooses the more specific !T*! binding with $specialization = -1$.
     
    150161However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0,0)$ for !f( g!$_1$!(42) )!.
    151162If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen.
    152 Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C, where each expression can be resolved given only the resolution of its immediate subexpressions.
     163Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C or \CC{}, where each expression can be resolved given only the resolution of its immediate subexpressions.
    153164
    154165If there are no valid interpretations of the top-level expression, expression resolution fails and must produce an appropriate error message.
Note: See TracChangeset for help on using the changeset viewer.