Changeset c1f3d1a8


Ignore:
Timestamp:
Apr 20, 2019, 6:20:13 PM (6 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
2834e99
Parents:
e71272a
Message:

thesis: more motivation for expression resolution problem, per Gregor

Location:
doc/theses/aaron_moss_PhD/phd
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/background.tex

    re71272a rc1f3d1a8  
    1212As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal~\cite{Moss18} provides a reasonable summary of the current design.
    1313Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors \cite{Schluntz17}, improved support for tuples \cite{Schluntz17}, reference types \cite{Moss18}, first-class concurrent and parallel programming support \cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library \cite{Moss18}.
     14
     15This thesis is primarily concerned with the \emph{expression resolution} portion of \CFA{} type-checking; resolution is discussed in more detail in Chapter~\ref{resolution-chap}, but is essentially determining which declarations the identifiers in each expression correspond to.
     16Given that no simultaneously-visible C declarations share identifiers, expression resolution in C is not difficult, but the added features of \CFA{} make its resolution algorithms significantly more complex.
     17Due to this complexity, the expression-resolution pass in \CFACC{} requires 95\% of compiler runtime on some source files, making an efficient procedure for expression resolution a requirement for a performant \CFA{} compiler.
    1418
    1519The selection of features presented in this chapter are chosen to elucidate the design constraints of the work presented in this thesis.
     
    141145However, !twice!$_2$ also works for any type !S! that has an addition operator defined for it.
    142146
    143 Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope.
     147Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope\footnote{\CFACC{} actually performs a type-unification computation for assertion satisfaction rather than an expression resolution computation; see Section~\ref{assn-sat-sec} for details.}.
    144148If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that that function is examined as a candidate for its own assertion unboundedly repeatedly.
    145149To avoid such infinite loops, \CFACC{} imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC{} compilers for template expansion; this restriction means that there are some otherwise semantically well-typed expressions that cannot be resolved by \CFACC{}.
     
    225229One contribution of this thesis, discussed in Section~\ref{conv-cost-sec}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
    226230
    227 The expression resolution problem, which is the focus of Chapter~\ref{resolution-chap}, is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
     231In the context of these implicit conversions, the expression resolution problem can be restated as finding the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
    228232While semantically valid \CFA{} code always has such a unique minimal-cost interpretation, \CFACC{} must also be able to detect and report as errors expressions that have either no interpretation or multiple ambiguous minimal-cost interpretations.
    229233Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    re71272a rc1f3d1a8  
    22\label{resolution-chap}
    33
    4 % consider using "satisfaction" throughout when talking about assertions
    5 % "valid" instead of "feasible" interpretations
    6 
    74The main task of the \CFACC{} type-checker is \emph{expression resolution}: determining which declarations the identifiers in each expression correspond to.
    8 Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
     5Resolution is a straightforward task in C, as no simultaneously-visible declarations share identifiers, but in \CFA{}, the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
    96A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
    107To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
     
    129
    1310\section{Expression Resolution}
     11
     12The expression resolution pass in \CFACC{} must traverse the input expression, match identifiers to available declarations, rank candidate interpretations according to their conversion cost, and check type assertion satisfaction for these candidates.
     13Once the set of valid interpretations for the top-level expression has been found, the expression resolver selects the unique minimal-cost candidate or reports an error.
     14
     15The expression resolution problem in \CFA{} is more difficult than the analogous problems in C or \CC{}.
     16As mentioned above, the lack of name overloading in C makes its resolution problem substantially easier, but a comparison of the richer type systems in \CFA{} and \CC{} highlights some of the challenges in \CFA{} expression resolution.
     17The key distinction between \CFA{} and \CC{} resolution is that \CC{} uses a greedy algorithm for selection of candidate functions given their argument interpretations, whereas \CFA{} allows contextual information from superexpressions to influence the choice among candidate functions.
     18One key use of this contextual information is for type inference of polymorphic return types; \CC{} requires explicit specification of template type parameters which only occur in a function's return type, while \CFA{} allows the instantiation of these type parameters to be inferred from context (and in fact does not allow explicit specification of type parameters to a function).
     19Similarly, while both \CFA{} and \CC{} rank candidate functions based on a cost metric for implicit conversions, \CFA{} allows a suboptimal subexpression interpretation to be selected if it allows a lower-cost overall interpretation, while \CC{} requires that each subexpression interpretation have minimal cost.
     20Because of this use of contextual information, the \CFA{} expression resolver must consider multiple interpretations of each function argument, while the \CC{} compiler has only a single interpretation for each argument\footnote{With the exception of address-of operations on functions.}.
     21Additionally, until the introduction of concepts in \CCtwenty{} \cite{C++Concepts}, \CC{} expression resolution has no analogue to \CFA{} assertion satisfaction checking, a further  complication for a \CFA{} compiler.
     22The precise definition of \CFA{} expression resolution in this section further expands on the challenges of this problem.
    1423
    1524\subsection{Type Unification}
     
    2332\subsection{Conversion Cost} \label{conv-cost-sec}
    2433
    25 C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly.
     34\CFA{}, like C, allows inexact matches between the type of function parameters and function call arguments.
     35Both languages insert \emph{implicit conversions} in these situations to produce an exact type match, and \CFA{} also uses the relative \emph{cost} of different conversions to select among overloaded function candidates.
     36C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to apply define one implicitly.
    2637The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
    2738Since for backward-compatibility purposes the conversion costs of \CFA{} must produce an equivalent result to these common type rules, it is appropriate to summarize \cite[\S{}6.3.1.8]{C11} here:
     
    3950Degree of safe conversion is calculated as path weight in a directed graph of safe conversions between types; Bilson's version and the current version of this graph are in Figures~\ref{bilson-conv-fig} and~\ref{extended-conv-fig}, respectively.
    4051The safe conversion graph is designed such that the common type $c$ of two types $u$ and $v$ is compatible with the C standard definitions from \cite[\S{}6.3.1.8]{C11} and can be calculated as the unique type minimizing the sum of the path weights of $\overrightarrow{uc}$ and $\overrightarrow{vc}$.
    41 The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters:
     52The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters, where the interpretation with the minimum total cost will be selected:
    4253
    4354\begin{cfa}
     
    7384
    7485\begin{cfa}
    75 forall(dtype T | { T& ++?(T&); }) T& advance$\(1\)$(T& i, int n);
    76 forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$\(2\)$(T& i, int n);
     86forall(dtype T | { T& ++?(T&); }) T& advance$\(_1\)$(T& i, int n);
     87forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$\(_2\)$(T& i, int n);
    7788\end{cfa}
    7889
     
    96107The new cost model accounts for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
    97108In the example above, !f!$_1$ has $var = 2$, while the others have $var = 1$.
     109
    98110The new cost model also accounts for a nuance unhandled by Ditchfield or Bilson, in that it makes the more specific !f!$_4$ cheaper than the more generic !f!$_3$; !f!$_4$ is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications.
    99 
    100111In the modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element, which is shared with the count of assertions due to their common nature as constraints on polymorphic type bindings.
    101112Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, the model chooses the more specific !T*! binding with $specialization = -1$.
     
    150161However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0,0)$ for !f( g!$_1$!(42) )!.
    151162If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen.
    152 Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C, where each expression can be resolved given only the resolution of its immediate subexpressions.
     163Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C or \CC{}, where each expression can be resolved given only the resolution of its immediate subexpressions.
    153164
    154165If there are no valid interpretations of the top-level expression, expression resolution fails and must produce an appropriate error message.
Note: See TracChangeset for help on using the changeset viewer.