source: doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex @ 2240ec4

Last change on this file since 2240ec4 was 2240ec4, checked in by Aaron Moss <a3moss@…>, 5 years ago

Thesis: further elaboration of cost model

  • Property mode set to 100644
File size: 6.9 KB
1\chapter{Resolution Heuristics}
4The main task of the \CFACC{} type-checker is \emph{expression resolution}, determining which declarations the identifiers in each expression correspond to.
5Resolution is a straightforward task in C, as each declaration has a unique identifier, but in \CFA{} the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
6I refer to a given matching between identifiers and declarations in an expression as an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the proportion of feasible candidate interpretations.
7To choose between feasible interpretations, \CFA{} defines a \emph{conversion cost} to rank interpretations; the expression resolution problem is thus to find the unique minimal-cost interpretation for an expression, reporting an error if no such interpretation exists.
9\section{Expression Resolution}
11\subsection{Conversion Cost}
13C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions''\cit{} used to decide which arithmetic operators to use define one implicitly.
14Beginning with the work of Bilson\cite{Bilson03}, \CFA{} has defined a \emph{conversion cost} for each function call in a way that generalizes C's conversion rules.
15Loosely defined, the conversion cost counts the implicit conversions utilized by an interpretation.
16With more specificity, the cost is a lexicographically-ordered tuple, where each element corresponds to a particular kind of conversion.
17In Bilson's \CFA{} design, conversion cost is a three-tuple, $(unsafe, poly, safe)$, where $unsafe$ is the count of unsafe (narrowing) conversions, $poly$ is the count of polymorphic type bindings, and $safe$ is the sum of the degree of safe (widening) conversions.
18The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters:
21void f(char, long); $\C{// (1,0,1)}$
22forall(otype T) void f(T, long); $\C{// (0,1,1)}$
23void f(long, long); $\C{// (0,0,2)}$
24void f(int, unsigned long); $\C{// (0,0,2)}$
25void f(int, long); $\C{// (0,0,1)}$
28Note that safe and unsafe conversions are handled differently; \CFA{} counts ``distance'' of safe conversions (\eg{} !int! to !long! is cheaper than !int! to !unsigned long!), while only counting the number of unsafe conversions (\eg{} !int! to !char! and !int! to !short! both have unsafe cost 1).
29As part of adding reference types to \CFA{} (see Section~\ref{type-features-sec}), Schluntz added a new $reference$ element to the cost tuple, which counts the number of implicit reference-to-rvalue conversions performed so that candidate interpretations can be distinguished by how closely they match the nesting of reference types; since references are meant to act almost indistinguishably from lvalues, this $reference$ element is the least significant in the lexicographic comparison of cost tuples.
31I have also refined the \CFA{} cost model as part of this thesis work.
32Bilson's \CFA{} cost model includes the cost of polymorphic type bindings from a function's type assertions in the $poly$ element of the cost tuple; this has the effect of making more-constrained functions more expensive than less-constrained functions.
33However, type assertions actually make a function \emph{less} polymorphic, and as such functions with more type assertions should be preferred in type resolution.
34As an example, some iterator-based algorithms can work on a forward iterator that only provides an increment operator, but are more efficient on a random-access iterator that can be incremented by an arbitrary number of steps in a single operation.
35The random-access iterator has more type constraints, but should be chosen whenever those constraints can be satisfied.
36As such, I have added a $specialization$ element to the \CFA{} cost tuple, the values of which are always negative.
37Each type assertion subtracts 1 from $specialization$, so that more-constrained functions will be chosen over less-constrained functions, all else being equal.
38A more sophisticated design would define a partial order over sets of type assertions by set inclusion (\ie{} one function would only cost less than another if it had all of the same assertions, plus more, rather than just more total assertions), but I did not judge the added complexity of computing and testing this order to be worth the gain in specificity.
40I have also incorporated an unimplemented aspect of Ditchfield's earlier cost model.
41In the example below, adapted from \cite[89]{Ditchfield}, Bilson's cost model only distinguished between the first two cases by accounting extra cost for the extra set of !otype! parameters, which, as discussed above, is not a desirable solution:
44forall(otype T, otype U) void f(T, U);  $\C{// polymorphic}$
45forall(otype T) void f(T, T);  $\C{// less polymorphic}$
46forall(otype T) void f(T, int);  $\C{// even less polymorphic}$
47forall(otype T) void f(T*, int); $\C{// least polymorphic}$
50I account for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
51In the example above, the first !f! has $var = 2$, while the remainder have $var = 1$.
52My new \CFA{} cost model also accounts for a nuance un-handled by Ditchfield or Bilson, in that it makes the more specific fourth function above cheaper than the more generic third function.
53The fourth function is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications.
55In my modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element.
56Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, \CFA{} will pick the more specific !T*! binding.
57This process is recursive, such that !T**! produces a -2 specialization cost, as opposed to the -1 cost for $T*$.
58This works similarly for generic types, \eg{} !box(T)! also has specialization cost -1; for multi-argument generic types, the least-specialized polymorphic parameter sets the specialization cost, \eg{} the specialization cost of !pair(T, S*)! is -1 (from !T!) rather than -2 (from !S!).
59Since the user programmer provides parameters, but cannot provide guidance on return type, specialization cost is not counted for the return type list.
60The final \CFA{} cost tuple is thus as follows:
63        (unsafe, poly, safe, vars, specialization, reference)
66\subsection{Expression Cost}
68Having defined the cost model ...
70% mention cast expression as "eager"
72% Mention relevance of work to C++20 concepts
Note: See TracBrowser for help on using the repository browser.