| 1 | \chapter{Resolution Heuristics}
|
|---|
| 2 | \label{resolution-chap}
|
|---|
| 3 |
|
|---|
| 4 | The main task of the \CFACC{} type-checker is \emph{expression resolution}, determining which declarations the identifiers in each expression correspond to.
|
|---|
| 5 | Resolution is a straightforward task in C, as each declaration has a unique identifier, but in \CFA{} the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
|
|---|
| 6 | I refer to a given matching between identifiers and declarations in an expression as an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the proportion of feasible candidate interpretations.
|
|---|
| 7 | To choose between feasible interpretations, \CFA{} defines a \emph{conversion cost} to rank interpretations; the expression resolution problem is thus to find the unique minimal-cost interpretation for an expression, reporting an error if no such interpretation exists.
|
|---|
| 8 |
|
|---|
| 9 | \section{Expression Resolution}
|
|---|
| 10 |
|
|---|
| 11 | \subsection{Conversion Cost}
|
|---|
| 12 |
|
|---|
| 13 | C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions''\cit{} used to decide which arithmetic operators to use define one implicitly.
|
|---|
| 14 | Beginning with the work of Bilson\cite{Bilson03}, \CFA{} has defined a \emph{conversion cost} for each function call in a way that generalizes C's conversion rules.
|
|---|
| 15 | Loosely defined, the conversion cost counts the implicit conversions utilized by an interpretation.
|
|---|
| 16 | With more specificity, the cost is a lexicographically-ordered tuple, where each element corresponds to a particular kind of conversion.
|
|---|
| 17 | In Bilson's \CFA{} design, conversion cost is a three-tuple, $(unsafe, poly, safe)$, where $unsafe$ is the count of unsafe (narrowing) conversions, $poly$ is the count of polymorphic type bindings, and $safe$ is the sum of the degree of safe (widening) conversions.
|
|---|
| 18 | The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters:
|
|---|
| 19 |
|
|---|
| 20 | \begin{cfa}
|
|---|
| 21 | void f(char, long); $\C{// (1,0,1)}$
|
|---|
| 22 | forall(otype T) void f(T, long); $\C{// (0,1,1)}$
|
|---|
| 23 | void f(long, long); $\C{// (0,0,2)}$
|
|---|
| 24 | void f(int, unsigned long); $\C{// (0,0,2)}$
|
|---|
| 25 | void f(int, long); $\C{// (0,0,1)}$
|
|---|
| 26 | \end{cfa}
|
|---|
| 27 |
|
|---|
| 28 | Note that safe and unsafe conversions are handled differently; \CFA{} counts ``distance'' of safe conversions (\eg{} !int! to !long! is cheaper than !int! to !unsigned long!), while only counting the number of unsafe conversions (\eg{} !int! to !char! and !int! to !short! both have unsafe cost 1).
|
|---|
| 29 | As part of adding reference types to \CFA{} (see Section~\ref{type-features-sec}), Schluntz added a new $reference$ element to the cost tuple, which counts the number of implicit reference-to-rvalue conversions performed so that candidate interpretations can be distinguished by how closely they match the nesting of reference types; since references are meant to act almost indistinguishably from lvalues, this $reference$ element is the least significant in the lexicographic comparison of cost tuples.
|
|---|
| 30 |
|
|---|
| 31 | I have also refined the \CFA{} cost model as part of this thesis work.
|
|---|
| 32 | Bilson's \CFA{} cost model includes the cost of polymorphic type bindings from a function's type assertions in the $poly$ element of the cost tuple; this has the effect of making more-constrained functions more expensive than less-constrained functions.
|
|---|
| 33 | However, type assertions actually make a function \emph{less} polymorphic, and as such functions with more type assertions should be preferred in type resolution.
|
|---|
| 34 | As an example, some iterator-based algorithms can work on a forward iterator that only provides an increment operator, but are more efficient on a random-access iterator that can be incremented by an arbitrary number of steps in a single operation.
|
|---|
| 35 | The random-access iterator has more type constraints, but should be chosen whenever those constraints can be satisfied.
|
|---|
| 36 | As such, I have added a $specialization$ element to the \CFA{} cost tuple, the values of which are always negative.
|
|---|
| 37 | Each type assertion subtracts 1 from $specialization$, so that more-constrained functions will be chosen over less-constrained functions, all else being equal.
|
|---|
| 38 | A more sophisticated design would define a partial order over sets of type assertions by set inclusion (\ie{} one function would only cost less than another if it had all of the same assertions, plus more, rather than just more total assertions), but I did not judge the added complexity of computing and testing this order to be worth the gain in specificity.
|
|---|
| 39 |
|
|---|
| 40 | I have also incorporated an unimplemented aspect of Ditchfield's earlier cost model.
|
|---|
| 41 | In the example below, adapted from \cite[89]{Ditchfield}, Bilson's cost model only distinguished between the first two cases by accounting extra cost for the extra set of !otype! parameters, which, as discussed above, is not a desirable solution:
|
|---|
| 42 |
|
|---|
| 43 | \begin{cfa}
|
|---|
| 44 | forall(otype T, otype U) void f(T, U); $\C{// polymorphic}$
|
|---|
| 45 | forall(otype T) void f(T, T); $\C{// less polymorphic}$
|
|---|
| 46 | forall(otype T) void f(T, int); $\C{// even less polymorphic}$
|
|---|
| 47 | forall(otype T) void f(T*, int); $\C{// least polymorphic}$
|
|---|
| 48 | \end{cfa}
|
|---|
| 49 |
|
|---|
| 50 | I account for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
|
|---|
| 51 | In the example above, the first !f! has $var = 2$, while the remainder have $var = 1$.
|
|---|
| 52 | My new \CFA{} cost model also accounts for a nuance un-handled by Ditchfield or Bilson, in that it makes the more specific fourth function above cheaper than the more generic third function.
|
|---|
| 53 | The fourth function is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications.
|
|---|
| 54 |
|
|---|
| 55 | In my modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element.
|
|---|
| 56 | Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, \CFA{} will pick the more specific !T*! binding.
|
|---|
| 57 | This process is recursive, such that !T**! produces a -2 specialization cost, as opposed to the -1 cost for $T*$.
|
|---|
| 58 | This works similarly for generic types, \eg{} !box(T)! also has specialization cost -1; for multi-argument generic types, the least-specialized polymorphic parameter sets the specialization cost, \eg{} the specialization cost of !pair(T, S*)! is -1 (from !T!) rather than -2 (from !S!).
|
|---|
| 59 | Since the user programmer provides parameters, but cannot provide guidance on return type, specialization cost is not counted for the return type list.
|
|---|
| 60 | The final \CFA{} cost tuple is thus as follows:
|
|---|
| 61 |
|
|---|
| 62 | \begin{equation*}
|
|---|
| 63 | (unsafe, poly, safe, vars, specialization, reference)
|
|---|
| 64 | \end{equation*}
|
|---|
| 65 |
|
|---|
| 66 | \subsection{Expression Cost}
|
|---|
| 67 |
|
|---|
| 68 | Having defined the cost model ...
|
|---|
| 69 |
|
|---|
| 70 | % mention cast expression as "eager"
|
|---|
| 71 |
|
|---|
| 72 | % Mention relevance of work to C++20 concepts
|
|---|