1 | \chapter{Resolution Heuristics} |
---|
2 | \label{resolution-chap} |
---|
3 | |
---|
4 | The main task of the \CFACC{} type-checker is \emph{expression resolution}, determining which declarations the identifiers in each expression correspond to. |
---|
5 | Resolution is a straightforward task in C, as each declaration has a unique identifier, but in \CFA{} the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier. |
---|
6 | I refer to a given matching between identifiers and declarations in an expression as an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the proportion of feasible candidate interpretations. |
---|
7 | To choose between feasible interpretations, \CFA{} defines a \emph{conversion cost} to rank interpretations; the expression resolution problem is thus to find the unique minimal-cost interpretation for an expression, reporting an error if no such interpretation exists. |
---|
8 | |
---|
9 | \section{Expression Resolution} |
---|
10 | |
---|
11 | \subsection{Conversion Cost} |
---|
12 | |
---|
13 | C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions''\cit{} used to decide which arithmetic operators to use define one implicitly. |
---|
14 | Beginning with the work of Bilson\cite{Bilson03}, \CFA{} has defined a \emph{conversion cost} for each function call in a way that generalizes C's conversion rules. |
---|
15 | Loosely defined, the conversion cost counts the implicit conversions utilized by an interpretation. |
---|
16 | With more specificity, the cost is a lexicographically-ordered tuple, where each element corresponds to a particular kind of conversion. |
---|
17 | In Bilson's \CFA{} design, conversion cost is a three-tuple, $(unsafe, poly, safe)$, where $unsafe$ is the count of unsafe (narrowing) conversions, $poly$ is the count of polymorphic type bindings, and $safe$ is the sum of the degree of safe (widening) conversions. |
---|
18 | The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters: |
---|
19 | |
---|
20 | \begin{cfa} |
---|
21 | void f(char, long); $\C{// (1,0,1)}$ |
---|
22 | forall(otype T) void f(T, long); $\C{// (0,1,1)}$ |
---|
23 | void f(long, long); $\C{// (0,0,2)}$ |
---|
24 | void f(int, unsigned long); $\C{// (0,0,2)}$ |
---|
25 | void f(int, long); $\C{// (0,0,1)}$ |
---|
26 | \end{cfa} |
---|
27 | |
---|
28 | Note that safe and unsafe conversions are handled differently; \CFA{} counts ``distance'' of safe conversions (\eg{} !int! to !long! is cheaper than !int! to !unsigned long!), while only counting the number of unsafe conversions (\eg{} !int! to !char! and !int! to !short! both have unsafe cost 1). |
---|
29 | As part of adding reference types to \CFA{} (see Section~\ref{type-features-sec}), Schluntz added a new $reference$ element to the cost tuple, which counts the number of implicit reference-to-rvalue conversions performed so that candidate interpretations can be distinguished by how closely they match the nesting of reference types; since references are meant to act almost indistinguishably from lvalues, this $reference$ element is the least significant in the lexicographic comparison of cost tuples. |
---|
30 | |
---|
31 | I have also refined the \CFA{} cost model as part of this thesis work. |
---|
32 | Bilson's \CFA{} cost model includes the cost of polymorphic type bindings from a function's type assertions in the $poly$ element of the cost tuple; this has the effect of making more-constrained functions more expensive than less-constrained functions. |
---|
33 | However, type assertions actually make a function \emph{less} polymorphic, and as such functions with more type assertions should be preferred in type resolution. |
---|
34 | As an example, some iterator-based algorithms can work on a forward iterator that only provides an increment operator, but are more efficient on a random-access iterator that can be incremented by an arbitrary number of steps in a single operation. |
---|
35 | The random-access iterator has more type constraints, but should be chosen whenever those constraints can be satisfied. |
---|
36 | As such, I have added a $specialization$ element to the \CFA{} cost tuple, the values of which are always negative. |
---|
37 | Each type assertion subtracts 1 from $specialization$, so that more-constrained functions will be chosen over less-constrained functions, all else being equal. |
---|
38 | A more sophisticated design would define a partial order over sets of type assertions by set inclusion (\ie{} one function would only cost less than another if it had all of the same assertions, plus more, rather than just more total assertions), but I did not judge the added complexity of computing and testing this order to be worth the gain in specificity. |
---|
39 | |
---|
40 | I have also incorporated an unimplemented aspect of Ditchfield's earlier cost model. |
---|
41 | In the example below, adapted from \cite[89]{Ditchfield}, Bilson's cost model only distinguished between the first two cases by accounting extra cost for the extra set of !otype! parameters, which, as discussed above, is not a desirable solution: |
---|
42 | |
---|
43 | \begin{cfa} |
---|
44 | forall(otype T, otype U) void f(T, U); $\C{// polymorphic}$ |
---|
45 | forall(otype T) void f(T, T); $\C{// less polymorphic}$ |
---|
46 | forall(otype T) void f(T, int); $\C{// even less polymorphic}$ |
---|
47 | forall(otype T) void f(T*, int); $\C{// least polymorphic}$ |
---|
48 | \end{cfa} |
---|
49 | |
---|
50 | I account for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function. |
---|
51 | In the example above, the first !f! has $var = 2$, while the remainder have $var = 1$. |
---|
52 | My new \CFA{} cost model also accounts for a nuance un-handled by Ditchfield or Bilson, in that it makes the more specific fourth function above cheaper than the more generic third function. |
---|
53 | The fourth function is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications. |
---|
54 | |
---|
55 | In my modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element. |
---|
56 | Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, \CFA{} will pick the more specific !T*! binding. |
---|
57 | This process is recursive, such that !T**! produces a -2 specialization cost, as opposed to the -1 cost for $T*$. |
---|
58 | This works similarly for generic types, \eg{} !box(T)! also has specialization cost -1; for multi-argument generic types, the least-specialized polymorphic parameter sets the specialization cost, \eg{} the specialization cost of !pair(T, S*)! is -1 (from !T!) rather than -2 (from !S!). |
---|
59 | Since the user programmer provides parameters, but cannot provide guidance on return type, specialization cost is not counted for the return type list. |
---|
60 | The final \CFA{} cost tuple is thus as follows: |
---|
61 | |
---|
62 | \begin{equation*} |
---|
63 | (unsafe, poly, safe, vars, specialization, reference) |
---|
64 | \end{equation*} |
---|
65 | |
---|
66 | \subsection{Expression Cost} |
---|
67 | |
---|
68 | Having defined the cost model ... |
---|
69 | |
---|
70 | % mention cast expression as "eager" |
---|
71 | |
---|
72 | % Mention relevance of work to C++20 concepts |
---|