| [41634098] | 1 | ## User-defined Conversions ## | 
|---|
|  | 2 | C's implicit "usual arithmetic conversions" define a structure among the | 
|---|
|  | 3 | built-in types consisting of _unsafe_ narrowing conversions and a hierarchy of | 
|---|
|  | 4 | _safe_ widening conversions. | 
|---|
|  | 5 | There is also a set of _explicit_ conversions that are only allowed through a | 
|---|
|  | 6 | cast expression. | 
|---|
|  | 7 | Based on Glen's notes on conversions [1], I propose that safe and unsafe | 
|---|
|  | 8 | conversions be expressed as constructor variants, though I make explicit | 
|---|
|  | 9 | (cast) conversions a constructor variant as well rather than a dedicated | 
|---|
|  | 10 | operator. | 
|---|
|  | 11 | Throughout this article, I will use the following operator names for | 
|---|
|  | 12 | constructors and conversion functions from `From` to `To`: | 
|---|
|  | 13 |  | 
|---|
|  | 14 | void ?{} ( To*, To );            // copy constructor | 
|---|
|  | 15 | void ?{} ( To*, From );          // explicit constructor | 
|---|
|  | 16 | void ?{explicit} ( To*, From );  // explicit cast conversion | 
|---|
|  | 17 | void ?{safe} ( To*, From );      // implicit safe conversion | 
|---|
|  | 18 | void ?{unsafe} ( To*, From );    // implicit unsafe conversion | 
|---|
|  | 19 |  | 
|---|
|  | 20 | [1] http://plg.uwaterloo.ca/~cforall/Conversions/index.html | 
|---|
|  | 21 |  | 
|---|
|  | 22 | Glen's design made no distinction between constructors and unsafe implicit | 
|---|
|  | 23 | conversions; this is elegant, but interacts poorly with tuples. | 
|---|
|  | 24 | Essentially, without making this distinction, a constructor like the following | 
|---|
|  | 25 | would add an interpretation of any two `int`s as a `Coord`, needlessly | 
|---|
|  | 26 | multiplying the space of possible interpretations of all functions: | 
|---|
|  | 27 |  | 
|---|
|  | 28 | void ?{}( Coord *this, int x, int y ); | 
|---|
|  | 29 |  | 
|---|
|  | 30 | That said, it would certainly be possible to make a multiple-argument implicit | 
|---|
|  | 31 | conversion, as below, though the argument above suggests this option should be | 
|---|
|  | 32 | used infrequently: | 
|---|
|  | 33 |  | 
|---|
|  | 34 | void ?{unsafe}( Coord *this, int x, int y ); | 
|---|
|  | 35 |  | 
|---|
|  | 36 | An alternate possibility would be to only count two-arg constructors | 
|---|
|  | 37 | `void ?{} ( To*, From )` as unsafe conversions; under this semantics, safe and | 
|---|
|  | 38 | explicit conversions should also have a compiler-enforced restriction to | 
|---|
|  | 39 | ensure that they are two-arg functions (this restriction may be valuable | 
|---|
|  | 40 | regardless). | 
|---|
|  | 41 |  | 
|---|
|  | 42 | Regardless of syntax, there should be a type assertion that expresses `From` | 
|---|
|  | 43 | is convertable to `To`. | 
|---|
|  | 44 | If user-defined conversions are not added to the language, | 
|---|
|  | 45 | `void ?{} ( To*, From )` may be a suitable representation, relying on | 
|---|
|  | 46 | conversions on the argument types to account for transitivity. | 
|---|
|  | 47 | On the other hand, `To*` should perhaps match its target type exactly, so | 
|---|
|  | 48 | another assertion syntax specific to conversions may be required, e.g. | 
|---|
|  | 49 | `From -> To`. | 
|---|
|  | 50 |  | 
|---|
|  | 51 | ### Constructor Idiom ### | 
|---|
|  | 52 | Basing our notion of conversions off otherwise normal Cforall functions means | 
|---|
|  | 53 | that we can use the full range of Cforall features for conversions, including | 
|---|
|  | 54 | polymorphism. | 
|---|
|  | 55 | Glen [1] defines a _constructor idiom_ that can be used to create chains of | 
|---|
|  | 56 | safe conversions without duplicating code; given a type `Safe` which members | 
|---|
|  | 57 | of another type `From` can be directly converted to, the constructor idiom | 
|---|
|  | 58 | allows us to write a conversion for any type `To` which `Safe` converts to: | 
|---|
|  | 59 |  | 
|---|
|  | 60 | forall(otype To | { void ?{safe}( To*, Safe ) }) | 
|---|
|  | 61 | void ?{safe}( To *this, From that ) { | 
|---|
|  | 62 | Safe tmp = /* some expression involving that */; | 
|---|
|  | 63 | *this = tmp; // uses assertion parameter | 
|---|
|  | 64 | } | 
|---|
|  | 65 |  | 
|---|
|  | 66 | This idiom can also be used with only minor variations for a parallel set of | 
|---|
|  | 67 | unsafe conversions. | 
|---|
|  | 68 |  | 
|---|
|  | 69 | What selective non-use of the constructor idiom gives us is the ability to | 
|---|
|  | 70 | define a conversion that may only be the *last* conversion in a chain of such. | 
|---|
|  | 71 | Constructing a conversion graph able to unambiguously represent the full | 
|---|
|  | 72 | hierarchy of implicit conversions in C is provably impossible using only | 
|---|
|  | 73 | single-step conversions with no additional information (see Appendix A), but | 
|---|
|  | 74 | this mechanism is sufficiently powerful (see [1], though the design there has | 
|---|
|  | 75 | some minor bugs; the general idea is to use the constructor idiom to define | 
|---|
|  | 76 | two chains of conversions, one among the signed integral types, another among | 
|---|
|  | 77 | the unsigned, and to use monomorphic conversions to allow conversions between | 
|---|
|  | 78 | signed and unsigned integer types). | 
|---|
|  | 79 |  | 
|---|
|  | 80 | ### Appendix A: Partial and Total Orders ### | 
|---|
|  | 81 | The `<=` relation on integers is a commonly known _total order_, and | 
|---|
|  | 82 | intuitions based on how it works generally apply well to other total orders. | 
|---|
|  | 83 | Formally, a total order is some binary relation `<=` over a set `S` such that | 
|---|
|  | 84 | for any two members `a` and `b` of `S`, `a <= b` or `b <= a` (if both, `a` and | 
|---|
|  | 85 | `b` must be equal, the _antisymmetry_ property); total orders also have a | 
|---|
|  | 86 | _transitivity_ property, that if `a <= b` and `b <= c`, then `a <= c`. | 
|---|
|  | 87 | If `a` and `b` are distinct elements and `a <= b`, we may write `a < b`. | 
|---|
|  | 88 |  | 
|---|
|  | 89 | A _partial order_ is a generalization of this concept where the `<=` relation | 
|---|
|  | 90 | is not required to be defined over all pairs of elements in `S` (though there | 
|---|
|  | 91 | is a _reflexivity_ requirement that for all `a` in `S`, `a <= a`); in other | 
|---|
|  | 92 | words, it is possible for two elements `a` and `b` of `S` to be | 
|---|
|  | 93 | _incomparable_, unable to be ordered with respect to one another (any `a` and | 
|---|
|  | 94 | `b` for which either `a <= b` or `b <= a` are called _comparable_). | 
|---|
|  | 95 | Antisymmetry and transitivity are also required for a partial order, so all | 
|---|
|  | 96 | total orders are also partial orders by definition. | 
|---|
|  | 97 | One fairly natural partial order is the "subset of" relation over sets from | 
|---|
|  | 98 | the same universe; `{ }` is a subset of both `{ 1 }` and `{ 2 }`, which are | 
|---|
|  | 99 | both subsets of `{ 1, 2 }`, but neither `{ 1 }` nor `{ 2 }` is a subset of the | 
|---|
|  | 100 | other - they are incomparable under this relation. | 
|---|
|  | 101 |  | 
|---|
|  | 102 | We can compose two (or more) partial orders to produce a new partial order on | 
|---|
|  | 103 | tuples drawn from both (or all the) sets. | 
|---|
|  | 104 | For example, given `a` and `c` from set `S` and `b` and `d` from set `R`, | 
|---|
|  | 105 | where both `S` and `R` both have partial orders defined on them, we can define | 
|---|
|  | 106 | a ordering relation between `(a, b)` and `(c, d)`. | 
|---|
|  | 107 | One common order is the _lexicographical order_, where `(a, b) <= (c, d)` iff | 
|---|
|  | 108 | `a < c` or both `a = c` and `b <= d`; this can be thought of as ordering by | 
|---|
|  | 109 | the first set and "breaking ties" by the second set. | 
|---|
|  | 110 | Another common order is the _product order_, which can be roughly thought of | 
|---|
|  | 111 | as "all the components are ordered the same way"; formally `(a, b) <= (c, d)` | 
|---|
|  | 112 | iff `a <= c` and `b <= d`. | 
|---|
|  | 113 | One difference between the lexicographical order and the product order is that | 
|---|
|  | 114 | in the lexicographical order if both `a` and `c` and `b` and `d` are | 
|---|
|  | 115 | comparable then `(a, b)` and `(c, d)` will be comparable, while in the product | 
|---|
|  | 116 | order you can have `a <= c` and `d <= b` (both comparable) which will make | 
|---|
|  | 117 | `(a, b)` and `(c, d)` incomparable. | 
|---|
|  | 118 | The product order, on the other hand, has the benefit of not prioritizing one | 
|---|
|  | 119 | order over the other. | 
|---|
|  | 120 |  | 
|---|
|  | 121 | Any partial order has a natural representation as a directed acyclic graph | 
|---|
|  | 122 | (DAG). | 
|---|
|  | 123 | Each element `a` of the set becomes a node of the DAG, with an arc pointing to | 
|---|
|  | 124 | its _covering_ elements, any element `b` such that `a < b` but where there is | 
|---|
|  | 125 | no `c` such that `a < c` and `c < b`. | 
|---|
|  | 126 | Intuitively, the covering elements are the "next ones larger", where you can't | 
|---|
|  | 127 | fit another element between the two. | 
|---|
|  | 128 | Under this construction, `a < b` is equivalent to "there is a path from `a` to | 
|---|
|  | 129 | `b` in the DAG", and the lack of cycles in the directed graph is ensured by | 
|---|
|  | 130 | the antisymmetry property of the partial order. | 
|---|
|  | 131 |  | 
|---|
|  | 132 | Partial orders can be generalized to _preorders_ by removing the antisymmetry | 
|---|
|  | 133 | property. | 
|---|
|  | 134 | In a preorder the relation is generally called `<~`, and it is possible for | 
|---|
|  | 135 | two distict elements `a` and `b` to have `a <~ b` and `b <~ a` - in this case | 
|---|
|  | 136 | we write `a ~ b`; `a <~ b` and not `a ~ b` is written `a < b`. | 
|---|
|  | 137 | Preorders may also be represented as directed graphs, but in this case the | 
|---|
|  | 138 | graph may contain cycles. | 
|---|
|  | 139 |  | 
|---|
|  | 140 | ### Appendix B: Building a Conversion Graph from Un-annotated Single Steps ### | 
|---|
|  | 141 | The short answer is that it's impossible. | 
|---|
|  | 142 |  | 
|---|
|  | 143 | The longer answer is that it has to do with what's essentially a diamond | 
|---|
|  | 144 | inheritance problem. | 
|---|
|  | 145 | In C, `int` converts to `unsigned int` and also `long` "safely"; both convert | 
|---|
|  | 146 | to `unsigned long` safely, and it's possible to chain the conversions to | 
|---|
|  | 147 | convert `int` to `unsigned long`. | 
|---|
|  | 148 | There are two constraints here; one is that the `int` to `unsigned long` | 
|---|
|  | 149 | conversion needs to cost more than the other two (because the types aren't as | 
|---|
|  | 150 | "close" in a very intuitive fashion), and the other is that the system needs a | 
|---|
|  | 151 | way to choose which path to take to get to the destination type. | 
|---|
|  | 152 | Now, a fairly natural solution for this would be to just say "C knows how to | 
|---|
|  | 153 | convert from `int` to `unsigned long`, so we just put in a direct conversion | 
|---|
|  | 154 | and make the compiler smart enough to figure out the costs" - this is the | 
|---|
|  | 155 | approach taken by the existing compipler, but given that in a user-defined | 
|---|
|  | 156 | conversion proposal the users can build an arbitrary graph of conversions, | 
|---|
|  | 157 | this case still needs to be handled. | 
|---|
|  | 158 |  | 
|---|
|  | 159 | We can define a preorder over the types by saying that `a <~ b` if there | 
|---|
|  | 160 | exists a chain of conversions from `a` to `b` (see Appendix A for description | 
|---|
|  | 161 | of preorders and related constructs). | 
|---|
|  | 162 | This preorder corresponds roughly to a more usual type-theoretic concept of | 
|---|
|  | 163 | subtyping ("if I can convert `a` to `b`, `a` is a more specific type than | 
|---|
|  | 164 | `b`"); however, since this graph is arbitrary, it may contain cycles, so if | 
|---|
|  | 165 | there is also a path to convert `b` to `a` they are in some sense equivalently | 
|---|
|  | 166 | specific. | 
|---|
|  | 167 |  | 
|---|
|  | 168 | Now, to compare the cost of two conversion chains `(s, x1, x2, ... xn)` and | 
|---|
|  | 169 | `(s, y1, y2, ... ym)`, we have both the length of the chains (`n` versus `m`) | 
|---|
|  | 170 | and this conversion preorder over the destination types `xn` and `ym`. | 
|---|
|  | 171 | We could define a preorder by taking chain length and breaking ties by the | 
|---|
|  | 172 | conversion preorder, but this would lead to unexpected behaviour when closing | 
|---|
|  | 173 | diamonds with an arm length of longer than 1. | 
|---|
|  | 174 | Consider a set of types `A`, `B1`, `B2`, `C` with the arcs `A->B1`, `B1->B2`, | 
|---|
|  | 175 | `B2->C`, and `A->C`. | 
|---|
|  | 176 | If we are comparing conversions from `A` to both `B2` and `C`, we expect the | 
|---|
|  | 177 | conversion to `B2` to be chosen because it's the more specific type under the | 
|---|
|  | 178 | conversion preorder, but since its chain length is longer than the conversion | 
|---|
|  | 179 | to `C`, it loses and `C` is chosen. | 
|---|
|  | 180 | However, taking the conversion preorder and breaking ties or ambiguities by | 
|---|
|  | 181 | chain length also doesn't work, because of cases like the following example | 
|---|
|  | 182 | where the transitivity property is broken and we can't find a global maximum: | 
|---|
|  | 183 |  | 
|---|
|  | 184 | `X->Y1->Y2`, `X->Z1->Z2->Z3->W`, `X->W` | 
|---|
|  | 185 |  | 
|---|
|  | 186 | In this set of arcs, if we're comparing conversions from `X` to each of `Y2`, | 
|---|
|  | 187 | `Z3` and `W`, converting to `Y2` is cheaper than converting to `Z3`, because | 
|---|
|  | 188 | there are no conversions between `Y2` and `Z3`, and `Y2` has the shorter chain | 
|---|
|  | 189 | length. | 
|---|
|  | 190 | Also, comparing conversions from `X` to `Z3` and to `W`, we find that the | 
|---|
|  | 191 | conversion to `Z3` is cheaper, because `Z3 < W` by the conversion preorder, | 
|---|
|  | 192 | and so is considered to be the nearer type. | 
|---|
|  | 193 | By transitivity, then, the conversion from `X` to `Y2` should be cheaper than | 
|---|
|  | 194 | the conversion from `X` to `W`, but in this case the `X` and `W` are | 
|---|
|  | 195 | incomparable by the conversion preorder, so the tie is broken by the shorter | 
|---|
|  | 196 | path from `X` to `W` in favour of `W`, contradicting the transitivity property | 
|---|
|  | 197 | for this proposed order. | 
|---|
|  | 198 |  | 
|---|
|  | 199 | Without transitivity, we would need to compare all pairs of conversions, which | 
|---|
|  | 200 | would be expensive, and possibly not yield a minimal-cost conversion even if | 
|---|
|  | 201 | all pairs were comparable. | 
|---|
|  | 202 | In short, this ordering is infeasible, and by extension I believe any ordering | 
|---|
|  | 203 | composed solely of single-step conversions between types with no further | 
|---|
|  | 204 | user-supplied information will be insufficiently powerful to express the | 
|---|
|  | 205 | built-in conversions between C's types. | 
|---|