source: doc/proposals/user_conversions.md@ 0a616e0

ADT aaron-thesis arm-eh ast-experimental cleanup-dtors enum forall-pointer-decay jacob/cs343-translation jenkins-sandbox new-ast new-ast-unique-expr pthread-emulation qualifiedEnum
Last change on this file since 0a616e0 was 6eb131c, checked in by Aaron Moss <a3moss@…>, 7 years ago

Proposal documents for user-defined conversions

  • pull Glen's old conversions document into working docs
  • update user_conversions.md to reference new location
  • Property mode set to 100644
File size: 12.7 KB
Line 
1## User-defined Conversions ##
2C's implicit "usual arithmetic conversions" define a structure among the
3built-in types consisting of _unsafe_ narrowing conversions and a hierarchy of
4_safe_ widening conversions.
5There is also a set of _explicit_ conversions that are only allowed through a
6cast expression.
7I propose that safe, unsafe, and explicit (cast) conversions be expressed as
8constructor variants.
9Throughout this article, I will use the following operator names for
10constructors and conversion functions from `From` to `To`:
11
12 void ?{} ( To&, To ); // copy constructor
13 void ?{} ( To&, From ); // explicit constructor
14 void ?{explicit} ( To&, From ); // explicit cast conversion
15 void ?{safe} ( To&, From ); // implicit safe conversion
16 void ?{unsafe} ( To&, From ); // implicit unsafe conversion
17
18It has been suggested that all constructors would define unsafe implicit
19conversions; this is elegant, but interacts poorly with tuples.
20Essentially, without making this distinction, a constructor like the following
21would add an interpretation of any two `int`s as a `Coord`, needlessly
22multiplying the space of possible interpretations of all functions:
23
24 void ?{}( Coord& this, int x, int y );
25
26That said, it would certainly be possible to make a multiple-argument implicit
27conversion, as below, though the argument above suggests this option should be
28used infrequently:
29
30 void ?{unsafe}( Coord& this, int x, int y );
31
32An alternate possibility would be to only count two-arg constructors
33`void ?{} ( To&, From )` as unsafe conversions; under this semantics, safe and
34explicit conversions should also have a compiler-enforced restriction to
35ensure that they are two-arg functions (this restriction may be valuable
36regardless).
37
38Regardless of syntax, there should be a type assertion that expresses `From`
39is convertable to `To`.
40If user-defined conversions are not added to the language,
41`void ?{} ( To&, From )` may be a suitable representation, relying on
42conversions on the argument types to account for transitivity.
43Since `To&` should be an exact match on `To`, this should put all the implicit
44conversions on the RHS.
45On the other hand, under some models (like [1]), implicit conversions are not
46allowed in assertion parameters, so another assertion syntax specific to
47conversions may be required, e.g. `From -> To`.
48It has also been suggested that, for programmer control, no implicit
49conversions (except, possibly, for polymorphic specialization) should be
50allowed in resolution of cast operators.
51
52[1] ../working/assertion_resolution.md
53
54### Constructor Idiom ###
55Basing our notion of conversions off otherwise normal Cforall functions means
56that we can use the full range of Cforall features for conversions, including
57polymorphism.
58In an earlier version of this proposal, Glen Ditchfield defines a
59_constructor idiom_ that can be used to create chains of safe conversions
60without duplicating code; given a type `Safe` which members of another type
61`From` can be directly converted to, the constructor idiom allows us to write
62a conversion for any type `To` which `Safe` converts to:
63
64 forall(otype To | { void ?{safe}( To&, Safe ) })
65 void ?{safe}( To& this, From that ) {
66 Safe tmp = /* some expression involving that */;
67 this{ tmp }; // initialize from assertion parameter
68 }
69
70This idiom can also be used with only minor variations for a parallel set of
71unsafe conversions.
72
73Glen's original suggestion said the copy constructor for `To` should also be
74accepted as a resolution for `void ?{safe}( To&, Safe )` (`Safe` == `To`),
75allowing this same code to be used for the single-step conversion as well.
76This proposal does come at the cost of an extra copy initialization of the
77target value, though.
78
79Contrariwise, if a monomorphic conversion from `From` to `Safe` is written,
80e.g:
81
82 void ?{safe}( Safe& this, From that ) {
83 this{ /* some parameters involving that */ };
84 }
85
86Then the code for a transitive conversion from `From` to any `To` type
87convertable from `Safe` is written:
88
89 forall(otype To | { void ?{safe}( To&, Safe ) })
90 void ?{safe}( To& this, From that ) {
91 Safe tmp = that; // uses monomorphic conversion
92 this{ tmp }; // initialize from assertion parameter
93 }
94
95Given the entirely-boilerplate nature of this code, but negative performance
96implications of the unmodified constructor idiom, it might be fruitful to have
97transitive and single step conversion operators, and let CFA build the
98transitive conversions; some possible names:
99
100 void ?{safe} (To&, From); void ?{final safe} (To&, From); // single-step
101 void ?{safe*} (To&, From); void ?{safe} (To&, From); // transitive
102
103What selective non-use of the constructor idiom gives us is the ability to
104define a conversion that may only be the *last* conversion in a chain of such.
105One use for this is to solve the problem that `explicit` conversions were
106added to C++ for, that of conversions to `bool` chaining to become conversions
107to any arithmetic type.
108Another use is to unambiguously represent the full hierarchy of implicit
109conversions in C by making sign conversions non-transitive, allowing the
110compiler to resolve e.g. `int -> unsigned long` as
111`int -> long -> unsigned long` over `int -> unsigned int -> unsigned long`.
112See [2] for more details.
113
114[2] ../working/glen_conversions/index.html#usual
115
116### Appendix A: Partial and Total Orders ###
117The `<=` relation on integers is a commonly known _total order_, and
118intuitions based on how it works generally apply well to other total orders.
119Formally, a total order is some binary relation `<=` over a set `S` such that
120for any two members `a` and `b` of `S`, `a <= b` or `b <= a` (if both, `a` and
121`b` must be equal, the _antisymmetry_ property); total orders also have a
122_transitivity_ property, that if `a <= b` and `b <= c`, then `a <= c`.
123If `a` and `b` are distinct elements and `a <= b`, we may write `a < b`.
124
125A _partial order_ is a generalization of this concept where the `<=` relation
126is not required to be defined over all pairs of elements in `S` (though there
127is a _reflexivity_ requirement that for all `a` in `S`, `a <= a`); in other
128words, it is possible for two elements `a` and `b` of `S` to be
129_incomparable_, unable to be ordered with respect to one another (any `a` and
130`b` for which either `a <= b` or `b <= a` are called _comparable_).
131Antisymmetry and transitivity are also required for a partial order, so all
132total orders are also partial orders by definition.
133One fairly natural partial order is the "subset of" relation over sets from
134the same universe; `{ }` is a subset of both `{ 1 }` and `{ 2 }`, which are
135both subsets of `{ 1, 2 }`, but neither `{ 1 }` nor `{ 2 }` is a subset of the
136other - they are incomparable under this relation.
137
138We can compose two (or more) partial orders to produce a new partial order on
139tuples drawn from both (or all the) sets.
140For example, given `a` and `c` from set `S` and `b` and `d` from set `R`,
141where both `S` and `R` both have partial orders defined on them, we can define
142a ordering relation between `(a, b)` and `(c, d)`.
143One common order is the _lexicographical order_, where `(a, b) <= (c, d)` iff
144`a < c` or both `a = c` and `b <= d`; this can be thought of as ordering by
145the first set and "breaking ties" by the second set.
146Another common order is the _product order_, which can be roughly thought of
147as "all the components are ordered the same way"; formally `(a, b) <= (c, d)`
148iff `a <= c` and `b <= d`.
149One difference between the lexicographical order and the product order is that
150in the lexicographical order if both `a` and `c` and `b` and `d` are
151comparable then `(a, b)` and `(c, d)` will be comparable, while in the product
152order you can have `a <= c` and `d <= b` (both comparable) which will make
153`(a, b)` and `(c, d)` incomparable.
154The product order, on the other hand, has the benefit of not prioritizing one
155order over the other.
156
157Any partial order has a natural representation as a directed acyclic graph
158(DAG).
159Each element `a` of the set becomes a node of the DAG, with an arc pointing to
160its _covering_ elements, any element `b` such that `a < b` but where there is
161no `c` such that `a < c` and `c < b`.
162Intuitively, the covering elements are the "next ones larger", where you can't
163fit another element between the two.
164Under this construction, `a < b` is equivalent to "there is a path from `a` to
165`b` in the DAG", and the lack of cycles in the directed graph is ensured by
166the antisymmetry property of the partial order.
167
168Partial orders can be generalized to _preorders_ by removing the antisymmetry
169property.
170In a preorder the relation is generally called `<~`, and it is possible for
171two distict elements `a` and `b` to have `a <~ b` and `b <~ a` - in this case
172we write `a ~ b`; `a <~ b` and not `a ~ b` is written `a < b`.
173Preorders may also be represented as directed graphs, but in this case the
174graph may contain cycles.
175
176### Appendix B: Building a Conversion Graph from Un-annotated Single Steps ###
177The short answer is that it's impossible.
178
179The longer answer is that it has to do with what's essentially a diamond
180inheritance problem.
181In C, `int` converts to `unsigned int` and also `long` "safely"; both convert
182to `unsigned long` safely, and it's possible to chain the conversions to
183convert `int` to `unsigned long`.
184There are two constraints here; one is that the `int` to `unsigned long`
185conversion needs to cost more than the other two (because the types aren't as
186"close" in a very intuitive fashion), and the other is that the system needs a
187way to choose which path to take to get to the destination type.
188Now, a fairly natural solution for this would be to just say "C knows how to
189convert from `int` to `unsigned long`, so we just put in a direct conversion
190and make the compiler smart enough to figure out the costs" - this is the
191approach taken by the existing compiler, but given that in a user-defined
192conversion proposal the users can build an arbitrary graph of conversions,
193this case still needs to be handled.
194
195We can define a preorder over the types by saying that `a <~ b` if there
196exists a chain of conversions from `a` to `b` (see Appendix A for description
197of preorders and related constructs).
198This preorder roughly corresponds to a more usual type-theoretic concept of
199subtyping ("if I can convert `a` to `b`, `a` is a more specific type than
200`b`"); however, since this graph is arbitrary, it may contain cycles, so if
201there is also a path to convert `b` to `a` they are in some sense equivalently
202specific.
203
204Now, to compare the cost of two conversion chains `(s, x1, x2, ... xn)` and
205`(s, y1, y2, ... ym)`, we have both the length of the chains (`n` versus `m`)
206and this conversion preorder over the destination types `xn` and `ym`.
207We could define a preorder by taking chain length and breaking ties by the
208conversion preorder, but this would lead to unexpected behaviour when closing
209diamonds with an arm length of longer than 1.
210Consider a set of types `A`, `B1`, `B2`, `C` with the arcs `A->B1`, `B1->B2`,
211`B2->C`, and `A->C`.
212If we are comparing conversions from `A` to both `B2` and `C`, we expect the
213conversion to `B2` to be chosen because it's the more specific type under the
214conversion preorder, but since its chain length is longer than the conversion
215to `C`, it loses and `C` is chosen.
216However, taking the conversion preorder and breaking ties or ambiguities by
217chain length also doesn't work, because of cases like the following example
218where the transitivity property is broken and we can't find a global maximum:
219
220 `X->Y1->Y2`, `X->Z1->Z2->Z3->W`, `X->W`
221
222In this set of arcs, if we're comparing conversions from `X` to each of `Y2`,
223`Z3` and `W`, converting to `Y2` is cheaper than converting to `Z3`, because
224there are no conversions between `Y2` and `Z3`, and `Y2` has the shorter chain
225length.
226Also, comparing conversions from `X` to `Z3` and to `W`, we find that the
227conversion to `Z3` is cheaper, because `Z3 < W` by the conversion preorder,
228and so is considered to be the nearer type.
229By transitivity, then, the conversion from `X` to `Y2` should be cheaper than
230the conversion from `X` to `W`, but in this case the `Y2` and `W` are
231incomparable by the conversion preorder, so the tie is broken by the shorter
232path from `X` to `W` in favour of `W`, contradicting the transitivity property
233for this proposed order.
234
235Without transitivity, we would need to compare all pairs of conversions, which
236would be expensive, and possibly not yield a minimal-cost conversion even if
237all pairs were comparable.
238In short, this ordering is infeasible, and by extension I believe any ordering
239composed solely of single-step conversions between types with no further
240user-supplied information will be insufficiently powerful to express the
241built-in conversions between C's types.
Note: See TracBrowser for help on using the repository browser.