Context Navigation

source: doc/proposals/user_conversions.md@ d807ca28

Visit:

ADT aaron-thesis arm-eh ast-experimental cleanup-dtors deferred_resn demangler enum forall-pointer-decay jacob/cs343-translation jenkins-sandbox new-ast new-ast-unique-expr new-env no_list persistent-indexer pthread-emulation qualifiedEnum with_gc

Last change on this file since d807ca28 was 41634098, checked in by Aaron Moss <a3moss@…>, 8 years ago
Added white paper on user-defined conversions based on resolver design doc
Property mode set to `100644`
File size: 11.3 KB

Line
1	## User-defined Conversions ##
2	C's implicit "usual arithmetic conversions" define a structure among the
3	built-in types consisting of _unsafe_ narrowing conversions and a hierarchy of
4	_safe_ widening conversions.
5	There is also a set of _explicit_ conversions that are only allowed through a
6	cast expression.
7	Based on Glen's notes on conversions [1], I propose that safe and unsafe
8	conversions be expressed as constructor variants, though I make explicit
9	(cast) conversions a constructor variant as well rather than a dedicated
10	operator.
11	Throughout this article, I will use the following operator names for
12	constructors and conversion functions from `From` to `To`:
13
14	void ?{} ( To*, To ); // copy constructor
15	void ?{} ( To*, From ); // explicit constructor
16	void ?{explicit} ( To*, From ); // explicit cast conversion
17	void ?{safe} ( To*, From ); // implicit safe conversion
18	void ?{unsafe} ( To*, From ); // implicit unsafe conversion
19
20	[1] http://plg.uwaterloo.ca/~cforall/Conversions/index.html
21
22	Glen's design made no distinction between constructors and unsafe implicit
23	conversions; this is elegant, but interacts poorly with tuples.
24	Essentially, without making this distinction, a constructor like the following
25	would add an interpretation of any two `int`s as a `Coord`, needlessly
26	multiplying the space of possible interpretations of all functions:
27
28	void ?{}( Coord *this, int x, int y );
29
30	That said, it would certainly be possible to make a multiple-argument implicit
31	conversion, as below, though the argument above suggests this option should be
32	used infrequently:
33
34	void ?{unsafe}( Coord *this, int x, int y );
35
36	An alternate possibility would be to only count two-arg constructors
37	`void ?{} ( To*, From )` as unsafe conversions; under this semantics, safe and
38	explicit conversions should also have a compiler-enforced restriction to
39	ensure that they are two-arg functions (this restriction may be valuable
40	regardless).
41
42	Regardless of syntax, there should be a type assertion that expresses `From`
43	is convertable to `To`.
44	If user-defined conversions are not added to the language,
45	`void ?{} ( To*, From )` may be a suitable representation, relying on
46	conversions on the argument types to account for transitivity.
47	On the other hand, `To*` should perhaps match its target type exactly, so
48	another assertion syntax specific to conversions may be required, e.g.
49	`From -> To`.
50
51	### Constructor Idiom ###
52	Basing our notion of conversions off otherwise normal Cforall functions means
53	that we can use the full range of Cforall features for conversions, including
54	polymorphism.
55	Glen [1] defines a _constructor idiom_ that can be used to create chains of
56	safe conversions without duplicating code; given a type `Safe` which members
57	of another type `From` can be directly converted to, the constructor idiom
58	allows us to write a conversion for any type `To` which `Safe` converts to:
59
60	forall(otype To \| { void ?{safe}( To*, Safe ) })
61	void ?{safe}( To *this, From that ) {
62	Safe tmp = /* some expression involving that */;
63	*this = tmp; // uses assertion parameter
64	}
65
66	This idiom can also be used with only minor variations for a parallel set of
67	unsafe conversions.
68
69	What selective non-use of the constructor idiom gives us is the ability to
70	define a conversion that may only be the last conversion in a chain of such.
71	Constructing a conversion graph able to unambiguously represent the full
72	hierarchy of implicit conversions in C is provably impossible using only
73	single-step conversions with no additional information (see Appendix A), but
74	this mechanism is sufficiently powerful (see [1], though the design there has
75	some minor bugs; the general idea is to use the constructor idiom to define
76	two chains of conversions, one among the signed integral types, another among
77	the unsigned, and to use monomorphic conversions to allow conversions between
78	signed and unsigned integer types).
79
80	### Appendix A: Partial and Total Orders ###
81	The `<=` relation on integers is a commonly known _total order_, and
82	intuitions based on how it works generally apply well to other total orders.
83	Formally, a total order is some binary relation `<=` over a set `S` such that
84	for any two members `a` and `b` of `S`, `a <= b` or `b <= a` (if both, `a` and
85	`b` must be equal, the _antisymmetry_ property); total orders also have a
86	_transitivity_ property, that if `a <= b` and `b <= c`, then `a <= c`.
87	If `a` and `b` are distinct elements and `a <= b`, we may write `a < b`.
88
89	A _partial order_ is a generalization of this concept where the `<=` relation
90	is not required to be defined over all pairs of elements in `S` (though there
91	is a _reflexivity_ requirement that for all `a` in `S`, `a <= a`); in other
92	words, it is possible for two elements `a` and `b` of `S` to be
93	_incomparable_, unable to be ordered with respect to one another (any `a` and
94	`b` for which either `a <= b` or `b <= a` are called _comparable_).
95	Antisymmetry and transitivity are also required for a partial order, so all
96	total orders are also partial orders by definition.
97	One fairly natural partial order is the "subset of" relation over sets from
98	the same universe; `{ }` is a subset of both `{ 1 }` and `{ 2 }`, which are
99	both subsets of `{ 1, 2 }`, but neither `{ 1 }` nor `{ 2 }` is a subset of the
100	other - they are incomparable under this relation.
101
102	We can compose two (or more) partial orders to produce a new partial order on
103	tuples drawn from both (or all the) sets.
104	For example, given `a` and `c` from set `S` and `b` and `d` from set `R`,
105	where both `S` and `R` both have partial orders defined on them, we can define
106	a ordering relation between `(a, b)` and `(c, d)`.
107	One common order is the _lexicographical order_, where `(a, b) <= (c, d)` iff
108	`a < c` or both `a = c` and `b <= d`; this can be thought of as ordering by
109	the first set and "breaking ties" by the second set.
110	Another common order is the _product order_, which can be roughly thought of
111	as "all the components are ordered the same way"; formally `(a, b) <= (c, d)`
112	iff `a <= c` and `b <= d`.
113	One difference between the lexicographical order and the product order is that
114	in the lexicographical order if both `a` and `c` and `b` and `d` are
115	comparable then `(a, b)` and `(c, d)` will be comparable, while in the product
116	order you can have `a <= c` and `d <= b` (both comparable) which will make
117	`(a, b)` and `(c, d)` incomparable.
118	The product order, on the other hand, has the benefit of not prioritizing one
119	order over the other.
120
121	Any partial order has a natural representation as a directed acyclic graph
122	(DAG).
123	Each element `a` of the set becomes a node of the DAG, with an arc pointing to
124	its _covering_ elements, any element `b` such that `a < b` but where there is
125	no `c` such that `a < c` and `c < b`.
126	Intuitively, the covering elements are the "next ones larger", where you can't
127	fit another element between the two.
128	Under this construction, `a < b` is equivalent to "there is a path from `a` to
129	`b` in the DAG", and the lack of cycles in the directed graph is ensured by
130	the antisymmetry property of the partial order.
131
132	Partial orders can be generalized to _preorders_ by removing the antisymmetry
133	property.
134	In a preorder the relation is generally called `<~`, and it is possible for
135	two distict elements `a` and `b` to have `a <~ b` and `b <~ a` - in this case
136	we write `a ~ b`; `a <~ b` and not `a ~ b` is written `a < b`.
137	Preorders may also be represented as directed graphs, but in this case the
138	graph may contain cycles.
139
140	### Appendix B: Building a Conversion Graph from Un-annotated Single Steps ###
141	The short answer is that it's impossible.
142
143	The longer answer is that it has to do with what's essentially a diamond
144	inheritance problem.
145	In C, `int` converts to `unsigned int` and also `long` "safely"; both convert
146	to `unsigned long` safely, and it's possible to chain the conversions to
147	convert `int` to `unsigned long`.
148	There are two constraints here; one is that the `int` to `unsigned long`
149	conversion needs to cost more than the other two (because the types aren't as
150	"close" in a very intuitive fashion), and the other is that the system needs a
151	way to choose which path to take to get to the destination type.
152	Now, a fairly natural solution for this would be to just say "C knows how to
153	convert from `int` to `unsigned long`, so we just put in a direct conversion
154	and make the compiler smart enough to figure out the costs" - this is the
155	approach taken by the existing compipler, but given that in a user-defined
156	conversion proposal the users can build an arbitrary graph of conversions,
157	this case still needs to be handled.
158
159	We can define a preorder over the types by saying that `a <~ b` if there
160	exists a chain of conversions from `a` to `b` (see Appendix A for description
161	of preorders and related constructs).
162	This preorder corresponds roughly to a more usual type-theoretic concept of
163	subtyping ("if I can convert `a` to `b`, `a` is a more specific type than
164	`b`"); however, since this graph is arbitrary, it may contain cycles, so if
165	there is also a path to convert `b` to `a` they are in some sense equivalently
166	specific.
167
168	Now, to compare the cost of two conversion chains `(s, x1, x2, ... xn)` and
169	`(s, y1, y2, ... ym)`, we have both the length of the chains (`n` versus `m`)
170	and this conversion preorder over the destination types `xn` and `ym`.
171	We could define a preorder by taking chain length and breaking ties by the
172	conversion preorder, but this would lead to unexpected behaviour when closing
173	diamonds with an arm length of longer than 1.
174	Consider a set of types `A`, `B1`, `B2`, `C` with the arcs `A->B1`, `B1->B2`,
175	`B2->C`, and `A->C`.
176	If we are comparing conversions from `A` to both `B2` and `C`, we expect the
177	conversion to `B2` to be chosen because it's the more specific type under the
178	conversion preorder, but since its chain length is longer than the conversion
179	to `C`, it loses and `C` is chosen.
180	However, taking the conversion preorder and breaking ties or ambiguities by
181	chain length also doesn't work, because of cases like the following example
182	where the transitivity property is broken and we can't find a global maximum:
183
184	`X->Y1->Y2`, `X->Z1->Z2->Z3->W`, `X->W`
185
186	In this set of arcs, if we're comparing conversions from `X` to each of `Y2`,
187	`Z3` and `W`, converting to `Y2` is cheaper than converting to `Z3`, because
188	there are no conversions between `Y2` and `Z3`, and `Y2` has the shorter chain
189	length.
190	Also, comparing conversions from `X` to `Z3` and to `W`, we find that the
191	conversion to `Z3` is cheaper, because `Z3 < W` by the conversion preorder,
192	and so is considered to be the nearer type.
193	By transitivity, then, the conversion from `X` to `Y2` should be cheaper than
194	the conversion from `X` to `W`, but in this case the `X` and `W` are
195	incomparable by the conversion preorder, so the tie is broken by the shorter
196	path from `X` to `W` in favour of `W`, contradicting the transitivity property
197	for this proposed order.
198
199	Without transitivity, we would need to compare all pairs of conversions, which
200	would be expensive, and possibly not yield a minimal-cost conversion even if
201	all pairs were comparable.
202	In short, this ordering is infeasible, and by extension I believe any ordering
203	composed solely of single-step conversions between types with no further
204	user-supplied information will be insufficiently powerful to express the
205	built-in conversions between C's types.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: