Context Navigation

source: doc/proposals/user_conversions.md @ df2e00f

Last change on this file since df2e00f was 6eb131c, checked in by Aaron Moss <a3moss@…>, 6 years ago

Proposal documents for user-defined conversions

pull Glen's old conversions document into working docs
update user_conversions.md to reference new location

Property mode set to 100644

File size: 12.7 KB

Rev	Line
[41634098]	1	## User-defined Conversions ##
	2	C's implicit "usual arithmetic conversions" define a structure among the
	3	built-in types consisting of _unsafe_ narrowing conversions and a hierarchy of
	4	_safe_ widening conversions.
	5	There is also a set of _explicit_ conversions that are only allowed through a
	6	cast expression.
[6eb131c]	7	I propose that safe, unsafe, and explicit (cast) conversions be expressed as
	8	constructor variants.
[41634098]	9	Throughout this article, I will use the following operator names for
	10	constructors and conversion functions from `From` to `To`:
	11
[6eb131c]	12	void ?{} ( To&, To ); // copy constructor
	13	void ?{} ( To&, From ); // explicit constructor
	14	void ?{explicit} ( To&, From ); // explicit cast conversion
	15	void ?{safe} ( To&, From ); // implicit safe conversion
	16	void ?{unsafe} ( To&, From ); // implicit unsafe conversion
[41634098]	17
[6eb131c]	18	It has been suggested that all constructors would define unsafe implicit
[41634098]	19	conversions; this is elegant, but interacts poorly with tuples.
	20	Essentially, without making this distinction, a constructor like the following
	21	would add an interpretation of any two `int`s as a `Coord`, needlessly
	22	multiplying the space of possible interpretations of all functions:
	23
[6eb131c]	24	void ?{}( Coord& this, int x, int y );
[41634098]	25
	26	That said, it would certainly be possible to make a multiple-argument implicit
	27	conversion, as below, though the argument above suggests this option should be
	28	used infrequently:
	29
[6eb131c]	30	void ?{unsafe}( Coord& this, int x, int y );
[41634098]	31
	32	An alternate possibility would be to only count two-arg constructors
[6eb131c]	33	`void ?{} ( To&, From )` as unsafe conversions; under this semantics, safe and
[41634098]	34	explicit conversions should also have a compiler-enforced restriction to
	35	ensure that they are two-arg functions (this restriction may be valuable
	36	regardless).
	37
	38	Regardless of syntax, there should be a type assertion that expresses `From`
	39	is convertable to `To`.
	40	If user-defined conversions are not added to the language,
[6eb131c]	41	`void ?{} ( To&, From )` may be a suitable representation, relying on
[41634098]	42	conversions on the argument types to account for transitivity.
[6eb131c]	43	Since `To&` should be an exact match on `To`, this should put all the implicit
	44	conversions on the RHS.
	45	On the other hand, under some models (like [1]), implicit conversions are not
	46	allowed in assertion parameters, so another assertion syntax specific to
	47	conversions may be required, e.g. `From -> To`.
	48	It has also been suggested that, for programmer control, no implicit
	49	conversions (except, possibly, for polymorphic specialization) should be
	50	allowed in resolution of cast operators.
	51
	52	[1] ../working/assertion_resolution.md
[41634098]	53
	54	### Constructor Idiom ###
	55	Basing our notion of conversions off otherwise normal Cforall functions means
	56	that we can use the full range of Cforall features for conversions, including
	57	polymorphism.
[6eb131c]	58	In an earlier version of this proposal, Glen Ditchfield defines a
	59	_constructor idiom_ that can be used to create chains of safe conversions
	60	without duplicating code; given a type `Safe` which members of another type
	61	`From` can be directly converted to, the constructor idiom allows us to write
	62	a conversion for any type `To` which `Safe` converts to:
	63
	64	forall(otype To \| { void ?{safe}( To&, Safe ) })
	65	void ?{safe}( To& this, From that ) {
[41634098]	66	Safe tmp = /* some expression involving that */;
[6eb131c]	67	this{ tmp }; // initialize from assertion parameter
[41634098]	68	}
	69
	70	This idiom can also be used with only minor variations for a parallel set of
	71	unsafe conversions.
	72
[6eb131c]	73	Glen's original suggestion said the copy constructor for `To` should also be
	74	accepted as a resolution for `void ?{safe}( To&, Safe )` (`Safe` == `To`),
	75	allowing this same code to be used for the single-step conversion as well.
	76	This proposal does come at the cost of an extra copy initialization of the
	77	target value, though.
	78
	79	Contrariwise, if a monomorphic conversion from `From` to `Safe` is written,
	80	e.g:
	81
	82	void ?{safe}( Safe& this, From that ) {
	83	this{ /* some parameters involving that */ };
	84	}
	85
	86	Then the code for a transitive conversion from `From` to any `To` type
	87	convertable from `Safe` is written:
	88
	89	forall(otype To \| { void ?{safe}( To&, Safe ) })
	90	void ?{safe}( To& this, From that ) {
	91	Safe tmp = that; // uses monomorphic conversion
	92	this{ tmp }; // initialize from assertion parameter
	93	}
	94
	95	Given the entirely-boilerplate nature of this code, but negative performance
	96	implications of the unmodified constructor idiom, it might be fruitful to have
	97	transitive and single step conversion operators, and let CFA build the
	98	transitive conversions; some possible names:
	99
	100	void ?{safe} (To&, From); void ?{final safe} (To&, From); // single-step
	101	void ?{safe*} (To&, From); void ?{safe} (To&, From); // transitive
	102
[41634098]	103	What selective non-use of the constructor idiom gives us is the ability to
	104	define a conversion that may only be the last conversion in a chain of such.
[6eb131c]	105	One use for this is to solve the problem that `explicit` conversions were
	106	added to C++ for, that of conversions to `bool` chaining to become conversions
	107	to any arithmetic type.
	108	Another use is to unambiguously represent the full hierarchy of implicit
	109	conversions in C by making sign conversions non-transitive, allowing the
	110	compiler to resolve e.g. `int -> unsigned long` as
	111	`int -> long -> unsigned long` over `int -> unsigned int -> unsigned long`.
	112	See [2] for more details.
	113
	114	[2] ../working/glen_conversions/index.html#usual
[41634098]	115
	116	### Appendix A: Partial and Total Orders ###
	117	The `<=` relation on integers is a commonly known _total order_, and
	118	intuitions based on how it works generally apply well to other total orders.
	119	Formally, a total order is some binary relation `<=` over a set `S` such that
	120	for any two members `a` and `b` of `S`, `a <= b` or `b <= a` (if both, `a` and
	121	`b` must be equal, the _antisymmetry_ property); total orders also have a
	122	_transitivity_ property, that if `a <= b` and `b <= c`, then `a <= c`.
	123	If `a` and `b` are distinct elements and `a <= b`, we may write `a < b`.
	124
	125	A _partial order_ is a generalization of this concept where the `<=` relation
	126	is not required to be defined over all pairs of elements in `S` (though there
	127	is a _reflexivity_ requirement that for all `a` in `S`, `a <= a`); in other
	128	words, it is possible for two elements `a` and `b` of `S` to be
	129	_incomparable_, unable to be ordered with respect to one another (any `a` and
	130	`b` for which either `a <= b` or `b <= a` are called _comparable_).
	131	Antisymmetry and transitivity are also required for a partial order, so all
	132	total orders are also partial orders by definition.
	133	One fairly natural partial order is the "subset of" relation over sets from
	134	the same universe; `{ }` is a subset of both `{ 1 }` and `{ 2 }`, which are
	135	both subsets of `{ 1, 2 }`, but neither `{ 1 }` nor `{ 2 }` is a subset of the
	136	other - they are incomparable under this relation.
	137
	138	We can compose two (or more) partial orders to produce a new partial order on
	139	tuples drawn from both (or all the) sets.
	140	For example, given `a` and `c` from set `S` and `b` and `d` from set `R`,
	141	where both `S` and `R` both have partial orders defined on them, we can define
	142	a ordering relation between `(a, b)` and `(c, d)`.
	143	One common order is the _lexicographical order_, where `(a, b) <= (c, d)` iff
	144	`a < c` or both `a = c` and `b <= d`; this can be thought of as ordering by
	145	the first set and "breaking ties" by the second set.
	146	Another common order is the _product order_, which can be roughly thought of
	147	as "all the components are ordered the same way"; formally `(a, b) <= (c, d)`
	148	iff `a <= c` and `b <= d`.
	149	One difference between the lexicographical order and the product order is that
	150	in the lexicographical order if both `a` and `c` and `b` and `d` are
	151	comparable then `(a, b)` and `(c, d)` will be comparable, while in the product
	152	order you can have `a <= c` and `d <= b` (both comparable) which will make
	153	`(a, b)` and `(c, d)` incomparable.
	154	The product order, on the other hand, has the benefit of not prioritizing one
	155	order over the other.
	156
	157	Any partial order has a natural representation as a directed acyclic graph
	158	(DAG).
	159	Each element `a` of the set becomes a node of the DAG, with an arc pointing to
	160	its _covering_ elements, any element `b` such that `a < b` but where there is
	161	no `c` such that `a < c` and `c < b`.
	162	Intuitively, the covering elements are the "next ones larger", where you can't
	163	fit another element between the two.
	164	Under this construction, `a < b` is equivalent to "there is a path from `a` to
	165	`b` in the DAG", and the lack of cycles in the directed graph is ensured by
	166	the antisymmetry property of the partial order.
	167
	168	Partial orders can be generalized to _preorders_ by removing the antisymmetry
	169	property.
	170	In a preorder the relation is generally called `<~`, and it is possible for
	171	two distict elements `a` and `b` to have `a <~ b` and `b <~ a` - in this case
	172	we write `a ~ b`; `a <~ b` and not `a ~ b` is written `a < b`.
	173	Preorders may also be represented as directed graphs, but in this case the
	174	graph may contain cycles.
	175
	176	### Appendix B: Building a Conversion Graph from Un-annotated Single Steps ###
	177	The short answer is that it's impossible.
	178
	179	The longer answer is that it has to do with what's essentially a diamond
	180	inheritance problem.
	181	In C, `int` converts to `unsigned int` and also `long` "safely"; both convert
	182	to `unsigned long` safely, and it's possible to chain the conversions to
	183	convert `int` to `unsigned long`.
	184	There are two constraints here; one is that the `int` to `unsigned long`
	185	conversion needs to cost more than the other two (because the types aren't as
	186	"close" in a very intuitive fashion), and the other is that the system needs a
	187	way to choose which path to take to get to the destination type.
	188	Now, a fairly natural solution for this would be to just say "C knows how to
	189	convert from `int` to `unsigned long`, so we just put in a direct conversion
	190	and make the compiler smart enough to figure out the costs" - this is the
[6eb131c]	191	approach taken by the existing compiler, but given that in a user-defined
[41634098]	192	conversion proposal the users can build an arbitrary graph of conversions,
	193	this case still needs to be handled.
	194
	195	We can define a preorder over the types by saying that `a <~ b` if there
	196	exists a chain of conversions from `a` to `b` (see Appendix A for description
	197	of preorders and related constructs).
[6eb131c]	198	This preorder roughly corresponds to a more usual type-theoretic concept of
[41634098]	199	subtyping ("if I can convert `a` to `b`, `a` is a more specific type than
	200	`b`"); however, since this graph is arbitrary, it may contain cycles, so if
	201	there is also a path to convert `b` to `a` they are in some sense equivalently
	202	specific.
	203
	204	Now, to compare the cost of two conversion chains `(s, x1, x2, ... xn)` and
	205	`(s, y1, y2, ... ym)`, we have both the length of the chains (`n` versus `m`)
	206	and this conversion preorder over the destination types `xn` and `ym`.
	207	We could define a preorder by taking chain length and breaking ties by the
	208	conversion preorder, but this would lead to unexpected behaviour when closing
	209	diamonds with an arm length of longer than 1.
	210	Consider a set of types `A`, `B1`, `B2`, `C` with the arcs `A->B1`, `B1->B2`,
	211	`B2->C`, and `A->C`.
	212	If we are comparing conversions from `A` to both `B2` and `C`, we expect the
	213	conversion to `B2` to be chosen because it's the more specific type under the
	214	conversion preorder, but since its chain length is longer than the conversion
	215	to `C`, it loses and `C` is chosen.
	216	However, taking the conversion preorder and breaking ties or ambiguities by
	217	chain length also doesn't work, because of cases like the following example
	218	where the transitivity property is broken and we can't find a global maximum:
	219
	220	`X->Y1->Y2`, `X->Z1->Z2->Z3->W`, `X->W`
	221
	222	In this set of arcs, if we're comparing conversions from `X` to each of `Y2`,
	223	`Z3` and `W`, converting to `Y2` is cheaper than converting to `Z3`, because
	224	there are no conversions between `Y2` and `Z3`, and `Y2` has the shorter chain
	225	length.
	226	Also, comparing conversions from `X` to `Z3` and to `W`, we find that the
	227	conversion to `Z3` is cheaper, because `Z3 < W` by the conversion preorder,
	228	and so is considered to be the nearer type.
	229	By transitivity, then, the conversion from `X` to `Y2` should be cheaper than
[6eb131c]	230	the conversion from `X` to `W`, but in this case the `Y2` and `W` are
[41634098]	231	incomparable by the conversion preorder, so the tie is broken by the shorter
	232	path from `X` to `W` in favour of `W`, contradicting the transitivity property
	233	for this proposed order.
	234
	235	Without transitivity, we would need to compare all pairs of conversions, which
	236	would be expensive, and possibly not yield a minimal-cost conversion even if
	237	all pairs were comparable.
	238	In short, this ordering is infeasible, and by extension I believe any ordering
	239	composed solely of single-step conversions between types with no further
	240	user-supplied information will be insufficiently powerful to express the
	241	built-in conversions between C's types.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: