## User-defined Conversions ##
C's implicit "usual arithmetic conversions" define a structure among the
built-in types consisting of _unsafe_ narrowing conversions and a hierarchy of
_safe_ widening conversions.
There is also a set of _explicit_ conversions that are only allowed through a
cast expression.
I propose that safe, unsafe, and explicit (cast) conversions be expressed as
constructor variants.
Throughout this article, I will use the following operator names for
constructors and conversion functions from `From` to `To`:
    void ?{} ( To&, To );            // copy constructor
    void ?{} ( To&, From );          // explicit constructor
    void ?{explicit} ( To&, From );  // explicit cast conversion
    void ?{safe} ( To&, From );      // implicit safe conversion
    void ?{unsafe} ( To&, From );    // implicit unsafe conversion
It has been suggested that all constructors would define unsafe implicit
conversions; this is elegant, but interacts poorly with tuples.
Essentially, without making this distinction, a constructor like the following
would add an interpretation of any two `int`s as a `Coord`, needlessly
multiplying the space of possible interpretations of all functions:
    void ?{}( Coord& this, int x, int y );
That said, it would certainly be possible to make a multiple-argument implicit
conversion, as below, though the argument above suggests this option should be
used infrequently:
    void ?{unsafe}( Coord& this, int x, int y );
An alternate possibility would be to only count two-arg constructors
`void ?{} ( To&, From )` as unsafe conversions; under this semantics, safe and
explicit conversions should also have a compiler-enforced restriction to
ensure that they are two-arg functions (this restriction may be valuable
regardless).
Regardless of syntax, there should be a type assertion that expresses that
`From` is convertible to `To`.
If user-defined conversions are not added to the language,
`void ?{} ( To&, From )` may be a suitable representation, relying on
conversions on the argument types to account for transitivity.
Since `To&` should be an exact match on `To`, this should put all the implicit
conversions on the RHS.
On the other hand, under some models (like [1]), implicit conversions are not
allowed in assertion parameters, so another assertion syntax specific to
conversions may be required, e.g. `From -> To`.
It has also been suggested that, for programmer control, no implicit
conversions (except, possibly, for polymorphic specialization) should be
allowed in resolution of cast operators.
[1] ../working/assertion_resolution.md
### Constructor Idiom ###
Basing our notion of conversions off otherwise normal Cforall functions means
that we can use the full range of Cforall features for conversions, including
polymorphism.
In an earlier version of this proposal, Glen Ditchfield defines a
_constructor idiom_ that can be used to create chains of safe conversions
without duplicating code; given a type `Safe` to which another type `From`
can be directly converted, the constructor idiom allows us to write a
conversion to any type `To` which `Safe` converts to:
    forall(otype To | { void ?{safe}( To&, Safe ) })
    void ?{safe}( To& this, From that ) {
        Safe tmp = /* some expression involving that */;
        this{ tmp };  // initialize from assertion parameter
    }
This idiom can also be used with only minor variations for a parallel set of
unsafe conversions.
Glen's original suggestion said the copy constructor for `To` should also be
accepted as a resolution for `void ?{safe}( To&, Safe )` (`Safe` == `To`),
allowing this same code to be used for the single-step conversion as well.
This proposal does come at the cost of an extra copy initialization of the
target value, though.
Contrariwise, if a monomorphic conversion from `From` to `Safe` is written,
e.g.:

    void ?{safe}( Safe& this, From that ) {
        this{ /* some parameters involving that */ };
    }
Then the code for a transitive conversion from `From` to any `To` type
convertible from `Safe` is written:

    forall(otype To | { void ?{safe}( To&, Safe ) })
    void ?{safe}( To& this, From that ) {
        Safe tmp = that;  // uses monomorphic conversion
        this{ tmp };      // initialize from assertion parameter
    }
Given the entirely-boilerplate nature of this code, and the negative
performance implications of the unmodified constructor idiom, it might be
fruitful to have both transitive and single-step conversion operators, and
let CFA build the transitive conversions; some possible names:
    void ?{safe} (To&, From);   void ?{final safe} (To&, From);  // single-step
    void ?{safe*} (To&, From);  void ?{safe} (To&, From);        // transitive
Selective non-use of the constructor idiom gives us the ability to define a
conversion that may only be the *last* conversion in a chain.
One use for this is to solve the problem that `explicit` conversions were
added to C++ for, that of conversions to `bool` chaining to become conversions
to any arithmetic type.
Another use is to unambiguously represent the full hierarchy of implicit
conversions in C by making sign conversions non-transitive, allowing the
compiler to resolve e.g. `int -> unsigned long` as
`int -> long -> unsigned long` over `int -> unsigned int -> unsigned long`.
See [2] for more details.
[2] ../working/glen_conversions/index.html#usual
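The effect of making sign conversions non-transitive can be sketched in
Python (an illustrative model only, not the Cforall resolver; the arc set
and the `final` flag are assumptions of the sketch). Arcs marked final may
appear only as the last step of a chain, so the path through `unsigned int`
is pruned:

```python
# Model conversion arcs as (from, to, is_final); "final" arcs (here the
# sign conversions) may only appear as the last step of a chain.
ARCS = [
    ("int", "long", False),
    ("long", "long long", False),
    ("unsigned int", "unsigned long", False),
    ("int", "unsigned int", True),    # sign conversion: last step only
    ("long", "unsigned long", True),  # sign conversion: last step only
]

def chains(src, dst):
    """Enumerate every conversion chain from src to dst."""
    results = []
    def walk(node, path):
        if node == dst:
            results.append(path)
            return
        for a, b, final in ARCS:
            # final arcs are usable only when they land on the target type
            if a == node and b not in path and (not final or b == dst):
                walk(b, path + [b])
    walk(src, [src])
    return results

# only int -> long -> unsigned long survives
print(chains("int", "unsigned long"))
```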
### Appendix A: Partial and Total Orders ###
The `<=` relation on integers is a commonly known _total order_, and
intuitions based on how it works generally apply well to other total orders.
Formally, a total order is a binary relation `<=` over a set `S` such that
for any two members `a` and `b` of `S`, `a <= b` or `b <= a` (if both hold,
then `a` and `b` must be equal, the _antisymmetry_ property); total orders
also have a _transitivity_ property: if `a <= b` and `b <= c`, then
`a <= c`.
If `a` and `b` are distinct elements and `a <= b`, we may write `a < b`.
A _partial order_ is a generalization of this concept where the `<=` relation
is not required to be defined over all pairs of elements in `S` (though there
is a _reflexivity_ requirement that for all `a` in `S`, `a <= a`); in other
words, it is possible for two elements `a` and `b` of `S` to be
_incomparable_, unable to be ordered with respect to one another (any `a` and
`b` for which either `a <= b` or `b <= a` are called _comparable_).
Antisymmetry and transitivity are also required for a partial order, so all
total orders are also partial orders by definition.
One fairly natural partial order is the "subset of" relation over sets from
the same universe; `{ }` is a subset of both `{ 1 }` and `{ 2 }`, which are
both subsets of `{ 1, 2 }`, but neither `{ 1 }` nor `{ 2 }` is a subset of the
other - they are incomparable under this relation.
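Python's `frozenset` comparisons implement exactly this subset order, so the
incomparable pair can be checked directly:

```python
empty, a, b, both = frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})

# <= on frozensets is the "subset of" relation
assert empty <= a and empty <= b and a <= both and b <= both

# { 1 } and { 2 } are incomparable: neither is a subset of the other
assert not (a <= b) and not (b <= a)
```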
We can compose two (or more) partial orders to produce a new partial order on
tuples drawn from both (or all the) sets.
For example, given `a` and `c` from set `S` and `b` and `d` from set `R`,
where `S` and `R` each have a partial order defined on them, we can define
an ordering relation between `(a, b)` and `(c, d)`.
One common order is the _lexicographical order_, where `(a, b) <= (c, d)` iff
`a < c` or both `a = c` and `b <= d`; this can be thought of as ordering by
the first set and "breaking ties" by the second set.
Another common order is the _product order_, which can be roughly thought of
as "all the components are ordered the same way"; formally `(a, b) <= (c, d)`
iff `a <= c` and `b <= d`.
One difference between the two is that the lexicographical order is "more
total": if `a` and `c` are comparable and `b` and `d` are comparable, then
`(a, b)` and `(c, d)` are comparable lexicographically, while in the product
order `a <= c` and `d <= b` (both pairs comparable) leaves `(a, b)` and
`(c, d)` incomparable.
The product order, on the other hand, has the benefit of not prioritizing one
order over the other.
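Both compositions are easy to state as predicates; a small Python sketch
makes the difference concrete, using integer pairs so the component orders
are the ordinary `<=`:

```python
def lex_le(p, q):
    """Lexicographical order: order by the first component, break ties by the second."""
    return p[0] < q[0] or (p[0] == q[0] and p[1] <= q[1])

def prod_le(p, q):
    """Product order: every component ordered the same way."""
    return p[0] <= q[0] and p[1] <= q[1]

p, q = (1, 5), (2, 3)
assert lex_le(p, q)                             # comparable lexicographically
assert not prod_le(p, q) and not prod_le(q, p)  # incomparable in the product order
```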
Any partial order has a natural representation as a directed acyclic graph
(DAG).
Each element `a` of the set becomes a node of the DAG, with an arc pointing to
its _covering_ elements, any element `b` such that `a < b` but where there is
no `c` such that `a < c` and `c < b`.
Intuitively, the covering elements are the "next ones larger", where you can't
fit another element between the two.
Under this construction, `a < b` is equivalent to "there is a path from `a` to
`b` in the DAG", and the lack of cycles in the directed graph is ensured by
the antisymmetry property of the partial order.
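The covering relation can be computed directly from its definition; as a
sketch, here it is in Python for the subset order on subsets of `{ 1, 2 }`
(the arcs of the DAG are exactly the covering pairs):

```python
from itertools import combinations

# all subsets of { 1, 2 }
elems = [frozenset(c) for r in range(3) for c in combinations((1, 2), r)]

def lt(a, b):
    return a < b  # strict order: proper subset

# b covers a iff a < b and no c lies strictly between them
covers = {(a, b) for a in elems for b in elems
          if lt(a, b) and not any(lt(a, c) and lt(c, b) for c in elems)}

assert (frozenset(), frozenset({1})) in covers
assert (frozenset({1}), frozenset({1, 2})) in covers
# {} < { 1, 2 }, but { 1, 2 } does not cover {}: { 1 } lies between them
assert (frozenset(), frozenset({1, 2})) not in covers
```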
Partial orders can be generalized to _preorders_ by removing the antisymmetry
property.
In a preorder the relation is generally called `<~`, and it is possible for
two distinct elements `a` and `b` to have `a <~ b` and `b <~ a` - in this
case we write `a ~ b`; `a <~ b` without `a ~ b` is written `a < b`.
Preorders may also be represented as directed graphs, but in this case the
graph may contain cycles.
### Appendix B: Building a Conversion Graph from Un-annotated Single Steps ###
The short answer is that it's impossible.
The longer answer is that it has to do with what's essentially a diamond
inheritance problem.
In C, `int` converts to `unsigned int` and also `long` "safely"; both convert
to `unsigned long` safely, and it's possible to chain the conversions to
convert `int` to `unsigned long`.
There are two constraints here; one is that the `int` to `unsigned long`
conversion needs to cost more than the other two (because the types aren't as
"close" in a very intuitive fashion), and the other is that the system needs a
way to choose which path to take to get to the destination type.
Now, a fairly natural solution would be to say "C knows how to convert from
`int` to `unsigned long`, so we just put in a direct conversion and make the
compiler smart enough to figure out the costs". This is the approach taken
by the existing compiler, but since a user-defined conversion proposal lets
users build an arbitrary graph of conversions, this case still needs to be
handled in general.
We can define a preorder over the types by saying that `a <~ b` if there
exists a chain of conversions from `a` to `b` (see Appendix A for description
of preorders and related constructs).
This preorder roughly corresponds to a more usual type-theoretic concept of
subtyping ("if I can convert `a` to `b`, `a` is a more specific type than
`b`"); however, since this graph is arbitrary, it may contain cycles, so if
there is also a path to convert `b` to `a` they are in some sense equivalently
specific.
Now, to compare the cost of two conversion chains `(s, x1, x2, ... xn)` and
`(s, y1, y2, ... ym)`, we have both the length of the chains (`n` versus `m`)
and this conversion preorder over the destination types `xn` and `ym`.
We could define a preorder by taking chain length and breaking ties by the
conversion preorder, but this would lead to unexpected behaviour when
closing diamonds whose arms are longer than one conversion.
Consider a set of types `A`, `B1`, `B2`, `C` with the arcs `A->B1`, `B1->B2`,
`B2->C`, and `A->C`.
If we are comparing conversions from `A` to both `B2` and `C`, we expect the
conversion to `B2` to be chosen because it's the more specific type under the
conversion preorder, but since its chain length is longer than the conversion
to `C`, it loses and `C` is chosen.
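A shortest-chain computation over this diamond (a Python sketch, using
breadth-first search over the arc set above) shows the length-first ordering
picking `C`:

```python
from collections import deque

ARCS = {"A": ["B1", "C"], "B1": ["B2"], "B2": ["C"]}

def chain_len(src, dst):
    """Length of the shortest conversion chain from src to dst."""
    todo, seen = deque([(src, 0)]), set()
    while todo:
        node, depth = todo.popleft()
        if node == dst:
            return depth
        if node not in seen:
            seen.add(node)
            todo.extend((m, depth + 1) for m in ARCS.get(node, []))
    return None

# length-first ordering prefers C (one step) over the more specific B2 (two steps)
assert chain_len("A", "C") == 1
assert chain_len("A", "B2") == 2
```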
However, taking the conversion preorder and breaking ties or ambiguities by
chain length also doesn't work, because of cases like the following example
where the transitivity property is broken and we can't find a global maximum:
`X->Y1->Y2`, `X->Z1->Z2->Z3->W`, `X->W`
In this set of arcs, if we're comparing conversions from `X` to each of `Y2`,
`Z3` and `W`, converting to `Y2` is cheaper than converting to `Z3`, because
there are no conversions between `Y2` and `Z3`, and `Y2` has the shorter chain
length.
Also, comparing conversions from `X` to `Z3` and to `W`, we find that the
conversion to `Z3` is cheaper, because `Z3 < W` by the conversion preorder,
and so is considered to be the nearer type.
By transitivity, then, the conversion from `X` to `Y2` should be cheaper than
the conversion from `X` to `W`, but in this case the `Y2` and `W` are
incomparable by the conversion preorder, so the tie is broken by the shorter
path from `X` to `W` in favour of `W`, contradicting the transitivity property
for this proposed order.
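The broken transitivity can be checked mechanically; the following Python
sketch implements the proposed "conversion preorder first, ties broken by
chain length" comparison over the arc set above and exhibits the cycle:

```python
from collections import deque

ARCS = {"X": ["Y1", "Z1", "W"], "Y1": ["Y2"],
        "Z1": ["Z2"], "Z2": ["Z3"], "Z3": ["W"]}

def reachable(a, b):
    """True if a chain of conversions leads from a to b (the preorder a <~ b)."""
    seen, todo = set(), [a]
    while todo:
        node = todo.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            todo.extend(ARCS.get(node, []))
    return False

def chain_len(src, dst):
    """Length of the shortest conversion chain from src to dst."""
    todo, seen = deque([(src, 0)]), set()
    while todo:
        node, depth = todo.popleft()
        if node == dst:
            return depth
        if node not in seen:
            seen.add(node)
            todo.extend((m, depth + 1) for m in ARCS.get(node, []))
    return None

def cheaper(a, b):
    """True if converting X to a beats converting X to b: the nearer type
    under the conversion preorder wins; incomparable types fall back to
    chain length."""
    if reachable(a, b) != reachable(b, a):
        return reachable(a, b)  # strictly nearer type wins
    return chain_len("X", a) < chain_len("X", b)

assert cheaper("Y2", "Z3")  # incomparable; Y2 has the shorter chain
assert cheaper("Z3", "W")   # Z3 <~ W, so Z3 is the nearer type
assert cheaper("W", "Y2")   # incomparable; W has the shorter chain - a cycle
```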
Without transitivity, we would need to compare all pairs of conversions, which
would be expensive, and possibly not yield a minimal-cost conversion even if
all pairs were comparable.
In short, this ordering is infeasible, and by extension I believe any ordering
composed solely of single-step conversions between types with no further
user-supplied information will be insufficiently powerful to express the
built-in conversions between C's types.