## Types for 0 and 1 literals ## The literals `0` and `1` are treated specially by Cforall, due to their potential uses in operator overloading. Earlier versions of Cforall allowed `0` and `1` to be variable names, allowing multiple interpretations of them according to the existing variable overloading rules, with the following declarations in the prelude: const int 0, 1; forall ( dtype DT ) const DT * const 0; forall ( ftype FT ) FT * const 0; This did, however, create some backward-compatibility problems and potential performance issues, and works poorly for generic types. To start with, this (entirely legal C) code snippet doesn't compile in Cforall: if ( 0 ) {} It desugars to `if ( (int)(0 != 0) ) {}`, and since both `int` and `forall(dtype DT) DT*` have a != operator which returns `int` the resolver can not choose which `0` variable to take, because they're both exact matches. The general `!=` computation may also be less efficient than a check for a zero value; take the following example of a rational type: struct rational { int32_t num, int32_t den }; rational 0 = { 0, 1 }; int ?!=? (rational a, rational b) { return ((int64_t)a.num)*b.den != ((int64_t)b.num)*a.den; } int not_zero (rational a) { return a.num != 0; } To check if two rationals are equal we need to do a pair of multiplications to normalize them (the casts in the example are to prevent overflow), but to check if a rational is non-zero we just need to check its numerator, a more efficient operation. Finally, though polymorphic null-pointer variables can be meaningfully defined, most other polymorphic variables cannot be, which makes it difficult to make generic types "truthy" using the existing system: forall(otype T) struct pair { T x; T y; }; forall(otype T | { T 0; }) pair(T) 0 = { 0, 0 }; Now, it seems natural enough to want to define the zero for this pair type as a pair of the zero values of its element type (if they're defined). The declaration of `pair(T) 0` above is actually illegal though, as there is no way to represent the zero values of an infinite number of types in the single memory location available for this polymorphic variable - the polymorphic null-pointer variables defined in the prelude are legal, but that is only because all pointers are the same size and the single zero value is a legal value of all pointer types simultaneously; null pointer is, however, somewhat unique in this respect. The technical explanation for the problems with polymorphic zero is that `0` is really a rvalue, not a lvalue - an expression, not an object. Drawing from this, the solution we propose is to give `0` a new built-in type, `zero_t`, and similarly give `1` the new built-in type `one_t`. If the prelude defines `!=` over `zero_t` this solves the `if ( 0 )` problem, because now the unambiguous best interpretation of `0 != 0` is to read them both as `zero_t` (and say that this expression is false). Backwards compatibility with C can be served by defining conversions in the prelude from `zero_t` and `one_t` to `int` and the appropriate pointer types, as below: // int 0; forall(otype T | { void ?{safe}(T*, int); }) void ?{safe} (T*, zero_t); forall(otype T | { void ?{unsafe}(T*, int); }) void ?{unsafe} (T*, zero_t); // int 1; forall(otype T | { void ?{safe}(T*, int); }) void ?{safe} (T*, one_t); forall(otype T | { void ?{unsafe}(T*, int); }) void ?{unsafe} (T*, one_t); // forall(dtype DT) const DT* 0; forall(dtype DT) void ?{safe}(const DT**, zero_t); // forall(ftype FT) FT* 0; forall(ftype FT) void ?{safe}(FT**, zero_t); Further, with this change, instead of making `0` and `1` overloadable variables, we can instead allow user-defined constructors (or, more flexibly, safe conversions) from `zero_t`, as below: // rational 0 = { 0, 1 }; void ?{safe} (rational *this, zero_t) { this->num = 0; this->den = 1; } Note that we don't need to name the `zero_t` parameter to this constructor, because its only possible value is a literal zero. This one line allows `0` to be used anywhere a `rational` is required, as well as enabling the same use of rationals in boolean contexts as above (by interpreting the `0` in the desguraring to be a rational by this conversion). Furthermore, while defining a conversion function from literal zero to `rational` makes rational a "truthy" type able to be used in a boolean context, we can optionally further optimize the truth decision on rationals as follows: int ?!=? (rational a, zero_t) { return a.num != 0; } This comparison function will be chosen in preference to the more general rational comparison function for comparisons against literal zero (like in boolean contexts) because it doesn't require a conversion on the `0` argument. Functions of the form `int ?!=? (T, zero_t)` can acutally be used in general to make a type `T` truthy without making `0` a value which can convert to that type, a capability not available in the current design. This design also solves the problem of polymorphic zero for generic types, as in the following example: // ERROR: forall(otype T | { T 0; }) pair(T) 0 = { 0, 0 }; forall(otype T | { T 0; }) void ?{safe} (pair(T) *this, zero_t) { this->x = 0; this->y = 0; } The polymorphic variable declaration didn't work, but this constructor is perfectly legal and has the desired semantics. We can assert that `T` can be used in a boolean context as follows: `forall(otype T | { int ?!=?(T, zero_t); })` Since the C standard (6.5.16.1.1) specifically states that pointers can be assigned into `_Bool` variables (and implies that other artithmetic types can be assigned into `_Bool` variables), it seems natural to say that assignment into a `_Bool` variable effectively constitutes a boolean context. To allow this interpretation, I propose including the following function (or its effective equivalent) in the prelude: forall(otype T | { int ?!=?(T, zero_t); }) void ?{safe}( _Bool *this, T that ) { *this = that != 0; } Note that this conversion is not transitive; that is, for `t` a variable of some "truthy" type `T`, `(_Bool)t;` would use this conversion (in the absence of a lower-cost one), `(int)t;` would not use this conversion (and in fact would not be legal in the absence of another valid way to convert a `T` to an `int`), but `(int)(_Bool)t;` could legally use this conversion. Similarly giving literal `1` the special type `one_t` allows for more concise and consistent specification of the increment and decrement operators, using the following de-sugaring: ++i => i += 1 i++ => (tmp = i, i += 1, tmp) --i => i -= 1 i-- => (tmp = i, i -= 1, tmp) In the examples above, `tmp` is a fresh temporary with its type inferred from the return type of `i += 1`. Under this proposal, defining a conversion from `one_t` to `T` and a `lvalue T ?+=? (T*, T)` provides both the pre- and post-increment operators for free in a consistent fashion (similarly for -= and the decrement operators). If a meaningful `1` cannot be defined for a type, both increment operators can still be defined with the signature `lvalue T ?+=? (T*, one_t)`. Similarly, if scalar addition can be performed on a type more efficiently than by repeated increment, `lvalue T ?+=? (T*, int)` will not only define the addition operator, it will simultaneously define consistent implementations of both increment operators (this can also be accomplished by defining a conversion from `int` to `T` and an addition operator `lvalue T ?+=?(T*, T)`). To allow functions of the form `lvalue T ?+=? (T*, int)` to satisfy "has an increment operator" assertions of the form `lvalue T ?+=? (T*, one_t)`, we also define a non-transitive unsafe conversion from `_Bool` (allowable values `0` and `1`) to `one_t` (and `zero_t`) as follows: void ?{unsafe} (one_t*, _Bool) {} As a note, the desugaring of post-increment above is possibly even more efficient than that of C++ - in C++, the copy to the temporary may be hidden in a separately-compiled module where it can't be elided in cases where it is not used, whereas this approach for Cforall always gives the compiler the opportunity to optimize out the temporary when it is not needed. Furthermore, one could imagine a post-increment operator that returned some type `T2` that was implicitly convertable to `T` but less work than a full copy of `T` to create (this seems like an absurdly niche case) - since the type of `tmp` is inferred from the return type of `i += 1`, you could set up functions with the following signatures to enable an equivalent pattern in Cforall: lvalue T2 ?+=? (T*, one_t); // increment operator returns T2 void ?{} (T2*, T); // initialize T2 from T for use in `tmp = i` void ?{safe} (T*, T2); // allow T2 to be used as a T when needed to // preserve expected semantics of T x = y++;