Changeset d5f1cfc

Jun 6, 2016, 6:31:42 PM (8 years ago)
Aaron Moss <a3moss@…>
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, ctor, deferred_resn, demangler, enum, forall-pointer-decay, gc_noraii, jacob/cs343-translation, jenkins-sandbox, master, memory, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
685a5e8, dd51906

Update design of reference types

1 edited


  • doc/working/

    rab60d6d rd5f1cfc  
    8787element-wise; summation starts at `(0, 0, 0, 0)`.
    89 **TODO** Costs of T, T*, lvalue T, rvalue T conversions (if applicable)
    91 ### Lvalue Conversions ###
    92 **TODO** Finish me
    94 #### NOTES
    95 * C standard
    96 *
    98 ### Qualifier Conversions ###
    99 **TODO** Finish me
    101 #### NOTES
    102 C standard We can add any qualifier to the pointed-to-type of a
    103 pointer.
    104 * Glen thinks this means that we should make the default assignment operator
    105   `?=?(T volatile restrict *this, T that)`, but I'm not sure I like the
    106   implications for the actual implementation of forcing `this` to be
    107   volatile
    108 * I want to consider whether this property should generalize to other
    109   parameterized types (e.g. `lvalue T`, `box(T)`)
    111 C standard "modifiable lvalues" recursively exclude structs with
    112 const-qualified fields
    114 C standard Using lvalues as rvalues implicitly strips qualifiers
    116 C standard
    118 C standard 6.7.3: **TODO**
     89### Lvalue and Qualifier Conversions ###
     90C defines the notion of a _lvalue_, essentially an addressable object, as well
     91as a number of type _qualifiers_, `const`, `volatile`, and `restrict`.
     92As these type qualifiers are generally only meaningful to the type system as
     93applied to lvalues, the two concepts are closely related.
     94A const lvalue cannot be modified, the compiler cannot assume that a volatile
     95lvalue will not be concurrently modified by some other part of the system, and
     96a restrict lvalue must have pointer type, and the compiler may assume that no
     97other pointer in scope aliases that pointer (this is solely a performance
     98optimization, and may be ignored by implementers).
     99_Lvalue-to-rvalue conversion_, which takes an lvalue of type `T` and converts
     100it to an expression result of type `T` (commonly called an _rvalue_ of type
     101`T`) also strips all the qualifiers from the lvalue, as an expression result
     102is a value, not an addressable object that can have properties like
     104Though lvalue-to-rvalue conversion strips the qualifiers from lvalues,
     105derived rvalue types such as pointer types may include qualifiers;
     106`const int *` is a distinct type from `int *`, though the latter is safely
     107convertable to the former.
     108In general, any number of qualifiers can be safely added to the
     109pointed-to-type of a pointer type, e.g. `int *` converts safely to
     110`const int *` and `volatile int *`, both of which convert safely to
     111`const volatile int *`.
     113Since lvalues are precicely "addressable objects", in C, only lvalues can be
     114used as the operand of the `&` address-of operator.
     115Similarly, only modifiable lvalues may be used as the assigned-to
     116operand of the mutating operators: assignment, compound assignment
     117(e.g. `+=`), and increment and decrement; roughly speaking, lvalues without
     118the `const` qualifier are modifiable, but lvalues of incomplete types, array
     119types, and struct or union types with const members are also not modifiable.
     120Lvalues are produced by the following expressions: object identifiers
     121(function identifiers are not considered to be lvalues), the result of the `*`
     122dereference operator applied to an object pointer, the result of a member
     123expression `s.f` if the left argument `s` is an lvalue (note that the
     124preceding two rules imply that the result of indirect member expressions
     125`s->f` are always lvalues, by desugaring to `(*s).f`), and the result of the
     126indexing operator `a[i]` (similarly by its desugaring to `*((a)+(i))`).
     127Somewhat less obviously, parenthesized lvalue expressions, string literals,
     128and compound literals (e.g. `(struct foo){ 'x', 3.14, 42 }`) are also lvalues.
     130All of the conversions described above are defined in standard C, but Cforall
     131requires further features from its type system.
     132In particular, to allow overloading of the `*?` and `?[?]` dereferencing and
     133indexing operators, Cforall requires a way to declare that the functions
     134defining these operators return lvalues, and since C functions never return
     135lvalues and for syntactic reasons we wish to distinguish functions which
     136return lvalues from functions which return pointers, this is of necessity an
     137extension to standard C.
     138In the current design, an `lvalue` qualifier can be added to function return
     139types (and only to function return types), the effect of which is to return a
     140pointer which is implicitly dereferenced by the caller.
     141C++ includes the more general concept of _references_, which are typically
     142implemented as implicitly dereferenced pointers as well.
     143Another use case which C++ references support is providing a way to pass
     144function parameters by reference (rather than by value) with a natural
     145syntax; Cforall in its current state has no such mechanism.
     146As an example, consider the following (currently typical) copy-constructor
     147signature and call:
     149        void ?{}(T *lhs, T rhs);
     151        T x;
     152        T y = { x };
     154Note that the right-hand argument is passed by value, and would in fact be
     155copied twice in the course of the constructor call `T y = { x };` (once into
     156the parameter by C's standard `memcpy` semantics, once again in the body of
     157the copy constructor, though it is possible that return value optimization
     158will elide the `memcpy`-style copy).
     159However, to pass by reference using the existing pointer syntax, the example
     160above would look like this:
     162        void ?{}(T *lhs, const T *rhs);
     164        T x;
     165        T y = { &x };
     167This example is not even as bad as it could be; assuming pass-by-reference is
     168the desired semantics for the `?+?` operator, that implies the following
     169design today:
     171        T ?+?(const T *lhs, const T *rhs);
     173        T a, b;
     174        T c = &a + &b,
     176In addition to `&a + &b` being unsightly and confusing syntax to add `a` and
     177`b`, it also introduces a possible ambiguity with pointer arithmetic on `T*`
     178which can only be resolved by return-type inference.
     180Pass-by-reference and marking functions as returning lvalues instead of the
     181usual rvalues are actually closely related concepts, as obtaining a reference
     182to pass depends on the referenced object being addressable, i.e. an lvalue,
     183and lvalue return types are effectively return-by-reference.
     184Cforall should also unify the concepts, with a parameterized type for
     185"reference to `T`", which I will write `ref T`.
     186Syntax bikeshedding can be done later (there are some examples at the bottom
     187of this section), but `ref T` is sufficiently distinct from both the existing
     188`lvalue T` (which it subsumes) and the closely related C++ `T&` to allow
     189independent discussion of its semantics.
     191Firstly, assignment to a function parameter as part of a function call and
     192local variable initialization have almost identical semantics, so should be
     193treated similarly for the reference type too; this implies we should be able
     194to declare local variables of reference type, as in the following:
     196        int x = 42;
     197        ref int r = x; // r is now an alias for x
     199Unlike in C++, we would like to have the capability to re-bind references
     200after initialization, as this allows the attractive syntax of references to
     201support some further useful code patterns, such as first initializing a
     202reference after its declaration.
     203Constant references to `T` (`const ref T`) should not be re-bindable.
     205One option for re-binding references is to use a dedicated operator, as in the
     206code example below:
     208        int i = 42, j = 7;
     209        ref int r = i; // bind r to i
     210        r = j;         // set i (== r) to 7
     211        r := j;        // rebind r to j using the new := rebind operator
     212        i = 42;        // reset i (!= r) to 42
     213        assert( r == 7 );
     215The other syntactic option for reference re-bind would be to overload
     216assignment and use type inference on the left and right-hand sides to
     217determine whether the referred-to variable on the left should be reassigned to
     218the value on the right, or if the reference on the left should be aliased to
     219the reference on the right.
     220This could be disambiguated with casts, as in the following code example:
     222        int i
     223        int j;
     224        ref int r = i;   // (0a)
     225        ref int s = i;   // (0b)
     227        i = j;           // (1)
     228        i = (int)s;      // (2)
     229        i = s;           // (3)
     230        // ---------------------
     231        r = s;           // (4)
     232        r = (ref int)j;  // (5)
     233        // ---------------------
     234        r = j;           // (6)
     235        r = (int)s;      // (7)
     237By the expected aliasing syntax, (0a) and (0b) are initializing `r` and `s` as
     238aliases for `i`.
     239For C compatibility, (1) has to be assignment; in general, any assignment to a
     240non-reference type should be assignment, so (2) and (3) are as well.
     241By types, (4) and (5) should have the same semantics, and the semantics of (6)
     242and (7) should match as well.
     243This suggests that (4) and (5) are reference re-bind, and (6) and (7) are an
     244assignment to the referred variable; this makes the syntax to explicitly alias
     245a local variable rather ugly (and inconsistent with the initialization
     246syntax), as well as making it rather awkward to copy the value stored in one
     247reference-type variable into another reference type variable (which is likely
     248more painful in functions with by-reference parameters than with local
     249variables of reference type).
     251Because of the aforementioned issues with overloading assignment as reference
     252rebind, in addition to the fact that reference rebind should not be a
     253user-overloadable operator (unlike assignment), I propose refererence rebind
     254should have its own dedicated operator.
     256The semantics and restrictions of `ref T` are effectively the semantics of an
     257lvalue of type `T`, and by this analogy there should be a safe, qualifier
     258dropping conversion from `ref const volatile restrict T` (and every other
     259qualifier combination on the `T` in `ref T`) to `T`.
     260With this conversion, the resolver may type most expressions that C would
     261call "lvalue of type `T`" as `ref T`.
     262There's also an obvious argument that lvalues of a (possibly-qualified) type
     263`T` should be convertable to references of type `T`, where `T` is also
     264so-qualified (e.g. lvalue `int` to `ref int`, lvalue `const char` to
     265`ref const char`).
     266By similar arguments to pointer types, qualifiers should be addable to the
     267referred-to type of a reference (e.g. `ref int` to `ref const int`).
     268As a note, since pointer arithmetic is explictly not defined on `ref T`,
     269`restrict ref T` should be allowable and would have alias-analysis rules that
     270are actually comprehensible to mere mortals.
     272Using pass-by-reference semantics for function calls should not put syntactic
     273constraints on how the function is called; particularly, temporary values
     274should be able to be passed by reference.
     275The mechanism for this pass-by-reference would be to store the value of the
     276temporary expression into a new unnamed temporary, and pass the reference of
     277that temporary to the function.
     278As an example, the following code should all compile and run:
     280        void f(ref int x) { printf("%d\n", x++); }
     282        int i = 7, j = 11;
     283        const int answer = 42;
     285        f(i);      // (1)
     286        f(42);     // (2)
     287        f(i + j);  // (3)
     288        f(answer); // (4)
     290The semantics of (1) are just like C++'s, "7" is printed, and `i` has the
     291value 8 afterward.
     292For (2), "42" is printed, and the increment of the unnamed temporary to 43 is
     293not visible to the caller; (3) behaves similarly, printing "19", but not
     294changing `i` or `j`.
     295(4) is a bit of an interesting case; we want to be able to support named
     296constants like `answer` that can be used anywhere the constant expression
     297they're replacing (like `42`) could go; in this sense, (4) and (2) should have
     298the same semantics.
     299However, we don't want the mutation to the `x` parameter to be visible in
     300`answer` afterward, because `answer` is a constant, and thus shouldn't change.
     301The solution to this is to allow chaining of the two `ref` conversions;
     302`answer` has the type `ref const int`, which can be converted to `int` by the
     303lvalue-to-rvalue conversion (which drops the qualifiers), then up to `ref int`
     304by the temporary-producing rvalue-to-lvalue conversion.
     305Thus, an unnamed temporary is inserted, initialized to `answer` (i.e. 42),
     306mutated by `f`, then discarded; "42" is printed, just as in case (2), and
     307`answer` still equals 42 after the call, because it was the temporary that was
     308mutated, not `answer`.
     309It may be somewhat surprising to C++ programmers that `f(i)` mutates `i` while
     310`f(answer)` does not mutate `answer` (though `f(answer)` would be illegal in
     311C++, leading to the dreaded "const hell"), but the behaviour of this rule can
     312be determined by examining local scope with the simple rule "non-`const`
     313references to `const` variables produce temporaries", which aligns with
     314programmer intuition that `const` variables cannot be mutated.
     316To bikeshed syntax for `ref T`, there are three basic options: language
     317keywords (`lvalue T` is already in Cforall), compiler-supported "special"
     318generic types (e.g. `ref(T)`), or sigils (`T&` is familiar to C++
     320Keyword or generic based approaches run the risk of name conflicts with
     321existing code, while any sigil used would have to be carefully chosen to not
     322create parsing conflicts.
     324**TODO** Consider arguments for move semantics and see if there is a
     325compelling case for rvalue references.
    120327### Conversion Operator Costs ###
    624831perfectly legal and has the desired semantics.
     833We can assert that `T` can be used in a boolean context as follows:
     835        `forall(otype T | { int ?!=?(T, _zero_t); })`
     837Since the C standard ( specifically states that pointers can be
     838assigned into `_Bool` variables (and implies that other artithmetic types can
     839be assigned into `_Bool` variables), it seems natural to say that assignment
     840into a `_Bool` variable effectively constitutes a boolean context.
     841To allow this interpretation, I propose including the following function (or
     842its effective equivalent) in the prelude:
     844        forall(otype T | { int ?!=?(T, _zero_t); })
     845        void ?{safe}( _Bool *this, T that ) { *this = that != 0; }
     847Note that this conversion is not transitive; that is, for `t` a variable of
     848some "truthy" type `T`, `(_Bool)t;` would use this conversion (in the absence
     849of a lower-cost one), `(int)t;` would not use this conversion (and in fact
     850would not be legal in the absence of another valid way to convert a `T` to an
     851`int`), but `(int)(_Bool)t;` could legally use this conversion.
    626853Similarly giving literal `1` the special type `_unit_t` allows for more
    627854concise and consistent specification of the increment and decrement operators,
    714941then take the non-deleted alternative, and of two equivalent-cost deleted
    715942interpretations with the same return type pick one arbitrarily rather than
    716 producing an ambiguous resolution.
     943producing an ambiguous resolution. This would also be useful for forbidding
     944pointer-to-floating-point explicit conversions (C11,
     945**TODO** Cover default parameters, maybe named parameters (see "named
     946arguments" thread of 11 March 2016)
    718949### Sizeof, Alignof & Offsetof Expressions ###
    732963for each interpretation `J` of `y` with the same type as `J` costing the sum
    733964of the costs of `I` and `J`.
     966### Index Expressions ###
     967**TODO** Consider adding polymorphic function in prelude for this, as per
     9686. in the C standard:
     970        forall(otype T, otype I, otype R, otype E | { R ?+?(T, I); lvalue E *?(R); })
     971        lvalue E ?[?](T a, I i) { return *(a + i); }
     973I think this isn't actually a good idea, because the cases for it are niche,
     974mostly odd tricks like `0[p]` as an alternate syntax for dereferencing a
     975pointer `p`, and adding it to the prelude would slow down resolution of
     976every index expression just a bit. Our existing prelude includes an indexing
     977operator `forall(otype T) lvalue T ?[?](ptrdiff_t, T*)`, plus qualified
     978variants, which should satisfy C source-compatibility without propegating this
     979silly desugaring further.
    735981#### Compatible Functions ####
Note: See TracChangeset for help on using the changeset viewer.