## Lvalues and References ##
C defines the notion of an _lvalue_, essentially an addressable object, as well
as a number of type _qualifiers_: `const`, `volatile`, and `restrict`.
As these type qualifiers are generally only meaningful to the type system as
applied to lvalues, the two concepts are closely related.
A const lvalue cannot be modified; the compiler cannot assume that a volatile
lvalue will not be concurrently modified by some other part of the system; and
a restrict lvalue must have pointer type, and the compiler may assume that no
other pointer in scope aliases that pointer (this is solely a performance
optimization, and may be ignored by implementers).
_Lvalue-to-rvalue conversion_, which takes an lvalue of type `T` and converts
it to an expression result of type `T` (commonly called an _rvalue_ of type
`T`), also strips all the qualifiers from the lvalue, as an expression result
is a value, not an addressable object that can have properties like
immutability.
Though lvalue-to-rvalue conversion strips the qualifiers from lvalues,
derived rvalue types such as pointer types may include qualifiers;
`const int *` is a distinct type from `int *`, though the latter is safely
convertible to the former.
In general, any number of qualifiers can be safely added to the
pointed-to type of a pointer type, e.g. `int *` converts safely to
`const int *` and `volatile int *`, both of which convert safely to
`const volatile int *`.
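
These conversions can be checked in plain C. The sketch below adds qualifiers to the pointed-to type of an `int *` and shows lvalue-to-rvalue conversion producing an unqualified, modifiable copy. The function names `read_cv` and `qualifier_demo` are illustrative only:

```c
#include <assert.h>

/* Accepts any pointer to int: adding `const volatile` to the
   pointed-to type is always a safe conversion. */
static int read_cv(const volatile int *p) { return *p; }

int qualifier_demo(void) {
    int x = 42;
    int *p = &x;

    const int *pc = p;            /* int *          -> const int *          */
    volatile int *pv = p;         /* int *          -> volatile int *       */
    const volatile int *pcv = pc; /* const int *    -> const volatile int * */
    pcv = pv;                     /* volatile int * -> const volatile int * */
    assert(read_cv(p) == 42 && *pcv == 42);

    /* Lvalue-to-rvalue conversion drops qualifiers: the value read from
       the const lvalue `*pc` initializes a plain, modifiable int. */
    int copy = *pc;
    copy += 1;                    /* changes the copy only; x is still 42 */
    assert(x == 42);

    /* The reverse direction, e.g. `int *bad = pc;`, would discard
       `const` and requires a diagnostic from the compiler. */
    return copy;                  /* 43 */
}
```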

Since lvalues are precisely "addressable objects", in C only lvalues can be
used as the operand of the `&` address-of operator (function designators,
which are not lvalues, are the one exception).
Similarly, only modifiable lvalues may be used as the assigned-to
operand of the mutating operators: assignment, compound assignment
(e.g. `+=`), and increment and decrement. Roughly speaking, lvalues without
the `const` qualifier are modifiable, but lvalues of incomplete types, array
types, and struct or union types with const members are also not modifiable.
Lvalues are produced by the following expressions:
- object identifiers (function identifiers are not considered to be lvalues);
- the result of the `*` dereference operator applied to an object pointer;
- the result of a member expression `s.f`, if the left argument `s` is an
  lvalue (together with the previous rule, this implies that the result of an
  indirect member expression `s->f` is always an lvalue, by desugaring to
  `(*s).f`);
- the result of the indexing operator `a[i]`, similarly by its desugaring to
  `*((a)+(i))`.
Somewhat less obviously, parenthesized lvalue expressions, string literals,
and compound literals (e.g. `(struct foo){ 'x', 3.14, 42 }`) are also lvalues.
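
Here is a sketch in standard C. It needs C99 or later because it uses a compound literal. The `struct foo` layout and the name `lvalue_demo` are assumptions made for illustration. Each operand of `&` below is one of the lvalue forms just listed, and the mutating operators are applied only to modifiable lvalues:

```c
#include <assert.h>

struct foo { char c; double d; int i; };

int lvalue_demo(void) {
    int x = 1;
    int a[3] = { 10, 20, 30 };
    struct foo s = { 'a', 1.0, 2 };
    struct foo *ps = &s;

    /* Each operand of `&` below is one of the lvalue forms listed above. */
    int *p_id    = &x;            /* object identifier               */
    int *p_deref = &*p_id;        /* result of unary `*`             */
    int *p_mem   = &s.i;          /* member of an lvalue struct      */
    int *p_arrow = &ps->i;        /* (*ps).i                         */
    int *p_index = &a[1];         /* *((a)+(1))                      */
    int *p_paren = &(x);          /* parenthesized lvalue            */
    char (*p_str)[4] = &"abc";    /* string literal, type char[4]    */
    struct foo *p_lit = &(struct foo){ 'x', 3.14, 42 }; /* compound literal */

    assert(p_deref == p_id && p_paren == p_id);
    assert(p_mem == p_arrow && *p_index == 20 && (*p_str)[0] == 'a');

    /* Modifiable lvalues may appear as the target of mutating operators. */
    x = 5;
    a[0] += 1;
    s.i++;
    p_lit->i += 1;                /* compound literals are modifiable too */
    assert(x == 5 && a[0] == 11 && s.i == 3);

    /* Rejected by the compiler: `&(x + 1)` (not an lvalue) and
       `a = a;` (array-typed lvalues are not modifiable). */
    return p_lit->i;              /* 43 */
}
```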

All of the conversions described above are defined in standard C, but Cforall
requires further features from its type system.
In particular, to allow overloading of the `*?` and `?[?]` dereferencing and
indexing operators, Cforall requires a way to declare that the functions
defining these operators return lvalues. Since C functions never return
lvalues, and for syntactic reasons we wish to distinguish functions that
return lvalues from functions that return pointers, this is of necessity an
extension to standard C.
In the current design, an `lvalue` qualifier can be added to function return
types (and only to function return types), the effect of which is to return a
pointer that is implicitly dereferenced by the caller.
C++ includes the more general concept of _references_, which are typically
also implemented as implicitly dereferenced pointers.
C++ references also support passing function parameters by reference (rather
than by value) with a natural syntax; Cforall in its current state has no
such mechanism.
As an example, consider the following (currently typical) copy-constructor
signature and call:

    void ?{}(T *lhs, T rhs);

    T x;
    T y = { x };

Note that the right-hand argument is passed by value, and would in fact be
copied twice in the course of the constructor call `T y = { x };`: once into
the parameter by C's standard `memcpy` semantics, and once again in the body
of the copy constructor (though the compiler may be able to elide the
`memcpy`-style copy).
To pass by reference using the existing pointer syntax instead, the example
above would look like this:

    void ?{}(T *lhs, const T *rhs);

    T x;
    T y = { &x };

This example is not even as bad as it could be. If pass-by-reference is the
desired semantics for the `?+?` operator, the following design is required
today:

    T ?+?(const T *lhs, const T *rhs);

    T a, b;
    T c = &a + &b;

In addition to `&a + &b` being unsightly and confusing syntax for adding `a`
and `b`, it also introduces a possible ambiguity with pointer arithmetic on
`T*`, which can only be resolved by return-type inference.
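
For comparison, the pass-by-pointer design can be written in plain C, using a named function in place of the `?+?` overload. In this sketch, `T` is a hypothetical two-field struct and `add` and `add_demo` are hypothetical names. The point is the explicit `&` needed at every call site:

```c
#include <assert.h>

/* Hypothetical value type standing in for the document's `T`. */
typedef struct { double x, y; } T;

/* Plain-C counterpart of `T ?+?(const T *lhs, const T *rhs)`. */
T add(const T *lhs, const T *rhs) {
    T r = { lhs->x + rhs->x, lhs->y + rhs->y };
    return r;
}

double add_demo(void) {
    T a = { 1.0, 2.0 }, b = { 3.0, 4.0 };
    T c = add(&a, &b);   /* the call that `&a + &b` would stand for */
    assert(c.x == 4.0 && c.y == 6.0);
    /* In plain C, `&a + &b` is not a valid expression (pointer + pointer),
       while `&a + 1` is valid pointer arithmetic on `T *`: the kind of
       expression an overloaded `?+?` on `const T *` would compete with. */
    return c.x + c.y;    /* 10.0 */
}
```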

Pass-by-reference and marking functions as returning lvalues instead of the
usual rvalues are actually closely related concepts. Obtaining a reference to
pass requires the referenced object to be addressable, i.e. an lvalue, and
lvalue return types are effectively return-by-reference.
Cforall should therefore unify the two concepts, with a parameterized type for
"reference to `T`", which I will write `T&`.

First, binding an argument to a function parameter as part of a call and
initializing a local variable have almost identical semantics, so they should
be treated similarly for the reference type too. This implies we should be
able to declare local variables of reference type, as in the following:

    int x = 42;
    int& r = x; // r is now an alias for x

Unlike in C++, we would like to have the capability to rebind references
after initialization. This lets the attractive syntax of references support
some further useful code patterns, such as binding a reference for the first
time at some point after its declaration.
Constant references to `T` (`T& const`) should not be rebindable.
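
Plain C pointers give a rough analogue of both properties. This mapping is an illustration, not part of the proposal. A non-`const` pointer can be bound some time after its declaration, while a `const` pointer, like `T& const`, is fixed at initialization. The function name `late_bind_demo` is hypothetical:

```c
#include <assert.h>

int late_bind_demo(int use_first) {
    int first = 1, second = 2;

    int *r;                        /* declared unbound, then bound below */
    if (use_first)
        r = &first;
    else
        r = &second;
    *r += 10;                      /* writes through to whichever was bound */

    int *const fixed = &first;     /* analogue of `int& const`: never rebound */
    /* fixed = &second;  -- rejected: `fixed` is a read-only variable */
    assert(*fixed == first);

    return *fixed * 100 + second;
}
```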

One option for rebinding references is to use a dedicated operator, as in the
code example below:

    int i = 42, j = 7;
    int& r = i; // bind r to i
    r = j;      // set i (== r) to 7
    r := j;     // rebind r to j using the new := rebind operator
    i = 42;     // reset i (!= r) to 42
    assert( r == 7 );
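
Today's C already has rebindable aliases: pointers, which must be dereferenced explicitly. The trace below replays the example above with `int *` standing in for `int&`. The translation and the name `rebind_demo` are illustrative assumptions. Each use of `r` becomes `*r`, and the rebind becomes an ordinary pointer assignment:

```c
#include <assert.h>

int rebind_demo(void) {
    int i = 42, j = 7;
    int *r = &i;   /* int& r = i;  bind r to i            */
    *r = j;        /* r = j;       set i (== *r) to 7     */
    r = &j;        /* r := j;      rebind r to j          */
    i = 42;        /* i = 42;      reset i (!= *r) to 42  */
    assert(*r == 7);
    return i;      /* 42 */
}
```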

Another option for reference rebinding is to modify the semantics of the `&`
address-of operator.
In standard C, the address-of operator never returns an lvalue; for an
object of type `T`, it returns an rvalue of type `T*`.
If the address-of operator returned an lvalue for references, this would
allow reference rebinding using the usual pointer assignment syntax.
That is, if taking the address of a `T&` yielded a `T*&`, then the following
would work:

    int i = 42, j = 7;
    int& r = i; // bind r to i
    r = j;      // set i (== r) to 7
    &r = &j;    // rebind r to j using the newly mutable "address-of reference"
    i = 42;     // reset i (!= r) to 42
    assert( r == 7 );

This change (making addresses of references mutable) allows use of existing
operators defined over pointers, as well as elegant handling of nested
references-to-references.
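
In C terms, if `&r` is an assignable pointer, then a reference to a reference corresponds roughly to a pointer to a pointer, and the existing pointer operators already apply at each level. This correspondence is an assumption for illustration, not part of the proposal. A sketch using the hypothetical names `rebind` and `nested_demo`:

```c
#include <assert.h>

/* Rebinds the caller's alias `*ref_to_ref` to `target`; with references,
   this is the role an assignment through `&r` would play. */
static void rebind(int **ref_to_ref, int *target) {
    *ref_to_ref = target;
}

int nested_demo(void) {
    int i = 1, j = 2;
    int *r = &i;      /* r aliases i      */
    int **rr = &r;    /* rr aliases r     */
    rebind(rr, &j);   /* r now aliases j  */
    **rr += 40;       /* two dereferences reach j, so j == 42 */
    assert(r == &j && j == 42 && i == 1);
    return *r + i;    /* 43 */
}
```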

The semantics and restrictions of `T&` are effectively the semantics of an
lvalue of type `T`. By this analogy, there should be a safe,
qualifier-dropping conversion from `const volatile restrict T&` (and every
other qualifier combination on the `T` in `T&`) to `T`.
With this conversion, the resolver may type most expressions that C would
call "lvalue of type `T`" as `T&`.
There is also an obvious argument that lvalues of a (possibly-qualified) type
`T` should be convertible to references of type `T`, where `T` is also
so-qualified (e.g. lvalue `int` to `int&`, lvalue `const char` to
`const char&`).
By similar arguments to pointer types, qualifiers should be addable to the
referred-to type of a reference (e.g. `int&` to `const int&`).
As a note, since pointer arithmetic is explicitly not defined on `T&`,
`restrict T&` should be allowable and would have alias-analysis rules that
are actually comprehensible to mere mortals.

Using pass-by-reference semantics for function calls should not put syntactic
constraints on how the function is called; in particular, temporary values
should be able to be passed by reference.
The mechanism for this pass-by-reference would be to store the value of the
temporary expression into a new unnamed temporary, and pass a reference to
that temporary to the function.
As an example, the following code should all compile and run:

    void f(int& x) { printf("%d\n", x++); }

    int i = 7, j = 11;
    const int answer = 42;

    f(i);      // (1)
    f(42);     // (2)
    f(i + j);  // (3)
    f(answer); // (4)

The semantics of (1) are just like C++'s: "7" is printed, and `i` has the
value 8 afterward.
For (2), "42" is printed, and the increment of the unnamed temporary to 43 is
not visible to the caller. (3) behaves similarly, printing "19" (as `i` is
now 8), but not changing `i` or `j`.
Case (4) is more interesting. We want to be able to support named constants
like `answer` that can be used anywhere the constant expression they replace
(like `42`) could go; in this sense, (4) and (2) should have the same
semantics.
However, we don't want the mutation to the `x` parameter to be visible in
`answer` afterward, because `answer` is a constant, and thus shouldn't change.
The solution is to allow chaining of the two lvalue conversions.
`answer` has the type `const int&`, which can be converted to `int` by the
lvalue-to-rvalue conversion (which drops the qualifiers), then up to `int&`
by the temporary-producing rvalue-to-lvalue conversion.
Thus, an unnamed temporary is inserted, initialized to `answer` (i.e. 42),
mutated by `f`, then discarded. "42" is printed, just as in case (2), and
`answer` still equals 42 after the call, because it was the temporary that was
mutated, not `answer`.
It may be somewhat surprising to C++ programmers that `f(i)` mutates `i` while
`f(answer)` does not mutate `answer`. In C++, `f(answer)` would simply be
illegal, leading to the dreaded "const hell". The behaviour can nonetheless be
predicted from local scope alone, using the simple rule "non-`const`
references to `const` variables produce temporaries", which aligns with
programmer intuition that `const` variables cannot be mutated.
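
Standard C99 compound literals already provide the kind of unnamed temporary this mechanism needs, so the four calls can be emulated with pointers. In the sketch below, `f` takes `int *` instead of `int&`. Each temporary is written out explicitly as `&(int){ ... }`, which is what the proposed rvalue-to-lvalue conversion would insert implicitly. The name `temporaries_demo` is hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Pointer-based stand-in for `void f(int& x) { printf("%d\n", x++); }` */
static void f(int *x) { printf("%d\n", (*x)++); }

int temporaries_demo(void) {
    int i = 7, j = 11;
    const int answer = 42;

    f(&i);                 /* (1) prints 7; i is now 8                       */
    f(&(int){ 42 });       /* (2) prints 42; only the temporary becomes 43    */
    f(&(int){ i + j });    /* (3) prints 19; i and j are unchanged            */
    f(&(int){ answer });   /* (4) prints 42; `f(&answer)` would discard const */

    assert(i == 8 && j == 11 && answer == 42);
    return i;
}
```

Called from `main`, this prints 7, 42, 19, and 42 on separate lines.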

To bikeshed syntax for `T&`, there are three basic options: language
keywords (`lvalue T` is already in Cforall), compiler-supported "special"
generic types (e.g. `ref(T)`), or sigils (`T&` is familiar to C++
programmers).
Keyword- or generic-based approaches run the risk of name conflicts with
existing code, while any sigil used would have to be carefully chosen to not
create parsing conflicts.