| 1 | ## Lvalues and References ## | 
|---|
| 2 | C defines the notion of a _lvalue_, essentially an addressable object, as well | 
|---|
| 3 | as a number of type _qualifiers_, `const`, `volatile`, and `restrict`. | 
|---|
| 4 | As these type qualifiers are generally only meaningful to the type system as | 
|---|
| 5 | applied to lvalues, the two concepts are closely related. | 
|---|
| 6 | A const lvalue cannot be modified, the compiler cannot assume that a volatile | 
|---|
| 7 | lvalue will not be concurrently modified by some other part of the system, and | 
|---|
| 8 | a restrict lvalue must have pointer type, and the compiler may assume that no | 
|---|
| 9 | other pointer in scope aliases that pointer (this is solely a performance | 
|---|
| 10 | optimization, and may be ignored by implementers). | 
|---|
| 11 | _Lvalue-to-rvalue conversion_, which takes an lvalue of type `T` and converts | 
|---|
| 12 | it to an expression result of type `T` (commonly called an _rvalue_ of type | 
|---|
| 13 | `T`) also strips all the qualifiers from the lvalue, as an expression result | 
|---|
| 14 | is a value, not an addressable object that can have properties like | 
|---|
| 15 | immutability. | 
|---|
| 16 | Though lvalue-to-rvalue conversion strips the qualifiers from lvalues, | 
|---|
| 17 | derived rvalue types such as pointer types may include qualifiers; | 
|---|
| 18 | `const int *` is a distinct type from `int *`, though the latter is safely | 
|---|
| 19 | convertable to the former. | 
|---|
| 20 | In general, any number of qualifiers can be safely added to the | 
|---|
| 21 | pointed-to-type of a pointer type, e.g. `int *` converts safely to | 
|---|
| 22 | `const int *` and `volatile int *`, both of which convert safely to | 
|---|
| 23 | `const volatile int *`. | 
|---|
| 24 |  | 
|---|
| 25 | Since lvalues are precicely "addressable objects", in C, only lvalues can be | 
|---|
| 26 | used as the operand of the `&` address-of operator. | 
|---|
| 27 | Similarly, only modifiable lvalues may be used as the assigned-to | 
|---|
| 28 | operand of the mutating operators: assignment, compound assignment | 
|---|
| 29 | (e.g. `+=`), and increment and decrement; roughly speaking, lvalues without | 
|---|
| 30 | the `const` qualifier are modifiable, but lvalues of incomplete types, array | 
|---|
| 31 | types, and struct or union types with const members are also not modifiable. | 
|---|
| 32 | Lvalues are produced by the following expressions: object identifiers | 
|---|
| 33 | (function identifiers are not considered to be lvalues), the result of the `*` | 
|---|
| 34 | dereference operator applied to an object pointer, the result of a member | 
|---|
| 35 | expression `s.f` if the left argument `s` is an lvalue (note that the | 
|---|
| 36 | preceding two rules imply that the result of indirect member expressions | 
|---|
| 37 | `s->f` are always lvalues, by desugaring to `(*s).f`), and the result of the | 
|---|
| 38 | indexing operator `a[i]` (similarly by its desugaring to `*((a)+(i))`). | 
|---|
| 39 | Somewhat less obviously, parenthesized lvalue expressions, string literals, | 
|---|
| 40 | and compound literals (e.g. `(struct foo){ 'x', 3.14, 42 }`) are also lvalues. | 
|---|
| 41 |  | 
|---|
| 42 | All of the conversions described above are defined in standard C, but Cforall | 
|---|
| 43 | requires further features from its type system. | 
|---|
| 44 | In particular, to allow overloading of the `*?` and `?[?]` dereferencing and | 
|---|
| 45 | indexing operators, Cforall requires a way to declare that the functions | 
|---|
| 46 | defining these operators return lvalues, and since C functions never return | 
|---|
| 47 | lvalues and for syntactic reasons we wish to distinguish functions which | 
|---|
| 48 | return lvalues from functions which return pointers, this is of necessity an | 
|---|
| 49 | extension to standard C. | 
|---|
| 50 | In the current design, an `lvalue` qualifier can be added to function return | 
|---|
| 51 | types (and only to function return types), the effect of which is to return a | 
|---|
| 52 | pointer which is implicitly dereferenced by the caller. | 
|---|
| 53 | C++ includes the more general concept of _references_, which are typically | 
|---|
| 54 | implemented as implicitly dereferenced pointers as well. | 
|---|
| 55 | Another use case which C++ references support is providing a way to pass | 
|---|
| 56 | function parameters by reference (rather than by value) with a natural | 
|---|
| 57 | syntax; Cforall in its current state has no such mechanism. | 
|---|
| 58 | As an example, consider the following (currently typical) copy-constructor | 
|---|
| 59 | signature and call: | 
|---|
| 60 |  | 
|---|
| 61 | void ?{}(T *lhs, T rhs); | 
|---|
| 62 |  | 
|---|
| 63 | T x; | 
|---|
| 64 | T y = { x }; | 
|---|
| 65 |  | 
|---|
| 66 | Note that the right-hand argument is passed by value, and would in fact be | 
|---|
| 67 | copied twice in the course of the constructor call `T y = { x };` (once into | 
|---|
| 68 | the parameter by C's standard `memcpy` semantics, once again in the body of | 
|---|
| 69 | the copy constructor, though it is possible that return value optimization | 
|---|
| 70 | will elide the `memcpy`-style copy). | 
|---|
| 71 | However, to pass by reference using the existing pointer syntax, the example | 
|---|
| 72 | above would look like this: | 
|---|
| 73 |  | 
|---|
| 74 | void ?{}(T *lhs, const T *rhs); | 
|---|
| 75 |  | 
|---|
| 76 | T x; | 
|---|
| 77 | T y = { &x }; | 
|---|
| 78 |  | 
|---|
| 79 | This example is not even as bad as it could be; assuming pass-by-reference is | 
|---|
| 80 | the desired semantics for the `?+?` operator, that implies the following | 
|---|
| 81 | design today: | 
|---|
| 82 |  | 
|---|
| 83 | T ?+?(const T *lhs, const T *rhs); | 
|---|
| 84 |  | 
|---|
| 85 | T a, b; | 
|---|
| 86 | T c = &a + &b, | 
|---|
| 87 |  | 
|---|
| 88 | In addition to `&a + &b` being unsightly and confusing syntax to add `a` and | 
|---|
| 89 | `b`, it also introduces a possible ambiguity with pointer arithmetic on `T*` | 
|---|
| 90 | which can only be resolved by return-type inference. | 
|---|
| 91 |  | 
|---|
| 92 | Pass-by-reference and marking functions as returning lvalues instead of the | 
|---|
| 93 | usual rvalues are actually closely related concepts, as obtaining a reference | 
|---|
| 94 | to pass depends on the referenced object being addressable, i.e. an lvalue, | 
|---|
| 95 | and lvalue return types are effectively return-by-reference. | 
|---|
| 96 | Cforall should also unify the concepts, with a parameterized type for | 
|---|
| 97 | "reference to `T`", which I will write `T&`. | 
|---|
| 98 |  | 
|---|
| 99 | Firstly, assignment to a function parameter as part of a function call and | 
|---|
| 100 | local variable initialization have almost identical semantics, so should be | 
|---|
| 101 | treated similarly for the reference type too; this implies we should be able | 
|---|
| 102 | to declare local variables of reference type, as in the following: | 
|---|
| 103 |  | 
|---|
| 104 | int x = 42; | 
|---|
| 105 | int& r = x; // r is now an alias for x | 
|---|
| 106 |  | 
|---|
| 107 | Unlike in C++, we would like to have the capability to re-bind references | 
|---|
| 108 | after initialization, as this allows the attractive syntax of references to | 
|---|
| 109 | support some further useful code patterns, such as first initializing a | 
|---|
| 110 | reference after its declaration. | 
|---|
| 111 | Constant references to `T` (`T& const`) should not be re-bindable. | 
|---|
| 112 |  | 
|---|
| 113 | One option for re-binding references is to use a dedicated operator, as in the | 
|---|
| 114 | code example below: | 
|---|
| 115 |  | 
|---|
| 116 | int i = 42, j = 7; | 
|---|
| 117 | int& r = i;  // bind r to i | 
|---|
| 118 | r = j;       // set i (== r) to 7 | 
|---|
| 119 | r := j;      // rebind r to j using the new := rebind operator | 
|---|
| 120 | i = 42;      // reset i (!= r) to 42 | 
|---|
| 121 | assert( r == 7 ); | 
|---|
| 122 |  | 
|---|
| 123 | Another option for reference rebind is to modify the semantics of the `&` | 
|---|
| 124 | address-of operator. | 
|---|
| 125 | In standard C, the address-of operator never returns an lvalue, but for an | 
|---|
| 126 | object of type `T`, returns a `T*`. | 
|---|
| 127 | If the address-of operator returned an lvalue for references, this would | 
|---|
| 128 | allow reference rebinding using the usual pointer assignment syntax; | 
|---|
| 129 | that is, if address-of a `T&` returned a `T*&` then the following works: | 
|---|
| 130 |  | 
|---|
| 131 | int i = 42; j = 7; | 
|---|
| 132 | int& r = i;  // bind r to i | 
|---|
| 133 | r = j;       // set i (== r) to 7 | 
|---|
| 134 | &r = &j;     // rebind r to j using the newly mutable "address-of reference" | 
|---|
| 135 | i = 42;      // reset i (!= r) to 42 | 
|---|
| 136 | assert( r == 7 ); | 
|---|
| 137 |  | 
|---|
| 138 | This change (making addresses of references mutable) allows use of existing | 
|---|
| 139 | operators defined over pointers, as well as elegant handling of nested | 
|---|
| 140 | references-to-references. | 
|---|
| 141 |  | 
|---|
| 142 | The semantics and restrictions of `T&` are effectively the semantics of an | 
|---|
| 143 | lvalue of type `T`, and by this analogy there should be a safe, qualifier | 
|---|
| 144 | dropping conversion from `const volatile restrict T&` (and every other | 
|---|
| 145 | qualifier combination on the `T` in `T&`) to `T`. | 
|---|
| 146 | With this conversion, the resolver may type most expressions that C would | 
|---|
| 147 | call "lvalue of type `T`" as `T&`. | 
|---|
| 148 | There's also an obvious argument that lvalues of a (possibly-qualified) type | 
|---|
| 149 | `T` should be convertable to references of type `T`, where `T` is also | 
|---|
| 150 | so-qualified (e.g. lvalue `int` to `int&`, lvalue `const char` to | 
|---|
| 151 | `const char&`). | 
|---|
| 152 | By similar arguments to pointer types, qualifiers should be addable to the | 
|---|
| 153 | referred-to type of a reference (e.g. `int&` to `const int&`). | 
|---|
| 154 | As a note, since pointer arithmetic is explictly not defined on `T&`, | 
|---|
| 155 | `restrict T&` should be allowable and would have alias-analysis rules that | 
|---|
| 156 | are actually comprehensible to mere mortals. | 
|---|
| 157 |  | 
|---|
| 158 | Using pass-by-reference semantics for function calls should not put syntactic | 
|---|
| 159 | constraints on how the function is called; particularly, temporary values | 
|---|
| 160 | should be able to be passed by reference. | 
|---|
| 161 | The mechanism for this pass-by-reference would be to store the value of the | 
|---|
| 162 | temporary expression into a new unnamed temporary, and pass the reference of | 
|---|
| 163 | that temporary to the function. | 
|---|
| 164 | As an example, the following code should all compile and run: | 
|---|
| 165 |  | 
|---|
| 166 | void f(int& x) { printf("%d\n", x++); } | 
|---|
| 167 |  | 
|---|
| 168 | int i = 7, j = 11; | 
|---|
| 169 | const int answer = 42; | 
|---|
| 170 |  | 
|---|
| 171 | f(i);      // (1) | 
|---|
| 172 | f(42);     // (2) | 
|---|
| 173 | f(i + j);  // (3) | 
|---|
| 174 | f(answer); // (4) | 
|---|
| 175 |  | 
|---|
| 176 | The semantics of (1) are just like C++'s, "7" is printed, and `i` has the | 
|---|
| 177 | value 8 afterward. | 
|---|
| 178 | For (2), "42" is printed, and the increment of the unnamed temporary to 43 is | 
|---|
| 179 | not visible to the caller; (3) behaves similarly, printing "19", but not | 
|---|
| 180 | changing `i` or `j`. | 
|---|
| 181 | (4) is a bit of an interesting case; we want to be able to support named | 
|---|
| 182 | constants like `answer` that can be used anywhere the constant expression | 
|---|
| 183 | they're replacing (like `42`) could go; in this sense, (4) and (2) should have | 
|---|
| 184 | the same semantics. | 
|---|
| 185 | However, we don't want the mutation to the `x` parameter to be visible in | 
|---|
| 186 | `answer` afterward, because `answer` is a constant, and thus shouldn't change. | 
|---|
| 187 | The solution to this is to allow chaining of the two lvalue conversions; | 
|---|
| 188 | `answer` has the type `const int&`, which can be converted to `int` by the | 
|---|
| 189 | lvalue-to-rvalue conversion (which drops the qualifiers), then up to `int&` | 
|---|
| 190 | by the temporary-producing rvalue-to-lvalue conversion. | 
|---|
| 191 | Thus, an unnamed temporary is inserted, initialized to `answer` (i.e. 42), | 
|---|
| 192 | mutated by `f`, then discarded; "42" is printed, just as in case (2), and | 
|---|
| 193 | `answer` still equals 42 after the call, because it was the temporary that was | 
|---|
| 194 | mutated, not `answer`. | 
|---|
| 195 | It may be somewhat surprising to C++ programmers that `f(i)` mutates `i` while | 
|---|
| 196 | `f(answer)` does not mutate `answer` (though `f(answer)` would be illegal in | 
|---|
| 197 | C++, leading to the dreaded "const hell"), but the behaviour of this rule can | 
|---|
| 198 | be determined by examining local scope with the simple rule "non-`const` | 
|---|
| 199 | references to `const` variables produce temporaries", which aligns with | 
|---|
| 200 | programmer intuition that `const` variables cannot be mutated. | 
|---|
| 201 |  | 
|---|
| 202 | To bikeshed syntax for `T&`, there are three basic options: language | 
|---|
| 203 | keywords (`lvalue T` is already in Cforall), compiler-supported "special" | 
|---|
| 204 | generic types (e.g. `ref(T)`), or sigils (`T&` is familiar to C++ | 
|---|
| 205 | programmers). | 
|---|
| 206 | Keyword or generic based approaches run the risk of name conflicts with | 
|---|
| 207 | existing code, while any sigil used would have to be carefully chosen to not | 
|---|
| 208 | create parsing conflicts. | 
|---|