| [5c6afcd] | 1 | ## Lvalues and References ## | 
|---|
|  | 2 | C defines the notion of a _lvalue_, essentially an addressable object, as well | 
|---|
|  | 3 | as a number of type _qualifiers_, `const`, `volatile`, and `restrict`. | 
|---|
|  | 4 | As these type qualifiers are generally only meaningful to the type system as | 
|---|
|  | 5 | applied to lvalues, the two concepts are closely related. | 
|---|
|  | 6 | A const lvalue cannot be modified, the compiler cannot assume that a volatile | 
|---|
|  | 7 | lvalue will not be concurrently modified by some other part of the system, and | 
|---|
|  | 8 | a restrict lvalue must have pointer type, and the compiler may assume that no | 
|---|
|  | 9 | other pointer in scope aliases that pointer (this is solely a performance | 
|---|
|  | 10 | optimization, and may be ignored by implementers). | 
|---|
|  | 11 | _Lvalue-to-rvalue conversion_, which takes an lvalue of type `T` and converts | 
|---|
|  | 12 | it to an expression result of type `T` (commonly called an _rvalue_ of type | 
|---|
|  | 13 | `T`) also strips all the qualifiers from the lvalue, as an expression result | 
|---|
|  | 14 | is a value, not an addressable object that can have properties like | 
|---|
|  | 15 | immutability. | 
|---|
|  | 16 | Though lvalue-to-rvalue conversion strips the qualifiers from lvalues, | 
|---|
|  | 17 | derived rvalue types such as pointer types may include qualifiers; | 
|---|
|  | 18 | `const int *` is a distinct type from `int *`, though the latter is safely | 
|---|
|  | 19 | convertable to the former. | 
|---|
|  | 20 | In general, any number of qualifiers can be safely added to the | 
|---|
|  | 21 | pointed-to-type of a pointer type, e.g. `int *` converts safely to | 
|---|
|  | 22 | `const int *` and `volatile int *`, both of which convert safely to | 
|---|
|  | 23 | `const volatile int *`. | 
|---|
|  | 24 |  | 
|---|
|  | 25 | Since lvalues are precicely "addressable objects", in C, only lvalues can be | 
|---|
|  | 26 | used as the operand of the `&` address-of operator. | 
|---|
|  | 27 | Similarly, only modifiable lvalues may be used as the assigned-to | 
|---|
|  | 28 | operand of the mutating operators: assignment, compound assignment | 
|---|
|  | 29 | (e.g. `+=`), and increment and decrement; roughly speaking, lvalues without | 
|---|
|  | 30 | the `const` qualifier are modifiable, but lvalues of incomplete types, array | 
|---|
|  | 31 | types, and struct or union types with const members are also not modifiable. | 
|---|
|  | 32 | Lvalues are produced by the following expressions: object identifiers | 
|---|
|  | 33 | (function identifiers are not considered to be lvalues), the result of the `*` | 
|---|
|  | 34 | dereference operator applied to an object pointer, the result of a member | 
|---|
|  | 35 | expression `s.f` if the left argument `s` is an lvalue (note that the | 
|---|
|  | 36 | preceding two rules imply that the result of indirect member expressions | 
|---|
|  | 37 | `s->f` are always lvalues, by desugaring to `(*s).f`), and the result of the | 
|---|
|  | 38 | indexing operator `a[i]` (similarly by its desugaring to `*((a)+(i))`). | 
|---|
|  | 39 | Somewhat less obviously, parenthesized lvalue expressions, string literals, | 
|---|
|  | 40 | and compound literals (e.g. `(struct foo){ 'x', 3.14, 42 }`) are also lvalues. | 
|---|
|  | 41 |  | 
|---|
|  | 42 | All of the conversions described above are defined in standard C, but Cforall | 
|---|
|  | 43 | requires further features from its type system. | 
|---|
|  | 44 | In particular, to allow overloading of the `*?` and `?[?]` dereferencing and | 
|---|
|  | 45 | indexing operators, Cforall requires a way to declare that the functions | 
|---|
|  | 46 | defining these operators return lvalues, and since C functions never return | 
|---|
|  | 47 | lvalues and for syntactic reasons we wish to distinguish functions which | 
|---|
|  | 48 | return lvalues from functions which return pointers, this is of necessity an | 
|---|
|  | 49 | extension to standard C. | 
|---|
|  | 50 | In the current design, an `lvalue` qualifier can be added to function return | 
|---|
|  | 51 | types (and only to function return types), the effect of which is to return a | 
|---|
|  | 52 | pointer which is implicitly dereferenced by the caller. | 
|---|
|  | 53 | C++ includes the more general concept of _references_, which are typically | 
|---|
|  | 54 | implemented as implicitly dereferenced pointers as well. | 
|---|
|  | 55 | Another use case which C++ references support is providing a way to pass | 
|---|
|  | 56 | function parameters by reference (rather than by value) with a natural | 
|---|
|  | 57 | syntax; Cforall in its current state has no such mechanism. | 
|---|
|  | 58 | As an example, consider the following (currently typical) copy-constructor | 
|---|
|  | 59 | signature and call: | 
|---|
|  | 60 |  | 
|---|
|  | 61 | void ?{}(T *lhs, T rhs); | 
|---|
|  | 62 |  | 
|---|
|  | 63 | T x; | 
|---|
|  | 64 | T y = { x }; | 
|---|
|  | 65 |  | 
|---|
|  | 66 | Note that the right-hand argument is passed by value, and would in fact be | 
|---|
|  | 67 | copied twice in the course of the constructor call `T y = { x };` (once into | 
|---|
|  | 68 | the parameter by C's standard `memcpy` semantics, once again in the body of | 
|---|
|  | 69 | the copy constructor, though it is possible that return value optimization | 
|---|
|  | 70 | will elide the `memcpy`-style copy). | 
|---|
|  | 71 | However, to pass by reference using the existing pointer syntax, the example | 
|---|
|  | 72 | above would look like this: | 
|---|
|  | 73 |  | 
|---|
|  | 74 | void ?{}(T *lhs, const T *rhs); | 
|---|
|  | 75 |  | 
|---|
|  | 76 | T x; | 
|---|
|  | 77 | T y = { &x }; | 
|---|
|  | 78 |  | 
|---|
|  | 79 | This example is not even as bad as it could be; assuming pass-by-reference is | 
|---|
|  | 80 | the desired semantics for the `?+?` operator, that implies the following | 
|---|
|  | 81 | design today: | 
|---|
|  | 82 |  | 
|---|
|  | 83 | T ?+?(const T *lhs, const T *rhs); | 
|---|
|  | 84 |  | 
|---|
|  | 85 | T a, b; | 
|---|
|  | 86 | T c = &a + &b, | 
|---|
|  | 87 |  | 
|---|
|  | 88 | In addition to `&a + &b` being unsightly and confusing syntax to add `a` and | 
|---|
|  | 89 | `b`, it also introduces a possible ambiguity with pointer arithmetic on `T*` | 
|---|
|  | 90 | which can only be resolved by return-type inference. | 
|---|
|  | 91 |  | 
|---|
|  | 92 | Pass-by-reference and marking functions as returning lvalues instead of the | 
|---|
|  | 93 | usual rvalues are actually closely related concepts, as obtaining a reference | 
|---|
|  | 94 | to pass depends on the referenced object being addressable, i.e. an lvalue, | 
|---|
|  | 95 | and lvalue return types are effectively return-by-reference. | 
|---|
|  | 96 | Cforall should also unify the concepts, with a parameterized type for | 
|---|
|  | 97 | "reference to `T`", which I will write `T&`. | 
|---|
|  | 98 |  | 
|---|
|  | 99 | Firstly, assignment to a function parameter as part of a function call and | 
|---|
|  | 100 | local variable initialization have almost identical semantics, so should be | 
|---|
|  | 101 | treated similarly for the reference type too; this implies we should be able | 
|---|
|  | 102 | to declare local variables of reference type, as in the following: | 
|---|
|  | 103 |  | 
|---|
|  | 104 | int x = 42; | 
|---|
|  | 105 | int& r = x; // r is now an alias for x | 
|---|
|  | 106 |  | 
|---|
|  | 107 | Unlike in C++, we would like to have the capability to re-bind references | 
|---|
|  | 108 | after initialization, as this allows the attractive syntax of references to | 
|---|
|  | 109 | support some further useful code patterns, such as first initializing a | 
|---|
|  | 110 | reference after its declaration. | 
|---|
|  | 111 | Constant references to `T` (`T& const`) should not be re-bindable. | 
|---|
|  | 112 |  | 
|---|
|  | 113 | One option for re-binding references is to use a dedicated operator, as in the | 
|---|
|  | 114 | code example below: | 
|---|
|  | 115 |  | 
|---|
|  | 116 | int i = 42, j = 7; | 
|---|
|  | 117 | int& r = i;  // bind r to i | 
|---|
|  | 118 | r = j;       // set i (== r) to 7 | 
|---|
|  | 119 | r := j;      // rebind r to j using the new := rebind operator | 
|---|
|  | 120 | i = 42;      // reset i (!= r) to 42 | 
|---|
|  | 121 | assert( r == 7 ); | 
|---|
|  | 122 |  | 
|---|
|  | 123 | Another option for reference rebind is to modify the semantics of the `&` | 
|---|
|  | 124 | address-of operator. | 
|---|
|  | 125 | In standard C, the address-of operator never returns an lvalue, but for an | 
|---|
|  | 126 | object of type `T`, returns a `T*`. | 
|---|
|  | 127 | If the address-of operator returned an lvalue for references, this would | 
|---|
|  | 128 | allow reference rebinding using the usual pointer assignment syntax; | 
|---|
|  | 129 | that is, if address-of a `T&` returned a `T*&` then the following works: | 
|---|
|  | 130 |  | 
|---|
|  | 131 | int i = 42; j = 7; | 
|---|
|  | 132 | int& r = i;  // bind r to i | 
|---|
|  | 133 | r = j;       // set i (== r) to 7 | 
|---|
|  | 134 | &r = &j;     // rebind r to j using the newly mutable "address-of reference" | 
|---|
|  | 135 | i = 42;      // reset i (!= r) to 42 | 
|---|
|  | 136 | assert( r == 7 ); | 
|---|
|  | 137 |  | 
|---|
|  | 138 | This change (making addresses of references mutable) allows use of existing | 
|---|
|  | 139 | operators defined over pointers, as well as elegant handling of nested | 
|---|
|  | 140 | references-to-references. | 
|---|
|  | 141 |  | 
|---|
|  | 142 | The semantics and restrictions of `T&` are effectively the semantics of an | 
|---|
|  | 143 | lvalue of type `T`, and by this analogy there should be a safe, qualifier | 
|---|
|  | 144 | dropping conversion from `const volatile restrict T&` (and every other | 
|---|
|  | 145 | qualifier combination on the `T` in `T&`) to `T`. | 
|---|
|  | 146 | With this conversion, the resolver may type most expressions that C would | 
|---|
|  | 147 | call "lvalue of type `T`" as `T&`. | 
|---|
|  | 148 | There's also an obvious argument that lvalues of a (possibly-qualified) type | 
|---|
|  | 149 | `T` should be convertable to references of type `T`, where `T` is also | 
|---|
|  | 150 | so-qualified (e.g. lvalue `int` to `int&`, lvalue `const char` to | 
|---|
|  | 151 | `const char&`). | 
|---|
|  | 152 | By similar arguments to pointer types, qualifiers should be addable to the | 
|---|
|  | 153 | referred-to type of a reference (e.g. `int&` to `const int&`). | 
|---|
|  | 154 | As a note, since pointer arithmetic is explictly not defined on `T&`, | 
|---|
|  | 155 | `restrict T&` should be allowable and would have alias-analysis rules that | 
|---|
|  | 156 | are actually comprehensible to mere mortals. | 
|---|
|  | 157 |  | 
|---|
|  | 158 | Using pass-by-reference semantics for function calls should not put syntactic | 
|---|
|  | 159 | constraints on how the function is called; particularly, temporary values | 
|---|
|  | 160 | should be able to be passed by reference. | 
|---|
|  | 161 | The mechanism for this pass-by-reference would be to store the value of the | 
|---|
|  | 162 | temporary expression into a new unnamed temporary, and pass the reference of | 
|---|
|  | 163 | that temporary to the function. | 
|---|
|  | 164 | As an example, the following code should all compile and run: | 
|---|
|  | 165 |  | 
|---|
|  | 166 | void f(int& x) { printf("%d\n", x++); } | 
|---|
|  | 167 |  | 
|---|
|  | 168 | int i = 7, j = 11; | 
|---|
|  | 169 | const int answer = 42; | 
|---|
|  | 170 |  | 
|---|
|  | 171 | f(i);      // (1) | 
|---|
|  | 172 | f(42);     // (2) | 
|---|
|  | 173 | f(i + j);  // (3) | 
|---|
|  | 174 | f(answer); // (4) | 
|---|
|  | 175 |  | 
|---|
|  | 176 | The semantics of (1) are just like C++'s, "7" is printed, and `i` has the | 
|---|
|  | 177 | value 8 afterward. | 
|---|
|  | 178 | For (2), "42" is printed, and the increment of the unnamed temporary to 43 is | 
|---|
|  | 179 | not visible to the caller; (3) behaves similarly, printing "19", but not | 
|---|
|  | 180 | changing `i` or `j`. | 
|---|
|  | 181 | (4) is a bit of an interesting case; we want to be able to support named | 
|---|
|  | 182 | constants like `answer` that can be used anywhere the constant expression | 
|---|
|  | 183 | they're replacing (like `42`) could go; in this sense, (4) and (2) should have | 
|---|
|  | 184 | the same semantics. | 
|---|
|  | 185 | However, we don't want the mutation to the `x` parameter to be visible in | 
|---|
|  | 186 | `answer` afterward, because `answer` is a constant, and thus shouldn't change. | 
|---|
|  | 187 | The solution to this is to allow chaining of the two lvalue conversions; | 
|---|
|  | 188 | `answer` has the type `const int&`, which can be converted to `int` by the | 
|---|
|  | 189 | lvalue-to-rvalue conversion (which drops the qualifiers), then up to `int&` | 
|---|
|  | 190 | by the temporary-producing rvalue-to-lvalue conversion. | 
|---|
|  | 191 | Thus, an unnamed temporary is inserted, initialized to `answer` (i.e. 42), | 
|---|
|  | 192 | mutated by `f`, then discarded; "42" is printed, just as in case (2), and | 
|---|
|  | 193 | `answer` still equals 42 after the call, because it was the temporary that was | 
|---|
|  | 194 | mutated, not `answer`. | 
|---|
|  | 195 | It may be somewhat surprising to C++ programmers that `f(i)` mutates `i` while | 
|---|
|  | 196 | `f(answer)` does not mutate `answer` (though `f(answer)` would be illegal in | 
|---|
|  | 197 | C++, leading to the dreaded "const hell"), but the behaviour of this rule can | 
|---|
|  | 198 | be determined by examining local scope with the simple rule "non-`const` | 
|---|
|  | 199 | references to `const` variables produce temporaries", which aligns with | 
|---|
|  | 200 | programmer intuition that `const` variables cannot be mutated. | 
|---|
|  | 201 |  | 
|---|
|  | 202 | To bikeshed syntax for `T&`, there are three basic options: language | 
|---|
|  | 203 | keywords (`lvalue T` is already in Cforall), compiler-supported "special" | 
|---|
|  | 204 | generic types (e.g. `ref(T)`), or sigils (`T&` is familiar to C++ | 
|---|
|  | 205 | programmers). | 
|---|
|  | 206 | Keyword or generic based approaches run the risk of name conflicts with | 
|---|
|  | 207 | existing code, while any sigil used would have to be carefully chosen to not | 
|---|
|  | 208 | create parsing conflicts. | 
|---|