## Lvalues and References ##
C defines the notion of an _lvalue_, essentially an addressable object, as well
as a number of type _qualifiers_, `const`, `volatile`, and `restrict`.
As these type qualifiers are generally only meaningful to the type system as
applied to lvalues, the two concepts are closely related.
A const lvalue cannot be modified; the compiler cannot assume that a volatile
lvalue will not be concurrently modified by some other part of the system; and
a restrict lvalue must have pointer type, in which case the compiler may assume
that no other pointer in scope aliases that pointer (this is solely a
performance optimization, and may be ignored by implementers).
_Lvalue-to-rvalue conversion_, which takes an lvalue of type `T` and converts
it to an expression result of type `T` (commonly called an _rvalue_ of type
`T`), also strips all the qualifiers from the lvalue, as an expression result
is a value, not an addressable object that can have properties like
immutability.
Though lvalue-to-rvalue conversion strips the qualifiers from lvalues,
derived rvalue types such as pointer types may include qualifiers;
`const int *` is a distinct type from `int *`, though the latter is safely
convertible to the former.
In general, any number of qualifiers can be safely added to the
pointed-to-type of a pointer type, e.g. `int *` converts safely to
`const int *` and `volatile int *`, both of which convert safely to
`const volatile int *`.

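The conversions described above can be exercised in plain C. The following is a minimal sketch, not part of the Cforall design; the names `read_cv` and `qualifier_demo` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Reads through a pointer whose pointee carries both qualifiers. */
static int read_cv(const volatile int *p) { return *p; }

/* Hypothetical helper exercising the conversions described in the text. */
int qualifier_demo(void) {
    const int ci = 5;   /* const lvalue: cannot be modified */
    int copy = ci;      /* lvalue-to-rvalue conversion yields a plain int value */
    copy += 1;          /* fine: copy is an unqualified object */

    int x = 10;
    int *p = &x;
    const int *cp = p;              /* adding const to the pointee is safe */
    volatile int *vp = p;           /* so is adding volatile */
    const volatile int *cvp = cp;   /* const int * widens to const volatile int * */
    cvp = vp;                       /* and so does volatile int * */

    assert(*cp == 10);
    return copy + read_cv(cvp) + read_cv(p);   /* 6 + 10 + 10 */
}
```

Going the other way (for example from `const int *` to `int *`) needs an explicit cast in C, which is why only the qualifier-adding direction is described as safe.
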
Since lvalues are precisely "addressable objects", in C, only lvalues can be
used as the operand of the `&` address-of operator.
Similarly, only modifiable lvalues may be used as the assigned-to
operand of the mutating operators: assignment, compound assignment
(e.g. `+=`), and increment and decrement; roughly speaking, lvalues without
the `const` qualifier are modifiable, but lvalues of incomplete types, array
types, and struct or union types with const members are also not modifiable.
Lvalues are produced by the following expressions: object identifiers
(function identifiers are not considered to be lvalues), the result of the `*`
dereference operator applied to an object pointer, the result of a member
expression `s.f` if the left argument `s` is an lvalue (note that the
preceding two rules imply that the result of an indirect member expression
`s->f` is always an lvalue, by desugaring to `(*s).f`), and the result of the
indexing operator `a[i]` (similarly by its desugaring to `*((a)+(i))`).
Somewhat less obviously, parenthesized lvalue expressions, string literals,
and compound literals (e.g. `(struct foo){ 'x', 3.14, 42 }`) are also lvalues.

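As a quick check of the list above, each of these expression forms can be given to `&` in standard C. This is an illustrative sketch only; the field names of `struct foo` and the function `lvalue_demo` are assumptions made for the example.

```c
#include <assert.h>

struct foo { char tag; double ratio; int count; };   /* hypothetical fields */

/* Hypothetical helper: takes the address of each kind of lvalue listed above. */
int lvalue_demo(void) {
    int n = 1;
    int a[3] = { 10, 20, 30 };
    struct foo s = { 'x', 3.14, 42 };
    struct foo *ps = &s;

    int *p_ident  = &n;           /* object identifier */
    int *p_deref  = &*p_ident;    /* result of * on an object pointer */
    int *p_member = &s.count;     /* s.f where s is an lvalue */
    int *p_arrow  = &ps->count;   /* s->f, i.e. (*s).f */
    int *p_index  = &a[2];        /* a[i], i.e. *((a)+(i)) */
    int *p_paren  = &(n);         /* parenthesized lvalue */
    char (*p_str)[3] = &"hi";     /* string literal: an array object */
    struct foo *p_lit = &(struct foo){ 'y', 2.5, 7 };   /* compound literal */

    assert(p_ident == p_deref && p_ident == p_paren);
    assert(p_member == p_arrow);
    assert(p_index == a + 2);
    assert((*p_str)[0] == 'h');

    p_lit->count += 1;   /* a non-const compound literal is a modifiable lvalue */
    *p_index += 1;
    return p_lit->count + *p_index + *p_arrow;   /* 8 + 31 + 42 */
}
```

By contrast, `&(n + 1)` or `&42` is rejected by a C compiler, since neither operand is an lvalue.
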
All of the conversions described above are defined in standard C, but Cforall
requires further features from its type system.
In particular, to allow overloading of the `*?` and `?[?]` dereferencing and
indexing operators, Cforall requires a way to declare that the functions
defining these operators return lvalues.
Since C functions never return lvalues, and since for syntactic reasons we
wish to distinguish functions which return lvalues from functions which
return pointers, this is of necessity an extension to standard C.
In the current design, an `lvalue` qualifier can be added to function return
types (and only to function return types), the effect of which is to return a
pointer which is implicitly dereferenced by the caller.
C++ includes the more general concept of _references_, which are typically
implemented as implicitly dereferenced pointers as well.
Another use case which C++ references support is providing a way to pass
function parameters by reference (rather than by value) with a natural
syntax; Cforall in its current state has no such mechanism.
As an example, consider the following (currently typical) copy-constructor
signature and call:

    void ?{}(T *lhs, T rhs);

    T x;
    T y = { x };

Note that the right-hand argument is passed by value, and would in fact be
copied twice in the course of the constructor call `T y = { x };` (once into
the parameter by C's standard `memcpy` semantics, once again in the body of
the copy constructor, though it is possible that return value optimization
will elide the `memcpy`-style copy).
However, to pass by reference using the existing pointer syntax, the example
above would look like this:

    void ?{}(T *lhs, const T *rhs);

    T x;
    T y = { &x };

This example is not even as bad as it could be; assuming pass-by-reference is
the desired semantics for the `?+?` operator, that implies the following
design today:

    T ?+?(const T *lhs, const T *rhs);

    T a, b;
    T c = &a + &b;

Besides being unsightly and confusing syntax for adding `a` and `b`,
`&a + &b` also introduces a possible ambiguity with pointer arithmetic on
`T*`, which can only be resolved by return-type inference.

Pass-by-reference and marking functions as returning lvalues instead of the
usual rvalues are actually closely related concepts, as obtaining a reference
to pass depends on the referenced object being addressable, i.e. an lvalue,
and lvalue return types are effectively return-by-reference.
Cforall should also unify the concepts, with a parameterized type for
"reference to `T`", which I will write `T&`.

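To make the relationship concrete, here is a sketch in plain C of what both features amount to if references are lowered to implicitly dereferenced pointers, as the text says of lvalue return types. The `vec` type and the functions `vec_at`, `vec_sum`, and `reference_lowering_demo` are hypothetical names for illustration, not part of Cforall.

```c
#include <assert.h>

typedef struct { int data[4]; } vec;   /* hypothetical container */

/* Stands in for an overloaded ?[?] with an lvalue return type:
   the callee returns a pointer and the caller dereferences it. */
static int *vec_at(vec *v, int i) { return &v->data[i]; }

/* Stands in for a by-reference parameter: the callee receives a pointer. */
static int vec_sum(const vec *v) {
    int total = 0;
    for (int i = 0; i < 4; i++) total += v->data[i];
    return total;
}

int reference_lowering_demo(void) {
    vec v = { { 1, 2, 3, 4 } };
    *vec_at(&v, 2) = 30;   /* roughly what `v[2] = 30` would lower to */
    *vec_at(&v, 0) += 9;   /* the result is an lvalue, so compound assignment works */
    return vec_sum(&v);    /* roughly what a by-reference `sum(v)` would lower to */
}
```

In both directions the compiler would be inserting the same two operations that are written by hand here: `&` at the point where a reference is formed, and `*` at the point where it is used.
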
Firstly, assignment to a function parameter as part of a function call and
local variable initialization have almost identical semantics, so should be
treated similarly for the reference type too; this implies we should be able
to declare local variables of reference type, as in the following:

    int x = 42;
    int& r = x;  // r is now an alias for x

Unlike in C++, we would like to have the capability to re-bind references
after initialization, as this allows the attractive syntax of references to
support some further useful code patterns, such as first initializing a
reference after its declaration.
Constant references to `T` (`T& const`) should not be re-bindable.

One option for re-binding references is to use a dedicated operator, as in the
code example below:

    int i = 42, j = 7;
    int& r = i;  // bind r to i
    r = j;       // set i (== r) to 7
    r := j;      // rebind r to j using the new := rebind operator
    i = 42;      // reset i (!= r) to 42
    assert( r == 7 );

Another option for reference rebind is to modify the semantics of the `&`
address-of operator.
In standard C, the address-of operator never returns an lvalue, but for an
object of type `T`, returns a `T*`.
If the address-of operator returned an lvalue for references, this would
allow reference rebinding using the usual pointer assignment syntax;
that is, if address-of a `T&` returned a `T*&` then the following works:

    int i = 42, j = 7;
    int& r = i;  // bind r to i
    r = j;       // set i (== r) to 7
    &r = &j;     // rebind r to j using the newly mutable "address-of reference"
    i = 42;      // reset i (!= r) to 42
    assert( r == 7 );

This change (making addresses of references mutable) allows use of existing
operators defined over pointers, as well as elegant handling of nested
references-to-references.

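Whichever spelling is chosen, the rebind examples correspond to the same plain C once the reference is written as an explicit pointer. The sketch below assumes that lowering; `rebind_demo` is a hypothetical name, and the comments show the proposed Cforall line each statement stands for.

```c
#include <assert.h>

int rebind_demo(void) {
    int i = 42, j = 7;
    int *r = &i;   /* int& r = i;            bind r to i */
    *r = j;        /* r = j;                 set i (through r) to 7 */
    assert(i == 7);
    r = &j;        /* r := j;  or  &r = &j;  rebind r to j */
    i = 42;        /* i = 42;                r no longer aliases i */
    assert(*r == 7 && i == 42);
    return *r;
}
```

Seen this way, the `&r = &j` form is an assignment to the underlying pointer itself, which is why it reuses the existing pointer assignment operator.
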
The semantics and restrictions of `T&` are effectively the semantics of an
lvalue of type `T`, and by this analogy there should be a safe,
qualifier-dropping conversion from `const volatile restrict T&` (and every
other qualifier combination on the `T` in `T&`) to `T`.
With this conversion, the resolver may type most expressions that C would
call "lvalue of type `T`" as `T&`.
There's also an obvious argument that lvalues of a (possibly-qualified) type
`T` should be convertible to references of type `T`, where `T` is also
so-qualified (e.g. lvalue `int` to `int&`, lvalue `const char` to
`const char&`).
By similar arguments to pointer types, qualifiers should be addable to the
referred-to type of a reference (e.g. `int&` to `const int&`).
As a note, since pointer arithmetic is explicitly not defined on `T&`,
`restrict T&` should be allowable and would have alias-analysis rules that
are actually comprehensible to mere mortals.

Using pass-by-reference semantics for function calls should not put syntactic
constraints on how the function is called; particularly, temporary values
should be able to be passed by reference.
The mechanism for this pass-by-reference would be to store the value of the
temporary expression into a new unnamed temporary, and pass the reference of
that temporary to the function.
As an example, the following code should all compile and run:

    void f(int& x) { printf("%d\n", x++); }

    int i = 7, j = 11;
    const int answer = 42;

    f(i);      // (1)
    f(42);     // (2)
    f(i + j);  // (3)
    f(answer); // (4)

The semantics of (1) are just like C++'s: "7" is printed, and `i` has the
value 8 afterward.
For (2), "42" is printed, and the increment of the unnamed temporary to 43 is
not visible to the caller; (3) behaves similarly, printing "19", but not
changing `i` or `j`.
(4) is a bit of an interesting case; we want to be able to support named
constants like `answer` that can be used anywhere the constant expression
they're replacing (like `42`) could go; in this sense, (4) and (2) should have
the same semantics.
However, we don't want the mutation to the `x` parameter to be visible in
`answer` afterward, because `answer` is a constant, and thus shouldn't change.
The solution to this is to allow chaining of the two lvalue conversions;
`answer` has the type `const int&`, which can be converted to `int` by the
lvalue-to-rvalue conversion (which drops the qualifiers), then up to `int&`
by the temporary-producing rvalue-to-lvalue conversion.
Thus, an unnamed temporary is inserted, initialized to `answer` (i.e. 42),
mutated by `f`, then discarded; "42" is printed, just as in case (2), and
`answer` still equals 42 after the call, because it was the temporary that was
mutated, not `answer`.
It may be somewhat surprising to C++ programmers that `f(i)` mutates `i` while
`f(answer)` does not mutate `answer` (though `f(answer)` would be illegal in
C++, leading to the dreaded "const hell"), but the behaviour of this rule can
be determined by examining local scope with the simple rule "non-`const`
references to `const` variables produce temporaries", which aligns with
programmer intuition that `const` variables cannot be mutated.

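The four cases can be imitated in standard C by writing the unnamed temporaries out by hand as compound literals (which, as noted earlier, are lvalues). This is only a sketch of the intended semantics under the assumption that references lower to pointers; it is not how a Cforall compiler is required to implement them, and `temporaries_demo` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Lowered form of `void f(int& x)`: the reference becomes a pointer. */
static void f(int *x) { printf("%d\n", (*x)++); }

int temporaries_demo(void) {
    int i = 7, j = 11;
    const int answer = 42;

    f(&i);                  /* (1) binds directly to i, so i is incremented */
    f(&(int){ 42 });        /* (2) rvalue stored into an unnamed temporary */
    f(&(int){ i + j });     /* (3) likewise; i and j are untouched */
    f(&(int){ answer });    /* (4) const lvalue copied into a temporary */

    assert(i == 8 && j == 11 && answer == 42);
    return i;
}
```

Only call (1) has an effect visible to the caller; in (2) through (4) the increment lands on a temporary that is discarded, which matches the rule "non-`const` references to `const` variables produce temporaries".
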
To bikeshed syntax for `T&`, there are three basic options: language
keywords (`lvalue T` is already in Cforall), compiler-supported "special"
generic types (e.g. `ref(T)`), or sigils (`T&` is familiar to C++
programmers).
Keyword- or generic-based approaches run the risk of name conflicts with
existing code, while any sigil used would have to be carefully chosen to not
create parsing conflicts.