source: doc/proposals/references.md @ 5ddb8bf

Last change on this file since 5ddb8bf was a4b3525, checked in by Peter A. Buhr <pabuhr@…>, 7 years ago

correct spelling

## Lvalues and References ##
C defines the notion of an _lvalue_, essentially an addressable object, as well
as a number of type _qualifiers_: `const`, `volatile`, and `restrict`.
As these type qualifiers are generally only meaningful to the type system as
applied to lvalues, the two concepts are closely related.
A const lvalue cannot be modified; the compiler cannot assume that a volatile
lvalue will not be concurrently modified by some other part of the system; and
a restrict lvalue must have pointer type, and the compiler may assume that no
other pointer in scope aliases that pointer (this is solely a performance
optimization, and may be ignored by implementers).
_Lvalue-to-rvalue conversion_, which takes an lvalue of type `T` and converts
it to an expression result of type `T` (commonly called an _rvalue_ of type
`T`), also strips all the qualifiers from the lvalue, as an expression result
is a value, not an addressable object that can have properties like
immutability.
Though lvalue-to-rvalue conversion strips the qualifiers from lvalues,
derived rvalue types such as pointer types may include qualifiers;
`const int *` is a distinct type from `int *`, though the latter is safely
convertible to the former.
In general, any number of qualifiers can be safely added to the
pointed-to type of a pointer type, e.g. `int *` converts safely to
`const int *` and `volatile int *`, both of which convert safely to
`const volatile int *`.

Since lvalues are precisely "addressable objects", only lvalues can be used in
C as the operand of the `&` address-of operator.
Similarly, only modifiable lvalues may be used as the assigned-to
operand of the mutating operators: assignment, compound assignment
(e.g. `+=`), and increment and decrement; roughly speaking, lvalues without
the `const` qualifier are modifiable, but lvalues of incomplete types, array
types, and struct or union types with const members are also not modifiable.
Lvalues are produced by the following expressions: object identifiers
(function identifiers are not considered to be lvalues), the result of the `*`
dereference operator applied to an object pointer, the result of a member
expression `s.f` if the left argument `s` is an lvalue (note that the
preceding two rules imply that the result of an indirect member expression
`s->f` is always an lvalue, by desugaring to `(*s).f`), and the result of the
indexing operator `a[i]` (similarly by its desugaring to `*((a)+(i))`).
Somewhat less obviously, parenthesized lvalue expressions, string literals,
and compound literals (e.g. `(struct foo){ 'x', 3.14, 42 }`) are also lvalues.

All of the conversions described above are defined in standard C, but Cforall
requires further features from its type system.
In particular, to allow overloading of the `*?` and `?[?]` dereferencing and
indexing operators, Cforall requires a way to declare that the functions
defining these operators return lvalues; since C functions never return
lvalues, and for syntactic reasons we wish to distinguish functions that
return lvalues from functions that return pointers, this is of necessity an
extension to standard C.
In the current design, an `lvalue` qualifier can be added to function return
types (and only to function return types), the effect of which is to return a
pointer which is implicitly dereferenced by the caller.

C++ includes the more general concept of _references_, which are typically
implemented as implicitly dereferenced pointers as well.
Another use case which C++ references support is providing a way to pass
function parameters by reference (rather than by value) with a natural
syntax; Cforall in its current state has no such mechanism.
As an example, consider the following (currently typical) copy-constructor
signature and call:

    void ?{}(T *lhs, T rhs);

    T x;
    T y = { x };

Note that the right-hand argument is passed by value, and would in fact be
copied twice in the course of the constructor call `T y = { x };` (once into
the parameter by C's standard `memcpy` semantics, and once again in the body of
the copy constructor, though it is possible that return value optimization
will elide the `memcpy`-style copy).
However, passing by reference with the existing pointer syntax to avoid these
copies makes the example look like this:

    void ?{}(T *lhs, const T *rhs);

    T x;
    T y = { &x };

This example is not even as bad as it could be; assuming pass-by-reference is
the desired semantics for the `?+?` operator, that implies the following
design today:

    T ?+?(const T *lhs, const T *rhs);

    T a, b;
    T c = &a + &b;

In addition to `&a + &b` being unsightly and confusing syntax to add `a` and
`b`, it also introduces a possible ambiguity with pointer arithmetic on `T*`,
which can only be resolved by return-type inference.

Pass-by-reference and marking functions as returning lvalues instead of the
usual rvalues are actually closely related concepts, as obtaining a reference
to pass depends on the referenced object being addressable, i.e. an lvalue,
and lvalue return types are effectively return-by-reference.
Cforall should therefore unify the two concepts, with a parameterized type for
"reference to `T`", which I will write `T&`.

Initializing a function parameter as part of a function call and initializing
a local variable have almost identical semantics, so they should be treated
similarly for the reference type too; this implies we should be able to
declare local variables of reference type, as in the following:

    int x = 42;
    int& r = x; // r is now an alias for x

Unlike in C++, we would like to have the capability to rebind references
after initialization, as this allows the attractive syntax of references to
support some further useful code patterns, such as declaring a reference first
and binding it later.
Constant references to `T` (`T& const`) should not be rebindable.

One option for rebinding references is to use a dedicated operator, as in the
code example below:

    int i = 42, j = 7;
    int& r = i;  // bind r to i
    r = j;       // set i (== r) to 7
    r := j;      // rebind r to j using the new := rebind operator
    i = 42;      // reset i (!= r) to 42
    assert( r == 7 );

Another option for rebinding references is to modify the semantics of the `&`
address-of operator.
In standard C, the address-of operator never returns an lvalue; applied to an
object of type `T`, it returns an rvalue of type `T*`.
If the address-of operator returned an lvalue for references, this would
allow reference rebinding using the usual pointer assignment syntax;
that is, if the address of a `T&` were a `T*&`, then the following would work:

    int i = 42, j = 7;
    int& r = i;  // bind r to i
    r = j;       // set i (== r) to 7
    &r = &j;     // rebind r to j using the newly mutable "address-of reference"
    i = 42;      // reset i (!= r) to 42
    assert( r == 7 );

This change (making addresses of references mutable) allows use of existing
operators defined over pointers, as well as elegant handling of nested
references-to-references.

The semantics and restrictions of `T&` are effectively the semantics of an
lvalue of type `T`, and by this analogy there should be a safe,
qualifier-dropping conversion from `const volatile restrict T&` (and every
other qualifier combination on the `T` in `T&`) to `T`.
With this conversion, the resolver may type most expressions that C would
call "lvalue of type `T`" as `T&`.
There is also an obvious argument that lvalues of a (possibly qualified) type
`T` should be convertible to references of type `T`, where `T` is also
so-qualified (e.g. lvalue `int` to `int&`, lvalue `const char` to
`const char&`).
By similar arguments to pointer types, qualifiers should be addable to the
referred-to type of a reference (e.g. `int&` to `const int&`).
As a note, since pointer arithmetic is explicitly not defined on `T&`,
`restrict T&` should be allowable and would have alias-analysis rules that
are actually comprehensible to mere mortals.

Using pass-by-reference semantics for function calls should not put syntactic
constraints on how the function is called; in particular, temporary values
should be able to be passed by reference.
The mechanism for this pass-by-reference would be to store the value of the
temporary expression into a new unnamed temporary, and pass a reference to
that temporary to the function.
As an example, the following code should all compile and run:

    void f(int& x) { printf("%d\n", x++); }

    int i = 7, j = 11;
    const int answer = 42;

    f(i);      // (1)
    f(42);     // (2)
    f(i + j);  // (3)
    f(answer); // (4)

The semantics of (1) are just like C++'s: "7" is printed, and `i` has the
value 8 afterward.
For (2), "42" is printed, and the increment of the unnamed temporary to 43 is
not visible to the caller; (3) behaves similarly, printing "19" (since `i` is
now 8), but not changing `i` or `j`.
(4) is a bit of an interesting case; we want to be able to support named
constants like `answer` that can be used anywhere the constant expression
they're replacing (like `42`) could go; in this sense, (4) and (2) should have
the same semantics.
However, we don't want the mutation to the `x` parameter to be visible in
`answer` afterward, because `answer` is a constant, and thus shouldn't change.
The solution to this is to allow chaining of the two lvalue conversions:
`answer` has the type `const int&`, which can be converted to `int` by the
lvalue-to-rvalue conversion (which drops the qualifiers), then up to `int&`
by the temporary-producing rvalue-to-lvalue conversion.
Thus, an unnamed temporary is inserted, initialized to `answer` (i.e. 42),
mutated by `f`, then discarded; "42" is printed, just as in case (2), and
`answer` still equals 42 after the call, because it was the temporary that was
mutated, not `answer`.
It may be somewhat surprising to C++ programmers that `f(i)` mutates `i` while
`f(answer)` does not mutate `answer` (though `f(answer)` would be illegal in
C++, leading to the dreaded "const hell"), but the behaviour of this rule can
be determined by examining local scope with the simple rule "non-`const`
references to `const` variables produce temporaries", which aligns with
programmer intuition that `const` variables cannot be mutated.

To bikeshed syntax for `T&`, there are three basic options: language
keywords (`lvalue T` is already in Cforall), compiler-supported "special"
generic types (e.g. `ref(T)`), or sigils (`T&` is familiar to C++
programmers).
Keyword- or generic-based approaches run the risk of name conflicts with
existing code, while any sigil used would have to be carefully chosen to not
create parsing conflicts.