Changeset 685a5e8 for doc/working


Ignore:
Timestamp:
Jun 7, 2016, 11:27:55 AM (5 years ago)
Author:
Rob Schluntz <rschlunt@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, ctor, deferred_resn, demangler, gc_noraii, jacob/cs343-translation, jenkins-sandbox, master, memory, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc
Children:
db175c8
Parents:
2f6b7c9 (diff), d5f1cfc (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' into ctor

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/working/resolver_design.md

    r2f6b7c9 r685a5e8  
    8787element-wise; summation starts at `(0, 0, 0, 0)`.
    8888
    89 **TODO** Costs of T, T*, lvalue T, rvalue T conversions (if applicable)
    90 
    91 ### Lvalue Conversions ###
    92 **TODO** Finish me
    93 
    94 #### NOTES
    95 * C standard 6.3.2.1
    96 * pointer_like_generators.md
    97 
    98 ### Qualifier Conversions ###
    99 **TODO** Finish me
    100 
    101 #### NOTES
    102 C standard 6.3.2.3.2: We can add any qualifier to the pointed-to-type of a
    103 pointer.
    104 * Glen thinks this means that we should make the default assignment operator
    105   `?=?(T volatile restrict *this, T that)`, but I'm not sure I like the
    106   implications for the actual implementation of forcing `this` to be
    107   volatile
    108 * I want to consider whether this property should generalize to other
    109   parameterized types (e.g. `lvalue T`, `box(T)`)
    110 
    111 C standard 6.3.2.1.1: "modifiable lvalues" recursively exclude structs with
    112 const-qualified fields
    113 
    114 C standard 6.3.2.1.2: Using lvalues as rvalues implicitly strips qualifiers
    115 
    116 C standard 6.2.4.26:
    117 
    118 C standard 6.7.3: **TODO**
     89### Lvalue and Qualifier Conversions ###
     90C defines the notion of a _lvalue_, essentially an addressable object, as well
     91as a number of type _qualifiers_, `const`, `volatile`, and `restrict`.
     92As these type qualifiers are generally only meaningful to the type system as
     93applied to lvalues, the two concepts are closely related.
     94A const lvalue cannot be modified, the compiler cannot assume that a volatile
     95lvalue will not be concurrently modified by some other part of the system, and
     96a restrict lvalue must have pointer type, and the compiler may assume that no
     97other pointer in scope aliases that pointer (this is solely a performance
     98optimization, and may be ignored by implementers).
     99_Lvalue-to-rvalue conversion_, which takes an lvalue of type `T` and converts
     100it to an expression result of type `T` (commonly called an _rvalue_ of type
     101`T`) also strips all the qualifiers from the lvalue, as an expression result
     102is a value, not an addressable object that can have properties like
     103immutability.
     104Though lvalue-to-rvalue conversion strips the qualifiers from lvalues,
     105derived rvalue types such as pointer types may include qualifiers;
     106`const int *` is a distinct type from `int *`, though the latter is safely
     107convertable to the former.
     108In general, any number of qualifiers can be safely added to the
     109pointed-to-type of a pointer type, e.g. `int *` converts safely to
     110`const int *` and `volatile int *`, both of which convert safely to
     111`const volatile int *`.
     112
     113Since lvalues are precicely "addressable objects", in C, only lvalues can be
     114used as the operand of the `&` address-of operator.
     115Similarly, only modifiable lvalues may be used as the assigned-to
     116operand of the mutating operators: assignment, compound assignment
     117(e.g. `+=`), and increment and decrement; roughly speaking, lvalues without
     118the `const` qualifier are modifiable, but lvalues of incomplete types, array
     119types, and struct or union types with const members are also not modifiable.
     120Lvalues are produced by the following expressions: object identifiers
     121(function identifiers are not considered to be lvalues), the result of the `*`
     122dereference operator applied to an object pointer, the result of a member
     123expression `s.f` if the left argument `s` is an lvalue (note that the
     124preceding two rules imply that the result of indirect member expressions
     125`s->f` are always lvalues, by desugaring to `(*s).f`), and the result of the
     126indexing operator `a[i]` (similarly by its desugaring to `*((a)+(i))`).
     127Somewhat less obviously, parenthesized lvalue expressions, string literals,
     128and compound literals (e.g. `(struct foo){ 'x', 3.14, 42 }`) are also lvalues.
     129
     130All of the conversions described above are defined in standard C, but Cforall
     131requires further features from its type system.
     132In particular, to allow overloading of the `*?` and `?[?]` dereferencing and
     133indexing operators, Cforall requires a way to declare that the functions
     134defining these operators return lvalues, and since C functions never return
     135lvalues and for syntactic reasons we wish to distinguish functions which
     136return lvalues from functions which return pointers, this is of necessity an
     137extension to standard C.
     138In the current design, an `lvalue` qualifier can be added to function return
     139types (and only to function return types), the effect of which is to return a
     140pointer which is implicitly dereferenced by the caller.
     141C++ includes the more general concept of _references_, which are typically
     142implemented as implicitly dereferenced pointers as well.
     143Another use case which C++ references support is providing a way to pass
     144function parameters by reference (rather than by value) with a natural
     145syntax; Cforall in its current state has no such mechanism.
     146As an example, consider the following (currently typical) copy-constructor
     147signature and call:
     148
     149        void ?{}(T *lhs, T rhs);
     150       
     151        T x;
     152        T y = { x };
     153
     154Note that the right-hand argument is passed by value, and would in fact be
     155copied twice in the course of the constructor call `T y = { x };` (once into
     156the parameter by C's standard `memcpy` semantics, once again in the body of
     157the copy constructor, though it is possible that return value optimization
     158will elide the `memcpy`-style copy).
     159However, to pass by reference using the existing pointer syntax, the example
     160above would look like this:
     161
     162        void ?{}(T *lhs, const T *rhs);
     163       
     164        T x;
     165        T y = { &x };
     166
     167This example is not even as bad as it could be; assuming pass-by-reference is
     168the desired semantics for the `?+?` operator, that implies the following
     169design today:
     170
     171        T ?+?(const T *lhs, const T *rhs);
     172       
     173        T a, b;
     174        T c = &a + &b,
     175
     176In addition to `&a + &b` being unsightly and confusing syntax to add `a` and
     177`b`, it also introduces a possible ambiguity with pointer arithmetic on `T*`
     178which can only be resolved by return-type inference.
     179
     180Pass-by-reference and marking functions as returning lvalues instead of the
     181usual rvalues are actually closely related concepts, as obtaining a reference
     182to pass depends on the referenced object being addressable, i.e. an lvalue,
     183and lvalue return types are effectively return-by-reference.
     184Cforall should also unify the concepts, with a parameterized type for
     185"reference to `T`", which I will write `ref T`.
     186Syntax bikeshedding can be done later (there are some examples at the bottom
     187of this section), but `ref T` is sufficiently distinct from both the existing
     188`lvalue T` (which it subsumes) and the closely related C++ `T&` to allow
     189independent discussion of its semantics.
     190
     191Firstly, assignment to a function parameter as part of a function call and
     192local variable initialization have almost identical semantics, so should be
     193treated similarly for the reference type too; this implies we should be able
     194to declare local variables of reference type, as in the following:
     195
     196        int x = 42;
     197        ref int r = x; // r is now an alias for x
     198
     199Unlike in C++, we would like to have the capability to re-bind references
     200after initialization, as this allows the attractive syntax of references to
     201support some further useful code patterns, such as first initializing a
     202reference after its declaration.
     203Constant references to `T` (`const ref T`) should not be re-bindable.
     204
     205One option for re-binding references is to use a dedicated operator, as in the
     206code example below:
     207
     208        int i = 42, j = 7;
     209        ref int r = i; // bind r to i
     210        r = j;         // set i (== r) to 7
     211        r := j;        // rebind r to j using the new := rebind operator
     212        i = 42;        // reset i (!= r) to 42
     213        assert( r == 7 );
     214
     215The other syntactic option for reference re-bind would be to overload
     216assignment and use type inference on the left and right-hand sides to
     217determine whether the referred-to variable on the left should be reassigned to
     218the value on the right, or if the reference on the left should be aliased to
     219the reference on the right.
     220This could be disambiguated with casts, as in the following code example:
     221
     222        int i
     223        int j;
     224        ref int r = i;   // (0a)
     225        ref int s = i;   // (0b)
     226       
     227        i = j;           // (1)
     228        i = (int)s;      // (2)
     229        i = s;           // (3)
     230        // ---------------------
     231        r = s;           // (4)
     232        r = (ref int)j;  // (5)
     233        // ---------------------
     234        r = j;           // (6)
     235        r = (int)s;      // (7)
     236
     237By the expected aliasing syntax, (0a) and (0b) are initializing `r` and `s` as
     238aliases for `i`.
     239For C compatibility, (1) has to be assignment; in general, any assignment to a
     240non-reference type should be assignment, so (2) and (3) are as well.
     241By types, (4) and (5) should have the same semantics, and the semantics of (6)
     242and (7) should match as well.
     243This suggests that (4) and (5) are reference re-bind, and (6) and (7) are an
     244assignment to the referred variable; this makes the syntax to explicitly alias
     245a local variable rather ugly (and inconsistent with the initialization
     246syntax), as well as making it rather awkward to copy the value stored in one
     247reference-type variable into another reference type variable (which is likely
     248more painful in functions with by-reference parameters than with local
     249variables of reference type).
     250
     251Because of the aforementioned issues with overloading assignment as reference
     252rebind, in addition to the fact that reference rebind should not be a
     253user-overloadable operator (unlike assignment), I propose refererence rebind
     254should have its own dedicated operator.
     255
     256The semantics and restrictions of `ref T` are effectively the semantics of an
     257lvalue of type `T`, and by this analogy there should be a safe, qualifier
     258dropping conversion from `ref const volatile restrict T` (and every other
     259qualifier combination on the `T` in `ref T`) to `T`.
     260With this conversion, the resolver may type most expressions that C would
     261call "lvalue of type `T`" as `ref T`.
     262There's also an obvious argument that lvalues of a (possibly-qualified) type
     263`T` should be convertable to references of type `T`, where `T` is also
     264so-qualified (e.g. lvalue `int` to `ref int`, lvalue `const char` to
     265`ref const char`).
     266By similar arguments to pointer types, qualifiers should be addable to the
     267referred-to type of a reference (e.g. `ref int` to `ref const int`).
     268As a note, since pointer arithmetic is explictly not defined on `ref T`,
     269`restrict ref T` should be allowable and would have alias-analysis rules that
     270are actually comprehensible to mere mortals.
     271
     272Using pass-by-reference semantics for function calls should not put syntactic
     273constraints on how the function is called; particularly, temporary values
     274should be able to be passed by reference.
     275The mechanism for this pass-by-reference would be to store the value of the
     276temporary expression into a new unnamed temporary, and pass the reference of
     277that temporary to the function.
     278As an example, the following code should all compile and run:
     279
     280        void f(ref int x) { printf("%d\n", x++); }
     281       
     282        int i = 7, j = 11;
     283        const int answer = 42;
     284       
     285        f(i);      // (1)
     286        f(42);     // (2)
     287        f(i + j);  // (3)
     288        f(answer); // (4)
     289
     290The semantics of (1) are just like C++'s, "7" is printed, and `i` has the
     291value 8 afterward.
     292For (2), "42" is printed, and the increment of the unnamed temporary to 43 is
     293not visible to the caller; (3) behaves similarly, printing "19", but not
     294changing `i` or `j`.
     295(4) is a bit of an interesting case; we want to be able to support named
     296constants like `answer` that can be used anywhere the constant expression
     297they're replacing (like `42`) could go; in this sense, (4) and (2) should have
     298the same semantics.
     299However, we don't want the mutation to the `x` parameter to be visible in
     300`answer` afterward, because `answer` is a constant, and thus shouldn't change.
     301The solution to this is to allow chaining of the two `ref` conversions;
     302`answer` has the type `ref const int`, which can be converted to `int` by the
     303lvalue-to-rvalue conversion (which drops the qualifiers), then up to `ref int`
     304by the temporary-producing rvalue-to-lvalue conversion.
     305Thus, an unnamed temporary is inserted, initialized to `answer` (i.e. 42),
     306mutated by `f`, then discarded; "42" is printed, just as in case (2), and
     307`answer` still equals 42 after the call, because it was the temporary that was
     308mutated, not `answer`.
     309It may be somewhat surprising to C++ programmers that `f(i)` mutates `i` while
     310`f(answer)` does not mutate `answer` (though `f(answer)` would be illegal in
     311C++, leading to the dreaded "const hell"), but the behaviour of this rule can
     312be determined by examining local scope with the simple rule "non-`const`
     313references to `const` variables produce temporaries", which aligns with
     314programmer intuition that `const` variables cannot be mutated.
     315
     316To bikeshed syntax for `ref T`, there are three basic options: language
     317keywords (`lvalue T` is already in Cforall), compiler-supported "special"
     318generic types (e.g. `ref(T)`), or sigils (`T&` is familiar to C++
     319programmers).
     320Keyword or generic based approaches run the risk of name conflicts with
     321existing code, while any sigil used would have to be carefully chosen to not
     322create parsing conflicts.
     323
     324**TODO** Consider arguments for move semantics and see if there is a
     325compelling case for rvalue references.
    119326
    120327### Conversion Operator Costs ###
     
    624831perfectly legal and has the desired semantics.
    625832
     833We can assert that `T` can be used in a boolean context as follows:
     834
     835        `forall(otype T | { int ?!=?(T, _zero_t); })`
     836 
     837Since the C standard (6.5.16.1.1) specifically states that pointers can be
     838assigned into `_Bool` variables (and implies that other artithmetic types can
     839be assigned into `_Bool` variables), it seems natural to say that assignment
     840into a `_Bool` variable effectively constitutes a boolean context.
     841To allow this interpretation, I propose including the following function (or
     842its effective equivalent) in the prelude:
     843
     844        forall(otype T | { int ?!=?(T, _zero_t); })
     845        void ?{safe}( _Bool *this, T that ) { *this = that != 0; }
     846
     847Note that this conversion is not transitive; that is, for `t` a variable of
     848some "truthy" type `T`, `(_Bool)t;` would use this conversion (in the absence
     849of a lower-cost one), `(int)t;` would not use this conversion (and in fact
     850would not be legal in the absence of another valid way to convert a `T` to an
     851`int`), but `(int)(_Bool)t;` could legally use this conversion.
     852
    626853Similarly giving literal `1` the special type `_unit_t` allows for more
    627854concise and consistent specification of the increment and decrement operators,
     
    714941then take the non-deleted alternative, and of two equivalent-cost deleted
    715942interpretations with the same return type pick one arbitrarily rather than
    716 producing an ambiguous resolution.
     943producing an ambiguous resolution. This would also be useful for forbidding
     944pointer-to-floating-point explicit conversions (C11, 6.5.4.4).
     945**TODO** Cover default parameters, maybe named parameters (see "named
     946arguments" thread of 11 March 2016)
     947 
    717948
    718949### Sizeof, Alignof & Offsetof Expressions ###
     
    732963for each interpretation `J` of `y` with the same type as `J` costing the sum
    733964of the costs of `I` and `J`.
     965
     966### Index Expressions ###
     967**TODO** Consider adding polymorphic function in prelude for this, as per
     9686.5.2.1.2 in the C standard:
     969
     970        forall(otype T, otype I, otype R, otype E | { R ?+?(T, I); lvalue E *?(R); })
     971        lvalue E ?[?](T a, I i) { return *(a + i); }
     972
     973I think this isn't actually a good idea, because the cases for it are niche,
     974mostly odd tricks like `0[p]` as an alternate syntax for dereferencing a
     975pointer `p`, and adding it to the prelude would slow down resolution of
     976every index expression just a bit. Our existing prelude includes an indexing
     977operator `forall(otype T) lvalue T ?[?](ptrdiff_t, T*)`, plus qualified
     978variants, which should satisfy C source-compatibility without propegating this
     979silly desugaring further.
    734980
    735981#### Compatible Functions ####
Note: See TracChangeset for help on using the changeset viewer.