Changeset cbe477e


Timestamp:
Feb 5, 2018, 4:55:35 PM (6 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
43bbdf3, 7ad6b6d
Parents:
51b5a02
Message:

About a page of paper content on references

File:
1 edited

  • doc/papers/general/Paper.tex

    r51b5a02 rcbe477e  
    10381038The implicit targets of the current @continue@ and @break@, \ie the closest enclosing loop or @switch@, change as certain constructs are added or removed.
    10391039
     1040\TODO{choose and fallthrough here as well?}
    10401041
    10411042\subsection{\texorpdfstring{\LstKeywordStyle{with} Clause / Statement}{with Clause / Statement}}
     
    11621163\subsection{References}
    11631164
    1164 \TODO{Pull draft text from user manual; make sure to discuss nested references and rebind operator drawn from lvalue-addressof operator}
    1165 
     1165All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
     1166The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called an \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called an \emph{lvalue} (for ``left-hand value''). For example, in @int x; x = 42;@ the variable expression @x@ on the left-hand side of the assignment is an lvalue, while the constant expression @42@ on the right-hand side of the assignment is an rvalue.
     1167The address at which a value is located is sometimes significant, since the imperative programming paradigm of C relies on the mutation of values at specific addresses.
     1168Within a lexical scope, lvalue expressions can be used either in their \emph{address interpretation}, to determine where a mutated value should be stored, or in their \emph{value interpretation}, to refer to their stored value; in @x = y;@ within @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while @y@ is used in its value interpretation.
     1169Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose.
     1170In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of an lvalue can be obtained with the address-of operator @&?@.
     1171
     1172\begin{cfa}
     1173int x = 1, y = 2, * p1, * p2, ** p3;
     1174p1 = &x;  $\C{// p1 points to x}$
     1175p2 = &y;  $\C{// p2 points to y}$
     1176p3 = &p1;  $\C{// p3 points to p1}$
     1177\end{cfa}
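Since the pointer declarations above are ordinary C, their address and value interpretations can be checked directly. In this minimal sketch, @setup@ and @write_through@ are hypothetical helpers introduced only for illustration; @x@, @y@, @p1@, @p2@ and @p3@ come from the example.

```c
/* Plain C version of the pointer declarations above.
   setup() and write_through() are hypothetical helpers. */
static int x = 1, y = 2, *p1, *p2, **p3;

static void setup(void) {
    p1 = &x;   /* p1 points to x */
    p2 = &y;   /* p2 points to y */
    p3 = &p1;  /* p3 points to p1 */
}

/* On the left of '=', *p1 is used in its address interpretation
   (where to store); on the right, *p2 is used in its value
   interpretation (what is stored there). */
static void write_through(void) {
    *p1 = *p2 + 40;  /* stores 2 + 40 = 42 into x */
}
```

Because @p3@ holds the address of @p1@ rather than of @x@, @**p3@ reaches @x@ only through whatever @p1@ currently points to.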
     1178
     1179Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
     1180It would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as @*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);@, for both brevity and clarity.
     1181However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
     1182To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below:
     1183
     1184\begin{cfa}
     1185int x = 1, y = 2, & r1, & r2, && r3;
     1186&r1 = &x;  $\C{// r1 points to x}$
     1187&r2 = &y;  $\C{// r2 points to y}$
     1188&&r3 = &&r1;  $\C{// r3 points to r1}$
     1189r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15);  $\C{// implicit dereferencing}$
     1190\end{cfa}
     1191
     1192Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example.
     1193Hence, a reference behaves like a variable name: an lvalue expression that is interpreted as a value, but whose address is also tracked by the type system.
     1194One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in the reference variable declaration, so the previous example implicitly acts like:
     1195
     1196\begin{cfa}
     1197`*`r2 = ((`*`r1 + `*`r2) * (`**`r3 - `*`r1)) / (`**`r3 - 15);
     1198\end{cfa}
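Applying the rewrite rule by hand turns the reference example into ordinary C, which makes it easy to check that it computes the same result as the pointer version. This is a sketch of the translation under that rule, not the actual output of the \CFA compiler; @desugared_example@, @deref_then_add@ and @add_to_pointer@ are hypothetical names. The last two functions illustrate the pointer-arithmetic hazard described earlier: dropping a single @*@ when writing the pointer form by hand can still compile while turning integer addition into pointer arithmetic.

```c
/* Hand desugaring of the reference example: each reference becomes a
   pointer, and each use of a reference declared with k '&' qualifiers
   receives k explicit '*'. Function name and out-parameter are
   hypothetical. */
int desugared_example(int *x_out) {
    int x = 1, y = 2;
    int *r1, *r2, **r3;   /* int & r1, & r2, && r3; */
    r1 = &x;              /* &r1 = &x;    r1 points to x */
    r2 = &y;              /* &r2 = &y;    r2 points to y */
    r3 = &r1;             /* &&r3 = &&r1; r3 points to r1 */
    /* r2 = ((r1 + r2) * (r3 - r1)) / (r3 - 15); becomes: */
    *r2 = ((*r1 + *r2) * (**r3 - *r1)) / (**r3 - 15);
    *x_out = x;
    return y;             /* (1 + 2) * (1 - 1) / (1 - 15) == 0 */
}

/* Similar-looking expressions with distinct, well-defined meanings. */
int deref_then_add(int *p, int n) { return *p + n; }  /* integer addition */
int *add_to_pointer(int *p, int n) { return p + n; }  /* pointer arithmetic */
```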
     1199
     1200References in \CFA are similar to those in \CC, but with a couple of important improvements, both of which can be seen in the reference example above.
     1201Firstly, unlike \CC, \CFA permits references to references.
     1202This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
     1203
     1204Secondly, unlike references in \CC, which always refer to a fixed address, \CFA references are rebindable.
     1205This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime.
     1206This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
     1207In C, the address of an lvalue is always an rvalue, because in general that address is not stored anywhere in memory and so does not itself have an address.
     1208In \CFA, the address of a @T&@ is an lvalue @T*@, because the address of the underlying @T@ is stored in the reference and can thus be mutated there.
     1209The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above.
     1210This rebinding can occur to an arbitrary depth of reference nesting: $n$ address-of operators applied to a reference nested $m$ times produce an lvalue pointer nested $n$ times if $n \le m$ (note that $n = m+1$ is simply the usual C rvalue address-of operator applied to the $n = m$ case).
     1211The explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x;@ or @(&(&`*`)`*`)r3 = &(&`*`)r1;@ or even @(&`*`)r2 = (&`*`)`*`r3;@ for @&r2 = &r3;@.
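The rebinding rules can be modelled in plain C by spelling out the pointer that each reference stores internally. In this sketch, each @&@ applied to a reference cancels one implicit dereference, so @&r1 = &z;@ becomes the pointer assignment @r1 = &z;@ and @&r2 = &r3;@ becomes @r2 = *r3;@. The function @rebind_demo@, the struct @rebind_result@ and the extra variable @z@ are hypothetical, added only to make the rebinding observable.

```c
/* Hypothetical C model of reference rebinding: each reference is an
   explicit pointer, and "&ref" names that pointer as an lvalue. */
struct rebind_result {
    int via_r3_before;  /* read through r3 before r1 is rebound */
    int via_r3_after;   /* read through r3 after r1 is rebound */
    int y_after;        /* y after writing through r2 */
    int z_after;        /* z after writing through r2 */
};

struct rebind_result rebind_demo(void) {
    int x = 1, y = 2, z = 3;
    int *r1 = &x;        /* int & r1;   &r1 = &x;    r1 refers to x */
    int *r2 = &y;        /* int & r2;   &r2 = &y;    r2 refers to y */
    int **r3 = &r1;      /* int && r3;  &&r3 = &&r1; r3 refers to r1 */
    struct rebind_result res;

    res.via_r3_before = **r3;  /* r3 (two implicit '*') reads x */
    r1 = &z;                   /* &r1 = &z;  rebinds r1 to z */
    res.via_r3_after = **r3;   /* r3 still refers to r1, so it now reads z */
    r2 = *r3;                  /* &r2 = &r3; r2 now refers to z, not y */
    *r2 = 7;                   /* r2 = 7;    writes z, leaves y alone */
    res.y_after = y;
    res.z_after = z;
    return res;
}
```

Because @r3@ is bound to @r1@ itself rather than to @x@, rebinding @r1@ is visible through @r3@ without touching @r3@.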
     1212
     1213Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.
     1214By analogy to pointers, \CFA references also allow cv-qualifiers:
     1215
     1216\begin{cfa}
     1217const int cx = 5;               $\C{// cannot change cx}$
     1218const int & cr = cx;    $\C{// cannot change cr's referred value}$
     1219&cr = &cx;                              $\C{// rebinding cr allowed}$
     1220cr = 7;                                 $\C{// ERROR, cannot change cr}$
     1221int & const rc = x;             $\C{// must be initialized, like in \CC}$
     1222&rc = &x;                               $\C{// ERROR, cannot rebind rc}$
     1223rc = 7;                                 $\C{// x now equal to 7}$
     1224\end{cfa}
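Since references are converted to pointers internally, the cv-qualified declarations above can be read as corresponding to the familiar C pointer qualifiers, with the position of @const@ relative to @&@ mirroring its position relative to @*@. The sketch below assumes that correspondence; the function name @cv_demo@ is hypothetical. It keeps the legal operations and leaves the two rejected ones as comments, since they would not compile.

```c
/* Assumed C analogue of the cv-qualified references above:
     const int & cr   ~  const int *cr   (rebindable, read-only target)
     int & const rc   ~  int *const rc   (fixed binding, writable target) */
int cv_demo(void) {
    int x = 1;
    const int cx = 5;
    const int *cr = &cx;   /* const int & cr = cx; */
    cr = &cx;              /* &cr = &cx;  rebinding is allowed */
    /* Writing *cr = 7 (cr = 7 in Cforall) would be a compile error. */
    int *const rc = &x;    /* int & const rc = x;  must be initialized */
    /* Writing rc = &x (&rc = &x in Cforall) would be a compile error. */
    *rc = 7;               /* rc = 7;  x now equals 7 */
    return x + *cr;        /* 7 + 5 */
}
```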
     1225
     1226\TODO{Pull more draft text from user manual; make sure to discuss initialization and reference conversions}
    11661227
    11671228\subsection{Constructors and Destructors}
     
    11821243\subsection{0/1}
    11831244
     1245\TODO{Some text already at the end of Section~\ref{sec:poly-fns}}
    11841246
    11851247\subsection{Units}