# Changeset 27caf8d

Timestamp:
May 19, 2017, 11:56:43 AM
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc
Parents:
fdca7c6
Message:

updates to pointers/references section

File:
1 edited

%% Created On       : Wed Apr  6 14:53:29 2016
%% Last Modified By : Peter A. Buhr
%% Last Modified On : Fri May 19 11:54:31 2017
%% Update Count     : 1735
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

One way to conceptualize the null pointer is that no variable is placed at this address, so the null-pointer address can be used to denote an uninitialized pointer/reference object; \ie the null pointer is guaranteed to compare unequal to a pointer to any object or routine.}
An address is \newterm{sound} if it points to a valid memory location in scope, \ie within the program's execution environment, and has not been freed.
Dereferencing an \newterm{unsound} address, including the null pointer, is \Index{undefined}, often resulting in a \Index{memory fault}.
\end{quote2}

Finally, the immutable nature of a variable's address and the fact that there is no storage for the variable pointer mean pointer assignment\index{pointer!assignment}\index{assignment!pointer} is impossible.
Therefore, the expression ©x = y© has only one meaning, ©*x = *y©, \ie manipulate values, which is why explicitly writing the dereferences is unnecessary even though dereferencing occurs implicitly as part of \Index{instruction decoding}.

A \Index{pointer}/\Index{reference} object is a generalization of an object variable-name, \ie a mutable address that can point to more than one memory location during its lifetime.
(Similarly, an integer variable can contain multiple integer literals during its lifetime, whereas an integer constant represents a single literal during its lifetime and, like a variable name, may not occupy storage, as the literal is embedded directly into instructions.)
Hence, a pointer occupies memory to store its current address, and the pointer's value is loaded by dereferencing, \eg:
\begin{quote2}
\end{quote2}

Notice, an address has a \Index{duality}\index{address!duality}: a location in memory or the value at that location.
In many cases, a compiler might be able to infer the best meaning for these two cases.
For example, \Index*{Algol68}~\cite{Algol68} infers pointer dereferencing to select the best meaning for each pointer usage:
\begin{cfa}
p2 = p1 + x;                                    §\C{// compiler infers *p2 = *p1 + x;}§
\end{cfa}
Unfortunately, automatic dereferencing does not work in all cases, and so some mechanism is necessary to fix incorrect choices.

Rather than inferring dereferencing, most programming languages pick one implicit dereferencing semantics, and the programmer explicitly indicates the other to resolve address duality.
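In C, a pointer object always takes the address meaning. The Algol68 example above therefore has two distinct C spellings, depending on which meaning is intended. The following minimal C sketch contrasts them. The routine names ©address_meaning© and ©value_meaning© are hypothetical and used only for illustration.

```c
#include <stddef.h>

/* Address meaning: C treats the pointer object p1 as an address, so
   p1 + x is pointer arithmetic giving the address x elements past p1. */
int * address_meaning( int * p1, ptrdiff_t x ) {
	int * p2 = p1 + x;
	return p2;
}

/* Value meaning (the one Algol68 infers for p2 = p1 + x): C requires the
   programmer to write both dereferences explicitly. */
void value_meaning( int * p2, const int * p1, int x ) {
	*p2 = *p1 + x;
}
```

For example, with ©int a[4] = { 10, 20, 30, 40 };©, ©address_meaning( a, 2 )© returns ©&a[2]©, while ©value_meaning( &a[3], &a[0], 2 )© stores 12 into ©a[3]©.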
In C, objects of pointer type always manipulate the pointer object's address:
\begin{cfa}
\end{cfa}
To support this common case, a reference type is introduced in \CFA, denoted by ©&©, which has the opposite dereference semantics to a pointer type, making the value at the pointed-to location the implicit semantics for dereferencing (similar but not the same as \CC \Index{reference type}s).
\begin{cfa}
int x, y, ®&® r1, ®&® r2, ®&&® r3;
®*®r2 = ((®*®r1 + ®*®r2) ®*® (®**®r3 - ®*®r1)) / (®**®r3 - 15);
\end{cfa}
When a reference operation appears beside a dereference operation, \eg ©&*©, they cancel out.
However, in C, the cancellation always yields a value (\Index{rvalue}).\footnote{
The unary ©&© operator yields the address of its operand.
If the operand has type ``type'', the result has type ``pointer to type''.
If the operand is the result of a unary ©*© operator, neither that operator nor the ©&© operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.~\cite[\S~6.5.3.2--3]{C11}}
For a \CFA reference type, the cancellation on the left-hand side of assignment leaves the reference as an address (\Index{lvalue}):
\begin{cfa}
(&®*®)r1 = &x;                                  §\C{// (\&*) cancel giving address of r1 not variable pointed-to by r1}§
\end{cfa}
Similarly, the address of a reference can be obtained for assignment or computation (\Index{rvalue}):
\begin{cfa}
(&(&®*®)®*®)r3 = &(&®*®)r2;             §\C{// (\&*) cancel giving address of r2, (\&(\&*)*) cancel giving address of r3}§
\end{cfa}
Cancellation\index{cancellation!pointer/reference}\index{pointer!cancellation} works to arbitrary depth.
\end{cfa}
Furthermore, both types are equally performant, as the same amount of dereferencing occurs for both types.
Therefore, the choice between them is based solely on whether the address is dereferenced frequently or infrequently, which dictates the amount of implicit dereferencing aid from the compiler.
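To relate the reference expression and the C cancellation rule above to plain C, the sketch below models ©r1© and ©r2© as ©int *© and ©r3© as ©int **©. This is an assumption made only for illustration, since a \CFA reference is not literally a C pointer. The sketch writes every dereference explicitly. It also shows that C's ©&*© cancellation yields the original pointer value as an rvalue. The routine names ©update© and ©cancel© are hypothetical.

```c
#include <stddef.h>

/* The reference expression with all dereferences explicit, where r1 and r2
   are modelled as int * and r3 as int ** (a hypothetical lowering). */
int update( int * r1, int * r2, int ** r3 ) {
	*r2 = ((*r1 + *r2) * (**r3 - *r1)) / (**r3 - 15);
	return *r2;
}

/* C cancellation: &*p yields the value of p as an rvalue; neither operator
   is evaluated, so this is well defined even when p is a null pointer. */
int * cancel( int * p ) {
	return &*p;
}
```

With ©x = 20© and ©y = 10©, and with ©r3© bound to ©r2©, the call ©update( &x, &y, &r2 )© stores ©((20 + 10) * (10 - 20)) / (10 - 15)©, \ie 60, into ©y©.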
As for a pointer type, a reference type may have qualifiers:
\begin{cfa}
int & const cr = *0;                    §\C{// where 0 is the int * zero}§
\end{cfa}
Note, constant reference-types do not prevent addressing errors because of explicit storage-management:
\begin{cfa}
int & const cr = *malloc();
cr = 5;
free( &cr );
cr = 7;                                                 §\C{// unsound pointer dereference}§
\end{cfa}
where the \CFA declaration is read left-to-right (see \VRef{s:Declarations}).
In contrast to \CFA reference types, \Index*[C++]{\CC{}}'s reference types are all ©const© references, preventing changes to the reference address, so only value assignment is possible, which eliminates half of the \Index{address duality}.
\Index*{Java}'s reference types to objects (all Java objects are on the heap) are like C pointers, which always manipulate the address, and there is no (bit-wise) object assignment, so objects are explicitly cloned by shallow or deep copying, which eliminates half of the address duality.

\Index{Initialization} is different from \Index{assignment} because initialization occurs on the empty (uninitialized) storage of an object, while assignment occurs on possibly initialized storage of an object.
There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding.
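The qualifier and initialization points above have a direct C counterpart in a ©const©-qualified pointer. The address can be set only by initialization, which fills the pointer's empty storage, while the pointed-to value can still be assigned. The following is a minimal sketch, with ©int * const© standing in for ©int & const©. The routine name ©const_pointer_demo© is hypothetical.

```c
/* int * const is used here as the C counterpart of int & const. */
int const_pointer_demo( void ) {
	int x = 0;
	int * const cp = &x;   // initialization: sets the address in cp's empty storage
	*cp = 5;               // assignment to the pointed-to object: allowed
	/* cp = &x; */         // assignment to cp itself: rejected, cp is read-only
	return x;
}
```

As with the constant reference, the ©const© qualifier fixes only the address. It does not keep the pointed-to storage alive, so a ©const© pointer to freed storage is just as unsound as a non-©const© one.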
Because the object being initialized has no value, there is only one meaningful semantics with respect to address duality: it must mean address, as there is no pointed-to value.
In contrast, the left-hand side of assignment has an address that has a duality.
Therefore, for pointer/reference initialization, the initializing value must be an address (\Index{lvalue}), not a value (\Index{rvalue}).
\begin{cfa}
int * p = &x;                           §\C{// must have address of x}§
int & r = x;                            §\C{// must have address of x}§
\end{cfa}
Consequently, it is superfluous to require explicitly taking the address of the initialization object, even though, without it, the type of the initializer does not match.
Hence, \CFA allows ©r© to be initialized with ©x© because it infers a reference for ©x© by implicitly inserting an address-of operator, ©&©, and it is an error to put an ©&© because the types no longer match.
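One way to picture this inference is to lower the reference to a plain C pointer. This makes explicit both the address-of inserted at initialization and the dereference inserted at each later use. The lowering below is a hypothetical model for illustration, not a description of the \CFA translator's actual output. The routine name ©lower_reference_init© is also hypothetical.

```c
/* Hypothetical C lowering of the CFA code
       int x = 1;
       int & r = x;     // compiler inserts &: r is initialized with &x
       r = 5;           // compiler inserts *: assigns to x
   with the reference r modelled as a plain int pointer. */
int lower_reference_init( void ) {
	int x = 1;
	int * r = &x;        // initialization needs an address, written explicitly
	*r = 5;              // a later use needs a dereference, written explicitly
	return x;
}
```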
Unfortunately, C allows ©p© to be initialized with ©&x© or with ©x© (treating the value of ©x© as an address), but most compilers warn about the latter as potentially incorrect.
(\CFA extends pointer initialization so a variable name is automatically referenced, eliminating the unsafe initialization.)
Similarly, when a reference type is used for a parameter/return type, the call-site argument does not require a reference operator for the same reason.
\begin{cfa}
int & f( int & r );                             §\C{// reference parameter and return}§
z = f( x ) + f( y );                    §\C{// reference operator added, temporaries needed for call results}§
\end{cfa}
Within routine ©f©, it is possible to change the argument by changing the corresponding parameter, and parameter ©r© can be locally reassigned within ©f©.
Since operator routine ©?+?© takes its arguments by value, the references returned from ©f© are used to initialize compiler-generated temporaries with value semantics that copy from the references.
\begin{cfa}
int temp1 = f( x ), temp2 = f( y );
z = temp1 + temp2;
\end{cfa}
This implicit referencing is crucial for reducing the syntactic burden for programmers when using references; otherwise references have the same syntactic burden as pointers in these contexts.

When a pointer/reference parameter has a ©const© value (immutable), it is possible to pass literals and expressions.
\begin{cfa}
void f( ®const® int & cr );
void g( ®const® int * cp );
f( 3 );                   g( &3 );
f( x + y );             g( &(x + y) );
\end{cfa}
Here, the compiler passes the address of the literal 3 or of the temporary for the expression ©x + y©, knowing the argument cannot be changed through the parameter.
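The call ©z = f( x ) + f( y )© and the ©const© parameter case can be sketched in C by making every implicit operation explicit. The section gives only the declaration of ©f©, so the body below is a hypothetical assumption: it increments the argument and returns it. Also, ©&3© is not valid C, so the sketch uses a C99 compound literal such as ©&(const int){ 3 }© to obtain the address of an unnamed constant object. The routine names ©sum_of_calls© and ©read_only© are hypothetical.

```c
/* Hypothetical body for int & f( int & r ), lowered to pointers:
   it increments the argument and returns a pointer to it. */
int * f( int * r ) {
	*r += 1;
	return r;
}

/* z = f( x ) + f( y ); with the compiler-generated temporaries explicit:
   each result is copied out of the returned reference before the addition. */
int sum_of_calls( int * x, int * y ) {
	int temp1 = *f( x ), temp2 = *f( y );
	return temp1 + temp2;
}

/* Lowering of a const reference parameter: the callee can only read. */
int read_only( const int * cp ) {
	return *cp;
}
```

With ©x = 1© and ©y = 2©, ©sum_of_calls( &x, &y )© returns 5 and leaves ©x == 2© and ©y == 3©. A call such as ©read_only( &(const int){ x + y } )© then passes the address of an unnamed temporary holding 5.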
(The ©&© is necessary for the pointer-type parameter to make the types match, and is a common requirement for a C programmer.)
\CFA \emph{extends} this semantics to a mutable pointer/reference parameter, and the compiler implicitly creates the necessary temporary (copying the argument), which is subsequently pointed-to by the reference parameter and can be changed.\footnote{
If whole-program analysis is possible and shows the parameter is not assigned, \ie it is ©const©, the temporary is unnecessary.}
\begin{cfa}
void f( int & r );
void g( int * p );
f( 3 );                   g( &3 );              §\C{// compiler implicitly generates temporaries}§
f( x + y );             g( &(x + y) );  §\C{// compiler implicitly generates temporaries}§
\end{cfa}
The implicit conversion allows seamless calls to any routine without having to explicitly name/copy the literal/expression to allow the call.

%\CFA attempts to handle pointers and references in a uniform, symmetric manner.
However, C handles routine objects in an inconsistent way.
A routine object is both a pointer and a reference (particle and wave).
\begin{cfa}
void f( int i );
void (*fp)( int );
fp = f;                                                 §\C{// reference initialization}§
fp = &f;                                                §\C{// pointer initialization}§
fp = *f;                                                §\C{// reference initialization}§
fp(3);                                                  §\C{// reference invocation}§
(*fp)(3);                                               §\C{// pointer invocation}§
\end{cfa}
A routine object is best described by a ©const© reference:
\begin{cfa}
const void (&fr)( int ) = f;
fr = ...;                                               §\C{// error, cannot change code}§
&fr = ...;                                              §\C{// changing routine reference}§
fr( 3 );                                                §\C{// reference call to f}§
(*fr)(3);                                               §\C{// error, incorrect type}§
\end{cfa}
because the value of the routine object is a routine literal, \ie the routine code is normally immutable during execution.\footnote{
Dynamic code rewriting is possible but only in special circumstances.}
\CFA allows this additional use of references for routine objects in an attempt to give a more consistent meaning for them.

This situation is different from inferring with reference type being used ...
int main() {
	* [int](int) fp = foo();        §\C{// int (*fp)(int)}§
	sout | fp( 3 ) | endl;
}
\end{cfa}

®int j = 0;®                            §\C{// disallowed}§
case 1:
{
	®int k = 0;®                    §\C{// allowed at different nesting levels}§
	...