Changeset 27caf8d for doc/user


Timestamp:
May 19, 2017, 11:56:43 AM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
535adab
Parents:
fdca7c6
Message:

updates to pointers/references section

File:
1 edited

  • doc/user/user.tex

    rfdca7c6 r27caf8d  
    1111%% Created On       : Wed Apr  6 14:53:29 2016
    1212%% Last Modified By : Peter A. Buhr
    13 %% Last Modified On : Wed May 17 22:42:11 2017
    14 %% Update Count     : 1685
     13%% Last Modified On : Fri May 19 11:54:31 2017
     14%% Update Count     : 1735
    1515%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    1616
     
    668668One way to conceptualize the null pointer is that no variable is placed at this address, so the null-pointer address can be used to denote an uninitialized pointer/reference object;
    669669\ie the null pointer is guaranteed to compare unequal to a pointer to any object or routine.}
    670 An address is \newterm{sound}, if it points to a valid memory location in scope, \ie has not been freed.
     670An address is \newterm{sound}, if it points to a valid memory location in scope, \ie within the program's execution-environment and has not been freed.
    671671Dereferencing an \newterm{unsound} address, including the null pointer, is \Index{undefined}, often resulting in a \Index{memory fault}.
    672672
     
    717717\end{quote2}
    718718Finally, the immutable nature of a variable's address and the fact that there is no storage for the variable pointer means pointer assignment\index{pointer!assignment}\index{assignment!pointer} is impossible.
    719 Therefore, the expression ©x = y© has only one meaning, ©*x = *y©, \ie manipulate values, which is why explicitly writing the dereferences is unnecessary even though it occurs implicitly as part of instruction decoding.
     719Therefore, the expression ©x = y© has only one meaning, ©*x = *y©, \ie manipulate values, which is why explicitly writing the dereferences is unnecessary even though it occurs implicitly as part of \Index{instruction decoding}.
    720720
    721721A \Index{pointer}/\Index{reference} object is a generalization of an object variable-name, \ie a mutable address that can point to more than one memory location during its lifetime.
    722 (Similarly, an integer variable can contain multiple integer literals during its lifetime versus an integer constant representing a single literal during its lifetime and, like a variable name, may not occupy storage as the literal is embedded directly into instructions.)
     722(Similarly, an integer variable can contain multiple integer literals during its lifetime versus an integer constant representing a single literal during its lifetime, and like a variable name, may not occupy storage as the literal is embedded directly into instructions.)
    723723Hence, a pointer occupies memory to store its current address, and the pointer's value is loaded by dereferencing, \eg:
    724724\begin{quote2}
     
    736736\end{quote2}
    737737
    738 Notice, an address has a duality\index{address!duality}: a location in memory or the value at that location.
     738Notice, an address has a \Index{duality}\index{address!duality}: a location in memory or the value at that location.
    739739In many cases, a compiler might be able to infer the best meaning for these two cases.
    740 For example, \Index*{Algol68}~\cite{Algol68} inferences pointer dereferencing to select the best meaning for each pointer usage
     740For example, \Index*{Algol68}~\cite{Algol68} infers pointer dereferencing to select the best meaning for each pointer usage
    741741\begin{cfa}
    742742p2 = p1 + x;                                    §\C{// compiler infers *p2 = *p1 + x;}§
     
    745745Unfortunately, automatic dereferencing does not work in all cases, and so some mechanism is necessary to fix incorrect choices.
    746746
    747 Rather than dereference inferencing, most programming languages pick one implicit dereferencing semantics, and the programmer explicitly indicates the other to resolve address-duality.
     747Rather than inferring dereference, most programming languages pick one implicit dereferencing semantics, and the programmer explicitly indicates the other to resolve address-duality.
    748748In C, objects of pointer type always manipulate the pointer object's address:
    749749\begin{cfa}
     
    768768\end{cfa}
    769769
    770 To support this common case, a reference type is introduced in \CFA, denoted by ©&©, which is the opposite dereference semantics to a pointer type, making the value at the pointed-to location the implicit semantics for dereferencing.
     770To support this common case, a reference type is introduced in \CFA, denoted by ©&©, which is the opposite dereference semantics to a pointer type, making the value at the pointed-to location the implicit semantics for dereferencing (similar but not the same as \CC \Index{reference type}s).
    771771\begin{cfa}
    772772int x, y, ®&® r1, ®&® r2, ®&&® r3;
     
    783783®*®r2 = ((®*®r1 + ®*®r2) ®*® (®**®r3 - ®*®r1)) / (®**®r3 - 15);
    784784\end{cfa}
    785 When a reference operation appears beside a dereference operation, \eg ©&*©, they cancel out.\footnote{
     785When a reference operation appears beside a dereference operation, \eg ©&*©, they cancel out.
     786However, in C, the cancellation always yields a value (\Index{rvalue}).\footnote{
    786787The unary ©&© operator yields the address of its operand.
    787788If the operand has type ``type'', the result has type ``pointer to type''.
    788789If the operand is the result of a unary ©*© operator, neither that operator nor the ©&© operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.~\cite[\S~6.5.3.2--3]{C11}}
    789 Hence, assigning to a reference requires the address of the reference variable (\Index{lvalue}):
    790 \begin{cfa}
    791 (&®*®)r1 = &x;                                  §\C{// (\&*) cancel giving variable r1 not variable pointed-to by r1}§
     790For a \CFA reference type, the cancellation on the left-hand side of assignment leaves the reference as an address (\Index{lvalue}):
     791\begin{cfa}
     792(&®*®)r1 = &x;                                  §\C{// (\&*) cancel giving address of r1 not variable pointed-to by r1}§
    792793\end{cfa}
    793794Similarly, the address of a reference can be obtained for assignment or computation (\Index{rvalue}):
    794795\begin{cfa}
    795 (&(&®*®)®*®)r3 = &(&®*®)r2;             §\C{// (\&*) cancel giving address of r2, (\&(\&*)*) cancel giving variable r3}§
     796(&(&®*®)®*®)r3 = &(&®*®)r2;             §\C{// (\&*) cancel giving address of r2, (\&(\&*)*) cancel giving address of r3}§
    796797\end{cfa}
    797798Cancellation\index{cancellation!pointer/reference}\index{pointer!cancellation} works to arbitrary depth.
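
A minimal C sketch of the `&*` cancellation rule quoted in the footnote above (C11 6.5.3.2). The helper names `cancel_same_address` and `cancel_twice` are hypothetical and used only for illustration. The sketch shows that in plain C the cancelled expression yields a value (rvalue), so it can be compared or read but not assigned to.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when &*p yields the same address as p.
 * Per C11 6.5.3.2, neither operator is evaluated, so this is
 * well defined even when p is a null pointer. */
int cancel_same_address(int *p) {
    int *q = &*p;          /* behaves as if written: int *q = p; */
    return q == p;
}

/* Returns the value read through a repeatedly cancelled pointer. */
int cancel_twice(int *p) {
    return *&*&*p;         /* each &* pair cancels, leaving *p */
}

/* In C the result of &*p is not an lvalue, so a statement such as
 *     &*p = &other;
 * is rejected by the compiler. */
```

Under these assumptions, `cancel_twice( &x )` reads the same value as `*(&x)`, mirroring the arbitrary-depth cancellation described in the text.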
     
    810811\end{cfa}
    811812Furthermore, both types are equally performant, as the same amount of dereferencing occurs for both types.
    812 Therefore, the choice between them is based solely on whether the address is dereferenced frequently or infrequently, which dictates the amount of dereferencing aid from the compiler.
     813Therefore, the choice between them is based solely on whether the address is dereferenced frequently or infrequently, which dictates the amount of implicit dereferencing aid from the compiler.
    813814
    814815As for a pointer type, a reference type may have qualifiers:
     
    828829int & const cr = *0;                    §\C{// where 0 is the int * zero}§
    829830\end{cfa}
    830 Note, constant reference types do not prevent addressing errors because of explicit storage-management:
     831Note, constant reference-types do not prevent addressing errors because of explicit storage-management:
    831832\begin{cfa}
    832833int & const cr = *malloc();
     834cr = 5;
    833835delete &cr;
    834836cr = 7;                                                 §\C{// unsound pointer dereference}§
     
    854856where the \CFA declaration is read left-to-right (see \VRef{s:Declarations}).
    855857
    856 In contract to \CFA reference types, \Index*[C++]{\CC{}}'s reference types are all ©const© references, preventing changes to the reference address, so only value assignment is possible, which eliminates half of the \Index{address duality}.
    857 \Index*{Java}'s reference types to objects (because all Java objects are on the heap) are like C pointers, which always manipulate the address and there is no (bit-wise) object assignment, so objects are explicitly cloned by shallow or deep copying, which eliminates half of the address duality.
     858In contrast to \CFA reference types, \Index*[C++]{\CC{}}'s reference types are all ©const© references, preventing changes to the reference address, so only value assignment is possible, which eliminates half of the \Index{address duality}.
     859\Index*{Java}'s reference types to objects (all Java objects are on the heap) are like C pointers, which always manipulate the address, and there is no (bit-wise) object assignment, so objects are explicitly cloned by shallow or deep copying, which eliminates half of the address duality.
    858860
    859861\Index{Initialization} is different than \Index{assignment} because initialization occurs on the empty (uninitialized) storage on an object, while assignment occurs on possibly initialized storage of an object.
    860862There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, return/temporary binding.
    861 For reference initialization (like pointer), the initializing value must be an address (\Index{lvalue}) not a value (\Index{rvalue}).
    862 \begin{cfa}
    863 int * p = &x;                                   §\C{// both \&x and x are possible interpretations in C}§
    864 int & r = x;                                    §\C{// x unlikely interpretation, because of auto-dereferencing}§
    865 \end{cfa}
    866 C allows ©p© to be assigned with ©&x© or ©x© (many compilers warn about the latter assignment).
    867 \CFA allows ©r© to be assigned ©x© only because it inferences a dereference for ©x©, by implicitly inserting a address-of operator, ©&©, before the initialization expression because a reference behaves like the variable name it is pointing-to.
    868 Similarly, when a reference is used for a parameter/return type, the call-site argument does not require a reference operator for the same reason.
    869 \begin{cfa}
    870 int & f( int & rp );                    §\C{// reference parameter and return}§
     863Because the object being initialized has no value, there is only one meaningful semantics with respect to address duality: it must mean address as there is no pointed-to value.
     864In contrast, the left-hand side of assignment has an address that has a duality.
     865Therefore, for pointer/reference initialization, the initializing value must be an address (\Index{lvalue}) not a value (\Index{rvalue}).
     866\begin{cfa}
     867int * p = &x;                           §\C{// must have address of x}§
     868int & r = x;                            §\C{// must have address of x}§
     869\end{cfa}
     870Therefore, it is superfluous to require explicitly taking the address of the initialization object, even though the type is incorrect.
     871Hence, \CFA allows ©r© to be assigned ©x© because it infers a reference for ©x©, by implicitly inserting an address-of operator, ©&©, and it is an error to put an ©&© because the types no longer match.
     872Unfortunately, C allows ©p© to be assigned with ©&x© or ©x©, by value, but most compilers warn about the latter assignment as being potentially incorrect.
     873(\CFA extends pointer initialization so a variable name is automatically referenced, eliminating the unsafe assignment.)
     874Similarly, when a reference type is used for a parameter/return type, the call-site argument does not require a reference operator for the same reason.
     875\begin{cfa}
     876int & f( int & r );                             §\C{// reference parameter and return}§
    871877z = f( x ) + f( y );                    §\C{// reference operator added, temporaries needed for call results}§
    872878\end{cfa}
    873 Within routine ©f©, it is possible to change the argument by changing the corresponding parameter, and parameter ©rp© can be locally reassigned within ©f©.
     879Within routine ©f©, it is possible to change the argument by changing the corresponding parameter, and parameter ©r© can be locally reassigned within ©f©.
    874880Since operator routine ©?+?© takes its arguments by value, the references returned from ©f© are used to initialize compiler generated temporaries with value semantics that copy from the references.
     881\begin{cfa}
     882int temp1 = f( x ), temp2 = f( y );
     883z = temp1 + temp2;
     884\end{cfa}
     885This implicit referencing is crucial for reducing the syntactic burden for programmers when using references;
     886otherwise references have the same syntactic burden as pointers in these contexts.
    875887
    876888When a pointer/reference parameter has a ©const© value (immutable), it is possible to pass literals and expressions.
    877889\begin{cfa}
    878 void f( ®const® int & crp );
    879 void g( ®const® int * cpp );
     890void f( ®const® int & cr );
     891void g( ®const® int * cp );
    880892f( 3 );                   g( &3 );
    881893f( x + y );             g( &(x + y) );
     
    883895Here, the compiler passes the address to the literal 3 or the temporary for the expression ©x + y©, knowing the argument cannot be changed through the parameter.
    884896(The ©&© is necessary for the pointer-type parameter to make the types match, and is a common requirement for a C programmer.)
    885 \CFA \emph{extends} this semantics to a mutable pointer/reference parameter, and the compiler implicitly creates the necessary temporary (copying the argument), which is subsequently pointed-to by the reference parameter and can be changed.
    886 \begin{cfa}
    887 void f( int & rp );
    888 void g( int * pp );
     897\CFA \emph{extends} this semantics to a mutable pointer/reference parameter, and the compiler implicitly creates the necessary temporary (copying the argument), which is subsequently pointed-to by the reference parameter and can be changed.\footnote{
     898If whole program analysis is possible, and shows the parameter is not assigned, \ie it is ©const©, the temporary is unnecessary.}
     899\begin{cfa}
     900void f( int & r );
     901void g( int * p );
    889902f( 3 );                   g( &3 );              §\C{// compiler implicitly generates temporaries}§
    890903f( x + y );             g( &(x + y) );  §\C{// compiler implicitly generates temporaries}§
     
    894907The implicit conversion allows seamless calls to any routine without having to explicitly name/copy the literal/expression to allow the call.
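
For comparison, a rough C sketch of the temporaries that the text says \CFA generates implicitly. Standard C does not accept `g( &3 )`, so this sketch substitutes a C99 compound literal and a named local variable. The routine names `twice`, `bump`, `call_with_literal`, and `call_with_expression` are hypothetical.

```c
#include <assert.h>

/* Read-only parameter: the callee cannot change the argument. */
int twice(const int *cp) { return 2 * *cp; }

/* Mutable parameter: the callee changes whatever it points to. */
void bump(int *p) { *p += 1; }

int call_with_literal(void) {
    /* The compound literal creates an unnamed int temporary holding 3,
     * whose address is passed to the callee. */
    return twice(&(int){3});
}

int call_with_expression(int x, int y) {
    int temp = x + y;      /* explicit temporary for the expression */
    bump(&temp);           /* the change lands in temp, not in x or y */
    return temp;
}
```

The point of the sketch is that the programmer must name or construct the temporary by hand in C, which is the step the implicit conversion removes.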
    895908
    896 While \CFA attempts to handle pointers and references in a uniform, symmetric manner, C handles routine objects in an inconsistent way: a routine object is both a pointer and a reference (particle and wave).
    897 \begin{cfa}
    898 void f( int p ) {...}
    899 void (*fp)( int ) = &f;                 §\C{// pointer initialization}§
    900 void (*fp)( int ) = f;                  §\C{// reference initialization}§
     909%\CFA attempts to handle pointers and references in a uniform, symmetric manner.
     910However, C handles routine objects in an inconsistent way.
     911A routine object is both a pointer and a reference (particle and wave).
     912\begin{cfa}
     913void f( int i );
     914void (*fp)( int );
     915fp = f;                                                 §\C{// reference initialization}§
     916fp = &f;                                                §\C{// pointer initialization}§
     917fp = *f;                                                §\C{// reference initialization}§
     918fp(3);                                                  §\C{// reference invocation}§
    901919(*fp)(3);                                               §\C{// pointer invocation}§
    902 fp(3);                                                  §\C{// reference invocation}§
    903920\end{cfa}
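
A small C sketch of the inconsistency shown above, assuming a routine `f` that records its argument in a hypothetical global `last`. It checks that `f`, `&f`, and `*f` all convert to the same routine pointer and that both call forms reach the same code.

```c
#include <assert.h>

static int last = 0;                 /* hypothetical: records the latest argument */
void f(int i) { last = i; }

/* Returns 1 when all three initializer forms yield the same pointer. */
int all_forms_agree(void) {
    void (*a)(int) = f;              /* routine name converts to a pointer */
    void (*b)(int) = &f;             /* explicit address-of */
    void (*c)(int) = *f;             /* dereference, then converts back to a pointer */
    return a == b && b == c;
}

/* Calls f through both invocation forms and encodes the two
 * recorded arguments as a two-digit number. */
int invoke_both(void) {
    void (*fp)(int) = f;
    fp(3);                           /* "reference" invocation */
    int first = last;
    (*fp)(4);                        /* "pointer" invocation */
    int second = last;
    return first * 10 + second;
}
```

Both helpers rely only on standard C conversions for function designators; nothing here is specific to \CFA.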
    904921A routine object is best described by a ©const© reference:
    905922\begin{cfa}
    906 const void (&fp)( int ) = f;
    907 fp( 3 );
    908 fp = ...                                                §\C{// error, cannot change code}§
    909 &fp = ...;                                              §\C{// changing routine reference}§
     923const void (&fr)( int ) = f;
     924fr = ...                                                §\C{// error, cannot change code}§
     925&fr = ...;                                              §\C{// changing routine reference}§
     926fr( 3 );                                                §\C{// reference call to f}§
     927(*fr)(3);                                               §\C{// error, incorrect type}§
    910928\end{cfa}
    911929because the value of the routine object is a routine literal, \ie the routine code is normally immutable during execution.\footnote{
    912930Dynamic code rewriting is possible but only in special circumstances.}
    913931\CFA allows this additional use of references for routine objects in an attempt to give a more consistent meaning for them.
     932
     933This situation is different from inferring with reference type being used ...
     934
    914935
    915936
     
    15291550int main() {
    15301551        * [int](int) fp = foo();        §\C{// int (*fp)(int)}§
    1531     sout | fp( 3 ) | endl;
     1552        sout | fp( 3 ) | endl;
    15321553}
    15331554\end{cfa}
     
    21452166        ®int j = 0;®                            §\C{// disallowed}§
    21462167  case 1:
    2147     {
     2168        {
    21482169                ®int k = 0;®                    §\C{// allowed at different nesting levels}§
    21492170                ...