Changeset ad9f593 for doc/theses


Timestamp:
Nov 1, 2024, 4:48:29 PM (12 days ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master
Children:
63a7394
Parents:
b7921d8
Message:

Thesis, background, array: flesh out and rework section Array Parameter Declaration

File:
1 edited

  • doc/theses/mike_brooks_MMath/background.tex

    rb7921d8 rad9f593  
    168168The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
    169169Finally, while subscripting involves pointer arithmetic (as does a field reference @x.y.z@), the computation is complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
    170 Many C errors result from manually performing pointer arithmetic instead of using language subscripting, letting the compiler performs any arithmetic;
    171 some C textbooks erroneously suggest manual pointer arithmetic is faster than subscripting.
     170Many C errors result from manually performing pointer arithmetic instead of using language subscripting, letting the compiler perform any arithmetic.
     171
     172Some C textbooks erroneously suggest manual pointer arithmetic is faster than subscripting.
    172173A sound and efficient C program does not require explicit pointer arithmetic.
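As a minimal sketch of this claim (the function names are hypothetical, not taken from the thesis sources), the two loops below compute the same sum, once with subscripting and once with manual pointer arithmetic. The subscript form leaves the address computation to the compiler, since @v[i]@ is defined as @*(v + i)@.

```c
#include <assert.h>
#include <stddef.h>

/* Sum using language subscripting; the compiler derives the address arithmetic. */
float sum_subscript(const float *v, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same sum with manual pointer arithmetic. */
float sum_pointer(const float *v, size_t n) {
    float total = 0.0f;
    for (const float *p = v; p != v + n; p++)
        total += *p;
    return total;
}

/* Hypothetical self-check: both forms visit the same elements in the same order. */
int sums_agree(void) {
    float data[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    assert(sum_subscript(data, 4) == sum_pointer(data, 4));
    return sum_subscript(data, 4) == 10.0f;
}
```

The manual version has more places to go wrong (the end-pointer computation, the increment), which is the kind of error the text attributes to hand-written pointer arithmetic.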
    173 
    174 C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
     174TODO: provide an example, explain the belief, and give modern refutation
     175
     176C semantics wants a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
    175177This desire becomes apparent by a detailed inspection of an array declaration.
    176178\lstinput{34-34}{bkgd-carray-arrty.c}
    177179The inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type.
     180An architecture with 64-bit pointer size is used, to keep irrelevant details fixed.
    178181\lstinput{35-36}{bkgd-carray-arrty.c}
    179182Now consider the @sizeof@ expressions derived from @ar@, modified by adding pointer-to and first-element (and including unnecessary parentheses to avoid any confusion about precedence).
    180183\lstinput{37-40}{bkgd-carray-arrty.c}
    181184Given that arrays are contiguous and the size of @float@ is 4, then the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers.
    182 Equally, C programmers know the size of a pointer to the first array element is 8 (or 4 depending on the addressing architecture).
     185Equally, C programmers know the size of a pointer to the first array element is 8.
    183186% Now, set aside for a moment the claim that this first assertion is giving information about a type.
    184187Clearly, an array and a pointer to its first element are different.
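The following sketch restates the size comparison in checkable form. It assumes a declaration @float ar[10]@, inferred from the surrounding prose rather than copied from @bkgd-carray-arrty.c@, and the function name is hypothetical. It compares against @sizeof(float *)@ instead of a literal 8 so that it does not depend on the pointer width.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the array and a pointer to its first element differ in size. */
int array_is_not_pointer(void) {
    float ar[10];
    size_t whole = sizeof(ar);        /* size of the array object: 10 floats */
    size_t first = sizeof(&ar[0]);    /* size of a float *, whatever the platform uses */
    assert(whole == 10 * sizeof(float));
    assert(first == sizeof(float *));
    return whole != first;
}
```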
    185188
    186 In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
     189In fact, the idea that there is such a thing as a pointer to an array may be surprising.
     190It is not the same thing as a pointer to the first element.
    187191\lstinput{42-45}{bkgd-carray-arrty.c}
    188192The first assignment generates:
     
    196200Using the type form yields the same results as the prior expression form.
    197201\lstinput{46-49}{bkgd-carray-arrty.c}
    198 The results are also the same when there is no allocation using a pointer-to-array type.
     202The results are also the same when there is no allocation at all.
     203This time, starting from a pointer-to-array type:
    199204\lstinput{51-57}{bkgd-carray-arrty.c}
    200 Hence, in all cases, @sizeof@ is informing about type information.
     205Hence, in all cases, @sizeof@ is reporting on type information.
    201206
    202207Therefore, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system.
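A small sketch of the distinction, under the assumption of a local @float ar[10]@ (names are hypothetical): a pointer to the array and a pointer to its first element hold the same address, yet the type system gives them different element sizes, which shows up as soon as either pointer is advanced by one.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when stepping a pointer-to-array moves 10 times as far as
   stepping a pointer-to-element. */
int pointer_kinds_differ(void) {
    float ar[10];
    float *pe = &ar[0];          /* pointer to the first element */
    float (*pa)[10] = &ar;       /* pointer to the whole array */

    /* Both hold the same address ... */
    assert((void *)pe == (void *)pa);

    /* ... but stepping by one moves a different distance. */
    ptrdiff_t step_elem  = (char *)(pe + 1) - (char *)pe;
    ptrdiff_t step_array = (char *)(pa + 1) - (char *)pa;
    assert(step_elem  == (ptrdiff_t)sizeof(float));
    assert(step_array == (ptrdiff_t)sizeof(ar));
    return step_array == 10 * step_elem;
}
```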
     
    527532\subsection{Array Parameter Declaration}
    528533
    529 C has a formal and actual declaration for functions to allow definition-before-use and separate compilation, where formal describes a type and an actual defines the type.
    530 \begin{cfa}
    531 int foo( int, float, char );                            $\C{// formal, parameter names option}$
    532 int foo( int i, float f, char c ) { ... }       $\C{// actual}$
    533 \end{cfa}
    534 For array parameters, a formal parameter array declaration can specify the first dimension with a dimension value, @[10]@ (which is ignored), an empty dimension list, @[ ]@, or a pointer, @*@:
     534Passing an array along with a function call is obviously useful.
     535Let us say that a parameter is an array parameter when the called function intends to subscript it.
     536This section asserts that a more satisfactory/formal characterization does not exist in C, surveys the ways that C API authors communicate ``@p@ has zero or more @T@s,'' and calls out the minority cases where the C type system is using or verifying such claims.
     537
     538A C function's parameter declarations look different from the caller's and callee's perspectives.
     539Both perspectives consist of the text read by a programmer and the semantics enforced by the type system.
     540The caller's perspective is available from a mere function declaration (which allows definition-before-use and separate compilation), but can also be read from (the non-body part of) a function definition.
     541The callee's perspective is what is available inside the function.
     542\begin{cfa}
     543        int foo( int, float, char );                            $\C{// declaration, names optional}$
     544        int bar( int i, float f, char c ) {             $\C{// definition, names mandatory}$
     545                $/* caller's perspective of foo's; callee's perspective of bar's */$
     546                ...
     547        }
     548        $/* caller's perspectives of foo's and bar's */$
     549\end{cfa}
     550The caller's perspective is more limited.
     551The example shows, so far, that parameter names (by virtue of being optional) are really comments in the caller's perspective, while they are semantically significant in the callee's perspective.
     552Array parameters introduce a further, subtle, semantic difference and considerable freedom to comment.
     553
     554At the semantic level, there is no such thing as an array parameter, except for one case (@T[static 5]@) discussed shortly.
     555Rather, there are only pointer parameters.
     556This fact probably shares considerable responsibility for the common sense of ``an array is just a pointer,'' which has been refuted in non-parameter contexts.
     557This fact holds in both the caller's and callee's perspectives.
     558However, a parameter's type can include ``array of.''
     559For example, the type ``pointer to array of 5 ints'' (@int(*)[5]@) is a pointer type, a fully meaningful parameter type (in the sense that this description does not contain any information that the type system ignores), and a type that appears the same in the caller's \vs callee's perspectives.
     560The outermost type constructor (syntactically first dimension) is really the one that determines the flavour of parameter.
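The adjustment of the outermost constructor can be observed from inside the callee. The sketch below is an illustration only: the macro @KIND_OF_ADDRESS@ and all function names are hypothetical, and it relies on C11 @_Generic@. Taking the address of a parameter declared @float a[5]@ yields @float **@, whereas taking the address of a true local array yields @float (*)[5]@. A parameter that is itself a pointer to an array keeps its ``array of 5'' part.

```c
#include <assert.h>

/* Hypothetical helper: classify the type of an address expression. */
#define KIND_OF_ADDRESS(e) _Generic((e), float **: 1, float (*)[5]: 2, default: 0)

int kind_of_parameter(float a[5]) {
    return KIND_OF_ADDRESS(&a);   /* a was adjusted to float *, so &a is float ** */
}

int kind_of_local(void) {
    float d[5] = { 0 };
    return KIND_OF_ADDRESS(&d);   /* d is a true array, so &d is float (*)[5] */
}

int kind_of_ptr_to_array(float (*p)[5]) {
    return KIND_OF_ADDRESS(p);    /* the inner "array of 5" is not adjusted */
}

int parameter_kinds_demo(void) {
    float x[5] = { 0 };
    assert(kind_of_parameter(x) == 1);
    assert(kind_of_local() == 2);
    assert(kind_of_ptr_to_array(&x) == 2);
    return 1;
}
```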
     561
     562\begin{figure}
    535563\begin{cquote}
    536564\begin{tabular}{@{}llll@{}}
    537565\begin{cfa}
    538 double sum( double [5] );
    539 double sum( double *[5] );
     566float sum( float a[5] );
     567float sum( float a[5][4] );
     568float sum( float a[5][] );
     569float sum( float a[5]* );
     570float sum( float *a[5] );
    540571\end{cfa}
    541572&
    542573\begin{cfa}
    543 double sum( double [ ] );
    544 double sum( double *[ ] );
     574float sum( float a[] );
     575float sum( float a[][4] );
     576float sum( float a[][] );
     577float sum( float a[]* );
     578float sum( float *a[] );
    545579\end{cfa}
    546580&
    547581\begin{cfa}
    548 double sum( double * );
    549 double sum( double ** );
     582float sum( float *a );
     583float sum( float (*a)[4] );
     584float sum( float (*a)[] );
     585float sum( float (*a)* );
     586float sum( float **a );
    550587\end{cfa}
    551588&
    552589\begin{cfa}
    553 // array
    554 // matrix
     590// ar of float
     591// mat of float
     592// invalid
     593// invalid
     594// ar of ptr to float
    555595\end{cfa}
    556596\end{tabular}
    557597\end{cquote}
    558 Good practice uses the middle form as it clearly indicates the parameter is subscripted.
    559 However, an actual declaration cannot use @[ ]@;
    560 it must use @*@.
    561 \begin{cfa}
    562 double sum( double v[ ] ) {                                     $\C{// formal declaration}$
    563 double * cv;                                                            $\C{// actual declaration, think cv[ ]}$
    564 sum( cv );                                                                      $\C{// address assignment v = cv}$
    565 \end{cfa}
    566 
    567 Given the formal dimension forms @[ ]@ or @[5]@, it raises the question of qualifying the implicit array pointer rather than the array element type.
     598\caption{Multiple ways to declare an array parameter.  Across a valid row, every declaration is equivalent.  Each column gives a declaration style.  Really, the style can be read from the first row only.  The second row shows how the style extends to multiple dimensions, with the rows thereafter providing context for the choice of which second-row \lstinline{[]} receives the column-style variation.}
     599\label{f:ArParmEquivDecl}
     600\end{figure}
     601
     602Yet, C allows array syntax for the outermost type constructor, from which comes the freedom to comment.
     603An array parameter declaration can specify the outermost dimension with a dimension value, @[10]@ (which is ignored), an empty dimension list, @[ ]@, or a pointer, @*@, as seen in \VRef[Figure]{f:ArParmEquivDecl}.  The rationale for rejecting the first ``invalid'' row follows shortly, while the second ``invalid'' row is simple nonsense, included to complete the pattern; its syntax hints at what the final row actually achieves.
     604
     605In the leftmost style, the typechecker ignores the actual value in most practical cases.
     606This value is allowed to be a dynamic expression, so it is \emph{possible} to use the leftmost style in many practical cases.
     607
     608% To help contextualize the matrix part of this example, the syntaxes @float [5][]@, @float [][]@ and @float (*)[]@ are all rejected, for reasons discussed shortly.
     609% So are @float[5]*@, @float[]*@ and @float (*)*@.  These latter ones are simply nonsense, though they hint at ``1d array of pointers'', whose equivalent syntax options are, @float *[5]@, @float *[]@, and @float **@.
     610
     611It is a matter of taste as to whether a programmer should use a form as far left as possible (getting the most out of syntactically integrated comments), stick to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromise in the middle (reducing unchecked information, yet clearly stating, ``I will subscript this one'').
     612
     613Note that this equivalence of pointer and array declarations is special to parameters.
     614It does not apply to local variables, where true array declarations are possible.
     615\begin{cfa}
     616void f( float * a ) {
     617        float * b = a; // ok
     618        float c[] = a; // reject
     619        float d[] = { 1.0, 2.0, 3.0 }; // ok
     620        static_assert( sizeof(b) == sizeof(float*) );
     621        static_assert( sizeof(d) != sizeof(float*) );
     622}
     623\end{cfa}
     624This equivalence has the consequence that the type system does not help a caller get it right.
     625\begin{cfa}
     626float sum( float v[] );
     627float arg = 3.14;
     628sum( &arg );                                                            $\C{// accepted, v := \&arg}$
     629\end{cfa}
     630
     631The syntactic dimension forms @[ ]@ and @[5]@ raise the question of how to qualify the implied array pointer rather than the array element type.
    568632For example, the qualifiers after the @*@ apply to the array pointer.
    569633\begin{cfa}
     
    571635void foo( const volatile int [ ] @const volatile@ ); // does not parse
    572636\end{cfa}
    573 C addressed this shortcoming by moving the pointer qualifiers into the first dimension.
     637C instead puts these pointer qualifiers syntactically into the first dimension.
    574638\begin{cquote}
    575639@[@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression}$_{opt}$ @]@
     
    580644\end{cfa}
    581645
    582 To make the first formal dimension size meaningful, C adds this form.
     646To make the first dimension size meaningful, C adds this form.
    583647\begin{cquote}
    584648@[@ @static@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression} @]@
     
    590654\end{cfa}
    591655Here, the @static@ storage qualifier defines the minimum array size for its argument.
    592 @gcc@ ignores this dimension qualifier, \ie it gives no warning if the argument array size is less than the parameter minimum.
    593 
    594 Finally, to handle VLAs, C repurposed the @*@ \emph{within} the dimension in the formal declaration context to mean the argument must be a VLA (contiguous).
     656@gcc@ ignores this dimension qualifier, \ie it gives no warning if the argument array size is less than the parameter minimum.  However, @clang@ implements the check, in accordance with the standard.  TODO: be specific about versions
     657
     658Note that there are now two different meanings for modifiers in the same position.  In
     659\begin{cfa}
     660void foo( int x[static const volatile 3] );
     661\end{cfa}
     662the @static@ applies to the 3, while the @const volatile@ applies to the @x@.
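A minimal sketch of the two meanings, using only @const@ for brevity (function names are hypothetical): the qualifier inside the brackets makes the parameter itself a @const@ pointer, while the elements remain writable, and @static 3@ is the caller's promise of at least three elements.

```c
#include <assert.h>

/* x is a const pointer to (non-const) int; callers promise at least 3 elements. */
int bump_first(int x[static const 3]) {
    x[0] += 1;        /* allowed: the elements are not const */
    /* x = 0; */      /* would be rejected: the pointer itself is const */
    return x[0] + x[1] + x[2];
}

int bump_first_demo(void) {
    int v[3] = { 1, 2, 3 };
    int total = bump_first(v);
    assert(v[0] == 2);   /* the callee's write is visible to the caller */
    return total;
}
```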
     663
     664With multidimensional arrays, on dimensions after the first, a size is required and is not ignored.
     665These sizes are required for the callee to be able to subscript.
     666\begin{cfa}
     667void f( float a[][10], float b[][100] ) {
     668    static_assert( ((char*)&a[1]) - ((char*)&a[0]) == 10 * sizeof(float) );
     669    static_assert( ((char*)&b[1]) - ((char*)&b[0]) == 100 * sizeof(float) );
     670}
     671\end{cfa}
     672Here, the distance between the first and second elements of each array depends on the inner dimension size.
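The same observation can be written as a run-time check in plain C. This is a sketch with hypothetical function names, measuring the byte distance between consecutive rows for two different inner dimension sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between a[0] and a[1] when each row holds 10 floats. */
ptrdiff_t row_stride_10(float a[][10]) {
    return (char *)&a[1] - (char *)&a[0];
}

/* Byte distance between b[0] and b[1] when each row holds 100 floats. */
ptrdiff_t row_stride_100(float b[][100]) {
    return (char *)&b[1] - (char *)&b[0];
}

int stride_demo(void) {
    static float m10[2][10], m100[2][100];
    assert(row_stride_10(m10) == (ptrdiff_t)(10 * sizeof(float)));
    assert(row_stride_100(m100) == (ptrdiff_t)(100 * sizeof(float)));
    return row_stride_100(m100) == 10 * row_stride_10(m10);
}
```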
     673
     674The last observation is a fact of the callee's perspective.
     675There is little type-system checking, in the caller's perspective, that what is being passed matches.
     676\begin{cfa}
     677void f( float [][10] );
     678int n = 100;
     679float a[100], b[n];
     680f(&a); // reject
     681f(&b); // accept
     682\end{cfa}
     683This size is, therefore, a callee's assumption.
     684
     685Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee will have to make an assumption about the size here, but no (unchecked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/overconfidence.
    595686\begin{cquote}
    596687@[@ \textit{type-qualifier-list$_{opt}$} @* ]@
    597688\end{cquote}
    598689\begin{cfa}
    599 void foo( int [@*@][@*@] );                                     $\C{// formal}$
    600 void foo( int ar[10][10] ) { ... }                      $\C{// actual}$
    601 int ar[2][10];                                                          $\C{// contiguous}$
    602 foo( ar );                                                                      $\C{// valid}$
    603 int * arp[10];                                                          $\C{// non-contiguous}$
    604 foo( arp );                                                                     $\C{// invalid}$
    605 \end{cfa}
    606 This syntactic form for the formal prototype means the header file does not have to commit to specific dimension values, but the compiler knows the argument is a contiguous array.
     690void foo( float [][@*@] );                                              $\C{// declaration}$
     691void foo( float a[][10] ) { ... }                               $\C{// definition}$
     692\end{cfa}
     693Repeating it with the full context of a VLA is useful:
     694\begin{cfa}
     695void foo( int, float [][@*@] );                                 $\C{// declaration}$
     696void foo( int n, float a[][n] ) { ... }                 $\C{// definition}$
     697\end{cfa}
     698Omitting the dimension from the declaration is consistent with omitting parameter names, for the declaration case has no name @n@ in scope.
     699The omission is also redacting all information not needed to generate correct caller-side code.
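A complete, compilable sketch of this declaration/definition pairing follows. The function @total@ and its extra @rows@ parameter are hypothetical additions for illustration, and the code assumes a compiler with VLA support (optional since C11, available in @gcc@ and @clang@).

```c
#include <assert.h>

/* Declaration: the inner dimension is left unspecified with [*]. */
float total(int, int, float [][*]);

/* Definition: the inner dimension is the run-time value n. */
float total(int rows, int n, float a[][n]) {
    float sum = 0.0f;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < n; c++)
            sum += a[r][c];       /* the row stride is n floats, known only at run time */
    return sum;
}

int total_demo(void) {
    float m[2][3] = { { 1, 2, 3 }, { 4, 5, 6 } };
    assert(total(2, 3, m) == 21.0f);
    return 1;
}
```

The caller must still pass an @n@ that matches the argument's real inner dimension; as with the fixed-size case, the type system does not verify that agreement.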
    607700
    608701