Changeset 63cf80e for doc/theses


Ignore:
Timestamp:
Oct 29, 2024, 12:48:41 PM (7 weeks ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
7ef4438
Parents:
bf91d1d
Message:

finish proofreading the array section of the background chapter

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/background.tex

    rbf91d1d r63cf80e  
    2323with a segmentation fault at runtime.
    2424Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
    25 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unused variable.
     25Compiling with flag @-Werror@, which turns warnings into errors, is often too pervasive, because some warnings are just warnings, \eg an unused variable.
    2626In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
    2727Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
     
    3838
    3939
    40 \section{Array}
    41 
    42 At the start, the C programming language made a significant design mistake.
    43 \begin{quote}
    44 In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously.
    45 Any operation which can be achieved by array subscripting can also be done with pointers.~\cite[p.~93]{C:old}
    46 \end{quote}
    47 Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction.
    48 The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
    49 Finally, while subscripting involves pointer arithmetic (as does field references @x.y.z@), the computation is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
    50 Many C errors result from performing pointer arithmetic instead of using subscripting;
    51 some C textbooks teach pointer arithmetic erroneously suggesting it is faster than subscripting.
    52 A sound and efficient C program does not require explicit pointer arithmetic.
    53 
    54 C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
    55 This desire becomes apparent by a detailed inspection of an array declaration.
    56 \lstinput{34-34}{bkgd-carray-arrty.c}
    57 The inspection begins by using @sizeof@ to provide definite program semantics for the intuition of an expression's type.
    58 \lstinput{35-36}{bkgd-carray-arrty.c}
    59 Now consider the sizes of expressions derived from @ar@, modified by adding ``pointer to'' and ``first element'' (and including unnecessary parentheses to avoid confusion about precedence).
    60 \lstinput{37-40}{bkgd-carray-arrty.c}
    61 Given the size of @float@ is 4, the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers.
    62 Equally, C programmers know the size of a \emph{pointer} to the first array element is 8 (or 4 depending on the addressing architecture).
    63 % Now, set aside for a moment the claim that this first assertion is giving information about a type.
    64 Clearly, an array and a pointer to its first element are different.
    65 
    66 In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
    67 \lstinput{42-45}{bkgd-carray-arrty.c}
    68 The first assignment gets
    69 \begin{cfa}
    70 warning: assignment to `float (*)[10]' from incompatible pointer type `float *'
    71 \end{cfa}
    72 and the second assignment gets the opposite.
    73 
    74 The inspection now refutes any suggestion that @sizeof@ is informing about allocation rather than type information.
    75 Note, @sizeof@ has two forms, one operating on an expression and the other on a type.
    76 Using the type form yields the same results as the prior expression form.
    77 \lstinput{46-49}{bkgd-carray-arrty.c}
    78 The results are also the same when there is \emph{no allocation} using a pointer-to-array type.
    79 \lstinput{51-57}{bkgd-carray-arrty.c}
    80 Hence, in all cases, @sizeof@ is informing about type information.
    81 
    82 So, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system.
    83 This misguided analogue works for a single-dimension array but there is no advantage other than possibly teaching beginning programmers about basic runtime array-access.
    84 
    85 Continuing, a short form for declaring array variables exists using length information provided implicitly by an initializer.
    86 \lstinput{59-62}{bkgd-carray-arrty.c}
    87 The compiler counts the number of initializer elements and uses this value as the first dimension.
    88 Unfortunately, the implicit element counting does not extend to dimensions beyond the first.
    89 \lstinput{64-67}{bkgd-carray-arrty.c}
    90 
    91 My contribution is recognizing:
     40\section{Reading declarations}
     41
     42A significant area of confusion for reading C declarations results from:
    9243\begin{itemize}
    93         \item There is value in using a type that knows its size.
    94         \item The type pointer to (first) element does not.
    95         \item C \emph{has} a type that knows the whole picture: array, e.g. @T[10]@.
    96         \item This type has all the usual derived forms, which also know the whole picture.
    97         A usefully noteworthy example is pointer to array, e.g. @T (*)[10]@.\footnote{
    98         The parenthesis are necessary because subscript has higher priority than pointer in C declarations.
    99         (Subscript also has higher priority than dereference in C expressions.)}
     44\item
     45C is unique in having dimension be higher priority than pointer in declarations.\footnote{
     46For consistency, subscript has higher priority than dereference, yielding \lstinline{(*arp)[3]} rather than \lstinline{*arp[3]}.}
     47\item
     48Embedding a declared variable in a declaration, mimics the way the variable is used in executable statements.
    10049\end{itemize}
    101 
    102 
    103 \section{Reading declarations}
    104 
    105 A significant area of confusion for reading C declarations results from embedding a declared variable in a declaration, mimicking the way the variable is used in executable statements.
    10650\begin{cquote}
    10751\begin{tabular}{@{}ll@{}}
     
    11862\end{tabular}
    11963\end{cquote}
    120 Essentially, the type is wrapped around the name in successive layers (like an \Index{onion}).
     64The parenthesis are necessary to achieve a pointer to a @T@, and the type is wrapped around the name in successive layers (like an \Index{onion}) to match usage in an expression.
    12165While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice, even though Dennis Richie believed otherwise:
    12266\begin{quote}
     
    15599\end{cquote}
    156100As declaration size increases, it becomes corresponding difficult to read and understand the C declaration form, whereas reading and understanding a \CFA declaration has linear complexity as the declaration size increases.
    157 Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations.
     101Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations, \eg \CC \lstinline[language=C++]{auto f( int ) -> int}.
     102Unfortunately, \CFA cannot interchange the priorities of subscript and dereference in expressions without breaking C compatibility.
    158103
    159104\VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef[Chapter]{c:Array}.
    160 The \CFA-thesis column shows the new array declaration form, which is my contributed improvements for safety and ergonomics.
     105The \CFA-thesis column shows the new array declaration form, which is my contribution to safety and ergonomics.
    161106The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features.
    162107Each row of the table shows alternate syntactic forms.
    163108The simplest occurrences of types distinguished in the preceding discussion are marked with $\triangleright$.
    164 Removing the declared variable @x@, gives the type used for variable, structure field, cast or error messages \PAB{(though note Section TODO points out that some types cannot be casted to)}.
    165 Unfortunately, parameter declarations \PAB{(section TODO)} have more syntactic forms and rules.
     109Removing the declared variable @x@, gives the type used for variable, structure field, cast, or error messages.
     110Unfortunately, parameter declarations have more syntactic forms and rules.
    166111
    167112\begin{table}
     
    211156\end{table}
    212157
    213 TODO: Address these parked unfortunate syntaxes
    214 \begin{itemize}
    215         \item static
    216         \item star as dimension
    217         \item under pointer decay: @int p1[const 3]@ being @int const *p1@
     158
     159\section{Array}
     160\label{s:Array}
     161
     162At the start, the C language designers made a significant design mistake with respect to arrays.
     163\begin{quote}
     164In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously.
     165Any operation which can be achieved by array subscripting can also be done with pointers.~\cite[p.~93]{C:old}
     166\end{quote}
     167Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction.
     168The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
     169Finally, while subscripting involves pointer arithmetic (as does a field reference @x.y.z@), the computation is complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
     170Many C errors result from manually performing pointer arithmetic instead of using language subscripting, letting the compiler performs any arithmetic;
     171some C textbooks erroneously suggest manual pointer arithmetic is faster than subscripting.
     172A sound and efficient C program does not require explicit pointer arithmetic.
     173
     174C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
     175This desire becomes apparent by a detailed inspection of an array declaration.
     176\lstinput{34-34}{bkgd-carray-arrty.c}
     177The inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type.
     178\lstinput{35-36}{bkgd-carray-arrty.c}
     179Now consider the @sizeof@ expressions derived from @ar@, modified by adding pointer-to and first-element (and including unnecessary parentheses to avoid any confusion about precedence).
     180\lstinput{37-40}{bkgd-carray-arrty.c}
     181Given that arrays are contiguous and the size of @float@ is 4, then the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers.
     182Equally, C programmers know the size of a pointer to the first array element is 8 (or 4 depending on the addressing architecture).
     183% Now, set aside for a moment the claim that this first assertion is giving information about a type.
     184Clearly, an array and a pointer to its first element are different.
     185
     186In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
     187\lstinput{42-45}{bkgd-carray-arrty.c}
     188The first assignment generates:
     189\begin{cfa}
     190warning: assignment to `float (*)[10]' from incompatible pointer type `float *'
     191\end{cfa}
     192and the second assignment generates the opposite.
     193
     194The inspection now refutes any suggestion that @sizeof@ is informing about allocation rather than type information.
     195Note, @sizeof@ has two forms, one operating on an expression and the other on a type.
     196Using the type form yields the same results as the prior expression form.
     197\lstinput{46-49}{bkgd-carray-arrty.c}
     198The results are also the same when there is no allocation using a pointer-to-array type.
     199\lstinput{51-57}{bkgd-carray-arrty.c}
     200Hence, in all cases, @sizeof@ is informing about type information.
     201
     202Therefore, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system.
     203This misguided analogue works for a single-dimension array but there is no advantage other than possibly teaching beginner programmers about basic runtime array-access.
     204
     205Continuing, there is a short form for declaring array variables using length information provided implicitly by an initializer.
     206\lstinput{59-62}{bkgd-carray-arrty.c}
     207The compiler counts the number of initializer elements and uses this value as the first dimension.
     208Unfortunately, the implicit element counting does not extend to dimensions beyond the first.
     209\lstinput{64-67}{bkgd-carray-arrty.c}
     210
     211My observation is recognizing:
     212\begin{itemize}[leftmargin=*,topsep=0pt,itemsep=0pt]
     213        \item There is value in using a type that knows its size.
     214        \item The type pointer to the (first) element does not.
     215        \item C \emph{has} a type that knows the whole picture: array, \eg @T[10]@.
     216        \item This type has all the usual derived forms, which also know the whole picture.
     217        A noteworthy example is pointer to array, \eg @T (*)[10]@.
    218218\end{itemize}
    219219
     
    221221\subsection{Arrays decay and pointers diffract}
    222222
    223 The last section established the difference between these four types:
     223The last section established the difference among these four types:
    224224\lstinput{3-6}{bkgd-carray-decay.c}
    225225But the expression used for obtaining the pointer to the first element is pedantic.
     
    228228which reproduces @pa0@, in type and value:
    229229\lstinput{9-9}{bkgd-carray-decay.c}
    230 The validity of this initialization is unsettling, in the context of the facts established in the last section.
     230The validity of this initialization is unsettling, in the context of the facts established in \VRef{s:Array}.
    231231Notably, it initializes name @pa0x@ from expression @ar@, when they are not of the same type:
    232232\lstinput{10-10}{bkgd-carray-decay.c}
     
    239239\end{quote}
    240240This phenomenon is the famous \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one.
    241 It is worthy to note that the list of exception cases does not feature the occurrence of @ar@ in @ar[i]@.
     241It is worthy to note that the list of exceptional cases does not feature the occurrence of @ar@ in @ar[i]@.
    242242Thus, subscripting happens on pointers not arrays.
    243243
     
    247247Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing!
    248248
    249 Subscripting a pointer when the target is standard inappropriate is still practically well-defined.
     249Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
    250250While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all,
    251251the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
     
    266266\cite[\S~6.7.6.3.7]{C11} explains that when an array type is written for a parameter,
    267267the parameter's type becomes a type that can be summarized as the array-decayed type.
    268 The respective handling of the following two parameter spellings shows that the array-spelled one is really, like the other, a pointer.
     268The respective handling of the following two parameter spellings shows that the array and pointer versions are identical.
    269269\lstinput{12-16}{bkgd-carray-decay.c}
    270270As the @sizeof(x)@ meaning changed, compared with when run on a similarly-spelled local variable declaration,
     
    292292warning: passing argument 1 of 'edit' discards 'const' qualifier from pointer target type
    293293\end{cfa}
    294 The basic two meanings, with a syntactic difference helping to distinguish,
    295 are illustrated in the declarations of @ca@ \vs @cp@,
    296 whose subsequent @edit@ calls behave differently.
    297 The syntax-caused confusion is in the comparison of the first and last lines,
    298 both of which use a literal to initialize an object declared with spelling @T x[]@.
    299 But these initialized declarations get opposite meanings,
    300 depending on whether the object is a local variable or a parameter.
     294The basic two meanings, with a syntactic difference helping to distinguish, are illustrated in the declarations of @ca@ \vs @cp@, whose subsequent @edit@ calls behave differently.
     295The syntax-caused confusion is in the comparison of the first and last lines, both of which use a literal to initialize an object declared with spelling @T x[]@.
     296But these initialized declarations get opposite meanings, depending on whether the object is a local variable or a parameter!
    301297
    302298In summary, when a function is written with an array-typed parameter,
    303 \begin{itemize}
    304         \item an appearance of passing an array by value is always an incorrect understanding
    305         \item a dimension value, if any is present, is ignored
    306         \item pointer decay is forced at the call site and the callee sees the parameter having the decayed type
     299\begin{itemize}[leftmargin=*,topsep=0pt]
     300        \item an appearance of passing an array by value is always an incorrect understanding,
     301        \item a dimension value, if any is present, is ignored,
     302        \item pointer decay is forced at the call site and the callee sees the parameter having the decayed type.
    307303\end{itemize}
    308304
    309305Pointer decay does not affect pointer-to-array types, because these are already pointers, not arrays.
    310306As a result, a function with a pointer-to-array parameter sees the parameter exactly as the caller does:
    311 \lstinput{32-42}{bkgd-carray-decay.c}
     307\par\noindent
     308\begin{tabular}{@{\hspace*{-0.75\parindentlnth}}l@{}l@{}}
     309\lstinput{32-36}{bkgd-carray-decay.c}
     310&
     311\lstinput{38-42}{bkgd-carray-decay.c}
     312\end{tabular}
     313\par\noindent
    312314\VRef[Table]{bkgd:ar:usr:decay-parm} gives the reference for the decay phenomenon seen in parameter declarations.
    313315
     
    350352
    351353
    352 \subsection{Multi-dimensional}
    353 
    354 As in the last section, multi-dimensional array declarations are examined.
     354\subsection{Variable Length Arrays}
     355
     356As of C99, the C standard supports a \newterm{variable length array} (VLA)~\cite[\S~6.7.5.2.5]{C99}, providing a dynamic-fixed array feature \see{\VRef{s:ArrayIntro}}.
     357Note, the \CC standard does not support VLAs, but @g++@ provides them.
     358A VLA is used when the desired number of array elements is \emph{unknown} at compile time.
     359\begin{cfa}
     360size_t  cols;
     361scanf( "%d", &cols );
     362double ar[cols];
     363\end{cfa}
     364The array dimension is read from outside the program and used to create an array of size @cols@ on the stack.
     365The VLA is implemented by the @alloca@ routine, which bumps the stack pointer.
     366Unfortunately, there is significant misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient.
     367VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type.
     368For types with a dynamic-fixed stack, \eg coroutines or user-level threads, large VLAs can overflow the stack without appropriately sizing the stack, so heap allocation is used when the array size is unbounded.
     369
     370
     371\subsection{Multidimensional Arrays}
     372\label{toc:mdimpl}
     373
     374% TODO: introduce multidimensional array feature and approaches
     375
     376When working with arrays, \eg linear algebra, array dimensions are referred to as ``rows'' and ``columns'' for a matrix, adding ``planes'' for a cube.
     377(There is little terminology for higher dimensional arrays.)
     378For example, an acrostic poem\footnote{A type of poetry where the first, last or other letters in a line spell out a particular word or phrase in a vertical column.}
     379can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s).
     380Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword.
     381In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@.
     382
     383Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block.
     384This conceptualization is reenforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.
     385Few programming languages differ from the mathematical subscript ordering.
     386However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system.
     387The closest representation to the conceptual visualization is for an array object to be contiguous, and the language structures this memory using pointer arithmetic to access the values using various subscripts.
     388This approach still has degrees of layout freedom, such as row or column major order, \ie juxtaposed rows or columns in memory, even when the subscript order remains fixed.
     389For example, programming languages like MATLAB, Fortran, Julia and R store matrices in column-major order since they are commonly used for processing column-vectors in tabular data sets but retain row-major subscripting to match with mathematical notation.
     390In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware.
     391
     392\VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix.
     393Note, C90 does not support VLAs.
     394The fixed-dimension approach (left) uses the type system;
     395however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants.
     396Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary.
     397The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
     398
     399\begin{figure}
     400\begin{tabular}{@{}l@{\hspace{40pt}}l@{}}
     401\multicolumn{1}{c}{\textbf{Fixed Dimension}} & \multicolumn{1}{c}{\textbf{Variable Dimension}} \\
     402\begin{cfa}
     403
     404void fp1( int rows, int m[][@6@] ) {
     405        ...  printf( "%d ", @m[r][c]@ );  ...
     406}
     407int fm1[4][@6@], fm2[6][@6@]; // no VLA
     408// initialize matrixes
     409fp1( 4, fm1 ); // implicit 6 columns
     410fp1( 6, fm2 );
     411\end{cfa}
     412&
     413\begin{cfa}
     414#define sub( m, r, c ) *(m + r * sizeof( m[0] ) + c)
     415void fp2( int rows, int cols, int *m ) {
     416        ...  printf( "%d ", @sub( m, r, c )@ );  ...
     417}
     418int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA
     419// initialize matrixes
     420fp2( 4, 4, vm1 );
     421fp2( 6, 8, vm2 );
     422\end{cfa}
     423\end{tabular}
     424\caption{C90 Fixed \vs Variable Contiguous Matrix Styles}
     425\label{f:FixedVariable}
     426\end{figure}
     427
     428Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC.
     429\begin{cquote}
     430\begin{tabular}{@{}ll@{}}
     431\begin{pascal}
     432var m : array[0..4, 0..4] of Integer;  (* matrix *)
     433type AT = array[0..4] of Integer;  (* array type *)
     434type MT = array[0..4] of AT;  (* array of array type *)
     435var aa : MT;  (* array of array variable *)
     436m@[1][2]@ := 1;   aa@[1][2]@ := 1 (* same subscripting *)
     437\end{pascal}
     438&
     439\begin{c++}
     440int m[5][5];
     441
     442typedef vector< vector<int> > MT;
     443MT vm( 5, vector<int>( 5 ) );
     444m@[1][2]@ = 1;  aa@[1][2]@ = 1;
     445\end{c++}
     446\end{tabular}
     447\end{cquote}
     448The language decides if the matrix and array-of-array are laid out the same or differently.
     449For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix).
     450Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
     451Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects).
     452
     453C also provides non-contiguous arrays-of-arrays.
     454\begin{cfa}
     455int m[5][5];                                                    $\C{// contiguous}$
     456int * aa[5];                                                    $\C{// non-contiguous}$
     457\end{cfa}
     458both with different memory layout using the same subscripting, and both with different degrees of issues.
     459The focus of this work is on the contiguous multidimensional arrays in C.
     460The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer.
     461Nevertheless, the C array-of-array form is still important for special circumstances.
     462
     463\VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}
     464For contiguous-array (including VLA) arguments, C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@.
     465There is now sufficient information to support subscript checking along the columns to prevent buffer-overflow problems, but subscript checking is not provided.
     466If the declaration of @fc@ is changed to:
     467\begin{cfa}
     468void fc( int rows, int cols, int m[@rows@][@cols@] ) ...
     469\end{cfa}
     470it is possible for C to perform bound checking across all subscripting.
     471While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites.
     472That is, the array does not automatically carry its structure and sizes for use in computing subscripts.
     473While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index.
     474Specifically, there is no requirement that the rows are the same length, like a poem with different length lines.
     475
     476\begin{figure}
     477\begin{tabular}{@{}ll@{}}
     478\multicolumn{1}{c}{\textbf{Contiguous}} & \multicolumn{1}{c}{\textbf{ Non-contiguous}} \\
     479\begin{cfa}
     480void fc( int rows, @int cols@, int m[ /* rows */ ][@cols@] ) {
     481        for ( size_t  r = 0; r < rows; r += 1 ) {
     482                for ( size_t  c = 0; c < cols; c += 1 )
     483                        ...  @m[r][c]@  ...
     484}
     485int m@[5][5]@;
     486for ( int r = 0; r < 5; r += 1 ) {
     487
     488        for ( int c = 0; c < 5; c += 1 )
     489                m[r][c] = r + c;
     490}
     491fc( 5, 5, m );
     492\end{cfa}
     493&
     494\begin{cfa}
     495void faa( int rows, int cols, int * m[ @/* cols */@ ] ) {
     496        for ( size_t  r = 0; r < rows; r += 1 ) {
     497                for ( size_t  c = 0; c < cols; c += 1 )
     498                        ...  @m[r][c]@  ...
     499}
     500int @* aa[5]@;  // row pointers
     501for ( int r = 0; r < 5; r += 1 ) {
     502        @aa[r] = malloc( 5 * sizeof(int) );@ // create rows
     503        for ( int c = 0; c < 5; c += 1 )
     504                aa[r][c] = r + c;
     505}
     506faa( 5, 5, aa );
     507\end{cfa}
     508\end{tabular}
     509\caption{C99 Contiguous \vs Non-contiguous Matrix Styles}
     510\label{f:ContiguousNon-contiguous}
     511\end{figure}
     512
     513
     514\subsection{Multi-dimensional arrays decay and pointers diffract}
     515
     516As for single-dimension arrays, multi-dimensional arrays have similar issues.
    355517\lstinput{16-18}{bkgd-carray-mdim.c}
    356518The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
    357 \lstinput{20-44}{bkgd-carray-mdim.c}
    358 
    359 
    360 \subsection{Lengths may vary, checking does not}
    361 
    362 When the desired number of elements is unknown at compile time, a variable-length array is a solution:
    363 \begin{cfa}
    364 int main( int argc, const char * argv[] ) {
    365         assert( argc == 2 );
    366         size_t n = atol( argv[1] );
    367         assert( 0 < n );
    368         float ar[n];
    369         float b[10];
    370         // ... discussion continues here
    371 }
    372 \end{cfa}
    373 This arrangement allocates @n@ elements on the @main@ stack frame for @ar@, called a \newterm{variable length array} (VLA), as well as 10 elements in the same stack frame for @b@.
    374 The variable-sized allocation of @ar@ is provided by the @alloca@ routine, which bumps the stack pointer.
    375 Note, the C standard supports VLAs~\cite[\S~6.7.6.2.4]{C11} as a conditional feature, but the \CC standard does not;
    376 both @gcc@ and @g++@ support VLAs.
    377 As well, there is misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient.
    378 VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type.
    379 
    380 For high-performance applications, the stack size can be fixed and small (coroutines or user-level threads).
    381 Here, VLAs can overflow the stack without appropriately sizing the stack, so a heap allocation is used.
    382 \begin{cfa}
    383 float * ax1 = malloc( sizeof( float[n] ) );
    384 float * ax2 = malloc( n * sizeof( float ) );    $\C{// arrays}$
    385 float * bx1 = malloc( sizeof( float[1000000] ) );
    386 float * bx2 = malloc( 1000000 * sizeof( float ) );
    387 \end{cfa}
    388 
    389 Parameter dependency
    390 
    391 Checking is best-effort / unsound
    392 
    393 Limited special handling to get the dimension value checked (static)
    394 
    395 
    396 \subsection{Dynamically sized, multidimensional arrays}
    397 
    398 In C and \CC, ``multidimensional array'' means ``array of arrays.''  Other meanings are discussed in TODO.
    399 
    400 Just as an array's element type can be @float@, so can it be @float[10]@.
    401 
    402 While any of @float*@, @float[10]@ and @float(*)[10]@ are easy to tell apart from @float@, telling them apart from each other may need occasional reference back to TODO intro section.
    403 The sentence derived by wrapping each type in @-[3]@ follows.
    404 
    405 While any of @float*[3]@, @float[3][10]@ and @float(*)[3][10]@ are easy to tell apart from @float[3]@,
    406 telling them apart from each other is what it takes to know what ``array of arrays'' really means.
    407 
    408 Pointer decay affects the outermost array only
    409 
    410 TODO: unfortunate syntactic reference with these cases:
    411 
    412 \begin{itemize}
    413         \item ar. of ar. of val (be sure about ordering of dimensions when the declaration is dropped)
    414         \item ptr. to ar. of ar. of val
    415 \end{itemize}
    416 
    417 
    418 \subsection{Arrays are (but) almost values}
    419 
    420 Has size; can point to
    421 
    422 Can't cast to
    423 
    424 Can't pass as value
    425 
    426 Can initialize
    427 
    428 Can wrap in aggregate
    429 
    430 Can't assign
    431 
    432 
    433 \subsection{Returning an array is (but) almost possible}
    434 
    435 
    436 \subsection{The pointer-to-array type has been noticed before}
     519\PAB{Explain, explain, explain.}
     520\lstinput{20-26}{bkgd-carray-mdim.c}
     521\PAB{Explain, explain, explain.}
     522\lstinput{28-36}{bkgd-carray-mdim.c}
     523\PAB{Explain, explain, explain.}
     524\lstinput{38-44}{bkgd-carray-mdim.c}
     525
     526
     527\subsection{Array Parameter Declaration}
     528
     529C has a formal and actual declaration for functions to allow definition-before-use and separate compilation, where formal describes a type and an actual defines the type.
     530\begin{cfa}
     531int foo( int, float, char );                            $\C{// formal, parameter names option}$
     532int foo( int i, float f, char c ) { ... }       $\C{// actual}$
     533\end{cfa}
     534For array parameters, a formal parameter array declaration can specify the first dimension with a dimension value, @[10]@ (which is ignored), an empty dimension list, @[ ]@, or a pointer, @*@:
     535\begin{cquote}
     536\begin{tabular}{@{}llll@{}}
     537\begin{cfa}
     538double sum( double [5] );
     539double sum( double *[5] );
     540\end{cfa}
     541&
     542\begin{cfa}
     543double sum( double [ ] );
     544double sum( double *[ ] );
     545\end{cfa}
     546&
     547\begin{cfa}
     548double sum( double * );
     549double sum( double ** );
     550\end{cfa}
     551&
     552\begin{cfa}
     553// array
     554// matrix
     555\end{cfa}
     556\end{tabular}
     557\end{cquote}
     558Good practice uses the middle form as it clearly indicates the parameter is subscripted.
     559However, an actual declaration cannot use @[ ]@;
     560it must use @*@.
     561\begin{cfa}
     562double sum( double v[ ] ) {                                     $\C{// formal declaration}$
     563double * cv;                                                            $\C{// actual declaration, think cv[ ]}$
     564sum( cv );                                                                      $\C{// address assignment v = cv}$
     565\end{cfa}
     566
     567Given the formal dimension forms @[ ]@ or @[5]@, it raises the question of qualifying the implicit array pointer rather than the array element type.
     568For example, the qualifiers after the @*@ apply to the array pointer.
     569\begin{cfa}
     570void foo( const volatile int * @const volatile@ );
     571void foo( const volatile int [ ] @const volatile@ ); // does not parse
     572\end{cfa}
     573C addressed this shortcoming by moving the pointer qualifiers into the first dimension.
     574\begin{cquote}
     575@[@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression}$_{opt}$ @]@
     576\end{cquote}
     577\begin{cfa}
     578void foo( int [@const  volatile@] );
     579void foo( int [@const  volatile@ 5] );          $\C{// 5 is ignored}$
     580\end{cfa}
     581
     582To make the first formal dimension size meaningful, C adds this form.
     583\begin{cquote}
     584@[@ @static@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression} @]@
     585\end{cquote}
     586\begin{cfa}
     587void foo( int [static @3@] );
     588int ar[@10@];
     589foo( ar ); // check argument dimension 10 > 3
     590\end{cfa}
     591Here, the @static@ storage qualifier defines the minimum array size for its argument.
     592@gcc@ ignores this dimension qualifier, \ie it gives no warning if the argument array size is less than the parameter minimum.
     593
     594Finally, to handle VLAs, C repurposed the @*@ \emph{within} the dimension in the formal declaration context to mean the argument must be a VLA (contiguous).
     595\begin{cquote}
     596@[@ \textit{type-qualifier-list$_{opt}$} @* ]@
     597\end{cquote}
     598\begin{cfa}
     599void foo( int [@*@][@*@] );                                     $\C{// formal}$
     600void foo( int ar[10][10] ) { ... }                      $\C{// actual}$
     601int ar[2][10];                                                          $\C{// contiguous}$
     602foo( ar );                                                                      $\C{// valid}$
     603int * arp[10];                                                          $\C{// non-contiguous}$
     604foo( arp );                                                                     $\C{// invalid}$
     605\end{cfa}
     606This syntactic form for the formal prototype means the header file does not have to commit to specific dimension values, but the compiler knows the argument is a contiguous array.
     607
     608
     609\subsection{Arrays could be values}
     610
     611All arrays have a know runtime size at their point of declaration.
     612Furthermore, C provides an explicit mechanism to pass an array's dimensions to a function.
     613Nevertheless, an array cannot be copied, and hence, not passed by value to a function, even when there is sufficient information to do so.
     614
     615However, if an array is a structure field (compile-time size), it can be copied and passed by value.
     616For example, a C @jmp_buf@ is an array.
     617\begin{cfa}
     618typedef long int jmp_buf[8];
     619\end{cfa}
     620A instance of this array can be declared as a structure field.
     621\begin{cfa}
     622struct Jmp_Buf {
     623        @jmp_buf@ jb;
     624};
     625\end{cfa}
     626Now the array can be copied (and passed by value) because structures can be copied.
     627\begin{cfa}
     628Jmp_Buf jb1, jb2;
     629jb1 = jb2;
     630void foo( Jmp_Buf );
     631foo( jb2 );
     632\end{cfa}
     633
     634This same argument applies to returning arrays from functions.
     635There can be sufficient information to return an array by value but it is not supported.
     636Again, array wrapping allows an array to be returned from a function and copied into variable.
    437637
    438638
     
    467667
    468668\subsection{Preexisting linked-list libraries}
     669\label{s:PreexistingLinked-ListLibraries}
    469670
    470671Two preexisting linked-list libraries are used throughout, to show examples of the concepts being defined,
     
    8831084The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@.
    8841085
    885 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multibyte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).
    886 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multibyte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wsanf@/@wprintf@.
     1086For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multi-byte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).
     1087For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi-byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wsanf@/@wprintf@.
    8871088The value of a wide-character is implementation-defined, usually a UTF-16 character.
    888 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multibyte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.
     1089For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multi-byte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.
    8891090The value of a @"u"@ character is an UTF-16 character;
    8901091the value of a @"U"@ character is an UTF-32 character.
    891 The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined.
     1092The value of a string literal containing a multi-byte character or escape sequence not represented in the execution character set is implementation-defined.
    8921093
    8931094C strings are null-terminated rather than maintaining a separate string length.
Note: See TracChangeset for help on using the changeset viewer.