-                      rbf91d1d
+                      r63cf80e
 with a segmentation fault at runtime.
 Clearly, @gcc@ understands these ill-typed cases, and yet allows the program to compile, which seems inappropriate.
 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unused variable.
+Compiling with flag @-Werror@, which turns warnings into errors, is often too pervasive, because some warnings are just warnings, \eg an unused variable.
 In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
 Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
 …
+\section{Array}
+At the start, the C programming language made a significant design mistake.
+\begin{quote}
+In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously.
+Any operation which can be achieved by array subscripting can also be done with pointers.~\cite[p.~93]{C:old}
+\end{quote}
+Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction.
+The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
+Finally, while subscripting involves pointer arithmetic (as do field references @x.y.z@), the computation is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
+Many C errors result from performing pointer arithmetic instead of using subscripting;
+some C textbooks teach pointer arithmetic erroneously suggesting it is faster than subscripting.
+A sound and efficient C program does not require explicit pointer arithmetic.
+C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
+This desire becomes apparent by a detailed inspection of an array declaration.
+\lstinput{34-34}{bkgd-carray-arrty.c}
+The inspection begins by using @sizeof@ to provide definite program semantics for the intuition of an expression's type.
+\lstinput{35-36}{bkgd-carray-arrty.c}
+Now consider the sizes of expressions derived from @ar@, modified by adding ``pointer to'' and ``first element'' (and including unnecessary parentheses to avoid confusion about precedence).
+\lstinput{37-40}{bkgd-carray-arrty.c}
+Given the size of @float@ is 4, the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers.
+Equally, C programmers know the size of a \emph{pointer} to the first array element is 8 (or 4 depending on the addressing architecture).
+% Now, set aside for a moment the claim that this first assertion is giving information about a type.
+Clearly, an array and a pointer to its first element are different.
+In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
+\lstinput{42-45}{bkgd-carray-arrty.c}
+The first assignment gets
+\begin{cfa}
+warning: assignment to `float (*)[10]' from incompatible pointer type `float *'
+\end{cfa}
+and the second assignment gets the opposite.
+The inspection now refutes any suggestion that @sizeof@ is informing about allocation rather than type information.
+Note, @sizeof@ has two forms, one operating on an expression and the other on a type.
+Using the type form yields the same results as the prior expression form.
+\lstinput{46-49}{bkgd-carray-arrty.c}
+The results are also the same when there is \emph{no allocation} using a pointer-to-array type.
+\lstinput{51-57}{bkgd-carray-arrty.c}
+Hence, in all cases, @sizeof@ is informing about type information.
+So, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system.
+This misguided analogue works for a single-dimension array but there is no advantage other than possibly teaching beginning programmers about basic runtime array-access.
+Continuing, a short form for declaring array variables exists using length information provided implicitly by an initializer.
+\lstinput{59-62}{bkgd-carray-arrty.c}
+The compiler counts the number of initializer elements and uses this value as the first dimension.
+Unfortunately, the implicit element counting does not extend to dimensions beyond the first.
+\lstinput{64-67}{bkgd-carray-arrty.c}
+My contribution is recognizing:
+\section{Reading declarations}
+A significant area of confusion for reading C declarations results from:
 \begin{itemize}
+        \item There is value in using a type that knows its size.
+        \item The type pointer to (first) element does not.
+        \item C \emph{has} a type that knows the whole picture: array, e.g. @T[10]@.
+        \item This type has all the usual derived forms, which also know the whole picture.
+        A usefully noteworthy example is pointer to array, e.g. @T (*)[10]@.\footnote{
+        The parentheses are necessary because subscript has higher priority than pointer in C declarations.
+        (Subscript also has higher priority than dereference in C expressions.)}
+\item
+C is unique in having dimension be higher priority than pointer in declarations.\footnote{
+For consistency, subscript has higher priority than dereference, yielding \lstinline{(*arp)[3]} rather than \lstinline{*arp[3]}.}
+\item
+Embedding a declared variable in a declaration, mimics the way the variable is used in executable statements.
 \end{itemize}
-\section{Reading declarations}
-A significant area of confusion for reading C declarations results from embedding a declared variable in a declaration, mimicking the way the variable is used in executable statements.
 \begin{cquote}
 \begin{tabular}{@{}ll@{}}
 …
 \end{tabular}
 \end{cquote}
 Essentially, the type is wrapped around the name in successive layers (like an \Index{onion}).
+The parentheses are necessary to achieve a pointer to a @T@, and the type is wrapped around the name in successive layers (like an \Index{onion}) to match usage in an expression.
 While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice, even though Dennis Ritchie believed otherwise:
 \begin{quote}
 …
 \end{cquote}
 As declaration size increases, it becomes correspondingly difficult to read and understand the C declaration form, whereas reading and understanding a \CFA declaration has linear complexity as the declaration size increases.
+Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations.
+Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations, \eg \CC \lstinline[language=C++]{auto f( int ) -> int}.
+Unfortunately, \CFA cannot interchange the priorities of subscript and dereference in expressions without breaking C compatibility.
 \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussed in \VRef[Chapter]{c:Array}.
 The \CFA-thesis column shows the new array declaration form, which is my contributed improvements for safety and ergonomics.
+The \CFA-thesis column shows the new array declaration form, which is my contribution to safety and ergonomics.
 The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features.
 Each row of the table shows alternate syntactic forms.
 The simplest occurrences of types distinguished in the preceding discussion are marked with $\triangleright$.
 Removing the declared variable @x@, gives the type used for variable, structure field, cast or error messages \PAB{(though note Section TODO points out that some types cannot be casted to)}.
 Unfortunately, parameter declarations \PAB{(section TODO)} have more syntactic forms and rules.
+Removing the declared variable @x@, gives the type used for variable, structure field, cast, or error messages.
+Unfortunately, parameter declarations have more syntactic forms and rules.
 \begin{table}
 …
 \end{table}
+TODO: Address these parked unfortunate syntaxes
+\begin{itemize}
+        \item static
+        \item star as dimension
+        \item under pointer decay: @int p1[const 3]@ being @int const *p1@
+\section{Array}
+\label{s:Array}
+At the start, the C language designers made a significant design mistake with respect to arrays.
+\begin{quote}
+In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously.
+Any operation which can be achieved by array subscripting can also be done with pointers.~\cite[p.~93]{C:old}
+\end{quote}
+Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction.
+The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
+Finally, while subscripting involves pointer arithmetic (as does a field reference @x.y.z@), the computation is complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
+Many C errors result from manually performing pointer arithmetic instead of using language subscripting and letting the compiler perform any arithmetic;
+some C textbooks erroneously suggest manual pointer arithmetic is faster than subscripting.
+A sound and efficient C program does not require explicit pointer arithmetic.
+C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
+This desire becomes apparent by a detailed inspection of an array declaration.
+\lstinput{34-34}{bkgd-carray-arrty.c}
+The inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type.
+\lstinput{35-36}{bkgd-carray-arrty.c}
+Now consider the @sizeof@ expressions derived from @ar@, modified by adding pointer-to and first-element (and including unnecessary parentheses to avoid any confusion about precedence).
+\lstinput{37-40}{bkgd-carray-arrty.c}
+Given that arrays are contiguous and the size of @float@ is 4, then the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers.
+Equally, C programmers know the size of a pointer to the first array element is 8 (or 4 depending on the addressing architecture).
+% Now, set aside for a moment the claim that this first assertion is giving information about a type.
+Clearly, an array and a pointer to its first element are different.
+In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
+\lstinput{42-45}{bkgd-carray-arrty.c}
+The first assignment generates:
+\begin{cfa}
+warning: assignment to `float (*)[10]' from incompatible pointer type `float *'
+\end{cfa}
+and the second assignment generates the opposite.
+The inspection now refutes any suggestion that @sizeof@ is informing about allocation rather than type information.
+Note, @sizeof@ has two forms, one operating on an expression and the other on a type.
+Using the type form yields the same results as the prior expression form.
+\lstinput{46-49}{bkgd-carray-arrty.c}
+The results are also the same when there is no allocation using a pointer-to-array type.
+\lstinput{51-57}{bkgd-carray-arrty.c}
+Hence, in all cases, @sizeof@ is informing about type information.
+Therefore, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system.
+This misguided analogue works for a single-dimension array but there is no advantage other than possibly teaching beginner programmers about basic runtime array-access.
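The @sizeof@ inspection above can be condensed into a minimal sketch.
The variable @ar@ follows the text; the function names are hypothetical and exist only so each size can be checked separately.
The last function involves no array allocation at all, only a pointer-to-array type.

```c
#include <stddef.h>

float ar[10];                                   // array of 10 floats, as in the text

// Sizes of expressions derived from ar (function names are illustrative only).
size_t size_of_array( void )          { return sizeof( ar ); }       // float[10]
size_t size_of_first_elem_ptr( void ) { return sizeof( &ar[0] ); }   // float *
size_t size_of_array_ptr( void )      { return sizeof( &ar ); }      // float (*)[10]

// No array is allocated here: only a pointer-to-array type is involved.
size_t size_via_array_ptr( void ) {
	float (*arp)[10] = NULL;                    // never dereferenced at run time
	return sizeof( *arp );                      // operand is not evaluated for a fixed-size array
}
```

In this sketch, the first and last functions report the size of @float[10]@, while the two middle ones report the size of a pointer, which matches the claim that @sizeof@ reports type information.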
+Continuing, there is a short form for declaring array variables using length information provided implicitly by an initializer.
+\lstinput{59-62}{bkgd-carray-arrty.c}
+The compiler counts the number of initializer elements and uses this value as the first dimension.
+Unfortunately, the implicit element counting does not extend to dimensions beyond the first.
+\lstinput{64-67}{bkgd-carray-arrty.c}
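As a hedged illustration of the implicit counting rule, the following sketch uses hypothetical variables (@fs@, @cs@, @m@) rather than the ones in the input file.
Only the first dimension can be left for the compiler to count; the commented-out declaration shows the form that is rejected.

```c
#include <stddef.h>

float fs[] = { 3.14f, 1.77f };                  // first dimension inferred as 2
char  cs[] = "hello";                           // 5 characters plus the terminating '\0'
int   m[][3] = { { 1, 2, 3 }, { 4, 5, 6 } };    // first dimension inferred as 2
// int bad[][] = { { 1, 2, 3 }, { 4, 5, 6 } };  // rejected: inner dimension must be given

// Element counts recovered from the array types (helper names are illustrative).
size_t fs_len( void ) { return sizeof( fs ) / sizeof( fs[0] ); }
size_t cs_len( void ) { return sizeof( cs ) / sizeof( cs[0] ); }
size_t m_rows( void ) { return sizeof( m ) / sizeof( m[0] ); }
```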
+My observation is recognizing:
+\begin{itemize}[leftmargin=*,topsep=0pt,itemsep=0pt]
+        \item There is value in using a type that knows its size.
+        \item The type pointer to the (first) element does not.
+        \item C \emph{has} a type that knows the whole picture: array, \eg @T[10]@.
+        \item This type has all the usual derived forms, which also know the whole picture.
+        A noteworthy example is pointer to array, \eg @T (*)[10]@.
 \end{itemize}
 …
 \subsection{Arrays decay and pointers diffract}
 The last section established the difference between these four types:
+The last section established the difference among these four types:
 \lstinput{3-6}{bkgd-carray-decay.c}
 But the expression used for obtaining the pointer to the first element is pedantic.
 …
 which reproduces @pa0@, in type and value:
 \lstinput{9-9}{bkgd-carray-decay.c}
 The validity of this initialization is unsettling, in the context of the facts established in the last section.
+The validity of this initialization is unsettling, in the context of the facts established in \VRef{s:Array}.
 Notably, it initializes name @pa0x@ from expression @ar@, when they are not of the same type:
 \lstinput{10-10}{bkgd-carray-decay.c}
 …
 \end{quote}
 This phenomenon is the famous \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one.
 It is worthy to note that the list of exception cases does not feature the occurrence of @ar@ in @ar[i]@.
+It is worth noting that the list of exceptional cases does not feature the occurrence of @ar@ in @ar[i]@.
 Thus, subscripting happens on pointers not arrays.
 …
 Taken together, these rules illustrate that @ar[i]@ and @i[ar]@ mean the same thing!
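A small sketch of this equivalence follows; the array contents and function names are assumptions made for illustration.
Each function reads the same element, because subscripting is defined in terms of pointer addition, which is commutative.

```c
int ar[5] = { 10, 20, 30, 40, 50 };             // illustrative data

int by_subscript( int i )  { return ar[i]; }    // conventional form
int by_reversed( int i )   { return i[ar]; }    // same meaning: *(i + ar)
int by_arithmetic( int i ) { return *(ar + i); }// the definition of subscripting
```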
 Subscripting a pointer when the target is standard inappropriate is still practically well-defined.
+Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
 While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all,
 the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
 …
 \cite[\S~6.7.6.3.7]{C11} explains that when an array type is written for a parameter,
 the parameter's type becomes a type that can be summarized as the array-decayed type.
 The respective handling of the following two parameter spellings shows that the array-spelled one is really, like the other, a pointer.
+The respective handling of the following two parameter spellings shows that the array and pointer versions are identical.
 \lstinput{12-16}{bkgd-carray-decay.c}
 As the @sizeof(x)@ meaning changed, compared with when run on a similarly-spelled local variable declaration,
 …
 warning: passing argument 1 of 'edit' discards 'const' qualifier from pointer target type
 \end{cfa}
+The basic two meanings, with a syntactic difference helping to distinguish,
+are illustrated in the declarations of @ca@ \vs @cp@,
+whose subsequent @edit@ calls behave differently.
+The syntax-caused confusion is in the comparison of the first and last lines,
+both of which use a literal to initialize an object declared with spelling @T x[]@.
+But these initialized declarations get opposite meanings,
+depending on whether the object is a local variable or a parameter.
+The basic two meanings, with a syntactic difference helping to distinguish, are illustrated in the declarations of @ca@ \vs @cp@, whose subsequent @edit@ calls behave differently.
+The syntax-caused confusion is in the comparison of the first and last lines, both of which use a literal to initialize an object declared with spelling @T x[]@.
+But these initialized declarations get opposite meanings, depending on whether the object is a local variable or a parameter!
 In summary, when a function is written with an array-typed parameter,
 \begin{itemize}
         \item an appearance of passing an array by value is always an incorrect understanding
         \item a dimension value, if any is present, is ignored
         \item pointer decay is forced at the call site and the callee sees the parameter having the decayed type
+\begin{itemize}[leftmargin=*,topsep=0pt]
+        \item an appearance of passing an array by value is always an incorrect understanding,
+        \item a dimension value, if any is present, is ignored,
+        \item pointer decay is forced at the call site and the callee sees the parameter having the decayed type.
 \end{itemize}
 Pointer decay does not affect pointer-to-array types, because these are already pointers, not arrays.
 As a result, a function with a pointer-to-array parameter sees the parameter exactly as the caller does:
+\lstinput{32-42}{bkgd-carray-decay.c}
+\par\noindent
+\begin{tabular}{@{\hspace*{-0.75\parindentlnth}}l@{}l@{}}
+\lstinput{32-36}{bkgd-carray-decay.c}
+&
+\lstinput{38-42}{bkgd-carray-decay.c}
+\end{tabular}
+\par\noindent
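The contrast between an array-spelled parameter and a pointer-to-array parameter can also be checked with a C11 @_Generic@ selection, which avoids applying @sizeof@ to an array parameter.
This is a sketch under the assumption of a C11 compiler; the function names are hypothetical.

```c
#include <stddef.h>

// Array-spelled parameter: the dimension 10 is ignored and x has type float *.
int array_parm_is_pointer( float x[10] ) {
	return _Generic( x, float *: 1, default: 0 );
}

// Pointer-to-array parameter: no decay occurs, so the length stays in the type.
size_t ptr_to_array_parm_size( float (*x)[10] ) {
	return sizeof( *x );                        // size of float[10]
}
```

A caller passes an array name to the first function (decay happens at the call site) and the address of an array to the second.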
 \VRef[Table]{bkgd:ar:usr:decay-parm} gives the reference for the decay phenomenon seen in parameter declarations.
 …
+\subsection{Multi-dimensional}
+As in the last section, multi-dimensional array declarations are examined.
+\subsection{Variable Length Arrays}
+As of C99, the C standard supports a \newterm{variable length array} (VLA)~\cite[\S~6.7.5.2.5]{C99}, providing a dynamic-fixed array feature \see{\VRef{s:ArrayIntro}}.
+Note, the \CC standard does not support VLAs, but @g++@ provides them.
+A VLA is used when the desired number of array elements is \emph{unknown} at compile time.
+\begin{cfa}
+size_t  cols;
+scanf( "%zu", &cols );
+double ar[cols];
+\end{cfa}
+The array dimension is read from outside the program and used to create an array of size @cols@ on the stack.
+The VLA is implemented by the @alloca@ routine, which bumps the stack pointer.
+Unfortunately, there is significant misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient.
+VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type.
+For types with a dynamic-fixed stack, \eg coroutines or user-level threads, large VLAs can overflow the stack without appropriately sizing the stack, so heap allocation is used when the array size is unbounded.
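A minimal sketch of a VLA in use follows, assuming a compiler with C99 VLA support and a dimension small enough for the current stack.
The function names are hypothetical; the second function shows that @sizeof@ applied to a VLA is computed at run time.

```c
#include <stddef.h>

// Sum of 0..n-1, computed through a VLA on the stack.
// Assumes 0 < n and that n doubles fit in the current stack.
double sum_with_vla( size_t n ) {
	double ar[n];                               // dimension fixed when the declaration is reached
	for ( size_t i = 0; i < n; i += 1 ) ar[i] = (double)i;
	double total = 0.0;
	for ( size_t i = 0; i < n; i += 1 ) total += ar[i];
	return total;
}

// sizeof on a VLA is evaluated at run time, because the type depends on n.
size_t vla_bytes( size_t n ) {
	double ar[n];
	return sizeof( ar );
}
```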
+\subsection{Multidimensional Arrays}
+\label{toc:mdimpl}
+% TODO: introduce multidimensional array feature and approaches
+When working with arrays, \eg linear algebra, array dimensions are referred to as ``rows'' and ``columns'' for a matrix, adding ``planes'' for a cube.
+(There is little terminology for higher dimensional arrays.)
+For example, an acrostic poem\footnote{A type of poetry where the first, last or other letters in a line spell out a particular word or phrase in a vertical column.}
+can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s).
+Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword.
+In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@.
+Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block.
+This conceptualization is reinforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.
+Few programming languages differ from the mathematical subscript ordering.
+However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system.
+The closest representation to the conceptual visualization is for an array object to be contiguous, and the language structures this memory using pointer arithmetic to access the values using various subscripts.
+This approach still has degrees of layout freedom, such as row or column major order, \ie juxtaposed rows or columns in memory, even when the subscript order remains fixed.
+For example, programming languages like MATLAB, Fortran, Julia and R store matrices in column-major order since they are commonly used for processing column-vectors in tabular data sets but retain row-major subscripting to match with mathematical notation.
+In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware.
+\VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix.
+Note, C90 does not support VLAs.
+The fixed-dimension approach (left) uses the type system;
+however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants.
+Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary.
+The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
+\begin{figure}
+\begin{tabular}{@{}l@{\hspace{40pt}}l@{}}
+\multicolumn{1}{c}{\textbf{Fixed Dimension}} & \multicolumn{1}{c}{\textbf{Variable Dimension}} \\
+\begin{cfa}
+void fp1( int rows, int m[][@6@] ) {
+        ...  printf( "%d ", @m[r][c]@ );  ...
+}
+int fm1[4][@6@], fm2[6][@6@]; // no VLA
+// initialize matrixes
+fp1( 4, fm1 ); // implicit 6 columns
+fp1( 6, fm2 );
+\end{cfa}
+&
+\begin{cfa}
+#define sub( m, r, c ) *(m + r * cols + c) // row stride is cols
+void fp2( int rows, int cols, int *m ) {
+        ...  printf( "%d ", @sub( m, r, c )@ );  ...
+}
+int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA
+// initialize matrixes
+fp2( 4, 4, vm1 );
+fp2( 6, 8, vm2 );
+\end{cfa}
+\end{tabular}
+\caption{C90 Fixed \vs Variable Contiguous Matrix Styles}
+\label{f:FixedVariable}
+\end{figure}
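The stride computation behind the two styles can be sketched as follows.
The function and variable names are hypothetical, and the flat version takes a pointer to the first @int@ so that argument and parameter types agree; it relies on the matrix being contiguous, as the figure does.

```c
// Fixed dimension: the type carries the 6, so the compiler generates the stride.
int get_fixed( int m[][6], int r, int c ) {
	return m[r][c];                             // element offset is r * 6 + c
}

// Variable dimension: the row stride is the column count, supplied at run time.
int get_flat( const int * m, int cols, int r, int c ) {
	return *(m + r * cols + c);                 // manual subscripting
}

int demo_matrix[4][6];                          // illustrative data

void fill_demo( void ) {
	for ( int r = 0; r < 4; r += 1 )
		for ( int c = 0; c < 6; c += 1 )
			demo_matrix[r][c] = 10 * r + c;     // encode row and column in the value
}
```

Both accessors return the same element when the flat version is given the true column count, \eg @get_flat( &demo_matrix[0][0], 6, r, c )@.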
+Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC.
+\begin{cquote}
+\begin{tabular}{@{}ll@{}}
+\begin{pascal}
+var m : array[0..4, 0..4] of Integer;  (* matrix *)
+type AT = array[0..4] of Integer;  (* array type *)
+type MT = array[0..4] of AT;  (* array of array type *)
+var aa : MT;  (* array of array variable *)
+m@[1][2]@ := 1;   aa@[1][2]@ := 1 (* same subscripting *)
+\end{pascal}
+&
+\begin{c++}
+int m[5][5];
+typedef vector< vector<int> > MT;
+MT vm( 5, vector<int>( 5 ) );
+m@[1][2]@ = 1;  vm@[1][2]@ = 1;
+\end{c++}
+\end{tabular}
+\end{cquote}
+The language decides if the matrix and array-of-array are laid out the same or differently.
+For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix).
+Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiate between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
+Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects).
+C also provides non-contiguous arrays-of-arrays.
+\begin{cfa}
+int m[5][5];                                                    $\C{// contiguous}$
+int * aa[5];                                                    $\C{// non-contiguous}$
+\end{cfa}
+both with different memory layout using the same subscripting, and both with different degrees of issues.
+The focus of this work is on the contiguous multidimensional arrays in C.
+The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer.
+Nevertheless, the C array-of-array form is still important for special circumstances.
+\VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}
+For contiguous-array (including VLA) arguments, C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@.
+There is now sufficient information to support subscript checking along the columns to prevent buffer-overflow problems, but subscript checking is not provided.
+If the declaration of @fc@ is changed to:
+\begin{cfa}
+void fc( int rows, int cols, int m[@rows@][@cols@] ) ...
+\end{cfa}
+it is possible for C to perform bound checking across all subscripting.
+While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites.
+That is, the array does not automatically carry its structure and sizes for use in computing subscripts.
+While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index.
+Specifically, there is no requirement that the rows are the same length, like a poem with different length lines.
+\begin{figure}
+\begin{tabular}{@{}ll@{}}
+\multicolumn{1}{c}{\textbf{Contiguous}} & \multicolumn{1}{c}{\textbf{ Non-contiguous}} \\
+\begin{cfa}
+void fc( int rows, @int cols@, int m[ /* rows */ ][@cols@] ) {
+        for ( size_t  r = 0; r < rows; r += 1 )
+                for ( size_t  c = 0; c < cols; c += 1 )
+                        ...  @m[r][c]@  ...
+}
+int m@[5][5]@;
+for ( int r = 0; r < 5; r += 1 ) {
+        for ( int c = 0; c < 5; c += 1 )
+                m[r][c] = r + c;
+}
+fc( 5, 5, m );
+\end{cfa}
+&
+\begin{cfa}
+void faa( int rows, int cols, int * m[ @/* cols */@ ] ) {
+        for ( size_t  r = 0; r < rows; r += 1 )
+                for ( size_t  c = 0; c < cols; c += 1 )
+                        ...  @m[r][c]@  ...
+}
+int @* aa[5]@;  // row pointers
+for ( int r = 0; r < 5; r += 1 ) {
+        @aa[r] = malloc( 5 * sizeof(int) );@ // create rows
+        for ( int c = 0; c < 5; c += 1 )
+                aa[r][c] = r + c;
+}
+faa( 5, 5, aa );
+\end{cfa}
+\end{tabular}
+\caption{C99 Contiguous \vs Non-contiguous Matrix Styles}
+\label{f:ContiguousNon-contiguous}
+\end{figure}
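The two C99 styles can be written as complete functions in the following sketch.
The names @sum_contiguous@ and @sum_row_pointers@ are hypothetical; the first uses the fully dimensioned parameter form @m[rows][cols]@, and the second receives an array of row pointers.

```c
// Contiguous: cols is part of the type of m, so the compiler computes the row stride.
long sum_contiguous( int rows, int cols, int m[rows][cols] ) {
	long total = 0;
	for ( int r = 0; r < rows; r += 1 )
		for ( int c = 0; c < cols; c += 1 )
			total += m[r][c];
	return total;
}

// Non-contiguous: m is an array of row pointers; nothing ties cols to the actual rows.
long sum_row_pointers( int rows, int cols, int * m[] ) {
	long total = 0;
	for ( int r = 0; r < rows; r += 1 )
		for ( int c = 0; c < cols; c += 1 )
			total += m[r][c];                   // the programmer must keep c in bounds
	return total;
}
```

The loop bodies are textually identical, but the subscript @m[r][c]@ means a stride computation in the first function and two dependent memory loads in the second.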
+\subsection{Multi-dimensional arrays decay and pointers diffract}
+As for single-dimension arrays, multi-dimensional arrays have similar issues.
 \lstinput{16-18}{bkgd-carray-mdim.c}
 The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
+\lstinput{20-44}{bkgd-carray-mdim.c}
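A compact sketch of the three derivations follows, using a hypothetical variable @mar@ rather than the declaration in the input file.
The last function assumes a C11 compiler and uses @mar + 0@ to force decay, showing that only the outermost array decays.

```c
#include <stddef.h>

float mar[3][10];                               // 3 rows, each a float[10]

size_t size_itself( void )      { return sizeof( mar ); }        // float[3][10]
size_t size_first_elem( void )  { return sizeof( mar[0] ); }     // float[10]
size_t size_first_grand( void ) { return sizeof( mar[0][0] ); }  // float

// Decay affects only the outermost array: the result is float (*)[10], not float **.
int decays_to_row_pointer( void ) {
	return _Generic( mar + 0, float (*)[10]: 1, default: 0 );
}
```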
+\subsection{Lengths may vary, checking does not}
+When the desired number of elements is unknown at compile time, a variable-length array is a solution:
+\begin{cfa}
+int main( int argc, const char * argv[] ) {
+        assert( argc == 2 );
+        size_t n = atol( argv[1] );
+        assert( 0 < n );
+        float ar[n];
+        float b[10];
+        // ... discussion continues here
+}
+\end{cfa}
+This arrangement allocates @n@ elements on the @main@ stack frame for @ar@, called a \newterm{variable length array} (VLA), as well as 10 elements in the same stack frame for @b@.
+The variable-sized allocation of @ar@ is provided by the @alloca@ routine, which bumps the stack pointer.
+Note, the C standard supports VLAs~\cite[\S~6.7.6.2.4]{C11} as a conditional feature, but the \CC standard does not;
+both @gcc@ and @g++@ support VLAs.
+As well, there is misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient.
+VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type.
+For high-performance applications, the stack size can be fixed and small (coroutines or user-level threads).
+Here, VLAs can overflow the stack without appropriately sizing the stack, so a heap allocation is used.
+\begin{cfa}
+float * ax1 = malloc( sizeof( float[n] ) );
+float * ax2 = malloc( n * sizeof( float ) );    $\C{// arrays}$
+float * bx1 = malloc( sizeof( float[1000000] ) );
+float * bx2 = malloc( 1000000 * sizeof( float ) );
+\end{cfa}
+Parameter dependency
+Checking is best-effort / unsound
+Limited special handling to get the dimension value checked (static)
+\subsection{Dynamically sized, multidimensional arrays}
+In C and \CC, ``multidimensional array'' means ``array of arrays.''  Other meanings are discussed in TODO.
+Just as an array's element type can be @float@, so can it be @float[10]@.
+While any of @float*@, @float[10]@ and @float(*)[10]@ are easy to tell apart from @float@, telling them apart from each other may need occasional reference back to TODO intro section.
+The sentence derived by wrapping each type in @-[3]@ follows.
+While any of @float*[3]@, @float[3][10]@ and @float(*)[3][10]@ are easy to tell apart from @float[3]@,
+telling them apart from each other is what it takes to know what ``array of arrays'' really means.
+Pointer decay affects the outermost array only
+TODO: unfortunate syntactic reference with these cases:
+\begin{itemize}
+        \item ar. of ar. of val (be sure about ordering of dimensions when the declaration is dropped)
+        \item ptr. to ar. of ar. of val
+\end{itemize}
+\subsection{Arrays are (but) almost values}
+Has size; can point to
+Can't cast to
+Can't pass as value
+Can initialize
+Can wrap in aggregate
+Can't assign
+\subsection{Returning an array is (but) almost possible}
+\subsection{The pointer-to-array type has been noticed before}
+\PAB{Explain, explain, explain.}
+\lstinput{20-26}{bkgd-carray-mdim.c}
+\PAB{Explain, explain, explain.}
+\lstinput{28-36}{bkgd-carray-mdim.c}
+\PAB{Explain, explain, explain.}
+\lstinput{38-44}{bkgd-carray-mdim.c}
+\subsection{Array Parameter Declaration}
+C has a formal and an actual declaration for functions to allow use before definition and separate compilation, where the formal declaration describes the function's type and the actual declaration defines the function.
+\begin{cfa}
+int foo( int, float, char );                            $\C{// formal, parameter names optional}$
+int foo( int i, float f, char c ) { ... }       $\C{// actual}$
+\end{cfa}
+For array parameters, a formal parameter array declaration can specify the first dimension with a dimension value, @[10]@ (which is ignored), an empty dimension list, @[ ]@, or a pointer, @*@:
+\begin{cquote}
+\begin{tabular}{@{}llll@{}}
+\begin{cfa}
+double sum( double [5] );
+double sum( double *[5] );
+\end{cfa}
+&
+\begin{cfa}
+double sum( double [ ] );
+double sum( double *[ ] );
+\end{cfa}
+&
+\begin{cfa}
+double sum( double * );
+double sum( double ** );
+\end{cfa}
+&
+\begin{cfa}
+// array
+// matrix
+\end{cfa}
+\end{tabular}
+\end{cquote}
+Good practice uses the middle form as it clearly indicates the parameter is subscripted.
+However, an actual declaration cannot use @[ ]@;
+it must use @*@.
+\begin{cfa}
+double sum( double v[ ] ) { ... }                       $\C{// formal declaration}$
+double * cv;                                                            $\C{// actual declaration, think cv[ ]}$
+sum( cv );                                                                      $\C{// address assignment v = cv}$
+\end{cfa}
+Given the formal dimension forms @[ ]@ or @[5]@, it raises the question of qualifying the implicit array pointer rather than the array element type.
+For example, the qualifiers after the @*@ apply to the array pointer.
+\begin{cfa}
+void foo( const volatile int * @const volatile@ );
+void foo( const volatile int [ ] @const volatile@ ); // does not parse
+\end{cfa}
+C addressed this shortcoming by moving the pointer qualifiers into the first dimension.
+\begin{cquote}
+@[@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression}$_{opt}$ @]@
+\end{cquote}
+\begin{cfa}
+void foo( int [@const  volatile@] );
+void foo( int [@const  volatile@ 5] );          $\C{// 5 is ignored}$
+\end{cfa}
+To make the first formal dimension size meaningful, C adds this form.
+\begin{cquote}
+@[@ @static@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression} @]@
+\end{cquote}
+\begin{cfa}
+void foo( int [static @3@] );
+int ar[@10@];
+foo( ar ); // check argument dimension 10 >= 3
+\end{cfa}
+Here, the @static@ storage qualifier defines the minimum array size for its argument.
+@gcc@ ignores this dimension qualifier, \ie it gives no warning if the argument array size is less than the parameter minimum.
+Finally, to handle VLAs, C repurposed the @*@ \emph{within} the dimension in the formal declaration context to mean the argument must be a VLA (contiguous).
+\begin{cquote}
+@[@ \textit{type-qualifier-list$_{opt}$} @* ]@
+\end{cquote}
+\begin{cfa}
+void foo( int [@*@][@*@] );                                     $\C{// formal}$
+void foo( int ar[10][10] ) { ... }                      $\C{// actual}$
+int ar[2][10];                                                          $\C{// contiguous}$
+foo( ar );                                                                      $\C{// valid}$
+int * arp[10];                                                          $\C{// non-contiguous}$
+foo( arp );                                                                     $\C{// invalid}$
+\end{cfa}
+This syntactic form for the formal prototype means the header file does not have to commit to specific dimension values, but the compiler knows the argument is a contiguous array.
+\subsection{Arrays could be values}
+All arrays have a known runtime size at their point of declaration.
+Furthermore, C provides an explicit mechanism to pass an array's dimensions to a function.
+Nevertheless, an array cannot be copied, and hence, not passed by value to a function, even when there is sufficient information to do so.
+However, if an array is a structure field (compile-time size), it can be copied and passed by value.
+For example, a C @jmp_buf@ is an array.
+\begin{cfa}
+typedef long int jmp_buf[8];
+\end{cfa}
+An instance of this array can be declared as a structure field.
+\begin{cfa}
+struct Jmp_Buf {
+        @jmp_buf@ jb;
+};
+\end{cfa}
+Now the array can be copied (and passed by value) because structures can be copied.
+\begin{cfa}
+Jmp_Buf jb1, jb2;
+jb1 = jb2;
+void foo( Jmp_Buf );
+foo( jb2 );
+\end{cfa}
+This same argument applies to returning arrays from functions.
+There can be sufficient information to return an array by value but it is not supported.
+Again, array wrapping allows an array to be returned from a function and copied into a variable.
 …
 \subsection{Preexisting linked-list libraries}
+\label{s:PreexistingLinked-ListLibraries}
 Two preexisting linked-list libraries are used throughout, to show examples of the concepts being defined,
 …
 The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@.
 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multibyte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).
 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multibyte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wscanf@/@wprintf@.
+For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multi-byte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).
+For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi-byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wscanf@/@wprintf@.
 The value of a wide-character is implementation-defined, usually a UTF-16 character.
 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multibyte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.
+For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multi-byte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.
 The value of a @"u"@ character is a UTF-16 character;
 the value of a @"U"@ character is a UTF-32 character.
 The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined.
+The value of a string literal containing a multi-byte character or escape sequence not represented in the execution character set is implementation-defined.
 C strings are null-terminated rather than maintaining a separate string length.

Changeset 63cf80e for doc
doc/theses/mike_brooks_MMath/background.tex