- Timestamp:
- Oct 29, 2024, 12:48:41 PM (2 weeks ago)
- Branches:
- master
- Children:
- 7ef4438
- Parents:
- bf91d1d
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/background.tex
rbf91d1d r63cf80e 23 23 with a segmentation fault at runtime. 24 24 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate. 25 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \egunused variable.25 Compiling with flag @-Werror@, which turns warnings into errors, is often too pervasive, because some warnings are just warnings, \eg an unused variable. 26 26 In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing. 27 27 Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors. … … 38 38 39 39 40 \section{Array} 41 42 At the start, the C programming language made a significant design mistake. 43 \begin{quote} 44 In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously. 45 Any operation which can be achieved by array subscripting can also be done with pointers.~\cite[p.~93]{C:old} 46 \end{quote} 47 Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction. 48 The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element. 49 Finally, while subscripting involves pointer arithmetic (as does field references @x.y.z@), the computation is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions. 50 Many C errors result from performing pointer arithmetic instead of using subscripting; 51 some C textbooks teach pointer arithmetic erroneously suggesting it is faster than subscripting. 52 A sound and efficient C program does not require explicit pointer arithmetic. 53 54 C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.'' 55 This desire becomes apparent by a detailed inspection of an array declaration. 56 \lstinput{34-34}{bkgd-carray-arrty.c} 57 The inspection begins by using @sizeof@ to provide definite program semantics for the intuition of an expression's type. 58 \lstinput{35-36}{bkgd-carray-arrty.c} 59 Now consider the sizes of expressions derived from @ar@, modified by adding ``pointer to'' and ``first element'' (and including unnecessary parentheses to avoid confusion about precedence). 60 \lstinput{37-40}{bkgd-carray-arrty.c} 61 Given the size of @float@ is 4, the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers. 62 Equally, C programmers know the size of a \emph{pointer} to the first array element is 8 (or 4 depending on the addressing architecture). 63 % Now, set aside for a moment the claim that this first assertion is giving information about a type. 64 Clearly, an array and a pointer to its first element are different. 65 66 In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element. 67 \lstinput{42-45}{bkgd-carray-arrty.c} 68 The first assignment gets 69 \begin{cfa} 70 warning: assignment to `float (*)[10]' from incompatible pointer type `float *' 71 \end{cfa} 72 and the second assignment gets the opposite. 73 74 The inspection now refutes any suggestion that @sizeof@ is informing about allocation rather than type information. 75 Note, @sizeof@ has two forms, one operating on an expression and the other on a type. 76 Using the type form yields the same results as the prior expression form. 77 \lstinput{46-49}{bkgd-carray-arrty.c} 78 The results are also the same when there is \emph{no allocation} using a pointer-to-array type. 79 \lstinput{51-57}{bkgd-carray-arrty.c} 80 Hence, in all cases, @sizeof@ is informing about type information. 81 82 So, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system. 83 This misguided analogue works for a single-dimension array but there is no advantage other than possibly teaching beginning programmers about basic runtime array-access. 84 85 Continuing, a short form for declaring array variables exists using length information provided implicitly by an initializer. 86 \lstinput{59-62}{bkgd-carray-arrty.c} 87 The compiler counts the number of initializer elements and uses this value as the first dimension. 88 Unfortunately, the implicit element counting does not extend to dimensions beyond the first. 89 \lstinput{64-67}{bkgd-carray-arrty.c} 90 91 My contribution is recognizing: 40 \section{Reading declarations} 41 42 A significant area of confusion for reading C declarations results from: 92 43 \begin{itemize} 93 \item There is value in using a type that knows its size. 94 \item The type pointer to (first) element does not. 95 \item C \emph{has} a type that knows the whole picture: array, e.g. @T[10]@. 96 \item This type has all the usual derived forms, which also know the whole picture. 97 A usefully noteworthy example is pointer to array, e.g. @T (*)[10]@.\footnote{ 98 The parenthesis are necessary because subscript has higher priority than pointer in C declarations. 99 (Subscript also has higher priority than dereference in C expressions.)} 44 \item 45 C is unique in having dimension be higher priority than pointer in declarations.\footnote{ 46 For consistency, subscript has higher priority than dereference, yielding \lstinline{(*arp)[3]} rather than \lstinline{*arp[3]}.} 47 \item 48 Embedding a declared variable in a declaration, mimics the way the variable is used in executable statements. 100 49 \end{itemize} 101 102 103 \section{Reading declarations}104 105 A significant area of confusion for reading C declarations results from embedding a declared variable in a declaration, mimicking the way the variable is used in executable statements.106 50 \begin{cquote} 107 51 \begin{tabular}{@{}ll@{}} … … 118 62 \end{tabular} 119 63 \end{cquote} 120 Essentially, the type is wrapped around the name in successive layers (like an \Index{onion}).64 The parenthesis are necessary to achieve a pointer to a @T@, and the type is wrapped around the name in successive layers (like an \Index{onion}) to match usage in an expression. 121 65 While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice, even though Dennis Richie believed otherwise: 122 66 \begin{quote} … … 155 99 \end{cquote} 156 100 As declaration size increases, it becomes corresponding difficult to read and understand the C declaration form, whereas reading and understanding a \CFA declaration has linear complexity as the declaration size increases. 157 Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations. 101 Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations, \eg \CC \lstinline[language=C++]{auto f( int ) -> int}. 102 Unfortunately, \CFA cannot interchange the priorities of subscript and dereference in expressions without breaking C compatibility. 158 103 159 104 \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef[Chapter]{c:Array}. 160 The \CFA-thesis column shows the new array declaration form, which is my contribut ed improvements forsafety and ergonomics.105 The \CFA-thesis column shows the new array declaration form, which is my contribution to safety and ergonomics. 161 106 The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features. 162 107 Each row of the table shows alternate syntactic forms. 163 108 The simplest occurrences of types distinguished in the preceding discussion are marked with $\triangleright$. 164 Removing the declared variable @x@, gives the type used for variable, structure field, cast or error messages \PAB{(though note Section TODO points out that some types cannot be casted to)}.165 Unfortunately, parameter declarations \PAB{(section TODO)}have more syntactic forms and rules.109 Removing the declared variable @x@, gives the type used for variable, structure field, cast, or error messages. 110 Unfortunately, parameter declarations have more syntactic forms and rules. 166 111 167 112 \begin{table} … … 211 156 \end{table} 212 157 213 TODO: Address these parked unfortunate syntaxes 214 \begin{itemize} 215 \item static 216 \item star as dimension 217 \item under pointer decay: @int p1[const 3]@ being @int const *p1@ 158 159 \section{Array} 160 \label{s:Array} 161 162 At the start, the C language designers made a significant design mistake with respect to arrays. 163 \begin{quote} 164 In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously. 165 Any operation which can be achieved by array subscripting can also be done with pointers.~\cite[p.~93]{C:old} 166 \end{quote} 167 Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction. 168 The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element. 169 Finally, while subscripting involves pointer arithmetic (as does a field reference @x.y.z@), the computation is complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions. 170 Many C errors result from manually performing pointer arithmetic instead of using language subscripting, letting the compiler performs any arithmetic; 171 some C textbooks erroneously suggest manual pointer arithmetic is faster than subscripting. 172 A sound and efficient C program does not require explicit pointer arithmetic. 173 174 C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.'' 175 This desire becomes apparent by a detailed inspection of an array declaration. 176 \lstinput{34-34}{bkgd-carray-arrty.c} 177 The inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type. 178 \lstinput{35-36}{bkgd-carray-arrty.c} 179 Now consider the @sizeof@ expressions derived from @ar@, modified by adding pointer-to and first-element (and including unnecessary parentheses to avoid any confusion about precedence). 180 \lstinput{37-40}{bkgd-carray-arrty.c} 181 Given that arrays are contiguous and the size of @float@ is 4, then the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers. 182 Equally, C programmers know the size of a pointer to the first array element is 8 (or 4 depending on the addressing architecture). 183 % Now, set aside for a moment the claim that this first assertion is giving information about a type. 184 Clearly, an array and a pointer to its first element are different. 185 186 In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element. 187 \lstinput{42-45}{bkgd-carray-arrty.c} 188 The first assignment generates: 189 \begin{cfa} 190 warning: assignment to `float (*)[10]' from incompatible pointer type `float *' 191 \end{cfa} 192 and the second assignment generates the opposite. 193 194 The inspection now refutes any suggestion that @sizeof@ is informing about allocation rather than type information. 195 Note, @sizeof@ has two forms, one operating on an expression and the other on a type. 196 Using the type form yields the same results as the prior expression form. 197 \lstinput{46-49}{bkgd-carray-arrty.c} 198 The results are also the same when there is no allocation using a pointer-to-array type. 199 \lstinput{51-57}{bkgd-carray-arrty.c} 200 Hence, in all cases, @sizeof@ is informing about type information. 201 202 Therefore, thinking of an array as a pointer to its first element is too simplistic an analogue and it is not backed up by the type system. 203 This misguided analogue works for a single-dimension array but there is no advantage other than possibly teaching beginner programmers about basic runtime array-access. 204 205 Continuing, there is a short form for declaring array variables using length information provided implicitly by an initializer. 206 \lstinput{59-62}{bkgd-carray-arrty.c} 207 The compiler counts the number of initializer elements and uses this value as the first dimension. 208 Unfortunately, the implicit element counting does not extend to dimensions beyond the first. 209 \lstinput{64-67}{bkgd-carray-arrty.c} 210 211 My observation is recognizing: 212 \begin{itemize}[leftmargin=*,topsep=0pt,itemsep=0pt] 213 \item There is value in using a type that knows its size. 214 \item The type pointer to the (first) element does not. 215 \item C \emph{has} a type that knows the whole picture: array, \eg @T[10]@. 216 \item This type has all the usual derived forms, which also know the whole picture. 217 A noteworthy example is pointer to array, \eg @T (*)[10]@. 218 218 \end{itemize} 219 219 … … 221 221 \subsection{Arrays decay and pointers diffract} 222 222 223 The last section established the difference betweenthese four types:223 The last section established the difference among these four types: 224 224 \lstinput{3-6}{bkgd-carray-decay.c} 225 225 But the expression used for obtaining the pointer to the first element is pedantic. … … 228 228 which reproduces @pa0@, in type and value: 229 229 \lstinput{9-9}{bkgd-carray-decay.c} 230 The validity of this initialization is unsettling, in the context of the facts established in the last section.230 The validity of this initialization is unsettling, in the context of the facts established in \VRef{s:Array}. 231 231 Notably, it initializes name @pa0x@ from expression @ar@, when they are not of the same type: 232 232 \lstinput{10-10}{bkgd-carray-decay.c} … … 239 239 \end{quote} 240 240 This phenomenon is the famous \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one. 241 It is worthy to note that the list of exception cases does not feature the occurrence of @ar@ in @ar[i]@.241 It is worthy to note that the list of exceptional cases does not feature the occurrence of @ar@ in @ar[i]@. 242 242 Thus, subscripting happens on pointers not arrays. 243 243 … … 247 247 Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing! 248 248 249 Subscripting a pointer when the target is standard 249 Subscripting a pointer when the target is standard-inappropriate is still practically well-defined. 250 250 While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all, 251 251 the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses. … … 266 266 \cite[\S~6.7.6.3.7]{C11} explains that when an array type is written for a parameter, 267 267 the parameter's type becomes a type that can be summarized as the array-decayed type. 268 The respective handling of the following two parameter spellings shows that the array -spelled one is really, like the other, a pointer.268 The respective handling of the following two parameter spellings shows that the array and pointer versions are identical. 269 269 \lstinput{12-16}{bkgd-carray-decay.c} 270 270 As the @sizeof(x)@ meaning changed, compared with when run on a similarly-spelled local variable declaration, … … 292 292 warning: passing argument 1 of 'edit' discards 'const' qualifier from pointer target type 293 293 \end{cfa} 294 The basic two meanings, with a syntactic difference helping to distinguish, 295 are illustrated in the declarations of @ca@ \vs @cp@, 296 whose subsequent @edit@ calls behave differently. 297 The syntax-caused confusion is in the comparison of the first and last lines, 298 both of which use a literal to initialize an object declared with spelling @T x[]@. 299 But these initialized declarations get opposite meanings, 300 depending on whether the object is a local variable or a parameter. 294 The basic two meanings, with a syntactic difference helping to distinguish, are illustrated in the declarations of @ca@ \vs @cp@, whose subsequent @edit@ calls behave differently. 295 The syntax-caused confusion is in the comparison of the first and last lines, both of which use a literal to initialize an object declared with spelling @T x[]@. 296 But these initialized declarations get opposite meanings, depending on whether the object is a local variable or a parameter! 301 297 302 298 In summary, when a function is written with an array-typed parameter, 303 \begin{itemize} 304 \item an appearance of passing an array by value is always an incorrect understanding 305 \item a dimension value, if any is present, is ignored 306 \item pointer decay is forced at the call site and the callee sees the parameter having the decayed type 299 \begin{itemize}[leftmargin=*,topsep=0pt] 300 \item an appearance of passing an array by value is always an incorrect understanding, 301 \item a dimension value, if any is present, is ignored, 302 \item pointer decay is forced at the call site and the callee sees the parameter having the decayed type. 307 303 \end{itemize} 308 304 309 305 Pointer decay does not affect pointer-to-array types, because these are already pointers, not arrays. 310 306 As a result, a function with a pointer-to-array parameter sees the parameter exactly as the caller does: 311 \lstinput{32-42}{bkgd-carray-decay.c} 307 \par\noindent 308 \begin{tabular}{@{\hspace*{-0.75\parindentlnth}}l@{}l@{}} 309 \lstinput{32-36}{bkgd-carray-decay.c} 310 & 311 \lstinput{38-42}{bkgd-carray-decay.c} 312 \end{tabular} 313 \par\noindent 312 314 \VRef[Table]{bkgd:ar:usr:decay-parm} gives the reference for the decay phenomenon seen in parameter declarations. 313 315 … … 350 352 351 353 352 \subsection{Multi-dimensional} 353 354 As in the last section, multi-dimensional array declarations are examined. 354 \subsection{Variable Length Arrays} 355 356 As of C99, the C standard supports a \newterm{variable length array} (VLA)~\cite[\S~6.7.5.2.5]{C99}, providing a dynamic-fixed array feature \see{\VRef{s:ArrayIntro}}. 357 Note, the \CC standard does not support VLAs, but @g++@ provides them. 358 A VLA is used when the desired number of array elements is \emph{unknown} at compile time. 359 \begin{cfa} 360 size_t cols; 361 scanf( "%d", &cols ); 362 double ar[cols]; 363 \end{cfa} 364 The array dimension is read from outside the program and used to create an array of size @cols@ on the stack. 365 The VLA is implemented by the @alloca@ routine, which bumps the stack pointer. 366 Unfortunately, there is significant misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient. 367 VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type. 368 For types with a dynamic-fixed stack, \eg coroutines or user-level threads, large VLAs can overflow the stack without appropriately sizing the stack, so heap allocation is used when the array size is unbounded. 369 370 371 \subsection{Multidimensional Arrays} 372 \label{toc:mdimpl} 373 374 % TODO: introduce multidimensional array feature and approaches 375 376 When working with arrays, \eg linear algebra, array dimensions are referred to as ``rows'' and ``columns'' for a matrix, adding ``planes'' for a cube. 377 (There is little terminology for higher dimensional arrays.) 378 For example, an acrostic poem\footnote{A type of poetry where the first, last or other letters in a line spell out a particular word or phrase in a vertical column.} 379 can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s). 380 Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword. 381 In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@. 382 383 Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block. 384 This conceptualization is reenforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube. 385 Few programming languages differ from the mathematical subscript ordering. 386 However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system. 387 The closest representation to the conceptual visualization is for an array object to be contiguous, and the language structures this memory using pointer arithmetic to access the values using various subscripts. 388 This approach still has degrees of layout freedom, such as row or column major order, \ie juxtaposed rows or columns in memory, even when the subscript order remains fixed. 389 For example, programming languages like MATLAB, Fortran, Julia and R store matrices in column-major order since they are commonly used for processing column-vectors in tabular data sets but retain row-major subscripting to match with mathematical notation. 390 In general, storage layout is hidden by subscripting, and only appears when passing arrays among different programming languages or accessing specific hardware. 391 392 \VRef[Figure]{f:FixedVariable} shows two C90 approaches for manipulating a contiguous matrix. 393 Note, C90 does not support VLAs. 394 The fixed-dimension approach (left) uses the type system; 395 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants. 396 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary. 397 The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@. 398 399 \begin{figure} 400 \begin{tabular}{@{}l@{\hspace{40pt}}l@{}} 401 \multicolumn{1}{c}{\textbf{Fixed Dimension}} & \multicolumn{1}{c}{\textbf{Variable Dimension}} \\ 402 \begin{cfa} 403 404 void fp1( int rows, int m[][@6@] ) { 405 ... printf( "%d ", @m[r][c]@ ); ... 406 } 407 int fm1[4][@6@], fm2[6][@6@]; // no VLA 408 // initialize matrixes 409 fp1( 4, fm1 ); // implicit 6 columns 410 fp1( 6, fm2 ); 411 \end{cfa} 412 & 413 \begin{cfa} 414 #define sub( m, r, c ) *(m + r * sizeof( m[0] ) + c) 415 void fp2( int rows, int cols, int *m ) { 416 ... printf( "%d ", @sub( m, r, c )@ ); ... 417 } 418 int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA 419 // initialize matrixes 420 fp2( 4, 4, vm1 ); 421 fp2( 6, 8, vm2 ); 422 \end{cfa} 423 \end{tabular} 424 \caption{C90 Fixed \vs Variable Contiguous Matrix Styles} 425 \label{f:FixedVariable} 426 \end{figure} 427 428 Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC. 429 \begin{cquote} 430 \begin{tabular}{@{}ll@{}} 431 \begin{pascal} 432 var m : array[0..4, 0..4] of Integer; (* matrix *) 433 type AT = array[0..4] of Integer; (* array type *) 434 type MT = array[0..4] of AT; (* array of array type *) 435 var aa : MT; (* array of array variable *) 436 m@[1][2]@ := 1; aa@[1][2]@ := 1 (* same subscripting *) 437 \end{pascal} 438 & 439 \begin{c++} 440 int m[5][5]; 441 442 typedef vector< vector<int> > MT; 443 MT vm( 5, vector<int>( 5 ) ); 444 m@[1][2]@ = 1; aa@[1][2]@ = 1; 445 \end{c++} 446 \end{tabular} 447 \end{cquote} 448 The language decides if the matrix and array-of-array are laid out the same or differently. 449 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix). 450 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@. 451 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (caching/NUMA effects). 452 453 C also provides non-contiguous arrays-of-arrays. 454 \begin{cfa} 455 int m[5][5]; $\C{// contiguous}$ 456 int * aa[5]; $\C{// non-contiguous}$ 457 \end{cfa} 458 both with different memory layout using the same subscripting, and both with different degrees of issues. 459 The focus of this work is on the contiguous multidimensional arrays in C. 460 The reason is that programmers are often forced to use the more complex array-of-array form when a contiguous array would be simpler, faster, and safer. 461 Nevertheless, the C array-of-array form is still important for special circumstances. 462 463 \VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.} 464 For contiguous-array (including VLA) arguments, C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@. 465 There is now sufficient information to support subscript checking along the columns to prevent buffer-overflow problems, but subscript checking is not provided. 466 If the declaration of @fc@ is changed to: 467 \begin{cfa} 468 void fc( int rows, int cols, int m[@rows@][@cols@] ) ... 469 \end{cfa} 470 it is possible for C to perform bound checking across all subscripting. 471 While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites. 472 That is, the array does not automatically carry its structure and sizes for use in computing subscripts. 473 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row correctly with a correctly bounded loop index. 474 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines. 475 476 \begin{figure} 477 \begin{tabular}{@{}ll@{}} 478 \multicolumn{1}{c}{\textbf{Contiguous}} & \multicolumn{1}{c}{\textbf{ Non-contiguous}} \\ 479 \begin{cfa} 480 void fc( int rows, @int cols@, int m[ /* rows */ ][@cols@] ) { 481 for ( size_t r = 0; r < rows; r += 1 ) { 482 for ( size_t c = 0; c < cols; c += 1 ) 483 ... @m[r][c]@ ... 484 } 485 int m@[5][5]@; 486 for ( int r = 0; r < 5; r += 1 ) { 487 488 for ( int c = 0; c < 5; c += 1 ) 489 m[r][c] = r + c; 490 } 491 fc( 5, 5, m ); 492 \end{cfa} 493 & 494 \begin{cfa} 495 void faa( int rows, int cols, int * m[ @/* cols */@ ] ) { 496 for ( size_t r = 0; r < rows; r += 1 ) { 497 for ( size_t c = 0; c < cols; c += 1 ) 498 ... @m[r][c]@ ... 499 } 500 int @* aa[5]@; // row pointers 501 for ( int r = 0; r < 5; r += 1 ) { 502 @aa[r] = malloc( 5 * sizeof(int) );@ // create rows 503 for ( int c = 0; c < 5; c += 1 ) 504 aa[r][c] = r + c; 505 } 506 faa( 5, 5, aa ); 507 \end{cfa} 508 \end{tabular} 509 \caption{C99 Contiguous \vs Non-contiguous Matrix Styles} 510 \label{f:ContiguousNon-contiguous} 511 \end{figure} 512 513 514 \subsection{Multi-dimensional arrays decay and pointers diffract} 515 516 As for single-dimension arrays, multi-dimensional arrays have similar issues. 355 517 \lstinput{16-18}{bkgd-carray-mdim.c} 356 518 The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).'' 357 \lstinput{20-44}{bkgd-carray-mdim.c} 358 359 360 \subsection{Lengths may vary, checking does not} 361 362 When the desired number of elements is unknown at compile time, a variable-length array is a solution: 363 \begin{cfa} 364 int main( int argc, const char * argv[] ) { 365 assert( argc == 2 ); 366 size_t n = atol( argv[1] ); 367 assert( 0 < n ); 368 float ar[n]; 369 float b[10]; 370 // ... discussion continues here 371 } 372 \end{cfa} 373 This arrangement allocates @n@ elements on the @main@ stack frame for @ar@, called a \newterm{variable length array} (VLA), as well as 10 elements in the same stack frame for @b@. 374 The variable-sized allocation of @ar@ is provided by the @alloca@ routine, which bumps the stack pointer. 375 Note, the C standard supports VLAs~\cite[\S~6.7.6.2.4]{C11} as a conditional feature, but the \CC standard does not; 376 both @gcc@ and @g++@ support VLAs. 377 As well, there is misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient. 378 VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type. 379 380 For high-performance applications, the stack size can be fixed and small (coroutines or user-level threads). 381 Here, VLAs can overflow the stack without appropriately sizing the stack, so a heap allocation is used. 382 \begin{cfa} 383 float * ax1 = malloc( sizeof( float[n] ) ); 384 float * ax2 = malloc( n * sizeof( float ) ); $\C{// arrays}$ 385 float * bx1 = malloc( sizeof( float[1000000] ) ); 386 float * bx2 = malloc( 1000000 * sizeof( float ) ); 387 \end{cfa} 388 389 Parameter dependency 390 391 Checking is best-effort / unsound 392 393 Limited special handling to get the dimension value checked (static) 394 395 396 \subsection{Dynamically sized, multidimensional arrays} 397 398 In C and \CC, ``multidimensional array'' means ``array of arrays.'' Other meanings are discussed in TODO. 399 400 Just as an array's element type can be @float@, so can it be @float[10]@. 401 402 While any of @float*@, @float[10]@ and @float(*)[10]@ are easy to tell apart from @float@, telling them apart from each other may need occasional reference back to TODO intro section. 403 The sentence derived by wrapping each type in @-[3]@ follows. 404 405 While any of @float*[3]@, @float[3][10]@ and @float(*)[3][10]@ are easy to tell apart from @float[3]@, 406 telling them apart from each other is what it takes to know what ``array of arrays'' really means. 407 408 Pointer decay affects the outermost array only 409 410 TODO: unfortunate syntactic reference with these cases: 411 412 \begin{itemize} 413 \item ar. of ar. of val (be sure about ordering of dimensions when the declaration is dropped) 414 \item ptr. to ar. of ar. of val 415 \end{itemize} 416 417 418 \subsection{Arrays are (but) almost values} 419 420 Has size; can point to 421 422 Can't cast to 423 424 Can't pass as value 425 426 Can initialize 427 428 Can wrap in aggregate 429 430 Can't assign 431 432 433 \subsection{Returning an array is (but) almost possible} 434 435 436 \subsection{The pointer-to-array type has been noticed before} 519 \PAB{Explain, explain, explain.} 520 \lstinput{20-26}{bkgd-carray-mdim.c} 521 \PAB{Explain, explain, explain.} 522 \lstinput{28-36}{bkgd-carray-mdim.c} 523 \PAB{Explain, explain, explain.} 524 \lstinput{38-44}{bkgd-carray-mdim.c} 525 526 527 \subsection{Array Parameter Declaration} 528 529 C has a formal and actual declaration for functions to allow definition-before-use and separate compilation, where formal describes a type and an actual defines the type. 530 \begin{cfa} 531 int foo( int, float, char ); $\C{// formal, parameter names option}$ 532 int foo( int i, float f, char c ) { ... } $\C{// actual}$ 533 \end{cfa} 534 For array parameters, a formal parameter array declaration can specify the first dimension with a dimension value, @[10]@ (which is ignored), an empty dimension list, @[ ]@, or a pointer, @*@: 535 \begin{cquote} 536 \begin{tabular}{@{}llll@{}} 537 \begin{cfa} 538 double sum( double [5] ); 539 double sum( double *[5] ); 540 \end{cfa} 541 & 542 \begin{cfa} 543 double sum( double [ ] ); 544 double sum( double *[ ] ); 545 \end{cfa} 546 & 547 \begin{cfa} 548 double sum( double * ); 549 double sum( double ** ); 550 \end{cfa} 551 & 552 \begin{cfa} 553 // array 554 // matrix 555 \end{cfa} 556 \end{tabular} 557 \end{cquote} 558 Good practice uses the middle form as it clearly indicates the parameter is subscripted. 559 However, an actual declaration cannot use @[ ]@; 560 it must use @*@. 561 \begin{cfa} 562 double sum( double v[ ] ) { $\C{// formal declaration}$ 563 double * cv; $\C{// actual declaration, think cv[ ]}$ 564 sum( cv ); $\C{// address assignment v = cv}$ 565 \end{cfa} 566 567 Given the formal dimension forms @[ ]@ or @[5]@, it raises the question of qualifying the implicit array pointer rather than the array element type. 568 For example, the qualifiers after the @*@ apply to the array pointer. 569 \begin{cfa} 570 void foo( const volatile int * @const volatile@ ); 571 void foo( const volatile int [ ] @const volatile@ ); // does not parse 572 \end{cfa} 573 C addressed this shortcoming by moving the pointer qualifiers into the first dimension. 574 \begin{cquote} 575 @[@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression}$_{opt}$ @]@ 576 \end{cquote} 577 \begin{cfa} 578 void foo( int [@const volatile@] ); 579 void foo( int [@const volatile@ 5] ); $\C{// 5 is ignored}$ 580 \end{cfa} 581 582 To make the first formal dimension size meaningful, C adds this form. 583 \begin{cquote} 584 @[@ @static@ \textit{type-qualifier-list}$_{opt}$ \textit{assignment-expression} @]@ 585 \end{cquote} 586 \begin{cfa} 587 void foo( int [static @3@] ); 588 int ar[@10@]; 589 foo( ar ); // check argument dimension 10 > 3 590 \end{cfa} 591 Here, the @static@ storage qualifier defines the minimum array size for its argument. 592 @gcc@ ignores this dimension qualifier, \ie it gives no warning if the argument array size is less than the parameter minimum. 593 594 Finally, to handle VLAs, C repurposed the @*@ \emph{within} the dimension in the formal declaration context to mean the argument must be a VLA (contiguous). 595 \begin{cquote} 596 @[@ \textit{type-qualifier-list$_{opt}$} @* ]@ 597 \end{cquote} 598 \begin{cfa} 599 void foo( int [@*@][@*@] ); $\C{// formal}$ 600 void foo( int ar[10][10] ) { ... } $\C{// actual}$ 601 int ar[2][10]; $\C{// contiguous}$ 602 foo( ar ); $\C{// valid}$ 603 int * arp[10]; $\C{// non-contiguous}$ 604 foo( arp ); $\C{// invalid}$ 605 \end{cfa} 606 This syntactic form for the formal prototype means the header file does not have to commit to specific dimension values, but the compiler knows the argument is a contiguous array. 607 608 609 \subsection{Arrays could be values} 610 611 All arrays have a know runtime size at their point of declaration. 612 Furthermore, C provides an explicit mechanism to pass an array's dimensions to a function. 613 Nevertheless, an array cannot be copied, and hence, not passed by value to a function, even when there is sufficient information to do so. 614 615 However, if an array is a structure field (compile-time size), it can be copied and passed by value. 616 For example, a C @jmp_buf@ is an array. 617 \begin{cfa} 618 typedef long int jmp_buf[8]; 619 \end{cfa} 620 A instance of this array can be declared as a structure field. 621 \begin{cfa} 622 struct Jmp_Buf { 623 @jmp_buf@ jb; 624 }; 625 \end{cfa} 626 Now the array can be copied (and passed by value) because structures can be copied. 627 \begin{cfa} 628 Jmp_Buf jb1, jb2; 629 jb1 = jb2; 630 void foo( Jmp_Buf ); 631 foo( jb2 ); 632 \end{cfa} 633 634 This same argument applies to returning arrays from functions. 635 There can be sufficient information to return an array by value but it is not supported. 636 Again, array wrapping allows an array to be returned from a function and copied into variable. 437 637 438 638 … … 467 667 468 668 \subsection{Preexisting linked-list libraries} 669 \label{s:PreexistingLinked-ListLibraries} 469 670 470 671 Two preexisting linked-list libraries are used throughout, to show examples of the concepts being defined, … … 883 1084 The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@. 884 1085 885 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multi byte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).886 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wsanf@/@wprintf@.1086 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multi-byte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO). 1087 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi-byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wsanf@/@wprintf@. 887 1088 The value of a wide-character is implementation-defined, usually a UTF-16 character. 888 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multi byte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.1089 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multi-byte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@. 889 1090 The value of a @"u"@ character is an UTF-16 character; 890 1091 the value of a @"U"@ character is an UTF-32 character. 891 The value of a string literal containing a multi byte character or escape sequence not represented in the execution character set is implementation-defined.1092 The value of a string literal containing a multi-byte character or escape sequence not represented in the execution character set is implementation-defined. 892 1093 893 1094 C strings are null-terminated rather than maintaining a separate string length.
Note: See TracChangeset
for help on using the changeset viewer.