[27f1055] | 1 | \chapter{Background} |
---|
| 2 | |
---|
[0554c1a] | 3 | Since this work builds on C, it is necessary to explain the C mechanisms and their shortcomings for arrays, linked lists, and strings. |
---|
[27f1055] | 4 | |
---|
[40ab446] | 5 | |
---|
[f5fbcad] | 6 | \section{Array} |
---|
[40ab446] | 7 | |
---|
[b5bfb16] | 8 | At the start, the C programming language made a significant design mistake. |
---|
| 9 | \begin{quote} |
---|
| 10 | In C, there is a strong relationship between pointers and arrays, strong enough that pointers and arrays really should be treated simultaneously. |
---|
| 11 | Any operation which can be achieved by array subscripting can also be done with pointers.~\cite[p.~93]{C:old} |
---|
| 12 | \end{quote} |
---|
| 13 | Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction. |
---|
| 14 | The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element. |
---|
| 15 | Finally, while subscripting involves pointer arithmetic (as do field references @x.y.z@), it is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions. |
---|
| 16 | Many C errors result from performing pointer arithmetic instead of using subscripting. |
---|
| 17 | Some C textbooks erroneously teach pointer arithmetic, suggesting it is faster than subscripting. |
---|
| 18 | |
---|
| 19 | C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.'' |
---|
| 20 | This desire becomes apparent by a detailed inspection of an array declaration. |
---|
[266732e] | 21 | \lstinput{34-34}{bkgd-carray-arrty.c} |
---|
| 22 | The inspection begins by using @sizeof@ to provide definite program semantics for the intuition of an expression's type. |
---|
| 23 | \lstinput{35-36}{bkgd-carray-arrty.c} |
---|
[b5bfb16] | 24 | Now consider the sizes of expressions derived from @ar@, modified by adding ``pointer to'' and ``first element'' (and including unnecessary parentheses to avoid confusion about precedence). |
---|
[266732e] | 25 | \lstinput{37-40}{bkgd-carray-arrty.c} |
---|
[b5bfb16] | 26 | Given the size of @float@ is 4, the size of @ar@ with 10 floats being 40 bytes is common reasoning for C programmers. |
---|
| 27 | Equally, C programmers know the size of a \emph{pointer} to the first array element is 8 (or 4 depending on the addressing architecture). |
---|
| 28 | % Now, set aside for a moment the claim that this first assertion is giving information about a type. |
---|
| 29 | Clearly, an array and a pointer to its first element are different things. |
---|
[266732e] | 30 | |
---|
[b5bfb16] | 31 | In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element. |
---|
[266732e] | 32 | \lstinput{42-45}{bkgd-carray-arrty.c} |
---|
[b5bfb16] | 33 | The first assignment gets |
---|
[266732e] | 34 | \begin{cfa} |
---|
| 35 | warning: assignment to `float (*)[10]' from incompatible pointer type `float *' |
---|
| 36 | \end{cfa} |
---|
[b5bfb16] | 37 | and the second assignment gets the opposite. |
---|
[266732e] | 38 | |
---|
[b5bfb16] | 39 | The inspection now refutes any suggestion that @sizeof@ is informing about allocation rather than type information. |
---|
| 40 | Note, @sizeof@ has two forms, one operating on an expression and the other on a type. |
---|
| 41 | Using the type form yields the same results as the prior expression form. |
---|
| 42 | \lstinput{46-49}{bkgd-carray-arrty.c} |
---|
| 43 | The results are also the same when there is \emph{no allocation} using a pointer-to-array type. |
---|
[266732e] | 44 | \lstinput{51-57}{bkgd-carray-arrty.c} |
---|
[b5bfb16] | 45 | Hence, in all cases, @sizeof@ is informing about type information. |
---|
| 46 | |
---|
[d3a49864] | 47 | So, thinking of an array as a pointer to its first element is too simplistic an analogy, and it is not backed up by the type system. |
---|
| 48 | This misguided analogy works for a single-dimension array, but there is no advantage other than possibly teaching beginning programmers about basic runtime array-access. |
---|
[266732e] | 49 | |
---|
[d3a49864] | 50 | Continuing, a short form for declaring array variables exists using length information provided implicitly by an initializer. |
---|
[b5bfb16] | 51 | \lstinput{59-62}{bkgd-carray-arrty.c} |
---|
[d3a49864] | 52 | The compiler counts the number of initializer elements and uses this value as the first dimension. |
---|
| 53 | Unfortunately, the implicit element counting does not extend to dimensions beyond the first. |
---|
| 54 | \lstinput{64-67}{bkgd-carray-arrty.c} |
---|
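As a minimal sketch of implicit element counting (the names @ELEMS@, @vec@, and @mat@ are hypothetical and are not taken from the listing file above), the following C fragment checks that only the first dimension is inferred from the initializer.

```c
#include <stddef.h>

// Hypothetical helper macro: element count of a true array (not a pointer).
#define ELEMS(a) (sizeof(a) / sizeof((a)[0]))

// First dimension inferred from the initializer: 3 elements.
static float vec[] = { 1.0f, 2.0f, 3.0f };

// Only the first dimension may be omitted; the second must be written.
static float mat[][2] = { { 1.0f, 2.0f }, { 3.0f, 4.0f }, { 5.0f, 6.0f } };

// float bad[][] = { { 1.0f, 2.0f }, { 3.0f, 4.0f } };  // rejected: inner dimension missing

size_t vec_len(void)  { return ELEMS(vec); }     // inferred first dimension
size_t mat_rows(void) { return ELEMS(mat); }     // inferred first dimension
size_t mat_cols(void) { return ELEMS(mat[0]); }  // explicitly written dimension
```

The commented-out declaration is the case the text describes: the compiler does not count elements for any dimension beyond the first.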
[266732e] | 55 | |
---|
[d3a49864] | 56 | My contribution is recognizing: |
---|
[40ab446] | 57 | \begin{itemize} |
---|
[d3a49864] | 58 | \item There is value in using a type that knows its size. |
---|
[266732e] | 59 | \item The type pointer to (first) element does not. |
---|
| 60 | \item C \emph{has} a type that knows the whole picture: array, e.g. @T[10]@. |
---|
[d3a49864] | 61 | \item This type has all the usual derived forms, which also know the whole picture. |
---|
| 62 | A noteworthy example is pointer to array, e.g. @T (*)[10]@.\footnote{ |
---|
| 63 | The parentheses are necessary because subscript has higher priority than pointer in C declarations. |
---|
| 64 | (Subscript also has higher priority than dereference in C expressions.)} |
---|
[40ab446] | 65 | \end{itemize} |
---|
| 66 | |
---|
| 67 | |
---|
[dd37afa] | 68 | \section{Reading Declarations} |
---|
[40ab446] | 69 | |
---|
[dd37afa] | 70 | A significant area of confusion for reading C declarations results from embedding a declared variable in a declaration, mimicking the way the variable is used in executable statements. |
---|
| 71 | \begin{cquote} |
---|
| 72 | \begin{tabular}{@{}ll@{}} |
---|
[0554c1a] | 73 | \multicolumn{1}{@{}c}{\textbf{Array}} & \multicolumn{1}{c@{}}{\textbf{Function Pointer}} \\ |
---|
[266732e] | 74 | \begin{cfa} |
---|
[dd37afa] | 75 | int @(*@ar@)[@5@]@; // definition |
---|
| 76 | ... @(*@ar@)[@3@]@ += 1; // usage |
---|
[266732e] | 77 | \end{cfa} |
---|
[dd37afa] | 78 | & |
---|
[d3a49864] | 79 | \begin{cfa} |
---|
[dd37afa] | 80 | int @(*@f@())[@5@]@ { ... }; // definition |
---|
| 81 | ... @(*@f@())[@3@]@ += 1; // usage |
---|
[d3a49864] | 82 | \end{cfa} |
---|
[dd37afa] | 83 | \end{tabular} |
---|
| 84 | \end{cquote} |
---|
| 85 | Essentially, the type is wrapped around the name in successive layers (like an \Index{onion}). |
---|
[d3a49864] | 86 | While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice, even though Dennis Ritchie believed otherwise: |
---|
| 87 | \begin{quote} |
---|
| 88 | In spite of its difficulties, I believe that the C's approach to declarations remains plausible, and am comfortable with it; it is a useful unifying principle.~\cite[p.~12]{Ritchie93} |
---|
| 89 | \end{quote} |
---|
[dd37afa] | 90 | After all, reading a C array type is easy: just read it from the inside out, and know when to look left and when to look right! |
---|
[d3a49864] | 91 | |
---|
[0554c1a] | 92 | \CFA provides its own type, variable and routine declarations, using a simpler syntax. |
---|
[d3a49864] | 93 | The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type. |
---|
[dd37afa] | 94 | The qualifiers have the same meaning in \CFA as in C. |
---|
[0554c1a] | 95 | Then, a \CFA declaration is read left to right, where a function return type is enclosed in brackets @[@\,@]@. |
---|
[d3a49864] | 96 | \begin{cquote} |
---|
[dd37afa] | 97 | \begin{tabular}{@{}l@{\hspace{3em}}ll@{}} |
---|
[0554c1a] | 98 | \multicolumn{1}{c@{\hspace{3em}}}{\textbf{C}} & \multicolumn{1}{c}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{read left to right}} \\ |
---|
[dd37afa] | 99 | \begin{cfa} |
---|
| 100 | int @*@ x1 @[5]@; |
---|
| 101 | int @(*@x2@)[5]@; |
---|
| 102 | int @(*@f( int p )@)[5]@; |
---|
| 103 | \end{cfa} |
---|
| 104 | & |
---|
| 105 | \begin{cfa} |
---|
| 106 | @[5] *@ int x1; |
---|
| 107 | @* [5]@ int x2; |
---|
| 108 | @[ * [5] int ]@ f( int p ); |
---|
[d3a49864] | 109 | \end{cfa} |
---|
| 110 | & |
---|
[dd37afa] | 111 | \begin{cfa} |
---|
| 112 | // array of 5 pointers to int |
---|
| 113 | // pointer to array of 5 int |
---|
| 114 | // function returning pointer to array of 5 ints |
---|
[d3a49864] | 115 | \end{cfa} |
---|
[dd37afa] | 116 | \\ |
---|
| 117 | & & |
---|
| 118 | \LstCommentStyle{//\ \ \ and taking an int argument} |
---|
[d3a49864] | 119 | \end{tabular} |
---|
| 120 | \end{cquote} |
---|
[dd37afa] | 121 | As declaration complexity increases, it becomes correspondingly more difficult to read and understand the C declaration form. |
---|
| 122 | Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations. |
---|
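The three C declarations in the table above can be checked mechanically. The following is a sketch only: the helper functions and the @storage@ array are hypothetical names introduced for illustration, and the function body given to @f@ is an assumption, since the table only declares it.

```c
#include <stddef.h>

static int * x1[5];     // array of 5 pointers to int
static int (*x2)[5];    // pointer to array of 5 int

size_t x1_bytes(void)        { return sizeof(x1); }    // five pointers
size_t x2_bytes(void)        { return sizeof(x2); }    // one pointer
size_t x2_target_bytes(void) { return sizeof(*x2); }   // five ints (operand is not evaluated)

// Function taking an int and returning a pointer to an array of 5 int.
static int storage[5];
int (*f(int p))[5] {
    storage[0] = p;
    return &storage;
}

// Usage mirrors the declaration: call, dereference, then subscript.
int first_via_f(int p) { return (*f(p))[0]; }
```

The sizes separate @x1@ from @x2@ even though the two declarations differ only by a pair of parentheses.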
[d3a49864] | 123 | |
---|
[dd37afa] | 124 | \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussed in \VRef{XXX}. |
---|
| 125 | The \CFA-thesis column shows the new array declaration form, which is my contributed improvement for safety and ergonomics. |
---|
| 126 | The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features. |
---|
| 127 | Each row of the table shows alternate syntactic forms. |
---|
| 128 | The simplest occurrences of types distinguished in the preceding discussion are marked with $\triangleright$. |
---|
[0554c1a] | 129 | Removing the declared variable @x@ gives the type used for variable, structure field, cast or error messages \PAB{(though note Section TODO points out that some types cannot be cast to)}. |
---|
[dd37afa] | 130 | Unfortunately, parameter declarations \PAB{(section TODO)} have more syntactic forms and rules. |
---|
[40ab446] | 131 | |
---|
[dd37afa] | 132 | \begin{table} |
---|
[266732e] | 133 | \centering |
---|
[0554c1a] | 134 | \caption{Syntactic Reference for Array vs Pointer. Includes interaction with \lstinline{const}ness.} |
---|
[dd37afa] | 135 | \label{bkgd:ar:usr:avp} |
---|
| 136 | \begin{tabular}{ll|l|l|l} |
---|
| 137 | & Description & \multicolumn{1}{c|}{C} & \multicolumn{1}{c|}{\CFA} & \multicolumn{1}{c}{\CFA-thesis} \\ |
---|
| 138 | \hline |
---|
[0554c1a] | 139 | $\triangleright$ & value & @T x;@ & @T x;@ & \\ |
---|
[d3a49864] | 140 | \hline |
---|
[dd37afa] | 141 | & immutable value & @const T x;@ & @const T x;@ & \\ |
---|
| 142 | & & @T const x;@ & @T const x;@ & \\ |
---|
[d3a49864] | 143 | \hline \hline |
---|
[0554c1a] | 144 | $\triangleright$ & pointer to value & @T * x;@ & @* T x;@ & \\ |
---|
[d3a49864] | 145 | \hline |
---|
[dd37afa] | 146 | & immutable ptr. to val. & @T * const x;@ & @const * T x;@ & \\ |
---|
[d3a49864] | 147 | \hline |
---|
[dd37afa] | 148 | & ptr. to immutable val. & @const T * x;@ & @* const T x;@ & \\ |
---|
| 149 | & & @T const * x;@ & @* T const x;@ & \\ |
---|
[d3a49864] | 150 | \hline \hline |
---|
[0554c1a] | 151 | $\triangleright$ & array of value & @T x[10];@ & @[10] T x@ & @array(T, 10) x@ \\ |
---|
[d3a49864] | 152 | \hline |
---|
[dd37afa] | 153 | & ar.\ of immutable val. & @const T x[10];@ & @[10] const T x@ & @const array(T, 10) x@ \\ |
---|
| 154 | & & @T const x[10];@ & @[10] T const x@ & @array(T, 10) const x@ \\ |
---|
[d3a49864] | 155 | \hline |
---|
[dd37afa] | 156 | & ar.\ of ptr.\ to value & @T * x[10];@ & @[10] * T x@ & @array(T *, 10) x@ \\ |
---|
| 157 | & & & & @array(* T, 10) x@ \\ |
---|
[d3a49864] | 158 | \hline |
---|
[dd37afa] | 159 | & ar.\ of imm. ptr.\ to val. & @T * const x[10];@ & @[10] const * T x@ & @array(T * const, 10) x@ \\ |
---|
| 160 | & & & & @array(const * T, 10) x@ \\ |
---|
[d3a49864] | 161 | \hline |
---|
[dd37afa] | 162 | & ar.\ of ptr.\ to imm. val. & @const T * x[10];@ & @[10] * const T x@ & @array(const T *, 10) x@ \\ |
---|
| 163 | & & @T const * x[10];@ & @[10] * T const x@ & @array(* const T, 10) x@ \\ |
---|
[d3a49864] | 164 | \hline \hline |
---|
[0554c1a] | 165 | $\triangleright$ & ptr.\ to ar.\ of value & @T (*x)[10];@ & @* [10] T x@ & @* array(T, 10) x@ \\ |
---|
[d3a49864] | 166 | \hline |
---|
[dd37afa] | 167 | & imm. ptr.\ to ar.\ of val. & @T (* const x)[10];@ & @const * [10] T x@ & @const * array(T, 10) x@ \\ |
---|
[d3a49864] | 168 | \hline |
---|
[dd37afa] | 169 | & ptr.\ to ar.\ of imm. val. & @const T (*x)[10];@ & @* [10] const T x@ & @* const array(T, 10) x@ \\ |
---|
| 170 | & & @T const (*x)[10];@ & @* [10] T const x@ & @* array(T, 10) const x@ \\ |
---|
[d3a49864] | 171 | \hline |
---|
[dd37afa] | 172 | & ptr.\ to ar.\ of ptr.\ to val. & @T *(*x)[10];@ & @* [10] * T x@ & @* array(T *, 10) x@ \\ |
---|
| 173 | & & & & @* array(* T, 10) x@ \\ |
---|
[d3a49864] | 174 | \hline |
---|
[40ab446] | 175 | \end{tabular} |
---|
[dd37afa] | 176 | \end{table} |
---|
[2d82999] | 177 | |
---|
| 178 | TODO: Address these parked unfortunate syntaxes |
---|
| 179 | \begin{itemize} |
---|
| 180 | \item static |
---|
| 181 | \item star as dimension |
---|
[0554c1a] | 182 | \item under pointer decay: @int p1[const 3]@ being @int const *p1@ |
---|
[2d82999] | 183 | \end{itemize} |
---|
| 184 | |
---|
| 185 | |
---|
[40ab446] | 186 | \subsection{Arrays decay and pointers diffract} |
---|
| 187 | |
---|
[266732e] | 188 | The last section established the difference between these four types: |
---|
| 189 | \lstinput{3-6}{bkgd-carray-decay.c} |
---|
| 190 | But the expression used for obtaining the pointer to the first element is pedantic. |
---|
| 191 | The root of all C programmer experience with arrays is the shortcut |
---|
| 192 | \lstinput{8-8}{bkgd-carray-decay.c} |
---|
| 193 | which reproduces @pa0@, in type and value: |
---|
| 194 | \lstinput{9-9}{bkgd-carray-decay.c} |
---|
| 195 | The validity of this initialization is unsettling, in the context of the facts established in the last section. |
---|
[b5bfb16] | 196 | Notably, it initializes name @pa0x@ from expression @ar@, when they are not of the same type: |
---|
[266732e] | 197 | \lstinput{10-10}{bkgd-carray-decay.c} |
---|
[40ab446] | 198 | |
---|
[0554c1a] | 199 | So, C provides an implicit conversion from @float[10]@ to @float *@. |
---|
[40ab446] | 200 | \begin{quote} |
---|
[0554c1a] | 201 | Except when it is the operand of the @sizeof@ operator, or the unary @&@ operator, or is a string literal used to |
---|
| 202 | initialize an array, an expression that has type ``array of \emph{type}'' is converted to an expression with type |
---|
| 203 | ``pointer to \emph{type}'' that points to the initial element of the array object~\cite[\S~6.3.2.1.3]{C11} |
---|
[40ab446] | 204 | \end{quote} |
---|
| 205 | This phenomenon is the famous ``pointer decay,'' which is a decay of an array-typed expression into a pointer-typed one. |
---|
[b5bfb16] | 206 | It is worth noting that the list of exception cases does not feature the occurrence of @ar@ in @ar[i]@. |
---|
[0554c1a] | 207 | Thus, subscripting happens on pointers not arrays. |
---|
[40ab446] | 208 | |
---|
[0554c1a] | 209 | Subscripting proceeds first with pointer decay, if needed. Next, \cite[\S~6.5.2.1.2]{C11} explains that @ar[i]@ is treated as if it were @(*((ar)+(i)))@. |
---|
| 210 | \cite[\S~6.5.6.8]{C11} explains that adding an integer to a pointer is defined only when the pointer refers to an element of an array, with a meaning of ``@i@ elements away from,'' which is valid if @ar@ is big enough and @i@ is small enough. |
---|
| 211 | Finally, \cite[\S~6.5.3.2.4]{C11} explains that the @*@ operator's result is the referenced element. |
---|
| 212 | Taken together, these rules illustrate that @ar[i]@ and @i[ar]@ mean the same thing! |
---|
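The equivalence can be observed directly. The sketch below uses hypothetical function names and reads the same element through the three spellings that the cited rules make interchangeable.

```c
#include <stddef.h>

// Each function reads element i of the same array through a different spelling.
float by_subscript(const float * ar, size_t i)  { return ar[i]; }      // usual form
float by_arithmetic(const float * ar, size_t i) { return *(ar + i); }  // the meaning the standard assigns to ar[i]
float by_commuted(const float * ar, size_t i)   { return i[ar]; }      // legal because the addition commutes

// Returns 1 when all three spellings agree for every index of a 10-element array.
int spellings_agree(void) {
    float ar[10];
    for (size_t i = 0; i < 10; i += 1) ar[i] = (float)i * 1.5f;
    for (size_t i = 0; i < 10; i += 1) {
        if (by_subscript(ar, i) != by_arithmetic(ar, i)) return 0;
        if (by_subscript(ar, i) != by_commuted(ar, i)) return 0;
    }
    return 1;
}
```

Note that every helper takes a @float *@ parameter, so the array argument has already decayed before any subscripting happens.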
[40ab446] | 213 | |
---|
| 214 | Subscripting a pointer when the target is standard-inappropriate is still practically well-defined. |
---|
| 215 | While the standard affords a C compiler freedom about the meaning of an out-of-bound access, |
---|
| 216 | or of subscripting a pointer that does not refer to an array element at all, |
---|
| 217 | the fact that C is famously both generally high-performance, and specifically not bound-checked, |
---|
| 218 | leads to an expectation that the runtime handling is uniform across legal and illegal accesses. |
---|
[b5bfb16] | 219 | Moreover, consider the common pattern of subscripting on a @malloc@ result: |
---|
[266732e] | 220 | \begin{cfa} |
---|
| 221 | float * fs = malloc( 10 * sizeof(float) ); |
---|
| 222 | fs[5] = 3.14; |
---|
| 223 | \end{cfa} |
---|
[0554c1a] | 224 | The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is'' as requested (\cite[\S~7.22.3.4.2]{C11}). |
---|
| 225 | But \emph{nothing} more is said about this pointer value, specifically that its referent might \emph{be} an array allowing subscripting. |
---|
[40ab446] | 226 | |
---|
[0554c1a] | 227 | Under this assumption, a pointer being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around the @p@ address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p)@. |
---|
| 228 | I call this phenomenon ``array diffraction,'' which is a diffraction of a single-element pointer into the assumption that its target is in the middle of an array whose size is unlimited in both directions. |
---|
[40ab446] | 229 | No pointer is exempt from array diffraction. |
---|
| 230 | No array shows its elements without pointer decay. |
---|
| 231 | |
---|
| 232 | A further pointer--array confusion, closely related to decay, occurs in parameter declarations. |
---|
[0554c1a] | 233 | \cite[\S~6.7.6.3.7]{C11} explains that when an array type is written for a parameter, |
---|
| 234 | the parameter's type becomes a type that can be summarized as the array-decayed type. |
---|
[d3a49864] | 235 | The respective handling of the following two parameter spellings shows that the array-spelled one is really, like the other, a pointer. |
---|
[266732e] | 236 | \lstinput{12-16}{bkgd-carray-decay.c} |
---|
[b5bfb16] | 237 | Because the meaning of @sizeof(x)@ differs from its meaning on a similarly-spelled local variable declaration, |
---|
[c4024b46] | 238 | @gcc@ gives this code a warning for the first assertion: |
---|
[0554c1a] | 239 | \begin{cfa} |
---|
| 240 | warning: 'sizeof' on array function parameter 'x' will return size of 'float *' |
---|
| 241 | \end{cfa} |
---|
| 242 | The caller of such a function is left with the reality that a pointer parameter is a pointer, no matter how it is spelled: |
---|
[266732e] | 243 | \lstinput{18-21}{bkgd-carray-decay.c} |
---|
[40ab446] | 244 | This fragment gives no warnings. |
---|
| 245 | |
---|
| 246 | The shortened parameter syntax @T x[]@ is a further way to spell ``pointer.'' |
---|
| 247 | Note the opposite meaning of this spelling now, compared with its use in local variable declarations. |
---|
| 248 | This point of confusion is illustrated in: |
---|
[266732e] | 249 | \lstinput{23-30}{bkgd-carray-decay.c} |
---|
[40ab446] | 250 | The basic two meanings, with a syntactic difference helping to distinguish, |
---|
| 251 | are illustrated in the declarations of @ca@ vs.\ @cp@, |
---|
| 252 | whose subsequent @edit@ calls behave differently. |
---|
| 253 | The syntax-caused confusion is in the comparison of the first and last lines, |
---|
[b5bfb16] | 254 | both of which use a literal to initialize an object declared with spelling @T x[]@. |
---|
[40ab446] | 255 | But these initialized declarations get opposite meanings, |
---|
| 256 | depending on whether the object is a local variable or a parameter. |
---|
| 257 | |
---|
[b5bfb16] | 258 | In summary, when a function is written with an array-typed parameter, |
---|
[40ab446] | 259 | \begin{itemize} |
---|
[266732e] | 260 | \item an appearance of passing an array by value is always an incorrect understanding |
---|
[b5bfb16] | 261 | \item a dimension value, if any is present, is ignored |
---|
[266732e] | 262 | \item pointer decay is forced at the call site and the callee sees the parameter having the decayed type |
---|
[40ab446] | 263 | \end{itemize} |
---|
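The summary points above can be checked without relying on @sizeof@ (and so without provoking the @gcc@ warning). This sketch assumes a C11 compiler for @_Generic@; the macro and function names are hypothetical.

```c
// Classify the type of an address-of expression, which is exempt from decay.
// (IS_PTR_TO_ARRAY10 and IS_PTR_TO_PTR are hypothetical helper macros.)
#define IS_PTR_TO_ARRAY10(e) _Generic((e), float (*)[10]: 1, default: 0)
#define IS_PTR_TO_PTR(e)     _Generic((e), float **: 1, default: 0)

// Parameter spelled as an array: its type is adjusted to float *.
int param_is_pointer(float x[10]) {
    return IS_PTR_TO_PTR(&x);        // &x is float **, so x is a float *
}

// Local variable spelled the same way: a genuine array.
int local_is_array(void) {
    float x[10] = { 0 };
    return IS_PTR_TO_ARRAY10(&x);    // &x is float (*)[10]
}

// The function's own type records the adjusted parameter, with no dimension.
int signature_uses_pointer(void) {
    return _Generic(&param_is_pointer, int (*)(float *): 1, default: 0);
}
```

Taking the address of @x@ sidesteps decay in both functions, so the difference in results comes only from the parameter-type adjustment.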
| 264 | |
---|
| 265 | Pointer decay does not affect pointer-to-array types, because these are already pointers, not arrays. |
---|
| 266 | As a result, a function with a pointer-to-array parameter sees the parameter exactly as the caller does: |
---|
[266732e] | 267 | \lstinput{32-42}{bkgd-carray-decay.c} |
---|
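A small sketch of the same point, separate from the listing above and using hypothetical function names: the callee recovers the element count from the parameter's type because a pointer-to-array keeps the dimension.

```c
#include <stddef.h>

// Callee receives a pointer to the whole array, so the length survives the call.
size_t elems_seen_by_callee(float (*pa)[10]) {
    return sizeof(*pa) / sizeof((*pa)[0]);   // *pa still has type float[10]
}

// Sum through the pointer-to-array, using the length recovered from the type.
float sum10(float (*pa)[10]) {
    float total = 0.0f;
    for (size_t i = 0; i < sizeof(*pa) / sizeof((*pa)[0]); i += 1) total += (*pa)[i];
    return total;
}

float demo_sum(void) {
    float ar[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    return sum10(&ar);   // &ar is float (*)[10]; no decay occurs
}
```

The cost is at the call site and in the callee: the caller writes @&ar@, and the callee writes @(*pa)[i]@ rather than @pa[i]@.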
[0554c1a] | 268 | \VRef[Table]{bkgd:ar:usr:decay-parm} gives the reference for the decay phenomenon seen in parameter declarations. |
---|
[40ab446] | 269 | |
---|
[0554c1a] | 270 | \begin{table} |
---|
| 271 | \caption{Syntactic Reference for Decay during Parameter-Passing. |
---|
| 272 | Includes interaction with \lstinline{const}ness, where ``immutable'' refers to a restriction on the callee's ability.} |
---|
| 273 | \label{bkgd:ar:usr:decay-parm} |
---|
[266732e] | 274 | \centering |
---|
[40ab446] | 275 | \begin{tabular}{llllll} |
---|
[0554c1a] | 276 | & Description & Type & Parameter Declaration & \CFA \\ |
---|
| 277 | \hline |
---|
| 278 | & & & @T * x,@ & @* T x,@ \\ |
---|
| 279 | $\triangleright$ & pointer to value & @T *@ & @T x[10],@ & @[10] T x,@ \\ |
---|
| 280 | & & & @T x[],@ & @[] T x,@ \\ |
---|
| 281 | \hline |
---|
| 282 | & & & @T * const x,@ & @const * T x,@ \\ |
---|
| 283 | & immutable ptr.\ to val. & @T * const@ & @T x[const 10],@ & @[const 10] T x,@ \\ |
---|
| 284 | & & & @T x[const],@ & @[const] T x,@\\ |
---|
| 285 | \hline |
---|
| 286 | & & & @const T * x,@ & @ * const T x,@ \\ |
---|
| 287 | & & & @T const * x,@ & @ * T const x,@ \\ |
---|
| 288 | & ptr.\ to immutable val. & @const T *@ & @const T x[10],@ & @[10] const T x,@ \\ |
---|
| 289 | & & @T const *@ & @T const x[10],@ & @[10] T const x,@ \\ |
---|
| 290 | & & & @const T x[],@ & @[] const T x,@ \\ |
---|
| 291 | & & & @T const x[],@ & @[] T const x,@ \\ |
---|
| 292 | \hline \hline |
---|
| 293 | & & & @T (*x)[10],@ & @* [10] T x,@ \\ |
---|
| 294 | $\triangleright$ & ptr.\ to ar.\ of val. & @T(*)[10]@ & @T x[3][10],@ & @[3][10] T x,@ \\ |
---|
| 295 | & & & @T x[][10],@ & @[][10] T x,@ \\ |
---|
| 296 | \hline |
---|
| 297 | & & & @T ** x,@ & @** T x,@ \\ |
---|
| 298 | & ptr.\ to ptr.\ to val. & @T **@ & @T * x[10],@ & @[10] * T x,@ \\ |
---|
| 299 | & & & @T * x[],@ & @[] * T x,@ \\ |
---|
| 300 | \hline |
---|
| 301 | & ptr.\ to ptr.\ to imm.\ val. & @const char **@ & @const char * argv[],@ & @[] * const char argv,@ \\ |
---|
| 302 | & & & \emph{others elided} & \emph{others elided} \\ |
---|
| 303 | \hline |
---|
[40ab446] | 304 | \end{tabular} |
---|
[0554c1a] | 305 | \end{table} |
---|
[40ab446] | 306 | |
---|
| 307 | |
---|
| 308 | \subsection{Lengths may vary, checking does not} |
---|
| 309 | |
---|
[0554c1a] | 310 | When the desired number of elements is unknown at compile time, a variable-length array is a solution: |
---|
[266732e] | 311 | \begin{cfa} |
---|
[0554c1a] | 312 | int main( int argc, const char * argv[] ) { |
---|
[266732e] | 313 | assert( argc == 2 ); |
---|
| 314 | size_t n = atol( argv[1] ); |
---|
[0554c1a] | 315 | assert( 0 < n ); |
---|
[b5bfb16] | 316 | float ar[n]; |
---|
[266732e] | 317 | float b[10]; |
---|
| 318 | // ... discussion continues here |
---|
| 319 | } |
---|
| 320 | \end{cfa} |
---|
[0554c1a] | 321 | This arrangement allocates @n@ elements on the @main@ stack frame for @ar@, called a \newterm{variable length array} (VLA), as well as 10 elements in the same stack frame for @b@. |
---|
| 322 | The variable-sized allocation of @ar@ behaves like a call to the @alloca@ routine, which bumps the stack pointer. |
---|
[c4024b46] | 323 | Note, the C standard supports VLAs~\cite[\S~6.7.6.2.4]{C11} as a conditional feature, but the \CC standard does not; |
---|
[0554c1a] | 324 | both @gcc@ and @g++@ support VLAs. |
---|
| 325 | As well, there is misinformation about VLAs, \eg VLAs cause stack failures or are inefficient. |
---|
| 326 | VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type. |
---|
| 327 | |
---|
[c4024b46] | 328 | For high-performance applications, the stack size can be fixed and small (coroutines or user-level threads). |
---|
| 329 | Here, VLAs can overflow the stack, so a heap allocation is used. |
---|
[266732e] | 330 | \begin{cfa} |
---|
[0554c1a] | 331 | float * ax1 = malloc( sizeof( float[n] ) ); |
---|
| 332 | float * ax2 = malloc( n * sizeof( float ) ); |
---|
| 333 | float * bx1 = malloc( sizeof( float[1000000] ) ); |
---|
| 334 | float * bx2 = malloc( 1000000 * sizeof( float ) ); |
---|
[266732e] | 335 | \end{cfa} |
---|
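The following sketch, with hypothetical function names, shows that @sizeof@ on a VLA is computed at run time, and that the two @malloc@ size spellings above request the same number of bytes. It assumes a compiler that provides VLAs (such as @gcc@) and @n > 0@.

```c
#include <stddef.h>
#include <stdlib.h>

// sizeof applied to a VLA is computed at run time from the current dimension.
size_t vla_bytes(size_t n) {
    float ar[n];            // assumes n > 0
    return sizeof(ar);      // n * sizeof(float)
}

// Heap alternative for small fixed-size stacks.
// Returns NULL on allocation failure; the caller frees the result.
float * heap_floats(size_t n) {
    _Static_assert(sizeof(float[1000]) == 1000 * sizeof(float), "size spellings agree");
    float * p = malloc(sizeof(float[n]));   // same byte count as n * sizeof(float)
    if (p) for (size_t i = 0; i < n; i += 1) p[i] = 0.0f;
    return p;
}
```

Unlike the VLA, the heap result is only a @float *@, so the length @n@ must be carried separately by the program.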
[40ab446] | 336 | |
---|
| 337 | Parameter dependency |
---|
| 338 | |
---|
| 339 | Checking is best-effort / unsound |
---|
| 340 | |
---|
| 341 | Limited special handling to get the dimension value checked (static) |
---|
| 342 | |
---|
| 343 | |
---|
[0554c1a] | 344 | \subsection{Dynamically sized, multidimensional arrays} |
---|
[40ab446] | 345 | |
---|
| 346 | In C and \CC, ``multidimensional array'' means ``array of arrays.'' Other meanings are discussed in TODO. |
---|
| 347 | |
---|
| 348 | Just as an array's element type can be @float@, so can it be @float[10]@. |
---|
| 349 | |
---|
[266732e] | 350 | While any of @float*@, @float[10]@ and @float(*)[10]@ are easy to tell apart from @float@, telling them apart from each other may need occasional reference back to TODO intro section. |
---|
[40ab446] | 351 | The sentence derived by wrapping each type in @-[3]@ follows. |
---|
| 352 | |
---|
| 353 | While any of @float*[3]@, @float[3][10]@ and @float(*)[3][10]@ are easy to tell apart from @float[3]@, |
---|
| 354 | telling them apart from each other is what it takes to know what ``array of arrays'' really means. |
---|
| 355 | |
---|
| 356 | Pointer decay affects the outermost array only |
---|
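As a sketch of this point (the macros and function names are hypothetical, and a C11 compiler is assumed for @_Generic@), an expression of type @float[3][10]@ decays to @float(*)[10]@, not to @float **@, and the inner array type is left intact.

```c
#include <stddef.h>

// Classify what an expression of array type decays to.
// (DECAYS_TO_ROW_PTR and DECAYS_TO_PTR_PTR are hypothetical helper macros.)
#define DECAYS_TO_ROW_PTR(e) _Generic((e), float (*)[10]: 1, default: 0)
#define DECAYS_TO_PTR_PTR(e) _Generic((e), float **: 1, default: 0)

static float m[3][10];   // array of 3 arrays of 10 floats

int outer_decays_to_row_pointer(void) { return DECAYS_TO_ROW_PTR(m); }
int outer_decays_to_ptr_ptr(void)     { return DECAYS_TO_PTR_PTR(m); }
size_t row_bytes(void)   { return sizeof(m[0]); }   // inner array intact: 10 floats
size_t whole_bytes(void) { return sizeof(m); }      // 3 rows of 10 floats
```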
| 357 | |
---|
| 358 | TODO: unfortunate syntactic reference with these cases: |
---|
| 359 | |
---|
| 360 | \begin{itemize} |
---|
[266732e] | 361 | \item ar. of ar. of val (be sure about ordering of dimensions when the declaration is dropped) |
---|
| 362 | \item ptr. to ar. of ar. of val |
---|
[40ab446] | 363 | \end{itemize} |
---|
| 364 | |
---|
| 365 | |
---|
| 366 | \subsection{Arrays are (but) almost values} |
---|
| 367 | |
---|
| 368 | Has size; can point to |
---|
| 369 | |
---|
| 370 | Can't cast to |
---|
| 371 | |
---|
| 372 | Can't pass as value |
---|
| 373 | |
---|
| 374 | Can initialize |
---|
| 375 | |
---|
| 376 | Can wrap in aggregate |
---|
| 377 | |
---|
| 378 | Can't assign |
---|
| 379 | |
---|
| 380 | |
---|
| 381 | \subsection{Returning an array is (but) almost possible} |
---|
| 382 | |
---|
| 383 | |
---|
| 384 | \subsection{The pointer-to-array type has been noticed before} |
---|
| 385 | |
---|
[b64d0f4] | 386 | \subsection{Multi-Dimensional} |
---|
| 387 | |
---|
| 388 | As in the last section, we inspect the declaration ... |
---|
| 389 | \lstinput{16-18}{bkgd-carray-mdim.c} |
---|
[b5bfb16] | 390 | The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).'' |
---|
[b64d0f4] | 391 | \lstinput{20-44}{bkgd-carray-mdim.c} |
---|
| 392 | |
---|
[40ab446] | 393 | |
---|
[f5fbcad] | 394 | \section{Linked List} |
---|
[40ab446] | 395 | |
---|
[c4024b46] | 396 | Linked lists are blocks of storage connected using one or more pointers. |
---|
| 397 | The storage block is logically divided into data and links (pointers), where the links are the only component used by the list structure. |
---|
| 398 | Since the data is opaque, list structures are often polymorphic over the data, which is normally homogeneous. |
---|
| 399 | |
---|
[40ab446] | 400 | |
---|
[f5fbcad] | 401 | \section{String} |
---|
[c4024b46] | 402 | |
---|
| 403 | A string is a logical sequence of symbols, where the form of the symbols can vary significantly: 7/8-bit characters (ASCII/Latin-1), or 2/4/8-byte (UNICODE) characters/symbols or variable length (UTF-8/16/32) characters. |
---|
| 404 | A string can be read left-to-right, right-to-left, top-to-bottom, and have stacked elements (Arabic). |
---|
| 405 | |
---|
| 406 | An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in @'x'@. |
---|
| 407 | A wide character constant is the same, except prefixed by the letter @L@, @u@, or @U@. |
---|
| 408 | With a few exceptions detailed later, the elements of the sequence are any members of the source character set; |
---|
| 409 | they are mapped in an implementation-defined manner to members of the execution character set. |
---|
| 410 | |
---|
| 411 | A C character-string literal is a sequence of zero or more multibyte characters enclosed in double-quotes, as in @"xyz"@. |
---|
| 412 | A UTF-8 string literal is the same, except prefixed by @u8@. |
---|
| 413 | A wide string literal is the same, except prefixed by the letter @L@, @u@, or @U@. |
---|
| 414 | |
---|
| 415 | For UTF-8 string literals, the array elements have type @char@, and are initialized with the characters of the multibyte character sequence, as encoded in UTF-8. |
---|
| 416 | For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the @mbstowcs@ function with an implementation-defined current locale. |
---|
| 417 | For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by successive calls to the @mbrtoc16@, or @mbrtoc32@ function as appropriate for its type, with an implementation-defined current locale. |
---|
| 418 | The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined. |
---|
| 419 | |
---|
| 420 | |
---|
| 421 | Another bad C design decision is to have null-terminated strings rather than maintaining a separate string length. |
---|
| 422 | \begin{quote} |
---|
| 423 | Technically, a string is an array whose elements are single characters. |
---|
| 424 | The compiler automatically places the null character @\0@ at the end of each such string, so programs can conveniently find the end. |
---|
| 425 | This representation means that there is no real limit to how long a string can be, but programs have to scan one completely to determine its length. |
---|
| 426 | \end{quote} |
---|
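The consequences of the null-terminated representation described in the quotation can be sketched as follows. The function names are hypothetical; @strlen@ is the standard library routine.

```c
#include <stddef.h>
#include <string.h>

// Length by scanning for the terminating null character, as strlen must.
size_t scan_length(const char * s) {
    size_t n = 0;
    while (s[n] != '\0') n += 1;   // cost is linear in the string length
    return n;
}

// A literal's array type includes the terminator the compiler appended.
size_t literal_bytes(void) { return sizeof("xyz"); }   // 'x' 'y' 'z' '\0'

// An embedded null character cuts the logical string short.
size_t truncated_length(void) { return strlen("ab\0cd"); }
```

The last function shows a hazard of the design: a null character cannot appear inside a string, because any length computation stops at it.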