\chapter{Background}

This chapter states facts about the prior work upon which my contributions build.
Each fact receives a justification of the extent to which its statement is phrased to provoke controversy or surprise.

\section{C}

\subsection{Common knowledge}

The reader is assumed to have used C or \CC for the coursework of at least four university-level courses, or to have equivalent experience.
The current discussion introduces facts of which such a functioning novice may be unaware.

% TODO: decide if I'm also claiming this collection of facts, and test-oriented presentation is a contribution; if so, deal with (not) arguing for its originality

\subsection{Convention: C is more touchable than its standard}

When it comes to explaining how C works, I like illustrating definite program semantics.
I prefer doing so over quoting a manual's suggested programmer's intuition, or showing how some compiler writers chose to model their problem.
To illustrate definite program semantics, I devise a program whose behaviour exercises the point at issue, and I show that behaviour.

This behaviour is typically one of
\begin{itemize}
\item my statement that the compiler accepts or rejects the program
\item the program's printed output, which I show
\item my implied assurance that its assertions do not fail when run
\end{itemize}
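A minimal sketch of such a demonstration follows.
It is only an illustration of the convention; the function name @demo_sum@ is hypothetical.
The compiler accepting it, the line it prints, and its assertion passing are the three kinds of behaviour listed above.
\begin{cfa}
#include <assert.h>
#include <stdio.h>

// Hypothetical demonstration: returns the value that it also prints.
int demo_sum( void ) {
    int xs[3] = { 1, 2, 3 };
    int sum = xs[0] + xs[1] + xs[2];
    assert( sum == 6 );        // assertion: implied not to fail when run
    printf( "%d\n", sum );     // printed output: shown alongside the program
    return sum;
}
\end{cfa}
Calling @demo_sum()@ from a @main@ function is enough to exercise all three.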

The compiler whose program semantics is shown is
\begin{cfa}
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
\end{cfa}
running on architecture @x86_64@, with the same environment targeted.

Unless explicit discussion ensues about differences among compilers or with (versions of) the standard, it is further implied that there exists a second version of GCC and some version of Clang, running on and for the same platform, that give substantially similar behaviour.
In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.


\subsection{C reports many ill-typed expressions as warnings}

These attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
\lstinput{12-15}{bkgd-c-tyerr.c}
with warnings:
\begin{cfa}
warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
\end{cfa}
Similarly,
\lstinput{17-19}{bkgd-c-tyerr.c}
with warning:
\begin{cfa}
warning: passing argument 1 of 'f' from incompatible pointer type
note: expected 'void (*)(void)' but argument is of type 'float *'
\end{cfa}
and with a segmentation fault at runtime.

That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than ``doomed to crash,'' is a matter for an obfuscated coding competition.

A ``tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute''\footnote{TAPL-pg1 definition of a type system} rejected the program.
The behaviour (whose absence is unprovable) is neither minor nor unlikely.
The rejection shows that the program is ill-typed.

Yet, the rejection presents as a GCC warning.

In the discussion following, ``ill-typed'' means giving a nonzero @gcc -Werror@ exit condition with a message that discusses typing.


\section{C Arrays}

\subsection{C has an array type (!)}

When a programmer works with an array, C semantics provide access to a type that is different in every way from ``pointer to its first element.''
Its qualities become apparent by inspecting the declaration
\lstinput{34-34}{bkgd-carray-arrty.c}
The inspection begins by using @sizeof@ to provide definite program semantics for the intuition of an expression's type.
Assuming a target platform keeps things concrete:
\lstinput{35-36}{bkgd-carray-arrty.c}
Consider the sizes of expressions derived from @a@, modified by adding ``pointer to'' and ``first element'' (and including unnecessary parentheses to avoid confusion about precedence).
\lstinput{37-40}{bkgd-carray-arrty.c}
That @a@ takes up 40 bytes is common reasoning for C programmers.
Set aside for a moment the claim that this first assertion is giving information about a type.
For now, note that an array and a pointer to its first element are, sometimes, different things.
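As a minimal sketch of these observations (it is not the listing referenced above, and the function name is hypothetical), the sizes can be restated with assertions.
The byte counts in the comments assume the @x86_64@ platform named earlier.
\begin{cfa}
#include <assert.h>

// Hypothetical helper: 1 if an array and a pointer to its first element differ in size.
int array_is_not_its_first_element_pointer( void ) {
    float a[10];
    assert( sizeof(a) == 10 * sizeof(float) );       // whole array: 40 bytes
    assert( sizeof(a[0]) == sizeof(float) );         // first element: 4 bytes
    assert( sizeof(&(a[0])) == sizeof(float *) );    // pointer to first element: 8 bytes
    assert( sizeof(&a) == sizeof(float (*)[10]) );   // pointer to array: 8 bytes
    return sizeof(a) != sizeof(&(a[0]));
}
\end{cfa}
The sizes are phrased relative to @sizeof(float)@ so that the assertions do not depend on the concrete byte counts.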

The idea that there is such a thing as a pointer to an array may be surprising.
It is not the same thing as a pointer to the first element:
\lstinput{42-45}{bkgd-carray-arrty.c}
The first gets
\begin{cfa}
warning: assignment to 'float (*)[10]' from incompatible pointer type 'float *'
\end{cfa}
and the second gets the opposite.

We now refute a concern that @sizeof(a)@ is reporting on special knowledge from @a@ being a local variable,
say that it is informing about an allocation, rather than simply a type.

First, recognizing that @sizeof@ has two forms, one operating on an expression, the other on a type, we observe that the original answers are unaffected by using the type-parameterized form:
\lstinput{46-50}{bkgd-carray-arrty.c}
Finally, the same sizing is reported when there is no allocation at all, and we launch the analysis instead from the pointer-to-array type.
\lstinput{51-57}{bkgd-carray-arrty.c}
So, in spite of considerable programmer success enabled by an understanding that an array is just a pointer to its first element (revisited TODO pointer decay), this understanding is simplistic.
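A minimal sketch of the two arguments follows; it is not the referenced listing, and the function name is hypothetical.
It uses the type form of @sizeof@, then a pointer-to-array that points at nothing, so no array object exists anywhere in the function.
\begin{cfa}
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: size of the referent type of a pointer-to-array that points at nothing.
size_t size_without_allocation( void ) {
    assert( sizeof(float[10]) == 10 * sizeof(float) );   // type form: no object involved
    float (*pa)[10] = NULL;                              // no array is allocated
    return sizeof(*pa);                                  // operand is not evaluated
}
\end{cfa}
The result equals @10 * sizeof(float)@, which supports reading the size as a property of the type rather than of an allocation.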

A shortened form for declaring local variables exists, provided that length information is given in the initializer:
\lstinput{59-63}{bkgd-carray-arrty.c}
In these declarations, the resulting types are both arrays, but their lengths are inferred.
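A minimal sketch of length inference follows, with hypothetical names and initializers that are not necessarily those of the referenced listing.
\begin{cfa}
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: number of elements inferred for an initialized array.
size_t inferred_length( void ) {
    float fs[] = { 3.14f, 1.707f };    // length 2, inferred from the initializer list
    char cs[] = "hello";               // length 6: five letters plus the terminator
    assert( sizeof(cs) == 6 );
    return sizeof(fs) / sizeof(fs[0]);
}
\end{cfa}
Both variables are arrays, not pointers, so @sizeof@ reports the size of all the elements.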

\begin{tabular}{lllllll}
@float x;@ & $\rightarrow$ & (base element) & @float@ & @float x;@ & @[ float ]@ & @[ float ]@ \\
@float * x;@ & $\rightarrow$ & pointer & @float *@ & @float * x;@ & @[ * float ]@ & @[ * float ]@ \\
@float x[10];@ & $\rightarrow$ & array & @float[10]@ & @float x[10];@ & @[ [10] float ]@ & @[ array(float, 10) ]@ \\
@float *x[10];@ & $\rightarrow$ & array of pointers & @float*[10]@ & @float *x[10];@ & @[ [10] * float ]@ & @[ array(*float, 10) ]@ \\
@float (*x)[10];@ & $\rightarrow$ & pointer to array & @float(*)[10]@ & @float (*x)[10];@ & @[ * [10] float ]@ & @[ * array(float, 10) ]@ \\
@float *(*x)[10];@ & $\rightarrow$ & pointer to array of pointers & @float*(*)[10]@ & @float *(*x)[10];@ & @[ * [10] * float ]@ & @[ * array(*float, 10) ]@
\end{tabular}
\begin{cfa}
void stx1() {
    float (*x4)[10] = 0;     // pointer to array, as in the fifth table row
    float *(*x5)[10];        // pointer to array of pointers, as in the sixth table row
    x5 = (float*(*)[10]) x4;
    // x5 = (float(*)[10]) x4; // wrong target type; meta test suggesting above cast uses correct type

    // [here]
    // const

    // [later]
    // static
    // star as dimension
    // under pointer decay: int p1[const 3] being int * const p1

    float target = 0;        // referent, so that writing through y3 is defined
    const float * y1;
    float const * y2;
    float * const y3 = &target;

    y1 = 0;
    y2 = 0;
    // y3 = 0; // bad

    // *y1 = 3.14; // bad
    // *y2 = 3.14; // bad
    *y3 = 3.14;

    const float z1 = 1.414;
    float const z2 = 1.414;

    // z1 = 3.14; // bad
    // z2 = 3.14; // bad
}

#define T float
void stx2() { const T x[10];
    // x[5] = 3.14; // bad
}
void stx3() { T const x[10];
    // x[5] = 3.14; // bad
}
\end{cfa}

My contribution is enabled by recognizing
\begin{itemize}
\item There is value in using a type that knows how big the whole thing is.
\item The type pointer to (first) element does not.
\item C \emph{has} a type that knows the whole picture: array, e.g. @T[10]@.
\item This type has all the usual derived forms, which also know the whole picture. A usefully noteworthy example is pointer to array, e.g. @T(*)[10]@.
\end{itemize}

Each of these sections, which introduces another layer of the C arrays' story,
concludes with an \emph{Unfortunate Syntactic Reference}.
It shows how to spell the types under discussion,
along with interactions with orthogonal (but easily confused) language features.
Alternate spellings are listed within a row.
The simplest occurrences of types distinguished in the preceding discussion are marked with $\triangleright$.
The Type column gives the spelling used in a cast or error message (though note Section TODO points out that some types cannot be cast to).
The Declaration column gives the spelling used in an object declaration, such as variable or aggregate member; parameter declarations (section TODO) follow entirely different rules.

After all, reading a C array type is easy: just read it from the inside out, and know when to look left and when to look right!


\CFA-specific spellings (not yet introduced) are also included here for referenceability; these can be skipped on linear reading.
The \CFA-C column gives the, more fortunate, ``new'' syntax of section TODO, for spelling \emph{exactly the same type}.
This fortunate syntax does not have different spellings for types vs declarations;
a declaration is always the type followed by the declared identifier name;
for the example of letting @x@ be a \emph{pointer to array}, the declaration is spelled:
\begin{cfa}
[ * [10] T ] x;
\end{cfa}
The \CFA-Full column gives the spelling of a different type, introduced in TODO, which has all of my contributed improvements for safety and ergonomics.

\noindent
\textbf{Unfortunate Syntactic Reference}

\begin{figure}
\centering
\setlength{\tabcolsep}{3pt}
\begin{tabular}{llllll}
& Description & Type & Declaration & \CFA-C & \CFA-Full \\ \hline
$\triangleright$ & val.
& @T@
& @T x;@
& @[ T ]@
&
\\ \hline
& \pbox{20cm}{ \vspace{2pt} val.\\ \footnotesize{no writing the val.\ in \lstinline{x}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T} \\ \lstinline{T const} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T x;} \\ \lstinline{T const x;} }
& @[ const T ]@
&
\\ \hline \hline
$\triangleright$ & ptr.\ to val.
& @T *@
& @T * x;@
& @[ * T ]@
&
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x}} }\vspace{2pt}
& @T * const@
& @T * const x;@
& @[ const * T ]@
&
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{*x}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T *} \\ \lstinline{T const *} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * x;} \\ \lstinline{T const * x;} }
& @[ * const T ]@
&
\\ \hline \hline
$\triangleright$ & ar.\ of val.
& @T[10]@
& @T x[10];@
& @[ [10] T ]@
& @[ array(T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ar.\ of val.\\ \footnotesize{no writing the val.\ in \lstinline{x[5]}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T[10]} \\ \lstinline{T const[10]} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T x[10];} \\ \lstinline{T const x[10];} }
& @[ [10] const T ]@
& @[ const array(T, 10) ]@
\\ \hline
& ar.\ of ptr.\ to val.
& @T*[10]@
& @T *x[10];@
& @[ [10] * T ]@
& @[ array(* T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ar.\ of ptr.\ to val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x[5]}} }\vspace{2pt}
& @T * const [10]@
& @T * const x[10];@
& @[ [10] const * T ]@
& @[ array(const * T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ar.\ of ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{*(x[5])}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * [10]} \\ \lstinline{T const * [10]} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * x[10];} \\ \lstinline{T const * x[10];} }
& @[ [10] * const T ]@
& @[ array(* const T, 10) ]@
\\ \hline \hline
$\triangleright$ & ptr.\ to ar.\ of val.
& @T(*)[10]@
& @T (*x)[10];@
& @[ * [10] T ]@
& @[ * array(T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to ar.\ of val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x}} }\vspace{2pt}
& @T(* const)[10]@
& @T (* const x)[10];@
& @[ const * [10] T ]@
& @[ const * array(T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to ar.\ of val.\\ \footnotesize{no writing the val.\ in \lstinline{(*x)[5]}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T(*)[10]} \\ \lstinline{T const (*) [10]} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T (*x)[10];} \\ \lstinline{T const (*x)[10];} }
& @[ * [10] const T ]@
& @[ * const array(T, 10) ]@
\\ \hline
& ptr.\ to ar.\ of ptr.\ to val.
& @T*(*)[10]@
& @T *(*x)[10];@
& @[ * [10] * T ]@
& @[ * array(* T, 10) ]@
\\ \hline
\end{tabular}
\caption{Unfortunate Syntactic Reference}
\end{figure}


\subsection{Arrays decay and pointers diffract}

The last section established the difference between these four types:
\lstinput{3-6}{bkgd-carray-decay.c}
But the expression used for obtaining the pointer to the first element is pedantic.
The root of all C programmer experience with arrays is the shortcut
\lstinput{8-8}{bkgd-carray-decay.c}
which reproduces @pa0@, in type and value:
\lstinput{9-9}{bkgd-carray-decay.c}
The validity of this initialization is unsettling, in the context of the facts established in the last section.
Notably, it initializes name @pa0x@ from expression @a@, when they are not of the same type:
\lstinput{10-10}{bkgd-carray-decay.c}

So, C provides an implicit conversion from @float[10]@ to @float*@, as described in ARM-6.3.2.1.3:
\begin{quote}
Except when it is the operand of the @sizeof@ operator, or the unary @&@ operator, or is a
string literal used to initialize an array,
an expression that has type ``array of type'' is
converted to an expression with type ``pointer to type'' that points to the initial element of
the array object
\end{quote}

This phenomenon is the famous ``pointer decay,'' which is a decay of an array-typed expression into a pointer-typed one.
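A minimal sketch of decay follows.
The names @a@, @pa0@ and @pa0x@ are those used in the text; the surrounding function is hypothetical and the referenced listing may differ in detail.
\begin{cfa}
#include <assert.h>

// Hypothetical helper: 1 if the shortcut reproduces the pedantic pointer, in value.
int decay_reproduces_first_element_pointer( void ) {
    float a[10];
    float * pa0  = &(a[0]);                 // pedantic: address of the first element
    float * pa0x = a;                       // shortcut: a decays to a pointer to its first element
    assert( sizeof(pa0x) != sizeof(a) );    // the name and its initializer differ in type
    return pa0 == pa0x;
}
\end{cfa}
The @sizeof@ operand is one of the listed exceptions, which is why @sizeof(a)@ still reports the whole array.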

It is worth noting that the list of exception cases does not feature the occurrence of @a@ in @a[i]@.
Thus, subscripting happens on pointers, not arrays.

Subscripting proceeds first with pointer decay, if needed.
Next, ARM-6.5.2.1.2 explains that @a[i]@ is treated as if it were @(*((a)+(i)))@.
ARM-6.5.6.8 explains that the addition of a pointer with an integer type is defined only when the pointer refers to an element that is in an array, with a meaning of ``@i@ elements away from,'' which is valid if @a@ is big enough and @i@ is small enough.
Finally, ARM-6.5.3.2.4 explains that the @*@ operator's result is the referenced element.

Taken together, these rules also happen to illustrate that @a[i]@ and @i[a]@ mean the same thing.
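A minimal sketch of this equivalence follows, with a hypothetical function name.
Because addition commutes, both subscript spellings and the explicit pointer arithmetic designate the same element.
\begin{cfa}
#include <assert.h>

// Hypothetical helper: 1 if all spellings designate the same element.
int subscript_spellings_agree( void ) {
    float a[10] = { 0 };
    a[3] = 2.5f;
    return a[3] == 3[a] && &a[3] == &3[a] && &a[3] == a + 3 && *(a + 3) == 2.5f;
}
\end{cfa}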

Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
While the standard affords a C compiler freedom about the meaning of an out-of-bound access,
or of subscripting a pointer that does not refer to an array element at all,
the fact that C is famously both generally high-performance, and specifically not bound-checked,
leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
Moreover, consider the common pattern of subscripting on a @malloc@ result:
\begin{cfa}
float * fs = malloc( 10 * sizeof(float) );
fs[5] = 3.14;
\end{cfa}
The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is'' as requested (ARM-7.22.3.4.2).
But the program says \emph{nothing} more about this pointer value, that might cause its referent to \emph{be} an array, before doing the subscript.

Under this assumption, a pointer @p@ being subscripted (or added to, then dereferenced)
by any value (positive, zero, or negative), gives a view of the program's entire address space,
centred around the @p@ address, divided into adjacent @sizeof(*p)@ chunks,
each potentially (re)interpreted as @typeof(*p)@.

I call this phenomenon ``array diffraction,'' which is a diffraction of a single-element pointer
into the assumption that its target is in the middle of an array whose size is unlimited in both directions.
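A minimal sketch of the view that diffraction gives follows, with hypothetical names.
It deliberately stays inside a real array, so that every access is defined by the standard; the practical expectation described above is that the same address arithmetic applies outside such bounds, which this sketch does not attempt.
\begin{cfa}
#include <assert.h>

// Hypothetical helper: views an array from a pointer into its middle.
// Every access stays inside fs, so the behaviour is defined.
int diffraction_within_bounds( void ) {
    float fs[5] = { 10, 20, 30, 40, 50 };
    float * p = &fs[2];    // pointer to a single element
    return p[-2] == 10 && p[-1] == 20 && p[0] == 30 && p[1] == 40 && p[2] == 50;
}
\end{cfa}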

No pointer is exempt from array diffraction.

No array shows its elements without pointer decay.

A further pointer--array confusion, closely related to decay, occurs in parameter declarations.
ARM-6.7.6.3.7 explains that when an array type is written for a parameter,
the parameter's type becomes a type that I summarize as being the array-decayed type.
The respective handlings of the following two parameter spellings show that the array-spelled one is really, like the other, a pointer.
\lstinput{12-16}{bkgd-carray-decay.c}
As the @sizeof(x)@ meaning changed, compared with when run on a similarly-spelled local variable declaration,
GCC also gives this code the warning: ```sizeof' on array function parameter `x' will return size of `float *'.''

The caller of such a function is left with the reality that a pointer parameter is a pointer, no matter how it's spelled:
\lstinput{18-21}{bkgd-carray-decay.c}
This fragment gives no warnings.
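A minimal sketch of the two spellings being the same type follows; all names are hypothetical and the referenced listings may differ.
It avoids @sizeof@ on the parameter, and instead relies on the two functions having identical types, so each can initialize a function pointer declared with the other spelling.
\begin{cfa}
#include <assert.h>

float first_a( float x[10] ) { return x[0]; }    // array spelling: the 10 is ignored
float first_p( float * x )   { return x[0]; }    // pointer spelling

// Hypothetical helper: 1 if both spellings accept arrays of differing lengths.
int param_spellings_agree( void ) {
    float (*fa)( float * ) = first_a;       // identical function types: no conversion needed
    float (*fp)( float [10] ) = first_p;
    float ten[10] = { 1.5f };
    float hundred[100] = { 1.5f };
    return fa( hundred ) == 1.5f && fp( ten ) == 1.5f && first_a( hundred ) == first_p( ten );
}
\end{cfa}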

The shortened parameter syntax @T x[]@ is a further way to spell ``pointer.''
Note the opposite meaning of this spelling now, compared with its use in local variable declarations.
This point of confusion is illustrated in:
\lstinput{23-30}{bkgd-carray-decay.c}
The basic two meanings, with a syntactic difference helping to distinguish,
are illustrated in the declarations of @ca@ vs.\ @cp@,
whose subsequent @edit@ calls behave differently.
The syntax-caused confusion is in the comparison of the first and last lines,
both of which use a literal to initialize an object declared with spelling @T x[]@.
But these initialized declarations get opposite meanings,
depending on whether the object is a local variable or a parameter.
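A minimal sketch of the two meanings of the @T x[]@ spelling follows.
It reuses the names @edit@ and @ca@ from the text, but it is an assumption-laden reconstruction, not the referenced listing; it omits the @cp@ case, because writing through a pointer to a string literal is undefined.
\begin{cfa}
#include <assert.h>

// Parameter spelled as an array: x is a pointer into the caller's storage.
void edit( char x[] ) { x[0] = 'h'; }

// Hypothetical helper: 1 if the callee's write is visible in the caller's array.
int local_array_param_pointer( void ) {
    char ca[] = "mary";           // local spelled as an array: a 5-byte copy of the literal
    assert( sizeof(ca) == 5 );
    edit( ca );                   // ca decays; edit writes through the pointer
    return ca[0] == 'h' && ca[1] == 'a';
}
\end{cfa}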


In summary, when a function is written with an array-typed parameter,
\begin{itemize}
\item an appearance of passing an array by value is always an incorrect understanding
\item a dimension value, if any is present, is ignored
\item pointer decay is forced at the call site and the callee sees the parameter having the decayed type
\end{itemize}

Pointer decay does not affect pointer-to-array types, because these are already pointers, not arrays.
As a result, a function with a pointer-to-array parameter sees the parameter exactly as the caller does:
\lstinput{32-42}{bkgd-carray-decay.c}
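A minimal sketch of a pointer-to-array parameter follows, with hypothetical function names; the referenced listing may differ.
The callee recovers the element count from the parameter's type alone.
\begin{cfa}
#include <assert.h>
#include <stddef.h>

// The callee sees the full array type through the pointer.
size_t length_seen_by_callee( float (*pa)[10] ) {
    return sizeof(*pa) / sizeof((*pa)[0]);
}

// Hypothetical helper: the count that the callee reports for the caller's array.
size_t length_seen_by_caller( void ) {
    float a[10] = { 0 };
    assert( sizeof(a) / sizeof(a[0]) == 10 );
    return length_seen_by_callee( &a );    // &a has type float (*)[10]; no decay occurs
}
\end{cfa}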

\noindent
\textbf{Unfortunate Syntactic Reference}

\noindent
(Parameter declaration; ``no writing'' refers to the callee's ability)

\begin{figure}
\centering
\begin{tabular}{lllll}
& Description & Type & Param.\ Decl. & \CFA-C \\ \hline
$\triangleright$ & ptr.\ to val.
& @T *@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T * x,} \\ \lstinline{T x[10],} \\ \lstinline{T x[],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ * T ]} \\ \lstinline{[ [10] T ]} \\ \lstinline{[ [] T ]} }
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x}} }\vspace{2pt}
& @T * const@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T * const x,} \\ \lstinline{T x[const 10],} \\ \lstinline{T x[const],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ const * T ]} \\ \lstinline{[ [const 10] T ]} \\ \lstinline{[ [const] T ]} }
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{*x}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T *} \\ \lstinline{T const *} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * x,} \\ \lstinline{T const * x,} \\ \lstinline{const T x[10],} \\ \lstinline{T const x[10],} \\ \lstinline{const T x[],} \\ \lstinline{T const x[],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[* const T]} \\ \lstinline{[ [10] const T ]} \\ \lstinline{[ [] const T ]} }
\\ \hline \hline
$\triangleright$ & ptr.\ to ar.\ of val.
& @T(*)[10]@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T (*x)[10],} \\ \lstinline{T x[3][10],} \\ \lstinline{T x[][10],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[* [10] T]} \\ \lstinline{[ [3] [10] T ]} \\ \lstinline{[ [] [10] T ]} }
\\ \hline
& ptr.\ to ptr.\ to val.
& @T **@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T ** x,} \\ \lstinline{T *x[10],} \\ \lstinline{T *x[],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ * * T ]} \\ \lstinline{[ [10] * T ]} \\ \lstinline{[ [] * T ]} }
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{**argv}} }\vspace{2pt}
& @const char **@
& \pbox{20cm}{ \vspace{2pt} \lstinline{const char *argv[],} \\ \footnotesize{(others elided)} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ [] * const char ]} \\ \footnotesize{(others elided)} }
\\ \hline
\end{tabular}
\caption{Unfortunate Syntactic Reference (parameter declarations)}
\end{figure}


\subsection{Lengths may vary, checking does not}

When the desired number of elements is unknown at compile time,
a variable-length array is a solution:
\begin{cfa}
#include <assert.h>
#include <stdlib.h>

int main( int argc, const char *argv[] ) {
    assert( argc == 2 );
    size_t n = atol( argv[1] );
    assert( 0 < n && n < 1000 );

    float a[n];
    float b[10];

    // ... discussion continues here
}
\end{cfa}
This arrangement allocates @n@ elements on the @main@ stack frame for @a@, just as it puts 10 elements on the @main@ stack frame for @b@.
The variable-sized allocation of @a@ is provided by the same mechanism as @alloca@.

In a situation where the array sizes are not known to be small enough for stack allocation to be sensible, corresponding heap allocations are achievable as:
\begin{cfa}
float *ax1 = malloc( sizeof( float[n] ) );
float *ax2 = malloc( n * sizeof( float ) );
float *bx1 = malloc( sizeof( float[1000000] ) );
float *bx2 = malloc( 1000000 * sizeof( float ) );
\end{cfa}
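A minimal sketch that packages both allocation styles as functions follows; the function names are hypothetical.
It also shows that @sizeof@ on a variable-length array is computed at run time.
\begin{cfa}
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical helper: bytes occupied by a stack-allocated float array of n elements.
size_t vla_bytes( size_t n ) {
    assert( 0 < n && n < 1000 );
    float a[n];          // variable-length array
    return sizeof(a);    // evaluated at run time for a VLA
}

// Hypothetical helper: heap counterpart; the caller frees the result.
float * heap_floats( size_t n ) {
    return malloc( sizeof( float[n] ) );    // same byte count as n * sizeof(float)
}
\end{cfa}
The bound check in @vla_bytes@ mirrors the one in the program above, because an unchecked @n@ could exhaust the stack.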


VLA

Parameter dependency

Checking is best-effort / unsound

Limited special handling to get the dimension value checked (static)



\subsection{C has full-service, dynamically sized, multidimensional arrays (and \CC does not)}

In C and \CC, ``multidimensional array'' means ``array of arrays.'' Other meanings are discussed in TODO.

Just as an array's element type can be @float@, so can it be @float[10]@.

While any of @float*@, @float[10]@ and @float(*)[10]@ are easy to tell apart from @float@, telling them apart from each other may need occasional reference back to TODO intro section.
The sentence derived by wrapping each type in @-[3]@ follows.

While any of @float*[3]@, @float[3][10]@ and @float(*)[3][10]@ are easy to tell apart from @float[3]@,
telling them apart from each other is what it takes to know what ``array of arrays'' really means.

Pointer decay affects the outermost array only.
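A minimal sketch of outermost-only decay follows, with hypothetical names.
An array of arrays decays to a pointer to its first row, which is itself still an array.
\begin{cfa}
#include <assert.h>

// Hypothetical helper: 1 if the array of arrays decays only at its outermost level.
int outermost_decay_only( void ) {
    float a[3][10] = { { 0 } };
    assert( sizeof(a) == 3 * 10 * sizeof(float) );
    assert( sizeof(a[0]) == 10 * sizeof(float) );    // an element of a is itself an array
    float (*row)[10] = a;                            // a decays to pointer-to-array, not float **
    return row == &a[0] && sizeof(*row) == sizeof(a[0]) && &row[1][2] == &a[1][2];
}
\end{cfa}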

TODO: unfortunate syntactic reference with these cases:

\begin{itemize}
\item ar. of ar. of val (be sure about ordering of dimensions when the declaration is dropped)
\item ptr. to ar. of ar. of val
\end{itemize}


\subsection{Arrays are (but) almost values}

Has size; can point to

Can't cast to

Can't pass as value

Can initialize

Can wrap in aggregate

Can't assign


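A minimal sketch of the ``can wrap in aggregate'' point follows; the structure and function names are hypothetical.
Once an array is a member of a structure, the structure can be initialized, assigned, passed and returned by value, even though a bare array cannot be assigned or passed that way.
\begin{cfa}
#include <assert.h>

// Hypothetical wrapper: an aggregate whose only member is an array.
struct ten_floats { float elems[10]; };

// Passed and returned by value: the callee works on its own copy.
struct ten_floats with_first_set( struct ten_floats v, float f ) {
    v.elems[0] = f;
    return v;
}

// Hypothetical helper: 1 if copies stay independent of the original.
int wrapped_array_is_a_value( void ) {
    struct ten_floats x = { { 1.0f } };    // can initialize
    struct ten_floats y;
    y = x;                                 // can assign, unlike a bare array
    struct ten_floats z = with_first_set( x, 9.0f );
    return x.elems[0] == 1.0f && y.elems[0] == 1.0f && z.elems[0] == 9.0f;
}
\end{cfa}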
\subsection{Returning an array is (but) almost possible}


\subsection{The pointer-to-array type has been noticed before}

\subsection{Multi-Dimensional}

As in the last section, we inspect the declaration ...
\lstinput{16-18}{bkgd-carray-mdim.c}
The significant axis of deriving expressions from @a@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
\lstinput{20-44}{bkgd-carray-mdim.c}


\section{\CFA}

Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it.
(For later: That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.)

\subsection{\CFA features interacting with arrays}

Prior work on \CFA included making C arrays, as used in C code from the wild,
work, if this code is fed into @cfacc@.
The quality of this treatment was fine, with no more or fewer bugs than is typical.

More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features.

A notable success was with the \CFA @alloc@ function,
in which type information associated with a polymorphic return type
replaces @malloc@'s use of programmer-supplied size information.
\begin{cfa}
// C, library
void * malloc( size_t );
// C, user
struct tm * el1 = malloc( sizeof(struct tm) );
struct tm * ar1 = malloc( 10 * sizeof(struct tm) );

// CFA, library
forall( T * ) T * alloc();
// CFA, user
tm * el2 = alloc();
tm (*ar2)[10] = alloc();
\end{cfa}
The @alloc@ polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument.
The compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type.
Using a compiler-produced value eliminates an opportunity for user error.
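For comparison only, plain C has an idiom that approximates this benefit by deriving the size from the declared pointer type.
The following is a sketch of that C idiom, with a hypothetical function name; it is not how @alloc@ is implemented.
\begin{cfa}
#include <assert.h>
#include <stdlib.h>
#include <time.h>

// Hypothetical helper: 1 if both allocations succeed and the array is ten elements big.
int typed_allocation_sizes_agree( void ) {
    struct tm * el = malloc( sizeof(*el) );         // one element
    struct tm (*ar)[10] = malloc( sizeof(*ar) );    // ten elements, subscripted as (*ar)[i]
    int ok = el != NULL && ar != NULL && sizeof(*ar) == 10 * sizeof(*el);
    free( ar );
    free( el );
    return ok;
}
\end{cfa}
The programmer must still write the @sizeof@ expression, so the opportunity for error shrinks but does not disappear.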

TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa

Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@.
They are not subscripted in the same way.
\begin{cfa}
ar1[5];
(*ar2)[5];
\end{cfa}
Using ``reference to array'' resolves this issue. TODO: discuss connection with Doug-Lea \CC proposal.
\begin{cfa}
tm (&ar3)[10] = *alloc();
ar3[5];
\end{cfa}
The implicit size communication to @alloc@ still works in the same way as for @ar2@.

Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
The C standard (TODO xref) does not claim that @ar1@ may be subscripted,
because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
where the type requested is an array, making the result, much more obviously, an array object.

The ``reference to array'' type has its sore spots too.
TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)

TODO: I fixed a bug associated with using an array as a T. I think. Did I really? What was the bug?
---|