\chapter{Background}

This chapter states facts about the prior work upon which my contributions build.
Each fact receives justification to the extent that its statement is phrased to provoke controversy or surprise.

\section{C}

\subsection{Common knowledge}

The reader is assumed to have used C or \CC for the coursework of at least four university-level courses, or to have equivalent experience.
The current discussion introduces facts of which such a functioning novice may be unaware.

% TODO: decide if I'm also claiming this collection of facts, and test-oriented presentation is a contribution; if so, deal with (not) arguing for its originality

\subsection{Convention: C is more touchable than its standard}

When it comes to explaining how C works, I like illustrating definite program semantics.
I prefer doing so over quoting a manual's suggested programmer's intuition, or showing how some compiler writers chose to model their problem.
To illustrate definite program semantics, I devise a program whose behaviour exercises the point at issue, and I show that behaviour.

This behaviour is typically one of
\begin{itemize}
\item my statement that the compiler accepts or rejects the program
\item the program's printed output, which I show
\item my implied assurance that its assertions do not fail when run
\end{itemize}

The compiler whose program semantics is shown is
\begin{cfa}
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
\end{cfa}
running on architecture @x86_64@, targeting the same environment.

Unless explicit discussion ensues about differences among compilers or with (versions of) the standard, it is further implied that there exists a second version of GCC and some version of Clang, running on and for the same platform, that give substantially similar behaviour.
In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.


\subsection{C reports many ill-typed expressions as warnings}

These attempts to assign @y@ to @x@ and vice versa are obviously ill-typed.
\lstinput{12-15}{bkgd-c-tyerr.c}
They compile with warnings:
\begin{cfa}
warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
\end{cfa}
Similarly,
\lstinput{17-19}{bkgd-c-tyerr.c}
compiles with warning:
\begin{cfa}
warning: passing argument 1 of 'f' from incompatible pointer type
note: expected 'void (*)(void)' but argument is of type 'float *'
\end{cfa}
and ends with a segmentation fault at runtime.

That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than ``doomed to crash,'' is a matter for an obfuscated coding competition.

A ``tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute''*1 rejected the program.
The behaviour (whose absence is unprovable) is neither minor nor unlikely.
The rejection shows that the program is ill-typed.

Yet, the rejection presents as a GCC warning.

In the following discussion, ``ill-typed'' means giving a nonzero @gcc -Werror@ exit status with a message that discusses typing.

*1 TAPL-pg1 definition of a type system


\section{C Arrays}

\subsection{C has an array type (!)}

When a programmer works with an array, C semantics provide access to a type that is different in every way from ``pointer to its first element.''
Its qualities become apparent by inspecting the declaration
\lstinput{34-34}{bkgd-carray-arrty.c}
The inspection begins by using @sizeof@ to provide definite program semantics for the intuition of an expression's type.
Assuming a target platform keeps things concrete:
\lstinput{35-36}{bkgd-carray-arrty.c}
Consider the sizes of expressions derived from @a@, modified by adding ``pointer to'' and ``first element'' (and including unnecessary parentheses to avoid confusion about precedence).
\lstinput{37-40}{bkgd-carray-arrty.c}
That @a@ takes up 40 bytes is common reasoning for C programmers.
Set aside for a moment the claim that this first assertion is giving information about a type.
For now, note that an array and a pointer to its first element are, sometimes, different things.
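
As a self-contained sketch of the kind of size check described above (not the listing included from the source file), the following assumes only that an array of ten @float@ is larger than a pointer, which holds on the @x86_64@ platform used here. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when every size relation below holds. */
int array_sizes_hold(void) {
    float a[10];
    int ok = 1;
    ok &= sizeof(a) == 10 * sizeof(float);      /* the whole array */
    ok &= sizeof(a[0]) == sizeof(float);        /* its first element */
    ok &= sizeof(&a[0]) == sizeof(float *);     /* pointer to first element */
    ok &= sizeof(&a) == sizeof(float (*)[10]);  /* pointer to the array */
    ok &= sizeof(a) != sizeof(&a[0]);           /* array and element pointer differ */
    return ok;
}
```

The sizes are written relative to @sizeof(float)@ rather than as literal byte counts, so the sketch does not depend on a 4-byte @float@.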

The idea that there is such a thing as a pointer to an array may be surprising.
It is not the same thing as a pointer to the first element:
\lstinput{42-45}{bkgd-carray-arrty.c}
The first gets
\begin{cfa}
warning: assignment to 'float (*)[10]' from incompatible pointer type 'float *'
\end{cfa}
and the second gets the opposite.
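
The difference between the two pointer types can also be observed without provoking a warning. In this sketch (hypothetical function names, not the thesis's listing), both pointers hold the same address, yet stepping each by one covers a different number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by advancing a pointer-to-array by one. */
size_t step_of_array_pointer(void) {
    float a[10];
    float (*pa)[10] = &a;                 /* pointer to the whole array */
    return (size_t)((char *)(pa + 1) - (char *)pa);
}

/* Bytes covered by advancing a pointer-to-first-element by one. */
size_t step_of_element_pointer(void) {
    float a[10];
    float *pa0 = &a[0];                   /* pointer to the first element */
    return (size_t)((char *)(pa0 + 1) - (char *)pa0);
}

/* The two pointers agree in address, though not in type. */
int same_address(void) {
    float a[10];
    float (*pa)[10] = &a;
    float *pa0 = &a[0];
    return (void *)pa == (void *)pa0;
}
```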

We now refute a concern that @sizeof(a)@ is reporting on special knowledge from @a@ being a local variable,
say, that it is informing about an allocation, rather than simply a type.

First, recognizing that @sizeof@ has two forms, one operating on an expression, the other on a type, we observe that the original answers are unaffected by using the type-parameterized form:
\lstinput{46-50}{bkgd-carray-arrty.c}
Finally, the same sizing is reported when there is no allocation at all, and we launch the analysis instead from the pointer-to-array type.
\lstinput{51-57}{bkgd-carray-arrty.c}
So, in spite of considerable programmer success enabled by an understanding that an array is just a pointer to its first element (revisited TODO pointer decay), this understanding is simplistic.
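
A minimal sketch of the two steps above, with hypothetical function names: the first function takes the size from the type alone, and the second starts from a pointer-to-array that points at nothing. The second is safe because @sizeof@ does not evaluate its operand when the operand's type is not variable-length.

```c
#include <assert.h>
#include <stddef.h>

/* Type-parameterized form: no object of the array type exists. */
size_t size_from_type(void) {
    return sizeof(float[10]);
}

/* Expression form, launched from a pointer-to-array with no referent.
   *pa is never evaluated; only its type, float[10], is inspected. */
size_t size_from_pointer_to_array(void) {
    float (*pa)[10] = NULL;
    return sizeof(*pa);
}
```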

A shortened form for declaring local variables exists, provided that length information is given in the initializer:
\lstinput{59-63}{bkgd-carray-arrty.c}
In these declarations, the resulting types are both arrays, but their lengths are inferred.
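
As an illustration of inferred lengths, the following sketch uses two declarations of my own choosing (they are assumptions, not necessarily those in the included listing): one initialized from a brace list and one from a string literal.

```c
#include <assert.h>
#include <stddef.h>

/* Length inferred from a brace-enclosed initializer. */
size_t inferred_float_count(void) {
    float fs[] = { 3.14f, 1.707f };
    return sizeof(fs) / sizeof(fs[0]);
}

/* Length inferred from a string literal: five letters plus the terminating '\0'. */
size_t inferred_char_count(void) {
    char cs[] = "hello";
    return sizeof(cs);
}
```

Both results come from @sizeof@ applied to the declared name, which confirms that each name has an array type with a definite length.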

\begin{tabular}{lllllll}
@float x;@ & $\rightarrow$ & (base element) & @float@ & @float x;@ & @[ float ]@ & @[ float ]@ \\
@float * x;@ & $\rightarrow$ & pointer & @float *@ & @float * x;@ & @[ * float ]@ & @[ * float ]@ \\
@float x[10];@ & $\rightarrow$ & array & @float[10]@ & @float x[10];@ & @[ [10] float ]@ & @[ array(float, 10) ]@ \\
@float *x[10];@ & $\rightarrow$ & array of pointers & @float*[10]@ & @float *x[10];@ & @[ [10] * float ]@ & @[ array(*float, 10) ]@ \\
@float (*x)[10];@ & $\rightarrow$ & pointer to array & @float(*)[10]@ & @float (*x)[10];@ & @[ * [10] float ]@ & @[ * array(float, 10) ]@ \\
@float *(*x)[10];@ & $\rightarrow$ & pointer to array of pointers & @float*(*)[10]@ & @float *(*x)[10];@ & @[ * [10] * float ]@ & @[ * array(*float, 10) ]@
\end{tabular}
\begin{cfa}
void stx1() {	// assumed: enclosing function and declarations of x4, x5 (the fragment began mid-body)
    float (*x4)[10];
    float *(*x5)[10];

    x5 = (float*(*)[10]) x4;
    // x5 = (float(*)[10]) x4; // wrong target type; meta test suggesting above cast uses correct type

    // [here]
    // const

    // [later]
    // static
    // star as dimension
    // under pointer decay: int p1[const 3] being int const *p1

    const float * y1;
    float const * y2;
    float * const y3;

    y1 = 0;
    y2 = 0;
    // y3 = 0; // bad

    // *y1 = 3.14; // bad
    // *y2 = 3.14; // bad
    *y3 = 3.14;

    const float z1 = 1.414;
    float const z2 = 1.414;

    // z1 = 3.14; // bad
    // z2 = 3.14; // bad
}

#define T float
void stx2() { const T x[10];
    // x[5] = 3.14; // bad
}
void stx3() { T const x[10];
    // x[5] = 3.14; // bad
}
\end{cfa}

My contribution is enabled by recognizing
\begin{itemize}
\item There is value in using a type that knows how big the whole thing is.
\item The type ``pointer to (first) element'' does not.
\item C \emph{has} a type that knows the whole picture: array, e.g. @T[10]@.
\item This type has all the usual derived forms, which also know the whole picture. A usefully noteworthy example is pointer to array, e.g. @T(*)[10]@.
\end{itemize}

Each of these sections introduces another layer of the C arrays' story
and concludes with an \emph{Unfortunate Syntactic Reference}.
This reference shows how to spell the types under discussion,
along with interactions with orthogonal (but easily confused) language features.
Alternate spellings are listed within a row.
The simplest occurrences of types distinguished in the preceding discussion are marked with $\triangleright$.
The Type column gives the spelling used in a cast or error message (though note Section TODO points out that some types cannot be cast to).
The Declaration column gives the spelling used in an object declaration, such as variable or aggregate member; parameter declarations (section TODO) follow entirely different rules.

After all, reading a C array type is easy: just read it from the inside out, and know when to look left and when to look right!
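
The inside-out reading can be checked with @sizeof@. In this sketch, @T@ is fixed to @float@ as a stand-in for the element type, and the function names are hypothetical. The only difference between the first two declarations is a pair of parentheses, yet one declares an array and the other a pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef float T;   /* stand-in for the element type T */

/* T *x[10]: from x, look right first ([10]), then left (*): array of 10 pointers. */
size_t bytes_array_of_pointers(void) {
    T *x[10];
    return sizeof(x);
}

/* T (*x)[10]: parentheses force the look left first (*), then right ([10]):
   pointer to array of 10. */
size_t bytes_pointer_to_array(void) {
    T (*x)[10];
    return sizeof(x);
}

/* The referent of that pointer is the whole array. */
size_t bytes_pointee_of_pointer_to_array(void) {
    T (*x)[10];
    return sizeof(*x);   /* operand is not evaluated */
}
```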


\CFA-specific spellings (not yet introduced) are also included here for referenceability; these can be skipped on linear reading.
The \CFA-C column gives the more fortunate ``new'' syntax of section TODO, for spelling \emph{exactly the same type}.
This fortunate syntax does not have different spellings for types vs declarations;
a declaration is always the type followed by the declared identifier name;
for the example of letting @x@ be a \emph{pointer to array}, the declaration is spelled:
\begin{cfa}
[ * [10] T ] x;
\end{cfa}
The \CFA-Full column gives the spelling of a different type, introduced in TODO, which has all of my contributed improvements for safety and ergonomics.

\noindent
\textbf{Unfortunate Syntactic Reference}

\begin{figure}
\centering
\setlength{\tabcolsep}{3pt}
\begin{tabular}{llllll}
& Description & Type & Declaration & \CFA-C & \CFA-Full \\ \hline
$\triangleright$ & val.
& @T@
& @T x;@
& @[ T ]@
&
\\ \hline
& \pbox{20cm}{ \vspace{2pt} val.\\ \footnotesize{no writing the val.\ in \lstinline{x}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T} \\ \lstinline{T const} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T x;} \\ \lstinline{T const x;} }
& @[ const T ]@
&
\\ \hline \hline
$\triangleright$ & ptr.\ to val.
& @T *@
& @T * x;@
& @[ * T ]@
&
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x}} }\vspace{2pt}
& @T * const@
& @T * const x;@
& @[ const * T ]@
&
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{*x}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T *} \\ \lstinline{T const *} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * x;} \\ \lstinline{T const * x;} }
& @[ * const T ]@
&
\\ \hline \hline
$\triangleright$ & ar.\ of val.
& @T[10]@
& @T x[10];@
& @[ [10] T ]@
& @[ array(T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ar.\ of val.\\ \footnotesize{no writing the val.\ in \lstinline{x[5]}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T[10]} \\ \lstinline{T const[10]} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T x[10];} \\ \lstinline{T const x[10];} }
& @[ [10] const T ]@
& @[ const array(T, 10) ]@
\\ \hline
& ar.\ of ptr.\ to val.
& @T*[10]@
& @T *x[10];@
& @[ [10] * T ]@
& @[ array(* T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ar.\ of ptr.\ to val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x[5]}} }\vspace{2pt}
& @T * const [10]@
& @T * const x[10];@
& @[ [10] const * T ]@
& @[ array(const * T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ar.\ of ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{*(x[5])}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * [10]} \\ \lstinline{T const * [10]} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * x[10];} \\ \lstinline{T const * x[10];} }
& @[ [10] * const T ]@
& @[ array(* const T, 10) ]@
\\ \hline \hline
$\triangleright$ & ptr.\ to ar.\ of val.
& @T(*)[10]@
& @T (*x)[10];@
& @[ * [10] T ]@
& @[ * array(T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to ar.\ of val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x}} }\vspace{2pt}
& @T(* const)[10]@
& @T (* const x)[10];@
& @[ const * [10] T ]@
& @[ const * array(T, 10) ]@
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to ar.\ of val.\\ \footnotesize{no writing the val.\ in \lstinline{(*x)[5]}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T(*)[10]} \\ \lstinline{T const (*) [10]} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T (*x)[10];} \\ \lstinline{T const (*x)[10];} }
& @[ * [10] const T ]@
& @[ * const array(T, 10) ]@
\\ \hline
& ptr.\ to ar.\ of ptr.\ to val.
& @T*(*)[10]@
& @T *(*x)[10];@
& @[ * [10] * T ]@
& @[ * array(* T, 10) ]@
\\ \hline
\end{tabular}
\caption{Unfortunate Syntactic Reference: spellings of types and object declarations.}
\end{figure}


\subsection{Arrays decay and pointers diffract}

The last section established the difference between these four types:
\lstinput{3-6}{bkgd-carray-decay.c}
But the expression used for obtaining the pointer to the first element is pedantic.
The root of all C programmer experience with arrays is the shortcut
\lstinput{8-8}{bkgd-carray-decay.c}
which reproduces @pa0@, in type and value:
\lstinput{9-9}{bkgd-carray-decay.c}
The validity of this initialization is unsettling, in the context of the facts established in the last section.
Notably, it initializes name @pa0x@ from expression @a@, when they are not of the same type:
\lstinput{10-10}{bkgd-carray-decay.c}

So, C provides an implicit conversion from @float[10]@ to @float*@, as described in ARM-6.3.2.1.3:
\begin{quote}
Except when it is the operand of the @sizeof@ operator, or the unary @&@ operator, or is a
string literal used to initialize an array,
an expression that has type ``array of type'' is
converted to an expression with type ``pointer to type'' that points to the initial element of
the array object
\end{quote}

This phenomenon is the famous ``pointer decay,'' which is a decay of an array-typed expression into a pointer-typed one.
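
A minimal sketch of decay follows, reusing the names @a@, @pa0@ and @pa0x@ from the discussion; the enclosing function is hypothetical. The shortcut yields the same pointer as the pedantic spelling, while @sizeof@, being one of the exception cases, still sees the array.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the decayed pointer equals the explicitly taken one,
   and the array itself is still sized as a whole. */
int decay_reproduces_element_pointer(void) {
    float a[10];
    float *pa0  = &a[0];   /* pedantic: address of the first element */
    float *pa0x = a;       /* shortcut: a decays to a pointer to its first element */
    return pa0 == pa0x
        && sizeof(a) == 10 * sizeof(float)   /* no decay under sizeof */
        && sizeof(pa0x) == sizeof(float *);
}
```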

It is worth noting that the list of exception cases does not feature the occurrence of @a@ in @a[i]@.
Thus, subscripting happens on pointers, not arrays.

Subscripting proceeds first with pointer decay, if needed. Next, ARM-6.5.2.1.2 explains that @a[i]@ is treated as if it were @(*((a)+(i)))@.
ARM-6.5.6.8 explains that the addition of a pointer and an integer is defined only when the pointer refers to an element that is in an array, with a meaning of ``@i@ elements away from,'' which is valid if @a@ is big enough and @i@ is small enough.
Finally, ARM-6.5.3.2.4 explains that the @*@ operator's result is the referenced element.

Taken together, these rules also happen to illustrate that @a[i]@ and @i[a]@ mean the same thing.
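
The equivalence can be exercised directly. This sketch (hypothetical function name) compares the three spellings for every valid index of a ten-element array.

```c
#include <assert.h>

/* Returns 1 when a[i], i[a] and *(a + i) all denote the same element. */
int subscript_forms_agree(void) {
    float a[10];
    for (int i = 0; i < 10; i++) a[i] = (float)i;

    int ok = 1;
    for (int i = 0; i < 10; i++) {
        ok &= (&a[i] == &i[a]);     /* commuted subscript names the same object */
        ok &= (&a[i] == a + i);     /* subscript is pointer addition ... */
        ok &= (a[i] == *(a + i));   /* ... followed by dereference */
    }
    return ok;
}
```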

Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
While the standard affords a C compiler freedom about the meaning of an out-of-bound access,
or of subscripting a pointer that does not refer to an array element at all,
the fact that C is famously both generally high-performance, and specifically not bound-checked,
leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
Moreover, consider the common pattern of subscripting on a @malloc@ result:
\begin{cfa}
float * fs = malloc( 10 * sizeof(float) );
fs[5] = 3.14;
\end{cfa}
The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is'' as requested (ARM-7.22.3.4.2).
But the program says \emph{nothing} more about this pointer value, that might cause its referent to \emph{be} an array, before doing the subscript.

Under this assumption, a pointer @p@ being subscripted (or added to, then dereferenced)
by any value (positive, zero, or negative) gives a view of the program's entire address space,
centred around the @p@ address, divided into adjacent @sizeof(*p)@ chunks,
each potentially (re)interpreted as @typeof(*p)@.

I call this phenomenon ``array diffraction,'' which is a diffraction of a single-element pointer
into the assumption that its target is in the middle of an array whose size is unlimited in both directions.

No pointer is exempt from array diffraction.

No array shows its elements without pointer decay.
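
The ``centred around @p@'' view can be shown while staying within the bounds that the standard blesses. In this sketch (hypothetical function name and data), @p@ refers to the middle of a ten-element array, so offsets from $-5$ to $4$ are all legal; the assertion guards that range, since anything outside it is exactly the out-of-bound case discussed above.

```c
#include <assert.h>

/* Reads the element `offset` positions away from the middle of a fixed array. */
float view_from_middle(int offset) {
    static const float a[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    const float *p = &a[5];                 /* a single-element pointer */
    assert(-5 <= offset && offset <= 4);    /* stay inside the array */
    return p[offset];                       /* negative, zero or positive subscript */
}
```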

A further pointer--array confusion, closely related to decay, occurs in parameter declarations.
ARM-6.7.6.3.7 explains that when an array type is written for a parameter,
the parameter's type becomes a type that I summarize as being the array-decayed type.
The respective handlings of the following two parameter spellings show that the array-spelled one is really, like the other, a pointer.
\lstinput{12-16}{bkgd-carray-decay.c}
Because the meaning of @sizeof(x)@ changes, compared with its use on a similarly spelled local-variable declaration,
GCC also gives this code the warning: ```sizeof' on array function parameter `x' will return size of `float *'.''

The caller of such a function is left with the reality that a pointer parameter is a pointer, no matter how it's spelled:
\lstinput{18-21}{bkgd-carray-decay.c}
This fragment gives no warnings.
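
The adjusted parameter type can also be observed without @sizeof@, and so without the warning. This sketch is an alternative technique to the included listing: it uses the C11 @_Generic@ selection on the address of the name. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Classify the type of an address expression (C11 _Generic). */
#define IS_POINTER_TO_POINTER(e) _Generic((e), float **: 1, default: 0)
#define IS_POINTER_TO_ARRAY(e)   _Generic((e), float (*)[10]: 1, default: 0)

/* x is spelled as an array, but &x has type float **, so x is a float *. */
int param_is_pointer(float x[10]) {
    return IS_POINTER_TO_POINTER(&x);
}

/* For a local variable with the same spelling, &a has type float (*)[10]. */
int local_is_array(void) {
    float a[10];
    return IS_POINTER_TO_ARRAY(&a);
}

/* The caller passes an array; decay at the call site makes it a pointer. */
int call_with_array(void) {
    float a[10] = { 0 };
    return param_is_pointer(a);
}
```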

The shortened parameter syntax @T x[]@ is a further way to spell ``pointer.''
Note the opposite meaning of this spelling now, compared with its use in local variable declarations.
This point of confusion is illustrated in:
\lstinput{23-30}{bkgd-carray-decay.c}
The basic two meanings, with a syntactic difference helping to distinguish,
are illustrated in the declarations of @ca@ vs.\ @cp@,
whose subsequent @edit@ calls behave differently.
The syntax-caused confusion is in the comparison of the first and last lines,
both of which use a literal to initialize an object declared with spelling @T x[]@.
But these initialized declarations get opposite meanings,
depending on whether the object is a local variable or a parameter.


In summary, when a function is written with an array-typed parameter,
\begin{itemize}
\item an appearance of passing an array by value is always an incorrect understanding
\item a dimension value, if any is present, is ignored
\item pointer decay is forced at the call site and the callee sees the parameter having the decayed type
\end{itemize}
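
The first point of the summary can be sketched as follows. The names @edit@ and @ca@ echo the discussion, but the bodies here are my own assumption rather than the included listing. If the array were passed by value, the caller's copy would be unchanged after the call.

```c
#include <assert.h>
#include <string.h>

/* The parameter is spelled as an array of unspecified length, but is a char *. */
void edit(char x[]) {
    x[0] = 'j';
}

/* Returns 1 when the callee's write is visible in the caller's array. */
int callee_edits_callers_array(void) {
    char ca[] = "hello";   /* local variable: a modifiable array of 6 chars */
    edit(ca);              /* ca decays; the callee receives a pointer into ca */
    return strcmp(ca, "jello") == 0;
}
```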

Pointer decay does not affect pointer-to-array types, because these are already pointers, not arrays.
As a result, a function with a pointer-to-array parameter sees the parameter exactly as the caller does:
\lstinput{32-42}{bkgd-carray-decay.c}
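
A small sketch of this point, with hypothetical function names: the callee recovers the element count from the parameter's type alone, because the length is part of the pointer-to-array type.

```c
#include <assert.h>
#include <stddef.h>

/* The callee sees a pointer to float[10]; the length survives the call. */
size_t callee_sees_length(float (*x)[10]) {
    return sizeof(*x) / sizeof((*x)[0]);
}

/* The caller passes the address of the whole array, not of its first element. */
size_t caller_passes_whole_array(void) {
    float a[10] = { 0 };
    return callee_sees_length(&a);
}
```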

\noindent
\textbf{Unfortunate Syntactic Reference}

\noindent
(Parameter declaration; ``no writing'' refers to the callee's ability)

\begin{figure}
\centering
\begin{tabular}{lllll}
& Description & Type & Param. Decl & \CFA-C \\ \hline
$\triangleright$ & ptr.\ to val.
& @T *@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T * x,} \\ \lstinline{T x[10],} \\ \lstinline{T x[],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ * T ]} \\ \lstinline{[ [10] T ]} \\ \lstinline{[ [] T ]} }
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the ptr.\ in \lstinline{x}} }\vspace{2pt}
& @T * const@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T * const x,} \\ \lstinline{T x[const 10],} \\ \lstinline{T x[const],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ const * T ]} \\ \lstinline{[ [const 10] T ]} \\ \lstinline{[ [const] T ]} }
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{*x}} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T *} \\ \lstinline{T const *} }
& \pbox{20cm}{ \vspace{2pt} \lstinline{const T * x,} \\ \lstinline{T const * x,} \\ \lstinline{const T x[10],} \\ \lstinline{T const x[10],} \\ \lstinline{const T x[],} \\ \lstinline{T const x[],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[* const T]} \\ \lstinline{[ [10] const T ]} \\ \lstinline{[ [] const T ]} }
\\ \hline \hline
$\triangleright$ & ptr.\ to ar.\ of val.
& @T(*)[10]@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T (*x)[10],} \\ \lstinline{T x[3][10],} \\ \lstinline{T x[][10],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[* [10] T]} \\ \lstinline{[ [3] [10] T ]} \\ \lstinline{[ [] [10] T ]} }
\\ \hline
& ptr.\ to ptr.\ to val.
& @T **@
& \pbox{20cm}{ \vspace{2pt} \lstinline{T ** x,} \\ \lstinline{T *x[10],} \\ \lstinline{T *x[],} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ * * T ]} \\ \lstinline{[ [10] * T ]} \\ \lstinline{[ [] * T ]} }
\\ \hline
& \pbox{20cm}{ \vspace{2pt} ptr.\ to ptr.\ to val.\\ \footnotesize{no writing the val.\ in \lstinline{**argv}} }\vspace{2pt}
& @const char **@
& \pbox{20cm}{ \vspace{2pt} \lstinline{const char *argv[],} \\ \footnotesize{(others elided)} }\vspace{2pt}
& \pbox{20cm}{ \vspace{2pt} \lstinline{[ [] * const char ]} \\ \footnotesize{(others elided)} }
\\ \hline
\end{tabular}
\caption{Unfortunate Syntactic Reference: spellings of parameter declarations.}
\end{figure}


\subsection{Lengths may vary, checking does not}

When the desired number of elements is unknown at compile time,
a variable-length array is a solution:
\begin{cfa}
int main( int argc, const char *argv[] ) {
    assert( argc == 2 );
    size_t n = atol( argv[1] );
    assert( 0 < n && n < 1000 );

    float a[n];
    float b[10];

    // ... discussion continues here
}
\end{cfa}
This arrangement allocates @n@ elements on the @main@ stack frame for @a@, just as it puts 10 elements on the @main@ stack frame for @b@.
The variable-sized allocation of @a@ is provided by an @alloca@-style adjustment of the stack frame at run time.

In a situation where the array sizes are not known to be small enough for stack allocation to be sensible, corresponding heap allocations are achievable as:
\begin{cfa}
float *ax1 = malloc( sizeof( float[n] ) );
float *ax2 = malloc( n * sizeof( float ) );
float *bx1 = malloc( sizeof( float[1000000] ) );
float *bx2 = malloc( 1000000 * sizeof( float ) );
\end{cfa}
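
That a variable-length array carries its length in its type can be sketched with @sizeof@, which for such a type is computed at run time. The function names are hypothetical, and the bound check mirrors the one in the example above. Variable-length arrays are an optional feature from C11 onward; GCC supports them.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of a stack array whose length is known only at run time. */
size_t vla_bytes(size_t n) {
    assert(0 < n && n < 1000);   /* keep the stack allocation small */
    float a[n];
    return sizeof(a);            /* evaluated at run time for a VLA */
}

/* Size in bytes of the fixed-length counterpart. */
size_t fixed_bytes(void) {
    float b[10];
    return sizeof(b);
}
```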


VLA

Parameter dependency

Checking is best-effort / unsound

Limited special handling to get the dimension value checked (static)



\subsection{C has full-service, dynamically sized, multidimensional arrays (and \CC does not)}

In C and \CC, ``multidimensional array'' means ``array of arrays.'' Other meanings are discussed in TODO.

Just as an array's element type can be @float@, so can it be @float[10]@.

While any of @float*@, @float[10]@ and @float(*)[10]@ are easy to tell apart from @float@, telling them apart from each other may need occasional reference back to TODO intro section.
The sentence derived by wrapping each type in @-[3]@ follows.

While any of @float*[3]@, @float[3][10]@ and @float(*)[3][10]@ are easy to tell apart from @float[3]@,
telling them apart from each other is what it takes to know what ``array of arrays'' really means.

Pointer decay affects the outermost array only.
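
A sketch of outermost-only decay, with a hypothetical function name: a @float[3][10]@ decays to a pointer to its first row, of type @float(*)[10]@, and each row remains a full array.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when only the outer dimension of a[3][10] is lost to decay. */
int outermost_decay_only(void) {
    float a[3][10];
    float (*row)[10] = a;   /* decay of the outer array: pointer to the first row */

    int ok = 1;
    ok &= sizeof(a) == 3 * 10 * sizeof(float);     /* whole array of arrays */
    ok &= sizeof(a[0]) == 10 * sizeof(float);      /* a row is still an array */
    ok &= sizeof(*row) == sizeof(a[0]);            /* the pointer knows the row length */
    ok &= (void *)(row + 1) == (void *)&a[1];      /* stepping moves a whole row */
    return ok;
}
```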

TODO: unfortunate syntactic reference with these cases:

\begin{itemize}
\item ar.\ of ar.\ of val.\ (be sure about ordering of dimensions when the declaration is dropped)
\item ptr.\ to ar.\ of ar.\ of val.
\end{itemize}


\subsection{Arrays are (but) almost values}

Has size; can point to

Can't cast to

Can't pass as value

Can initialize

Can wrap in aggregate

Can't assign
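
Two of these notes, ``can wrap in aggregate'' and ``can't assign,'' can be sketched together. The struct and function names are hypothetical. Assigning one struct to another copies the array inside it, even though the arrays themselves cannot be assigned.

```c
#include <assert.h>
#include <stddef.h>

struct wrapped { float a[10]; };   /* an array wrapped in an aggregate */

/* Returns 1 when struct assignment copied the whole array by value. */
int aggregate_copies_array(void) {
    struct wrapped w1, w2;
    for (int i = 0; i < 10; i++) w1.a[i] = (float)i;

    w2 = w1;             /* copies all ten elements */
    w1.a[5] = -1.0f;     /* later change to w1 does not reach w2 */
    /* w2.a = w1.a; */   /* direct array assignment is rejected by the compiler */

    return w2.a[5] == 5.0f && sizeof(struct wrapped) >= sizeof(float[10]);
}
```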


\subsection{Returning an array is (but) almost possible}


\subsection{The pointer-to-array type has been noticed before}

\subsection{Multi-Dimensional}

As in the last section, we inspect the declaration ...
\lstinput{16-18}{bkgd-carray-mdim.c}
The significant axis of deriving expressions from @a@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
\lstinput{20-44}{bkgd-carray-mdim.c}


\section{\CFA}

Traditionally, fixing C meant leaving the C-ism alone, while providing a better alternative beside it.
(For later: That's what I offer with array.hfa, but in the future-work vision for arrays, the fix includes helping programmers stop accidentally using a broken C-ism.)

\subsection{\CFA features interacting with arrays}

Prior work on \CFA included making C arrays, as used in C code from the wild,
work when this code is fed into @cfacc@.
The quality of this treatment was fine, with no more or fewer bugs than is typical.

More mixed results arose with feeding these ``C'' arrays into preexisting \CFA features.

A notable success was with the \CFA @alloc@ function,
in which type information associated with a polymorphic return type
replaces @malloc@'s use of programmer-supplied size information.
\begin{cfa}
// C, library
void * malloc( size_t );
// C, user
struct tm * el1 = malloc( sizeof(struct tm) );
struct tm * ar1 = malloc( 10 * sizeof(struct tm) );

// CFA, library
forall( T * ) T * alloc();
// CFA, user
tm * el2 = alloc();
tm (*ar2)[10] = alloc();
\end{cfa}
The @alloc@ polymorphic return compiles into a hidden parameter, which receives a compiler-generated argument.
The compiler's argument generation uses type information from the left-hand side of the initialization to obtain the intended type.
Using a compiler-produced value eliminates an opportunity for user error.

TODO: fix in following: even the alloc call gives bad code gen: verify it was always this way; walk back the wording about things just working here; assignment (rebind) seems to offer workaround, as in bkgd-cfa-arrayinteract.cfa

Bringing in another \CFA feature, reference types, both resolves a sore spot of the last example, and gives a first example of an array-interaction bug.
In the last example, the choice of ``pointer to array'' @ar2@ breaks a parallel with @ar1@.
They are not subscripted in the same way.
\begin{cfa}
ar1[5];
(*ar2)[5];
\end{cfa}
Using ``reference to array'' resolves this issue. TODO: discuss connection with Doug-Lea \CC proposal.
\begin{cfa}
tm (&ar3)[10] = *alloc();
ar3[5];
\end{cfa}
The implicit size communication to @alloc@ still works in the same way as for @ar2@.

Using proper array types (@ar2@ and @ar3@) addresses a concern about using raw element pointers (@ar1@), albeit a theoretical one.
TODO xref C standard does not claim that @ar1@ may be subscripted,
because no stage of interpreting the construction of @ar1@ has it be that ``there is an \emph{array object} here.''
But both @*ar2@ and the referent of @ar3@ are the results of \emph{typed} @alloc@ calls,
where the type requested is an array, making the result, much more obviously, an array object.

The ``reference to array'' type has its sore spots too.
TODO see also @dimexpr-match-c/REFPARAM_CALL@ (under @TRY_BUG_1@)

TODO: I fixed a bug associated with using an array as a T. I think. Did I really? What was the bug?