1 | \chapter{Introduction}
|
---|
2 |
|
---|
3 | All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
|
---|
4 | Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
|
---|
5 | For all three types, there is some corresponding mechanism for iterating through the structure, where the iterator flexibility varies with the kind of structure and ingenuity of the iterator implementor.
|
---|
6 |
|
---|
7 |
|
---|
8 | \section{Array}
|
---|
9 |
|
---|
10 | An array provides a homogeneous container with $O(1)$ access to elements using subscripting (some form of pointer arithmetic).
|
---|
11 | The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
|
---|
12 | For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
|
---|
13 | Because array layout has contiguous components, subscripting is a computation.
|
---|
14 | However, the computation can exceed the array bounds resulting in programming errors and security violations~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
|
---|
15 | The goal is to provide good performance with safety.
|
---|
16 |
|
---|
17 |
|
---|
18 | \section{Linked List}
|
---|
19 |
|
---|
20 | A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
|
---|
21 | Subscripting by value is sometimes available, \eg hash table.
|
---|
22 | Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
|
---|
23 | If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack;
|
---|
24 | otherwise, elements are heap allocated and explicitly/implicitly managed.
|
---|
25 |
|
---|
26 |
|
---|
27 | \section{String}
|
---|
28 |
|
---|
29 | A string provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
|
---|
30 | What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements, \eg @index@ and @substr@.
|
---|
31 | Subscripting individual elements is often available.
|
---|
32 | Often the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing.
|
---|
33 | The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
|
---|
34 |
|
---|
35 |
|
---|
36 | \section{Motivation}
|
---|
37 |
|
---|
38 | The goal of this work is to introduce safe and complex versions of array, link lists, and strings into the programming language \CFA~\cite{Cforall}, which is based on C.
|
---|
39 | Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
|
---|
40 | Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
|
---|
41 |
|
---|
42 |
|
---|
43 | \subsection{C?}
|
---|
44 |
|
---|
45 | Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}.
|
---|
46 | However, most programming languages are only partially explained by standard's manuals.
|
---|
47 | When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
|
---|
48 | Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
|
---|
49 | While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
|
---|
50 | These example programs show
|
---|
51 | \begin{itemize}
|
---|
52 | \item the compiler accepts or rejects certain syntax,
|
---|
53 | \item prints output to buttress a claim of behaviour,
|
---|
54 | \item executes without triggering any embedded assertions testing pre/post-assertions or invariants.
|
---|
55 | \end{itemize}
|
---|
56 | This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
|
---|
57 | Any discovered anomalies among compilers or versions is discussed.
|
---|
58 | In all case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
|
---|
59 |
|
---|
60 |
|
---|
61 | \subsection{Ill-Typed Expressions}
|
---|
62 |
|
---|
63 | C reports many ill-typed expressions as warnings.
|
---|
64 | For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
|
---|
65 | \lstinput{12-15}{bkgd-c-tyerr.c}
|
---|
66 | with warnings:
|
---|
67 | \begin{cfa}
|
---|
68 | warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
|
---|
69 | warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
|
---|
70 | \end{cfa}
|
---|
71 | Similarly,
|
---|
72 | \lstinput{17-19}{bkgd-c-tyerr.c}
|
---|
73 | with warning:
|
---|
74 | \begin{cfa}
|
---|
75 | warning: passing argument 1 of 'f' from incompatible pointer type
|
---|
76 | note: expected 'void (*)(void)' but argument is of type 'float *'
|
---|
77 | \end{cfa}
|
---|
78 | with a segmentation fault at runtime.
|
---|
79 | Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
|
---|
80 | Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unsed variable.
|
---|
81 | In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
|
---|
82 | Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
|
---|
83 |
|
---|
84 | % That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
|
---|
85 | % Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
|
---|
86 |
|
---|
87 | % A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
|
---|
88 | % The behaviour (whose absence is unprovable) is neither minor nor unlikely.
|
---|
89 | % The rejection shows that the program is ill-typed.
|
---|
90 | %
|
---|
91 | % Yet, the rejection presents as a GCC warning.
|
---|
92 | % *1 TAPL-pg1 definition of a type system
|
---|
93 |
|
---|
94 |
|
---|
95 | \section{Contributions}
|
---|
96 |
|
---|
97 | \subsection{Linked List}
|
---|
98 |
|
---|
99 | \subsection{Array}
|
---|
100 |
|
---|
101 | \subsection{String}
|
---|
102 |
|
---|
103 | \subsection{Iterator}
|
---|