[27f1055] | 1 | \chapter{Introduction} |
---|
| 2 | |
---|
[051aec4] | 3 | All modern programming languages provide three high-level containers (collection): array, linked-list, and string. |
---|
| 4 | Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list. |
---|
[bdc8591] | 5 | |
---|
[19a2890] | 6 | \cite{Blache19} |
---|
| 7 | \cite{Oorschot23} |
---|
| 8 | \cite{Ruef19} |
---|
| 9 | |
---|
[bdc8591] | 10 | \section{Array} |
---|
| 11 | |
---|
| 12 | Array provides a homogeneous container with $O(1)$ access to elements using subscripting. |
---|
| 13 | The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation. |
---|
| 14 | For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap. |
---|
| 15 | |
---|
| 16 | |
---|
| 17 | \section{Linked List} |
---|
| 18 | |
---|
[051aec4] | 19 | Linked-list provides a homogeneous container with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations. |
---|
| 20 | Subscripting by value is sometimes available, \eg hash table. |
---|
| 21 | Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes). |
---|
| 22 | If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack; |
---|
[bdc8591] | 23 | otherwise, elements are heap allocated and explicitly/implicitly managed. |
---|
| 24 | |
---|
| 25 | |
---|
| 26 | \section{String} |
---|
| 27 | |
---|
| 28 | String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters. |
---|
| 29 | What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements. |
---|
| 30 | Nevertheless, subscripting is often available. |
---|
| 31 | The cost of string operations is less important than the power of the block operation to accomplish complex manipulation. |
---|
| 32 | The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages. |
---|
| 33 | |
---|
| 34 | |
---|
[6a8c773] | 35 | \section{Motivation} |
---|
[bdc8591] | 36 | |
---|
[b8cb388] | 37 | The goal of this work is to introduce safe and complex versions of array, link-lists, and string into the programming language \CFA~\cite{Cforall}, which is based on C. |
---|
[051aec4] | 38 | Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design. |
---|
[6a8c773] | 39 | Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built. |
---|
[bdc8591] | 40 | |
---|
| 41 | |
---|
[6a8c773] | 42 | \subsection{C?} |
---|
[bdc8591] | 43 | |
---|
[051aec4] | 44 | Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}. |
---|
| 45 | However, most programming languages are only partially explained by standard's manuals. |
---|
[6a8c773] | 46 | When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}. |
---|
[051aec4] | 47 | Often other C compilers must \emph{ape} @gcc@ because a large part of the C library (runtime) system contains @gcc@ features. |
---|
[6a8c773] | 48 | While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour. |
---|
| 49 | These example programs show |
---|
[bdc8591] | 50 | \begin{itemize} |
---|
[6a8c773] | 51 | \item the compiler accepts or rejects certain syntax, |
---|
[051aec4] | 52 | \item prints output to buttress a claim of behaviour, |
---|
[6a8c773] | 53 | \item executes without triggering any embedded assertions testing pre/post-assertions or invariants. |
---|
[bdc8591] | 54 | \end{itemize} |
---|
[6a8c773] | 55 | This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures. |
---|
| 56 | Any discovered anomalies among compilers or versions is discussed. |
---|
[bdc8591] | 57 | In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard. |
---|
| 58 | |
---|
| 59 | |
---|
[6a8c773] | 60 | \subsection{Ill-Typed Expressions} |
---|
[bdc8591] | 61 | |
---|
[6a8c773] | 62 | C reports many ill-typed expressions as warnings. |
---|
| 63 | For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed. |
---|
[bdc8591] | 64 | \lstinput{12-15}{bkgd-c-tyerr.c} |
---|
| 65 | with warnings: |
---|
| 66 | \begin{cfa} |
---|
| 67 | warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)' |
---|
| 68 | warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *' |
---|
| 69 | \end{cfa} |
---|
| 70 | Similarly, |
---|
| 71 | \lstinput{17-19}{bkgd-c-tyerr.c} |
---|
| 72 | with warning: |
---|
| 73 | \begin{cfa} |
---|
| 74 | warning: passing argument 1 of 'f' from incompatible pointer type |
---|
| 75 | note: expected 'void (*)(void)' but argument is of type 'float *' |
---|
| 76 | \end{cfa} |
---|
| 77 | with a segmentation fault at runtime. |
---|
[6a8c773] | 78 | Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness. |
---|
[051aec4] | 79 | Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings. |
---|
[6a8c773] | 80 | In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing. |
---|
| 81 | Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors. |
---|
| 82 | |
---|
| 83 | % That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@. |
---|
| 84 | % Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition. |
---|
| 85 | |
---|
| 86 | % A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program. |
---|
| 87 | % The behaviour (whose absence is unprovable) is neither minor nor unlikely. |
---|
| 88 | % The rejection shows that the program is ill-typed. |
---|
| 89 | % |
---|
| 90 | % Yet, the rejection presents as a GCC warning. |
---|
| 91 | % *1 TAPL-pg1 definition of a type system |
---|
[27f1055] | 92 | |
---|
| 93 | |
---|
| 94 | \section{Contributions} |
---|
[bdc8591] | 95 | |
---|
| 96 | \subsection{Linked List} |
---|
| 97 | |
---|
| 98 | \subsection{Array} |
---|
| 99 | |
---|
| 100 | \subsection{String} |
---|