source: doc/theses/mike_brooks_MMath/intro.tex @ b8cb388

Last change on this file since b8cb388 was b8cb388, checked in by Peter A. Buhr <pabuhr@…>, 6 months ago

fix CFA citation

  • Property mode set to 100644
File size: 5.4 KB
Line 
1\chapter{Introduction}
2
3All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
4Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
5
6\cite{Blache19}
7\cite{Oorschot23}
8\cite{Ruef19}
9
10\section{Array}
11
12Array provides a homogeneous container with $O(1)$ access to elements using subscripting.
13The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
14For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
15
16
17\section{Linked List}
18
19Linked-list provides a homogeneous container with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
20Subscripting by value is sometimes available, \eg hash table.
21Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
22If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack;
23otherwise, elements are heap allocated and explicitly/implicitly managed.
24
25
26\section{String}
27
28String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
29What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements.
30Nevertheless, subscripting is often available.
31The cost of string operations is less important than the power of the block operation to accomplish complex manipulation.
32The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
33
34
35\section{Motivation}
36
37The goal of this work is to introduce safe and complex versions of array, link-lists, and string into the programming language \CFA~\cite{Cforall}, which is based on C.
38Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
39Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
40
41
42\subsection{C?}
43
44Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}.
45However, most programming languages are only partially explained by standard's manuals.
46When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
47Often other C compilers must \emph{ape} @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
48While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
49These example programs show
50\begin{itemize}
51        \item the compiler accepts or rejects certain syntax,
52        \item prints output to buttress a claim of behaviour,
53        \item executes without triggering any embedded assertions testing pre/post-assertions or invariants.
54\end{itemize}
55This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
56Any discovered anomalies among compilers or versions is discussed.
57In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
58
59
60\subsection{Ill-Typed Expressions}
61
62C reports many ill-typed expressions as warnings.
63For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
64\lstinput{12-15}{bkgd-c-tyerr.c}
65with warnings:
66\begin{cfa}
67warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
68warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
69\end{cfa}
70Similarly,
71\lstinput{17-19}{bkgd-c-tyerr.c}
72with warning:
73\begin{cfa}
74warning: passing argument 1 of 'f' from incompatible pointer type
75note: expected 'void (*)(void)' but argument is of type 'float *'
76\end{cfa}
77with a segmentation fault at runtime.
78Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness.
79Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings.
80In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
81Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
82
83% That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
84% Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
85
86% A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
87% The behaviour (whose absence is unprovable) is neither minor nor unlikely.
88% The rejection shows that the program is ill-typed.
89%
90% Yet, the rejection presents as a GCC warning.
91% *1  TAPL-pg1 definition of a type system
92
93
94\section{Contributions}
95
96\subsection{Linked List}
97
98\subsection{Array}
99
100\subsection{String}
Note: See TracBrowser for help on using the repository browser.