source: doc/theses/mike_brooks_MMath/intro.tex @ 85855b0

Last change on this file since 85855b0 was c721105, checked in by Peter A. Buhr <pabuhr@…>, 5 months ago

proofreading changes

  • Property mode set to 100644
File size: 6.0 KB
Line 
1\chapter{Introduction}
2
3All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
4Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
5For all three types, there is some corresponding mechanism for iterating through the structure, where the iterator flexibility varies with the kind of structure and ingenuity of the iterator implementor.
6
7
8\section{Array}
9
10An array provides a homogeneous container with $O(1)$ access to elements using subscripting (some form of pointer arithmetic).
11The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
12For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
13Because array layout has contiguous components, subscripting is a computation.
14However, the computation can exceed the array bounds resulting in programming errors and security violations~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
15The goal is to provide good performance with safety.
16
17
18\section{Linked List}
19
20A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
21Subscripting by value is sometimes available, \eg hash table.
22Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
23If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack;
24otherwise, elements are heap allocated and explicitly/implicitly managed.
25
26
27\section{String}
28
29A string provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
30What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements, \eg @index@ and @substr@.
31Subscripting individual elements is often available.
32Often the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing.
33The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
34
35
36\section{Motivation}
37
38The goal of this work is to introduce safe and complex versions of array, link lists, and strings into the programming language \CFA~\cite{Cforall}, which is based on C.
39Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
40Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
41
42
43\subsection{C?}
44
45Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}.
46However, most programming languages are only partially explained by standard's manuals.
47When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
48Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
49While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
50These example programs show
51\begin{itemize}
52        \item the compiler accepts or rejects certain syntax,
53        \item prints output to buttress a claim of behaviour,
54        \item executes without triggering any embedded assertions testing pre/post-assertions or invariants.
55\end{itemize}
56This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
57Any discovered anomalies among compilers or versions is discussed.
58In all case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
59
60
61\subsection{Ill-Typed Expressions}
62
63C reports many ill-typed expressions as warnings.
64For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
65\lstinput{12-15}{bkgd-c-tyerr.c}
66with warnings:
67\begin{cfa}
68warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
69warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
70\end{cfa}
71Similarly,
72\lstinput{17-19}{bkgd-c-tyerr.c}
73with warning:
74\begin{cfa}
75warning: passing argument 1 of 'f' from incompatible pointer type
76note: expected 'void (*)(void)' but argument is of type 'float *'
77\end{cfa}
78with a segmentation fault at runtime.
79Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
80Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unsed variable.
81In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
82Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
83
84% That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
85% Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
86
87% A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
88% The behaviour (whose absence is unprovable) is neither minor nor unlikely.
89% The rejection shows that the program is ill-typed.
90%
91% Yet, the rejection presents as a GCC warning.
92% *1  TAPL-pg1 definition of a type system
93
94
95\section{Contributions}
96
97\subsection{Linked List}
98
99\subsection{Array}
100
101\subsection{String}
102
103\subsection{Iterator}
Note: See TracBrowser for help on using the repository browser.