Index: doc/theses/mike_brooks_MMath/intro.tex
===================================================================
--- doc/theses/mike_brooks_MMath/intro.tex	(revision 9c8afc70c6dc1fa03a353d2b14e83cce3a95909e)
+++ doc/theses/mike_brooks_MMath/intro.tex	(revision 79ec8c33d579c8ed4fccea236937f7aa5e7bb517)
@@ -2,12 +2,13 @@
 
 All modern programming languages provide three high-level containers (collections): array, linked-list, and string.
-Often array is part of the programming language, while linked-list is built from (recursive) pointer types, and string from a combination of array and linked-list.
+Often array is part of the programming language, while linked-lists are built from (recursive) pointer types, and strings from a combination of array and linked-list.
 For all three types, languages and/or their libraries supply varying degrees of high-level mechanisms for manipulating these objects at the bulk and component level, such as copying, slicing, extracting, and iterating among elements.
 
 Unfortunately, these three aspects of C cause a significant number of memory errors~\cite{Oorschot23}.
+Estimates suggest 50\%~\cite{Mendio24} of total reported open-source vulnerabilities occurring in C result from errors using these facilities (memory errors).
 For operating system and browser vendors, who heavily use systems languages, 60\%--70\% of reported software vulnerabilities involved memory errors~\cite{Kehrer23}.
 For Microsoft, 70\% of vulnerabilities addressed via security updates between 2006--2018 are memory safety issues~\cite[slide 10]{Miller19}.
 In a study of software exploits in the U.S. National Vulnerability Database over 2013--2017, the top reported vulnerability is (memory) buffer errors, among 19 vulnerability categories~\cite{Cifuentes19}.
-Therefore, hardening these three C types goes a long way to make the majority of C programs safer.
+Therefore, hardening these three C types goes a long way to make the majority of C programs safer and eliminating major hacker attack-vectors.
 
 This work looks at extending these three foundational container types in the programming language \CFA, which is a new dialect of the C programming language.
@@ -26,8 +27,8 @@
 The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
 For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
-Because array layout has contiguous components, subscripting is a computation (some form of pointer arithmetic).
+Because array layout has contiguous components, subscripting is a computation, \ie some form of pointer arithmetic.
 
 C provides a simple array type as a language feature.
-However, it adopts the controversial language position of treating pointer and array as duals, leading to multiple problems.
+However, it adopts the controversial position of treating pointer and array as duals, leading to multiple problems.
 
 
@@ -36,6 +37,6 @@
 A linked-list provides a homogeneous container often with $O(log N)$ or $O(N)$ access to elements using successor and predecessor operations that normally involve pointer chasing.
 Subscripting by value (rather than position or location as for array) is sometimes available, \eg hash table.
-Linked types are normally dynamically sized by adding and removing nodes using link fields internal or external to the elements (nodes).
-If a programming language allows pointers to stack storage, linked-list nodes can be allocated on the stack and connected with stack addresses (pointers);
+Linked types are dynamically sized by adding and removing nodes using link fields internal or external to the list elements (nodes).
+If a programming language allows pointers to stack storage, linked-list nodes can be allocated on the stack and connected with stack addresses;
 otherwise, elements are heap allocated with internal or external link fields.
 
@@ -47,5 +48,5 @@
 hash search table consisting of a key (string) with associated data (@<search.h>@)
 \end{itemize}
-Because these libraries are simple, awkward to use, and unsafe, C programmers commonly build bespoke linked data-structures.
+Because these container libraries can be restrictive, awkward to use, and unsafe, C programmers often build bespoke linked data-structures, which further increases the potential for memory errors.
 
 
@@ -54,13 +55,14 @@
 A string provides a dynamic array of homogeneous elements, where the elements are (often) some form of human-readable characters.
 What differentiates a string from other types in that many of its operations work on groups of elements for scanning and changing, \eg @index@ and @substr@.
-While subscripting individual elements is usually available, working at the individual character level is considered poor practise, \ie underutilizing the powerful string operations.
+While subscripting individual elements is usually available, working at the character level is considered poor practise, \ie underutilizing the powerful string operations.
 Therefore, the cost of a string operation is usually less important than the power of the operation to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing string text.
 The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
-In some cases, string management is separate from heap management, \ie strings roll their own heap.
+In many cases, string management is separate from heap management, \ie strings roll their own heap.
 
 The C string type is just a character array and requires user storage-management to handle varying size.
-The character array uses the convention of marking its (variable) array length by placing the 0-valued control character at the end (null-terminated).
+C adopts a terminating sentinel character, @'\0'@, to mark the end of a variable-length character-array versus a preceding length field.
+Hence, the sentinel character is excluded from a string and determining the string length is an $O(N)$ operation.
 The C standard library includes a number of high-level operations for working with this representation.
-Most of these operations are awkward and error prone.
+Most of these operations are awkward to use and error prone.
 
 
@@ -80,18 +82,21 @@
 \begin{enumerate}[leftmargin=*]
 \item
-These three aspects of C are difficult to understand, teach, and get right because they are correspondingly low level.
+The three primary container types in C are difficult to understand, teach, and get right because they are too low level.
 Providing higher-level, feature-rich versions of these containers in \CFA is a major component of the primary goal.
+The result is a simplify programming experience, which increases productivity and maintainability.
 \item
-These three aspects of C cause the greatest safety issues because there are few or no safe guards when a programmer misunderstands or misuses these features~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
-Estimates suggest 50\%~\cite{Mendio24} of total reported open-source vulnerabilities occurring in C result from errors using these facilities (memory errors), providing the major hacker attack-vectors.
+The new container types must be as correct and safe to use as those in other modern programming languages, which has been shown to a primary concern of industry, government, and military~\cite{ONCD}.
+Prior approaches focus on out-of-bound array accesses using a model-based approach (ASCET) in embedded systems (\eg cars)~\cite{Blache19}, and general and null-terminated string arrays using a \CC template syntax (Checked C)~\cite{Elliott18,Ruef19}.
+Both White House~\cite{WhiteHouse24} and DARPA~\cite{DARPA24} recently released a recommendation to \emph{move away} from C and \CC, because of cybersecurity threats exploiting vulnerabilities in these languages.
+Fixing these vulnerabilities negates this need, allowing C and its ecosystem to continue into the future.
 \end{enumerate}
-Both White House~\cite{WhiteHouse24} and DARPA~\cite{DARPA24} recently released a recommendation to move away from C and \CC, because of cybersecurity threats exploiting vulnerabilities in these older languages.
-Hardening these three types goes a long way to make the majority of C programs safer.
 
-
-While multiple new languages purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the new runtime uses garage collection.
+While new languages, \eg Go, Rust, Swift, purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the runtime uses garbage collection.
+Even assuming automated conversion of C programs to other safe languages, who will maintain this massive new code-base?
 Furthermore, new languages must still interact with the underlying C operating system through fragile, type-unsafe, interlanguage-communication.
-Switching to \CC is equally impractical as its complex and interdependent type-system (\eg objects, overloading, inheritance, templates) means idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning new \CC.
-Hence, rewriting and retraining costs for these languages can be prohibitive for companies with a large C software-base (Google, Apple, Microsoft, Amazon, AMD, Nvidia).
+Switching to \CC is equally impractical as its complex and interdependent type-system (\eg objects, overloading, inheritance, templates) means idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning new \CC features and idioms.
+Finally, projects claiming to be modern C replacements (Cyclone~\cite{Grossman06}, Polymorphic C~\cite{Smith98}, GObject~\cite{GObject}) do not retain C backwards compatibility in syntax, programing model, or semantic compatibility;
+these languages are equivalent to switching to a different programming language.
+Hence, rewriting and retraining costs for other languages are prohibitive when companies have a large C software-base (Google, Apple, Microsoft, Amazon, AMD, Nvidia).
 
 
@@ -100,10 +105,9 @@
 Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}.
 However, most programming languages are only partially explained by their standard's manual(s).
-When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
-Often other C compilers \emph{must} mimic @gcc@ because a large part of the C library (runtime) system (@glibc@ on Linux) contains @gcc@ features.
+When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}, because a large part of the C library (runtime) system (@glibc@ on Linux) contains @gcc@ features.
 Some key aspects of C need to be explained and understood by quoting from the language reference manual.
 However, to illustrate actual program semantics, this thesis constructs multiple small programs whose behaviour exercises a particular point and then confirms the behaviour by running the program using multiple @gcc@ compilers.
 These example programs show
-\begin{itemize}
+\begin{itemize}[itemsep=0pt]
 	\item if the compiler accepts or rejects certain syntax,
 	\item prints output to buttress a behavioural claim,
@@ -111,5 +115,5 @@
 \end{itemize}
 These programs are tested across @gcc@ versions 8--14 and clang versions 10--14 running on ARM, AMD, and Intel architectures.
-Any discovered anomalies among compilers, versions, or architectures is discussed.
+Any discovered anomalies among compilers, versions, or architectures are discussed.
 In general, it is never clear whether the \emph{truth} lies in the C standard or the compiler(s), which may be true for other programming languages.
 
@@ -118,13 +122,13 @@
 
 Overall, this work has produced significant syntactic and semantic improvements to C's arrays, linked-lists and string types.
-As well, a strong plan for general iteration has been sketched out.
+% As well, a strong plan for general iteration has been sketched out.
 The following are the detailed contributions, where performance and safety were always the motivating factors.
 
 \subsection{Array}
 
-This work's array improvements are:
+The improvements to C arrays are:
 \begin{enumerate}[leftmargin=*]
 \item
-Introduce a small number of subtle changes to the typing rules for the C array, while still achieving significant backwards compatibility
+Introduce a small number of subtle changes to the typing rules for the C array, while still achieving significant backwards compatibility.
 \item
 Create a new polymorphic mechanism in the \CFA @forall@ clause to specify array dimension values, similar to a fixed-typed parameter in a \CC \lstinline[language=C++]{template}.
