Index: doc/theses/mike_brooks_MMath/intro.tex
===================================================================
--- doc/theses/mike_brooks_MMath/intro.tex	(revision 1661ad7544afa80f4c0bca16476b62d7a8d0eab6)
+++ doc/theses/mike_brooks_MMath/intro.tex	(revision 16915b17690104542a0a42a110ccf4a6d34aacd6)
@@ -1,26 +1,29 @@
 \chapter{Introduction}
 
-All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
-Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
-For all three types, there is some corresponding mechanism for iterating through the structure, where the iterator flexibility varies with the kind of structure and ingenuity of the iterator implementor.
+All modern programming languages provide three high-level containers (collections): array, linked-list, and string.
+Often array is part of the programming language, while linked-list is built from (recursive) pointer types, and string from a combination of array and linked-list.
+For all three types, languages supply varying degrees of high-level mechanism for manipulating these objects at the bulk level and at the component level, such as array copy, slicing and iterating.
+
+This work looks at extending these three foundational container types in the programming language \CFA, which is a new dialect of the C programming language.
+A primary goal of \CFA~\cite{Cforall} is 99\% backward compatibility with C, while maintaining a look and feel that matches with C programmer experience and intuition.
+This goal requires ``thinking inside the box'' to engineer new features that ``work and play'' with C and its massive legacy code-base.
+An additional goal is balancing good performance with safety.
 
 
 \section{Array}
 
-An array provides a homogeneous container with $O(1)$ access to elements using subscripting (some form of pointer arithmetic).
+An array provides a homogeneous container with $O(1)$ access to elements using subscripting.
 The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
 For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
-Because array layout has contiguous components, subscripting is a computation.
-However, the computation can exceed the array bounds resulting in programming errors and security violations~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
-The goal is to provide good performance with safety.
+Because array layout has contiguous components, subscripting is a computation (some form of pointer arithmetic).
 
 
 \section{Linked list}
 
-A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
+A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations that normally involve pointer chasing.
 Subscripting by value is sometimes available, \eg hash table.
 Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
 If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack;
-otherwise, elements are heap allocated and explicitly/implicitly managed.
+otherwise, elements are heap allocated with explicitly/implicitly managed.
 
 
@@ -28,15 +31,30 @@
 
 A string provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
-What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements, \eg @index@ and @substr@.
+What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing, \eg @index@ and @substr@.
 Subscripting individual elements is often available.
-Often the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing.
+Therefore, the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing.
 The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
+Often string management is separate from heap management, \ie strings roll their own heap.
 
 
 \section{Motivation}
 
-The goal of this work is to introduce safe and complex versions of array, link lists, and strings into the programming language \CFA~\cite{Cforall}, which is based on C.
-Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
-Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
+The primary motivation for this work is two fold:
+\begin{enumerate}[leftmargin=*]
+\item
+These three aspects of C are extremely difficult to understand, teach, and get right because they are correspondingly extremely low level.
+Providing higher-level versions of these containers in \CFA is a major component of the primary goal.
+\item
+These three aspects of C cause the greatest safety issues because there are few or no safe guards when a programmer misunderstands or misuses these features~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
+Estimates suggest 50\%~\cite{Mendio24} of total reported open-source vulnerabilities occur in C resulting from errors using these facilities (memory errors), providing the major hacker attack-vectors.
+\end{enumerate}
+Both White House~\cite{WhiteHouse24} and DARPA~\cite{DARPA24} recently released a recommendation to move away from C and \CC, because of cybersecurity threats exploiting vulnerabilities in these older languages.
+Hardening these three types goes a long way to make the majority of C programs safer.
+
+
+While multiple new languages purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the new runtime uses garage collection.
+Furthermore, these languages must still interact with the underlying C operating system through fragile, type-unsafe, interlanguage-communication.
+Switching to \CC is equally impractical as its complex and interdependent type-system (\eg objects, inheritance, templates) means idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
+Hence, rewriting and retraining costs for these languages can be prohibitive for companies with a large C software-base (Google, Apple, Microsoft, Amazon, AMD, Nvidia).
 
 
@@ -46,49 +64,15 @@
 However, most programming languages are only partially explained by standard's manuals.
 When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
-Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
-While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
+Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system (@glibc@ on Linux) contains @gcc@ features.
+While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, my approach in this thesis is to devise a program, whose behaviour exercises a point at issue, and shows its behaviour.
 These example programs show
-\begin{itemize}
-	\item the compiler accepts or rejects certain syntax,
+\begin{itemize}[leftmargin=*]
+	\item if the compiler accepts or rejects certain syntax,
 	\item prints output to buttress a claim of behaviour,
-	\item executes without triggering any embedded assertions testing pre/post-assertions or invariants.
+	\item or executes without triggering any embedded assertions testing pre/post-assertions or invariants.
 \end{itemize}
 This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
 Any discovered anomalies among compilers or versions is discussed.
-In all case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
-
-
-\subsection{Ill-typed expressions}
-
-C reports many ill-typed expressions as warnings.
-For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
-\lstinput{12-15}{bkgd-c-tyerr.c}
-with warnings:
-\begin{cfa}
-warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
-warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
-\end{cfa}
-Similarly,
-\lstinput{17-19}{bkgd-c-tyerr.c}
-with warning:
-\begin{cfa}
-warning: passing argument 1 of 'f' from incompatible pointer type
-note: expected 'void (*)(void)' but argument is of type 'float *'
-\end{cfa}
-with a segmentation fault at runtime.
-Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
-Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unsed variable.
-In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
-Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
-
-% That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
-% Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
-
-% A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
-% The behaviour (whose absence is unprovable) is neither minor nor unlikely.
-% The rejection shows that the program is ill-typed.
-% 
-% Yet, the rejection presents as a GCC warning.
-% *1  TAPL-pg1 definition of a type system
+In all case, it is never clear whether the \emph{truth} lies in the compiler or the C standard.
 
 
Index: doc/theses/mike_brooks_MMath/string.tex
===================================================================
--- doc/theses/mike_brooks_MMath/string.tex	(revision 1661ad7544afa80f4c0bca16476b62d7a8d0eab6)
+++ doc/theses/mike_brooks_MMath/string.tex	(revision 16915b17690104542a0a42a110ccf4a6d34aacd6)
@@ -1,5 +1,7 @@
 \chapter{String}
 
-\section{String}
+
+
+
 
 \subsection{Logical overlap}
Index: doc/theses/mike_brooks_MMath/uw-ethesis.bib
===================================================================
--- doc/theses/mike_brooks_MMath/uw-ethesis.bib	(revision 1661ad7544afa80f4c0bca16476b62d7a8d0eab6)
+++ doc/theses/mike_brooks_MMath/uw-ethesis.bib	(revision 16915b17690104542a0a42a110ccf4a6d34aacd6)
@@ -114,2 +114,11 @@
     pages	= {53-60},
 }
+
+
+@misc{Mendio24,
+    contributer	= {pabuhr@plg},
+    title	= {What are the most secure programming languages?},
+    author	= {Mend.io (White Source Ltd.)},
+    year	= 2024,
+    howpublished= {\url{https://www.mend.io/most-secure-programming-languages}},
+}
