Changeset c979afa for doc/theses/mike_brooks_MMath/intro.tex
- Timestamp:
- Mar 12, 2026, 10:38:53 PM (19 hours ago)
- Branches:
- master
- Parents:
- 43b6516
- File:
-
- 1 edited
-
doc/theses/mike_brooks_MMath/intro.tex (modified) (7 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/intro.tex
r43b6516 rc979afa 11 11 In a study of software exploits in the U.S. National Vulnerability Database over 2013--2017, the top reported vulnerability is buffer errors (in the sense of misused array bounds), among 19 vulnerability categories~\cite{Cifuentes19}. 12 12 13 While these figures address memory errors, defined without limit to the three primary collections, these authors' explanations of memory errors, among other C analysis efforts, suggest improving on C's language/library options for working with these three collections would be widelybeneficial.13 While these figures address memory errors, not limited to the three primary collections, the authors' explanations of memory errors suggest improving on C's language/library options for working with the three collections would be beneficial. 14 14 Every one of the safety-statistic sources above includes array bounds in its first characterization of unsafe programming-language features. 15 15 Of the four studies that deal exclusively with memory safety (excluding~\cite{Cifuentes19}), all put @malloc@--@free@ upfront as well. 16 The one that specifically surveys C features, \cite{Oorschot23}, also has similarly prominent blame on both of: the null-terminated text idiom, with its support/mandate in the C standard library; the unprotected nature of C pointers. 17 An effort at translating C to safe Rust~\cite{Emre2022} has pointers as the stand-out difficult aspect to analyze.\footnote{ 18 \cite{Emre2022} demonstrates the difficulty of analyzing C pointers for safety (while tacking some of this difficult problem). 19 It first puts a corpus of C programs through a naive Rust translation, which ``succeeds'' by leaving many uses of Rust's \lstinline{unsafe} marker. 20 Then, it analyzes these uses and presents a technique for reducing those uses that hinge exclusively on referenced object lifetime. 21 Pointer dereference accounts for 80\% of the cases that the naive translation did not make safe (Table 3.2, p33). 22 Of these, 10\% are eligible for the work's own safety improvement technique, while this filtering leaves 75\% with no single cause of unsafety determined (Table 3.5, p39). 23 The presented technique succeeds at making 75\% of the eligible initially unsafe dereferences safe, by inferring the necessary lifetime annotations (Table 4.2, p83). 24 This result leaves, per 1000 LOC, 3.3 unremovable unsafe dereferences that are understood to evade lifetime analysis, among 220 gross unsafe dereferences (Tables 3.5, 4.2 and 3.1, p27). 25 % 3.3 = 1398/413,428 * 1000 26 % 220 ~= (99,101 [@tab3.5] - 9631 [@tab4.2] ) / 413,428 [@tab 3.1] * 1000 27 } 16 The specific survey of C features, \cite{Oorschot23}, also places blame on: 17 \begin{itemize} 18 \item 19 the null-terminated text idiom, with its support/mandate in the C standard library and 20 \item 21 the unprotected nature of C pointers. 22 \end{itemize} 23 24 An effort at translating C to safe Rust~\cite{Emre2022} has pointers as the stand-out difficult aspect to analyze. 25 It demonstrates the difficulty of analyzing C pointers for safety. 26 It first puts a corpus of C programs through a naive Rust translation, which ``succeeds'' by leaving many uses of Rust's \lstinline{unsafe} marker. 27 Then, it analyzes these uses and presents a technique for reducing the uses that hinge exclusively on referenced object lifetime. 28 Pointer dereference accounts for 80\% of the cases that the na\"{i}ve translation did not make safe (Table 3.2, p.~33). 29 Of these, 10\% are eligible for the work's own safety improvement technique, while this filtering leaves 75\% with no single cause of unsafety determined (Table 3.5, p.~39). 30 The presented technique succeeds at making 75\% of the eligible initially unsafe dereferences safe, by inferring the necessary lifetime annotations (Table 4.2, p.~83). 31 This result leaves, per 1000 LOC, 3.3 unremovable unsafe dereferences that are understood to evade lifetime analysis, among 220 gross unsafe dereferences (Tables 3.5, 4.2 and 3.1, p.~27). 32 % 3.3 = 1398/413,428 * 1000 33 % 220 ~= (99,101 [@tab3.5] - 9631 [@tab4.2] ) / 413,428 [@tab 3.1] * 1000 34 28 35 A tool for analyzing C code's use of linked data structures (including, \eg skip list and binary tree) found variations of the linked-list shape to be present in most of the programs in its corpus~\cite{White2016}. 29 So, C's array unsafety is infamous, its string pattern necessitates explicit storage management (also infamous), and user-written linked lists are a n attrictivelyrecognizable player in the arena of (infamously) unsafe raw pointer uses.30 Therefore, hardening the three primary collections goes a long way to making the majority of C programs safer, andeliminating major hacker attack vectors.36 So, C's array unsafety is infamous, its string pattern necessitates explicit storage management (also infamous), and user-written linked lists are a recognizable player in the arena of (infamously) unsafe raw pointer uses. 37 Therefore, hardening the three primary collections goes a long way to making the majority of C programs safer, eliminating major hacker attack vectors. 31 38 32 39 This work looks at extending these foundational container types in the programming language \CFA, which is a new dialect of the C programming language. … … 46 53 For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap. 47 54 Because array layout has contiguous components, subscripting is a computation, \ie some form of pointer arithmetic. 55 Not included in this requirement are pseudo-arrays with arbitrary keys, \eg hash table subscripted by a string key (see below). 48 56 49 57 C provides a simple array type as a language feature. 50 58 However, it adopts the controversial position of treating pointer and array as twins, leading to multiple problems. 51 The way that arrays are typicaly passed arounda program removes the information necessary to do bound checks.59 Hence, the way that arrays are typically passed around in a program removes the information necessary to do bound checks. 52 60 53 61 … … 77 85 Therefore, the cost of a string operation is usually less important than the power of the operation to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing string text. 78 86 The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages. 79 In many cases, string management is separate from heap management, \ie strings roll theirown heap.87 In many cases, string management is separate from heap management, \ie strings roll-their-own heap. 80 88 81 89 The C string type is just a character array and requires user storage-management to handle varying size. 82 90 C adopts a terminating sentinel character, @'\0'@, to mark the end of a variable-length character-array versus a preceding length field. 83 Hence, the sentinel character is excluded froma string and determining the string length is an $O(N)$ operation.91 Hence, the sentinel character is excluded as an element within a string and determining the string length is an $O(N)$ operation. 84 92 The C standard library includes a number of high-level operations for working with this representation. 85 Most of these operations are awkward to use and error prone.93 Most of these operations are awkward to use and highly error prone. 86 94 87 95 … … 103 111 The three primary container types in C are difficult to understand, teach, and get right because they are too low level. 104 112 Providing higher-level, feature-rich versions of these containers in \CFA is a major component of the primary goal. 105 The result is a simplif yprogramming experience, which increases productivity and maintainability.106 \item 107 The new container types must be as correct and safe to use as those in other modern programming languages, which has been shown to a primary concernof industry, government, and military~\cite{ONCD}.113 The result is a simplified programming experience, which increases productivity and maintainability. 114 \item 115 The new container types must be as correct and safe to use as those in other modern programming languages, to elide the concerns of industry, government, and military~\cite{ONCD}. 108 116 Prior approaches focus on out-of-bound array accesses using a model-based approach (ASCET) in embedded systems (\eg cars)~\cite{Blache19}, and general and null-terminated string arrays using a \CC template syntax (Checked C)~\cite{Elliott18,Ruef19}. 109 117 Both White House~\cite{WhiteHouse24} and DARPA~\cite{DARPA24} recently released a recommendation to \emph{move away} from C and \CC, because of cybersecurity threats exploiting vulnerabilities in these languages. … … 111 119 \end{enumerate} 112 120 113 While new languages, \eg Go, Rust, Swift, purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the runtime uses garbage collection .121 While new languages, \eg Go, Rust, Swift, purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the runtime uses garbage collection~\cite{Brinker26}. 114 122 Even assuming automated conversion of C programs to other safe languages, who will maintain this massive new code-base? 115 123 Furthermore, new languages must still interact with the underlying C operating system through fragile, type-unsafe, interlanguage-communication. … … 142 150 Overall, this work has produced significant syntactic and semantic improvements to C's arrays, linked-lists and string types. 143 151 % As well, a strong plan for general iteration has been sketched out. 144 The following are the detailed contributions, where performance and safety were always the motivating factors.152 The following are the detailed contributions, where performance and safety are always the motivating factors. 145 153 146 154 \subsection{Array} … … 172 180 \subsection{String} 173 181 174 Thenew string type and its runtime library are available through @#include <string.hfa>@.182 A new string type and its runtime library are available through @#include <string.hfa>@. 175 183 The library offers a type where basic usage is comparable to the \CC @string@ type, including analogous coexistence with raw-character pointers. 176 184 Its implementation, however, follows different principles, enabling programs to work with strings by value, without incurring excessive copying.
Note:
See TracChangeset
for help on using the changeset viewer.