Changeset c979afa


Ignore:
Timestamp:
Mar 12, 2026, 10:38:53 PM (7 hours ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Parents:
43b6516
Message:

final proofread of introduction chapter

Location:
doc/theses/mike_brooks_MMath
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/intro.tex

    r43b6516 rc979afa  
    1111In a study of software exploits in the U.S. National Vulnerability Database over 2013--2017, the top reported vulnerability is buffer errors (in the sense of misused array bounds), among 19 vulnerability categories~\cite{Cifuentes19}.
    1212
    13 While these figures address memory errors, defined without limit to the three primary collections, these authors' explanations of memory errors, among other C analysis efforts, suggest improving on C's language/library options for working with these three collections would be widely beneficial.
     13While these figures address memory errors, not limited to the three primary collections, the authors' explanations of memory errors suggest improving on C's language/library options for working with the three collections would be beneficial.
    1414Every one of the safety-statistic sources above includes array bounds in its first characterization of unsafe programming-language features.
    1515Of the four studies that deal exclusively with memory safety (excluding~\cite{Cifuentes19}), all put @malloc@--@free@ upfront as well.
    16 The one that specifically surveys C features, \cite{Oorschot23}, also has similarly prominent blame on both of: the null-terminated text idiom, with its support/mandate in the C standard library; the unprotected nature of C pointers.
    17 An effort at translating C to safe Rust~\cite{Emre2022} has pointers as the stand-out difficult aspect to analyze.\footnote{
    18         \cite{Emre2022} demonstrates the difficulty of analyzing C pointers for safety (while tacking some of this difficult problem).
    19         It first puts a corpus of C programs through a naive Rust translation, which ``succeeds'' by leaving many uses of Rust's \lstinline{unsafe} marker.
    20         Then, it analyzes these uses and presents a technique for reducing those uses that hinge exclusively on referenced object lifetime.
    21         Pointer dereference accounts for 80\% of the cases that the naive translation did not make safe (Table 3.2, p33).
    22         Of these, 10\% are eligible for the work's own safety improvement technique, while this filtering leaves 75\% with no single cause of unsafety determined (Table 3.5, p39).
    23         The presented technique succeeds at making 75\% of the eligible initially unsafe dereferences safe, by inferring the necessary lifetime annotations (Table 4.2, p83).
    24         This result leaves, per 1000 LOC, 3.3 unremovable unsafe dereferences that are understood to evade lifetime analysis, among 220 gross unsafe dereferences (Tables 3.5, 4.2 and 3.1, p27).
    25         % 3.3 = 1398/413,428 * 1000
    26         % 220 ~= (99,101 [@tab3.5] - 9631 [@tab4.2] ) / 413,428 [@tab 3.1] * 1000
    27 }
     16The specific survey of C features, \cite{Oorschot23}, also places blame on:
     17\begin{itemize}
     18\item
     19the null-terminated text idiom, with its support/mandate in the C standard library and
     20\item
     21the unprotected nature of C pointers.
     22\end{itemize}
     23
     24An effort at translating C to safe Rust~\cite{Emre2022} has pointers as the stand-out difficult aspect to analyze.
     25It demonstrates the difficulty of analyzing C pointers for safety.
     26It first puts a corpus of C programs through a naive Rust translation, which ``succeeds'' by leaving many uses of Rust's \lstinline{unsafe} marker.
     27Then, it analyzes these uses and presents a technique for reducing the uses that hinge exclusively on referenced object lifetime.
     28Pointer dereference accounts for 80\% of the cases that the na\"{i}ve translation did not make safe (Table 3.2, p.~33).
     29Of these, 10\% are eligible for the work's own safety improvement technique, while this filtering leaves 75\% with no single cause of unsafety determined (Table 3.5, p.~39).
     30The presented technique succeeds at making 75\% of the eligible initially unsafe dereferences safe, by inferring the necessary lifetime annotations (Table 4.2, p.~83).
     31This result leaves, per 1000 LOC, 3.3 unremovable unsafe dereferences that are understood to evade lifetime analysis, among 220 gross unsafe dereferences (Tables 3.5, 4.2 and 3.1, p.~27).
     32% 3.3 = 1398/413,428 * 1000
     33% 220 ~= (99,101 [@tab3.5] - 9631 [@tab4.2] ) / 413,428 [@tab 3.1] * 1000
     34
    2835A tool for analyzing C code's use of linked data structures (including, \eg skip list and binary tree) found variations of the linked-list shape to be present in most of the programs in its corpus~\cite{White2016}.
    29 So, C's array unsafety is infamous, its string pattern necessitates explicit storage management (also infamous), and user-written linked lists are an attrictively recognizable player in the arena of (infamously) unsafe raw pointer uses.
    30 Therefore, hardening the three primary collections goes a long way to making the majority of C programs safer, and eliminating major hacker attack vectors.
     36So, C's array unsafety is infamous, its string pattern necessitates explicit storage management (also infamous), and user-written linked lists are a recognizable player in the arena of (infamously) unsafe raw pointer uses.
     37Therefore, hardening the three primary collections goes a long way to making the majority of C programs safer, eliminating major hacker attack vectors.
    3138
    3239This work looks at extending these foundational container types in the programming language \CFA, which is a new dialect of the C programming language.
     
    4653For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
    4754Because array layout has contiguous components, subscripting is a computation, \ie some form of pointer arithmetic.
     55Not included in this requirement are pseudo-arrays with arbitrary keys, \eg hash table subscripted by a string key (see below).
    4856
    4957C provides a simple array type as a language feature.
    5058However, it adopts the controversial position of treating pointer and array as twins, leading to multiple problems.
    51 The way that arrays are typicaly passed around a program removes the information necessary to do bound checks.
     59Hence, the way that arrays are typically passed around in a program removes the information necessary to do bound checks.
    5260
    5361
     
    7785Therefore, the cost of a string operation is usually less important than the power of the operation to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing string text.
    7886The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
    79 In many cases, string management is separate from heap management, \ie strings roll their own heap.
     87In many cases, string management is separate from heap management, \ie strings roll-their-own heap.
    8088
    8189The C string type is just a character array and requires user storage-management to handle varying size.
    8290C adopts a terminating sentinel character, @'\0'@, to mark the end of a variable-length character-array versus a preceding length field.
    83 Hence, the sentinel character is excluded from a string and determining the string length is an $O(N)$ operation.
     91Hence, the sentinel character is excluded as an element within a string and determining the string length is an $O(N)$ operation.
    8492The C standard library includes a number of high-level operations for working with this representation.
    85 Most of these operations are awkward to use and error prone.
     93Most of these operations are awkward to use and highly error prone.
    8694
    8795
     
    103111The three primary container types in C are difficult to understand, teach, and get right because they are too low level.
    104112Providing higher-level, feature-rich versions of these containers in \CFA is a major component of the primary goal.
    105 The result is a simplify programming experience, which increases productivity and maintainability.
    106 \item
    107 The new container types must be as correct and safe to use as those in other modern programming languages, which has been shown to a primary concern of industry, government, and military~\cite{ONCD}.
     113The result is a simplified programming experience, which increases productivity and maintainability.
     114\item
     115The new container types must be as correct and safe to use as those in other modern programming languages, to elide the concerns of industry, government, and military~\cite{ONCD}.
    108116Prior approaches focus on out-of-bound array accesses using a model-based approach (ASCET) in embedded systems (\eg cars)~\cite{Blache19}, and general and null-terminated string arrays using a \CC template syntax (Checked C)~\cite{Elliott18,Ruef19}.
    109117Both White House~\cite{WhiteHouse24} and DARPA~\cite{DARPA24} recently released a recommendation to \emph{move away} from C and \CC, because of cybersecurity threats exploiting vulnerabilities in these languages.
     
    111119\end{enumerate}
    112120
    113 While new languages, \eg Go, Rust, Swift, purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the runtime uses garbage collection.
     121While new languages, \eg Go, Rust, Swift, purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the runtime uses garbage collection~\cite{Brinker26}.
    114122Even assuming automated conversion of C programs to other safe languages, who will maintain this massive new code-base?
    115123Furthermore, new languages must still interact with the underlying C operating system through fragile, type-unsafe, interlanguage-communication.
     
    142150Overall, this work has produced significant syntactic and semantic improvements to C's arrays, linked-lists and string types.
    143151% As well, a strong plan for general iteration has been sketched out.
    144 The following are the detailed contributions, where performance and safety were always the motivating factors.
     152The following are the detailed contributions, where performance and safety are always the motivating factors.
    145153
    146154\subsection{Array}
     
    172180\subsection{String}
    173181
    174 The new string type and its runtime library are available through @#include <string.hfa>@.
     182A new string type and its runtime library are available through @#include <string.hfa>@.
    175183The library offers a type where basic usage is comparable to the \CC @string@ type, including analogous coexistence with raw-character pointers.
    176184Its implementation, however, follows different principles, enabling programs to work with strings by value, without incurring excessive copying.
  • doc/theses/mike_brooks_MMath/uw-ethesis.bib

    r43b6516 rc979afa  
    1 % Bibliography of key references for "LaTeX for Thesis and Large Documents"
     1f% Bibliography of key references for "LaTeX for Thesis and Large Documents"
    22% For use with BibTeX
    33
     
    126126
    127127@phdthesis{Emre2022,
    128     author  = "Mehmet Emre",
    129     title   = "Translating C to Safe Rust: Reasoning about Pointer Types and Lifetimes",
    130     school  = "UC Santa Barbara",
    131     year    = "2022"
     128    author      = "Mehmet Emre",
     129    title       = "Translating C to Safe Rust: Reasoning about Pointer Types and Lifetimes",
     130    school      = "UC Santa Barbara",
     131    year        = 2022
    132132}
    133133
    134134@inproceedings{White2016,
    135     author = {White, David H. and Rupprecht, Thomas and L\"{u}ttgen, Gerald},
    136     title = {DSI: an evidence-based approach to identify dynamic data structures in C programs},
    137     year = {2016},
    138     isbn = {9781450343909},
    139     publisher = {Association for Computing Machinery},
    140     address = {New York, NY, USA},
    141     url = {https://doi.org/10.1145/2931037.2931071},
    142     doi = {10.1145/2931037.2931071},
    143     booktitle = {Proceedings of the 25th International Symposium on Software Testing and Analysis},
    144     pages = {259–269},
    145     numpages = {11},
    146     keywords = {Data structure identification, dynamic data structures, pointer programs, program comprehension},
    147     location = {Saarbr\"{u}cken, Germany},
    148     series = {ISSTA 2016}
     135    author      = {White, David H. and Rupprecht, Thomas and L\"{u}ttgen, Gerald},
     136    title       = {DSI: an evidence-based approach to identify dynamic data structures in C programs},
     137    year        = {2016},
     138    isbn        = {9781450343909},
     139    publisher   = {Association for Computing Machinery},
     140    address     = {New York, NY, USA},
     141    url = {https://doi.org/10.1145/2931037.2931071},
     142    doi = {10.1145/2931037.2931071},
     143    booktitle   = {Proceedings of the 25th International Symposium on Software Testing and Analysis},
     144    pages       = {259-269},
     145    numpages    = {11},
     146    keywords    = {Data structure identification, dynamic data structures, pointer programs, program comprehension},
     147    location    = {Saarbr\"{u}cken, Germany},
     148    series      = {ISSTA 2016}
    149149}
    150150
     
    180180}
    181181
     182@article{Brinker26,
     183    author      = {Brinker, Andrew Lilley},
     184    title       = {Memory Safety for Skeptics},
     185    year        = {2026},
     186    publisher   = {ACM},
     187    address     = {New York, NY, USA},
     188    volume      = {69},
     189    number      = {2},
     190    journal     = {CACM},
     191    month       = jan,
     192    pages       = {52-58},
     193}
Note: See TracChangeset for help on using the changeset viewer.