Changes in / [774c97e:6abb6dc]


Ignore:
Location:
doc
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • doc/bibliography/pl.bib

    r774c97e r6abb6dc  
    30883088    note        = {WikipediA},
    30893089    howpublished= {\url{http://www.akkadia.org/drepper/tls.pdf}},
    3090 }
    3091 
    3092 @misc{DARPA24,
    3093     contributer = {pabuhr@plg},
    3094     title       = {Eliminating Memory Safety Vulnerabilities Once and For All},
    3095     author      = {Defense Advanced Research Projects Agency},
    3096     year        = 2024,
    3097     month       = jul,
    3098     howpublished= {\url{https://www.darpa.mil/news-events/2024-07-31a}},
    30993090}
    31003091
     
    76957686}
    76967687
    7697 @misc{WhiteHouse24,
    7698     contributer = {pabuhr@plg},
    7699     title       = {Part {II}: Securing the Building Blocks of Cyberspace},
    7700     author      = {U.S. Federal Government},
    7701     year        = 2024,
    7702     howpublished= {\url{https://www.whitehouse.gov/wp-content/uploads/2024/02/Final-ONCD-Technical-Report.pdf}},
    7703 }
    7704 
    77057688@inproceedings{Chen07,
    77067689    keywords    = {chip multiprocessors, constructive cache sharing, parallel depth first, scheduling algorithms, thread granularity, work stealing, working set profiling},
  • doc/theses/mike_brooks_MMath/intro.tex

    r774c97e r6abb6dc  
    11\chapter{Introduction}
    22
    3 All modern programming languages provide three high-level containers (collections): array, linked-list, and string.
    4 Often array is part of the programming language, while linked-list is built from (recursive) pointer types, and string from a combination of array and linked-list.
    5 For all three types, languages supply varying degrees of high-level mechanism for manipulating these objects at the bulk level and at the component level, such as array copy, slicing and iterating.
    6 
    7 This work looks at extending these three foundational container types in the programming language \CFA, which is a new dialect of the C programming language.
    8 A primary goal of \CFA~\cite{Cforall} is 99\% backward compatibility with C, while maintaining a look and feel that matches with C programmer experience and intuition.
    9 This goal requires ``thinking inside the box'' to engineer new features that ``work and play'' with C and its massive legacy code-base.
    10 An additional goal is balancing good performance with safety.
     3All modern programming languages provide three high-level containers (collection): array, linked-list, and string.
     4Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
     5For all three types, there is some corresponding mechanism for iterating through the structure, where the iterator flexibility varies with the kind of structure and ingenuity of the iterator implementor.
    116
    127
    138\section{Array}
    149
    15 An array provides a homogeneous container with $O(1)$ access to elements using subscripting.
     10An array provides a homogeneous container with $O(1)$ access to elements using subscripting (some form of pointer arithmetic).
    1611The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
    1712For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
    18 Because array layout has contiguous components, subscripting is a computation (some form of pointer arithmetic).
     13Because array layout has contiguous components, subscripting is a computation.
     14However, the computation can exceed the array bounds resulting in programming errors and security violations~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
     15The goal is to provide good performance with safety.
    1916
    2017
    2118\section{Linked list}
    2219
    23 A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations that normally involve pointer chasing.
     20A linked-list provides a homogeneous container often with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
    2421Subscripting by value is sometimes available, \eg hash table.
    2522Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
    2623If a programming language allows pointer to stack storage, linked-list types can be allocated on the stack;
    27 otherwise, elements are heap allocated with explicitly/implicitly managed.
     24otherwise, elements are heap allocated and explicitly/implicitly managed.
    2825
    2926
     
    3128
    3229A string provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
    33 What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing, \eg @index@ and @substr@.
     30What differentiates a string from other types in that its operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements, \eg @index@ and @substr@.
    3431Subscripting individual elements is often available.
    35 Therefore, the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing.
     32Often the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg search, analysing, composing, and decomposing.
    3633The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
    37 Often string management is separate from heap management, \ie strings roll their own heap.
    3834
    3935
    4036\section{Motivation}
    4137
    42 The primary motivation for this work is two fold:
    43 \begin{enumerate}[leftmargin=*]
    44 \item
    45 These three aspects of C are extremely difficult to understand, teach, and get right because they are correspondingly extremely low level.
    46 Providing higher-level versions of these containers in \CFA is a major component of the primary goal.
    47 \item
    48 These three aspects of C cause the greatest safety issues because there are few or no safe guards when a programmer misunderstands or misuses these features~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
    49 Estimates suggest 50\%~\cite{Mendio24} of total reported open-source vulnerabilities occur in C resulting from errors using these facilities (memory errors), providing the major hacker attack-vectors.
    50 \end{enumerate}
    51 Both White House~\cite{WhiteHouse24} and DARPA~\cite{DARPA24} recently released a recommendation to move away from C and \CC, because of cybersecurity threats exploiting vulnerabilities in these older languages.
    52 Hardening these three types goes a long way to make the majority of C programs safer.
    53 
    54 
    55 While multiple new languages purport to be systems languages replacing C, the reality is that rewriting massive C code-bases is impractical and a non-starter if the new runtime uses garage collection.
    56 Furthermore, these languages must still interact with the underlying C operating system through fragile, type-unsafe, interlanguage-communication.
    57 Switching to \CC is equally impractical as its complex and interdependent type-system (\eg objects, inheritance, templates) means idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning \CC.
    58 Hence, rewriting and retraining costs for these languages can be prohibitive for companies with a large C software-base (Google, Apple, Microsoft, Amazon, AMD, Nvidia).
     38The goal of this work is to introduce safe and complex versions of array, link lists, and strings into the programming language \CFA~\cite{Cforall}, which is based on C.
     39Unfortunately, to make C better, while retaining a high level of backwards compatibility, requires a significant knowledge of C's design.
     40Hence, it is assumed the reader has a medium knowledge of C or \CC, on which extensive new C knowledge is built.
    5941
    6042
     
    6446However, most programming languages are only partially explained by standard's manuals.
    6547When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
    66 Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system (@glibc@ on Linux) contains @gcc@ features.
    67 While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, my approach in this thesis is to devise a program, whose behaviour exercises a point at issue, and shows its behaviour.
     48Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
     49While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program, whose behaviour exercises the point at issue, and shows its behaviour.
    6850These example programs show
    69 \begin{itemize}[leftmargin=*]
    70         \item if the compiler accepts or rejects certain syntax,
     51\begin{itemize}
     52        \item the compiler accepts or rejects certain syntax,
    7153        \item prints output to buttress a claim of behaviour,
    72         \item or executes without triggering any embedded assertions testing pre/post-assertions or invariants.
     54        \item executes without triggering any embedded assertions testing pre/post-assertions or invariants.
    7355\end{itemize}
    7456This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
    7557Any discovered anomalies among compilers or versions is discussed.
    76 In all case, it is never clear whether the \emph{truth} lies in the compiler or the C standard.
     58In all case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
     59
     60
     61\subsection{Ill-typed expressions}
     62
     63C reports many ill-typed expressions as warnings.
     64For example, these attempts to assign @y@ to @x@ and vice-versa are obviously ill-typed.
     65\lstinput{12-15}{bkgd-c-tyerr.c}
     66with warnings:
     67\begin{cfa}
     68warning: assignment to 'float *' from incompatible pointer type 'void (*)(void)'
     69warning: assignment to 'void (*)(void)' from incompatible pointer type 'float *'
     70\end{cfa}
     71Similarly,
     72\lstinput{17-19}{bkgd-c-tyerr.c}
     73with warning:
     74\begin{cfa}
     75warning: passing argument 1 of 'f' from incompatible pointer type
     76note: expected 'void (*)(void)' but argument is of type 'float *'
     77\end{cfa}
     78with a segmentation fault at runtime.
     79Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
     80Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg unsed variable.
     81In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
     82Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
     83
     84% That @f@'s attempt to call @g@ fails is not due to 3.14 being a particularly unlucky choice of value to put in the variable @pi@.
     85% Rather, it is because obtaining a program that includes this essential fragment, yet exhibits a behaviour other than "doomed to crash," is a matter for an obfuscated coding competition.
     86
     87% A "tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute"*1 rejected the program.
     88% The behaviour (whose absence is unprovable) is neither minor nor unlikely.
     89% The rejection shows that the program is ill-typed.
     90%
     91% Yet, the rejection presents as a GCC warning.
     92% *1  TAPL-pg1 definition of a type system
    7793
    7894
  • doc/theses/mike_brooks_MMath/string.tex

    r774c97e r6abb6dc  
    11\chapter{String}
    22
    3 
    4 
    5 
     3\section{String}
    64
    75\subsection{Logical overlap}
  • doc/theses/mike_brooks_MMath/uw-ethesis.bib

    r774c97e r6abb6dc  
    114114    pages       = {53-60},
    115115}
    116 
    117 
    118 @misc{Mendio24,
    119     contributer = {pabuhr@plg},
    120     title       = {What are the most secure programming languages?},
    121     author      = {Mend.io (White Source Ltd.)},
    122     year        = 2024,
    123     howpublished= {\url{https://www.mend.io/most-secure-programming-languages}},
    124 }
Note: See TracChangeset for help on using the changeset viewer.