Changeset 7ca6bf1 for doc/theses
- Timestamp:
- Aug 20, 2025, 12:12:31 PM (6 months ago)
- Branches:
- master, stuck-waitfor-destruct
- Children:
- 1dec8f3
- Parents:
- 9989781 (diff), 7ea4073 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)links above to see all the changes relative to each parent. - Location:
- doc/theses/mike_brooks_MMath
- Files:
-
- 8 edited
-
array.tex (modified) (1 diff)
-
background.tex (modified) (4 diffs)
-
conclusion.tex (modified) (1 diff)
-
intro.tex (modified) (6 diffs)
-
list.tex (modified) (12 diffs)
-
string.tex (modified) (2 diffs)
-
uw-ethesis-frontpgs.tex (modified) (1 diff)
-
uw-ethesis.bib (modified) (1 diff)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/array.tex
r9989781 r7ca6bf1 93 93 In an imperative language like C and \CFA, it is also necessary to discuss side effects, for which an even heavier formalism, like separation logic, is required. 94 94 Secondly, TODO: bash Rust. 95 TODO: cite the crap out of these claims.95 % TODO: cite the crap out of these claims. 96 96 \end{comment} 97 97 -
doc/theses/mike_brooks_MMath/background.tex
r9989781 r7ca6bf1 36 36 % Yet, the rejection presents as a GCC warning. 37 37 % *1 TAPL-pg1 definition of a type system 38 39 % reading C declaration: https://c-faq.com/decl/spiral.anderson.html40 38 41 39 … … 96 94 In spite of its difficulties, I believe that the C's approach to declarations remains plausible, and am comfortable with it; it is a useful unifying principle.~\cite[p.~12]{Ritchie93} 97 95 \end{quote} 98 After all, reading a C array type is easy: just read it from the inside out , and know when to look left and when to look right!96 After all, reading a C array type is easy: just read it from the inside out following the ``clock-wise spiral rule''~\cite{Anderson94}. 99 97 Unfortunately, \CFA cannot correct these operator priority inversions without breaking C compatibility. 100 101 TODO: rephrase to acknowledge the "clockwise rule" https://c-faq.com/decl/spiral.anderson.html102 98 103 99 The alternative solution is for \CFA to provide its own type, variable and routine declarations, using a more intuitive syntax. … … 621 617 In fact, the outermost type constructor (syntactically first dimension) is really the one that determines the parameter flavour. 622 618 623 TODO: add examples of mycode/arrr/bugs/c-dependent/x.cfa:v5102,5103,624 which are shocking how much C ignores. 619 \PAB{TODO: add examples of mycode/arrr/bugs/c-dependent/x.cfa:v5102,5103, 620 which are shocking how much C ignores.} 625 621 626 622 \begin{figure} … … 1261 1257 The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@. 1262 1258 1263 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multi-byte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabic sY-Cree OO).1259 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multi-byte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabic Y-Cree OO). 1264 1260 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi-byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wscanf@/@wprintf@. 1265 1261 The value of a wide-character is implementation-defined, usually a UTF-16 character. -
doc/theses/mike_brooks_MMath/conclusion.tex
r9989781 r7ca6bf1 1 1 \chapter{Conclusion} 2 3 This thesis performed a detailed examination of the three most important high-level containers in many programming languages: array, linked-list, and string. 4 The goal of the work is to make containers easier to use, performant and safer. 5 Since some subset of these three containers are used in almost every program in every programming language, this is a laudable goal. 6 Accomplishing this goal in C is difficult because these features are poorly designed. 7 In contrast, \CFA's advanced type system and language features plus my critical design choices made it possible to provide better support with significant safety. 8 The result is application code that is easier to write, understand, maintain, and safer from hacker attach-vectors. 2 9 3 10 \section{Lists} 4 11 12 The key takeaway for lists is that intrusive lists can be made easy to use, performant by eliminating implicit memory allocation, and able to simulate wrapped lists, whereas wrapped lists cannot simulate intrusive lists. 13 5 14 \section{Arrays} 15 16 The key takeaway for arrays is that the type system must be extended to properly manage array bounds (dimensions) to safely pass arrays to (polymorphic) functions. 17 By adding a special kind of template constant to \CFA, @[N]@, the type system understands array bounds and implicitly associates these bounds with array instances, statically and dynamically, throughout the programming language. 18 Array overruns are no longer possible because all subscripting is checked, as in other modern languages. 19 Subscript checking can be implicitly elided when the compiler is given sufficient information to determine the subscript variable is always in bounds, giving performant execution. 20 21 Safe, complex VLA's is another important feature because it replaces unsafe explicit dynamic allocation. 22 As well, VLA's reduce heap contention in concurrent programs. 23 24 Finally, the ability to slice a higher-dimensional array into subarrays is also a powerful and safety critical feature. 25 6 26 7 27 \section{Strings} 8 28 29 The key takeaway for strings is that providing powerful and safe block operations for manipulating strings is more important than ultra-level performance. 30 Manipulating strings is always going to be expensive because of their dynamic variable sizing, as the hardware rarely has string-level operations, possibly only move and compare. 31 This work designs an expressive set of safe string operations for composing, comparing, and decomposing arbitrary length strings, include complex reading and printing operations. 32 33 Creating bespoke storage management for strings has the advantage of faster, more compact storage management due to string sharing, at the cost of additional external fragmentation between the string and general heaps. 34 With the large amounts of available memory, this approach is a viable tradeoff. 35 36 37 9 38 \section{Future Work} 39 40 All three forms of containers presented in this work are in their nascence, both in design and implementation. 41 This work provides the foundation for future \CFA students to add more functionality along with robust and performant implementations. 42 -
doc/theses/mike_brooks_MMath/intro.tex
r9989781 r7ca6bf1 5 5 For all three types, languages and/or their libraries supply varying degrees of high-level mechanisms for manipulating these objects at the bulk and component level, such as copying, slicing, extracting, and iterating among elements. 6 6 7 Unfortunately, these three aspects of C cause a significant number of memory errors~\cite{ vanOorschot23}.7 Unfortunately, these three aspects of C cause a significant number of memory errors~\cite{Oorschot23}. 8 8 For operating system and browser vendors, who heavily use systems languages, 60\%--70\% of reported software vulnerabilities involved memory errors~\cite{Kehrer23}. 9 9 For Microsoft, 70\% of vulnerabilities addressed via security updates between 2006--2018 are memory safety issues~\cite[slide 10]{Miller19}. … … 21 21 22 22 \section{Array} 23 \label{s:ArrayIntro}24 23 25 24 An array provides a homogeneous container with $O(1)$ access to elements using subscripting. … … 66 65 67 66 67 \begin{comment} 68 68 \section{Iterator} 69 69 … … 72 72 However, the general iteration work is only a sketch for others as future work. 73 73 Nevertheless, sufficed work was done to write out the ideas that developed and how they should apply in the main context of this work. 74 \end{comment} 74 75 75 76 … … 160 161 161 162 163 \begin{comment} 162 164 \subsection{Iterator} 163 165 … … 169 171 This design extends a preexisting proposal to adapt the \CFA (fixed) for-each loop to be more user-pluggable, and builds upon preexisting \CFA coroutines. 170 172 Overall, it simplifies the work a programmer must do to leverage the suspended-state abstraction during iteration. 173 \end{comment} -
doc/theses/mike_brooks_MMath/list.tex
r9989781 r7ca6bf1 88 88 \subsection{Core Design Issues} 89 89 90 The doubly-linked list attaches links intrusively, supports multiple link directions, integrates with user code via the type system, treats its ends uniformly, and identifies a list using an explicit head.90 The doubly-linked list attaches links intrusively, supports multiple link axes, integrates with user code via the type system, treats its ends uniformly, and identifies a list using an explicit head. 91 91 This design covers system and data management issues stated in \VRef{toc:lst:issue}. 92 92 … … 101 101 \begin{figure} 102 102 \lstinput{20-30}{lst-features-intro.run.cfa} 103 \caption[Multiple link directions in \CFA list library]{103 \caption[Multiple link axes in \CFA list library]{ 104 104 Demonstration of the running \lstinline{req} example, done using the \CFA list library. 105 105 This example is equivalent to the three approaches in \VRef[Figure]{fig:lst-issues-attach}. … … 128 128 \end{figure} 129 129 130 \VRef[Figure]{fig:lst-features-multidir} shows how the \CFA library supports multi-inline links, so a node can be on one or more lists simultaneously .130 \VRef[Figure]{fig:lst-features-multidir} shows how the \CFA library supports multi-inline links, so a node can be on one or more lists simultaneously (axes). 131 131 The declaration of @req@ has two inline-inheriting @dlink@ occurrences. 132 132 The first of these gives a type named @req.by_pri@, @req@ inherits from it, and it inherits from @dlink@. … … 150 150 151 151 \caption{ 152 Demonstration of multiple static link directions done in the \CFA list library.152 Demonstration of multiple static link axes done in the \CFA list library. 153 153 The right example is from \VRef[Figure]{fig:lst-issues-multi-static}. 154 154 The left \CFA example does the same job. … … 158 158 159 159 The list library also supports the common case of single directionality more naturally than LQ. 160 Returning to \VRef[Figure]{fig:lst-features-intro}, the single- direction list has no contrived name for the link directionas it uses the default type in the definition of @dlist@;160 Returning to \VRef[Figure]{fig:lst-features-intro}, the single-axis list has no contrived name for the link axis as it uses the default type in the definition of @dlist@; 161 161 in contrast, the LQ list in \VRef[Figure]{f:Intrusive} adds the unnecessary field name @d@. 162 In \CFA, a single directionlist sets up a single inheritance with @dlink@, and the default list axis is to itself.163 164 When operating on a list with several directions and operations that do not take the list head, the list axis can be ambiguous.162 In \CFA, a single axis list sets up a single inheritance with @dlink@, and the default list axis is to itself. 163 164 When operating on a list with several axes and operations that do not take the list head, the list axis can be ambiguous. 165 165 For example, a call like @insert_after( r1, r2 )@ does not have enough information to know which axes to select implicitly. 166 166 Is @r2@ supposed to be the next-priority request after @r1@, or is @r2@ supposed to join the same-requester list of @r1@? … … 171 171 To mitigate this issue, the list library provides a hook for applying the \CFA language's scoping and priority rules. 172 172 \begin{cfa} 173 with ( DLINK_VIA( req, req.pri ) ) insert_after( r1, r2 );173 with ( DLINK_VIA( req, req.pri ) ) insert_after( r1, r2 ); 174 174 \end{cfa} 175 175 Here, the @with@ statement opens the scope of the object type for the expression; 176 hence, the @DLINK_VIA@ result causes one of the list directions to become a more attractive candidate to \CFA's overload resolution.176 hence, the @DLINK_VIA@ result causes one of the list axes to become a more attractive candidate to \CFA's overload resolution. 177 177 This boost can be applied across multiple statements in a block or an entire function body. 178 178 \begin{cquote} … … 192 192 \end{tabular} 193 193 \end{cquote} 194 Within the @with@, the code acts as if there is only one list direction.194 Within the @with@, the code acts as if there is only one list axis, without explicit casting. 195 195 196 196 Unlike the \CC template container-types, the \CFA library works completely within the type system; 197 197 both @dlink@ and @dlist@ are ordinary types, not language macros. 198 198 There is no textual expansion other than header-included static-inline function for performance. 199 Hence, errors in user code are reported only with mention of the library's declarations .199 Hence, errors in user code are reported only with mention of the library's declarations, versus long template names in error messages. 200 200 Finally, the library is separately compiled from the usage code, modulo inlining. 201 201 … … 361 361 \end{tabular} 362 362 \end{cquote} 363 Iterating forward and reverse through the entire list using the shorthand start at the list head and pick a direction.363 Iterating forward and reverse through the entire list using the shorthand start at the list head and pick a axis. 364 364 In this case, @advance@ and @recede@ return a boolean, like \CC @while ( cin >> i )@. 365 365 \begin{cquote} … … 629 629 After each round, a counter is incremented by $n$ (for throughput). 630 630 Time is measured outside the loop because a large $n$ can overrun the time duration before the @CONTINUE@ flag is tested. 631 Hence, there is minimum of one outer (@CONTINUE@) loop iteration for large lists. 631 632 The loop duration is divided by the counter and this throughput is reported. 632 633 In a scatter-plot, each dot is one throughput, which means insert + remove + harness overhead. … … 636 637 To test list operations, the experiment performs the inserts/removes in different patterns, \eg insert and remove from front, insert from front and remove from back, random insert and remove, \etc. 637 638 Unfortunately, the @std::list@ does \emph{not} support direct insert/remove from a node without an iterator, \ie no @erase( node )@, even though the list is doubly-linked. 638 To eliminate this additional cost in the harness, a trick is used for random insertions without replacement. 639 The @i@ fields in each node are initialized from @0..n-1@, these @i@ values are then shuffled in the nodes, and the @i@ value is used to represent an indirection to that node for insertion, so the nodes are inserted in random order, and hence, removed in th same random order. 639 To eliminate the iterator, a trick is used for random insertions without replacement. 640 The @i@ fields in each node are initialized from @0..n-1@. 641 These @i@ values are then shuffled in the nodes, and the @i@ value is used to represent an indirection to that node for insertion. 642 hence, the nodes are inserted in random order and removed in the same random order. 640 643 $\label{p:Shuffle}$ 641 644 \begin{c++} … … 644 647 645 648 while ( CONTINUE ) { 646 for ( i; n ) insert_first( lst, nodes[ @nodes[i].i@ ] ); $\C{// build up}$ 649 for ( i; n ) { 650 node_t & temp = nodes[ nodes[i].i ]; 651 @temp.j = 0;@ $\C{// touch random node for wrapped nodes}$ 652 insert_first( lst, temp ); 653 } 654 insert_first( lst, nodes[ @nodes[i].i@ ] ); $\C{// build up}$ 647 655 for ( i; n ) pass( &remove_first( lst ) ); $\C{// tear down}$ 648 656 totalOpsDone += n; 649 657 } 650 658 \end{c++} 651 This approach works across intrusive and wrapped lists. 659 Note, insertion is traversing the list of nodes linearly, @node[i]@. 660 For intrusive lists, the inserted (random) node is always touched because its link fields are read/written for insertion into the list. 661 Hence, the array of nodes is being accessed both linearly and randomly during the traversal. 662 For wrapped lists, the wrapped nodes are traversed linearly but the random node is not accessed, only a pointer to it is inserted into the linearly accessed wrapped node. 663 Hence, the traversal is the same as the non-random traversal above. 664 To level the experiments, an explicit access to the random node is inserted after the insertion, @temp.j = 0@, for the wrapped experiment. 665 Furthermore, it is rare to insert/remove nodes and not access them. 652 666 653 667 % \emph{Interleaving} allows for movements other than pure stack and queue. … … 770 784 As a result, these memory layouts result in high spatial and temporal locality for both kinds of lists during the linear array traversal. 771 785 With address look-ahead, the hardware does an excellent job of managing the multi-level cache. 772 Hence, performance is largely constant for both kinds of lists, until cache and NUMA boundaries are crossed for longer lists and the costs increase consistently for both kinds of lists.773 774 In detail, \VRef[Figure]{f:Shuffled} shows shuffle insertion of all the nodes and then linear removal, both in the same direction.786 Hence, performance is largely constant for both kinds of lists, until L3 cache and NUMA boundaries are crossed for longer lists and the costs increase consistently for both kinds of lists. 787 788 In detail, \VRef[Figure]{f:Shuffled} shows shuffle insertion and removal of the nodes. 775 789 As for linear, there are issues with the wrapped list and memory allocation. 776 790 For intrusive lists, it is possible to link the nodes randomly, so consecutive nodes in memory seldom point at adjacent nodes. -
doc/theses/mike_brooks_MMath/string.tex
r9989781 r7ca6bf1 700 700 \end{tabular} 701 701 \end{cquote} 702 \CC @setfill@ is not considered an important string manipulator. 702 703 703 704 \CC input matching for @char@, @char *@, and @string@ are similar, where \emph{all} input characters are read from the current point in the input stream to the end of the type size, format width, whitespace, end of line (@'\n'@), or end of file. … … 1556 1557 Timing outcomes report mean nanoseconds per invocation, which includes harness overhead and the targeted string API execution. 1557 1558 1558 \PAB{To discuss: hardware and such}1559 1560 1559 As discussed in \VRef[Section]{string-raii-limit}, general performance comparisons are made using \CFA's faster, low-level string API, named @string_res@. 1561 1560 \VRef{s:ControllingImplicitSharing} presents an operational mode where \CFA string sharing is turned off. 1562 1561 In this mode, the \CFA string operates similarly to \CC's, by using a heap allocation for string text. 1563 1562 Some experiments include measurements in this mode for baselining purposes, called ``\CC emulation mode'' or ``nosharing''. 1563 1564 See~\VRef{s:ExperimentalEnvironment} for a description of the hardware environment. 1564 1565 1565 1566 -
doc/theses/mike_brooks_MMath/uw-ethesis-frontpgs.tex
r9989781 r7ca6bf1 132 132 Often array is part of the programming language, while linked-list is built from (recursive) pointer types, and string from a combination of array and linked-list. 133 133 For all three types, languages and/or their libraries supply varying degrees of high-level mechanisms for manipulating these objects at the bulk and component level, such as copying, slicing, extracting, and iterating among elements. 134 Unfortunately, these three aspects of C cause 60\%--70\% of the reported software vulnerabilities involved memory errors, and 70\%- 80\% of hacker attack-vectors target these types.134 Unfortunately, these three aspects of C cause 60\%--70\% of the reported software vulnerabilities involved memory errors, and 70\%--80\% of hacker attack-vectors target these types. 135 135 Therefore, hardening these three C types goes a long way to make the majority of C programs safer. 136 136 -
doc/theses/mike_brooks_MMath/uw-ethesis.bib
r9989781 r7ca6bf1 140 140 } 141 141 142 143 @misc{Anderson94, 144 contributer = {pabuhr@plg}, 145 title = {Clockwise/Spiral Rule}, 146 author = {David Anderson}, 147 year = 1994, 148 month = may, 149 howpublished= {\url{https://c-faq.com/decl/spiral.anderson.html}}, 150 }
Note:
See TracChangeset
for help on using the changeset viewer.