Changeset 74accc6 for doc/theses
- Timestamp:
- Apr 26, 2026, 7:49:45 PM (4 days ago)
- Branches:
- master
- Children:
- 037dc57
- Parents:
- ac9c0ee8
- Location:
- doc/theses/mike_brooks_MMath
- Files:
-
- 4 edited
-
background.tex (modified) (46 diffs)
-
intro.tex (modified) (10 diffs)
-
list.tex (modified) (1 diff)
-
programs/bkgd-carray-decay.c (modified) (1 diff)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/mike_brooks_MMath/background.tex
rac9c0ee8 r74accc6 21 21 note: expected 'void (*)(void)' but argument is of type 'float *' 22 22 \end{cfa} 23 Clearly, @gcc@ understands these ill-typed case , and yet allows the program to compile, which seems inappropriate.23 Clearly, @gcc@ understands these ill-typed cases, and yet allows the program to compile, which seems inappropriate. 24 24 Compiling with flag @-Werror@, which turns warnings into errors, is often too pervasive, because some warnings are just warnings, \eg an unused variable. 25 25 In the following discussion, \emph{ill-typed} means giving a nonzero @gcc@ exit condition with a message that discusses typing. … … 89 89 this anomaly is \emph{fixed} with operator @->@, which performs the two operations in the more intuitive order: @sp->f@ $\Rightarrow$ @(*sp).f@. 90 90 \end{itemize} 91 While attempting to make the declaration and expression contexts consistent is a laudable goal, it has not worked out in practice, even though Dennis Richie believed otherwise:91 While attempting to make the declaration and expression contexts consistent is a laudable goal, it has not worked out in practice, Dennis Richie's contrary position on the issue conceded: 92 92 \begin{quote} 93 93 In spite of its difficulties, I believe that the C's approach to declarations remains plausible, and am comfortable with it; it is a useful unifying principle.~\cite[p.~12]{Ritchie93} 94 94 \end{quote} 95 A fter all, reading a C array type is easy: just read it from the inside outfollowing the ``clock-wise spiral rule''~\cite{Anderson94}.95 As it stands, typed must be read from their declared identifier outward, following the ``clock-wise spiral rule''~\cite{Anderson94}. 96 96 Unfortunately, \CFA cannot correct these operator priority inversions without breaking C compatibility. 97 97 98 The alternative solution is for \CFA to provide its own type, variable and routine declarations, using a more intuitive syntax.99 The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type.100 The qualifiers have the same syntax andsemantics in \CFA as in C, so there is nothing to learn.98 The alternative solution is for \CFA to provide its own, more intuitive declaration syntax for types, variables and routines. 99 The new declarations place type operators to the left of the base type, while C declarations place them (to the right of the base type and) spiraling around the declared object name. 100 The type operators have the same semantics in \CFA as in C, so there is nothing to learn. 101 101 Then, a \CFA declaration is read left to right, where a function return-type is enclosed in brackets @[@\,@]@. 102 102 \begin{cquote} … … 131 131 (It is unclear if Hebrew or Arabic speakers, say declarations right to left.) 132 132 133 Specifically, \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discuss ionin \VRef[Chapter]{c:Array}.133 Specifically, \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussed in \VRef[Chapter]{c:Array}. 134 134 The \CFA-thesis column shows the new array declaration form, which is my contribution to safety and ergonomics. 135 135 The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features. … … 294 294 \begin{quote} 295 295 Except when it is the operand of the @sizeof@ operator, or the unary @&@ operator, or is a string literal used to 296 initialize an array an expression that has type ``array of \emph{type}'' is converted to an expression with type296 initialize an array, an expression that has type ``array of \emph{type}'' is converted to an expression with type 297 297 ``pointer to \emph{type}'' that points to the initial element of the array object~\cite[\S~6.3.2.1.3]{C11} 298 298 \end{quote} 299 299 This phenomenon is the famous (infamous) \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one. 300 It is worthy to note that the list of exceptional cases does not feature the occurrence of @ar@ in @ar[i]@.301 Thus, subscripting happens on pointers notarrays.302 303 Subscripting proceeds first with pointer decay, if needed.304 Next, \cite[\S~6.5.2.1.2]{C11} explains that @ar[i]@ is treated as if it is @(*((ar)+(i)))@.300 It is worthy to note that, in the expression @ar[i]@, the subexpression @ar@, is not one of the standard's exceptions; @ar@ is converted to a pointer, before subscripting occurs. 301 Thus, \emph{all} subscripting happens on pointers, never arrays. 302 303 After this pointer decay (when needed), subscripting proceeds. 304 In the next step, \cite[\S~6.5.2.1.2]{C11} explains that @ar[i]@ is treated as if it is @(*(ar+i))@. 305 305 \cite[\S~6.5.6.8]{C11} explains that the addition, of a pointer with an integer type, is defined only when the pointer refers to an element that is in an array. 306 306 The addition gives the address @i@ elements away from the element (@ar@, or @&ar[0]@). 307 307 This address is valid if @ar@ is big enough and @i@ is small enough. 308 308 Finally, \cite[\S~6.5.3.2.4]{C11} explains that the @*@ operator's result is the referenced element. 309 Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing, as plusis commutative.310 311 Subscripting a pointer when the target is standard-inappropriate is still practicallywell-defined.309 Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing, as addition is commutative. 310 311 Subscripting a pointer when the target is standard-inappropriate is still, practically, well-defined. 312 312 While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all, 313 313 the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses. … … 317 317 fs[5] = 3.14; 318 318 \end{cfa} 319 The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is'' as requested (\cite[\S~7.22.3.4.2]{C11}). 320 But \emph{nothing} more is said about this pointer value, specifically that its referent might \emph{be} an array allowing subscripting. 321 322 Under this assumption, a pointer, @p@, being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around @p@'s address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p).t@. 323 I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is conceptually in the middle of an array whose size is unlimited in both directions, \eg @(&ar[5])[-200]@ or @(&ar[5])[200]@). 319 The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is [the size requested]'' (\cite[\S~7.22.3.4.2]{C11}). 320 But \emph{nothing} more is said about this pointer, much less that its referent might \emph{be} an array, \ie that subscripting might be allowed at all. 321 322 Under this assumption, a pointer, @p@, being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around @p@'s address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p)@. 323 I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is conceptually in the middle of an array, whose size is unlimited in both directions, \eg @(&ar[5])[-200]@ or @(&ar[5])[200]@. 324 324 325 No pointer is exempt from array diffraction. 325 326 No array shows its elements without pointer decay. … … 328 329 A further pointer--array confusion, closely related to decay, occurs in parameter declarations. 329 330 \cite[\S~6.7.6.3.7]{C11} explains that when an array type is written for a parameter, 330 the parameter's type becomes a type that can be summarized as the array-decayed type. 331 the parameter's type changes into (what is practically) the pointer-decayed type. 332 But pointer decay applies to an expression, while this transformation applies to a declaration. 331 333 The respective handling of the following two parameter spellings shows that the array and pointer versions are identical. 332 334 \lstinput{12-16}{bkgd-carray-decay.c} … … 419 421 420 422 As of C99, the C standard supports a \newterm{variable length array} (VLA)~\cite[\S~6.7.5.2.5]{C99}, providing a dynamic-fixed array feature \see{\VRef{s:ArrayIntro}}. 421 Note, the \CC standard does not support VLAs, but @g++@ and @clang@ provide them.423 Note, the \CC standard does not support VLAs, but @g++@ and @clang@ provide a limited form of them. 422 424 A VLA is used when the desired number of array elements is \emph{unknown} at compile time. 423 425 \begin{cfa} 424 size_t cols;425 scanf( "%d", & cols);426 double ar[ cols];427 \end{cfa} 428 The array dimension is read from outside the program and used to create an array of size @ cols@ on the stack.426 size_t n; 427 scanf( "%d", &n ); 428 double ar[n]; 429 \end{cfa} 430 The array dimension is read from outside the program and used to create an array of size @n@ on the stack. 429 431 The VLA is implemented by the @alloca@ routine, which bumps the stack pointer. 430 432 Unfortunately, there is significant misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient. … … 441 443 When working with arrays, \eg linear algebra, array dimensions are referred to as \emph{rows} and \emph{columns} for a matrix, adding \emph{planes} for a cube. 442 444 (There is little terminology for higher dimensional arrays.) 443 For example, an acrostic poem\footnote{A kind of poetrywhere the first, last or other letters in a line spell out a word or phrase in a vertical column.}445 For example, an acrostic poem\footnote{A written presentation of words where the first, last or other letters in a line spell out a word or phrase in a vertical column.} 444 446 can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s). 445 447 Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword. 446 448 In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@. 447 449 448 Commonly, an array, matrix , or cube, is visualized (especially in mathematics) as a contiguous row, rectangle,or block.449 This conceptualization is re enforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.450 Commonly, an array, matrix or cube is visualized (especially in mathematics) as a contiguous row, rectangle or block. 451 This conceptualization is reinforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube. 450 452 Few programming languages differ from the mathematical subscript ordering. 451 453 However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system. … … 460 462 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants. 461 463 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary. 462 The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@. 464 The variable-dimension approach (right) ignores (violates) the type system, \ie the parameter type has no suggestion of mutidimensionality and some acrobatics are required for a w\footnote{ 465 One may be tempted to phrase a call as \lstinline{fp2( 4, 4, vm1 )}, but this call is ill-typed. Argument \lstinline{vm1} could match parameter declarations \lstinline{int m[][4]} or \lstinline{int (*m)[4]}. But only the argument \lstinline{&vm1[0][0]}, or its equivalent, but confusing, \lstinline{vm1[0]}, relate \lstinline{vm1} to parameter type \lstinline{int*}. 466 }, and subscripting is performed manually using pointer arithmetic in the macro @sub@. 463 467 464 468 \begin{figure} … … 470 474 ... printf( "%d ", @m[r][c]@ ); ... 471 475 } 472 int fm1[4][@6@], fm2[6][@6@]; // no VLA 476 int fm1[4][@6@], fm2[6][@6@]; // no VLA, same 473 477 // initialize matrixes 474 478 fp1( 4, fm1 ); // implicit 6 columns … … 481 485 ... printf( "%d ", @sub( m, r, c )@ ); ... 482 486 } 483 int vm1[ @4@][@4@], vm2[@6@][@8@]; // no VLA487 int vm1[4][@4@], vm2[6][@8@]; // no VLA, different 484 488 // initialize matrixes 485 fp2( 4, 4, vm1);486 fp2( 6, 8, vm2);487 \end{cfa} 488 \end{tabular} 489 \caption{ C90Fixed \vs Variable Contiguous Matrix Styles}489 fp2( 4, 4, @&vm1[0][0]@ ); 490 fp2( 6, 8, &vm2[0][0] ); 491 \end{cfa} 492 \end{tabular} 493 \caption{Pre-VLA Fixed \vs Variable Contiguous Matrix Styles} 490 494 \label{f:FixedVariable} 491 495 \end{figure} 492 496 493 Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or\CC.497 Many languages allow multidimensional arrays-of-arrays, \eg Pascal and \CC. 494 498 \begin{cquote} 495 499 \setlength{\tabcolsep}{15pt} 496 500 \begin{tabular}{@{}ll@{}} 497 501 \begin{pascal} 502 (* Pascal *) 498 503 var m : array[0..4, 0..4] of Integer; (* matrix *) 499 504 type AT = array[0..4] of Integer; (* array type *) … … 504 509 & 505 510 \begin{c++} 511 // C++ 506 512 int m[5][5]; 507 513 508 514 typedef vector< vector<int> > MT; 509 MT v m( 5, vector<int>( 5 ) );510 m@[1][2]@ = 1; aa@[1][2]@ = 1;515 MT vv( 5, vector<int>( 5 ) ); 516 m@[1][2]@ = 1; vv@[1][2]@ = 1; 511 517 \end{c++} 512 518 \end{tabular} 513 519 \end{cquote} 514 520 The language decides if the matrix and array-of-array are laid out the same or differently. 515 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length ( triangular matrix).516 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiate dbetween the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.521 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (\eg triangular matrix). 522 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiate between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@. 517 523 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (\eg caching/NUMA effects). 518 C also provides non-contiguous arrays-of-arrays:524 C also allows non-contiguous arrays-of-arrays: 519 525 \begin{cfa} 520 526 int m[5][5]; $\C{// contiguous}$ … … 527 533 Nevertheless, the C array-of-array form is still important for special circumstances. 528 534 529 \VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}535 \VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99, for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays. Though GNU-flavoured C++ eventually got VLAs, it never got this enhancement for managing a multidimensionl VLA parameter. } 530 536 For contiguous-array arguments (including VLA), C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@. 531 537 Hence, if the declaration of @fc@ is changed to: … … 533 539 void fc( int rows, int cols, int m[@rows@][@cols@] ) ... 534 540 \end{cfa} 535 there is now sufficient information to support array copying and subscript checking to prevent changing the argument orbuffer-overflow problems, \emph{but neither feature is provided}.541 there is now sufficient information to support array copying and subscript checking to prevent buffer-overflow problems, \emph{but neither feature is provided}. 536 542 While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites. 537 543 That is, the array does not automatically carry its structure and sizes for use in computing subscripts. 538 544 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row with a correctly bounded loop index. 539 Specifically, there is no requirement that the rows are the same length, like a poem with differentlength lines.545 Specifically, there is no requirement that the rows are the same length, like for a poem with different-length lines. 540 546 541 547 \begin{figure} … … 547 553 for ( size_t c = 0; c < cols; c += 1 ) 548 554 ... @m[r][c]@ ... 555 // each r-step: cols * sizeof(int) 549 556 } 550 557 int m@[5][5]@; … … 562 569 for ( size_t c = 0; c < cols; c += 1 ) 563 570 ... @m[r][c]@ ... 571 // each r-step: 1 * sizeof(int*) 564 572 } 565 573 int @* aa[5]@; // row pointers … … 582 590 Again, the inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type. 583 591 \lstinput{16-18}{bkgd-carray-mdim.c} 584 There are now three axis forderiving expressions from @mx@: \emph{itself}, \emph{first element}, and \emph{first grand-element} (meaning, first element of first element).592 There are now three means of deriving expressions from @mx@: \emph{itself}, \emph{first element}, and \emph{first grand-element} (meaning, first element of first element). 585 593 \lstinput{20-26}{bkgd-carray-mdim.c} 586 594 Given that arrays are contiguous and the size of @float@ is 4, then the size of @mx@ with 3 $\times$ 10 floats is 120 bytes, the size of its first element (row) is 40 bytes, and the size of the first element of the first row is 4. 587 Again, an array and a point to each of its axes are different.595 Again, an array and a pointer to anything it contains are different. 588 596 \lstinput{28-36}{bkgd-carray-mdim.c} 589 597 As well, there is pointer decay from each of the matrix axes to pointers, all having the same address. 590 \lstinput{38-44}{bkgd-carray-mdim.c}591 598 Finally, subscripting is allowed on a @malloc@ result, where the referent may or may not allow subscripting or have the right number of subscripts. 592 599 … … 596 603 Passing an array as an argument to a function is necessary. 597 604 Assume a parameter is an array where the function intends to subscript it. 598 This section asserts that a more satisfactory/formal characterization does not exist in C, then surveys the ways that C API authors communicate @p@ has zero or more dimensions, and finally calls out the minority cases where the C type system is using or verifying such claims.605 This section asserts that a more satisfactory/formal characterization does not exist in C, then surveys the ways that C API authors communicate that @p@ has zero or more dimensions, and finally calls out the minority cases where the C type system is using or verifying such claims. 599 606 600 607 A C parameter declaration looks different from the caller's and callee's perspectives. … … 685 692 % So are @float[5]*@, @float[]*@ and @float (*)*@. These latter ones are simply nonsense, though they hint at ``1d array of pointers'', whose equivalent syntax options are, @float *[5]@, @float *[]@, and @float **@. 686 693 687 It is a matter of taste as to whether a programmer should use the left form to get the most out of commenting subscripting and dimension sizes, sticking to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromising in the middle (reducing unchecked information, yet clearly stating, ``I am subscript'').694 It is a matter of taste as to whether a programmer should use the left form to get the most out of commenting subscripting and dimension sizes, sticking to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromising in the middle (reducing unchecked information, yet clearly stating, ``I am for subscripting''). 688 695 689 696 Note that this equivalence of pointer and array declarations is special to parameters. … … 771 778 } 772 779 \end{cfa} 773 The cases without comments are rejections, but simply because the array ranks do not match; in the commented cases, the ranks match and the rules beingdiscussed apply.780 The cases without comments are rejections, but simply because the array ranks do not match; in the commented cases, the ranks match and the rules are discussed apply. 774 781 The cases @f( b )@ and @f( &a )@ show where some length checking occurs. 775 782 But this checking misses the cases @f( d )@ and @f( &c )@, allowing the calls with mismatched lengths, actually 100 for 10. … … 777 784 Ultimately, an inner dimension's size is a callee's \emph{assumption} because the type system uses declaration details in the callee's perspective that it does not enforce in the caller's perspective. 778 785 779 Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee has make an assumption about the size, but no (checked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/\-over-confidence.786 Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee must make an assumption about the size, but no (unchecked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/\-over-confidence. 780 787 \begin{cquote} 781 788 @[@ \textit{type-qualifier-list$_{opt}$} @* ]@ … … 797 804 \label{s:ArraysCouldbeValues} 798 805 799 All arrays have a know runtime size at their point of declaration.806 All arrays have a known runtime size at their point of declaration. 800 807 Furthermore, C provides an explicit mechanism to pass an array's dimensions to a function. 801 808 Nevertheless, an array cannot be copied, and hence, not passed by value to a function, even when there is sufficient information to do so. … … 827 834 \section{Linked List} 828 835 829 Linked -lists are blocks of storage connected using one or more pointers.836 Linked lists are blocks of storage connected using one or more pointers. 830 837 The storage block (node) is logically divided into data (user payload) and links (list pointers), where the links are the only component used by the list structure. 831 Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous. 832 833 The links organize nodes into a particular kind of data structure, \eg queue, tree, hash table, \etc, with operations specific to that kind. 834 Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value; 835 hence, all data structure routines take and return pointers to nodes and not the nodes themselves. 838 Since the data unused, list structures are often polymorphic over the data, which is often homogeneous. 839 840 The links organize nodes into a particular shape of data structure, \eg chain, tree, hash table, \etc, with operations specific to that kind. 841 In all these cases, a node's address is significant, so nodes are communicated by reference/pointer, and never by copy (by value). 836 842 837 843 … … 842 848 Within this restricted space, all design-issue discussions assume the following invariants. 843 849 \begin{itemize} 850 \item The chain shape, as opposed to tree or hash table, is considered. 844 851 \item A doubly-linked list is being designed. 845 852 Generally, the discussed issues apply similarly for singly-linked lists. 846 Circular \vs ordered linking is discussed under List identity (\VRef{toc:lst:issue:ident}).853 Ordered linking (as opposed to circularly-linked) occurs. 847 854 \item Link fields are system-managed. 848 855 The system has freedom over how to represent these links. 849 856 The user works with the system-provided API to query and modify list membership. 857 \begin{comment} % yes, that's what I'm building; no, it's not an invariant for the design-issue discussions 850 858 \item The user data must provide storage for the list link-fields. 851 859 Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}. 860 \end{comment} 852 861 \end{itemize} 853 862 Alternatives to these assumptions are discussed under Future Work (\VRef{toc:lst:futwork}). … … 857 866 \label{s:PreexistingLinked-ListLibraries} 858 867 859 To show examples of the concepts being defined, t wo preexisting linked-list libraries are used throughoutand further libraries are introduced as needed.868 To show examples of the concepts being defined, these two preexisting linked-list libraries are used throughout, and further libraries are introduced as needed. 860 869 \begin{enumerate} 861 870 \item Linux Queue library~\cite{lst:linuxq} (LQ) of @<sys/queue.h>@. … … 864 873 %A general comparison of libraries' abilities is given under Related Work (\VRef{toc:lst:relwork}). 865 874 For the discussion, assume the type @req@ (request) is the user's payload in examples. 866 Then the job of a list library is to help a user manage (organize) requests, \eg a request can be a network arrival-event processed by a web browser or a thread blocked/scheduled by the runtime. 875 Then the job of a list library is to help a user manage (organize) requests. 876 A request might be a network arrival event processed by a web browser or a thread blocked/scheduled by a runtime. 867 877 868 878 … … 872 882 Link attachment deals with the question: 873 883 Where are the libraries' inter-node link-fields stored, in relation to the user's payload data fields? 884 874 885 \VRef[Figure]{fig:lst-issues-attach} shows three basic styles. 875 886 \VRef[Figure]{f:Intrusive} shows the \newterm{intrusive} style, placing the link fields inside the payload structure. … … 877 888 The wrapped style distinguishes between wrapping a reference or a value, \eg @list<req *>@ or @list<req>@. 878 889 (For this discussion, @list<req &>@ is similar to @list<req *>@.) 879 This difference is one of user style and performance (copying), notframework capability.880 Library LQ is intrusive; STL is wrapped withreference or value.890 This difference is one of user style, performance (due to copying) and ownership; it is not a matter of framework capability. 891 Library LQ is intrusive; STL is wrapped, supporting either reference or value. 881 892 882 893 \begin{comment} … … 949 960 The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs}; 950 961 head objects are discussed in \VRef{toc:lst:issue:ident}. 951 In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;962 In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list axis; 952 963 these are discussed in \VRef{s:Axis}. 953 964 In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a … … 959 970 960 971 Each diagram in \VRef[Figure]{fig:lst-issues-attach} is using the fewest dynamic allocations for its respective style: 961 in intrusive, here is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copy data-area and linked fields are dynamically allocated.972 in intrusive, there is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copy data-area and linked fields are dynamically allocated. 962 973 The advantage of intrusive is the control in memory layout and storage placement. 963 974 Both wrapped styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list. 975 976 \begin{comment} % duplicated from s:Axis, blurs the taxonomy 964 977 In all three cases, a @req@ object can enter and leave a list many times. 965 978 However, in intrusive a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list. … … 968 981 however, knowing which of the @req@ object is the \emph{true} object becomes complex. 969 982 \see*{\VRef{s:Axis} for further discussion.} 970 971 The implementation of @LIST_ENTRY@ uses a trick to find the links and the node containing the links. 972 The macro @LIST_INSERT_HEAD( &reqs, &r2, d )@ takes the list header, a pointer to the node, and the offset of the link fields in the node. 973 One of the fields generated by @LIST_ENTRY@ is a pointer to the node, which is set to the node address, \eg @r2@. 974 Hence, the offset to the link fields provides an access to the entire node, because the node points at itself. 975 For list traversal, @LIST_FOREACH( cur, &reqs_pri, by_pri )@, there is the node cursor, the list, and the offset of the link fields within the node. 976 The traversal actually moves from link fields to link fields within a node and sets the node cursor from the pointer within the link fields back to the node. 977 978 A further aspect of layout control is allowing the user to explicitly specify link fields controlling placement and attributes within the @req@ object. 979 LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.}, which can be placed anywhere in the node. 983 \end{comment} 984 985 In the LQ column, \subref*{f:Intrusive}, the macro @LIST_INSERT_HEAD( &reqs, &r2, d )@ takes the list header, a pointer to the node, and the offset of the link fields in the node. 986 The user's last argument is the field name; the macro expansion uses it as @(&r2)->d@, causing the compiler to insert the link-field offset. 987 With this information, the API can use node pointers for conversing with the user, and also work on the link fields within. 988 However, when there is only one set of link fields, the user is required to manage a redundant parameter. 989 990 An aspect of layout control is allowing the user to specify link fields explicitly, controlling field order and attributes within the @req@ object. 991 LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple of these link fields allowing a node to appear on multiple lists simultaneously.}, which can be placed anywhere in the node. 980 992 An example of an attribute on the link fields is cache alignment, possibly in conjunction with other @req@ fields, improving locality and/or avoiding false sharing. 981 For example, if a list is frequently traversed in the forward direction, and infrequently gets elements removed at randompositions, then an ideal layout for cache locality puts the forward links, together with frequently-used payload data on one cache line, leaving the reverse links on a colder cache line.993 If a list is frequently traversed in the forward direction, and infrequently gets elements removed at arbitrary positions, then an ideal layout for cache locality puts the forward links, together with frequently-used payload data on one cache line, leaving the reverse links on a colder cache line. 982 994 In contrast, supplying link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@, and no explicit attributes. 983 995 Wrapped reference has no control over the link fields, but the separate data allows some control; 984 996 wrapped value has no control over data or links. 985 997 986 Another subtle advantage of intrusive a rrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.998 Another subtle advantage of intrusive attachment is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership. 987 999 In LQ, the intrusive @req@ pointer is the correct argument type for operations @LIST_NEXT@ or @LIST_REMOVE@; 988 1000 there is no distinguishing a @req@ from a @req@ in a list. … … 996 1008 Then, a novel use can put a @req@ in a list, without requiring any upstream change in the @req@ library. 997 1009 In intrusive, the ability to be listed must be planned during the definition of @req@. 998 When in doubt, optimistically adding a couple links for future use is cheap because links are small and memory is big.999 1010 1000 1011 \begin{figure} … … 1022 1033 An intrusive-primitive library like LQ lets users choose when to make this tradeoff. 1023 1034 A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits. 1024 Like LQ, \CFAis capable of supporting a wrapped library, if need arose.1035 Like LQ, the \CFA intrusive library of \VRef[Chapter]{ch:list} is capable of supporting a wrapped library, if need arose. 1025 1036 1026 1037 … … 1048 1059 \newterm{Axis} deals with the question: 1049 1060 In how many different lists can a node be stored, at the same time? 1050 \VRef[Figure]{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same request value (field @rqr@). 1051 Each of ``by priority'' and ``by common request value'' is a separate list. 1052 For example, there is a single priority-list linked in order [1, 2, 2, 3, 3, 4], where nodes may have the same priority, and there are three common request-value lists combining requests with the same values: [42, 42], [17, 17, 17], and [99], giving four head nodes, one for each list. 1053 The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists). 1054 1055 As stated, the limitation of intrusive is knowing apriori how many groups of links are needed for the maximum number of simultaneous lists. 1056 Thus, the intrusive LQ example supports multiple, but statically many, link lists. 1057 Note, it is possible to reuse links for different purposes, \eg if a list in linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields. 1061 \VRef[Figure]{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same requestor (field @rqr@). 1062 Each of ``by priority'' and ``by common requestor'' is a separate axis. 1063 The example has a single priority-list linked (first axis): [1, 2, 2, 3, 3, 4], where nodes may have the same priority. 1064 And it has three common-requestor lists (second axis): [42, 42], [17, 17, 17], and [99], giving four head nodes, one for each list. 1065 The example shows an axis can encompass all the nodes (by-priority) or only a subset of the nodes (three by-common-requestor lists). 1066 1067 As stated, the limitation of intrusive is knowing a priori how many groups of links are needed for the maximum number of simultaneous lists. 1068 Thus, the intrusive LQ example supports multiple, but statically many, linked lists. 1069 Note, it is possible to reuse links for different purposes, \eg if an item is linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields. 1058 1070 This feature is used in the \CFA runtime, where a thread node may be on a blocked or running list, but never on both simultaneously. 1059 1071 … … 1111 1123 \end{c++} 1112 1124 1113 Axis cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance.1114 Instead, a special type is require that contains the link fields and points at the node.1125 Simultaneous axes cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance. 1126 Instead, a special type is required that contains the link fields and points at the node. 1115 1127 \begin{cquote} 1116 1128 \setlength{\tabcolsep}{10pt} … … 1140 1152 int i = nodedl.@get()@.i; $\C{// indirection to node}$ 1141 1153 \end{c++} 1142 Hence, the \uCpp approach optimizes one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links.1143 However, \uCpp cannot apply the LQ trick for finding the links and node.1154 Hence, the \uCpp approach streamlines one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links. 1155 However, \uCpp does not have the LQ trick of passing a field name as a macro argument, resulting in the somewhat more cumbersome realization of subsequent link fields. 1144 1156 1145 1157 The major ergonomic difference among the approaches is naming and name usage. … … 1153 1165 \subsection{User Integration: Preprocessed \vs Type-System Mediated} 1154 1166 1155 While the syntax for LQ is reasonably succinct, it comes at the cost of using C preprocessor macros for generics, whi ch are not part of the language type-system, unlike \CC templates.1156 Hence, small errors in macro arguments can lead to large substitution mistakes, as the arguments may be textually written in many places and/or concatenated with other arguments/text to create new names and expressions.1167 While the syntax for LQ is reasonably succinct, it comes at the cost of using C preprocessor macros for generics, while macros are not part of the language's type system. 1168 Hence, small errors in macro arguments can lead to large substitution mistakes, as the arguments may be textually written in many places and/or concatenated with other arguments/text to create new names and expressions. 1157 1169 Hence, textual expansion can lead to a cascade of error messages that are confusing and difficult to debug. 1158 1170 For example, argument errors like @a.b@{\Large\color{red},}@c@, comma instead of period, or @by@{\Large\color{red}-}@pri@, minus instead of underscore, can produce many error messages. 1159 1171 Note, similar problems exist for \CC templates. 1160 1172 1161 Instead, language function calls (even with inlining) handle argument mistakes locally at the call, giving very specific error message. 1162 \CC @concepts@ were introduced in @templates@ to deal with this problem. 1173 Further issues with macros occur when the substitution result contains artifacts related to evaluation order that are not evident in the original. 1174 A real problem that occurred while preparing the linked-list performance evaluation stemmed from the innocent-looking 1175 \begin{c++} 1176 TAILQ_REMOVE(reqs, TAILQ_LAST(reqs, reql), d); 1177 \end{c++} 1178 not being equivalent to the explicitly multi-step version: 1179 \begin{c++} 1180 struct req * last = TAILQ_LAST(reqs, reql); 1181 TAILQ_REMOVE(reqs, last, d); 1182 \end{c++} 1183 It turns out that @TAILQ_REMOVE@ uses its ``which element to remove'' parameter at several places, importantly, one occurring after the removal's changes are in progress. 1184 When the second use encounters the macro substitution @TAILQ_LAST(reqs, reql)@, it obtains a different node than the first use got, with the removal's changes having alredy started. 1185 This macro-induced phenomenon led to an invalid pointer dereference (safety violation), at a run-time well after the removal at issue (costly to resolve). 1186 1187 Instead, language function calls (even with inlining) handle argument mistakes locally at the call, giving a very specific error message. In the world of \CC templates, concepts were introduced to deal with this problem. Furthermovre, language function calls use (only) C-language semantics for argument evaluation, instead of applealing to reasoning about how the semantics might play out in an invisible substitution result. 1188 1189 So, avoiding a macro-centric implementation helps uers avoid and diagnose critical mistakes. 1163 1190 1164 1191 % example of poor error message due to LQ's preprocessed integration … … 1174 1201 1175 1202 All examples so far use two distinct types for: 1176 an item found in a list (type @req@ of variable @r1@, see \VRef[Figure]{fig:lst-issues-attach}), and the list (type @reql@ of variable @reqs_pri@, see\VRef[Figure]{fig:lst-issues-ident}).1203 an item found in a list (type @req@ of variable @r1@, in \VRef[Figure]{fig:lst-issues-attach}), and the list ``itself'' (type @reql@ of variable @reqs@ in the same example, or @reqs_pri@, in \VRef[Figure]{fig:lst-issues-ident}). 1177 1204 This kind of list is \newterm{headed}, where the empty list is just a head. 1178 1205 An alternate \emph{ad-hoc} approach omits the header, where the empty list is no nodes. 1179 Here, a pointer to any node can traverse its link fields: right or left and around, depending on the data structure. 1180 Note, a headed list is a superset of an ad-hoc list, and can normally perform all of the ad-hoc operations. 1206 Here, all link traversals begin from an existing reference to a node. 1207 A give node may be able to see that it is first, but there is no direct access to a global first. 1208 Note, a headed-list API is a superset of an ad-hoc list's, and can normally perform all of the ad-hoc operations. 1181 1209 \VRef[Figure]{fig:lst-issues-ident} shows both approaches for different list lengths and unlisted elements. 1182 1210 For headed, there are length-zero lists (heads with no elements), and an element can be listed or not listed. … … 1203 1231 (Both types are doubly linked and an analogous choice is available for singly linked.) 1204 1232 1233 Libraries have not traditionally offered an ad-hoc list abstraction, but it is a common pattern when programmers roll their own inter-linked type. 1234 The ad-hoc pattern obviates the questions, ``Who should own the head?'' or, ``Are we sure the head outlives the elements?'' 1235 Providing library support for ad-hoc lists is an opportunity to lower the incentive for rolling one's own. 1205 1236 1206 1237 \subsection{End Treatment: Cased \vs Uniform } … … 1208 1239 All lists must have a logical \emph{beginning/ending}, otherwise list traversal is infinite. 1209 1240 \emph{End treatment} refers to how the list represents the lack of a predecessor/successor to demarcate end point(s). 1210 For example, in a doubly-linked list containing a single node, the next/prev links have no successor/predecessor nodes.1241 For example, a doubly-linked list implementation might represent a node that is solo in a list, by using null next/prev pointers. 1211 1242 Note, a list does not need to use links to denote its size; 1212 1243 it can use a node counter in the header, where $N$ node traversals indicates complete navigation of the list. 1213 1244 However, managing the number of nodes is an additional cost, whereas the links must always be managed. 1214 1245 1215 The following discussion refers to the LQ representations, detailed in \VRef[Figure]{fig:lst-issues-end}, using a null pointer to mark end points. 1216 LQ uses this representation for its successor/last. 1246 The following discussion refers to the LQ representations, detailed in \VRef[Figure]{fig:lst-issues-end}, which uses a null @succ@ pointer to mark the last element. 1217 1247 For example, consider the operation of inserting after a given element. 1218 1248 A doubly-linked list must update the given node's successor, to make its predecessor-pointer refer to the new node. … … 1229 1259 LQ sub-object-level representation of links and ends. 1230 1260 Each object's memory is pictured as a vertical strip. 1231 Pointers' target locations, within these strips, aresignificant.1232 Uniform treatment of the first-end is evident from an assertion like \lstinline{(**this .pred == this)} holding for all nodes \lstinline{this}, including the first one.1233 Cased treatment of the last-end is evident from the symmetric proposition, \lstinline{(this.succ.pred == &this.succ)}, failing when \lstinline{this} is the last node.1261 The location, within a strip, at which an arrow points, is significant. 1262 Uniform treatment of the first-end is evident from an assertion like \lstinline{(**this->pred == this)} holding for all nodes \lstinline{this}, including the first one. 1263 Cased treatment of the last-end is evident from the last-end analogous proposition, \lstinline{(this->succ->pred == &this->succ)}, failing when \lstinline{this} is the last node. 1234 1264 } 1235 1265 \label{fig:lst-issues-end} … … 1237 1267 1238 1268 Interestingly, this branch is sometimes avoidable, giving a uniform end-treatment in the code. 1239 For example, LQ is headed at the front.1269 For example, LQ is uniform headed at the front. 1240 1270 For predecessor/first navigation, the relevant operation is inserting before a given element. 1241 1271 LQ's predecessor representation is not a pointer to a node, but a pointer to a pseudo-successor pointer. … … 1257 1287 1258 1288 A string is a sequence of symbols, where the form of a symbol can vary significantly: 7/8-bit characters (ASCII/Latin-1), or 2/4/8-byte (UNICODE) characters/symbols or variable length (UTF-8/16/32) characters. 1259 A string can be read left-to-right, right-to-left, top-to-bottom, and have stacked elements (Arabic). 1289 When drawn for human reading, text can be left-to-right, right-to-left, top-to-bottom, or have stacked elements (\eg Arabic). 1290 But a string serves all of these presentations by forming the sequence of symbols that matches the order in which the text is read. 1260 1291 1261 1292 A C character constant is an ASCII/Latin-1 character enclosed in single-quotes, \eg @'x'@, @'@\textsterling@'@. … … 1263 1294 A character can be formed from an escape sequence, which expresses a non-typable character @'\f'@, form feed, a delimiter character @'\''@, embedded single quote, or a raw character @'\xa3'@, \textsterling. 1264 1295 1265 A C character stringis zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.1266 The kind of characters in the string is denoted by a prefix: wide characters are prefixed by @L@, @u@, or @U@; UTF-8 charactersare prefixed by @u8@.1296 A C string constant is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@. 1297 The kind of characters in the string is denoted by a prefix: wide-character strings are prefixed by the same @L@, @u@, or @U@; UTF-8 strings, where characters have nonuniform byte lengths, are prefixed by @u8@. 1267 1298 1268 1299 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi-byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wscanf@/@wprintf@. … … 1280 1311 This representation means that there is no real limit to how long a string can be, but programs have to scan one completely to determine its length.~\cite[p.~36]{C:old} 1281 1312 \end{quote} 1282 This property is only preserved by the compiler with respect to character constants, \eg @"abc"@ is actually @ "abc\0"@, \ie 4 characters rather than 3.1313 This property is only preserved by the compiler with respect to character constants, \eg @"abc"@ is actually @{'a', 'b', 'c', '\0'}@, \ie 4 characters rather than 3. 1283 1314 Otherwise, the compiler does not participate, making string operations both unsafe and inefficient. 1284 1315 For example, it is common in C to: … … 1297 1328 As a result, storage management for C strings is a nightmare, quickly resulting in array overruns and incorrect results. 1298 1329 1299 Collectively, these design decisions make working with strings in C , awkward, time consuming,and unsafe.1330 Collectively, these design decisions make working with strings in C awkward, time consuming and unsafe. 1300 1331 While there are companion string routines that take the maximum lengths of strings to prevent array overruns, \eg @strncpy@, @strncat@, @strncpy@, that means the semantics of the operation can fail because strings are truncated. 1301 1332 Suffice it to say, C is not a go-to language for string applications, which is why \CC introduced the dynamically-sized @string@ type. -
doc/theses/mike_brooks_MMath/intro.tex
rac9c0ee8 r74accc6 3 3 All modern programming languages provide at least these three high-level containers (collections): array, linked-list, and string. 4 4 Often, the array is part of the programming language, while linked lists are built from (recursive) pointer types, and strings from arrays and/or linked lists. 5 For all three types, languages and/or their libraries supply varying degrees of high-level mechanisms for manipulating these objects at the bulk and component levels, such as copying, slicing, extracting, and iterating among elements.5 For all three types, languages and/or their libraries supply varying degrees of high-level mechanisms for manipulating these objects at the bulk and component levels, such as copying, slicing, extracting, iterating among, adding and removing elements. 6 6 7 7 Unfortunately, memory misuse under C-idiomatic programming causes a significant number of errors~\cite{Oorschot23}. … … 26 26 It first puts a corpus of C programs through a naive Rust translation, which ``succeeds'' by leaving many uses of Rust's \lstinline{unsafe} marker. 27 27 Then, it analyzes these uses and presents a technique for reducing the uses that hinge exclusively on referenced object lifetime. 28 Pointer dereference accounts for 80\% of the cases that the na \"{i}ve translation did not make safe (Table 3.2, p.~33).28 Pointer dereference accounts for 80\% of the cases that the naive translation did not make safe (Table 3.2, p.~33). 29 29 Of these, 10\% are eligible for the work's own safety improvement technique, while this filtering leaves 75\% with no single cause of unsafety determined (Table 3.5, p.~39). 30 30 The presented technique succeeds at making 75\% of the eligible initially unsafe dereferences safe, by inferring the necessary lifetime annotations (Table 4.2, p.~83). … … 33 33 % 220 ~= (99,101 [@tab3.5] - 9631 [@tab4.2] ) / 413,428 [@tab 3.1] * 1000 34 34 35 35 36 A tool for analyzing C code's use of linked data structures (including, \eg skip list and binary tree) found variations of the linked-list shape to be present in most of the programs in its corpus~\cite{White2016}. 37 36 38 So, C's array unsafety is infamous, its string pattern necessitates explicit storage management (also infamous), and user-written linked lists are a recognizable player in the arena of (infamously) unsafe raw pointer uses. 37 39 Therefore, hardening the three primary collections goes a long way to making the majority of C programs safer, eliminating major hacker attack vectors. … … 62 64 \section{Linked List} 63 65 64 A linked-list provides a homogeneous container often with $O(log N)$ or $O(N)$ access to elements using successor and predecessor operations that normally involve pointer chasing. 65 Subscripting by value (rather than position or location as for array) is sometimes available, \eg hash table. 66 A linked-list provides a homogeneous container often with $O(n)$ access to elements using successor and predecessor operations that involve pointer chasing. 67 It is the most foundational case of the linked data structures, which include trees and hash tables. 68 Among the linked structures, the list's defining characteristic is maintaining a user-explicit total order (chain shape). 69 Element search can be by position (as for array) or by value; it is sometimes presented as a subscript. 70 In a more complex structure, the linked shape may be system managed (user impicit) by being a function of a designated ``key'' datum, \eg often for a hash table. 71 Though such schemes often give better-than-$O(n)$ lookup, the linked list's user-explicit shape limits what the system can provide, to fast step/iteration for user-``nearby'' data, and slow exhaustive search/seek otherwise. 66 72 Linked types are dynamically sized by adding and removing nodes using link fields internal or external to the list elements (nodes). 67 If a programming language allows pointers to stack storage, linked -listnodes can be allocated on the stack and connected with stack addresses;73 If a programming language allows pointers to stack storage, linked nodes can be allocated on the stack and connected with stack addresses; 68 74 otherwise, elements are heap allocated with internal or external link fields. 75 76 69 77 70 78 C provides a few simple, library data-structures (@glibc@): … … 81 89 82 90 A string provides a dynamic array of homogeneous elements, where the elements are (often) some form of human-readable characters. 83 What differentiates a string from other types i nthat many of its operations work on groups of elements for scanning and changing, \eg @index@ and @substr@.84 While subscripting individual elements is usually available, working at the character level is considered poor practi se, \ie underutilizing the powerful string operations.85 Therefore, the cost of a string operation is usually less important than the power of the operation to accomplish complex text manipulation, \eg search, analy sing, composing, and decomposing string text.91 What differentiates a string from other types is that many of its operations work on groups of elements for scanning and changing, \eg @index@ and @substr@. 92 While subscripting individual elements is usually available, working at the character level is considered poor practice, \ie underutilizing the powerful string operations. 93 Therefore, the cost of a string operation is usually less important than the power of the operation to accomplish complex text manipulation, \eg search, analyzing, composing, and decomposing string text. 86 94 The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages. 87 In many cases, string management is separate from heap management, \ie string s roll-their-own heap.88 89 The C string type is just a character array and requires user storage -management to handle varying size.90 C adopts a terminating sentinel character, @'\0'@, to mark the end of a variable-length character -array versus a precedinglength field.95 In many cases, string management is separate from heap management, \ie string libraries roll their own heaps. 96 97 The C string type is just a character array and requires user storage management to handle varying size. 98 C adopts a terminating sentinel character, @'\0'@, to mark the end of a variable-length character array, as opposed to keeping an independent length field. 91 99 Hence, the sentinel character is excluded as an element within a string and determining the string length is an $O(N)$ operation. 92 100 The C standard library includes a number of high-level operations for working with this representation. … … 106 114 \section{Motivation} 107 115 108 The primary motivation for this work is two fold:116 The primary motivation for this work is twofold: 109 117 \begin{enumerate}[leftmargin=*] 110 118 \item … … 123 131 Furthermore, new languages must still interact with the underlying C operating system through fragile, type-unsafe, interlanguage-communication. 124 132 Switching to \CC is equally impractical as its complex and interdependent type-system (\eg objects, overloading, inheritance, templates) means idiomatic \CC code is difficult to use from C, and C programmers must expend significant effort learning new \CC features and idioms. 125 Finally, projects claiming to be modern C replacements (Cyclone~\cite{Grossman06}, Polymorphic C~\cite{Smith98}, GObject~\cite{GObject}) do not retain C backwards compatibility in syntax, program ing model, or semantic compatibility;133 Finally, projects claiming to be modern C replacements (Cyclone~\cite{Grossman06}, Polymorphic C~\cite{Smith98}, GObject~\cite{GObject}) do not retain C backwards compatibility in syntax, programming model, or semantic compatibility; 126 134 these languages are equivalent to switching to a different programming language. 127 135 Hence, rewriting and retraining costs for other languages are prohibitive when companies have a large C software-base (Google, Apple, Microsoft, Amazon, AMD, Nvidia). 128 136 129 137 130 \section{ C?}138 \section{Which C?} 131 139 132 140 Like many established programming languages, C has a standards committee and multiple ANSI/\-ISO language manuals~\cite{C99,C11,C18,C23}. … … 137 145 These example programs show 138 146 \begin{itemize}[itemsep=0pt] 139 \item ifthe compiler accepts or rejects certain syntax,140 \item print s output to buttress a behavioural claim,141 \item or fails to trigger embedded assertions testing pre/post-assertions or invariants.147 \item whether the compiler accepts or rejects certain syntax, 148 \item printed output to buttress a behavioural claim, or 149 \item passed/failed assertions testing pre/post-conditions or invariants. 142 150 \end{itemize} 143 151 These programs are tested across @gcc@ versions 8--14 and clang versions 10--14 running on ARM, AMD, and Intel architectures. … … 159 167 Introduce a small number of subtle changes to the typing rules for the C array, while still achieving significant backwards compatibility. 160 168 \item 161 Create a new polymorphic mechanism in the \CFA @forall@ clause to specify array dimension values, similar to a fixed-typed parameter in a \CC \lstinline[language=C++]{template}.162 \item 163 Construct a new standard-library array -type, available through @#include <array.hfa>@.169 Create a new polymorphic mechanism in the \CFA @forall@ clause to specify array dimension values, similar to a \CC non-typed template parameter. 170 \item 171 Construct a new standard-library array type, available through @#include <array.hfa>@. 164 172 \end{enumerate} 165 The new array type , enabled by prior features, defines an array withguaranteed runtime bound checks (often optimizer-removable) and implicit (guaranteed accurate) inter-function length communication.166 Leveraging the preexisting \CFA type-system to achieve this length-related type checking is an original contribution.167 Furthermore, the new array incorporates multidimensional capabilities typical of scientific/machine-learning languages, made to coexist with the C raw-memory-aware tradition in a novel way.168 The thesis includes a performance evaluation that shows \CFA arrays perform comparably with C arrays in many programming cases.173 The new array type accesses prior \CFA type-system features, delivering guaranteed runtime bound checks (often optimizer-removable) and implicit (guaranteed accurate) inter-function length communication. 174 Extending the preexisting \CFA type system to achieve this length-related type checking is an original contribution. 175 Furthermore, the new array incorporates multidimensional capabilities typical of scientific/machine-learning languages, made to coexist with the C raw-memory-aware tradition, in a novel way. 176 The thesis includes a performance-relevant assessment of generated code that shows the new \CFA array simultaneously delivering the performance of a C array and the safety of a \CC vector. 169 177 170 178 171 179 \subsection{Linked List} 172 180 173 The linked list is a new runtime library, available through @#include < dlist.hfa>@.181 The linked list is a new runtime library, available through @#include <list.hfa>@. 174 182 The library offers a novel combination of the preexisting \CFA features: references, inline inheritance, polymorphism, and declaration scoping. 175 A programmer's experience using the list data -types within the type system is novel.176 The list's runtime representation follows the established design pattern of intrusive link -fields for performance reasons, especially in concurrent programs.183 A programmer's experience using the list data types within the type system is novel. 184 The list's runtime representation follows the established design pattern of intrusive link fields for performance reasons, especially in concurrent programs. 177 185 The thesis includes a performance evaluation that shows the list performing comparably with other C-family intrusive lists, and notably better than the \CC standard list. 178 186 … … 182 190 A new string type and its runtime library are available through @#include <string.hfa>@. 183 191 The library offers a type where basic usage is comparable to the \CC @string@ type, including analogous coexistence with raw-character pointers. 184 Its implementation, however, follows different principles, enabling programs to work with strings by value, without incurring excessive copying.192 Its implementation, however, follows different principles, enabling programs to work with strings as simple values, without incurring excessive copying. 185 193 Mutability is retained. 186 Substrings are supported, including the ability for overlapping ranges to share edits transparently.187 The implementation allows a set of strings to use an optimized, shared, custom heap or to use the regular heap for thread safety.194 Substrings are supported, including the option for overlapping ranges to share edits transparently, or act as diverging snapshots. 195 The implementation allows a set of strings to use an optimized, shared, custom heap, or to use the regular heap, on strings that are passed among threads. 188 196 189 197 The string library includes reading and writing strings via the preexisting \CFA I/O stream library. -
doc/theses/mike_brooks_MMath/list.tex
rac9c0ee8 r74accc6 1 1 \chapter{Linked List} 2 \label{ch:list} 2 3 3 4 This chapter presents my work on designing and building a linked-list library for \CFA. -
doc/theses/mike_brooks_MMath/programs/bkgd-carray-decay.c
rac9c0ee8 r74accc6 10 10 assert( sizeof(pa0x) != sizeof(ar) ); 11 11 12 void f( @float x[10]@, @float * y@) {12 void f( float x@[10]@, float @*@ y ) { 13 13 static_assert( sizeof(x) == sizeof(void *) ); 14 14 static_assert( sizeof(y) == sizeof(void *) ); 15 x = y; y = x; 15 16 } 16 f( 0, 0 );17 17 18 18 // reusing local variable float ar[10]
Note:
See TracChangeset
for help on using the changeset viewer.