Ignore:
Timestamp:
Apr 26, 2026, 7:49:45 PM (4 days ago)
Author:
Michael Brooks <mlbrooks@…>
Branches:
master
Children:
037dc57
Parents:
ac9c0ee8
Message:

proof-read through background chapter

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/mike_brooks_MMath/background.tex

    rac9c0ee8 r74accc6  
    2121note: expected 'void (*)(void)' but argument is of type 'float *'
    2222\end{cfa}
    23 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
     23Clearly, @gcc@ understands these ill-typed cases, and yet allows the program to compile, which seems inappropriate.
    2424Compiling with flag @-Werror@, which turns warnings into errors, is often too pervasive, because some warnings are just warnings, \eg an unused variable.
    2525In the following discussion, \emph{ill-typed} means giving a nonzero @gcc@ exit condition with a message that discusses typing.
     
    8989this anomaly is \emph{fixed} with operator @->@, which performs the two operations in the more intuitive order: @sp->f@ $\Rightarrow$ @(*sp).f@.
    9090\end{itemize}
    91 While attempting to make the declaration and expression contexts consistent is a laudable goal, it has not worked out in practice, even though Dennis Richie believed otherwise:
     91While attempting to make the declaration and expression contexts consistent is a laudable goal, it has not worked out in practice, Dennis Richie's contrary position on the issue conceded:
    9292\begin{quote}
    9393In spite of its difficulties, I believe that the C's approach to declarations remains plausible, and am comfortable with it; it is a useful unifying principle.~\cite[p.~12]{Ritchie93}
    9494\end{quote}
    95 After all, reading a C array type is easy: just read it from the inside out following the ``clock-wise spiral rule''~\cite{Anderson94}.
     95As it stands, typed must be read from their declared identifier outward, following the ``clock-wise spiral rule''~\cite{Anderson94}.
    9696Unfortunately, \CFA cannot correct these operator priority inversions without breaking C compatibility.
    9797
    98 The alternative solution is for \CFA to provide its own type, variable and routine declarations, using a more intuitive syntax.
    99 The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type.
    100 The qualifiers have the same syntax and semantics in \CFA as in C, so there is nothing to learn.
     98The alternative solution is for \CFA to provide its own, more intuitive declaration syntax for  types, variables and routines.
     99The new declarations place type operators to the left of the base type, while C declarations place them (to the right of the base type and) spiraling around the declared object name.
     100The type operators have the same semantics in \CFA as in C, so there is nothing to learn.
    101101Then, a \CFA declaration is read left to right, where a function return-type is enclosed in brackets @[@\,@]@.
    102102\begin{cquote}
     
    131131(It is unclear if Hebrew or Arabic speakers, say declarations right to left.)
    132132
    133 Specifically, \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef[Chapter]{c:Array}.
     133Specifically, \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussed in \VRef[Chapter]{c:Array}.
    134134The \CFA-thesis column shows the new array declaration form, which is my contribution to safety and ergonomics.
    135135The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features.
     
    294294\begin{quote}
    295295Except when it is the operand of the @sizeof@ operator, or the unary @&@ operator, or is a string literal used to
    296 initialize an array an expression that has type ``array of \emph{type}'' is converted to an expression with type
     296initialize an array, an expression that has type ``array of \emph{type}'' is converted to an expression with type
    297297``pointer to \emph{type}'' that points to the initial element of the array object~\cite[\S~6.3.2.1.3]{C11}
    298298\end{quote}
    299299This phenomenon is the famous (infamous) \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one.
    300 It is worthy to note that the list of exceptional cases does not feature the occurrence of @ar@ in @ar[i]@.
    301 Thus, subscripting happens on pointers not arrays.
    302 
    303 Subscripting proceeds first with pointer decay, if needed.
    304 Next, \cite[\S~6.5.2.1.2]{C11} explains that @ar[i]@ is treated as if it is @(*((ar)+(i)))@.
     300It is worthy to note that, in the expression @ar[i]@, the subexpression @ar@, is not one of the standard's exceptions; @ar@ is converted to a pointer, before subscripting occurs.
     301Thus, \emph{all} subscripting happens on pointers, never arrays.
     302
     303After this pointer decay (when needed), subscripting proceeds.
     304In the next step, \cite[\S~6.5.2.1.2]{C11} explains that @ar[i]@ is treated as if it is @(*(ar+i))@.
    305305\cite[\S~6.5.6.8]{C11} explains that the addition, of a pointer with an integer type, is defined only when the pointer refers to an element that is in an array.
    306306The addition gives the address @i@ elements away from the element (@ar@, or @&ar[0]@).
    307307This address is valid if @ar@ is big enough and @i@ is small enough.
    308308Finally, \cite[\S~6.5.3.2.4]{C11} explains that the @*@ operator's result is the referenced element.
    309 Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing, as plus is commutative.
    310 
    311 Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
     309Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing, as addition is commutative.
     310
     311Subscripting a pointer when the target is standard-inappropriate is still, practically, well-defined.
    312312While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all,
    313313the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
     
    317317fs[5] = 3.14;
    318318\end{cfa}
    319 The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is'' as requested (\cite[\S~7.22.3.4.2]{C11}).
    320 But \emph{nothing} more is said about this pointer value, specifically that its referent might \emph{be} an array allowing subscripting.
    321 
    322 Under this assumption, a pointer, @p@, being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around @p@'s address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p).t@.
    323 I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is conceptually in the middle of an array whose size is unlimited in both directions, \eg @(&ar[5])[-200]@ or @(&ar[5])[200]@).
     319The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is [the size requested]'' (\cite[\S~7.22.3.4.2]{C11}).
     320But \emph{nothing} more is said about this pointer, much less that its referent might \emph{be} an array, \ie that subscripting might be allowed at all.
     321
     322Under this assumption, a pointer, @p@, being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around @p@'s address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p)@.
     323I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is conceptually in the middle of an array, whose size is unlimited in both directions, \eg @(&ar[5])[-200]@ or @(&ar[5])[200]@.
     324
    324325No pointer is exempt from array diffraction.
    325326No array shows its elements without pointer decay.
     
    328329A further pointer--array confusion, closely related to decay, occurs in parameter declarations.
    329330\cite[\S~6.7.6.3.7]{C11} explains that when an array type is written for a parameter,
    330 the parameter's type becomes a type that can be summarized as the array-decayed type.
     331the parameter's type changes into (what is practically) the pointer-decayed type.
     332But pointer decay applies to an expression, while this transformation applies to a declaration.
    331333The respective handling of the following two parameter spellings shows that the array and pointer versions are identical.
    332334\lstinput{12-16}{bkgd-carray-decay.c}
     
    419421
    420422As of C99, the C standard supports a \newterm{variable length array} (VLA)~\cite[\S~6.7.5.2.5]{C99}, providing a dynamic-fixed array feature \see{\VRef{s:ArrayIntro}}.
    421 Note, the \CC standard does not support VLAs, but @g++@ and @clang@ provide them.
     423Note, the \CC standard does not support VLAs, but @g++@ and @clang@ provide a limited form of them.
    422424A VLA is used when the desired number of array elements is \emph{unknown} at compile time.
    423425\begin{cfa}
    424 size_t  cols;
    425 scanf( "%d", &cols );
    426 double ar[cols];
    427 \end{cfa}
    428 The array dimension is read from outside the program and used to create an array of size @cols@ on the stack.
     426size_t  n;
     427scanf( "%d", &n );
     428double ar[n];
     429\end{cfa}
     430The array dimension is read from outside the program and used to create an array of size @n@ on the stack.
    429431The VLA is implemented by the @alloca@ routine, which bumps the stack pointer.
    430432Unfortunately, there is significant misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient.
     
    441443When working with arrays, \eg linear algebra, array dimensions are referred to as \emph{rows} and \emph{columns} for a matrix, adding \emph{planes} for a cube.
    442444(There is little terminology for higher dimensional arrays.)
    443 For example, an acrostic poem\footnote{A kind of poetry where the first, last or other letters in a line spell out a word or phrase in a vertical column.}
     445For example, an acrostic poem\footnote{A written presentation of words where the first, last or other letters in a line spell out a word or phrase in a vertical column.}
    444446can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s).
    445447Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword.
    446448In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@.
    447449
    448 Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block.
    449 This conceptualization is reenforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.
     450Commonly, an array, matrix or cube is visualized (especially in mathematics) as a contiguous row, rectangle or block.
     451This conceptualization is reinforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.
    450452Few programming languages differ from the mathematical subscript ordering.
    451453However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system.
     
    460462however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants.
    461463Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary.
    462 The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
     464The variable-dimension approach (right) ignores (violates) the type system, \ie the parameter type has no suggestion of mutidimensionality and some acrobatics are required for a w\footnote{
     465        One may be tempted to phrase a call as \lstinline{fp2( 4, 4, vm1 )}, but this call is ill-typed.  Argument \lstinline{vm1} could match parameter declarations \lstinline{int m[][4]} or \lstinline{int (*m)[4]}.  But only the argument \lstinline{&vm1[0][0]}, or its equivalent, but confusing, \lstinline{vm1[0]}, relate \lstinline{vm1} to parameter type \lstinline{int*}.
     466}, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
    463467
    464468\begin{figure}
     
    470474        ...  printf( "%d ", @m[r][c]@ );  ...
    471475}
    472 int fm1[4][@6@], fm2[6][@6@]; // no VLA
     476int fm1[4][@6@], fm2[6][@6@]; // no VLA, same
    473477// initialize matrixes
    474478fp1( 4, fm1 ); // implicit 6 columns
     
    481485        ...  printf( "%d ", @sub( m, r, c )@ );  ...
    482486}
    483 int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA
     487int vm1[4][@4@], vm2[6][@8@]; // no VLA, different
    484488// initialize matrixes
    485 fp2( 4, 4, vm1 );
    486 fp2( 6, 8, vm2 );
    487 \end{cfa}
    488 \end{tabular}
    489 \caption{C90 Fixed \vs Variable Contiguous Matrix Styles}
     489fp2( 4, 4, @&vm1[0][0]@ );
     490fp2( 6, 8, &vm2[0][0] );
     491\end{cfa}
     492\end{tabular}
     493\caption{Pre-VLA Fixed \vs Variable Contiguous Matrix Styles}
    490494\label{f:FixedVariable}
    491495\end{figure}
    492496
    493 Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC.
     497Many languages allow multidimensional arrays-of-arrays, \eg Pascal and \CC.
    494498\begin{cquote}
    495499\setlength{\tabcolsep}{15pt}
    496500\begin{tabular}{@{}ll@{}}
    497501\begin{pascal}
     502(* Pascal *)
    498503var m : array[0..4, 0..4] of Integer;  (* matrix *)
    499504type AT = array[0..4] of Integer;  (* array type *)
     
    504509&
    505510\begin{c++}
     511// C++
    506512int m[5][5];
    507513
    508514typedef vector< vector<int> > MT;
    509 MT vm( 5, vector<int>( 5 ) );
    510 m@[1][2]@ = 1;  aa@[1][2]@ = 1;
     515MT vv( 5, vector<int>( 5 ) );
     516m@[1][2]@ = 1;  vv@[1][2]@ = 1;
    511517\end{c++}
    512518\end{tabular}
    513519\end{cquote}
    514520The language decides if the matrix and array-of-array are laid out the same or differently.
    515 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix).
    516 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
     521For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (\eg triangular matrix).
     522Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiate between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
    517523Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (\eg caching/NUMA effects).
    518 C also provides non-contiguous arrays-of-arrays:
     524C also allows non-contiguous arrays-of-arrays:
    519525\begin{cfa}
    520526int m[5][5];                                                    $\C{// contiguous}$
     
    527533Nevertheless, the C array-of-array form is still important for special circumstances.
    528534
    529 \VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}
     535\VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99, for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.  Though GNU-flavoured C++ eventually got VLAs, it never got this enhancement for managing a multidimensionl VLA parameter. }
    530536For contiguous-array arguments (including VLA), C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@.
    531537Hence, if the declaration of @fc@ is changed to:
     
    533539void fc( int rows, int cols, int m[@rows@][@cols@] ) ...
    534540\end{cfa}
    535 there is now sufficient information to support array copying and subscript checking to prevent changing the argument or buffer-overflow problems, \emph{but neither feature is provided}.
     541there is now sufficient information to support array copying and subscript checking to prevent buffer-overflow problems, \emph{but neither feature is provided}.
    536542While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites.
    537543That is, the array does not automatically carry its structure and sizes for use in computing subscripts.
    538544While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row with a correctly bounded loop index.
    539 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines.
     545Specifically, there is no requirement that the rows are the same length, like for a poem with different-length lines.
    540546
    541547\begin{figure}
     
    547553                for ( size_t  c = 0; c < cols; c += 1 )
    548554                        ...  @m[r][c]@  ...
     555                        // each r-step: cols * sizeof(int)
    549556}
    550557int m@[5][5]@;
     
    562569                for ( size_t  c = 0; c < cols; c += 1 )
    563570                        ...  @m[r][c]@  ...
     571                        // each r-step: 1 * sizeof(int*)
    564572}
    565573int @* aa[5]@;  // row pointers
     
    582590Again, the inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type.
    583591\lstinput{16-18}{bkgd-carray-mdim.c}
    584 There are now three axis for deriving expressions from @mx@: \emph{itself}, \emph{first element}, and \emph{first grand-element} (meaning, first element of first element).
     592There are now three means of deriving expressions from @mx@: \emph{itself}, \emph{first element}, and \emph{first grand-element} (meaning, first element of first element).
    585593\lstinput{20-26}{bkgd-carray-mdim.c}
    586594Given that arrays are contiguous and the size of @float@ is 4, then the size of @mx@ with 3 $\times$ 10 floats is 120 bytes, the size of its first element (row) is 40 bytes, and the size of the first element of the first row is 4.
    587 Again, an array and a point to each of its axes are different.
     595Again, an array and a pointer to anything it contains are different.
    588596\lstinput{28-36}{bkgd-carray-mdim.c}
    589597As well, there is pointer decay from each of the matrix axes to pointers, all having the same address.
    590 \lstinput{38-44}{bkgd-carray-mdim.c}
    591598Finally, subscripting is allowed on a @malloc@ result, where the referent may or may not allow subscripting or have the right number of subscripts.
    592599
     
    596603Passing an array as an argument to a function is necessary.
    597604Assume a parameter is an array where the function intends to subscript it.
    598 This section asserts that a more satisfactory/formal characterization does not exist in C, then surveys the ways that C API authors communicate @p@ has zero or more dimensions, and finally calls out the minority cases where the C type system is using or verifying such claims.
     605This section asserts that a more satisfactory/formal characterization does not exist in C, then surveys the ways that C API authors communicate that @p@ has zero or more dimensions, and finally calls out the minority cases where the C type system is using or verifying such claims.
    599606
    600607A C parameter declaration looks different from the caller's and callee's perspectives.
     
    685692% So are @float[5]*@, @float[]*@ and @float (*)*@.  These latter ones are simply nonsense, though they hint at ``1d array of pointers'', whose equivalent syntax options are, @float *[5]@, @float *[]@, and @float **@.
    686693
    687 It is a matter of taste as to whether a programmer should use the left form to get the most out of commenting subscripting and dimension sizes, sticking to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromising in the middle (reducing unchecked information, yet clearly stating, ``I am subscript'').
     694It is a matter of taste as to whether a programmer should use the left form to get the most out of commenting subscripting and dimension sizes, sticking to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromising in the middle (reducing unchecked information, yet clearly stating, ``I am for subscripting'').
    688695
    689696Note that this equivalence of pointer and array declarations is special to parameters.
     
    771778}
    772779\end{cfa}
    773 The cases without comments are rejections, but simply because the array ranks do not match; in the commented cases, the ranks match and the rules being discussed apply.
     780The cases without comments are rejections, but simply because the array ranks do not match; in the commented cases, the ranks match and the rules are discussed apply.
    774781The cases @f( b )@ and @f( &a )@ show where some length checking occurs.
    775782But this checking misses the cases @f( d )@ and @f( &c )@, allowing the calls with mismatched lengths, actually 100 for 10.
     
    777784Ultimately, an inner dimension's size is a callee's \emph{assumption} because the type system uses declaration details in the callee's perspective that it does not enforce in the caller's perspective.
    778785
    779 Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee has make an assumption about the size, but no (checked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/\-over-confidence.
     786Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee must make an assumption about the size, but no (unchecked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/\-over-confidence.
    780787\begin{cquote}
    781788@[@ \textit{type-qualifier-list$_{opt}$} @* ]@
     
    797804\label{s:ArraysCouldbeValues}
    798805
    799 All arrays have a know runtime size at their point of declaration.
     806All arrays have a known runtime size at their point of declaration.
    800807Furthermore, C provides an explicit mechanism to pass an array's dimensions to a function.
    801808Nevertheless, an array cannot be copied, and hence, not passed by value to a function, even when there is sufficient information to do so.
     
    827834\section{Linked List}
    828835
    829 Linked-lists are blocks of storage connected using one or more pointers.
     836Linked lists are blocks of storage connected using one or more pointers.
    830837The storage block (node) is logically divided into data (user payload) and links (list pointers), where the links are the only component used by the list structure.
    831 Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous.
    832 
    833 The links organize nodes into a particular kind of data structure, \eg queue, tree, hash table, \etc, with operations specific to that kind.
    834 Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value;
    835 hence, all data structure routines take and return pointers to nodes and not the nodes themselves.
     838Since the data unused, list structures are often polymorphic over the data, which is often homogeneous.
     839
     840The links organize nodes into a particular shape of data structure, \eg chain, tree, hash table, \etc, with operations specific to that kind.
     841In all these cases, a node's address is significant, so nodes are communicated by reference/pointer, and never by copy (by value).
    836842
    837843
     
    842848Within this restricted space, all design-issue discussions assume the following invariants.
    843849\begin{itemize}
     850        \item The chain shape, as opposed to tree or hash table, is considered.
    844851        \item A doubly-linked list is being designed.
    845852                Generally, the discussed issues apply similarly for singly-linked lists.
    846                 Circular \vs ordered linking is discussed under List identity (\VRef{toc:lst:issue:ident}).
     853                Ordered linking (as opposed to circularly-linked) occurs.
    847854        \item Link fields are system-managed.
    848855                The system has freedom over how to represent these links.
    849856                The user works with the system-provided API to query and modify list membership.
     857\begin{comment} % yes, that's what I'm building; no, it's not an invariant for the design-issue discussions
    850858        \item The user data must provide storage for the list link-fields.
    851859                Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}.
     860\end{comment}
    852861\end{itemize}
    853862Alternatives to these assumptions are discussed under Future Work (\VRef{toc:lst:futwork}).
     
    857866\label{s:PreexistingLinked-ListLibraries}
    858867
    859 To show examples of the concepts being defined, two preexisting linked-list libraries are used throughout and further libraries are introduced as needed.
     868To show examples of the concepts being defined, these two preexisting linked-list libraries are used throughout, and further libraries are introduced as needed.
    860869\begin{enumerate}
    861870        \item Linux Queue library~\cite{lst:linuxq} (LQ) of @<sys/queue.h>@.
     
    864873%A general comparison of libraries' abilities is given under Related Work (\VRef{toc:lst:relwork}).
    865874For the discussion, assume the type @req@ (request) is the user's payload in examples.
    866 Then the job of a list library is to help a user manage (organize) requests, \eg a request can be a network arrival-event processed by a web browser or a thread blocked/scheduled by the runtime.
     875Then the job of a list library is to help a user manage (organize) requests.
     876A request might be a network arrival event processed by a web browser or a thread blocked/scheduled by a runtime.
    867877
    868878
     
    872882Link attachment deals with the question:
    873883Where are the libraries' inter-node link-fields stored, in relation to the user's payload data fields?
     884
    874885\VRef[Figure]{fig:lst-issues-attach} shows three basic styles.
    875886\VRef[Figure]{f:Intrusive} shows the \newterm{intrusive} style, placing the link fields inside the payload structure.
     
    877888The wrapped style distinguishes between wrapping a reference or a value, \eg @list<req *>@ or @list<req>@.
    878889(For this discussion, @list<req &>@ is similar to @list<req *>@.)
    879 This difference is one of user style and performance (copying), not framework capability.
    880 Library LQ is intrusive; STL is wrapped with reference or value.
     890This difference is one of user style, performance (due to copying) and ownership; it is not a matter of framework capability.
     891Library LQ is intrusive; STL is wrapped, supporting either reference or value.
    881892
    882893\begin{comment}
     
    949960                The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
    950961                head objects are discussed in \VRef{toc:lst:issue:ident}.
    951                 In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;
     962                In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list axis;
    952963                these are discussed in \VRef{s:Axis}.
    953964                In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
     
    959970
    960971Each diagram in \VRef[Figure]{fig:lst-issues-attach} is using the fewest dynamic allocations for its respective style:
    961 in intrusive, here is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copy data-area and linked fields are dynamically allocated.
     972in intrusive, there is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copy data-area and linked fields are dynamically allocated.
    962973The advantage of intrusive is the control in memory layout and storage placement.
    963974Both wrapped styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.
     975
     976\begin{comment} % duplicated from s:Axis, blurs the taxonomy
    964977In all three cases, a @req@ object can enter and leave a list many times.
    965978However, in intrusive a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.
     
    968981however, knowing which of the @req@ object is the \emph{true} object becomes complex.
    969982\see*{\VRef{s:Axis} for further discussion.}
    970 
    971 The implementation of @LIST_ENTRY@ uses a trick to find the links and the node containing the links.
    972 The macro @LIST_INSERT_HEAD( &reqs, &r2, d )@ takes the list header, a pointer to the node, and the offset of the link fields in the node.
    973 One of the fields generated by @LIST_ENTRY@ is a pointer to the node, which is set to the node address, \eg @r2@.
    974 Hence, the offset to the link fields provides an access to the entire node, because the node points at itself.
    975 For list traversal, @LIST_FOREACH( cur, &reqs_pri, by_pri )@, there is the node cursor, the list, and the offset of the link fields within the node.
    976 The traversal actually moves from link fields to link fields within a node and sets the node cursor from the pointer within the link fields back to the node.
    977 
    978 A further aspect of layout control is allowing the user to explicitly specify link fields controlling placement and attributes within the @req@ object.
    979 LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.}, which can be placed anywhere in the node.
     983\end{comment}
     984
     985In the LQ column, \subref*{f:Intrusive}, the macro @LIST_INSERT_HEAD( &reqs, &r2, d )@ takes the list header, a pointer to the node, and the offset of the link fields in the node.
     986The user's last argument is the field name; the macro expansion uses it as @(&r2)->d@, causing the compiler to insert the link-field offset.
     987With this information, the API can use node pointers for conversing with the user, and also work on the link fields within.
     988However, when there is only one set of link fields, the user is required to manage a redundant parameter.
     989
     990An aspect of layout control is allowing the user to specify link fields explicitly, controlling field order and attributes within the @req@ object.
     991LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple of these link fields allowing a node to appear on multiple lists simultaneously.}, which can be placed anywhere in the node.
    980992An example of an attribute on the link fields is cache alignment, possibly in conjunction with other @req@ fields, improving locality and/or avoiding false sharing.
    981 For example, if a list is frequently traversed in the forward direction, and infrequently gets elements removed at random positions, then an ideal layout for cache locality puts the forward links, together with frequently-used payload data on one cache line, leaving the reverse links on a colder cache line.
     993If a list is frequently traversed in the forward direction, and infrequently gets elements removed at arbitrary positions, then an ideal layout for cache locality puts the forward links, together with frequently-used payload data on one cache line, leaving the reverse links on a colder cache line.
    982994In contrast, supplying link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@, and no explicit attributes.
    983995Wrapped reference has no control over the link fields, but the separate data allows some control;
    984996wrapped value has no control over data or links.
    985997
    986 Another subtle advantage of intrusive arrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
     998Another subtle advantage of intrusive attachment is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
    987999In LQ, the intrusive @req@ pointer is the correct argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
    9881000there is no distinguishing a @req@ from a @req@ in a list.
     
    9961008Then, a novel use can put a @req@ in a list, without requiring any upstream change in the @req@ library.
    9971009In intrusive, the ability to be listed must be planned during the definition of @req@.
    998 When in doubt, optimistically adding a couple links for future use is cheap because links are small and memory is big.
    9991010
    10001011\begin{figure}
     
    10221033An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
    10231034A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits.
    1024 Like LQ, \CFA is capable of supporting a wrapped library, if need arose.
     1035Like LQ, the \CFA intrusive library of \VRef[Chapter]{ch:list} is capable of supporting a wrapped library, if need arose.
    10251036
    10261037
     
    10481059\newterm{Axis} deals with the question:
    10491060In how many different lists can a node be stored, at the same time?
    1050 \VRef[Figure]{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same request value (field @rqr@).
    1051 Each of ``by priority'' and ``by common request value'' is a separate list.
    1052 For example, there is a single priority-list linked in order [1, 2, 2, 3, 3, 4], where nodes may have the same priority, and there are three common request-value lists combining requests with the same values: [42, 42], [17, 17, 17], and [99], giving four head nodes, one for each list.
    1053 The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists).
    1054 
    1055 As stated, the limitation of intrusive is knowing apriori how many groups of links are needed for the maximum number of simultaneous lists.
    1056 Thus, the intrusive LQ example supports multiple, but statically many, link lists.
    1057 Note, it is possible to reuse links for different purposes, \eg if a list in linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
     1061\VRef[Figure]{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same requestor (field @rqr@).
     1062Each of ``by priority'' and ``by common requestor'' is a separate axis.
     1063The example has a single priority-list linked (first axis): [1, 2, 2, 3, 3, 4], where nodes may have the same priority.
     1064And it has three common-requestor lists (second axis): [42, 42], [17, 17, 17], and [99], giving four head nodes, one for each list.
     1065The example shows an axis can encompass all the nodes (by-priority) or only a subset of the nodes (three by-common-requestor lists).
     1066
     1067As stated, the limitation of intrusive is knowing a priori how many groups of links are needed for the maximum number of simultaneous lists.
     1068Thus, the intrusive LQ example supports multiple, but statically many, linked lists.
     1069Note, it is possible to reuse links for different purposes, \eg if an item is linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
    10581070This feature is used in the \CFA runtime, where a thread node may be on a blocked or running list, but never on both simultaneously.
    10591071
     
    11111123\end{c++}
    11121124
    1113 Axis cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance.
    1114 Instead, a special type is require that contains the link fields and points at the node.
     1125Simultaneous axes cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance.
     1126Instead, a special type is required that contains the link fields and points at the node.
    11151127\begin{cquote}
    11161128\setlength{\tabcolsep}{10pt}
     
    11401152int i = nodedl.@get()@.i;                               $\C{// indirection to node}$
    11411153\end{c++}
    1142 Hence, the \uCpp approach optimizes one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links.
    1143 However, \uCpp cannot apply the LQ trick for finding the links and node.
     1154Hence, the \uCpp approach streamlines one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links.
     1155However, \uCpp does not have the LQ trick of passing a field name as a macro argument, resulting in the somewhat more cumbersome realization of subsequent link fields.
    11441156
    11451157The major ergonomic difference among the approaches is naming and name usage.
     
    11531165\subsection{User Integration: Preprocessed \vs Type-System Mediated}
    11541166
    1155 While the syntax for LQ is reasonably succinct, it comes at the cost of using C preprocessor macros for generics, which are not part of the language type-system, unlike \CC templates.
    1156 Hence, small errors in macro arguments can lead to large substitution mistakes, as the arguments maybe textually written in many places and/or concatenated with other arguments/text to create new names and expressions.
     1167While the syntax for LQ is reasonably succinct, it comes at the cost of using C preprocessor macros for generics, while macros are not part of the language's type system.
     1168Hence, small errors in macro arguments can lead to large substitution mistakes, as the arguments may be textually written in many places and/or concatenated with other arguments/text to create new names and expressions.
    11571169Hence, textual expansion can lead to a cascade of error messages that are confusing and difficult to debug.
    11581170For example, argument errors like @a.b@{\Large\color{red},}@c@, comma instead of period, or @by@{\Large\color{red}-}@pri@, minus instead of underscore, can produce many error messages.
    11591171Note, similar problems exist for \CC templates.
    11601172
    1161 Instead, language function calls (even with inlining) handle argument mistakes locally at the call, giving very specific error message.
    1162 \CC @concepts@ were introduced in @templates@ to deal with this problem.
     1173Further issues with macros occur when the substitution result contains artifacts related to evaluation order that are not evident in the original.
     1174A real problem that occurred while preparing the linked-list performance evaluation stemmed from the innocent-looking
     1175\begin{c++}
     1176TAILQ_REMOVE(reqs, TAILQ_LAST(reqs, reql), d);
     1177\end{c++}
     1178not being equivalent to the explicitly multi-step version:
     1179\begin{c++}
     1180struct req * last = TAILQ_LAST(reqs, reql);
     1181TAILQ_REMOVE(reqs, last, d);
     1182\end{c++}
     1183It turns out that @TAILQ_REMOVE@ uses its ``which element to remove'' parameter at several places, importantly, one occurring after the removal's changes are in progress.
     1184When the second use encounters the macro substitution @TAILQ_LAST(reqs, reql)@, it obtains a different node than the first use got, with the removal's changes having alredy started.
     1185This macro-induced phenomenon led to an invalid pointer dereference (safety violation), at a run-time well after the removal at issue (costly to resolve).
     1186
     1187Instead, language function calls (even with inlining) handle argument mistakes locally at the call, giving a very specific error message.  In the world of \CC templates, concepts were introduced to deal with this problem.  Furthermovre, language function calls use (only) C-language semantics for argument evaluation, instead of applealing to reasoning about how the semantics might play out in an invisible substitution result.
     1188
     1189So, avoiding a macro-centric implementation helps uers avoid and diagnose critical mistakes.
    11631190
    11641191% example of poor error message due to LQ's preprocessed integration
     
    11741201
    11751202All examples so far use two distinct types for:
    1176 an item found in a list (type @req@ of variable @r1@, see \VRef[Figure]{fig:lst-issues-attach}), and the list (type @reql@ of variable @reqs_pri@, see \VRef[Figure]{fig:lst-issues-ident}).
     1203an item found in a list (type @req@ of variable @r1@, in \VRef[Figure]{fig:lst-issues-attach}), and the list  ``itself'' (type @reql@ of variable @reqs@ in the same example, or @reqs_pri@, in \VRef[Figure]{fig:lst-issues-ident}).
    11771204This kind of list is \newterm{headed}, where the empty list is just a head.
    11781205An alternate \emph{ad-hoc} approach omits the header, where the empty list is no nodes.
    1179 Here, a pointer to any node can traverse its link fields: right or left and around, depending on the data structure.
    1180 Note, a headed list is a superset of an ad-hoc list, and can normally perform all of the ad-hoc operations.
     1206Here, all link traversals begin from an existing reference to a node.
     1207A give node may be able to see that it is first, but there is no direct access to a global first.
     1208Note, a headed-list API is a superset of an ad-hoc list's, and can normally perform all of the ad-hoc operations.
    11811209\VRef[Figure]{fig:lst-issues-ident} shows both approaches for different list lengths and unlisted elements.
    11821210For headed, there are length-zero lists (heads with no elements), and an element can be listed or not listed.
     
    12031231(Both types are doubly linked and an analogous choice is available for singly linked.)
    12041232
     1233Libraries have not traditionally offered an ad-hoc list abstraction, but it is a common pattern when programmers roll their own inter-linked type.
     1234The ad-hoc pattern obviates the questions, ``Who should own the head?'' or, ``Are we sure the head outlives the elements?''
     1235Providing library support for ad-hoc lists is an opportunity to lower the incentive for rolling one's own.
    12051236
    12061237\subsection{End Treatment: Cased \vs Uniform }
     
    12081239All lists must have a logical \emph{beginning/ending}, otherwise list traversal is infinite.
    12091240\emph{End treatment} refers to how the list represents the lack of a predecessor/successor to demarcate end point(s).
    1210 For example, in a doubly-linked list containing a single node, the next/prev links have no successor/predecessor nodes.
     1241For example, a doubly-linked list implementation might represent a node that is solo in a list, by using null next/prev pointers.
    12111242Note, a list does not need to use links to denote its size;
    12121243it can use a node counter in the header, where $N$ node traversals indicates complete navigation of the list.
    12131244However, managing the number of nodes is an additional cost, whereas the links must always be managed.
    12141245
    1215 The following discussion refers to the LQ representations, detailed in \VRef[Figure]{fig:lst-issues-end}, using a null pointer to mark end points.
    1216 LQ uses this representation for its successor/last.
     1246The following discussion refers to the LQ representations, detailed in \VRef[Figure]{fig:lst-issues-end}, which uses a null @succ@ pointer to mark the last element.
    12171247For example, consider the operation of inserting after a given element.
    12181248A doubly-linked list must update the given node's successor, to make its predecessor-pointer refer to the new node.
     
    12291259                LQ sub-object-level representation of links and ends.
    12301260                Each object's memory is pictured as a vertical strip.
    1231                 Pointers' target locations, within these strips, are significant.
    1232                 Uniform treatment of the first-end is evident from an assertion like \lstinline{(**this.pred == this)} holding for all nodes \lstinline{this}, including the first one.
    1233                 Cased treatment of the last-end is evident from the symmetric proposition, \lstinline{(this.succ.pred == &this.succ)}, failing when \lstinline{this} is the last node.
     1261                The location, within a strip, at which an arrow points, is significant.
     1262                Uniform treatment of the first-end is evident from an assertion like \lstinline{(**this->pred == this)} holding for all nodes \lstinline{this}, including the first one.
     1263                Cased treatment of the last-end is evident from the last-end analogous proposition, \lstinline{(this->succ->pred == &this->succ)}, failing when \lstinline{this} is the last node.
    12341264        }
    12351265        \label{fig:lst-issues-end}
     
    12371267
    12381268Interestingly, this branch is sometimes avoidable, giving a uniform end-treatment in the code.
    1239 For example, LQ is headed at the front.
     1269For example, LQ is uniform headed at the front.
    12401270For predecessor/first navigation, the relevant operation is inserting before a given element.
    12411271LQ's predecessor representation is not a pointer to a node, but a pointer to a pseudo-successor pointer.
     
    12571287
    12581288A string is a sequence of symbols, where the form of a symbol can vary significantly: 7/8-bit characters (ASCII/Latin-1), or 2/4/8-byte (UNICODE) characters/symbols or variable length (UTF-8/16/32) characters.
    1259 A string can be read left-to-right, right-to-left, top-to-bottom, and have stacked elements (Arabic).
     1289When drawn for human reading, text can be left-to-right, right-to-left, top-to-bottom, or have stacked elements (\eg Arabic).
     1290But a string serves all of these presentations by forming the sequence of symbols that matches the order in which the text is read.
    12601291
    12611292A C character constant is an ASCII/Latin-1 character enclosed in single-quotes, \eg @'x'@, @'@\textsterling@'@.
     
    12631294A character can be formed from an escape sequence, which expresses a non-typable character @'\f'@, form feed, a delimiter character @'\''@, embedded single quote, or a raw character @'\xa3'@, \textsterling.
    12641295
    1265 A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
    1266 The kind of characters in the string is denoted by a prefix: wide characters are prefixed by @L@, @u@, or @U@; UTF-8 characters are prefixed by @u8@.
     1296A C string constant is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
     1297The kind of characters in the string is denoted by a prefix: wide-character strings are prefixed by the same @L@, @u@, or @U@; UTF-8 strings, where characters have nonuniform byte lengths, are prefixed by @u8@.
    12671298
    12681299For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi-byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wscanf@/@wprintf@.
     
    12801311This representation means that there is no real limit to how long a string can be, but programs have to scan one completely to determine its length.~\cite[p.~36]{C:old}
    12811312\end{quote}
    1282 This property is only preserved by the compiler with respect to character constants, \eg @"abc"@ is actually @"abc\0"@, \ie 4 characters rather than 3.
     1313This property is only preserved by the compiler with respect to character constants, \eg @"abc"@ is actually @{'a', 'b', 'c', '\0'}@, \ie 4 characters rather than 3.
    12831314Otherwise, the compiler does not participate, making string operations both unsafe and inefficient.
    12841315For example, it is common in C to:
     
    12971328As a result, storage management for C strings is a nightmare, quickly resulting in array overruns and incorrect results.
    12981329
    1299 Collectively, these design decisions make working with strings in C, awkward, time consuming, and unsafe.
     1330Collectively, these design decisions make working with strings in C awkward, time consuming and unsafe.
    13001331While there are companion string routines that take the maximum lengths of strings to prevent array overruns, \eg @strncpy@, @strncat@, @strncpy@, that means the semantics of the operation can fail because strings are truncated.
    13011332Suffice it to say, C is not a go-to language for string applications, which is why \CC introduced the dynamically-sized @string@ type.
Note: See TracChangeset for help on using the changeset viewer.