-              rac9c0ee8
+              r74accc6
 note: expected 'void (*)(void)' but argument is of type 'float *'
 \end{cfa}
 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems inappropriate.
+Clearly, @gcc@ understands these ill-typed cases, and yet allows the program to compile, which seems inappropriate.
 Compiling with flag @-Werror@, which turns warnings into errors, is often too pervasive, because some warnings are just warnings, \eg an unused variable.
 In the following discussion, \emph{ill-typed} means giving a nonzero @gcc@ exit condition with a message that discusses typing.
 …
 this anomaly is \emph{fixed} with operator @->@, which performs the two operations in the more intuitive order: @sp->f@ $\Rightarrow$ @(*sp).f@.
 \end{itemize}
 While attempting to make the declaration and expression contexts consistent is a laudable goal, it has not worked out in practice, even though Dennis Richie believed otherwise:
+While attempting to make the declaration and expression contexts consistent is a laudable goal, it has not worked out in practice, Dennis Ritchie's contrary position on the issue conceded:
 \begin{quote}
 In spite of its difficulties, I believe that the C's approach to declarations remains plausible, and am comfortable with it; it is a useful unifying principle.~\cite[p.~12]{Ritchie93}
 \end{quote}
 After all, reading a C array type is easy: just read it from the inside out following the ``clock-wise spiral rule''~\cite{Anderson94}.
+As it stands, types must be read from their declared identifier outward, following the ``clock-wise spiral rule''~\cite{Anderson94}.
 Unfortunately, \CFA cannot correct these operator priority inversions without breaking C compatibility.
 The alternative solution is for \CFA to provide its own type, variable and routine declarations, using a more intuitive syntax.
 The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type.
 The qualifiers have the same syntax and semantics in \CFA as in C, so there is nothing to learn.
+The alternative solution is for \CFA to provide its own, more intuitive declaration syntax for types, variables and routines.
+The new declarations place type operators to the left of the base type, while C declarations place them (to the right of the base type and) spiraling around the declared object name.
+The type operators have the same semantics in \CFA as in C, so there is nothing to learn.
 Then, a \CFA declaration is read left to right, where a function return-type is enclosed in brackets @[@\,@]@.
 \begin{cquote}
 …
 (It is unclear if Hebrew or Arabic speakers say declarations right to left.)
 Specifically, \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef[Chapter]{c:Array}.
+Specifically, \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussed in \VRef[Chapter]{c:Array}.
 The \CFA-thesis column shows the new array declaration form, which is my contribution to safety and ergonomics.
 The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features.
 …
 \begin{quote}
 Except when it is the operand of the @sizeof@ operator, or the unary @&@ operator, or is a string literal used to
 initialize an array an expression that has type ``array of \emph{type}'' is converted to an expression with type
+initialize an array, an expression that has type ``array of \emph{type}'' is converted to an expression with type
 ``pointer to \emph{type}'' that points to the initial element of the array object~\cite[\S~6.3.2.1.3]{C11}
 \end{quote}
 This phenomenon is the famous (infamous) \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one.
 It is worthy to note that the list of exceptional cases does not feature the occurrence of @ar@ in @ar[i]@.
 Thus, subscripting happens on pointers not arrays.
 Subscripting proceeds first with pointer decay, if needed.
 Next, \cite[\S~6.5.2.1.2]{C11} explains that @ar[i]@ is treated as if it is @(*((ar)+(i)))@.
+It is worth noting that, in the expression @ar[i]@, the subexpression @ar@ is not one of the standard's exceptions; @ar@ is converted to a pointer before subscripting occurs.
+Thus, \emph{all} subscripting happens on pointers, never arrays.
+After this pointer decay (when needed), subscripting proceeds.
+In the next step, \cite[\S~6.5.2.1.2]{C11} explains that @ar[i]@ is treated as if it is @(*(ar+i))@.
 \cite[\S~6.5.6.8]{C11} explains that the addition, of a pointer with an integer type, is defined only when the pointer refers to an element that is in an array.
 The addition gives the address @i@ elements away from the element (@ar@, or @&ar[0]@).
 This address is valid if @ar@ is big enough and @i@ is small enough.
 Finally, \cite[\S~6.5.3.2.4]{C11} explains that the @*@ operator's result is the referenced element.
 Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing, as plus is commutative.
 Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
+Taken together, these rules illustrate that @ar[i]@ and @i[ar]@ mean the same thing, as addition is commutative.
+Subscripting a pointer when the target is standard-inappropriate is still, practically, well-defined.
 While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all,
 the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
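The equivalence of @ar[i]@, @*(ar+i)@ and @i[ar]@ described above can be checked with a minimal C sketch.
The names @same_element@ and @check_subscript_forms@ are hypothetical, introduced only for illustration, and only in-bound subscripts are exercised, so the sketch stays within standard-defined behaviour.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns 1 when all three spellings of a subscript
// denote the same element, i.e. yield the same address.
int same_element( float * ar, ptrdiff_t i ) {
	return &ar[i] == &*(ar + i)     // subscript is pointer addition plus dereference
		&& &ar[i] == &i[ar];        // addition commutes, so the operands may be swapped
}

// Hypothetical driver: the array argument decays to a pointer at the call.
int check_subscript_forms( void ) {
	float ar[10] = { 0 };
	for ( ptrdiff_t i = 0; i < 10; i += 1 ) {
		if ( ! same_element( ar, i ) ) return 0;
	}
	return 1;
}
```

Note that @same_element@ receives a pointer, not an array, which is the point of the preceding discussion: by the time subscripting occurs, only a pointer remains.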
 …
 fs[5] = 3.14;
 \end{cfa}
+The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is'' as requested (\cite[\S~7.22.3.4.2]{C11}).
+But \emph{nothing} more is said about this pointer value, specifically that its referent might \emph{be} an array allowing subscripting.
+Under this assumption, a pointer, @p@, being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around @p@'s address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p).t@.
+I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is conceptually in the middle of an array whose size is unlimited in both directions, \eg @(&ar[5])[-200]@ or @(&ar[5])[200]@).
+The @malloc@ behaviour is specified as returning a pointer to ``space for an object whose size is [the size requested]'' (\cite[\S~7.22.3.4.2]{C11}).
+But \emph{nothing} more is said about this pointer, much less that its referent might \emph{be} an array, \ie that subscripting might be allowed at all.
+Under this assumption, a pointer, @p@, being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around @p@'s address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p)@.
+I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is conceptually in the middle of an array, whose size is unlimited in both directions, \eg @(&ar[5])[-200]@ or @(&ar[5])[200]@.
 No pointer is exempt from array diffraction.
 No array shows its elements without pointer decay.
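As a small illustration of array diffraction, the following C sketch subscripts a pointer to a middle element in both directions.
The names @peek@ and @check_diffraction@ are hypothetical, and the offsets are deliberately kept inside the enclosing array so the accesses remain standard-defined; the larger offsets mentioned above are expected to behave uniformly in practice, but are not exercised here.

```c
#include <assert.h>

// Hypothetical helper: views the element delta positions away from p,
// in either direction; p is treated as the middle of an unbounded array.
float peek( float * p, int delta ) {
	return p[delta];                // same as *(p + delta)
}

int check_diffraction( void ) {
	float ar[10];
	for ( int i = 0; i < 10; i += 1 ) ar[i] = (float)i;
	float * mid = &ar[5];           // single-element pointer into the array
	return peek( mid, -2 ) == 3.0f  // ar[3]
		&& peek( mid, 0 ) == 5.0f   // ar[5]
		&& peek( mid, 4 ) == 9.0f;  // ar[9]
}
```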
 …
 A further pointer--array confusion, closely related to decay, occurs in parameter declarations.
 \cite[\S~6.7.6.3.7]{C11} explains that when an array type is written for a parameter,
+the parameter's type becomes a type that can be summarized as the array-decayed type.
+the parameter's type changes into (what is practically) the pointer-decayed type.
+But pointer decay applies to an expression, while this transformation applies to a declaration.
 The respective handling of the following two parameter spellings shows that the array and pointer versions are identical.
 \lstinput{12-16}{bkgd-carray-decay.c}
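The same point can be sketched independently of the input file, assuming a C11 compiler for @_Generic@.
The names @takes_array@, @takes_pointer@, @IS_PTR_PARAM@ and @check_param_adjustment@ are hypothetical.
The sketch asks the compiler for the type of each function and finds them identical, and also passes a 100-element array where 10 is written, without complaint.

```c
#include <assert.h>

void takes_array( float a[10] ) { a[0] = 1.0f; }    // array spelling
void takes_pointer( float * p ) { p[0] = 2.0f; }    // pointer spelling

// Hypothetical macro: 1 if fn has type "function taking float *", else 0.
#define IS_PTR_PARAM( fn ) _Generic( &(fn), void (*)( float * ): 1, default: 0 )

int check_param_adjustment( void ) {
	float x[100];
	void (*fp)( float * ) = takes_array;    // no conversion needed: same type
	fp( x );                                // length 100 accepted where 10 is written
	return IS_PTR_PARAM( takes_array ) && IS_PTR_PARAM( takes_pointer ) && x[0] == 1.0f;
}
```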
 …
 As of C99, the C standard supports a \newterm{variable length array} (VLA)~\cite[\S~6.7.5.2.5]{C99}, providing a dynamic-fixed array feature \see{\VRef{s:ArrayIntro}}.
 Note, the \CC standard does not support VLAs, but @g++@ and @clang@ provide them.
+Note, the \CC standard does not support VLAs, but @g++@ and @clang@ provide a limited form of them.
 A VLA is used when the desired number of array elements is \emph{unknown} at compile time.
 \begin{cfa}
 size_t  cols;
 scanf( "%d", &cols );
 double ar[cols];
 \end{cfa}
 The array dimension is read from outside the program and used to create an array of size @cols@ on the stack.
+size_t  n;
+scanf( "%zu", &n );
+double ar[n];
+\end{cfa}
+The array dimension is read from outside the program and used to create an array of size @n@ on the stack.
 The VLA is implemented by the @alloca@ routine, which bumps the stack pointer.
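A minimal runnable sketch of the VLA behaviour follows, with the dimension passed as a parameter rather than read from input.
The name @sum_with_vla@ is hypothetical, and the sketch assumes a C99 (or later) compiler with VLA support and a modest, nonzero @n@.
It shows that @sizeof@ of a VLA is computed at runtime from the dimension.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical example: sums 0 .. n-1 using a stack-allocated VLA.
// Assumes n > 0 and small enough to fit on the stack.
double sum_with_vla( size_t n ) {
	double ar[n];                                       // dimension known only at runtime
	assert( sizeof( ar ) == n * sizeof( double ) );     // sizeof is evaluated at runtime
	for ( size_t i = 0; i < n; i += 1 ) ar[i] = (double)i;
	double total = 0.0;
	for ( size_t i = 0; i < n; i += 1 ) total += ar[i];
	return total;                                       // ar is released on return
}
```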
 Unfortunately, there is significant misinformation about VLAs, \eg the stack size is limited (small), or VLAs cause stack failures or are inefficient.
 …
 When working with arrays, \eg linear algebra, array dimensions are referred to as \emph{rows} and \emph{columns} for a matrix, adding \emph{planes} for a cube.
 (There is little terminology for higher dimensional arrays.)
 For example, an acrostic poem\footnote{A kind of poetry where the first, last or other letters in a line spell out a word or phrase in a vertical column.}
+For example, an acrostic poem\footnote{A written presentation of words where the first, last or other letters in a line spell out a word or phrase in a vertical column.}
 can be treated as a grid of characters, where the rows are the text and the columns are the embedded keyword(s).
 Within a poem, there is the concept of a \newterm{slice}, \eg a row is a slice for the poem text, a column is a slice for a keyword.
 In general, the dimensioning and subscripting for multidimensional arrays has two syntactic forms: @m[r,c]@ or @m[r][c]@.
 Commonly, an array, matrix, or cube, is visualized (especially in mathematics) as a contiguous row, rectangle, or block.
 This conceptualization is reenforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.
+Commonly, an array, matrix or cube is visualized (especially in mathematics) as a contiguous row, rectangle or block.
+This conceptualization is reinforced by subscript ordering, \eg $m_{r,c}$ for a matrix and $c_{p,r,c}$ for a cube.
 Few programming languages differ from the mathematical subscript ordering.
 However, computer memory is flat, and hence, array forms are structured in memory as appropriate for the runtime system.
 …
 however, it requires all dimensions except the first to be specified at compile time, \eg @m[][6]@, allowing all subscripting stride calculations to be generated with constants.
 Hence, every matrix passed to @fp1@ must have exactly 6 columns but the row size can vary.
+The variable-dimension approach (right) ignores (violates) the type system, \ie argument and parameters types do not match, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
+The variable-dimension approach (right) ignores (violates) the type system, \ie the parameter type has no suggestion of multidimensionality and some acrobatics are required for a w\footnote{
+        One may be tempted to phrase a call as \lstinline{fp2( 4, 4, vm1 )}, but this call is ill-typed.  Argument \lstinline{vm1} could match parameter declarations \lstinline{int m[][4]} or \lstinline{int (*m)[4]}.  But only the argument \lstinline{&vm1[0][0]}, or its equivalent, but confusing, \lstinline{vm1[0]}, relate \lstinline{vm1} to parameter type \lstinline{int*}.
+}, and subscripting is performed manually using pointer arithmetic in the macro @sub@.
 \begin{figure}
 …
         ...  printf( "%d ", @m[r][c]@ );  ...
+}
 int fm1[4][@6@], fm2[6][@6@]; // no VLA
+int fm1[4][@6@], fm2[6][@6@]; // no VLA, same
 // initialize matrixes
 fp1( 4, fm1 ); // implicit 6 columns
 …
         ...  printf( "%d ", @sub( m, r, c )@ );  ...
+}
 int vm1[@4@][@4@], vm2[@6@][@8@]; // no VLA
+int vm1[4][@4@], vm2[6][@8@]; // no VLA, different
 // initialize matrixes
 fp2( 4, 4, vm1 );
 fp2( 6, 8, vm2 );
 \end{cfa}
 \end{tabular}
 \caption{C90 Fixed \vs Variable Contiguous Matrix Styles}
+fp2( 4, 4, @&vm1[0][0]@ );
+fp2( 6, 8, &vm2[0][0] );
+\end{cfa}
+\end{tabular}
+\caption{Pre-VLA Fixed \vs Variable Contiguous Matrix Styles}
 \label{f:FixedVariable}
 \end{figure}
 Many languages allow multidimensional arrays-of-arrays, \eg in Pascal or \CC.
+Many languages allow multidimensional arrays-of-arrays, \eg Pascal and \CC.
 \begin{cquote}
 \setlength{\tabcolsep}{15pt}
 \begin{tabular}{@{}ll@{}}
 \begin{pascal}
+(* Pascal *)
 var m : array[0..4, 0..4] of Integer;  (* matrix *)
 type AT = array[0..4] of Integer;  (* array type *)
 …
+&
 \begin{c++}
+// C++
 int m[5][5];
 typedef vector< vector<int> > MT;
 MT vm( 5, vector<int>( 5 ) );
 m@[1][2]@ = 1;  aa@[1][2]@ = 1;
+MT vv( 5, vector<int>( 5 ) );
+m@[1][2]@ = 1;  vv@[1][2]@ = 1;
 \end{c++}
 \end{tabular}
 \end{cquote}
 The language decides if the matrix and array-of-array are laid out the same or differently.
 For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (triangular matrix).
 Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiated between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
+For example, an array-of-array may be an array of row pointers to arrays of columns, so the rows may not be contiguous in memory nor even the same length (\eg triangular matrix).
+Regardless, there is usually a uniform subscripting syntax masking the memory layout, even though a language could differentiate between the two forms using subscript syntax, \eg @m[1,2]@ \vs @aa[1][2]@.
 Nevertheless, controlling memory layout can make a difference in what operations are allowed and in performance (\eg caching/NUMA effects).
 C also provides non-contiguous arrays-of-arrays:
+C also allows non-contiguous arrays-of-arrays:
 \begin{cfa}
 int m[5][5];                                                    $\C{// contiguous}$
 …
 Nevertheless, the C array-of-array form is still important for special circumstances.
 \VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99 for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.}
+\VRef[Figure]{f:ContiguousNon-contiguous} shows a powerful extension made in C99, for manipulating contiguous \vs non-contiguous arrays.\footnote{C90 also supported non-contiguous arrays.  Though GNU-flavoured C++ eventually got VLAs, it never got this enhancement for managing a multidimensional VLA parameter.}
 For contiguous-array arguments (including VLA), C99 conjoins one or more of the parameters as a downstream dimension(s), \eg @cols@, implicitly using this parameter to compute the row stride of @m@.
 Hence, if the declaration of @fc@ is changed to:
 …
 void fc( int rows, int cols, int m[@rows@][@cols@] ) ...
 \end{cfa}
 there is now sufficient information to support array copying and subscript checking to prevent changing the argument or buffer-overflow problems, \emph{but neither feature is provided}.
+there is now sufficient information to support array copying and subscript checking to prevent buffer-overflow problems, \emph{but neither feature is provided}.
 While this contiguous-array capability is a step forward, it is still the programmer's responsibility to manually manage the number of dimensions and their sizes, both at the function definition and call sites.
 That is, the array does not automatically carry its structure and sizes for use in computing subscripts.
 While the non-contiguous style in @faa@ looks very similar to @fc@, the compiler only understands the unknown-sized array of row pointers, and it relies on the programmer to traverse the columns in a row with a correctly bounded loop index.
 Specifically, there is no requirement that the rows are the same length, like a poem with different length lines.
+Specifically, there is no requirement that the rows are the same length, like for a poem with different-length lines.
 \begin{figure}
 …
                 for ( size_t  c = 0; c < cols; c += 1 )
                         ...  @m[r][c]@  ...
+                        // each r-step: cols * sizeof(int)
+}
 int m@[5][5]@;
 …
                 for ( size_t  c = 0; c < cols; c += 1 )
                         ...  @m[r][c]@  ...
+                        // each r-step: 1 * sizeof(int*)
+}
 int @* aa[5]@;  // row pointers
 …
 Again, the inspection begins by using @sizeof@ to provide program semantics for the intuition of an expression's type.
 \lstinput{16-18}{bkgd-carray-mdim.c}
 There are now three axis for deriving expressions from @mx@: \emph{itself}, \emph{first element}, and \emph{first grand-element} (meaning, first element of first element).
+There are now three means of deriving expressions from @mx@: \emph{itself}, \emph{first element}, and \emph{first grand-element} (meaning, first element of first element).
 \lstinput{20-26}{bkgd-carray-mdim.c}
 Given that arrays are contiguous and the size of @float@ is 4, then the size of @mx@ with 3 $\times$ 10 floats is 120 bytes, the size of its first element (row) is 40 bytes, and the size of the first element of the first row is 4.
 Again, an array and a point to each of its axes are different.
+Again, an array and a pointer to anything it contains are different.
 \lstinput{28-36}{bkgd-carray-mdim.c}
 As well, there is pointer decay from each of the matrix axes to pointers, all having the same address.
-\lstinput{38-44}{bkgd-carray-mdim.c}
 Finally, subscripting is allowed on a @malloc@ result, where the referent may or may not allow subscripting or have the right number of subscripts.
 …
 Passing an array as an argument to a function is necessary.
 Assume a parameter is an array where the function intends to subscript it.
 This section asserts that a more satisfactory/formal characterization does not exist in C, then surveys the ways that C API authors communicate @p@ has zero or more dimensions, and finally calls out the minority cases where the C type system is using or verifying such claims.
+This section asserts that a more satisfactory/formal characterization does not exist in C, then surveys the ways that C API authors communicate that @p@ has zero or more dimensions, and finally calls out the minority cases where the C type system is using or verifying such claims.
 A C parameter declaration looks different from the caller's and callee's perspectives.
 …
 % So are @float[5]*@, @float[]*@ and @float (*)*@.  These latter ones are simply nonsense, though they hint at ``1d array of pointers'', whose equivalent syntax options are, @float *[5]@, @float *[]@, and @float **@.
 It is a matter of taste as to whether a programmer should use the left form to get the most out of commenting subscripting and dimension sizes, sticking to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromising in the middle (reducing unchecked information, yet clearly stating, ``I am subscript'').
+It is a matter of taste as to whether a programmer should use the left form to get the most out of commenting subscripting and dimension sizes, sticking to the right (avoiding false comfort from suggesting the typechecker is checking more than it is), or compromising in the middle (reducing unchecked information, yet clearly stating, ``I am for subscripting'').
 Note that this equivalence of pointer and array declarations is special to parameters.
 …
+}
 \end{cfa}
 The cases without comments are rejections, but simply because the array ranks do not match; in the commented cases, the ranks match and the rules being discussed apply.
+The cases without comments are rejections, but simply because the array ranks do not match; in the commented cases, the ranks match and the rules being discussed apply.
 The cases @f( b )@ and @f( &a )@ show where some length checking occurs.
 But this checking misses the cases @f( d )@ and @f( &c )@, allowing the calls with mismatched lengths, actually 100 for 10.
 …
 Ultimately, an inner dimension's size is a callee's \emph{assumption} because the type system uses declaration details in the callee's perspective that it does not enforce in the caller's perspective.
 Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee has make an assumption about the size, but no (checked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/\-over-confidence.
+Finally, to handle higher-dimensional VLAs, C repurposed the @*@ \emph{within} the dimension in a declaration to mean that the callee must make an assumption about the size, but no (unchecked, possibly wrong) information about this assumption is included for the caller-programmer's benefit/\-over-confidence.
 \begin{cquote}
 @[@ \textit{type-qualifier-list$_{opt}$} @* ]@
 …
 \label{s:ArraysCouldbeValues}
 All arrays have a know runtime size at their point of declaration.
+All arrays have a known runtime size at their point of declaration.
 Furthermore, C provides an explicit mechanism to pass an array's dimensions to a function.
 Nevertheless, an array cannot be copied, and hence, not passed by value to a function, even when there is sufficient information to do so.
 …
 \section{Linked List}
 Linked-lists are blocks of storage connected using one or more pointers.
+Linked lists are blocks of storage connected using one or more pointers.
 The storage block (node) is logically divided into data (user payload) and links (list pointers), where the links are the only component used by the list structure.
+Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous.
+The links organize nodes into a particular kind of data structure, \eg queue, tree, hash table, \etc, with operations specific to that kind.
+Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value;
+hence, all data structure routines take and return pointers to nodes and not the nodes themselves.
+Since the data is unused, list structures are often polymorphic over the data, which is often homogeneous.
+The links organize nodes into a particular shape of data structure, \eg chain, tree, hash table, \etc, with operations specific to that kind.
+In all these cases, a node's address is significant, so nodes are communicated by reference/pointer, and never by copy (by value).
 …
 Within this restricted space, all design-issue discussions assume the following invariants.
 \begin{itemize}
+        \item The chain shape, as opposed to tree or hash table, is considered.
         \item A doubly-linked list is being designed.
                 Generally, the discussed issues apply similarly for singly-linked lists.
                 Circular \vs ordered linking is discussed under List identity (\VRef{toc:lst:issue:ident}).
+                Ordered linking (as opposed to circularly-linked) occurs.
         \item Link fields are system-managed.
                 The system has freedom over how to represent these links.
                 The user works with the system-provided API to query and modify list membership.
+\begin{comment} % yes, that's what I'm building; no, it's not an invariant for the design-issue discussions
         \item The user data must provide storage for the list link-fields.
                 Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}.
+\end{comment}
 \end{itemize}
 Alternatives to these assumptions are discussed under Future Work (\VRef{toc:lst:futwork}).
 …
 \label{s:PreexistingLinked-ListLibraries}
 To show examples of the concepts being defined, two preexisting linked-list libraries are used throughout and further libraries are introduced as needed.
+To show examples of the concepts being defined, these two preexisting linked-list libraries are used throughout, and further libraries are introduced as needed.
 \begin{enumerate}
         \item Linux Queue library~\cite{lst:linuxq} (LQ) of @<sys/queue.h>@.
 …
 %A general comparison of libraries' abilities is given under Related Work (\VRef{toc:lst:relwork}).
 For the discussion, assume the type @req@ (request) is the user's payload in examples.
+Then the job of a list library is to help a user manage (organize) requests, \eg a request can be a network arrival-event processed by a web browser or a thread blocked/scheduled by the runtime.
+Then the job of a list library is to help a user manage (organize) requests.
+A request might be a network arrival event processed by a web browser or a thread blocked/scheduled by a runtime.
 …
 Link attachment deals with the question:
 Where are the libraries' inter-node link-fields stored, in relation to the user's payload data fields?
 \VRef[Figure]{fig:lst-issues-attach} shows three basic styles.
 \VRef[Figure]{f:Intrusive} shows the \newterm{intrusive} style, placing the link fields inside the payload structure.
 …
 The wrapped style distinguishes between wrapping a reference or a value, \eg @list<req *>@ or @list<req>@.
 (For this discussion, @list<req &>@ is similar to @list<req *>@.)
 This difference is one of user style and performance (copying), not framework capability.
 Library LQ is intrusive; STL is wrapped with reference or value.
+This difference is one of user style, performance (due to copying) and ownership; it is not a matter of framework capability.
+Library LQ is intrusive; STL is wrapped, supporting either reference or value.
 \begin{comment}
 …
                 The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
                 head objects are discussed in \VRef{toc:lst:issue:ident}.
                 In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;
+                In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list axis;
                 these are discussed in \VRef{s:Axis}.
                 In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
 …
 Each diagram in \VRef[Figure]{fig:lst-issues-attach} is using the fewest dynamic allocations for its respective style:
 in intrusive, here is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copy data-area and linked fields are dynamically allocated.
+in intrusive, there is no dynamic allocation, in wrapped reference only the linked fields are dynamically allocated, and in wrapped value the copy data-area and linked fields are dynamically allocated.
 The advantage of intrusive is the control in memory layout and storage placement.
 Both wrapped styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.
+\begin{comment} % duplicated from s:Axis, blurs the taxonomy
 In all three cases, a @req@ object can enter and leave a list many times.
 However, in intrusive a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.
 …
 however, knowing which of the @req@ object is the \emph{true} object becomes complex.
 \see*{\VRef{s:Axis} for further discussion.}
+The implementation of @LIST_ENTRY@ uses a trick to find the links and the node containing the links.
+The macro @LIST_INSERT_HEAD( &reqs, &r2, d )@ takes the list header, a pointer to the node, and the offset of the link fields in the node.
+One of the fields generated by @LIST_ENTRY@ is a pointer to the node, which is set to the node address, \eg @r2@.
+Hence, the offset to the link fields provides an access to the entire node, because the node points at itself.
+For list traversal, @LIST_FOREACH( cur, &reqs_pri, by_pri )@, there is the node cursor, the list, and the offset of the link fields within the node.
+The traversal actually moves from link fields to link fields within a node and sets the node cursor from the pointer within the link fields back to the node.
+A further aspect of layout control is allowing the user to explicitly specify link fields controlling placement and attributes within the @req@ object.
+LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.}, which can be placed anywhere in the node.
+\end{comment}
+In the LQ column, \subref*{f:Intrusive}, the macro @LIST_INSERT_HEAD( &reqs, &r2, d )@ takes the list header, a pointer to the node, and the offset of the link fields in the node.
+The user's last argument is the field name; the macro expansion uses it as @(&r2)->d@, causing the compiler to insert the link-field offset.
+With this information, the API can use node pointers for conversing with the user, and also work on the link fields within.
+However, when there is only one set of link fields, the user is required to manage a redundant parameter.
+An aspect of layout control is allowing the user to specify link fields explicitly, controlling field order and attributes within the @req@ object.
+LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have several of these link fields, allowing a node to appear on multiple lists simultaneously.}, which can be placed anywhere in the node.
 An example of an attribute on the link fields is cache alignment, possibly in conjunction with other @req@ fields, improving locality and/or avoiding false sharing.
 For example, if a list is frequently traversed in the forward direction, and infrequently gets elements removed at random positions, then an ideal layout for cache locality puts the forward links, together with frequently-used payload data on one cache line, leaving the reverse links on a colder cache line.
+If a list is frequently traversed in the forward direction, and infrequently gets elements removed at arbitrary positions, then an ideal layout for cache locality puts the forward links, together with frequently-used payload data on one cache line, leaving the reverse links on a colder cache line.
 In contrast, supplying link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@, and no explicit attributes.
 Wrapped reference has no control over the link fields, but the separate data allows some control;
 wrapped value has no control over data or links.
 Another subtle advantage of intrusive arrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
+Another subtle advantage of intrusive attachment is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
 In LQ, the intrusive @req@ pointer is the correct argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
 there is no distinguishing a @req@ from a @req@ in a list.
 …
 Then, a novel use can put a @req@ in a list, without requiring any upstream change in the @req@ library.
 In intrusive, the ability to be listed must be planned during the definition of @req@.
-When in doubt, optimistically adding a couple links for future use is cheap because links are small and memory is big.
 \begin{figure}
 …
 An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
 A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits.
 Like LQ, \CFA is capable of supporting a wrapped library, if need arose.
+Like LQ, the \CFA intrusive library of \VRef[Chapter]{ch:list} is capable of supporting a wrapped library, if need arose.
 …
 \newterm{Axis} deals with the question:
 In how many different lists can a node be stored, at the same time?
+\VRef[Figure]{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same request value (field @rqr@).
+Each of ``by priority'' and ``by common request value'' is a separate list.
+For example, there is a single priority-list linked in order [1, 2, 2, 3, 3, 4], where nodes may have the same priority, and there are three common request-value lists combining requests with the same values: [42, 42], [17, 17, 17], and [99], giving four head nodes, one for each list.
+The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists).
+As stated, the limitation of intrusive is knowing a priori how many groups of links are needed for the maximum number of simultaneous lists.
+Thus, the intrusive LQ example supports multiple, but statically many, linked lists.
+Note, it is possible to reuse links for different purposes, \eg if a list is linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
+\VRef[Figure]{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same requestor (field @rqr@).
+Each of ``by priority'' and ``by common requestor'' is a separate axis.
+The example has a single priority-list linked (first axis): [1, 2, 2, 3, 3, 4], where nodes may have the same priority.
+And it has three common-requestor lists (second axis): [42, 42], [17, 17, 17], and [99], giving four head nodes, one for each list.
+The example shows an axis can encompass all the nodes (by-priority) or only a subset of the nodes (three by-common-requestor lists).
+As stated, the limitation of intrusive is knowing a priori how many groups of links are needed for the maximum number of simultaneous lists.
+Thus, the intrusive LQ example supports multiple, but statically many, linked lists.
+Note, it is possible to reuse links for different purposes, \eg if an item is linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
 This feature is used in the \CFA runtime, where a thread node may be on a blocked or running list, but never on both simultaneously.
 …
 \end{c++}
 Axis cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance.
 Instead, a special type is require that contains the link fields and points at the node.
+Simultaneous axes cannot be done with multiple inheritance, because there is no mechanism to either know the order of inheritance fields or name each inheritance.
+Instead, a special type is required that contains the link fields and points at the node.
 \begin{cquote}
 \setlength{\tabcolsep}{10pt}
 …
 int i = nodedl.@get()@.i;                               $\C{// indirection to node}$
 \end{c++}
 Hence, the \uCpp approach optimizes one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links.
 However, \uCpp cannot apply the LQ trick for finding the links and node.
+Hence, the \uCpp approach streamlines one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links.
+However, \uCpp does not have the LQ trick of passing a field name as a macro argument, resulting in the somewhat more cumbersome realization of subsequent link fields.
 The major ergonomic difference among the approaches is naming and name usage.
 …
 \subsection{User Integration: Preprocessed \vs Type-System Mediated}
 While the syntax for LQ is reasonably succinct, it comes at the cost of using C preprocessor macros for generics, which are not part of the language type-system, unlike \CC templates.
 Hence, small errors in macro arguments can lead to large substitution mistakes, as the arguments maybe textually written in many places and/or concatenated with other arguments/text to create new names and expressions.
+While the syntax for LQ is reasonably succinct, it comes at the cost of using C preprocessor macros for generics, and macros are not part of the language's type system.
+Hence, small errors in macro arguments can lead to large substitution mistakes, as the arguments may be textually written in many places and/or concatenated with other arguments/text to create new names and expressions.
 Hence, textual expansion can lead to a cascade of error messages that are confusing and difficult to debug.
 For example, argument errors like @a.b@{\Large\color{red},}@c@, comma instead of period, or @by@{\Large\color{red}-}@pri@, minus instead of underscore, can produce many error messages.
 Note, similar problems exist for \CC templates.
+Instead, language function calls (even with inlining) handle argument mistakes locally at the call, giving very specific error messages.
+\CC @concepts@ were introduced in @templates@ to deal with this problem.
+Further issues with macros occur when the substitution result contains artifacts related to evaluation order that are not evident in the original.
+A real problem that occurred while preparing the linked-list performance evaluation stemmed from the innocent-looking
+\begin{c++}
+TAILQ_REMOVE(reqs, TAILQ_LAST(reqs, reql), d);
+\end{c++}
+not being equivalent to the explicitly multi-step version:
+\begin{c++}
+struct req * last = TAILQ_LAST(reqs, reql);
+TAILQ_REMOVE(reqs, last, d);
+\end{c++}
+It turns out that @TAILQ_REMOVE@ uses its ``which element to remove'' parameter at several places, importantly, one occurring after the removal's changes are in progress.
+When the second use encounters the macro substitution @TAILQ_LAST(reqs, reql)@, it obtains a different node than the first use got, with the removal's changes having already started.
+This macro-induced phenomenon led to an invalid pointer dereference (safety violation), at a point in the run well after the removal at issue (costly to resolve).
+Instead, language function calls (even with inlining) handle argument mistakes locally at the call, giving a very specific error message.
+In the world of \CC templates, concepts were introduced to deal with this problem.
+Furthermore, language function calls use (only) C-language semantics for argument evaluation, instead of appealing to reasoning about how the semantics might play out in an invisible substitution result.
+So, avoiding a macro-centric implementation helps users avoid and diagnose critical mistakes.
 % example of poor error message due to LQ's preprocessed integration
 …
 All examples so far use two distinct types for:
 an item found in a list (type @req@ of variable @r1@, see \VRef[Figure]{fig:lst-issues-attach}), and the list (type @reql@ of variable @reqs_pri@, see \VRef[Figure]{fig:lst-issues-ident}).
+an item found in a list (type @req@ of variable @r1@, in \VRef[Figure]{fig:lst-issues-attach}), and the list ``itself'' (type @reql@ of variable @reqs@ in the same example, or @reqs_pri@, in \VRef[Figure]{fig:lst-issues-ident}).
 This kind of list is \newterm{headed}, where the empty list is just a head.
 An alternate \emph{ad-hoc} approach omits the header, where the empty list is no nodes.
+Here, a pointer to any node can traverse its link fields: right or left and around, depending on the data structure.
+Note, a headed list is a superset of an ad-hoc list, and can normally perform all of the ad-hoc operations.
+Here, all link traversals begin from an existing reference to a node.
+A given node may be able to see that it is first, but there is no direct access to a global first.
+Note, a headed-list API is a superset of an ad-hoc list's, and can normally perform all of the ad-hoc operations.
 \VRef[Figure]{fig:lst-issues-ident} shows both approaches for different list lengths and unlisted elements.
 For headed, there are length-zero lists (heads with no elements), and an element can be listed or not listed.
 …
 (Both types are doubly linked and an analogous choice is available for singly linked.)
+Libraries have not traditionally offered an ad-hoc list abstraction, but it is a common pattern when programmers roll their own inter-linked type.
+The ad-hoc pattern obviates the questions, ``Who should own the head?'' or, ``Are we sure the head outlives the elements?''
+Providing library support for ad-hoc lists is an opportunity to lower the incentive for rolling one's own.
 \subsection{End Treatment: Cased \vs Uniform }
 …
 All lists must have a logical \emph{beginning/ending}, otherwise list traversal is infinite.
 \emph{End treatment} refers to how the list represents the lack of a predecessor/successor to demarcate end point(s).
 For example, in a doubly-linked list containing a single node, the next/prev links have no successor/predecessor nodes.
+For example, a doubly-linked list implementation might represent a node that is solo in a list, by using null next/prev pointers.
 Note, a list does not need to use links to denote its size;
 it can use a node counter in the header, where $N$ node traversals indicates complete navigation of the list.
 However, managing the number of nodes is an additional cost, whereas the links must always be managed.
+The following discussion refers to the LQ representations, detailed in \VRef[Figure]{fig:lst-issues-end}, using a null pointer to mark end points.
+LQ uses this representation for its successor/last.
+The following discussion refers to the LQ representations, detailed in \VRef[Figure]{fig:lst-issues-end}, which uses a null @succ@ pointer to mark the last element.
 For example, consider the operation of inserting after a given element.
 A doubly-linked list must update the given node's successor, to make its predecessor-pointer refer to the new node.
 …
                 LQ sub-object-level representation of links and ends.
                 Each object's memory is pictured as a vertical strip.
                 Pointers' target locations, within these strips, are significant.
                 Uniform treatment of the first-end is evident from an assertion like \lstinline{(**this.pred == this)} holding for all nodes \lstinline{this}, including the first one.
                 Cased treatment of the last-end is evident from the symmetric proposition, \lstinline{(this.succ.pred == &this.succ)}, failing when \lstinline{this} is the last node.
+                The location, within a strip, at which an arrow points, is significant.
+                Uniform treatment of the first-end is evident from an assertion like \lstinline{(**this->pred == this)} holding for all nodes \lstinline{this}, including the first one.
+                Cased treatment of the last-end is evident from the analogous last-end proposition, \lstinline{(this->succ->pred == &this->succ)}, failing when \lstinline{this} is the last node.
+        }
         \label{fig:lst-issues-end}
 …
 Interestingly, this branch is sometimes avoidable, giving a uniform end-treatment in the code.
 For example, LQ is headed at the front.
+For example, LQ is uniform headed at the front.
 For predecessor/first navigation, the relevant operation is inserting before a given element.
 LQ's predecessor representation is not a pointer to a node, but a pointer to a pseudo-successor pointer.
 …
 A string is a sequence of symbols, where the form of a symbol can vary significantly: 7/8-bit characters (ASCII/Latin-1), or 2/4/8-byte (UNICODE) characters/symbols or variable length (UTF-8/16/32) characters.
+A string can be read left-to-right, right-to-left, top-to-bottom, and have stacked elements (Arabic).
+When drawn for human reading, text can be left-to-right, right-to-left, top-to-bottom, or have stacked elements (\eg Arabic).
+But a string serves all of these presentations by forming the sequence of symbols that matches the order in which the text is read.
 A C character constant is an ASCII/Latin-1 character enclosed in single-quotes, \eg @'x'@, @'@\textsterling@'@.
 …
 A character can be formed from an escape sequence, which expresses a non-typable character @'\f'@, form feed, a delimiter character @'\''@, embedded single quote, or a raw character @'\xa3'@, \textsterling.
 A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
 The kind of characters in the string is denoted by a prefix: wide characters are prefixed by @L@, @u@, or @U@; UTF-8 characters are prefixed by @u8@.
+A C string constant is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
+The kind of characters in the string is denoted by a prefix: wide-character strings are prefixed by the same @L@, @u@, or @U@; UTF-8 strings, where characters have nonuniform byte lengths, are prefixed by @u8@.
 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multi-byte character sequence, \eg @L"abc@$\mu$@"@ and are read/printed using @wscanf@/@wprintf@.
 …
 This representation means that there is no real limit to how long a string can be, but programs have to scan one completely to determine its length.~\cite[p.~36]{C:old}
 \end{quote}
 This property is only preserved by the compiler with respect to character constants, \eg @"abc"@ is actually @"abc\0"@, \ie 4 characters rather than 3.
+This property is only preserved by the compiler with respect to string constants, \eg @"abc"@ is actually @{'a', 'b', 'c', '\0'}@, \ie 4 characters rather than 3.
 Otherwise, the compiler does not participate, making string operations both unsafe and inefficient.
 For example, it is common in C to:
 …
 As a result, storage management for C strings is a nightmare, quickly resulting in array overruns and incorrect results.
 Collectively, these design decisions make working with strings in C, awkward, time consuming, and unsafe.
+Collectively, these design decisions make working with strings in C awkward, time consuming and unsafe.
 While there are companion string routines that take the maximum lengths of strings to prevent array overruns, \eg @strncpy@ and @strncat@, the operation can then fail semantically because strings are truncated.
 Suffice it to say, C is not a go-to language for string applications, which is why \CC introduced the dynamically-sized @string@ type.
