
Changeset c721105

Timestamp:

May 24, 2024, 2:16:09 PM

Author:

Peter A. Buhr <pabuhr@…>

Message:

proofreading changes

Location:

doc/theses/mike_brooks_MMath

Files:

3 edited

array.tex (modified) (2 diffs)
background.tex (modified) (28 diffs)
intro.tex (modified) (6 diffs)

doc/theses/mike_brooks_MMath/array.tex

-                      r76425bc
+                      rc721105
 \chapter{Array}
+\label{c:Array}
 \section{Introduction}
 …
 Unsigned integers have a special status in this type system.
 Unlike how C++ allows
 \begin{lstlisting}[language=c++]
+\begin{c++}
 template< size_t N, char * msg, typename T >... // declarations
 \end{lstlisting}
+\end{c++}
 \CFA does not accommodate values of any user-provided type.
 TODO: discuss connection with dependent types.

doc/theses/mike_brooks_MMath/background.tex

-                      r76425bc
+                      rc721105
 Accessing any storage requires pointer arithmetic, even if it is just base-displacement addressing in an instruction.
 The conjoining of pointers and arrays could also be applied to structures, where a pointer references a structure field like an array element.
+Finally, while subscripting involves pointer arithmetic (as do field references @x.y.z@), it is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
+Many C errors result from performing pointer arithmetic instead of using subscripting.
+Some C textbooks erroneously teach pointer arithmetic suggesting it is faster than subscripting.
+Finally, while subscripting involves pointer arithmetic (as do field references @x.y.z@), the computation is very complex for multi-dimensional arrays and requires array descriptors to know stride lengths along dimensions.
+Many C errors result from performing pointer arithmetic instead of using subscripting;
+some C textbooks teach pointer arithmetic, erroneously suggesting it is faster than subscripting.
+A sound and efficient C program does not require explicit pointer arithmetic.
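A minimal C sketch of the claim that subscripting is pointer arithmetic whose stride the compiler supplies. The dimensions @ROWS@ and @COLS@ and all function names are hypothetical. In one dimension, @a[i]@, @*(a + i)@ and @i[a]@ name the same element; in two dimensions, the hand-written version must repeat the row stride @COLS@ that subscripting computes automatically, which is where explicit arithmetic invites errors.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };   /* hypothetical dimensions for the illustration */

/* One dimension: a[i] is defined as *(a + i), so i[a] names the same element. */
static int same_element(const int *a, size_t i) {
    return &a[i] == a + i && &a[i] == &i[a];
}

/* Two dimensions via subscripting: the compiler supplies the row stride COLS. */
static int *by_subscript(int m[ROWS][COLS], size_t i, size_t j) {
    return &m[i][j];
}

/* The same address by hand: the programmer must repeat the stride, in bytes. */
static int *by_arithmetic(int m[ROWS][COLS], size_t i, size_t j) {
    return (int *)((char *)m + (i * COLS + j) * sizeof(int));
}
```

Forgetting the @* COLS@ factor in @by_arithmetic@ still compiles, which is the kind of error subscripting rules out.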
 C semantics want a programmer to \emph{believe} an array variable is a ``pointer to its first element.''
 …
 Equally, C programmers know the size of a \emph{pointer} to the first array element is 8 (or 4 depending on the addressing architecture).
 % Now, set aside for a moment the claim that this first assertion is giving information about a type.
 Clearly, an array and a pointer to its first element are different things.
+Clearly, an array and a pointer to its first element are different.
 In fact, the idea that there is such a thing as a pointer to an array may be surprising and it is not the same thing as a pointer to the first element.
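The distinction can be observed directly with @sizeof@ and pointer steps. This C sketch uses hypothetical function names and compares against @sizeof(float *)@ rather than a literal 8 or 4, since the pointer size depends on the platform. @ar@ and @&ar@ share an address, but @&ar@ has type @float (*)[10]@, so adding 1 to it steps over the whole array.

```c
#include <stddef.h>

/* sizeof sees the whole array type float[10]. */
static size_t array_bytes(void) {
    float ar[10];
    return sizeof(ar);                      /* 10 * sizeof(float) */
}

/* ar decays to &ar[0], a float *; sizeof now measures a pointer. */
static size_t decayed_bytes(void) {
    float ar[10];
    float *p = ar;                          /* pointer decay */
    return sizeof(p);                       /* sizeof(float *), e.g., 8 or 4 */
}

/* &ar is a pointer to the array: adding 1 skips all 10 elements. */
static ptrdiff_t pointer_to_array_step(void) {
    static float ar[10];
    float (*pa)[10] = &ar;
    return (char *)(pa + 1) - (char *)pa;   /* sizeof(float[10]) */
}

/* Same address, different types: float (*)[10] versus float *. */
static int same_address(void) {
    static float ar[10];
    return (void *)&ar == (void *)&ar[0];
}
```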
 …
 \CFA provides its own type, variable and routine declarations, using a simpler syntax.
 The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type.
 The qualifiers have the same meaning in \CFA as in C.
+The qualifiers have the same syntax and semantics in \CFA as in C.
 Then, a \CFA declaration is read left to right, where a function return type is enclosed in brackets @[@\,@]@.
 \begin{cquote}
 …
 \end{tabular}
 \end{cquote}
 As declaration complexity increases, it becomes corresponding difficult to read and understand the C declaration form.
+As declaration size increases, it becomes correspondingly difficult to read and understand the C declaration form, whereas reading and understanding a \CFA declaration has linear complexity as the declaration size increases.
 Note, writing declarations left to right is common in other programming languages, where the function return-type is often placed after the parameter declarations.
 \VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussion in \VRef{XXX}.
+\VRef[Table]{bkgd:ar:usr:avp} introduces the many layers of the C and \CFA array story, where the \CFA story is discussed in \VRef[Chapter]{c:Array}.
 The \CFA-thesis column shows the new array declaration form, which is my contributed improvement for safety and ergonomics.
 The table shows there are multiple yet equivalent forms for the array types under discussion, and subsequent discussion shows interactions with orthogonal (but easily confused) language features.
 …
         \hline
         & ar.\ of immutable val. & @const T x[10];@ & @[10] const T x@ & @const array(T, 10) x@ \\
     & & @T const x[10];@ & @[10] T const x@ & @array(T, 10) const x@ \\
+        & & @T const x[10];@ & @[10] T const x@ & @array(T, 10) const x@ \\
         \hline
         & ar.\ of ptr.\ to value & @T * x[10];@ & @[10] * T x@ & @array(T *, 10) x@ \\
 …
 ``pointer to \emph{type}'' that points to the initial element of the array object~\cite[\S~6.3.2.1.3]{C11}
 \end{quote}
 This phenomenon is the famous ``pointer decay,'' which is a decay of an array-typed expression into a pointer-typed one.
+This phenomenon is the famous \newterm{pointer decay}, which is a decay of an array-typed expression into a pointer-typed one.
 It is worth noting that the list of exception cases does not feature the occurrence of @ar@ in @ar[i]@.
 Thus, subscripting happens on pointers not arrays.
 …
 Taken together, these rules illustrate that @ar[i]@ and @i[a]@ mean the same thing!
+Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
+While the standard affords a C compiler freedom about the meaning of an out-of-bound access,
+or of subscripting a pointer that does not refer to an array element at all,
+the fact that C is famously both generally high-performance, and specifically not bound-checked,
+leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
+Subscripting a pointer when the target is standard-inappropriate is still practically well-defined.
+While the standard affords a C compiler freedom about the meaning of an out-of-bound access, or of subscripting a pointer that does not refer to an array element at all,
+the fact that C is famously both generally high-performance, and specifically not bound-checked, leads to an expectation that the runtime handling is uniform across legal and illegal accesses.
 Moreover, consider the common pattern of subscripting on a @malloc@ result:
 \begin{cfa}
 …
 Under this assumption, a pointer being subscripted (or added to, then dereferenced) by any value (positive, zero, or negative), gives a view of the program's entire address space, centred around the @p@ address, divided into adjacent @sizeof(*p)@ chunks, each potentially (re)interpreted as @typeof(*p)@.
 I call this phenomenon ``array diffraction,''  which is a diffraction of a single-element pointer into the assumption that its target is in the middle of an array whose size is unlimited in both directions.
+I call this phenomenon \emph{array diffraction}, which is a diffraction of a single-element pointer into the assumption that its target is in the middle of an array whose size is unlimited in both directions.
 No pointer is exempt from array diffraction.
 No array shows its elements without pointer decay.
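The defined core of array diffraction can be sketched in C with a pointer into the middle of an array, subscripted with negative and positive offsets alike because @p[k]@ means @*(p + k)@. The function name is hypothetical. The type @int *@ records neither end of the array, so the same expression with an offset outside @a@ compiles just as quietly, and that access is undefined behaviour.

```c
/* p points at a[2]; p[-2] .. p[2] reach a[0] .. a[4] because p[k] is *(p + k). */
static int diffraction_in_bounds(void) {
    int a[5] = { 10, 11, 12, 13, 14 };
    int *p = &a[2];                     /* the type int * carries no bounds */
    return p[-2] == 10 && p[0] == 12 && p[2] == 14;
}
```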
 …
 The caller of such a function is left with the reality that a pointer parameter is a pointer, no matter how it is spelled:
 \lstinput{18-21}{bkgd-carray-decay.c}
+This fragment gives no warnings.
+This fragment gives the warning for the first argument, in the second call.
+\begin{cfa}
+warning: 'f' accessing 40 bytes in a region of size 4
+\end{cfa}
 The shortened parameter syntax @T x[]@ is a further way to spell ``pointer.''
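A small C sketch that makes the parameter adjustment observable without relying on compiler warnings: a C11 @_Generic@ selection inside each function reports the parameter's type, and all three functions fit one function-pointer type taking @float *@. The names @checker@, @checkers@ and @is_float_ptr_a@/@b@/@c@ are hypothetical.

```c
/* Each spelling declares a parameter whose adjusted type is float *. */
static int is_float_ptr_a(float x[10]) { return _Generic(x, float *: 1, default: 0); }
static int is_float_ptr_b(float x[])   { return _Generic(x, float *: 1, default: 0); }
static int is_float_ptr_c(float *x)    { return _Generic(x, float *: 1, default: 0); }

/* All three have the same function type, so one pointer type holds them all. */
typedef int (*checker)(float *);
static const checker checkers[3] = { is_float_ptr_a, is_float_ptr_b, is_float_ptr_c };
```

The table initialization needs no casts, which is the compiler confirming that the three declarations are the same function type.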
 …
 This point of confusion is illustrated in:
 \lstinput{23-30}{bkgd-carray-decay.c}
+Note, \CC gives a warning for the initialization of @cp@.
+\begin{cfa}
+warning: ISO C++ forbids converting a string constant to 'char*'
+\end{cfa}
+and C gives a warning at the call of @edit@, if @const@ is added to the declaration of @cp@.
+\begin{cfa}
+warning: passing argument 1 of 'edit' discards 'const' qualifier from pointer target type
+\end{cfa}
 The basic two meanings, with a syntactic difference helping to distinguish,
 are illustrated in the declarations of @ca@ vs.\ @cp@,
+are illustrated in the declarations of @ca@ \vs @cp@,
 whose subsequent @edit@ calls behave differently.
 The syntax-caused confusion is in the comparison of the first and last lines,
 …
         \hline
         & ptr.\ to ptr.\ to imm.\ val. & @const char **@ & @const char * argv[],@ & @[] * const char argv,@ \\
     & & & \emph{others elided} & \emph{others elided} \\
+        & & & \emph{others elided} & \emph{others elided} \\
         \hline
 \end{tabular}
 \end{table}
+\subsection{Multi-Dimensional}
+As in the last section, multi-dimensional array declarations are examined.
+\lstinput{16-18}{bkgd-carray-mdim.c}
+The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
+\lstinput{20-44}{bkgd-carray-mdim.c}
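The three derivations can be told apart by how far adding 1 moves a pointer at each level. A C sketch under hypothetical dimensions @ROWS@ and @COLS@ and hypothetical function names; distances are measured in bytes through @char *@, and each @p + 1@ stays within the array or one past its end, which C permits.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };   /* hypothetical dimensions for the illustration */

static float ar[ROWS][COLS];

/* "itself": &ar has type float (*)[ROWS][COLS]; + 1 skips the whole array. */
static ptrdiff_t step_itself(void) {
    float (*p)[ROWS][COLS] = &ar;
    return (char *)(p + 1) - (char *)p;
}

/* "first element": ar decays to float (*)[COLS]; + 1 skips one row. */
static ptrdiff_t step_first_element(void) {
    float (*p)[COLS] = ar;
    return (char *)(p + 1) - (char *)p;
}

/* "first grand-element": &ar[0][0] has type float *; + 1 skips one float. */
static ptrdiff_t step_first_grand_element(void) {
    float *p = &ar[0][0];
    return (char *)(p + 1) - (char *)p;
}
```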
 …
 Note, the C standard supports VLAs~\cite[\S~6.7.6.2.4]{C11} as a conditional feature, but the \CC standard does not;
 both @gcc@ and @g++@ support VLAs.
 As well, there is misinformation about VLAs, \eg VLAs cause stack failures or are inefficient.
+As well, there is misinformation about VLAs, \eg that the stack size is limited (small), or that VLAs cause stack failures or are inefficient.
 VLAs exist as far back as Algol W~\cite[\S~5.2]{AlgolW} and are a sound and efficient data type.
 For high-performance applications, the stack size can be fixed and small (coroutines or user-level threads).
 Here, VLAs can overflow the stack, so a heap allocation is used.
+Here, VLAs can overflow the stack unless the stack is sized appropriately, so a heap allocation is used.
 \begin{cfa}
 float * ax1 = malloc( sizeof( float[n] ) );
 float * ax2 = malloc( n * sizeof( float ) );
+float * ax2 = malloc( n * sizeof( float ) );    $\C{// arrays}$
 float * bx1 = malloc( sizeof( float[1000000] ) );
 float * bx2 = malloc( 1000000 * sizeof( float ) );
 …
 \subsection{The pointer-to-array type has been noticed before}
-\subsection{Multi-Dimensional}
-As in the last section, we inspect the declaration ...
-\lstinput{16-18}{bkgd-carray-mdim.c}
-The significant axis of deriving expressions from @ar@ is now ``itself,'' ``first element'' or ``first grand-element (meaning, first element of first element).''
-\lstinput{20-44}{bkgd-carray-mdim.c}
 \section{Linked List}
 …
 Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous.
 Linking is used to build data structures, which are a group of nodes, containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc.
+Storage linking is used to build data structures, each a group of nodes containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc.
 Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value;
 hence, all data structure routines take and return pointers to nodes and not the nodes themselves.
-\begin{comment}
-\subsection{Linked-List Packages}
-C only supports type-eraser polymorphism, with no help from the type system.
-This approach is used in the @queue@ library providing macros that define and operate on four types of data structures: singly-linked lists, singly-linked tail queues, lists, and tail queues.
-These linked structures are \newterm{intrusive list}, where the link fields are defined (intrude) with data fields.
-\begin{cfa}
-struct DS {
-        // link fields, intrustive
-        // data fields
+}
-\end{cfa}
-\uCpp~\cite{uC++} is a concurrent extension of \CC, and provides a basic set of intrusive lists, where the link fields are defined with the data fields using inheritance.
-\begin{cfa}
-struct DS : public uColable {
-        // implicit link fields
-        // data fields
+}
-\end{cfa}
-Intrusive nodes eliminate the need to dynamically allocate/deallocate the link fields when a node is added/removed to/from a data-structure.
-Reducing dynamic allocation is important in concurrent programming because the heap is a shared resource with the potential for high contention.
-The two formats are one link field, which form a \Index{collection}, and two link fields, which form a \Index{sequence}.
-\begin{center}
-%\input{DSLNodes}
-\end{center}
-@uStack@ and @uQueue@ are collections and @uSequence@ is a sequence.
-To get the appropriate link fields associated with a user node, it must be a public descendant of @uColable@\index{uColable@@uColable@} or @uSeqable@\index{uSeqable@@uSeqable@}, respectively, e.g.:
-%[
-class stacknode : public uColable { ... }
-class queuenode : public uColable { ... }
-class seqnode : public uSeqable { ... }
-%]
-A node inheriting from @uSeqable@ can appear in a sequence/collection but a node inheriting from @uColable@ can only appear in a collection.
-Along with providing the appropriate link fields, the types @uColable@ and @uSeqable@ also provide one member routine:
-%[
-bool listed() const;
-%]
-which returns @true@ if the node is an element of any collection or sequence and @false@ otherwise.
-Finally, no header files are necessary to access the uC DSL.
-Some uC DSL restrictions are:
-\begin{itemize}
-\item
-None of the member routines are virtual in any of the data structures for efficiency reasons.
-Therefore, pointers to data structures must be used with care or incorrect member routines may be invoked.
-\end{itemize}
-\end{comment}
 …
 Alternatives to the assumptions are discussed under Future Work (Section~\ref{toc:lst:futwork}).
 \begin{itemize}
     \item A doubly-linked list is being designed.
           Generally, the discussed issues apply similarly for singly-linked lists.
           Circular \vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}).
     \item Link fields are system-managed.
           The user works with the system-provided API to query and modify list membership.
           The system has freedom over how to represent these links.
+        \item A doubly-linked list is being designed.
+                Generally, the discussed issues apply similarly for singly-linked lists.
+                Circular \vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}).
+        \item Link fields are system-managed.
+                The user works with the system-provided API to query and modify list membership.
+                The system has freedom over how to represent these links.
         \item The user data must provide storage for the list link-fields.
           Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}.
+                Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}.
 \end{itemize}
 …
 and further libraries are introduced as needed.
 \begin{enumerate}
     \item Linux Queue library\cite{lst:linuxq} (LQ) of @<sys/queue.h>@.
     \item \CC Standard Template Library's (STL)\footnote{The term STL is contentious as some people prefer the term standard library.} @std::list@\cite{lst:stl}
+        \item Linux Queue library\cite{lst:linuxq} (LQ) of @<sys/queue.h>@.
+        \item \CC Standard Template Library's (STL)\footnote{The term STL is contentious as some people prefer the term standard library.} @std::list@\cite{lst:stl}
 \end{enumerate}
+A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}).
+%A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}).
 For the discussion, assume the fictional type @req@ (request) is the user's payload in examples.
 As well, the list library is helping the user manage (organize) requests, \eg a request can be work on the level of handling a network arrival event or scheduling a thread.
 …
 Link attachment deals with the question:
 Where are the libraries' inter-element link fields stored, in relation to the user's payload data fields?
+Figure~\ref{fig:lst-issues-attach} shows three basic styles.
+The \newterm{intrusive} style places the link fields inside the payload structure.
+The two \newterm{wrapped} styles place the payload inside a generic library-provided structure that then defines the link fields.
+Library LQ is intrusive; STL is wrapped.
+The wrapped style further distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@.
+\VRef[Figure]{fig:lst-issues-attach} shows three basic styles.
+\VRef[Figure]{f:Intrusive} shows the \newterm{intrusive} style, placing the link fields inside the payload structure.
+\VRef[Figures]{f:WrappedRef} and \subref*{f:WrappedValue} show the two \newterm{wrapped} styles, which place the payload inside a generic library-provided structure that then defines the link fields.
+The wrapped style distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@.
 (For this discussion, @list<req &>@ is similar to @list<req *>@.)
 This difference is one of user style, not framework capability.
+Library LQ is intrusive; STL is wrapped, by reference or by value.
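A minimal C sketch of intrusive attachment with the LQ macros of @<sys/queue.h>@ (@LIST_HEAD@, @LIST_ENTRY@, @LIST_INIT@, @LIST_INSERT_HEAD@, @LIST_REMOVE@, @LIST_FOREACH@). The @req@ here is a simplified stand-in with a hypothetical payload field @pri@ and link field @d@, and the helper names are hypothetical. Both nodes live on the stack, so nothing is allocated, and a bare @req@ pointer is enough to unlink a node.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Intrusive attachment: the link field d lives inside the payload. */
struct req {
    int pri;                              /* hypothetical payload field */
    LIST_ENTRY(req) d;                    /* link fields embedded in req */
};
LIST_HEAD(req_list, req);                 /* declares struct req_list */

static size_t req_count(struct req_list *head) {
    size_t n = 0;
    struct req *r;
    LIST_FOREACH(r, head, d) n += 1;
    return n;
}

/* Links two stack-allocated requests (no malloc), then unlinks r2 using
   only its req pointer; returns the number of elements left. */
static size_t insert_two_remove_one(void) {
    struct req r1 = { .pri = 1 }, r2 = { .pri = 2 };
    struct req_list reqs;
    LIST_INIT(&reqs);
    LIST_INSERT_HEAD(&reqs, &r1, d);
    LIST_INSERT_HEAD(&reqs, &r2, d);      /* list is now r2, r1 */
    LIST_REMOVE(&r2, d);                  /* no head or iterator needed */
    return req_count(&reqs);
}
```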
 \begin{comment}
 \begin{figure}
     \begin{tabularx}{\textwidth}{Y|Y|Y}
+        \begin{tabularx}{\textwidth}{Y|Y|Y}
                 \lstinput[language=C]{20-39}{lst-issues-intrusive.run.c}
         &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp}
         &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp}
       \\ & &
       \\
         \includegraphics[page=1]{lst-issues-attach.pdf}
+        &
         \includegraphics[page=2]{lst-issues-attach.pdf}
+        &
         \includegraphics[page=3]{lst-issues-attach.pdf}
       \\ & &
       \\
         (a) & (b) & (c)
     \end{tabularx}
+                &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp}
+                &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp}
+          \\ & &
+          \\
+                \includegraphics[page=1]{lst-issues-attach.pdf}
+                &
+                \includegraphics[page=2]{lst-issues-attach.pdf}
+                &
+                \includegraphics[page=3]{lst-issues-attach.pdf}
+          \\ & &
+          \\
+                (a) & (b) & (c)
+        \end{tabularx}
 \caption{
+        Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value.
         The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
+        head objects are discussed in Section~\ref{toc:lst:issue:ident}.
         In (a), the field \lstinline{req.x} names a list direction;
         these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
         In (b) and (c), the type \lstinline{node} represents a system-internal type,
         which is \lstinline{std::_List_node} in the GNU implementation.
         (TODO: cite? found in  /usr/include/c++/7/bits/stl\_list.h )
+    }
      \label{fig:lst-issues-attach}
+                Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value.
+                The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
+                head objects are discussed in Section~\ref{toc:lst:issue:ident}.
+                In (a), the field \lstinline{req.x} names a list direction;
+                these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
+                In (b) and (c), the type \lstinline{node} represents a system-internal type,
+                which is \lstinline{std::_List_node} in the GNU implementation.
+                (TODO: cite? found in  /usr/include/c++/7/bits/stl\_list.h )
+        }
+         \label{fig:lst-issues-attach}
 \end{figure}
 \end{comment}
 …
 \caption{
         Three styles of link attachment:
                 % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value.
         The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
+        head objects are discussed in Section~\ref{toc:lst:issue:ident}.
         In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;
         these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
         In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
+                Three styles of link attachment:
+                % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value.
+                The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
+                head objects are discussed in Section~\ref{toc:lst:issue:ident}.
+                In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;
+                these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
+                In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
                 library-internal type, which is \lstinline{std::_List_node} in the GNU implementation
         \see{\lstinline{/usr/include/c++/X/bits/stl_list.h}, where \lstinline{X} is the \lstinline{g++} version number}.
+    }
     \label{fig:lst-issues-attach}
+                \see{\lstinline{/usr/include/c++/X/bits/stl_list.h}, where \lstinline{X} is the \lstinline{g++} version number}.
+        }
+        \label{fig:lst-issues-attach}
 \end{figure}
 Each diagrammed example is using the fewest dynamic allocations for its respective style:
 in \subref*{f:Intrusive}, here are no dynamic allocations, in \subref*{f:WrappedRef} only the linked fields are dynamically allocated, and in \subref*{f:WrappedValue} the copied data and linked fields are dynamically allocated.
 The advantage of intrusive attachment is the control in memory layout and storage placement.
 Both wrapped attachment styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.
+in intrusive, there is no dynamic allocation, in wrapped reference only the link fields are dynamically allocated, and in wrapped value the copied data and link fields are dynamically allocated.
+The advantage of intrusive is control over memory layout and storage placement.
+Both wrapped styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.
 In all three cases, a @req@ object can enter and leave a list many times.
 However, in \subref*{f:Intrusive} a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.
 In \subref*{f:WrappedRef}, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency.
 In \subref*{f:WrappedValue}, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes;
+However, in intrusive a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.
+In wrapped reference, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency.
+In wrapped value, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes;
 however, knowing which @req@ object is the ``true'' object becomes complex.
 \see*{\VRef{toc:lst:issue:simultaneity} for further discussion.}
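The sharing-versus-copying difference between the two wrapped styles can be sketched in C with two hypothetical library node types, simplified to singly-linked lists; @ref_node@, @val_node@, @push_ref@, @push_val@ and @update_visibility@ are illustrative names, not LQ or STL API. Every insertion allocates a node, and aborting on allocation failure keeps the sketch short.

```c
#include <stdlib.h>

struct req { int pri; };                    /* payload with no link fields */

/* Wrapped reference: the library node points at the user's req. */
struct ref_node { struct ref_node *next; struct req *elem; };
/* Wrapped value: the library node holds its own copy of the req. */
struct val_node { struct val_node *next; struct req elem; };

static struct ref_node *push_ref(struct ref_node *head, struct req *r) {
    struct ref_node *n = malloc(sizeof *n); /* library-induced allocation */
    if (n == NULL) abort();
    n->next = head;
    n->elem = r;
    return n;
}

static struct val_node *push_val(struct val_node *head, const struct req *r) {
    struct val_node *n = malloc(sizeof *n); /* allocation plus copy */
    if (n == NULL) abort();
    n->next = head;
    n->elem = *r;
    return n;
}

/* Returns 1 when an update to the user's req is seen through both reference
   nodes (the same req listed twice) but not through the value node. */
static int update_visibility(void) {
    struct req r = { .pri = 1 };
    struct ref_node *refs = push_ref(push_ref(NULL, &r), &r);
    struct val_node *vals = push_val(NULL, &r);
    r.pri = 2;                              /* update the user's object */
    int ok = refs->elem->pri == 2 && refs->next->elem->pri == 2
             && vals->elem.pri == 1;        /* the copy is unaffected */
    free(refs->next);
    free(refs);
    free(vals);
    return ok;
}
```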
 …
 A further aspect of layout control is allowing the user to explicitly specify link fields controlling attributes and placement within the @req@ object.
 LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.};
+LQ allows this ability through the @LIST_ENTRY@ macro;\footnote{It is possible to have multiple named link fields, allowing a node to appear on multiple lists simultaneously.}
 supplying the link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@.
 An example of an explicit attribute is cache alignment of the link fields in conjunction with other @req@ fields, improving locality and/or avoiding false sharing.
 …
 Another subtle advantage of intrusive arrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
 In LQ, \subref*{f:Intrusive}, a @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
+In LQ, the intrusive @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
 there is no distinguishing a @req@ from ``a @req@ in a list.''
+The same is not true of STL, \subref*{f:WrappedRef} or \subref*{f:WrappedValue}.
+There, the analogous operations work on a parameter of type @list<T>::iterator@;
+they are @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@.
+There is no mapping from @req &@ to @list<req>::iterator@, except for linear search.
+The advantage of wrapped attachment is the abstraction of a data item from its list membership(s).
+The same is not true of STL, wrapped reference or value.
+There, the analogous operations, @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@, work on a parameter of type @list<T>::iterator@.
+There is no mapping from @req &@ to @list<req>::iterator@. %, for linear search.
+The advantage of wrapped is the abstraction of a data item from its list membership(s).
 In the wrapped style, the @req@ type can come from a library that serves many independent uses,
 which generally have no need for listing.
+Then, a novel use can put @req@ in list, without requiring any upstream change in the @req@ library.
+In intrusive attachment, the ability to be listed must be planned during the definition of @req@.
+Finally, for wrapped reference, a single node can appear at multiple places in the same list or in different lists, which might be useful in certain read-only cases.
+For intrusive and wrapped value, a node must be duplicated to appear at multiple locations, presenting additional cost.
+This scenario is problematic when the nodes are written, because all three link styles then have issues: shared updates for wrapped reference, and diverging copies for intrusive and wrapped value.
+Then, a novel use can put a @req@ in a list, without requiring any upstream change in the @req@ library.
+In intrusive, the ability to be listed must be planned during the definition of @req@.
 \begin{figure}
+    \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp}
+    \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp}
+    \caption{
+        Reduction of wrapped attachment to intrusive attachment.
+        Illustrated by pseudocode implementation of an STL-compatible API fragment
+        using LQ as the underlying implementation.
+        The gap that makes it pseudocode is that
+        the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}.
+        When using a custom-patched version of LQ to work around this issue,
+        the programs of Figure~\ref{f:WrappedRef} and \protect\subref*{f:WrappedValue} work with this shim in place of real STL.
+        Their executions lead to the same memory layouts.
+    }
+    \label{fig:lst-issues-attach-reduction}
+        \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp}
+        \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp}
+        \caption{
+                Simulation of wrapped using intrusive.
+                Illustrated by pseudocode implementation of an STL-compatible API fragment using LQ as the underlying implementation.
+                The gap that makes it pseudocode is that
+                the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}.
+                When using a custom-patched version of LQ to work around this issue,
+                the programs of Figures~\ref{f:WrappedRef} and \protect\subref*{f:WrappedValue} work with this shim in place of real STL.
+                Their executions lead to the same memory layouts.
+        }
+        \label{fig:lst-issues-attach-reduction}
 \end{figure}
 Wrapped attachment has a straightforward reduction to intrusive attachment, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
+It is possible to simulate wrapped using intrusive, as illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
 This shim layer performs the implicit dynamic allocations that pure intrusion avoids.
 But there is no reduction going the other way.
 No shimming can cancel the allocations to which wrapped membership commits.
+So intrusion is a lower-level listing primitive.
+And so, the system design choice is not between forcing users to use intrusion or wrapping.
+Because intrusion is a lower-level listing primitive, the system design choice is not between forcing users to use intrusion or wrapping.
 The choice is whether or not to provide access to an allocation-free layer of functionality.
+An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
 A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits.
-An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
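The reduction can be sketched in C as a hypothetical wrapped-reference shim over LQ, in the spirit of the figure's C++ shim but not taken from it: the shim node embeds an LQ link field and points at the user's @req@. The names @wnode@, @wlist@, @wlist_push@, @wlist_clear@ and @list_same_req_twice@ are illustrative. The @malloc@ in @wlist_push@ is the allocation the shim layers on top of the allocation-free primitive.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct req { int pri; };                  /* payload with no link fields */

/* Shim node: intrusive LQ links plus a pointer to the wrapped req. */
struct wnode {
    LIST_ENTRY(wnode) d;
    struct req *elem;
};
LIST_HEAD(wlist, wnode);                  /* declares struct wlist */

/* Wrapped insert: allocate a shim node, then link it intrusively. */
static struct wnode *wlist_push(struct wlist *l, struct req *r) {
    struct wnode *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->elem = r;
    LIST_INSERT_HEAD(l, n, d);
    return n;
}

/* Unlinks and frees every shim node; the reqs themselves are untouched. */
static void wlist_clear(struct wlist *l) {
    while (!LIST_EMPTY(l)) {
        struct wnode *n = LIST_FIRST(l);
        LIST_REMOVE(n, d);
        free(n);
    }
}

/* Lists one req twice, which the wrapped-reference style permits. */
static int list_same_req_twice(void) {
    struct req r = { .pri = 7 };
    struct wlist l;
    LIST_INIT(&l);
    int ok = wlist_push(&l, &r) != NULL && wlist_push(&l, &r) != NULL;
    ok = ok && LIST_FIRST(&l)->elem == &r
            && LIST_NEXT(LIST_FIRST(&l), d)->elem == &r;
    wlist_clear(&l);
    return ok;
}
```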
 …
 \begin{figure}
     \parbox[t]{3.5in} {
         \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
     }\parbox[t]{20in} {
         ~\\
         \includegraphics[page=1]{lst-issues-direct.pdf} \\
         ~\\
         \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf}
+    }
     \caption{
+        Example of simultaneity using LQ lists.
         The zoomed-out diagram (right/top) shows the complete multi-linked data structure.
         This structure can navigate all requests in priority order, and navigate among requests with a common request value.
         The zoomed-in diagram (right/bottom) shows how the link fields connect the nodes on different lists.
+    }
     \label{fig:lst-issues-multi-static}
+        \parbox[t]{3.5in} {
+                \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
+        }\parbox[t]{20in} {
+                ~\\
+                \includegraphics[page=1]{lst-issues-direct.pdf} \\
+                ~\\
+                \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf}
+        }
+        \caption{
+                Example of simultaneity using LQ lists.
+                The zoomed-out diagram (right/top) shows the complete multi-linked data structure.
+                This structure can navigate all requests in priority order ({\color{blue}blue}), and navigate among requests with a common request value ({\color{orange}orange}).
+                The zoomed-in diagram (right/bottom) shows how the link fields connect the nodes on different lists.
+        }
+        \label{fig:lst-issues-multi-static}
 \end{figure}
 …
 The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists).
 As stated, the limitation of intrusive attachment is knowing apriori how many groups of links are needed for the maximum number of simultaneous lists.
+As stated, the limitation of intrusive lists is knowing a priori how many groups of links are needed for the maximum number of simultaneous lists.
 Thus, the intrusive LQ example supports multiple, but statically many, link lists.
 Note, it is possible to reuse links for different purposes, \eg if a list in linked one at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
+Note, it is possible to reuse links for different purposes, \eg if a list is linked one way at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
 This feature is used in the \CFA runtime, where a thread node may be on a blocked or running list, but never on both simultaneously.
 …
 Again, it is possible to construct the same simultaneity by creating multiple STL lists, each copying the appropriate nodes, where the intrusive links become the links for each separate STL list.
 The upside is the same as for the wrapped-reference arrangement: an unlimited number of list bindings.
 The downside is the dynamic allocation and significant storage usage due to copying.
+The downside is the dynamic allocation and significant storage usage due to node copying.
 As well, it is unclear how node updates work in this scenario without some notion of ultimately merging node information.
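The update problem shows up in a minimal C sketch of the copying approach, where each list holds its own copy of a request. The names @vnode@, @push_copy@, and @free_list@ are hypothetical and are not STL API. Each list membership costs one allocation and one copy, and an update made through one list is not visible through the other.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request, and a wrapped-value list node that stores a COPY of it. */
struct req { int pri; int rqr; };
struct vnode {
    struct req val;              /* copied node data, not a reference */
    struct vnode *next;
};

/* Prepend a copy of *r; returns the new head, or NULL on allocation failure. */
static struct vnode *push_copy( struct vnode *head, const struct req *r ) {
    struct vnode *n = malloc( sizeof *n );   /* one allocation per list membership */
    if ( n == NULL ) return NULL;
    n->val = *r;                 /* storage is duplicated for each list */
    n->next = head;
    return n;
}

static void free_list( struct vnode *head ) {
    while ( head != NULL ) {
        struct vnode *next = head->next;
        free( head );
        head = next;
    }
}
```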
 …
 % The example uses @x@; @reqs@ would be a more readily ignored choice. \PAB{wording?}
+\uCpp offers an intrusive list that makes the opposite ergonomic choice.  TODO: elaborate on inheritance for first direction and acrobatics for subsequent directions.
+STL may seem to have ergonomics similar to LQ, but the current ergonomic distinction does not apply there,
+because one static direction is enough to achieve multiple dynamic directions.
+Note that all options in Figure~\ref{fig:lst-issues-attach} have a \emph{variable} named @refs@
+just as both Figure~\ref{fig:lst-issues-multi-static} and Figure~(TODO~new) have \emph{variables} with names including @pri@ vs @rqr@.
+But only the intrusive model has this naming showing up within the definition of a structure.
+This lack of named parts of a structure lets Figure~\ref{fig:lst-issues-attach} \subref*{f:WrappedRef} and \subref*{f:WrappedValue}, just like \uCpp,
+insert into a list without mentioning a part's name, while only version \subref*{f:Intrusive} has to mention @x@ at this step.
+LQ demands this same extraneous part-naming when removing, iterating, and even asking for a neighbour.
+At issue in this distinction is whether an API that offers multiple static directions (and so requires these to be named differently)
+allows the sole direction (when several are not wanted) to be \emph{implicit}.
+\uCpp allows it, LQ does not, and the question does not apply to STL.
+\uCpp is a concurrent extension of \CC, which provides a basic set of intrusive lists~\cite[appx.~F]{uC++}, where the link fields are defined with the data fields using inheritance.
+\begin{cquote}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}ll@{}}
+\multicolumn{1}{c}{singly-linked list} & \multicolumn{1}{c}{doubly-linked list} \\
+\begin{c++}
+struct Node : public uColable {
+        int i;  // data
+        Node( int i ) : i{ i } {}
+};
+\end{c++}
+&
+\begin{c++}
+struct Node : public uSeqable {
+        int i;  // data
+        Node( int i ) : i{ i } {}
+};
+\end{c++}
+\end{tabular}
+\end{cquote}
+A node can be placed in the following data structures depending on its link fields: @uStack@ and @uQueue@ (singly linked), and @uSequence@ (doubly linked).
+A node inheriting from @uSeqable@ can appear in a singly or doubly linked structure.
+Structure operations implicitly know the link-field location through the inheritance.
+\begin{c++}
+uStack<Node> stack;
+Node node{ 3 };  // Node has no default constructor
+stack.push( node );  // link fields at beginning of node
+\end{c++}
+Simultaneity cannot be achieved with multiple inheritance, because there is no mechanism to know the order of the inherited link fields or to name each inherited set of links.
+Instead, a special type is required that contains the link fields and points at the node.
+\begin{cquote}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{@{}ll@{}}
+\begin{c++}
+struct Node;  // forward declaration
+struct NodeDL : public uSeqable {
+        @Node & node;@  // reference to containing node
+        NodeDL( Node & node ) : node( node ) {}
+        Node & get() const { return node; }
+};
+\end{c++}
+&
+\begin{c++}
+struct Node : public uColable {
+        int i;  // data
+        @NodeDL nodeseq;@  // embedded intrusive links
+        Node( int i ) : i{ i }, @nodeseq{ *this }@ {}
+};
+\end{c++}
+\end{tabular}
+\end{cquote}
+This node can now be inserted into a doubly-linked list through the embedded intrusive links.
+\begin{c++}
+uSequence<NodeDL> sequence;
+sequence.add_front( node.nodeseq );             $\C{// link fields in embedded type}$
+NodeDL nodedl = sequence.remove( node.nodeseq );
+int i = nodedl.get().i;                                 $\C{// indirection to node}$
+\end{c++}
+Hence, the \uCpp approach optimizes one set of intrusive links through the \CC inheritance mechanism, and falls back onto the LQ approach of explicit declarations for additional intrusive links.
+However, \uCpp cannot apply the LQ trick for finding the links and node.
+The major ergonomic difference among the approaches is naming and name usage.
+The intrusive model requires naming each set of intrusive links, \eg @by_pri@ and @by_rqr@ in \VRef[Figure]{fig:lst-issues-multi-static}.
+\uCpp cheats by using inheritance for the first intrusive links, after which explicit naming of intrusive links is required.
+Furthermore, these link names must be used in all list operations, including iterating, whereas wrapped reference and value hide the list names in the implicit dynamically-allocated node-structure.
+At issue is whether an API for simultaneity can allow one list (when several are not wanted) to be \emph{implicit}.
+\uCpp allows it, LQ does not, and the STL does not have this question.
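The naming burden is visible in a hand-rolled C sketch that imitates the layout of the @<sys/queue.h>@ link fields. Each link field points at the next whole node, and every operation names the field so that the macro can find the right links. The macros @LINK@, @NEXT@, and @PUSH@ are hypothetical and support only pushing and stepping forward.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LQ-style macros: a link field holds a pointer to the next WHOLE node,
   and every operation names the field so the macro can locate the links inside elm. */
#define LINK( type ) struct { struct type *next; }
#define NEXT( elm, field ) ( ( elm )->field.next )
#define PUSH( headp, elm, field ) \
    do { ( elm )->field.next = *( headp ); *( headp ) = ( elm ); } while ( 0 )

struct node {
    int i;                       /* data */
    LINK( node ) by_pri;         /* named links for the by-priority list */
    LINK( node ) by_rqr;         /* named links for a request-value list */
};
```

Even with only one list in use, the field name must still appear in @PUSH@ and @NEXT@; nothing lets the sole direction stay implicit.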
 …
 \label{toc:lst:issue:ident}
+All examples so far have used distinct user-facing types:
 an item found in a list (type @req@, of variables like @r1@), and
 a list (type @reql@ or @list<req>@, of variables like @reqs@ or @reqs_rqr_42@).
 …
 A C character constant is an ASCII/Latin-1 character enclosed in single-quotes, \eg @'x'@, @'@\textsterling@'@.
 A wide C character constant is the same, except prefixed by the letter @L@, @u@, or @U@, \eg @u'\u25A0'@ (black square), where the @\u@ identifies a universal character name.
 A character can be formed from an escape sequence, which expresses a non-typable character (@'\n'@), a delimiter character @'\''@, or a raw character @'\x2f'@.
 A character sequence is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
+A character can be formed from an escape sequence, which expresses a non-typable character, \eg @'\f'@ (form feed), a delimiter character, \eg @'\''@ (embedded single quote), or a raw character, \eg @'\xa3'@ (\textsterling in Latin-1).
+A C character string is zero or more regular, wide, or escape characters enclosed in double-quotes @"xyz\n"@.
 The kind of characters in the string is denoted by a prefix: UTF-8 characters are prefixed by @u8@, wide characters are prefixed by @L@, @u@, or @U@.
 For UTF-8 string literals, the array elements have type @char@ and are initialized with the characters of the multibyte character sequences, \eg @u8"\xe1\x90\x87"@ (Canadian syllabics Y-Cree OO).
 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding of the multibyte character sequence, \eg @L"abc@$\mu$@"@ and read/print using @wsanf@/@wprintf@.
+For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the wide characters corresponding to the multibyte character sequence, \eg @L"abc@$\mu$@"@, and are read/printed using @wscanf@/@wprintf@.
 The encoding of a wide character is implementation-defined: @wchar_t@ is usually a 32-bit UTF-32 code unit on Linux and a 16-bit UTF-16 code unit on Windows.
 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with wide characters corresponding to the multibyte character sequence, \eg @u"abc@$\mu$@"@, @U"abc@$\mu$@"@.
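The following C11 sketch checks several of these rules at compile time. It assumes an ASCII-compatible execution character set and UTF-8 encoding for @u8@ literals, and @LIT_LEN@ is a hypothetical helper macro. The size of every string literal includes its terminating null element, whatever the element type.

```c
#include <assert.h>
#include <stddef.h>              /* wchar_t */
#include <uchar.h>               /* char16_t, char32_t (C11) */

/* In C, a character constant has type int, not char. */
_Static_assert( sizeof( 'x' ) == sizeof( int ), "character constants are int in C" );

/* Escape sequences (values assume an ASCII-compatible execution character set). */
_Static_assert( '\f' == 12, "form feed" );
_Static_assert( '\'' == 39, "embedded single quote" );
_Static_assert( '\x2f' == '/', "raw hex character" );

/* Literal sizes count the terminator and scale with the element type. */
_Static_assert( sizeof( u8"\xe1\x90\x87" ) == 4, "3 UTF-8 bytes + terminator" );
_Static_assert( sizeof( u8"\u03bc" ) == 3, "mu is 2 UTF-8 bytes + terminator" );
_Static_assert( sizeof( u"abc" ) == 4 * sizeof( char16_t ), "char16_t elements" );
_Static_assert( sizeof( U"abc" ) == 4 * sizeof( char32_t ), "char32_t elements" );
_Static_assert( sizeof( L"abc" ) == 4 * sizeof( wchar_t ), "wchar_t elements" );

/* Hypothetical helper: element count of a literal, excluding its terminator. */
#define LIT_LEN( s ) ( sizeof( s ) / sizeof( ( s )[0] ) - 1 )
```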
 …
 Unfortunately, this design decision is both unsafe and inefficient.
 It is a common error in C to forget the space in a character array for the terminator or to overwrite the terminator, resulting in array overruns in string operations.
 The need to repeatedly scan an entire string to determine its length can result in significant cost, as it is not possible to cache the length in many cases.
+The need to repeatedly scan an entire string to determine its length can result in significant cost, as it is impossible to cache the length in many cases.
 C strings are fixed size because arrays are used for the implementation.
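A minimal C sketch of both problems follows; the function and variable names are hypothetical. The terminator silently occupies an array element, and C even accepts an initializer that leaves no room for it. A loop that calls @strlen@ in its condition may rescan the string on every iteration, whereas caching the length is only possible when the string is known not to change.

```c
#include <assert.h>
#include <string.h>

/* "hello" needs 6 elements: 5 characters plus the '\0' terminator. */
_Static_assert( sizeof( "hello" ) == 6, "terminator occupies an element" );

/* Legal C (but not C++): the array is exactly full, so there is NO terminator.
   Passing no_term to strlen or printf("%s") would read past the array. */
static const char no_term[5] = "hello";

/* strlen in the loop condition rescans s on every iteration (quadratic in the length)
   unless the compiler can prove the call is loop-invariant. */
static size_t count_rescan( const char *s, char c ) {
    size_t n = 0;
    for ( size_t i = 0; i < strlen( s ); i += 1 )
        if ( s[i] == c ) n += 1;
    return n;
}

/* Caching the length is valid here only because s is not modified inside the loop. */
static size_t count_cached( const char *s, char c ) {
    size_t n = 0, len = strlen( s );         /* single scan */
    for ( size_t i = 0; i < len; i += 1 )
        if ( s[i] == c ) n += 1;
    return n;
}
```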
 …
 As a result, storage management for C strings is a nightmare, quickly resulting in array overruns and incorrect results.
 Collectively, these design decisions make working with strings in C, awkward, time consuming, and very unsafe.
+Collectively, these design decisions make working with strings in C awkward, time-consuming, and unsafe.
 While there are companion string routines that take the maximum lengths of strings to prevent array overruns, these routines can silently truncate a string, changing the semantics of the operation.
 Suffice it to say, C is not a go-to language for string applications, which is why \CC introduced the @string@ type.
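The truncation issue appears in a short sketch using @snprintf@. It never writes past the buffer, but a result that does not fit is cut short, and the caller can only detect this by comparing the returned length with the buffer size. The helper @copy_checked@ is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity cap > 0), reporting truncation.
   snprintf null-terminates and returns the length it WOULD have written. */
static int copy_checked( char *dst, size_t cap, const char *src ) {
    int need = snprintf( dst, cap, "%s", src );
    return need >= 0 && (size_t)need < cap;  /* 1: complete copy, 0: truncated */
}
```

By contrast, @strncpy@ does not even null-terminate the destination when the source is at least as long as the limit, so its truncated result is not a valid C string.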

doc/theses/mike_brooks_MMath/intro.tex

-                      r76425bc
+                      rc721105
 All modern programming languages provide three high-level containers (collections): array, linked-list, and string.
 Often array is part of the programming language, while linked-list is built from pointer types, and string from a combination of array and linked-list.
+For all three types, there is some corresponding mechanism for iterating through the structure, where the iterator flexibility varies with the kind of structure and ingenuity of the iterator implementor.
-\cite{Blache19}
-\cite{Oorschot23}
-\cite{Ruef19}
 \section{Array}
 Array provides a homogeneous container with $O(1)$ access to elements using subscripting.
+An array provides a homogeneous container with $O(1)$ access to elements using subscripting (some form of pointer arithmetic).
 The array size can be static, dynamic but fixed after creation, or dynamic and variable after creation.
 For static and dynamic-fixed, an array can be stack allocated, while dynamic-variable requires the heap.
+Because array layout has contiguous components, subscripting is a computation.
+However, the computation can exceed the array bounds resulting in programming errors and security violations~\cite{Elliott18, Blache19, Ruef19, Oorschot23}.
+The goal is to provide good performance with safety.
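As a minimal C sketch of the problem and one possible remedy: C defines @a[i]@ as @*(a + i)@, so a subscript is an unchecked address computation. The view type @int_view@ and the function @view_get@ are hypothetical illustrations, not the \CFA array design. They pair the data with its length so that each access can be validated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* In C, a[i] means *(a + i): pure address arithmetic, with no bounds check. */

/* Hypothetical checked view: carries its length so each subscript can be validated. */
struct int_view {
    int *data;
    size_t len;
};

/* Store a.data[i] in *out and return true if i is in bounds;
   otherwise return false and leave *out unchanged. */
static bool view_get( struct int_view a, size_t i, int *out ) {
    if ( i >= a.len ) return false;          /* the check that a[i] omits */
    *out = a.data[i];                        /* same computation as *(a.data + i) */
    return true;
}
```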
 \section{Linked List}
 Linked-list provides a homogeneous container with $O(log N)$/$O(N)$ access to elements using successor and predecessor operations.
+A linked-list provides a homogeneous container often with $O(\log N)$/$O(N)$ access to elements using successor and predecessor operations.
 Subscripting by value is sometimes available, \eg hash table.
 Linked types are normally dynamically sized by adding/removing nodes using link fields internal or external to the elements (nodes).
 …
 \section{String}
 String provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
 What differentiates string from other types in that string operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements.
 Nevertheless, subscripting is often available.
 The cost of string operations is less important than the power of the block operation to accomplish complex manipulation.
 The dynamic nature of string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
+A string provides a dynamic array of homogeneous elements, where the elements are often human-readable characters.
+What differentiates a string from other types is that its operations work on blocks of elements for scanning and changing the elements, rather than accessing individual elements, \eg @index@ and @substr@.
+Subscripting individual elements is often available.
+Often the cost of string operations is less important than the power of the operations to accomplish complex text manipulation, \eg searching, analysing, composing, and decomposing.
+The dynamic nature of a string means storage is normally heap allocated but often implicitly managed, even in unmanaged languages.
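In C, such block operations must be assembled from library routines. In the minimal sketch below, @str_index@ and @str_substr@ are hypothetical helpers, not the \CFA string API. The first searches with @strstr@, and the second copies a clamped substring into a caller-supplied buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: position of the first occurrence of pat in s, or -1 if absent. */
static long str_index( const char *s, const char *pat ) {
    const char *p = strstr( s, pat );
    return p == NULL ? -1 : (long)( p - s );
}

/* Hypothetical: copy up to len characters of s starting at pos into dst (capacity cap),
   always null-terminating when cap > 0; returns the number of characters copied. */
static size_t str_substr( char *dst, size_t cap, const char *s, size_t pos, size_t len ) {
    size_t slen = strlen( s ), n = 0;
    if ( cap == 0 ) return 0;
    if ( pos < slen ) {
        n = slen - pos < len ? slen - pos : len;   /* clamp to the end of s */
        if ( n > cap - 1 ) n = cap - 1;           /* clamp to the destination */
        memcpy( dst, s + pos, n );
    }
    dst[n] = '\0';
    return n;
}
```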
 \section{Motivation}
 The goal of this work is to introduce safe and complex versions of array, link-lists, and string into the programming language \CFA~\cite{Cforall}, which is based on C.
+The goal of this work is to introduce safe and complex versions of arrays, linked lists, and strings into the programming language \CFA~\cite{Cforall}, which is based on C.
 Unfortunately, making C better while retaining a high level of backwards compatibility requires significant knowledge of C's design.
 Hence, it is assumed the reader has a moderate knowledge of C or \CC, on which extensive new C knowledge is built.
 …
 However, most programming languages are only partially explained by their standards manuals.
 When it comes to explaining how C works, the definitive source is the @gcc@ compiler, which is mimicked by other C compilers, such as Clang~\cite{clang}.
 Often other C compilers must \emph{ape} @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
+Often other C compilers must mimic @gcc@ because a large part of the C library (runtime) system contains @gcc@ features.
 While some key aspects of C need to be explained by quoting from the language reference manual, to illustrate definite program semantics, I devise a program whose behaviour exercises the point at issue.
 These example programs show
 …
 This work has been tested across @gcc@ versions 8--12 and clang version 10 running on ARM, AMD, and Intel architectures.
 Any discovered anomalies among compilers or versions are discussed.
 In this case, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
+In all cases, I do not argue that my sample of major Linux compilers is doing the right thing with respect to the C standard.
 …
 \end{cfa}
 with a segmentation fault at runtime.
 Clearly, @gcc@ understands these ill-typed case, and yet allows the program to compile, which seems like madness.
 Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings.
+Clearly, @gcc@ understands these ill-typed cases, and yet allows the program to compile, which seems inappropriate.
+Compiling with flag @-Werror@, which turns warnings into errors, is often too strong, because some warnings are just warnings, \eg an unused variable.
 In the following discussion, ``ill-typed'' means giving a nonzero @gcc@ exit condition with a message that discusses typing.
 Note, \CFA's type-system rejects all these ill-typed cases as type mismatch errors.
 …
 \subsection{String}
+\subsection{Iterator}
