Context Navigation

-                      r297b796
+                      r0775468
 Linked-lists are blocks of storage connected using one or more pointers.
+The storage block is logically divided into data and links (pointers), where the links are the only component used by the list structure.
+Since the data is opaque, list structures are often polymorphic over the data, which is normally homogeneous.
+The storage block is logically divided into data (user payload) and links (list pointers), where the links are the only component used by the list structure.
+Since the data is opaque, list structures are often polymorphic over the data, which is often homogeneous.
+Linking is used to build data structures, which are a group of nodes, containing data and links, organized in a particular format, with specific operations peculiar to that format, \eg queue, tree, hash table, \etc.
+Because a node's existence is independent of the data structure that organizes it, all nodes are manipulated by address not value;
+hence, all data structure routines take and return pointers to nodes and not the nodes themselves.
+\begin{comment}
+\subsection{Linked-List Packages}
+C only supports type-eraser polymorphism, with no help from the type system.
+This approach is used in the @queue@ library providing macros that define and operate on four types of data structures: singly-linked lists, singly-linked tail queues, lists, and tail queues.
+These linked structures are \newterm{intrusive list}, where the link fields are defined (intrude) with data fields.
+\begin{cfa}
+struct DS {
+        // link fields, intrustive
+        // data fields
+}
+\end{cfa}
+\uCpp~\cite{uC++} is a concurrent extension of \CC, and provides a basic set of intrusive lists, where the link fields are defined with the data fields using inheritance.
+\begin{cfa}
+struct DS : public uColable {
+        // implicit link fields
+        // data fields
+}
+\end{cfa}
+Intrusive nodes eliminate the need to dynamically allocate/deallocate the link fields when a node is added/removed to/from a data-structure.
+Reducing dynamic allocation is important in concurrent programming because the heap is a shared resource with the potential for high contention.
+The two formats are one link field, which form a \Index{collection}, and two link fields, which form a \Index{sequence}.
+\begin{center}
+%\input{DSLNodes}
+\end{center}
+@uStack@ and @uQueue@ are collections and @uSequence@ is a sequence.
+To get the appropriate link fields associated with a user node, it must be a public descendant of @uColable@\index{uColable@@uColable@} or @uSeqable@\index{uSeqable@@uSeqable@}, respectively, e.g.:
+%[
+class stacknode : public uColable { ... }
+class queuenode : public uColable { ... }
+class seqnode : public uSeqable { ... }
+%]
+A node inheriting from @uSeqable@ can appear in a sequence/collection but a node inherting from @uColable@ can only appear in a collection.
+Along with providing the appropriate link fields, the types @uColable@ and @uSeqable@ also provide one member routine:
+%[
+bool listed() const;
+%]
+which returns @true@ if the node is an element of any collection or sequence and @false@ otherwise.
+Finally, no header files are necessary to access the uC DSL.
+Some uC DSL restrictions are:
+\begin{itemize}
+\item
+None of the member routines are virtual in any of the data structures for efficiency reasons.
+Therefore, pointers to data structures must be used with care or incorrect member routines may be invoked.
+\end{itemize}
+\end{comment}
+\subsection{Design Issues}
+\label{toc:lst:issue}
+This section introduces the design space for linked lists that target \emph{system programmers}.
+Within this restricted target, all design-issue discussions assume the following invariants.
+Alternatives to the assumptions are discussed under Future Work (Section~\ref{toc:lst:futwork}).
+\begin{itemize}
+    \item A doubly-linked list is being designed.
+          Generally, the discussed issues apply similarly for singly-linked lists.
+          Circular \vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}).
+    \item Link fields are system-managed.
+          The user works with the system-provided API to query and modify list membership.
+          The system has freedom over how to represent these links.
+        \item The user data must provide storage for the list link-fields.
+          Hence, a list node is \emph{statically} defined as data and links \vs a node that is \emph{dynamically} constructed from data and links \see{\VRef{toc:lst:issue:attach}}.
+\end{itemize}
+\subsection{Preexisting Linked-List Libraries}
+Two preexisting linked-list libraries are used throughout, to show examples of the concepts being defined,
+and further libraries are introduced as needed.
+\begin{enumerate}
+    \item Linux Queue library\cite{lst:linuxq} (LQ) of @<sys/queue.h>@.
+    \item \CC Standard Template Library's (STL)\footnote{The term STL is contentious as some people prefer the term standard library.} @std::list@\cite{lst:stl}
+\end{enumerate}
+A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}).
+For the discussion, assume the fictional type @req@ (request) is the user's payload in examples.
+As well, the list library is helping the user manage (organize) requests, \eg a request can be work on the level of handling a network arrival event or scheduling a thread.
+\subsection{Link attachment: Intrusive vs.\ Wrapped}
+\label{toc:lst:issue:attach}
+Link attachment deals with the question:
+Where are the libraries' inter-element link fields stored, in relation to the user's payload data fields?
+Figure~\ref{fig:lst-issues-attach} shows three basic styles.
+The \newterm{intrusive} style places the link fields inside the payload structure.
+The two \newterm{wrapped} styles place the payload inside a generic library-provided structure that then defines the link fields.
+Library LQ is intrusive; STL is wrapped.
+The wrapped style further distinguishes between wrapping a reference and wrapping a value, \eg @list<req *>@ or @list<req>@.
+(For this discussion, @list<req &>@ is similar to @list<req *>@.)
+This difference is one of user style, not framework capability.
+\begin{comment}
+\begin{figure}
+    \begin{tabularx}{\textwidth}{Y|Y|Y}
+                \lstinput[language=C]{20-39}{lst-issues-intrusive.run.c}
+        &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp}
+        &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp}
+      \\ & &
+      \\
+        \includegraphics[page=1]{lst-issues-attach.pdf}
+        &
+        \includegraphics[page=2]{lst-issues-attach.pdf}
+        &
+        \includegraphics[page=3]{lst-issues-attach.pdf}
+      \\ & &
+      \\
+        (a) & (b) & (c)
+    \end{tabularx}
+\caption{
+        Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value.
+        The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
+        head objects are discussed in Section~\ref{toc:lst:issue:ident}.
+        In (a), the field \lstinline{req.x} names a list direction;
+        these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
+        In (b) and (c), the type \lstinline{node} represents a system-internal type,
+        which is \lstinline{std::_List_node} in the GNU implementation.
+        (TODO: cite? found in  /usr/include/c++/7/bits/stl\_list.h )
+    }
+     \label{fig:lst-issues-attach}
+\end{figure}
+\end{comment}
+\begin{figure}
+\centering
+\newsavebox{\myboxA}                                    % used with subfigure
+\newsavebox{\myboxB}
+\newsavebox{\myboxC}
+\begin{lrbox}{\myboxA}
+\begin{tabular}{@{}l@{}}
+\lstinput[language=C]{20-35}{lst-issues-intrusive.run.c} \\
+\includegraphics[page=1]{lst-issues-attach.pdf}
+\end{tabular}
+\end{lrbox}
+\begin{lrbox}{\myboxB}
+\begin{tabular}{@{}l@{}}
+\lstinput[language=C++]{20-35}{lst-issues-wrapped-byref.run.cpp} \\
+\includegraphics[page=2]{lst-issues-attach.pdf}
+\end{tabular}
+\end{lrbox}
+\begin{lrbox}{\myboxC}
+\begin{tabular}{@{}l@{}}
+\lstinput[language=C++]{20-35}{lst-issues-wrapped-emplaced.run.cpp} \\
+\includegraphics[page=3]{lst-issues-attach.pdf}
+\end{tabular}
+\end{lrbox}
+\subfloat[Intrusive]{\label{f:Intrusive}\usebox\myboxA}
+\hspace{6pt}
+\vrule
+\hspace{6pt}
+\subfloat[Wrapped reference]{\label{f:WrappedRef}\usebox\myboxB}
+\hspace{6pt}
+\vrule
+\hspace{6pt}
+\subfloat[Wrapped value]{\label{f:WrappedValue}\usebox\myboxC}
+\caption{
+        Three styles of link attachment:
+                % \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped reference, and \protect\subref*{f:WrappedValue}~wrapped value.
+        The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
+        head objects are discussed in Section~\ref{toc:lst:issue:ident}.
+        In \protect\subref*{f:Intrusive}, the field \lstinline{req.d} names a list direction;
+        these are discussed in Section~\ref{toc:lst:issue:simultaneity}.
+        In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
+                library-internal type, which is \lstinline{std::_List_node} in the GNU implementation
+        \see{\lstinline{/usr/include/c++/X/bits/stl_list.h}, where \lstinline{X} is the \lstinline{g++} version number}.
+    }
+    \label{fig:lst-issues-attach}
+\end{figure}
+Each diagrammed example is using the fewest dynamic allocations for its respective style:
+in \subref*{f:Intrusive}, here are no dynamic allocations, in \subref*{f:WrappedRef} only the linked fields are dynamically allocated, and in \subref*{f:WrappedValue} the copied data and linked fields are dynamically allocated.
+The advantage of intrusive attachment is the control in memory layout and storage placement.
+Both wrapped attachment styles have independent storage layout and imply library-induced heap allocations, with lifetime that matches the item's membership in the list.
+In all three cases, a @req@ object can enter and leave a list many times.
+However, in \subref*{f:Intrusive} a @req@ can only be on one list at a time, unless there are separate link-fields for each simultaneous list.
+In \subref*{f:WrappedRef}, a @req@ can appear multiple times on the same or different lists simultaneously, but since @req@ is shared via the pointer, care must be taken if updating data also occurs simultaneously, \eg concurrency.
+In \subref*{f:WrappedValue}, the @req@ is copied, which increases storage usage, but allows independent simultaneous changes;
+however, knowing which of the @req@ object is the ``true'' object becomes complex.
+\see*{\VRef{toc:lst:issue:simultaneity} for further discussion.}
+The implementation of @LIST_ENTRY@ uses a trick to find the links and the node containing the links.
+The macro @LIST_INSERT_HEAD(&reqs, &r2, d);@ takes the list header, a pointer to the node, and the offset of the link fields in the node.
+One of the fields generated by @LIST_ENTRY@ is a pointer to the node, which is set to the node address, \eg @r2@.
+Hence, the offset to the link fields provides an access to the entire node, \ie the node points at itself.
+For list traversal, @LIST_FOREACH(cur, &reqs_pri, by_pri)@, there is the node cursor, the list, and the offset of the link fields within the node.
+The traversal actually moves from link fields to link fields within a node and sets the node cursor from the pointer within the link fields back to the node.
+A further aspect of layout control is allowing the user to explicitly specify link fields controlling attributes and placement within the @req@ object.
+LQ allows this ability through the @LIST_ENTRY@ macro\footnote{It is possible to have multiple named linked fields allowing a node to appear on multiple lists simultaneously.};
+supplying the link fields by inheritance makes them implicit and relies on compiler placement, such as the start or end of @req@.
+An example of an explicit attribute is cache alignment of the link fields in conjunction with other @req@ fields, improving locality and/or avoiding false sharing.
+Wrapped reference has no control over the link fields, but the seperate data allows some control;
+wrapped value has no control over data or links.
+Another subtle advantage of intrusive arrangement is that a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
+In LQ, \subref*{f:Intrusive}, a @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
+there is no distinguishing a @req@ from ``a @req@ in a list.''
+The same is not true of STL, \subref*{f:WrappedRef} or \subref*{f:WrappedValue}.
+There, the analogous operations work on a parameter of type @list<T>::iterator@;
+they are @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@.
+There is no mapping from @req &@ to @list<req>::iterator@, except for linear search.
+The advantage of wrapped attachment is the abstraction of a data item from its list membership(s).
+In the wrapped style, the @req@ type can come from a library that serves many independent uses,
+which generally have no need for listing.
+Then, a novel use can put @req@ in list, without requiring any upstream change in the @req@ library.
+In intrusive attachment, the ability to be listed must be planned during the definition of @req@.
+Finally, for wrapper reference a single node can appear at multiple places in the same list or different list, which might be useful in certain read-only cases.
+For intrusive and wrapper value, a node must be duplicated to appear at multiple locations, presenting additional cost.
+This scenario becomes difficult to imagine when the nodes are written because three link styles have issues.
+\begin{figure}
+    \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp}
+    \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp}
+    \caption{
+        Reduction of wrapped attachment to intrusive attachment.
+        Illustrated by pseudocode implementation of an STL-compatible API fragment
+        using LQ as the underlying implementation.
+        The gap that makes it pseudocode is that
+        the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}.
+        When using a custom-patched version of LQ to work around this issue,
+        the programs of Figure~\ref{f:WrappedRef} and \protect\subref*{f:WrappedValue} work with this shim in place of real STL.
+        Their executions lead to the same memory layouts.
+    }
+    \label{fig:lst-issues-attach-reduction}
+\end{figure}
+Wrapped attachment has a straightforward reduction to intrusive attachment, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
+This shim layer performs the implicit dynamic allocations that pure intrusion avoids.
+But there is no reduction going the other way.
+No shimming can cancel the allocations to which wrapped membership commits.
+So intrusion is a lower-level listing primitive.
+And so, the system design choice is not between forcing users to use intrusion or wrapping.
+The choice is whether or not to provide access to an allocation-free layer of functionality.
+A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits.
+An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
+\subsection{Simultaneity: Single vs.\ Multi-Static vs.\ Dynamic}
+\label{toc:lst:issue:simultaneity}
+\begin{figure}
+    \parbox[t]{3.5in} {
+        \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
+    }\parbox[t]{20in} {
+        ~\\
+        \includegraphics[page=1]{lst-issues-direct.pdf} \\
+        ~\\
+        \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf}
+    }
+    \caption{
+        Example of simultaneity using LQ lists.
+        The zoomed-out diagram (right/top) shows the complete multi-linked data structure.
+        This structure can navigate all requests in priority order, and navigate among requests with a common request value.
+        The zoomed-in diagram (right/bottom) shows how the link fields connect the nodes on different lists.
+    }
+    \label{fig:lst-issues-multi-static}
+\end{figure}
+\newterm{Simultaneity} deals with the question:
+In how many different lists can a node be stored, at the same time?
+Figure~\ref{fig:lst-issues-multi-static} shows an example that can traverse all requests in priority order (field @pri@) or navigate among requests with the same request value (field @rqr@).
+Each of ``by priority'' and ``by common request value'' is a separate list.
+For example, there is a single priority-list linked in order [1, 2, 2, 3, 3, 4], where nodes may have the same priority, and there are three common request-value lists combining requests with the same values: [42, 42], [17, 17, 17], and [99], giving four head nodes one for each list.
+The example shows a list can encompass all the nodes (by-priority) or only a subset of the nodes (three request-value lists).
+As stated, the limitation of intrusive attachment is knowing apriori how many groups of links are needed for the maximum number of simultaneous lists.
+Thus, the intrusive LQ example supports multiple, but statically many, link lists.
+Note, it is possible to reuse links for different purposes, \eg if a list in linked one at one time and another way at another time, and these times do not overlap, the two different linkings can use the same link fields.
+This feature is used in the \CFA runtime where a thread node may be on a blocked or running list, both never on both simultaneously.
+Now consider the STL in the wrapped-reference arrangement of Figure~\ref{f:WrappedRef}.
+Here it is possible to construct the same simultaneity by creating multiple STL lists, each pointing at the appropriate nodes.
+Each group of intrusive links become the links for each separate STL list.
+The upside is the unlimited number of a lists a node can be associated with simultaneously, any number of STL lists can be created dynamically.
+The downside is the dynamic allocation of the link nodes and manging multiple lists.
+Note, it might be possible to wrap the multiple lists in another type to hide this implementation issue.
+Now consider the STL in the wrapped-value arrangement of Figure~\ref{f:WrappedValue}.
+Again, it is possible to construct the same simultaneity by creating multiple STL lists, each copying the appropriate nodes, where the intrusive links become the links for each separate STL list.
+The upside is the same as for wrapped-reference arrangement with an unlimited number of a list bindings.
+The downside is the dynamic allocation and significant storage usage due to copying.
+As well, it is unclear how node updates work in this scenario, without some notation of ultimately merging node information.
+% https://www.geeksforgeeks.org/introduction-to-multi-linked-list/ -- example of building a bespoke multi-linked list out of STL primitives (providing indication that STL doesn't offer one); offers dynamic directionality by embedding `vector<struct node*> pointers;`
+% When allowing multiple static directions, frameworks differ in their ergonomics for
+% the typical case: when the user needs only one direction, vs.\ the atypical case, when the user needs several.
+% LQ's ergonomics are well-suited to the uncommon case of multiple list directions.
+% Its intrusion declaration and insertion operation both use a mandatory explicit parameter naming the direction.
+% This decision works well in Figure~\ref{fig:lst-issues-multi-static}, where the names @by_pri@ and @by_rqr@ work well,
+% but it clutters Figure~\ref{f:Intrusive}, where a contrived name must be invented and used.
+% The example uses @x@; @reqs@ would be a more readily ignored choice. \PAB{wording?}
+\uCpp offers an intrusive list that makes the opposite ergonomic choice.  TODO: elaborate on inheritance for first direction and acrobatics for subsequent directions.
+STL may seem to have similar ergonomics to LQ, but in fact, the current ergonomic distinction is not applicable there,
+where one static direction is enough to achieve multiple dynamic directions.
+Note that all options in Figure~\ref{fig:lst-issues-attach} have a \emph{variable} named @refs@
+just as both Figure~\ref{fig:lst-issues-multi-static} and Figure~(TODO~new) have \emph{variables} with names including @pri@ vs @rqr@.
+But only the intrusive model has this naming showing up within the definition of a structure.
+This lack of named parts of a structure lets Figure~\ref{fig:lst-issues-attach} \subref*{f:WrappedRef} and \subref*{f:WrappedValue}, just like \uCpp,
+insert into a list without mentioning a part's name, while only version \subref*{f:Intrusive} has to mention @x@ at this step.
+LQ demands this same extraneous part-naming when removing, iterating, and even asking for a neighbour.
+At issue in this distinction is whether an API that offers multiple static directions (and so requires these to be named differently)
+allows the sole direction (when several are not wanted) to be \emph{implicit}.
+\uCpp allows it, LQ does not, and STL does not have this question as applicable.
+\subsection{User integration: Preprocessed vs.\ Type-System Mediated}
+% example of poor error message due to LQ's preprocessed integration
+% programs/lst-issues-multi-static.run.c:46:1: error: expected identifier or '(' before 'do'
+%    46 | LIST_INSERT_HEAD(&reqs_rtr_42, &r42b, by_rqr);
+%       | ^~~~~~~~~~~~~~~~
+%
+% ... not a wonderful example; it was a missing semicolon on the preceding line; but at least real
+\subsection{List identity: Headed vs.\ Ad-hoc}
+\label{toc:lst:issue:ident}
+All examples so far have used distinct user-facing types:
+an item found in a list (type @req@, of variables like @r1@), and
+a list (type @reql@ or @list<req>@, of variables like @reqs@ or @reqs_rqr_42@).
+\see{Figure~\ref{fig:lst-issues-attach} and Figure~\ref{fig:lst-issues-multi-static}}
+The latter type is a head, and these examples are of headed lists.
+A bespoke ``pointer to next @req@'' implementation often omits the latter type.
+Such a list model is ad-hoc.
+In headed thinking, there are length-zero lists (heads with no elements), and an element can be listed or not listed.
+In ad-hoc thinking, there are no length-zero lists and every element belongs to a list of length at least one.
+\PAB{Create a figure for this.}
+By omitting the head, elements can enter into an adjacency relationship,
+without requiring that someone allocate room for the head of the possibly-resulting list,
+or being able to find a correct existing head.
+A head defines one or more element roles, among elements that share a transitive adjacency.
+``First'' and ``last'' are element roles.
+One moment's ``first'' need not be the next moment's.
+There is a cost to maintaining these roles.
+A runtime component of this cost is evident in LQ's offering the choice of type generators @LIST@ vs.~@TAILQ@.
+Its @LIST@ maintains a ``first,'' but not a ``last;'' its @TAILQ@ maintains both roles.
+(Both types are doubly linked and an analogous choice is available for singly linked.)
+TODO: finish making this point
+See WIP in lst-issues-adhoc-*.ignore.*.
+The code-complexity component of the cost ...
+Ability to offer heads is good.  Point: Does maintaining a head mean that the user has to provide more state when manipulating the list?  Requiring the user to do so is bad, because the user may have lots of "list" typed variables in scope, and detecting that the user passed the wrong one requires testing all the listing edge cases.
+\subsection{End treatment: Uniform }
 …
 An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in @'x'@.
 A wide character constant is the same, except prefixed by the letter @L@, @u@, or @U@.
 With a few exceptions detailed later, the elements of the sequence are any members of the source character set;
+Except for escape sequences, the elements of the sequence are any members of the source character set;
 they are mapped in an implementation-defined manner to members of the execution character set.
 …
 For wide string literals prefixed by the letter @L@, the array elements have type @wchar_t@ and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the @mbstowcs@ function with an implementation-defined current locale.
 For wide string literals prefixed by the letter @u@ or @U@, the array elements have type @char16_t@ or @char32_t@, respectively, and are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by successive calls to the @mbrtoc16@, or @mbrtoc32@ function as appropriate for its type, with an implementation-defined current locale.
 The value of a string literal containing a multibyte character or escape sequence not represented in the executioncharacter set is implementation-defined.
+The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set is implementation-defined.

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 0775468

Legend:

doc/theses/mike_brooks_MMath/background.tex

Download in other formats: