source: doc/theses/mike_brooks_MMath/list.tex @ e78966e

Last change on this file since e78966e was b64d0f4, checked in by Peter A. Buhr <pabuhr@…>, 8 months ago

second attempt changing program-input style

  • Property mode set to 100644
File size: 21.5 KB
Line 
1\chapter{Linked List}
2
3I wrote a linked-list library for \CFA.  This chapter describes it.
4
5The library provides a doubly-linked list that
6attaches links intrusively,
7supports multiple link directions,
8integrates with user code via the type system,
9treats its ends uniformly, and
10identifies a list using an explicit head.
11
12TODO: more summary
13
14
15\section{Design Issues}
16\label{toc:lst:issue}
17
18This section introduces the design space for linked lists that target system programmers.
19
20All design-issue discussions assume the following invariants.
21\PAB{They are stated here to clarify that none of the discussed design issues refers to one of these.}
22Alternatives to the assumptions are discussed under Future Work (Section~\ref{toc:lst:futwork}).
23\begin{itemize}
24    \item A doubly-linked list is being designed.
25          Generally, the discussed issues apply similarly for singly-linked lists.
26          Circular vs ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}).
27    \item Link fields are system-managed.
28          The user works with the system-provided API to query and modify list membership.
29          The system has freedom over how to represent these links.
30          The library is not providing applied wrapper operations that consume a user's hand-implemented list primitives.
31    \item \PAB{These issues are compared at a requirement/functional level.}
32\end{itemize}
33
34Two preexisting linked-list libraries are used throughout, to show examples of the concepts being defined,
35and further libraries are introduced as needed.
36\begin{description}
37    \item[LQ] Linux Queue library\cite{lst:linuxq} of @<sys/queue.h>@.
38    \item[STL] C++ Standard Template Library's @std::list@\cite{lst:stl}
39\end{description}
40A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}).
41
42The fictional type @req@ (request) is the user's payload in examples.
43The list library is helping the user track requests.
44A request represents work that the user must do but has not done yet.
45This work is on the level of handling a network arrival event or scheduling a thread.
46
47
48
49\subsection{Link attachment: Intrusive vs.\ Wrapped}
50\label{toc:lst:issue:attach}
51
52Link attachment deals with the question:
53Where are the system's inter-element link fields stored, in relation to the user's payload data fields?
54An intrusive list places the link fields inside the payload structure.
55A wrapped list places the payload inside a generic system-provided structure that also defines the link fields.
56LQ is intrusive; STL is wrapped.
57
58The wrapped style admits the further distinction between wrapping a reference and wrapping a value.
59This distinction is pervasive in all STL collections; @list<req *>@ wraps a reference; @list<req>@ wraps a value.
60(For this discussion, @list<req &>@ is similar to @list<req *>@.)
61This difference is one of user style, not framework capability.
62Figure~\ref{fig:lst-issues-attach} compares the three styles.
63
64\begin{comment}
65\begin{figure}
66    \begin{tabularx}{\textwidth}{Y|Y|Y}
67                \lstinput[language=C]{20-39}{lst-issues-intrusive.run.c}
68        &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp}
69        &\lstinputlisting[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp}
70      \\ & &
71      \\
72        \includegraphics[page=1]{lst-issues-attach.pdf}
73        &
74        \includegraphics[page=2]{lst-issues-attach.pdf}
75        &
76        \includegraphics[page=3]{lst-issues-attach.pdf}
77      \\ & &
78      \\
79        (a) & (b) & (c)
80    \end{tabularx}
81\caption{
82        Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value.
83        The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
84        head objects are discussed in Section~\ref{toc:lst:issue:ident}.
85        In (a), the field \lstinline{req.x} names a list direction;
86        these are discussed in Section~\ref{toc:lst:issue:derection}.
87        In (b) and (c), the type \lstinline{node} represents a system-internal type,
88        which is \lstinline{std::_List_node} in the GNU implementation.
89        (TODO: cite? found in  /usr/include/c++/7/bits/stl\_list.h )
90    }
91     \label{fig:lst-issues-attach}
92\end{figure}
93\end{comment}
94
95\begin{figure}
96\centering
97\newsavebox{\myboxA}                                    % used with subfigure
98\newsavebox{\myboxB}
99\newsavebox{\myboxC}
100
101\begin{lrbox}{\myboxA}
102\begin{tabular}{@{}l@{}}
103\lstinput[language=C]{20-39}{lst-issues-intrusive.run.c} \\
104\ \\
105\includegraphics[page=1]{lst-issues-attach.pdf}
106\end{tabular}
107\end{lrbox}
108
109\begin{lrbox}{\myboxB}
110\begin{tabular}{@{}l@{}}
111\lstinput[language=C++]{20-39}{lst-issues-wrapped-byref.run.cpp} \\
112\ \\
113\includegraphics[page=2]{lst-issues-attach.pdf}
114\end{tabular}
115\end{lrbox}
116
117\begin{lrbox}{\myboxC}
118\begin{tabular}{@{}l@{}}
119\lstinput[language=C++]{20-39}{lst-issues-wrapped-emplaced.run.cpp} \\
120\ \\
121\includegraphics[page=3]{lst-issues-attach.pdf}
122\end{tabular}
123\end{lrbox}
124
125\subfloat[Intrusive]{\label{f:Intrusive}\usebox\myboxA}
126\hspace{10pt}
127\vrule
128\hspace{10pt}
129\subfloat[Wrapped reference]{\label{f:WrappedRef}\usebox\myboxB}
130\hspace{10pt}
131\vrule
132\hspace{10pt}
133\subfloat[Wrapped value]{\label{f:WrappedValue}\usebox\myboxC}
134
135\caption{
136        Three styles of link attachment: \protect\subref*{f:Intrusive}~intrusive, \protect\subref*{f:WrappedRef}~wrapped
137        reference, and \protect\subref*{f:WrappedValue}~wrapped value.
138        The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
139        head objects are discussed in Section~\ref{toc:lst:issue:ident}.
140        In \protect\subref*{f:Intrusive}, the field \lstinline{req.x} names a list direction;
141        these are discussed in Section~\ref{toc:lst:issue:derection}.
142        In \protect\subref*{f:WrappedRef} and \protect\subref*{f:WrappedValue}, the type \lstinline{node} represents a
143        system-internal type, which is \lstinline{std::_List_node} in the GNU implementation.
144        (TODO: cite? found in  /usr/include/c++/7/bits/stl\_list.h )
145    }
146    \label{fig:lst-issues-attach}
147\end{figure}
148
149The advantage of intrusive attachment is the control that it gives the user over memory layout.
150Each diagrammed example is using the fewest dynamic allocations that its respective style allows.
151Both wrapped attachment styles imply system-induced heap allocations.
152Such an allocation has a lifetime that matches the item's membership in the list.
153In \subref*{f:Intrusive} and \subref*{f:WrappedRef}, one @req@ object can enter and leave a list many times.
154In \subref*{f:WrappedRef}, it implies a dynamic allocation/deallocation for each enter/leave; in \subref*{f:Intrusive}, it does not.
155
156A further aspect of layout control is allowing the user to specify the location of the link fields within the @req@ object.
157LQ allows this ability; a different mechanism of intrusion, such as inheriting from a @linkable@ base type, may not.
158Having this control means the user can allocate the link fields to cache lines along with the other @req@ fields.
159Doing this allocation sensibly can help improve locality or avoid false sharing.
160With an attachment mechanism that does not offer this control,
161a framework design choice or fact of the host language forces the links to be contiguous with either the beginning or end of the @req@.
162All wrapping realizations have this limitation in their wrapped-value configurations.
163
164Another subtle advantage of intrusive arrangement is that
165a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
166In LQ, \subref*{f:Intrusive}, a @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
167there is no distinguishing a @req@ from ``a @req@ in a list.''
168The same is not true of STL, \subref*{f:WrappedRef} or \subref*{f:WrappedValue}.
169There, the analogous operations work on a parameter of type @list<T>::iterator@;
170they are @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@.
171There is no mapping from @req &@ to @list<req>::iterator@, except for linear search.
172
173The advantage of wrapped attachment is the abstraction of a data item from its list membership(s).
174In the wrapped style, the @req@ type can come from a library that serves many independent uses,
175which generally have no need for listing.
176Then, a novel use can still put @req@ in a (further) list, without requiring any upstream change in the @req@ library.
177In intrusive attachment, the ability to be listed must be planned during the definition of @req@.
178Similarly, style \subref*{f:WrappedRef} allows for one @req@ to occur at several positions in one list.
179Styles \subref*{f:Intrusive} and \subref*{f:WrappedValue} do not support this ability.
180\PAB{But style \subref*{f:WrappedValue} can sort of mimic this effect by have multiple copies of \lstinline{req} in the list, modulo changes to the copies are not seen by the original.}
181
182\begin{figure}
183    \lstinput[language=C++]{100-117}{lst-issues-attach-reduction.hpp}
184    \lstinput[language=C++]{150-150}{lst-issues-attach-reduction.hpp}
185    \caption{
186        Reduction of wrapped attachment to intrusive attachment.
187        Illustrated by pseudocode implementation of an STL-compatible API fragment
188        using LQ as the underlying implementation.
189        The gap that makes it pseudocode is that
190        the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}.
191        When using a custom-patched version of LQ to work around this issue,
192        the programs of Figure~\ref{f:WrappedRef} and \protect\subref*{f:WrappedValue} work with this shim in place of real STL.
193        Their executions lead to the same memory layouts.
194    }
195    \label{fig:lst-issues-attach-reduction}
196\end{figure}
197
198Wrapped attachment has a straightforward reduction to intrusive attachment, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
199This shim layer performs the implicit dynamic allocations that pure intrusion avoids.
200But there is no reduction going the other way.
201No shimming can cancel the allocations to which wrapped membership commits.
202
203So intrusion is a lower-level listing primitive.
204And so, the system design choice is not between forcing users to use intrusion or wrapping.
205The choice is whether or not to provide access to an allocation-free layer of functionality.
206A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits.
207An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
208
209
210\subsection{Directionality: Single vs.\ Multi-Static vs.\ Dynamic}
211\label{toc:lst:issue:derection}
212
213Axis?
214
215\PAB{I'm not sure about the term \newterm{Directionality}. Directionality to me, means going forward or backwards through a list.
216Would \newterm{dimensionality} work? Think of each list containing the node as a different dimension in which the node sits.}
217
218Directionality deals with the question:
219In how many different lists can an item be stored, at a given time?
220
221Consider STL in the wrapped-value arrangement of Figure~\ref{f:WrappedValue}.
222The STL API completely hides its @node@ type from a user; the user cannot configure this choice or impose a custom one.
223STL's @node@ type offers the sole set of links shown in the diagram.
224Therefore, every @req@ in existence is allocated either to belong to an occurrence of the diagrammed arrangement,
225or to be apart from all occurrences of it.
226In the first case, the @req@ belongs to exactly one list (of the style in question).
227STL with wrapped values supports a single link direction.
228
229\begin{figure}
230    \parbox[t]{3.5in} {
231        \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
232    }\parbox[t]{20in} {
233        ~\\
234        \includegraphics[page=1]{lst-issues-direct.pdf} \\
235        ~\\
236        \hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf}
237    }
238    \caption{
239        Example of two link directions, with an LQ realization. 
240        The zoomed-out diagram portion shows the whole example dataset, conceptually.
241        A consumer of this structure can navigate all requests in priority order, and navigate among requests from a common requestor.
242        The code is the LQ implementation.
243        The zoomed-in diagram portion shows the field-level state that results from running the LQ code.
244    }
245    \label{fig:lst-issues-multi-static}
246\end{figure}
247
248
249The user may benefit from a further link direction.
250Suppose the user must both: navigate all requests in priority order, and navigate among requests from a common requestor.
251Figure~\ref{fig:lst-issues-multi-static} shows such a situation.
252Each of its ``by priority'' and ``by requestor'' is a link direction.
253The example shows that a link direction can occur either as one global list (by-priority) or as many lists (there are three by-requestor lists).
254
255The limitation of intrusive attachment presented in Section~\ref{toc:lst:issue:attach}
256has a straightforward extension to multiple directions.
257The set of directions by which an item is to be listed must be planned during the definition of the item.
258Thus, intrusive LQ supports multiple, but statically many, link directions.
259
260% https://www.geeksforgeeks.org/introduction-to-multi-linked-list/ -- example of building a bespoke multi-linked list out of STL primitives (providing indication that STL doesn't offer one); offers dynamic directionality by embedding `vector<struct node*> pointers;`
261
262The corresponding flexibility of wrapped attachment means
263the STL wrapped-reference arrangement supports an item being a member of arbitrarily many lists.
264This support also applies to the wrapped-value list because the @req@ is copied,
265but wrapped-reference lists provide further link directions.
266\PAB{Explain how}
267STL with wrapped references supports dynamic link directions.
268\PAB{Expand}
269
270When allowing multiple static directions, frameworks differ in their ergonomics for
271the typical case: when the user needs only one direction, vs.\ the atypical case, when the user needs several.
272LQ's ergonomics are well-suited to the uncommon case of multiple list directions.
273Its intrusion declaration and insertion operation both use a mandatory explicit parameter naming the direction.
274This decision works well in Figure~\ref{fig:lst-issues-multi-static}, where the names @by_pri@ and @by_rqr@ work well,
275but it clutters Figure~\ref{f:Intrusive}, where a contrived name must be invented and used.
276The example uses @x@; @reqs@ would be a more readily ignored choice. \PAB{wording?}
277
278\uCpp offers an intrusive list that makes the opposite choice.  TODO: elaborate on inheritance for first direction and acrobatics for subsequent directions.
279
280
281\subsection{User integration: Preprocessed vs.\ Type-System Mediated}
282
283% example of poor error message due to LQ's preprocessed integration
284% programs/lst-issues-multi-static.run.c:46:1: error: expected identifier or '(' before 'do'
285%    46 | LIST_INSERT_HEAD(&reqs_rtr_42, &r42b, by_rqr);
286%       | ^~~~~~~~~~~~~~~~
287%
288% ... not a wonderful example; it was a missing semicolon on the preceding line; but at least real
289
290
291\subsection{List identity: Headed vs.\ Ad-hoc}
292\label{toc:lst:issue:ident}
293
294All examples so far have used distinct user-facing types:
295an item found in a list (type @req@, of variables like @r1@), and
296a list (type @reql@ or @list<req>@, of variables like @reqs@ or @reqs_rqr_42@).
297\see{Figure~\ref{fig:lst-issues-attach} and Figure~\ref{fig:lst-issues-multi-static}}
298The latter type is a head, and these examples are of headed lists.
299
300A bespoke ``pointer to next @req@'' implementation often omits the latter type.
301Such a list model is ad-hoc.
302
303In headed thinking, there are length-zero lists (heads with no elements), and an element can be listed or not listed.
304In ad-hoc thinking, there are no length-zero lists and every element belongs to a list of length at least one.
305\PAB{Create a figure for this.}
306
307By omitting the head, elements can enter into an adjacency relationship,
308without requiring that someone allocate room for the head of the possibly-resulting list,
309or being able to find a correct existing head.
310
311A head defines one or more element roles, among elements that share a transitive adjacency.
312``First'' and ``last'' are element roles.
313One moment's ``first'' need not be the next moment's.
314
315There is a cost to maintaining these roles.
316A runtime component of this cost is evident in LQ's offering the choice of type generators @LIST@ vs.~@TAILQ@.
317Its @LIST@ maintains a ``first,'' but not a ``last;'' its @TAILQ@ maintains both roles.
318(Both types are doubly linked and an analogous choice is available for singly linked.)
319
320TODO: finish making this point
321
322See WIP in lst-issues-adhoc-*.ignore.*.
323
324The code-complexity component of the cost ...
325
326Ability to offer heads is good.  Point: Does maintaining a head mean that the user has to provide more state when manipulating the list?  Requiring the user to do so is bad, because the user may have lots of "list" typed variables in scope, and detecting that the user passed the wrong one requires testing all the listing edge cases.
327
328\subsection{End treatment: Uniform }
329
330
331\section{Features}
332
333\subsection{Core Design Issues}
334
335This section reviews how a user experiences my \CFA list library's position on the issues of Section~\ref{toc:lst:issue}.
336The library provides a doubly-linked list that
337attaches links intrusively,
338supports multiple link directions,
339integrates with user code via the type system,
340treats its ends uniformly, and
341identifies a list using an explicit head.
342
343The \CFA list library's version of the running @req@ example is in Figure~\ref{fig:lst-features-intro}.
344Its link attachment is intrusive and the resulting memory layout is pure-stack, just as for the LQ version of Figure~\ref{f:Intrusive}.
345The framework-provided type @dlink(...)@ provides the links.
346The user inserts the links into the @req@ structure by using \CFA inline-inheritance (TODO: reference introduction).
347Inline inheritance means the type of the field is @dlink(req)@, the field is unnamed, a reference to a @req@ is implicitly convertible to @dlink@.\footnote{
348    The \CFA list examples elide the \lstinline{P9_EMBEDDED} annotations that (TODO: xref P9E future work) proposes to obviate.
349    Thus, these examples illustrate a to-be state, free of what is to be historic clutter.
350    The elided portions are immaterial to the discussion and the examples work with the annotations provided.
351    The \CFA test suite (TODO:cite?) includes equivalent demonstrations, with the annotations included.}
352These links have a nontrivial, user-specified location within the @req@ structure;
353this convention encapsulates the implied pointer arithmetic safely.
354
355\begin{figure}
356    \lstinput{20-32}{lst-features-intro.run.cfa}
357    \caption[Multiple link directions in \CFA list library]{
358        Demonstration of the running \lstinline{req} example, done using the \CFA list library.
359        This example does the same job that Figure~\ref{fig:lst-issues-attach} shows three ways.
360    }
361    \label{fig:lst-features-intro}
362\end{figure}
363
364\begin{figure}
365\centering
366\begin{tabular}{@{}ll@{}}
367\begin{tabular}{@{}l@{}}
368    \lstinput{20-25}{lst-features-multidir.run.cfa} \\
369    \lstinput{40-67}{lst-features-multidir.run.cfa}
370    \end{tabular}
371        &
372        \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
373        \end{tabular}
374
375\caption{
376        Demonstration of multiple static link directions done in the \CFA list library.
377        This example does the same job as Figure~\ref{fig:lst-issues-multi-static}.
378    }
379    \label{fig:lst-features-multidir}
380\end{figure}
381
382Figure~\ref{fig:lst-features-multidir} shows how the \CFA library supports multi-static link directionality.
383The declaration of @req@ now has two inline-inheriting @dlink@ occurrences.
384The first of these lines gives a type named @req.by_pri@, @req@ inherits from it, and it inherits from @dlink@.
385The second line @req.by_rqr@ is similar to @req.by_pri@.
386Thus, there is a diamond, non-virtual, inheritance from @req@ to @dlink@, with @by_pri@ and @by_rqr@ being the mid-level types.
387Disambiguation occurs in the declarations of the list-head objects.
388The type of the variable @reqs_pri_global@ is @dlist(req, req.by_pri)@,
389meaning operations called on @reqs_pri_global@ are implicitly disambiguated.
390In the example, the calls @insert_first(reqs_pri_global, ...)@ imply, ``here, we are working by priority.''
391
392The \CFA library also supports the common case, of single directionality, more naturally than LQ. 
393Figure~\ref{fig:lst-features-intro} shows a single-direction list done with no contrived name for the link direction,
394where Figure~\ref{f:Intrusive} adds the unnecessary name, @x@.
395In \CFA, a user doing a single direction (Figure~\ref{fig:lst-features-intro})
396sets up a simple inheritance with @dlink@, and declares a list head to have the simpler type @dlist(...)@.
397While a user doing multiple link directions (Figure~\ref{fig:lst-features-multidir})
398sets up a diamond inheritance with @dlink@, and declares a list head to have the more-informed type @dlist(..., DIR)@.
399
400The \CFA library offers a type-system mediated integration with user code.
401The examples presented do not use preprocessor macros.
402The touchpoints @dlink@ and @dlist@ are ordinary types.
403Even though they are delivered as header-included static-inline implementations,
404the \CFA compiler typechecks the list library code separately from user code.
405Errors in user code are reported only with mention of the library's declarations.
406
407The \CFA library works in headed and headless modes.  TODO: elaborate.
408
409\subsection{Iteration-FOUNDATIONS}
410
411TODO: This section should be moved to a Foundations chapter.  The next section stays under Linked List.
412
413
414\subsection{Iteration}
415
416
417\section{Future Work}
418\label{toc:lst:futwork}
419
420
421TODO: deal with: A doubly linked list is being designed.
422
423TODO: deal with: Link fields are system-managed.
424Links in GDB.
425
426\section{Related Work}
427\label{toc:lst:relwork}
Note: See TracBrowser for help on using the repository browser.