source: doc/theses/mike_brooks_MMath/list.tex @ 9d84a88

Last change on this file since 9d84a88 was 5717495, checked in by Michael Brooks <mlbrooks@…>, 15 months ago
Start of the linked-list chapter.
1	\chapter{Linked List}
2
3	I wrote a linked-list library for \CFA. This chapter describes it.
4
5	The library provides a doubly-linked list that
6	attaches links intrusively,
7	supports multiple link directions,
8	integrates with user code via the type system,
9	treats its ends uniformly, and
10	identifies a list using an explicit head.
11
12	TODO: more summary
13
14
15
16	\section{Design Issues}
17	\label{toc:lst:issue}
18
19	This section introduces the design space for linked lists that target system programmers.
20
21	All design-issue discussions assume the following invariants.
22	They are stated here to make clear that none of the design issues discussed below concerns any of them.
23	Alternatives to the assumptions are discussed under Future Work (Section~\ref{toc:lst:futwork}).
24	\begin{itemize}
25	\item A doubly-linked list is being designed.
26	Generally, the discussed issues apply similarly for singly-linked lists.
27	Circular vs.\ ordered linking is discussed under List identity (Section~\ref{toc:lst:issue:ident}).
28	\item Link fields are system-managed.
29	The user works with the system-provided API to query and modify list membership.
30	The system has freedom over how to represent these links.
31	The library does not provide higher-level wrapper operations that consume a user's hand-implemented list primitives.
32	\item These issues are compared at a requirement/functional level.
33	\end{itemize}
34
35	Two preexisting linked-list libraries are used throughout, to show examples of the concepts being defined,
36	and further libraries are introduced as needed.
37	A general comparison of libraries' abilities is given under Related Work (Section~\ref{toc:lst:relwork}).
38	\begin{description}
39	\item[LQ] Linux Queue library\cite{lst:linuxq} of @<sys/queue.h>@.
40	\item[STL] C++ Standard Template Library's @std::list@\cite{lst:stl}.
41	\end{description}
42
43	The fictional type @req@ (request) is the user's payload in examples.
44	The list library is helping the user track requests.
45	A request represents work that the user must do but has not done yet.
46	This work is on the level of handling a network arrival event or scheduling a thread.
47
48
49
50	\subsection{Link attachment: Intrusive vs.\ Wrapped}
51	\label{toc:lst:issue:attach}
52
53	Link attachment deals with the question:
54	Where are the system's inter-element link fields stored, in relation to the user's payload data fields?
55	An intrusive list places the link fields inside the payload structure.
56	A wrapped list places the payload inside a generic system-provided structure that also defines the link fields.
57	LQ is intrusive; STL is wrapped.
58
59	The wrapped style admits the further distinction between wrapping a reference and wrapping a value.
60	This distinction is pervasive in all STL collections; @list<req*>@ wraps a reference; @list<req>@ wraps a value.
61	(For this discussion, @list<req&>@ is similar to @list<req*>@.)
62	This difference is one of user style, not framework capability.
63
64	\begin{figure}
65	\begin{tabularx}{\textwidth}{Y|Y|Y}\lstinputlisting[language=C , firstline=20, lastline=39]{lst-issues-intrusive.run.c}
66	&\lstinputlisting[language=C++, firstline=20, lastline=39]{lst-issues-wrapped-byref.run.cpp}
67	&\lstinputlisting[language=C++, firstline=20, lastline=39]{lst-issues-wrapped-emplaced.run.cpp}
68	\\ & &
69	\\
70	\includegraphics[page=1]{lst-issues-attach.pdf}
71	&
72	\includegraphics[page=2]{lst-issues-attach.pdf}
73	&
74	\includegraphics[page=3]{lst-issues-attach.pdf}
75	\\ & &
76	\\
77	(a) & (b) & (c)
78	\end{tabularx}
79	\caption{
80	Three styles of link attachment: (a)~intrusive, (b)~wrapped reference, and (c)~wrapped value.
81	The diagrams show the memory layouts that result after the code runs, eliding the head object \lstinline{reqs};
82	head objects are discussed in Section~\ref{toc:lst:issue:ident}.
83	In (a), the field \lstinline{req.x} names a list direction;
84	these are discussed in Section~\ref{toc:lst:issue:derection}.
85	In (b) and (c), the type \lstinline{node} represents a system-internal type,
86	which is \lstinline{std::_List_node} in the GNU implementation.
87	(TODO: cite? found in /usr/include/c++/7/bits/stl\_list.h )
88	}
89	\label{fig:lst-issues-attach}
90	\end{figure}
91
92
93	Figure~\ref{fig:lst-issues-attach} compares the three styles.
94
95	The advantage of intrusive attachment is the control that it gives the user over memory layout.
96	Each diagrammed example is using the fewest dynamic allocations that its respective style allows.
97	Both wrapped attachment styles imply system-induced heap allocations.
98	Such an allocation has a lifetime that matches the item's membership in the list.
99	In (a) and (b), one @req@ object can enter and leave a list many times.
100	In (b), doing so implies a dynamic allocation each time it joins; in (a), it does not.
101
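The allocation behaviour of style (a) can be sketched as follows.
This is a minimal illustration, not the code of Figure~\ref{fig:lst-issues-attach}:
it assumes a platform that ships @<sys/queue.h>@ and compiles the LQ macros as C++,
and the helper names @length@ and @join_and_leave@ are hypothetical.

\begin{lstlisting}[language=C++]
#include <cstddef>      // NULL, used inside the LQ macros
#include <sys/queue.h>  // LQ: LIST_HEAD, LIST_ENTRY, LIST_INSERT_HEAD, ...

struct req {
    int pri;
    LIST_ENTRY(req) x;  // intrusive links; x names the link direction
};
LIST_HEAD(reql, req);   // declares the head type, struct reql

// Hypothetical helper: count the elements by walking the x links.
int length(reql *l) {
    int n = 0;
    req *it;
    LIST_FOREACH(it, l, x) n++;
    return n;
}

// Hypothetical helper: one stack-allocated req joins and leaves
// the same list several times, with no heap allocation involved.
int join_and_leave(int rounds) {
    reql reqs;
    LIST_INIT(&reqs);
    req r1{}, r2{};
    r1.pri = 1;
    r2.pri = 2;
    LIST_INSERT_HEAD(&reqs, &r2, x);
    for (int i = 0; i < rounds; i++) {
        LIST_INSERT_HEAD(&reqs, &r1, x);  // only pointer updates inside r1
        LIST_REMOVE(&r1, x);              // r1 itself stays alive
    }
    return length(&reqs);                 // only r2 is still listed
}
\end{lstlisting}

After any number of rounds, only @r2@ remains listed, and every link that was touched lives inside a stack-allocated @req@.
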
102	A further aspect of layout control is allowing the user to specify the location of the link fields within the @req@ object.
103	LQ offers this ability; a different mechanism of intrusion, such as inheriting from a @linkable@ base type, may not.
104	Having this control means the user can allocate the link fields to cache lines along with the other @req@ fields.
105	Doing this allocation sensibly can help improve locality or avoid false sharing.
106	With an attachment mechanism that does not offer this control,
107	a framework design choice or fact of the host language forces the links to be contiguous with either the beginning or end of the @req@.
108	All wrapping realizations have this limitation in their wrapped-value configurations.
109
110	Another subtle advantage of intrusive arrangement is that
111	a reference to a user-level item (@req@) is sufficient to navigate or manage the item's membership.
112	In LQ, (a), a @req@ pointer is the right argument type for operations @LIST_NEXT@ or @LIST_REMOVE@;
113	there is no distinguishing a @req@ from ``a @req@ in a list.''
114	The same is not true of STL, (b) or (c).
115	There, the analogous operations work on a parameter of type @list<T>::iterator@;
116	they are @iterator::operator++()@, @iterator::operator*()@, and @list::erase(iterator)@.
117	There is no mapping from @req&@ to @list<req>::iterator@, except for linear search.
118
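The contrast can be sketched as below, again assuming that the LQ macros of @<sys/queue.h>@ compile as C++.
The type names @ireq@ and @wreq@ and the functions @lq_unlist@ and @stl_unlist@ are hypothetical, introduced only for this illustration.

\begin{lstlisting}[language=C++]
#include <algorithm>    // std::find_if
#include <cstddef>      // NULL, used inside the LQ macros
#include <iterator>     // std::next, convenient for callers
#include <list>
#include <sys/queue.h>  // LQ

struct ireq { int id; LIST_ENTRY(ireq) x; };  // intrusive payload (LQ)
LIST_HEAD(ireql, ireq);

struct wreq { int id; };                      // plain payload, wrapped by STL

// LQ: a pointer to the item is all that removal needs.
void lq_unlist(ireq *r) {
    LIST_REMOVE(r, x);
}

// STL: erase() wants an iterator. Starting from a wreq&, the only way
// to obtain one is a linear search, here by comparing addresses.
bool stl_unlist(std::list<wreq> &l, const wreq &r) {
    auto it = std::find_if(l.begin(), l.end(),
                           [&r](const wreq &e) { return &e == &r; });
    if (it == l.end()) return false;  // r is not stored in this list
    l.erase(it);
    return true;
}
\end{lstlisting}

A caller of @stl_unlist@ pays for the search on every removal, unless it has kept the iterator obtained at insertion time.
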
119	The advantage of wrapped attachment is the abstraction of a data item from its list membership(s).
120	In the wrapped style, the @req@ type can come from a library that serves many independent uses,
121	which generally have no need for listing.
122	Then, a novel use can still put @req@ in a (further) list, without requiring any upstream change in the @req@ library.
123	In intrusive attachment, the ability to be listed must be planned during the definition of @req@.
124	Similarly, style (b) allows for one @req@ to occur at several positions in one list.
125	Styles (a) and (c) do not support this ability.
126
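As a small sketch of the last point, style (b) lets one @req@ appear at two positions of the same list.
The function name @positions_of_r1@ is hypothetical.

\begin{lstlisting}[language=C++]
#include <algorithm>  // std::count
#include <list>

struct req { int id; };

// Hypothetical helper: list r1 twice in one wrapped-reference list
// and report how many positions refer to it.
long positions_of_r1() {
    req r1{1}, r2{2};
    std::list<req *> reqs;
    reqs.push_back(&r1);
    reqs.push_back(&r2);
    reqs.push_back(&r1);  // the same req, at a second position
    return std::count(reqs.begin(), reqs.end(), &r1);
}
\end{lstlisting}

Each @push_back@ allocates a separate node, so the two occurrences of @r1@ have independent links.
In styles (a) and (c) the payload is tied to a single set of links, so this arrangement has no counterpart there.
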
127	\begin{figure}
128	\lstinputlisting[language=C++, firstline=100, lastline=117]{lst-issues-attach-reduction.hpp}
129	\lstinputlisting[language=C++, firstline=150, lastline=150]{lst-issues-attach-reduction.hpp}
130	\caption{
131	Reduction of wrapped attachment to intrusive attachment.
132	Illustrated by a pseudocode implementation of an STL-compatible API fragment
133	using LQ as the underlying implementation.
134	The gap that makes it pseudocode is that
135	the LQ C macros do not expand to valid C++ when instantiated with template parameters---there is no \lstinline{struct El}.
136	When using a custom-patched version of LQ to work around this issue,
137	the programs of Figure~\ref{fig:lst-issues-attach}~(b) and (c) work with this shim in place of real STL.
138	Their executions lead to the same memory layouts.
139	}
140	\label{fig:lst-issues-attach-reduction}
141	\end{figure}
142
143	Wrapped attachment has a straightforward reduction to intrusive attachment, illustrated in Figure~\ref{fig:lst-issues-attach-reduction}.
144	This shim layer performs the implicit dynamic allocations that pure intrusion avoids.
145	But there is no reduction going the other way.
146	No shimming can cancel the allocations to which wrapped membership commits.
147
148	So intrusion is a lower-level listing primitive.
149	And so, the system design choice is not between forcing users to use intrusion and forcing them to use wrapping.
150	The choice is whether or not to provide access to an allocation-free layer of functionality.
151	A wrapped-primitive library like STL forces users to incur the costs of wrapping, whether or not they access its benefits.
152	An intrusive-primitive library like LQ lets users choose when to make this tradeoff.
153
154
155
156
157	\subsection{Directionality: Single vs.\ Multi-Static vs.\ Dynamic}
158	\label{toc:lst:issue:derection}
159
160	Directionality deals with the question:
161	In how many different lists can an item be stored, at a given time?
162
163
164	Consider STL in the wrapped-value arrangement of Figure~\ref{fig:lst-issues-attach}~(c).
165	The STL API completely hides its @node@ type from a user; the user cannot configure this choice or impose a custom one.
166	STL's @node@ type offers the sole set of links shown in the diagram.
167	Therefore, every @req@ in existence was allocated either to belong to an occurrence of the diagrammed arrangement,
168	or to be apart from all occurrences of it.
169	In the first case, the @req@ belongs to exactly one list (of the style in question).
170	STL with wrapped values supports a single link direction.
171
172	\begin{figure}
173	\parbox[t]{3.5in} {
174	\lstinputlisting[language=C++, firstline=20, lastline=60]{lst-issues-multi-static.run.c}
175	}\parbox[t]{20in} {
176	~\\
177	\includegraphics[page=1]{lst-issues-direct.pdf} \\
178	~\\
179	\hspace*{1.5in}\includegraphics[page=2]{lst-issues-direct.pdf}
180	}
181	\caption{
182	Example of two link directions, with an LQ realization.
183	The zoomed-out diagram portion shows the whole example dataset, conceptually.
184	A consumer of this structure can navigate all requests in priority order, and navigate among requests from a common requestor.
185	The code is the LQ implementation.
186	The zoomed-in diagram portion shows the field-level state that results from running the LQ code.
187	}
188	\label{fig:lst-issues-multi-static}
189	\end{figure}
190
191
192	The user may benefit from a further link direction.
193	Suppose the user must both navigate all requests in priority order and navigate among requests from a common requestor.
194	Figure~\ref{fig:lst-issues-multi-static} shows such a situation.
195	Each of ``by priority'' and ``by requestor'' is a link direction.
196	The example shows that a link direction can occur either as one global list (by-priority) or as many lists (there are three by-requestor lists).
197
198	The limitation of intrusive attachment presented in Section~\ref{toc:lst:issue:attach}
199	has a straightforward extension to multiple directions.
200	The set of directions by which an item is to be listed must be planned during the definition of the item.
201	Thus, intrusive LQ supports multiple, but statically many, link directions.
202
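A minimal sketch of two statically planned directions in LQ follows.
It is not the listing of Figure~\ref{fig:lst-issues-multi-static}; the data, the result type @counts@ and the function @build_two_directions@ are assumptions made for illustration, and the LQ macros are assumed to compile as C++.

\begin{lstlisting}[language=C++]
#include <cstddef>      // NULL, used inside the LQ macros
#include <sys/queue.h>  // LQ

struct req {
    int pri, rqr;
    LIST_ENTRY(req) by_pri;  // links for the priority direction
    LIST_ENTRY(req) by_rqr;  // links for the requestor direction
};
LIST_HEAD(reql, req);

struct counts { int n_pri; int n_rqr_42; };  // hypothetical result type

counts build_two_directions() {
    reql reqs_pri, reqs_rqr_42;
    LIST_INIT(&reqs_pri);
    LIST_INIT(&reqs_rqr_42);

    req r42a{}, r42b{}, r17{};
    r42a.pri = 1; r42a.rqr = 42;
    r42b.pri = 2; r42b.rqr = 42;
    r17.pri  = 3; r17.rqr  = 17;

    // Every request joins the one global by-priority list.
    LIST_INSERT_HEAD(&reqs_pri, &r17,  by_pri);
    LIST_INSERT_HEAD(&reqs_pri, &r42b, by_pri);
    LIST_INSERT_HEAD(&reqs_pri, &r42a, by_pri);
    // Requests from requestor 42 also join that requestor's list.
    LIST_INSERT_HEAD(&reqs_rqr_42, &r42b, by_rqr);
    LIST_INSERT_HEAD(&reqs_rqr_42, &r42a, by_rqr);

    counts c = {0, 0};
    req *it;
    LIST_FOREACH(it, &reqs_pri, by_pri) c.n_pri++;
    LIST_FOREACH(it, &reqs_rqr_42, by_rqr) c.n_rqr_42++;
    return c;
}
\end{lstlisting}

Adding a third direction means adding a third @LIST_ENTRY@ field to @req@, which is a change to the definition of the item type.
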
203	% https://www.geeksforgeeks.org/introduction-to-multi-linked-list/ -- example of building a bespoke multi-linked list out of STL primitives (providing indication that STL doesn't offer one); offers dynamic directionality by embedding `vector<struct node*> pointers;`
204
205	The corresponding flexibility of wrapped attachment means
206	the STL wrapped-reference arrangement supports an item being a member of arbitrarily many lists.
207	This support also applies to the hybrid in which an item's allocation is within a wrapped-value list,
208	but wrapped-reference lists provide further link directions.
209	STL with wrapped references supports dynamic link directions.
210
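Dynamic directionality can be sketched with wrapped references, where the number of lists holding one @req@ is chosen at run time.
The function name @memberships@ is hypothetical.

\begin{lstlisting}[language=C++]
#include <cstddef>  // std::size_t
#include <list>
#include <vector>

struct req { int id; };

// Hypothetical helper: put one req into n separately created lists,
// then report in how many of them it is found.
int memberships(std::size_t n) {
    req r{42};
    std::vector<std::list<req *>> dirs(n);  // n is not known at compile time
    for (auto &d : dirs) d.push_back(&r);   // one node allocation per list
    int found = 0;
    for (const auto &d : dirs)
        if (!d.empty() && d.front() == &r) found++;
    return found;
}
\end{lstlisting}

Nothing in the definition of @req@ anticipates these lists; the cost is one system-induced allocation per membership.
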
211	When allowing multiple static directions, frameworks differ in their ergonomics for
212	the typical case, when the user needs only one direction, vs.\ the atypical case, when the user needs several.
213
214	LQ's ergonomics are well-suited to the uncommon case of multiple list directions.
215	Its intrusion declaration and insertion operation both use a mandatory explicit parameter naming the direction.
216	This decision suits Figure~\ref{fig:lst-issues-multi-static}, where the names @by_pri@ and @by_rqr@ are meaningful,
217	but it clutters Figure~\ref{fig:lst-issues-attach}~(a), where a contrived name must be invented and used.
218	The example uses @x@; @reqs@ would be a more readily ignored choice.
219
220	\uCpp offers an intrusive list that makes the opposite choice. TODO: elaborate on inheritance for first direction and acrobatics for subsequent directions.
221
222
223
224
225	\subsection{User integration: Preprocessed vs.\ Type-System Mediated}
226
227	% example of poor error message due to LQ's preprocessed integration
228	% programs/lst-issues-multi-static.run.c:46:1: error: expected identifier or ‘(’ before ‘do’
229	% 46 | LIST_INSERT_HEAD(&reqs_rtr_42, &r42b, by_rqr);
230	% | ^~~~~~~~~~~~~~~~
231	%
232	% ... not a wonderful example; it was a missing semicolon on the preceding line; but at least real
233
234
235	\subsection{List identity: Headed vs.\ Ad-hoc}
236	\label{toc:lst:issue:ident}
237
238	All examples so far have used distinct user-facing types:
239	an item found in a list (type @req@, of variables like @r1@), and
240	a list (type @reql@ or @list<req>@, of variables like @reqs@ or @reqs_rqr_42@).
241	The latter type is a head, and these examples are of headed lists.
242
243	A bespoke ``pointer to next @req@'' implementation often omits the latter type.
244	Such a list model is ad-hoc.
245
246	In headed thinking, there are length-zero lists (heads with no elements), and an element can be listed or not listed.
247	In ad-hoc thinking, there are no length-zero lists and every element belongs to a list of length at least one.
248
249	Omitting the head lets elements enter into an adjacency relationship
250	without requiring that someone allocate room for the head of the possibly-resulting list,
251	or be able to find a correct existing head.
252
253	A head defines one or more element roles, among elements that share a transitive adjacency.
254	``First'' and ``last'' are element roles.
255	One moment's ``first'' need not be the next moment's.
256
257	There is a cost to maintaining these roles.
258	A runtime component of this cost is evident in LQ's offering the choice of type generators @LIST@ vs.~@TAILQ@.
259	Its @LIST@ maintains a ``first,'' but not a ``last;'' its @TAILQ@ maintains both roles.
260	(Both types are doubly linked and an analogous choice is available for singly linked.)
261
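The runtime difference between the two head types can be sketched as follows, assuming a glibc- or BSD-style @<sys/queue.h>@ whose macros compile as C++.
The type and function names are hypothetical.

\begin{lstlisting}[language=C++]
#include <cstddef>      // NULL, used inside the LQ macros
#include <sys/queue.h>  // LQ: LIST_* and TAILQ_* families

struct lreq { int id; LIST_ENTRY(lreq) x; };
LIST_HEAD(lreql, lreq);    // head tracks "first" only

struct treq { int id; TAILQ_ENTRY(treq) x; };
TAILQ_HEAD(treql, treq);   // head tracks "first" and "last"

// LIST: the head does not know the last element, so finding it is a walk.
lreq *list_last(lreql *l) {
    lreq *it = LIST_FIRST(l);
    while (it && LIST_NEXT(it, x)) it = LIST_NEXT(it, x);
    return it;             // NULL for an empty list
}

// TAILQ: the head maintains the "last" role, so the lookup is constant time.
treq *tailq_last(treql *l) {
    return TAILQ_LAST(l, treql);
}
\end{lstlisting}

The @TAILQ@ head pays for this lookup with a second pointer in the head and with extra updates on insertion and removal at the tail.
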
262	TODO: finish making this point
263
264	See WIP in lst-issues-adhoc-.ignore..
265
266	The code-complexity component of the cost ...
267
268	Ability to offer heads is good. Point: Does maintaining a head mean that the user has to provide more state when manipulating the list? Requiring the user to do so is bad, because the user may have lots of ``list'' typed variables in scope, and detecting that the user passed the wrong one requires testing all the listing edge cases.
269
270	\subsection{End treatment: Uniform }
271
272
273	\section{Features}
274
275	\subsection{Core Design Issues}
276
277	This section reviews how a user experiences my \CFA list library's position on the issues of Section~\ref{toc:lst:issue}.
278	The library provides a doubly-linked list that
279	attaches links intrusively,
280	supports multiple link directions,
281	integrates with user code via the type system,
282	treats its ends uniformly, and
283	identifies a list using an explicit head.
284
285
286	\begin{figure}
287	\lstinputlisting[language=CFA, firstline=20, lastline=32]{lst-features-intro.run.cfa}
288	\caption[Multiple link directions in \CFA list library]{
289	Demonstration of the running \lstinline{req} example, done using the \CFA list library\protect\footnotemark.
290	This example does the same job that Figure~\ref{fig:lst-issues-attach} shows three ways.
291	}
292	\label{fig:lst-features-intro}
293	\end{figure}
294	\footnotetext{
295	The \CFA list examples elide the \lstinline{P9_EMBEDDED} annotations that (TODO: xref P9E future work) proposes to obviate.
296	Thus, these examples illustrate a to-be state, free of what is to be historic clutter.
297	The elided portions are immaterial to the discussion and the examples work with the annotations provided.
298	The \CFA test suite (TODO:cite?) includes equivalent demonstrations, with the annotations included.
299	}
300
301	My \CFA list library's version of the running @req@ example is in Figure~\ref{fig:lst-features-intro}.
302	Its link attachment is intrusive and the resulting memory layout is pure-stack, just as for the LQ version of Figure~\ref{fig:lst-issues-attach}~(a).
303	The framework-provided type @dlink(-)@ provides the links.
304	The user puts links into the @req@ structure by inline-inheriting (TODO: reference introduction) this type.
305	This means that the type of the field is @dlink(req)@, the field is unnamed, and a reference to a @req@ is implicitly convertible to @dlink@.
306	As these links have a nontrivial, user-specified location within the @req@ structure, this conversion also encapsulates the implied pointer arithmetic safely.
307
308	\begin{figure}
309	\lstinputlisting[language=CFA, firstline=20, lastline=25]{lst-features-multidir.run.cfa}
310	\lstinputlisting[language=CFA, firstline=40, lastline=67]{lst-features-multidir.run.cfa}
311	\caption{
312	Demonstration of multiple static link directions done in the \CFA list library.
313	This example does the same job as Figure~\ref{fig:lst-issues-multi-static}.
314	}
315	\label{fig:lst-features-multidir}
316	\end{figure}
317
318	The \CFA library supports multi-static link directionality. Figure~\ref{fig:lst-features-multidir} illustrates how.
319	The declaration of @req@ now has two inline-inheriting @dlink@ occurrences.
320	The first of these lines gives a type named @req.by_pri@; @req@ inherits from it; it inherits from @dlink@.
321	The second line gives a similar @req.by_rqr@.
322	Thus, there is a diamond, non-virtual, inheritance from @req@ to @dlink@, with @by_pri@ and @by_rqr@ being the mid-level types.
323	Disambiguation occurs in the declarations of the list-head objects.
324	The type of the variable @reqs_pri_global@ is @dlist(req, req.by_pri)@,
325	meaning operations called on @reqs_pri_global@ are implicitly disambiguated.
326	In the example, the calls @insert_first(reqs_pri_global, ...)@ imply, ``here, we're working by priority.''
327
328	The \CFA library also supports the common case of single directionality more naturally than LQ.
329	Figure~\ref{fig:lst-features-intro} shows a single-direction list done with no contrived name for the link direction,
330	where Figure~\ref{fig:lst-issues-attach}~(a) adds the unnecessary name, @x@.
331	In \CFA, a user doing a single direction (Figure~\ref{fig:lst-features-intro})
332	sets up a simple inheritance with @dlink@, and declares a list head to have the simpler type @dlist(-)@,
333	while a user doing multiple link directions (Figure~\ref{fig:lst-features-multidir})
334	sets up a diamond inheritance with @dlink@, and declares a list head to have the more-informed type @dlist(-, DIR)@.
335
336	The \CFA library offers a type-system mediated integration with user code.
337	The examples presented do not use preprocessor macros.
338	The touchpoints @dlink@ and @dlist@ are ordinary types.
339	Even though they are delivered as header-included static-inline implementations,
340	the \CFA compiler typechecks the list library code separately from user code.
341	Errors in user code are reported with mention of only the library's declarations, not its implementation.
342
343	The \CFA library works in headed and headless modes. TODO: elaborate.
344
345	\subsection{Iteration-FOUNDATIONS}
346
347	TODO: This section should be moved to a Foundations chapter. The next section stays under Linked List.
348
349
350
351
352
353	\subsection{Iteration}
354
355
356	\section{Future Work}
357	\label{toc:lst:futwork}
358
359
360	TODO: deal with: A doubly linked list is being designed.
361
362	TODO: deal with: Link fields are system-managed.
363	Links in GDB.
364
365	\section{Related Work}
366	\label{toc:lst:relwork}
