source: doc/theses/mike_brooks_MMath/list.tex@ 2581f1e

Last change on this file since 2581f1e was 6762d46, checked in by Peter A. Buhr <pabuhr@…>, 2 weeks ago

working on Section 4.6.3

\chapter{Linked List}

This chapter presents my work on designing and building a linked-list library for \CFA.
Due to time limitations and the needs expressed by the \CFA runtime developers, I focussed on providing a doubly-linked list and its bidirectional iterators for traversal.
Simpler data structures, like a stack and queue, can be built from the doubly-linked mechanism with only a slight storage/performance cost because of the unused link field.
Reducing to data structures with a single link follows directly from the more complex doubly-linked list and its iterators.


\section{Plan-9 Inheritance}
\label{s:Plan9Inheritance}

This chapter uses a form of inheritance from the Plan-9 C dialect~\cite[\S~3.3]{Thompson90new}, which is supported by @gcc@ and @clang@ using @-fplan9-extensions@.
\CFA has its own variation of the Plan-9 mechanism, where the nested type is denoted using @inline@.
\begin{cfa}
union U { int x; double y; char z; };
struct W { int i; double j; char k; };
struct S {
	@inline@ struct W;		$\C{// extended Plan-9 inheritance}$
	unsigned int tag;
	@inline@ U;				$\C{// extended Plan-9 inheritance}$
} s;
\end{cfa}
Inline inheritance is containment, where the inlined field is unnamed and the type's internal fields are hoisted into the containing structure.
Hence, the field names must be unique, unlike \CC nested types, but any inlined type-names are at a nested scope level, unlike aggregate nesting in C.
Note, the position of the containment is normally unimportant, unless there is some form of memory or @union@ overlay.
Finally, the inheritance declaration of @U@ is not prefixed with @union@.
Like \CC, \CFA allows optional prefixing of type names with their kind, \eg @struct@, @union@, and @enum@, unless there is ambiguity with variable names in the same scope.

\VRef[Figure]{f:Plan9Polymorphism} shows the key polymorphic feature of Plan-9 inheritance: implicit conversion of values and pointers for nested types.
In the example, there are implicit conversions from @S@ to @U@ and @S@ to @W@, extracting the appropriate value or pointer for the substructure.
\VRef[Figure]{f:DiamondInheritancePattern} shows complex multiple inheritance patterns are possible, like the \newterm{diamond pattern}~\cite[\S~6.1]{Stroustrup89}\cite[\S~4]{Cargill91}.
Currently, the \CFA type-system does not support @virtual@ inheritance.

\begin{figure}
\begin{cfa}
void f( U, U * );	$\C{// value, pointer}$
void g( W, W * );	$\C{// value, pointer}$
U u, * up; W w, * wp; S s, * sp;
u = s; up = sp;	$\C{// value, pointer}$
w = s; wp = sp;	$\C{// value, pointer}$
f( s, &s );	$\C{// value, pointer}$
g( s, &s );	$\C{// value, pointer}$
\end{cfa}
\caption{Plan-9 Polymorphism}
\label{f:Plan9Polymorphism}
\end{figure}

\begin{figure}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{ll@{}}
\multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{\CFA} \\
\begin{c++}
struct B { int b; };
struct L : @public B@ { int l, w; };
struct R : @public B@ { int r, w; };
struct T : @public L, R@ { int t; };
      T
   B L   B R
T t = { { { 5 }, 3, 4 }, { { 8 }, 6, 7 }, 3 };
t.t = 42;
t.l = 42;
t.r = 42;
((L &)t).b = 42; // disambiguate
\end{c++}
&
\begin{cfa}
struct B { int b; };
struct L { @inline B;@ int l, w; };
struct R { @inline B;@ int r, w; };
struct T { @inline L; inline R;@ int t; };
      T
   B L   B R
T t = { { { 5 }, 3, 4 }, { { 8 }, 6, 7 }, 3 };
t.t = 42;
t.l = 42;
t.r = 42;
((L &)t).b = 42; // disambiguate, proposed solution t.L.b = 42;
\end{cfa}
\end{tabular}
\caption{Diamond Non-Virtual Inheritance Pattern}
\label{f:DiamondInheritancePattern}
\end{figure}


\section{Features}

The following features directed this project, where the goal is high-performance list operations required by \CFA runtime components, like the threading library.


\subsection{Core Design Issues}

The doubly-linked list attaches links intrusively in a node, allows a node to appear on multiple lists (axes) simultaneously, integrates with user code via the type system, treats its ends uniformly, and identifies a list using an explicit head.
This design covers system and data management issues stated in \VRef{toc:lst:issue}.

\VRef[Figure]{fig:lst-features-intro} continues the running @req@ example from \VRef[Figure]{fig:lst-issues-attach} using the \CFA list @dlist@.
The \CFA link attachment is intrusive, so the resulting memory layout is per user node, as for the LQ version of \VRef[Figure]{f:Intrusive}.
The \CFA framework provides generic type @dlink( T )@ (not to be confused with @dlist@) for the two link fields (front and back).
A user inserts the links into the @req@ structure via \CFA inline-inheritance \see{\VRef{s:Plan9Inheritance}}.
Lists leverage the automatic conversion of a pointer to an anonymous inline field for assignments and function calls.
Therefore, a reference to a @req@ is implicitly convertible to @dlink@ in both contexts.
The links in @dlink@ point at another embedded @dlink@ node, know the offsets of all links (data is abstract but accessible), and any field-offset arithmetic or link-value changes are safe and abstract.

\begin{figure}
	\lstinput{20-30}{lst-features-intro.run.cfa}
	\caption[Multiple link axes in \CFA list library]{
	Demonstration of the running \lstinline{req} example, done using the \CFA list library.
	This example is equivalent to the three approaches in \VRef[Figure]{fig:lst-issues-attach}.
	}
	\label{fig:lst-features-intro}
\end{figure}

\VRef[Figure]{f:dlistOutline} shows the outline for types @dlink@ and @dlist@.
Note, the first @forall@ clause is distributed across all the declarations in its containing block, eliminating repetition on each declaration.
The second nested @forall@ on @dlist@ is also distributed and adds an optional second type parameter, @tLinks@, denoting the linking axis \see{\VRef{s:Axis}}, \ie the kind of list this node can appear on.

\begin{figure}
\begin{cfa}
forall( tE & ) {	$\C{// distributed}$
	struct @dlink@ { ... };	$\C{// abstract type}$
	static inline void ?{}( dlink( tE ) & this );	$\C{// constructor}$

	forall( tLinks & = dlink( tE ) ) {	$\C{// distributed, default type for axis}$
		struct @dlist@ { ... };	$\C{// abstract type}$
		static inline void ?{}( dlist( tE, tLinks ) & this );	$\C{// constructor}$
	}
}
\end{cfa}
\caption{\lstinline{dlink} / \lstinline{dlist} Outline}
\label{f:dlistOutline}
\end{figure}

\VRef[Figure]{fig:lst-features-multidir} shows how the \CFA library supports multiple inline links, so a node has multiple axes.
The declaration of @req@ has two inline-inheriting @dlink@ occurrences.
The first declares a nested type named @req.by_pri@; @req@ inherits from it, and it inherits from @dlink@.
The second declares @req.by_rqr@ similarly.
Thus, there is a diamond, non-virtual, inheritance from @req@ to @dlink@, with @by_pri@ and @by_rqr@ being the mid-level types \see{\VRef[Figure]{f:DiamondInheritancePattern}}.

The declarations of the list-head objects, @reqs_pri@, @reqs_rqr_42@, @reqs_rqr_17@, and @reqs_rqr_99@, bind which link nodes in @req@ are used by this list.
Hence, the type of the variable @reqs_pri@, @dlist(req, req.by_pri)@, means operations on @reqs_pri@ implicitly select (disambiguate) the correct @dlink@s, \eg the call @insert_first(reqs_pri, ...)@ implies, ``here, we are working by priority.''
As in \VRef[Figure]{fig:lst-issues-multi-static}, three lists are constructed: a priority list containing all nodes, a list with only nodes containing the value 42, and a list with only nodes containing the value 17.

\begin{figure}
\centering

\begin{tabular}{@{}ll@{}}
\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{C} \\
	\begin{tabular}{@{}l@{}}
	\lstinput{20-31}{lst-features-multidir.run.cfa} \\
	\lstinput{43-71}{lst-features-multidir.run.cfa}
	\end{tabular}
&
	\lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
\end{tabular}

\caption{
	Demonstration of multiple static link axes done in the \CFA list library.
	The right example is from \VRef[Figure]{fig:lst-issues-multi-static}.
	The left \CFA example does the same job.
}
\label{fig:lst-features-multidir}
\end{figure}

The list library also supports the common case of single directionality more naturally than LQ.
Returning to \VRef[Figure]{fig:lst-features-intro}, the single-axis list has no contrived name for the link axis as it uses the default type in the definition of @dlist@;
in contrast, the LQ list in \VRef[Figure]{f:Intrusive} adds the unnecessary field name @d@.
In \CFA, a single-axis list sets up a single inheritance with @dlink@, and the list axis defaults to @dlink@ itself.

When operating on a list with several axes and operations that do not take the list head, the list axis can be ambiguous.
For example, a call like @insert_after( r1, r2 )@ does not have enough information to know which axis to select implicitly.
Is @r2@ supposed to be the next-priority request after @r1@, or is @r2@ supposed to join the same-requester list of @r1@?
As such, the \CFA type-system gives an ambiguity error for this call.
There are multiple ways to resolve the ambiguity.
The simplest is an explicit cast on each call to select the specific axis, \eg @insert_after( (by_pri)r1, r2 )@.
However, multiple explicit casts are tedious and error-prone.
To mitigate this issue, the list library provides a hook for applying the \CFA language's scoping and priority rules.
\begin{cfa}
with ( DLINK_VIA( req, req.pri ) ) insert_after( r1, r2 );
\end{cfa}
Here, the @with@ statement opens the scope of the object type for the expression;
hence, the @DLINK_VIA@ result causes one of the list axes to become a more attractive candidate to \CFA's overload resolution.
This boost can be applied across multiple statements in a block or an entire function body.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}ll@{}}
\begin{cfa}
@with( DLINK_VIA( req, req.pri ) ) {@
	... insert_after( r1, r2 ); ...
@}@
\end{cfa}
&
\begin{cfa}
void f() @with( DLINK_VIA( req, req.pri ) ) {@
	... insert_after( r1, r2 ); ...
@}@
\end{cfa}
\end{tabular}
\end{cquote}
Within the @with@, the code acts as if there is only one list axis, without explicit casting.

Unlike the \CC template container-types, the \CFA library works completely within the type system;
both @dlink@ and @dlist@ are ordinary types, not language macros.
There is no textual expansion other than header-included static-inline functions for performance.
Hence, errors in user code are reported with mention of only the library's declarations, rather than the long template names in \CC error messages.
Finally, the library is separately compiled from the usage code, modulo inlining.


\section{List API}

\VRef[Figure]{f:ListAPI} shows the API for the doubly-linked list operations, each of which is explained below.
\begin{itemize}[leftmargin=*]
\item
@listed@ returns true if the node is an element of a list and false otherwise.
\item
@empty@ returns true if the list has no nodes and false otherwise.
\item
@first@ returns a reference to the first node of the list without removing it or @0p@ if the list is empty.
\item
@last@ returns a reference to the last node of the list without removing it or @0p@ if the list is empty.
\item
@insert_before@ adds a node before a specified node \see{\lstinline{insert_last} for insertion at the end}.\footnote{
Some list packages allow \lstinline{0p} (\lstinline{nullptr}) for the before/after node, implying insert/remove at the start/end of the list, respectively.
However, this inserts an \lstinline{if} statement in the fastpath of a potentially commonly used list operation.}
\item
@insert_after@ adds a node after a specified node \see{\lstinline{insert_first} for insertion at the start}.\footnotemark[\value{footnote}]
\item
@remove@ removes a specified node from the list (any location) and returns a reference to the node.
\item
@iter@ creates an iterator for the list.
\item
@recede@ returns true if the iterator cursor is advanced to the previous (predecessor, towards first) node before the prior cursor node and false otherwise.
\item
@advance@ returns true if the iterator cursor is advanced to the next (successor, towards last) node after the prior cursor node and false otherwise.
\item
@first@ returns true if the node is the first node in the list and false otherwise.
\item
@last@ returns true if the node is the last node in the list and false otherwise.
\item
@prev@ returns a reference to the previous (predecessor, towards first) node before a specified node or @0p@ if the specified node is the first node in the list.
\item
@next@ returns a reference to the next (successor, towards last) node after a specified node or @0p@ if the specified node is the last node in the list.
\item
@insert_first@ adds a node to the start of the list so it becomes the first node and returns a reference to the node for cascading.
\item
@insert_last@ adds a node to the end of the list so it becomes the last node and returns a reference to the node for cascading.
\item
@remove_first@ removes the first node and returns a reference to it or @0p@ if the list is empty.
\item
@remove_last@ removes the last node and returns a reference to it or @0p@ if the list is empty.
\item
@transfer@ transfers all nodes from the @from@ list to the end of the @to@ list; the @from@ list is empty after the transfer.
\item
@split@ transfers the @from@ list up to the specified node to the end of the @to@ list; the @from@ list becomes the list after the node.
The node must be in the @from@ list.
\end{itemize}
For operations @insert_*@, @insert@, and @remove@, a variable-sized list of nodes can be specified using \CFA's tuple type~\cite[\S~4.7]{Moss18} (not discussed), \eg @insert( list, n1, n2, n3 )@, which recursively invokes @insert( list, n@$_i$@ )@.\footnote{
Currently, a resolver bug between tuple types and references means tuple routines must use pointer parameters.
Nevertheless, the imaginary reference versions are used here as the code is cleaner, \eg no \lstinline{&} on call arguments.}

\begin{figure}
\begin{cfa}
bool listed( E & node );
bool empty( dlist( E ) & list );
E & first( dlist( E ) & list );
E & last( dlist( E ) & list );
E & insert_before( E & before, E & node );
E & insert_after( E & after, E & node );
E & remove( E & node );
E & iter( dlist( E ) & list );
bool advance( E && refx );
bool recede( E && refx );
bool first( E & node );
bool last( E & node );
E & prev( E & node );
E & next( E & node );
E & insert_first( dlist( E ) & list, E & node );
E & insert_last( dlist( E ) & list, E & node ); // synonym insert
E & remove_first( dlist( E ) & list );
E & remove_last( dlist( E ) & list );
void transfer( dlist( E ) & to, dlist( E ) & from );
void split( dlist( E ) & to, dlist( E ) & from, E & node );
\end{cfa}
\caption{\CFA List API}
\label{f:ListAPI}
\end{figure}


\subsection{Iteration}

It is possible to iterate through a list manually or using a set of standard macros.
\VRef[Figure]{f:IteratorDriver} shows the iterator outline, managing a list of nodes, used throughout the following iterator examples.
Each example assumes its loop body prints the value in the current node.

\begin{figure}
\begin{cfa}
#include <fstream.hfa>
#include <list.hfa>
struct node {
	int v;
	inline dlink(node);
};
int main() {
	dlist(node) list;
	node n1 = { 1 }, n2 = { 2 }, n3 = { 3 }, n4 = { 4 };
	insert( list, n1, n2, n3, n4 );	$\C{// insert in any order}$
	sout | nlOff;
	@for ( ... )@ sout | it.v | ","; sout | nl;	$\C{// iterator examples in text}$
	remove( n1, n2, n3, n4 );	$\C{// remove in any order}$
}
\end{cfa}
\caption{Iterator Driver}
\label{f:IteratorDriver}
\end{figure}

The manual method is low level but allows complete control of the iteration.
The list cursor (index) can be either a pointer or a reference to a node in the list.
The choice depends on how the programmer wants to access the fields: @it->f@ or @it.f@.
The following examples use a reference because the loop body manipulates the node values rather than the list pointers.
The end of iteration is denoted by the loop cursor returning @0p@.

\noindent
Iterating forward and reverse through the entire list.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}l|l@{}}
\begin{cfa}
for ( node & it = @first@( list ); &it /* != 0p */ ; &it = &@next@( it ) ) ...
for ( node & it = @last@( list ); &it; &it = &@prev@( it ) ) ...
\end{cfa}
&
\begin{cfa}
1, 2, 3, 4,
4, 3, 2, 1,
\end{cfa}
\end{tabular}
\end{cquote}
Iterating forward and reverse from a starting node through the remaining list.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}l|l@{}}
\begin{cfa}
for ( node & it = @n2@; &it; &it = &@next@( it ) ) ...
for ( node & it = @n3@; &it; &it = &@prev@( it ) ) ...
\end{cfa}
&
\begin{cfa}
2, 3, 4,
3, 2, 1,
\end{cfa}
\end{tabular}
\end{cquote}
Iterating forward and reverse from a starting node to an ending node through the contained list.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}l|l@{}}
\begin{cfa}
for ( node & it = @n2@; &it @!= &n4@; &it = &@next@( it ) ) ...
for ( node & it = @n4@; &it @!= &n2@; &it = &@prev@( it ) ) ...
\end{cfa}
&
\begin{cfa}
2, 3,
4, 3,
\end{cfa}
\end{tabular}
\end{cquote}
Iterating forward and reverse through the entire list using the shorthand: start at the list head and pick an axis.
In this case, @advance@ and @recede@ return a boolean, like \CC @while ( cin >> i )@.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}l|l@{}}
\begin{cfa}
for ( node & it = @iter@( list ); @advance@( it ); ) ...
for ( node & it = @iter@( list ); @recede@( it ); ) ...
\end{cfa}
&
\begin{cfa}
1, 2, 3, 4,
4, 3, 2, 1,
\end{cfa}
\end{tabular}
\end{cquote}
Finally, there are convenience macros that look like @foreach@ in other programming languages.
Iterating forward and reverse through the entire list.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}l|l@{}}
\begin{cfa}
FOREACH( list, it ) ...
FOREACH_REV( list, it ) ...
\end{cfa}
&
\begin{cfa}
1, 2, 3, 4,
4, 3, 2, 1,
\end{cfa}
\end{tabular}
\end{cquote}
Iterating forward and reverse through the entire list or until a predicate is triggered.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}l|l@{}}
\begin{cfa}
FOREACH_COND( list, it, @it.v == 3@ ) ...
FOREACH_REV_COND( list, it, @it.v == 1@ ) ...
\end{cfa}
&
\begin{cfa}
1, 2,
4, 3, 2,
\end{cfa}
\end{tabular}
\end{cquote}
Macros are not ideal, so future work is to provide a language-level @foreach@ statement, like \CC.
Finally, a predicate can be added to any of the manual iteration loops.
\begin{cquote}
\setlength{\tabcolsep}{15pt}
\begin{tabular}{@{}l|l@{}}
\begin{cfa}
for ( node & it = first( list ); &it @&& !(it.v == 3)@; &it = &next( it ) ) ...
for ( node & it = last( list ); &it @&& !(it.v == 1)@; &it = &prev( it ) ) ...
for ( node & it = iter( list ); advance( it ) @&& !(it.v == 3)@; ) ...
for ( node & it = iter( list ); recede( it ) @&& !(it.v == 1)@; ) ...
\end{cfa}
&
\begin{cfa}
1, 2,
4, 3, 2,
1, 2,
4, 3, 2,
\end{cfa}
\end{tabular}
\end{cquote}

\begin{comment}
Many languages offer an iterator interface for collections, and a corresponding for-each loop syntax for consuming the items through implicit interface calls.
\CFA does not yet have a general-purpose form of such a feature, though it has a form that addresses some use cases.
This section shows why the incumbent \CFA pattern does not work for linked lists and gives the alternative now offered by the linked-list library.
Chapter 5 [TODO: deal with optimism here] presents a design that satisfies both uses and accommodates even more complex collections.

The current \CFA extensible loop syntax is:
\begin{cfa}
for( elem; end )
for( elem; begin ~ end )
for( elem; begin ~ end ~ step )
\end{cfa}
Many derived forms of @begin ~ end@ exist, but are used for defining numeric ranges, so they are excluded from the linked-list discussion.
These three forms rely on the iterative trait:
\begin{cfa}
forall( T ) trait Iterate {
	void ?{}( T & t, zero_t );
	int ?<?( T t1, T t2 );
	int ?<=?( T t1, T t2 );
	int ?>?( T t1, T t2 );
	int ?>=?( T t1, T t2 );
	T ?+=?( T & t1, T t2 );
	T ?+=?( T & t, one_t );
	T ?-=?( T & t1, T t2 );
	T ?-=?( T & t, one_t );
}
\end{cfa}
where @zero_t@ and @one_t@ are constructors for the constants 0 and 1.
The simple loops above are abbreviations for:
\begin{cfa}
for( typeof(end) elem = @0@; elem @<@ end; elem @+=@ @1@ )
for( typeof(begin) elem = begin; elem @<@ end; elem @+=@ @1@ )
for( typeof(begin) elem = @0@; elem @<@ end; elem @+=@ @step@ )
\end{cfa}
which use a subset of the trait operations.
The shortened loop works well for iterating a number of times or through an array.
\begin{cfa}
for ( 20 )	// 20 iterations
for ( i: 1 ~= 21 ~ 2 )	// odd numbers
for ( i; n ) total += a[i];	// subscripts
\end{cfa}
which is similar to other languages, like JavaScript,
\begin{cfa}
for ( i in a ) total += a[i];
\end{cfa}
albeit with different mechanisms for expressing the array's length.
It might be possible to take the \CC iterator:
\begin{c++}
for ( list<int>::iterator it=mylist.begin(); it != mylist.end(); ++it )
\end{c++}
and convert it to the \CFA form
\begin{cfa}
for ( it; begin() ~= end() )
\end{cfa}
by having a list operator @<=@ that just looks for equality, and @+=@ that moves to the next node, \etc.

However, the list usage is contrived, because a list does not use its data values for relational comparison, only links for equality comparison.
Hence, the focus of a list iterator's stopping condition is fundamentally different.
So, the question for iterating a linked list via the existing loop syntax is whether this syntax can also do double duty for iterating values.
That is, to be an analog of JavaScript's @for..of@ syntax:
\begin{cfa}
for ( e of a ) total += e;
\end{cfa}

The \CFA team will likely implement an extension of this functionality that moves the @~@ syntax from being part of the loop, to being a first-class operator (with associated multi-place operators for the elided derived forms).
With this change, both @begin ~ end@ and @end@ (in context of the latter ``two-place for'' expression) parse as \emph{ranges}, and the loop syntax becomes, simply:
\begin{cfa}
	for( elem; rangeExpr )
\end{cfa}
The expansion and underlying API are under discussion.
TODO: explain pivot from ``is it done?'' to ``has more?''
Advantages of this change include being able to pass ranges to functions, for example, projecting a numerically regular subsequence of array entries, and being able to use the loop syntax to cover more collection types, such as looping over the keys of a hashtable.

When iterating an empty list, the question, ``Is there a further element?'' needs to be posed once, receiving the answer, ``no.''
When iterating an $n$-item list, the same question gets $n$ ``yes'' answers (one for each element), plus one ``no'' answer, once there are no more elements; the question is posed $n+1$ times.

When iterating an empty list, the question, ``What is the value of the current element?'' is never posed, nor is the command, ``Move to the next element,'' issued. When iterating an $n$-item list, each happens $n$ times.

So, asking about the existence of an element happens once more than retrieving an element's value and advancing the position.

Many iteration APIs deal with this fact by splitting these steps across different functions, and relying on the user's knowledge of iterator state to know when to call each. In Java, the function @hasNext@ should be called $n+1$ times and @next@ should be called $n$ times (doing the double duty of advancing the iteration and returning a value). In \CC, the jobs are split among the three actions, @it != end@, @it++@ and @*it@, the latter two being called one time more than the first.

TODO: deal with simultaneous axes: @DLINK_VIA@ just works

TODO: deal with spontaneous simultaneity, like a single-axis req, put into an array: which ``axis'' is @&req++@ navigating: array-adjacency vs link dereference. It should stick according to how you got it in the first place: navigating dlist(req, req.pri) vs navigating array(req, 42). (prob. future work)
\end{comment}


\section[C++ Lists]{\CC Lists}

It is worth addressing two API issues in \CC lists that are avoided in \CFA.
First, \CC lists require two steps to remove a node versus one in \CFA.
\begin{cquote}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{\CFA} \\
\begin{c++}
list<node> li;
node n = li.front(); // assignment could raise exception
li.pop_front();
\end{c++}
&
\begin{cfa}
dlist(node) list;
node n = remove_first( list );

\end{cfa}
\end{tabular}
\end{cquote}
The argument for two steps is exception safety: returning an unknown T by value/move might throw an exception from T's copy/move constructor.
Hence, to be \emph{exception safe}, all internal list operations must complete before the copy/move so the list is consistent should the return fail.
This coding style can result in contrived code, but is usually possible;
however, it requires the container designer to anticipate the potential throw.
(Note, this anticipation issue is pervasive in exception systems, not just with containers.)
The solution moves the coding complexity from the container designer to the programmer~\cite[ch~10, part 3]{Sutter99}.
First, obtain the node, which might fail, but the container is unmodified.
Second, remove the node, which modifies the container without the possibility of an exception.
This movement of responsibility increases the cognitive effort for programmers.
Unfortunately, this \emph{single-responsibility principle}, \ie preferring separate operations, is often repeated as a necessary requirement rather than an optional one.
Separate operations should always be available, but their composition should also be available.
Interestingly, this issue does not apply to intrusive lists, because the node data is never copied/moved in or out of a list;
only the link fields are accessed in list operations.

Second, \VRef[Figure]{f:CCvsCFAListIssues} shows an example where a \CC list operation is $O(N)$ rather than $O(1)$ in \CFA.
This issue is inherent to wrapped (non-intrusive) lists.
Specifically, removing a node requires access to the links that materialize its membership.
In a wrapped list, there is no access from node to links, and for abstraction reasons, no direct pointers to wrapped nodes, so the links must be found indirectly by navigating the list.
The \CC iterator is the abstraction to navigate wrapped links.
So an iterator is needed, not because it offers go-next, but for managing list membership.
Note, attempting to keep an array of iterators to each node requires high complexity to ensure the list and array are harmonized.

\begin{figure}
\begin{tabular}{@{}ll@{}}
\multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{\CFA} \\
\begin{c++}
list<node *> list;
node n1 = { 1 }, n2 = { 2 }, n3 = { 3 }, n4 = { 4 };
list.push_back( &n1 ); list.push_back( &n2 );
list.push_back( &n3 ); list.push_back( &n4 );
list<node *>::iterator it;
for ( it = list.begin(); it != list.end(); it++ )
	if ( *it == &n3 ) { list.erase( it ); break; }
\end{c++}
&
\begin{cfa}
dlist(node) list;
node n1 = { 1 }, n2 = { 2 }, n3 = { 3 }, n4 = { 4 };
insert( list, n1, n2, n3, n4 );




remove( list, n3 );
\end{cfa}
\end{tabular}
\caption{\CC \vs \CFA List Issues}
\label{f:CCvsCFAListIssues}
\end{figure}

\begin{comment}
Dave Dice:
There are what I'd call the "dark days" of C++, where the language \& libraries seemed to be getting progressively uglier - requiring more tokens to express something simple, and lots of arcana.
But over time, somehow, they seem to have mostly righted the ship and now I can write C++ code that's fairly terse, like python, by ignoring all the old constructs.
(They carry around the legacy baggage, of course, but seemed to have found a way to evolve away from it).

If you just want to traverse a std::list, then, using modern "for" loops, you never need to see an iterator.
I try hard never to need to write X.begin() or X.end().
(There are situations where I'll expose iterators for my own types, however, to enable modern "for" loops).
If I'm implementing simple linked lists, I'll usually skip std:: collections and do it myself, as it's less grief.
I just don't get that much advantage from std::list. And my code is certainly not any shorter.
On the other hand, @std::map@, @unordered_map@, @set@, and friends, are terrific, and I can usually still avoid seeing any iterators, which are blight to the eye.
So those are a win and I get to move up an abstraction level, and write terse but easily understood code that still performs well.
(One slight concern here is that all the C++ collection/container code is templated and lives in include files, and not traditional libraries.
So the only way to replace something - say with a better algorithm, or you have a bug in the collections code - is to boil the oceans and recompile everything.
But on the other hand the compiler can specialize to the specific use case, which is often a nice performance win).

And yeah, the method names are pretty terrible. I think they boxed themselves in early with a set of conventions that didn't age well, and they tried to stick with it and force regularity over the collection types.

I've never seen anything written up about the history, although lots of the cppcon talks make note of the "bad old days" idea and how collection \& container library design evolved. The issue is certainly recognized.
\end{comment}
612
613
614\section{Implementation}
615
616\VRef[Figure]{fig:lst-impl-links} continues the running @req@ example, showing the \CFA list library's internal representation.
617The @dlink@ structure contains exactly two pointers: @next@ and @prev@, which are opaque.
618Even though the user-facing list model is linear, the \CFA library implements all listing as circular.
619This choice helps achieve uniform end treatment.
620% and \PAB{TODO finish summarizing benefit}.
621A link pointer targets a neighbouring @dlink@ structure, rather than a neighbouring @req@.
622(Recall, the running example has the user putting a @dlink@ within a @req@.)
623
\begin{figure}
	\centering
	\includegraphics[width=\textwidth]{lst-impl-links.pdf}
	\caption{
		\CFA list library representations for headed and headless lists.
	}
	\label{fig:lst-impl-links}
\end{figure}

Circular link-pointers (dashed lines) are tagged internally in the pointer to indicate the linear endpoints.
Links among neighbouring nodes are not tagged.
Iteration reports ``has more elements'' when accessing untagged links, and ``no more elements'' when accessing tagged links.
Hence, the tags are set on the links that a user cannot navigate.

The \CFA library works in headed and headless modes.
In a headed list, the list head (@dlist(req)@) acts as an extra element in the implementation-level circularly-linked list.
The content of a @dlist@ header is a (private) @dlink@, with the @next@ pointer to the first element, and the @prev@ pointer to the last element.
Since the head wraps a @dlink@, as does @req@, and since a link-pointer targets a @dlink@, the resulting cycle is among @dlink@ structures, situated inside the header and nodes.
An untagged pointer points within a @req@, while a tagged pointer points within a list head.
In a headless list, the circular backing list is only among @dlink@s within @req@s.

No distinction is made between an unlisted node (top left and middle) under a headed model and a singleton list under a headless model.
Both are represented as an item referring to itself, with both tags set.
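The tagged-link representation can be sketched in plain C.
The sketch below is illustrative only, not the library's actual code: the helper names and the choice of the pointer's low bit as the tag location are assumptions for the sketch, but the shape matches the description above, where links into the header are tagged, links into element nodes are untagged, and iteration stops when it would follow a tagged link.
\begin{cfa}
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dlink { struct dlink * next, * prev; };
struct req { int pieces; struct dlink in_line; };  // intrusive node from the running example
struct dlist { struct dlink hd; };                 // header wraps a dlink, closing the circle

#define TAGGED( p ) ( (uintptr_t)(p) & 1u )        // assumed tag: pointer's low bit
#define TAG( p )    ( (struct dlink *)((uintptr_t)(p) | 1u) )
// recover the req from its embedded dlink (offsetof-style containment)
#define REQ_OF( d ) ( (struct req *)((char *)(d) - offsetof( struct req, in_line )) )

int sum_pieces( void ) {
	static struct req a = { .pieces = 42 }, b = { .pieces = 17 };
	static struct dlist l;
	l.hd.next = &a.in_line;          // header to first element: untagged (points within a req)
	l.hd.prev = &b.in_line;          // header to last element: untagged
	a.in_line.next = &b.in_line;     // neighbour links: untagged
	b.in_line.prev = &a.in_line;
	a.in_line.prev = TAG( &l.hd );   // links crossing the linear endpoints point back into
	b.in_line.next = TAG( &l.hd );   // the header and are tagged: the user cannot navigate them
	int sum = 0;                     // iterate: "has more elements" while the link is untagged
	for ( struct dlink * cur = l.hd.next; ! TAGGED( cur ); cur = cur->next )
		sum += REQ_OF( cur )->pieces;
	return sum;
}

int main( void ) {
	assert( sum_pieces() == 42 + 17 );
	return 0;
}
\end{cfa}
Hand-building the two-element headed list mirrors the middle case of \VRef[Figure]{fig:lst-impl-links}: a traversal visits both elements and terminates at the tagged link without any end-of-list sentinel test beyond the tag check.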


\section{Assessment}
\label{toc:lst:assess}

This section examines the performance of the discussed list implementations.
The goal is to show the \CFA lists are competitive with other designs; however, the different list designs do not have equivalent functionality, so it is impossible to select a single winner encompassing both functionality and execution performance.


\subsection{Add-Remove Performance}
\label{s:AddRemovePerformance}

The fundamental job of a linked-list library is to manage the links that connect nodes.
Any link management is an action that causes pairs of elements to become, or cease to be, adjacent in the list.
Thus, adding and removing an element are the sole primitive actions.

Repeated adding and removing is necessary to measure timing because these operations can be as simple as a dozen instructions.
In some cases, these instruction sequences proceed (in a modern, deep pipeline) without a stall, making a single operation too fast to time individually.

This experiment takes the position that:
	\begin{itemize}[leftmargin=*]
	\item The total time to add and remove is relevant \vs individual times for add and remove.
	Adds without removes quickly fill memory;
	removes without adds are meaningless.
	\item A relevant breakdown ``by operation'' is:
		\begin{description}
		\item[movement]
		Is the add/remove applied to a stack, queue, or something else?
		\item[polarity]
		In which direction does the action apply?
		For a queue, do items flow from first to last or last to first?
		For a stack, is the first or last end used for adding and removing?
		\item[accessor]
		Is an add/remove location given by a list head's first/last, or by a reference to an individual element?
		\end{description}
	\item Speed differences caused by the host machine's memory hierarchy need to be identified and explained,
	but do not represent advantages of one linked-list implementation over another.
	\end{itemize}

The experiment used to measure insert/remove cost measures the mean duration of a sequence of additions and removals.
Confidence bounds, on this mean, are discussed.
The distribution of speeds experienced by an individual add-remove pair (tail latency) is not discussed.
Space efficiency is shown only indirectly, by way of the caches' impact on speed.
The experiment is sensitive enough to show:
	\begin{itemize}
	\item intrusive lists performing (majorly) differently from wrapped lists,
	\item a space of (minor) performance differences among the intrusive lists.
	\end{itemize}


\subsubsection{Experiment setup}
\label{s:ExperimentSetup}

The experiment driver defines an (intrusive) node type:
\begin{cfa}
struct Node {
	int i, j, k; // fields
	// possible intrusive links
};
\end{cfa}
and considers the speed of building and tearing down a list of $n$ instances of it.
% A number of experimental rounds per clock check is precalculated to be appropriate to the value of $n$.
\begin{cfa}
// simplified harness: CFA implementation,
// stack movement, insert-first polarity, head-mediated access
size_t totalOpsDone = 0;
dlist( node_t ) lst;
node_t nodes[ n ]; $\C{// preallocated list storage}$
startTimer();
while ( CONTINUE ) { $\C{// \(\approx\) 20 second duration}$
	for ( i; n ) insert_first( lst, nodes[i] ); $\C{// build up}$
	for ( i; n ) remove_first( lst ); $\C{// tear down}$
	totalOpsDone += n;
}
stopTimer();
reportedDuration = getTimerDuration() / totalOpsDone; // mean duration per insert/remove operation
\end{cfa}
To reduce administrative overhead, the $n$ nodes for each experiment list are preallocated in an array (on the stack), which removes dynamic allocation for this storage.
For intrusive lists, these nodes contain the intrusive link fields; for a wrapped list, a pointer to the node is stored in a dynamically-allocated internal node.
Copying the node for wrapped lists would skew the results with administration costs.
The list insertion/removal operations are repeated for a typical 20+ second duration.
After each round, a counter is incremented by $n$ (for throughput).
Time is measured outside the loop because a large $n$ can overrun the time duration before the @CONTINUE@ flag is tested.
Hence, there is a minimum of one outer (@CONTINUE@) loop iteration for large lists.
The loop duration is divided by the counter and this per-operation mean is reported as the throughput.
In a scatter-plot, each dot is one throughput measurement, which includes insert + remove + harness overhead.
The harness overhead is constant when comparing linked-list implementations and is kept as small as possible.
% The remainder of the setup section discusses the choices that affected the harness overhead.

To test list operations, the experiment performs the inserts/removes in different patterns, \eg insert and remove from the front, insert from the front and remove from the back, random insert and remove, \etc.
Unfortunately, @std::list@ does \emph{not} support direct insert/remove of a node without an iterator, \ie no @erase( node )@, even though the list is doubly-linked.
To eliminate the iterator, a trick is used for random insertion without replacement, which takes advantage of the array nature of the nodes.
The @i@ field in each node is initialized from @0..n-1@.
These @i@ values are then shuffled among the nodes, and each @i@ value is used as an indirection to a node for insertion.
Hence, the nodes are inserted in random order and removed in the same random order.
$\label{p:Shuffle}$
\begin{cfa}
	for ( i; n ) @nodes[i].i = i@; $\C[3.25in]{// indirection}$
	shuffle( nodes, n ); $\C{// random shuffle indirects within nodes}$

	while ( CONTINUE ) {
		for ( i; n ) {
			node_t & temp = nodes[ nodes[i].i ]; $\C{// select random node in array}$
			@temp.j = 0;@ $\C{// only touch random node for wrapped nodes}$
			insert_first( lst, temp ); $\C{// build up}$
		}
		for ( i; n ) pass( &remove_first( lst ) ); $\C{// tear down}\CRT$
		totalOpsDone += n;
	}
\end{cfa}
Note, insertion traverses the array of nodes linearly, @node[i]@.
For intrusive lists, the inserted (random) node is always touched because its link fields are read/written for insertion into the list.
Hence, the array of nodes is accessed both linearly and randomly during the traversal.
For wrapped lists, the wrapped nodes are traversed linearly but the random node is not accessed; only a pointer to it is inserted into the linearly accessed wrapped node.
Hence, the traversal is the same as the non-random traversal above.
To level the experiments, an explicit access to the random node, @temp.j = 0@, is inserted after the insertion for the wrapped experiment.
Furthermore, it is rare to insert/remove nodes and not access them.
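The indirection trick can be checked in isolation.
The following C sketch (names hypothetical, not the harness's actual code) initializes the @i@ fields @0..n-1@, shuffles them with a Fisher-Yates pass, and then verifies that the linear scan @nodes[ nodes[k].i ]@ visits every node exactly once, \ie the shuffled @i@ fields form a random permutation over the array.
\begin{cfa}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { N = 64 };
struct node { int i, j, k; };

// returns 1 if the indirection scan visits every node exactly once
int all_visited_once( unsigned seed ) {
	struct node nodes[N];
	for ( int k = 0; k < N; k += 1 ) nodes[k].i = k;   // indirection values
	srand( seed );
	for ( int k = N - 1; k > 0; k -= 1 ) {             // Fisher-Yates shuffle of the i fields
		int r = rand() % ( k + 1 );
		int t = nodes[k].i; nodes[k].i = nodes[r].i; nodes[r].i = t;
	}
	int seen[N];
	memset( seen, 0, sizeof( seen ) );
	for ( int k = 0; k < N; k += 1 )
		seen[ nodes[k].i ] += 1;        // nodes[k].i is the array index of the node visited at step k
	for ( int k = 0; k < N; k += 1 )
		if ( seen[k] != 1 ) return 0;   // a value missed or repeated: not a permutation
	return 1;
}

int main( void ) {
	assert( all_visited_once( 42 ) );
	return 0;
}
\end{cfa}
Because the shuffled values are a permutation, random insertion without replacement falls out for free: no bookkeeping of already-inserted nodes is needed during the timed loop.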

% \emph{Interleaving} allows for movements other than pure stack and queue.
% Note that the earlier example of using the iterators' array is still a pure stack: the item selected for @erase(...)@ is always the first.
% Including a less predictable movement is important because real applications that justify doubly linked lists use them.
% Freedom to remove from arbitrary places (and to insert under more relaxed assumptions) is the characteristic function of a doubly linked list.
% A queue with drop-out is an example of such a movement.
% A list implementation can show unrepresentative speed under a simple movement, for example, by enjoying unchallenged ``Is first element?'' branch predictions.

% Interleaving brings ``at middle of list'' cases into a stream of add or remove invocations, which would otherwise be exclusively ``at end''.
% A chosen split, like half middle and half end, populates a boolean array, which is then shuffled.
% These booleans then direct the action to end-\vs-middle.
%
% \begin{cfa}
% // harness (bookkeeping and shuffling elided): CFA implementation,
% // stack movement, insert-first polarity, interleaved element-based remove access
% dlist( item_t ) lst;
% item_t items[ n ];
% @bool interl[ n ];@ // elided: populate with weighted, shuffled [0,1]
% while ( CONTINUE ) {
% 	item_t * iters[ n ];
% 	for ( i; n ) {
% 		insert_first( items[i] );
% 		iters[i] = & items[i];
% 	}
% 	@item_t ** crsr[ 2 ]@ = { // two cursors into iters
% 		& iters[ @0@ ], // at stack-insert-first's removal end
% 		& iters[ @n / interl_frac@ ] // in middle
% 	};
% 	for ( i; n ) {
% 		item *** crsr_use = & crsr[ interl[ i ] ]@;
% 		remove( *** crsr_use ); // removing from either middle or end
% 		*crsr_use += 1; // that item is done
% 	}
% 	assert( crsr[0] == & iters[ @n / interl_frac@ ] ); // through second's start
% 	assert( crsr[1] == & iters[ @n@ ] ); // did the rest
% }
% \end{cfa}
%
% By using the pair of cursors, the harness avoids branches, which could incur prediction stall times themselves, or prime a branch in the SUT.
% This harness avoids telling the hardware what the SUT is about to do.

The comparator linked-list implementations are:
\begin{description}
\item[std::list] The @list@ type of g++.
\item[lq-list] The @list@ type of LQ from glibc of gcc.
\item[lq-tailq] The @tailq@ type of the same.
\item[upp-upp] The @uSequence@ type of \uCpp.
\item[cfa-cfa] The @dlist@ type of \CFA.
\end{description}


\subsubsection{Execution Environment}
\label{s:ExperimentalEnvironment}

The performance experiments are run on:
\begin{description}[leftmargin=*,topsep=3pt,itemsep=2pt,parsep=0pt]
%\item[PC]
%with a 64-bit eight-core AMD FX-8370E, with ``taskset'' pinning to core \#6. The machine has 16 GB of RAM and 8 MB of last-level cache.
%\item[ARM]
%Gigabyte E252-P31 128-core socket 3.0 GHz, WO memory model
\item[AMD]
Supermicro AS--1125HS--TNR EPYC 9754 128--core socket, hyper-threading $\times$ 2 sockets (512 processing units), 2.25 GHz, TSO memory model, with cache structure 32KB L1i/L1d, 1024KB L2, 16MB L3, where each L3 cache covers 1 NUMA node and 8 cores (16 processors).
\item[Intel]
Supermicro SYS-121H-TNR Xeon Gold 6530 32--core, hyper-threading $\times$ 2 sockets (128 processing units), 2.1 GHz, TSO memory model, with cache structure 32KB L1i/L1d, 2048KB L2, 160MB L3, where each L3 cache covers 2 NUMA nodes and 32 cores (64 processors).
\end{description}
The experiments are single-threaded and pinned to a single core to prevent any OS movement, which might cause cache or NUMA effects perturbing the experiment.

The compiler is gcc/g++-14.2.0 running on the Linux v6.8.0-52-generic OS.
Switching between the memory allocators @glibc@ (the default) and @llheap@ is done with @LD_PRELOAD@.
To prevent eliding certain code patterns, crucial parts of a test are wrapped by the function @pass@.
\begin{cfa}
// prevent eliding, cheaper than volatile
static inline void * pass( void * v ) { __asm__ __volatile__( "" : "+r"(v) ); return v; }
...
pass( &remove_first( lst ) ); // wrap call to prevent elision, insert cannot be elided now
\end{cfa}
The call to @pass@ can prevent a small number of compiler optimizations, but this cost is the same for all lists.
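The guard relies on gcc/clang extended inline assembly: the empty template with the @"+r"@ (read-write register) constraint makes the optimizer treat the pointer as both consumed and possibly modified, so computations feeding into it cannot be elided, yet no instructions are emitted.
The following self-contained check (function names are illustrative) confirms @pass@ is an identity function at run time.
\begin{cfa}
#include <assert.h>

// empty asm with a "+r" constraint: opaque to the optimizer, free at run time
static inline void * pass( void * v ) { __asm__ __volatile__( "" : "+r"(v) ); return v; }

int identity_check( void ) {
	int x = 7;
	int * p = pass( &x );   // compiler must assume p is "used" and may differ, so x stays live
	return *p;              // hence the store to x cannot be elided
}

int main( void ) {
	assert( identity_check() == 7 );
	return 0;
}
\end{cfa}
Unlike declaring the value @volatile@, the asm barrier imposes no extra loads or stores of its own, which is why it is the cheaper choice for a timing harness.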


\subsection{Result: Coarse comparison of styles}

This comparison establishes how an intrusive list performs compared with a wrapped-reference list.
\VRef[Figure]{fig:plot-list-zoomout} presents throughput at various list lengths for a linear and a random (shuffled) insert/remove test.
Other kinds of scans were made, but the results are similar in many cases, so it is sufficient to discuss these two scans, representing different ends of the access spectrum.
In the graphs, all four intrusive lists (lq-list, lq-tailq, upp-upp, cfa-cfa, see the end of \VRef{s:ExperimentSetup}) are plotted with the same symbol;
sometimes these symbols clump on top of each other, showing the performance difference among intrusive lists is small in comparison to the wrapped list (std::list).
See~\VRef{s:ComparingIntrusiveImplementations} for details among the intrusive lists.

The list lengths start at 10 due to the short insert/remove times of 2--4 ns for intrusive lists, \vs STL's wrapped-reference list at 15--20 ns.
For very short lists, like 4, the experiment time of 4 $\times$ 2.5 ns and the experiment overhead (loops) of 2--4 ns result in an artificial administrative bump at the start of the graph having nothing to do with the insert/remove times.
As the list size grows, the administrative overhead for intrusive lists quickly disappears.

\begin{figure}
	\centering
	\setlength{\tabcolsep}{0pt}
	\begin{tabular}{p{0.75in}p{2.75in}p{3in}}
	&
	\subfloat[Linear List Nodes, AMD]{\label{f:Linear-swift}
		\hspace*{-0.75in}
		\includegraphics{plot-list-zoomout-noshuf-swift.pdf}
	} % subfigure
	&
	\subfloat[Linear List Nodes, Intel]{\label{f:Linear-java}
		\includegraphics{plot-list-zoomout-noshuf-java.pdf}
	} % subfigure
	\\
	&
	\subfloat[Random List Nodes, AMD]{\label{f:Random-swift}
		\hspace*{-0.75in}
		\includegraphics{plot-list-zoomout-shuf-swift.pdf}
	} % subfigure
	&
	\subfloat[Random List Nodes, Intel]{\label{f:Random-java}
		\includegraphics{plot-list-zoomout-shuf-java.pdf}
	} % subfigure
	\end{tabular}
	\caption{Insert/remove duration \vs list length.
	Lengths go as large as possible without error.
	One example operation is shown: stack movement, insert-first polarity and head-mediated access. Lower is better.}
	\label{fig:plot-list-zoomout}
\end{figure}

The key performance factor between the intrusive and the wrapped-reference lists is the dynamic allocation for the wrapped nodes.
Hence, this experiment is largely measuring the cost of @malloc@/\-@free@ rather than insert/remove, and is sensitive to the layout of memory by the allocator.
For insert/remove of an intrusive list, the cost is manipulating the link fields, which is seen by the relatively similar results for the different intrusive lists.
For insert/remove of a wrapped-reference list, the costs are: dynamically allocating/deallocating a wrapped node, copying an external-node pointer into the wrapped node for insertion, and linking the wrapped node to/from the list;
the allocation dominates these costs.
For example, the experiment was run with both the glibc and llheap memory allocators, where llheap reduced the cost from 20 to 16 ns, still far from the 2--4 ns for linking an intrusive node.
Hence, there is no way to tease apart the allocation, copying, and linking costs for wrapped lists, as there is no way to preallocate the list nodes without writing a mini-allocator to manage that storage.
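The cost asymmetry can be made concrete with a minimal C sketch.
The types and function names below are hypothetical (singly-linked stacks, not the benchmarked libraries); the point is only the per-operation work: the intrusive push is two link writes, while the wrapped push adds a heap allocation and a pointer copy, and the wrapped pop adds a @free@.
\begin{cfa}
#include <assert.h>
#include <stdlib.h>

struct item { int payload; struct item * next; };         // intrusive: link lives in the item
struct wrap { struct item * elem; struct wrap * next; };  // wrapped: separate heap node refers to the item

static struct item * istack = NULL;
static struct wrap * wstack = NULL;

void ipush( struct item * it ) {          // intrusive insert: link writes only
	it->next = istack;
	istack = it;
}

int wpush( struct item * it ) {           // wrapped insert: allocate + copy + link
	struct wrap * w = malloc( sizeof( struct wrap ) );
	if ( w == NULL ) return 0;
	w->elem = it;                         // extra pointer copy into the wrapper
	w->next = wstack;
	wstack = w;
	return 1;
}

struct item * wpop( void ) {              // wrapped remove: unlink + free
	struct wrap * w = wstack;
	wstack = w->next;
	struct item * it = w->elem;
	free( w );
	return it;
}

int main( void ) {
	struct item a = { .payload = 1 };
	ipush( &a );
	assert( istack == &a );               // intrusive: the item itself is on the stack
	assert( wpush( &a ) );
	assert( wpop() == &a );               // wrapped: the wrapper is allocated and freed around it
	return 0;
}
\end{cfa}
Every wrapped insert/remove pair pays one @malloc@/@free@ round trip that the intrusive pair avoids entirely, which is the dominant cost identified above.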

In detail, \VRef[Figure]{f:Linear-swift}--\subref*{f:Linear-java} shows linear insertion of all the nodes and then linear removal, both in the same direction.
For intrusive lists, the nodes are adjacent in memory from being preallocated in an array.
For wrapped lists, the wrapped nodes happen to be adjacent because the memory allocator uses bump allocation during the initial phase of allocation.
These memory layouts produce high spatial and temporal locality for both kinds of lists during the linear array traversal.
With address look-ahead, the hardware does an excellent job of managing the multi-level cache.
Hence, performance is largely constant for both kinds of lists, until L3 cache and NUMA boundaries are crossed for longer lists and the costs increase consistently for both kinds of lists.
For example, on AMD (\VRef[Figure]{f:Linear-swift}), there is one NUMA node but many small L3 caches, so performance slows down quickly as multiple L3 caches come into play, and remains constant at that level, except for some anomalies for very large lists.
On Intel (\VRef[Figure]{f:Linear-java}), there are four NUMA nodes and four slowdown steps as the list-length increases.
At each step, the difference between the kinds of lists decreases as the NUMA effect increases.

In detail, \VRef[Figure]{f:Random-swift}--\subref*{f:Random-java} shows random insertion and removal of the nodes.
As for linear, there is the issue of memory allocation for the wrapped list.
As well, the consecutive storage-layout is the same (array and bump allocation).
Hence, the difference is the random linking among nodes, producing random accesses even though the list is traversed linearly, resulting in similar cache events for both kinds of lists.
Both \VRef[Figures]{f:Random-swift}--\subref*{f:Random-java} show the slowdown of random access as the list-length grows, resulting from stepping out of caches into main memory and crossing NUMA nodes.
% Insert and remove operations act on both sides of a link.
%Both a next unlisted item to insert (found in the items' array, seen through the shuffling array), and a next listed item to remove (found by traversing list links), introduce a new user-item location.
As for linear, the Intel (\VRef[Figure]{f:Random-java}) graph shows steps from the four NUMA nodes.
Interestingly, after $10^6$ nodes, intrusive lists are slower than wrapped.
I did not have time to track down this anomaly, but I speculate it results from the difference in touching the data in the accessed node, as the data and links are together for intrusive and separated for wrapped.
For the llheap memory-allocator and the two tested architectures, intrusive lists outperform wrapped lists up to size $10^3$ for both linear and random, and performance begins to converge around $10^6$ nodes as architectural issues begin to dominate.
Clearly, the memory allocator and hardware architecture play a large role in the total cost and the crossover points as list-size increases.
% In an odd scenario where this intuition is incorrect, and where furthermore the program's total use of the memory allocator is sufficiently limited to yield approximately adjacent allocations for successive list insertions, a non-intrusive list may be preferred for lists of approximately the cache's size.

The takeaway from this experiment is that wrapped-list operations are expensive because memory allocation is expensive at this fine-grained level of execution.
Hence, when possible, using intrusive links can produce a significant performance gain, even if nodes must be dynamically allocated, because the wrapping allocations are eliminated.
Even when space is a consideration, intrusive links may not use more storage if a node is often linked.
Unfortunately, many programmers are unaware of intrusive lists for dynamically-sized data-structures, or their tool-set does not provide them.

% Note, linear access may not be realistic unless dynamic size changes may occur;
% if the nodes are known to be adjacent, use an array.

% In a wrapped-reference list, list nodes are allocated separately from the items put into the list.
% Intrusive beats wrapped at the smaller lengths, and when shuffling is avoided, because intrusive avoids dynamic memory allocation for list nodes.

% STL's performance is not affected by element order in memory.
%The field of intrusive lists begins with length-1 operations costing around 10 ns and enjoys a ``sweet spot'' in lengths 10--100 of 5--7-ns operations.
% This much is also unaffected by element order.
% Beyond this point, shuffled-element list performance worsens drastically, losing to STL beyond about half a million elements, and never particularly leveling off.
% In the same range, an unshuffled list sees some degradation, but holds onto a 1--2 $\times$ speedup over STL.

% The apparent intrusive ``sweet spot,'' particularly its better-than-length-1 speed, is not because of list operations truly running faster.
% Rather, the worsening as length decreases reflects the per-operation share of harness overheads incurred at the outer-loop level.
% Disabling the harness's ability to drive interleaving, even though the current scenario is using a ``never work in middle'' interleave, made this rise disappear.
% Subsequent analyses use length-controlled relative performance when comparing intrusive implementations, making this curiosity disappear.

% The remaining big-swing comparison points say more about a computer's memory hierarchy than about linked lists.
% The tests in this chapter are only inserting and removing.
% They are not operating on any user payload data that is being listed.
% The drastic differences at large list lengths reflect differences in link-field storage density and in correlation of link-field order to element order.
% These differences are inherent to the two list models.

% A wrapped-reference list's separate nodes are allocated right beside each other in this experiment, because no other memory allocation action is happening.
% As a result, the interlinked nodes of the STL list are generally referencing their immediate neighbours.
% This pattern occurs regardless of user-item shuffling because this test's ``use'' of the user-items' array is limited to storing element addresses.
% This experiment, driving an STL list, is simply not touching the memory that holds the user data.
% Because the interlinked nodes, being the only touched memory, are generally adjacent, this case too has high memory locality and stays fast.

% But the comparison of unshuffled intrusive with wrapped-reference gives the performance of these two styles, with their the common impediment of overfilling the cache removed.
% Intrusive consistently beats wrapped-reference by about 20 ns, at all sizes.
% This difference is appreciable below list length 0.5 M, and enormous below 10 K.


\subsection{Result: Comparing intrusive implementations}
\label{s:ComparingIntrusiveImplementations}

The preceding result shows the intrusive implementations have better performance than the wrapped lists for small to medium sized lists.
This analysis covers the experiment position taken in \VRef{s:AddRemovePerformance} for movement, polarity, and accessor.
\VRef[Figure]{f:ExperimentOperations} shows the experiment operations tested, which results in 12 experiments for comparing intrusive implementations.
To preclude hardware interference, only list sizes below 150 are examined to differentiate among the intrusive implementations.
The data is selected from the start of \VRef[Figures]{f:Linear-swift}--\subref*{f:Linear-java}, but the start of \VRef[Figures]{f:Random-swift}--\subref*{f:Random-java} is largely the same.

\begin{figure}
\centering
\setlength{\tabcolsep}{8pt}
\begin{tabular}{@{}ll@{}}
\begin{tabular}{@{}c|c|c@{}}
movement & polarity & accessor \\
\hline
\hline
stack &
	\begin{tabular}{@{}l@{}}
	insert-first \\
	\hline
	insert-last
	\end{tabular}
	&
	\begin{tabular}{@{}l@{}}
	insert-head / remove-head \\
	\hline
	insert-list / remove-head \\
	\hline
	insert-head / remove-list
	\end{tabular}
	\\
\hline
queue &
	\begin{tabular}{@{}l@{}}
	insert-first \\
	\hline
	insert-last
	\end{tabular}
	&
	\begin{tabular}{@{}l@{}}
	insert-head / remove-head \\
	\hline
	insert-list / remove-head \\
	\hline
	insert-head / remove-list
	\end{tabular}
\end{tabular}
&
	\setlength{\tabcolsep}{3pt}
	\small
	\begin{tabular}{@{}ll@{}}
	I: & stack, insert first, I-head / R-head \\
	II: & stack, insert first, I-list / R-head \\
	III:& stack, insert first, I-head / R-list \\
	IV: & stack, insert last, I-head / R-head \\
	V: & stack, insert last, I-list / R-head \\
	VI: & stack, insert last, I-head / R-list \\
	VII:& queue, insert first, I-head / R-head \\
	VIII:& queue, insert first, I-list / R-head \\
	IX: & queue, insert first, I-head / R-list \\
	X: & queue, insert last, I-head / R-head \\
	XI: & queue, insert last, I-list / R-head \\
	XII: & queue, insert last, I-head / R-list \\
	\end{tabular}
\end{tabular}
\caption{Experiment Operations}
\label{f:ExperimentOperations}
\end{figure}

\VRef[Figure]{fig:plot-list-1ord} gives the first-order effects.
Its first breakdown, Machine--Size-Zone, shows the effects of an insert/remove's physical situation.
The Intel runs faster than the AMD; the small zone runs faster than the medium zone.
The size effect is more pronounced on the AMD than on the Intel.

\begin{figure}
	\centering
	\includegraphics{plot-list-1ord.pdf}
	\caption{Histogram of operation durations, decomposed by all first-order effects.
	Each of the three breakdowns divides the entire population of test results into its mutually disjoint constituents.
	\MLB{missing: overlay of means}}
	\label{fig:plot-list-1ord}
\end{figure}

These facts stated, you will not be choosing between these particular machines or whether to run at one of these specific size zones.
The key takeaway from the physical comparison is the context it establishes for interpreting the framework comparison that follows.
Both the particulars of the machine's cache design, and a list length's effect on the program's cache friendliness, affect add/remove speed in the manner illustrated in this breakdown.
Specifically, a 20\% standard deviation exists here, between the means of the four physical-effect categories.
That is, if you are running on an unknown machine, at a scale above anomaly-prone individual sizes and below where major LLC caching effects overtake the general intrusive-list advantage, but with an unknown relationship to the sizing of your fickle low-level caches, you are likely to experience an unpredictable speed impact on the order of 20\%.

A similar situation comes from \VRef[Figure]{fig:plot-list-1ord}'s second comparison, by operation type.
Specific interactions, like framework X doing better on stacks, do occur; a selection of them is addressed in \MLB{TODO: cross reference}.
But they are so irrelevant to the issue of picking a winning framework that it is sufficient here to number the operations opaquely.
Whether a given list implementation is suitable for a language's general library is decided without knowledge of whether your use will have stack or queue movement.
So you face another lottery, with a likely win-loss range of the standard deviation of the individual operations' means: 9\%.

This context helps interpret \VRef[Figure]{fig:plot-list-1ord}'s final comparison, by framework.
In this result, \CFA runs similarly to \uCpp, and LQ-@list@ runs similarly to @tailq@.
The standard deviation of the frameworks' means is 8\%.
Framework choice has, therefore, less impact on your speed than the lottery tickets you already hold.

Now, the LQs do indeed beat the UW languages by 15\%, a fact explored further in \MLB{TODO: xref}.
But so too does operation VIII typically beat operation IV by 38\%.
And a small size on the Intel typically beats a medium size on the AMD by 66\%.
Framework choice is simply not where you stand to win or lose the most.

\MLB{ TODO: find a home for these original conclusions:
cfa-upp similarity holds for all halves by movement or polarity;
splitting on accessor, \CFA has a poor result on element removal, LQ-list has a great result on the other accessors, and \uCpp is unaffected. }


\begin{figure}
\centering
	\setlength{\tabcolsep}{0pt}
	\begin{tabular}{p{0.75in}p{2.75in}p{3in}}
	&
	\subfloat[Operation I, AMD]{\label{f:AbsoluteTime-i-swift}
		\hspace*{-0.75in}
		\includegraphics{plot-list-zoomin-abs-i-swift.pdf}
	} % subfigure
	&
	\subfloat[Operation I, Intel]{\label{f:AbsoluteTime-i-java}
		\includegraphics{plot-list-zoomin-abs-i-java.pdf}
	} % subfigure
	\\
	&
	\subfloat[Operation VIII, AMD]{\label{f:AbsoluteTime-viii-swift}
		\hspace*{-0.75in}
		\includegraphics{plot-list-zoomin-abs-viii-swift.pdf}
	} % subfigure
	&
	\subfloat[Operation VIII, Intel]{\label{f:AbsoluteTime-viii-java}
		\includegraphics{plot-list-zoomin-abs-viii-java.pdf}
	} % subfigure
	\end{tabular}
	\caption{Operation duration \vs list length at small-medium lengths. Two example operations are shown: [I] stack movement with head-only access (plots a and b); [VIII] queue movement with element-oriented removal access (plots c and d); both operations use insert-first polarity. Lower is better.}
	\label{fig:plot-list-zoomin}
\end{figure}

\VRef[Figure]{fig:plot-list-zoomin} shows the sizes below 150 blown up.
% The same scenario as the coarse comparison is used: a stack, with insertions and removals happening at the end called ``first,'' ``head'' or ``front,'' and all changes occurring through a head-provided insert/remove operation.
The error bars show the fastest and slowest times seen over five trials, and the central point is the mean of the remaining three trials.
For readability, the points are slightly staggered at a given horizontal value, where the points might otherwise appear on top of each other.
The experiment runs twelve operating scenarios;
the ones chosen for their variety are scenarios I and VIII from the listing in \VRef[Figure]{f:ExperimentOperations}, and their results appear in the rows.
As in the previous experiment, each hardware architecture appears in a column.
Note that LQ-list does not support queue operations, so this framework is absent from Operation VIII.

At lengths 1 and 2, winner patterns seen at larger sizes generally do not apply.
Indeed, an issue with \CFA giving terrible performance on queues at length 1 is evident in \VRef[Figure]{f:AbsoluteTime-viii-swift}.
This phenomenon is elaborated in \MLB{TODO: xref}.
For the remainder of this section, these sizes are disregarded.
1109
Even beyond the very-small anomalies, the choice of operation and machine significantly affects how speed responds to size.
For example, in Operation I on the AMD (\VRef[Figure]{f:AbsoluteTime-i-swift}), \CFA generally wins over LQ, while the opposite is seen by switching either to Operation VIII (\VRef[Figure]{f:AbsoluteTime-viii-swift}) or to the Intel (\VRef[Figure]{f:AbsoluteTime-i-java}).
For another, Operation I has sore spots at middle lengths for \uCpp on AMD and LQ-list on Intel, which resolve at larger lengths; yet no such pattern appears with Operation VIII.

In spite of these complex interactions, a couple of spots of stability can be analyzed.
In these examples, the size zones 4--16 (``small'') and above 48 (``medium''), covering four specific sizes apiece, each tend to show a simple winner/loser story.
Manual inspection of other such plots (not detailed) showed that this quality is generally upheld.
So these zones are used as the basis for comparison.

\MLB{Peter, caution beyond here. I am reconsidering if this first-dismiss-physical approach to comparison is simplest.}

\begin{figure}
	\centering
	\includegraphics{plot-list-mchn-szz.pdf}
	\caption{Histogram of operation durations, decomposed by physical factors.
		Measurements are included from only the sizes in the ``small'' and ``medium'' stable zones.
		This breakdown divides the entire population of test results into four mutually disjoint constituents.}
	\label{fig:plot-list-mchn-szz}
\end{figure}

\VRef[Figure]{fig:plot-list-mchn-szz} shows the effects of the physical factors of size zone and machine.
Each of these four histograms shows variation in duration coming from the four specific sizes in a size zone, combining results of all twelve operations and all four frameworks.
Among the means of the four histograms, there is a standard deviation of 0.9 ns, which is 20\% of the global mean.
This variability is due solely to physical factors.

From the perspective of assessing winning/losing frameworks, these physical effects are noise.
So, subsequent analysis conditions on the physical effects.
That is, it supposes you are put into an unknown physical situation (one of the four being tested), then presents all the ways your outcome could change as a result of non-physical factors, assuming that the physical situation is kept constant.
It does so by presenting results relative to the mean of the physical quadrant (\VRef[fig]{fig:plot-list-mchn-szz} histogram) to which each result belongs.
With this adjustment, absolute duration values (in nanoseconds) are lost.
In return, the physical quadrants are re-combined, enabling assessment of the non-physical factors.

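The quadrant-relative adjustment can be sketched as follows. This is an illustrative plain-C recomputation, not the thesis analysis code; the function names are hypothetical.

```c
// Mean duration within one physical quadrant (machine x size zone).
double quadrant_mean( const double * durations, int n ) {
	double sum = 0.0;
	for ( int i = 0; i < n; i += 1 ) sum += durations[i];
	return sum / n;
}

// Replace each duration with its offset from the quadrant mean:
// absolute nanosecond values are lost, but results from different
// quadrants become comparable.
void normalize_quadrant( double * durations, int n ) {
	double mean = quadrant_mean( durations, n );
	for ( int i = 0; i < n; i += 1 ) durations[i] -= mean;
}
```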
\begin{comment}
While preparing experiment results, I first tested on my old office PC, an AMD FX-8370E Eight-Core, before switching to the large new server for final testing.
For this experiment, the results flipped in my favour when running on the server.
New CPU architectures are now amazingly good at branch prediction and micro-parallelism in the pipelines.
Specifically, on the PC, my \CFA and companion \uCpp lists are slower than lq-tailq and lq-list by 10\% to 20\%.
On the server, the \CFA and \uCpp lists can be faster by up to 100\%.
Overall, LQ-tailq does the best at short lengths but loses out above a dozen elements.
\end{comment}

% \begin{figure}
% \centering
% \begin{tabular}{c}
% \includegraphics{plot-list-cmp-intrl-shift.pdf} \\
% (a) \\
% \includegraphics{plot-list-cmp-intrl-outcome.pdf} \\
% (b) \\
% \end{tabular}
% \caption{Caption TODO}
% \label{fig:plot-list-cmp-intrl}
% \end{figure}

\begin{comment}
\subsection{Result: \CFA cost attribution}

This comparison loosely itemizes the reasons that the \CFA implementation runs 15--20\% slower than LQ.
Each reason provides for safer programming.
For each reason, a version of the \CFA list was measured that forgoes its safety and regains some performance.
These potential sacrifices are:
\newcommand{\mandhead}{\emph{mand-head}}
\newcommand{\nolisted}{\emph{no-listed}}
\newcommand{\noiter}{\emph{no-iter}}
\begin{description}
\item[mand(atory)-head] Removing support for headless lists.
	A specific explanation of why headless support causes a slowdown is not offered.
	But it is reasonable for a cost to result from making one piece of code handle multiple cases; the subset of the \CFA list API that applies to headless lists shares its implementation with headed lists.
	In the \mandhead case, disabling the feature in \CFA means using an older version of the implementation, from before headless support was added.
	In the pre-headless library, trying to form a headless list (instructing, ``Insert loose element B after loose element A,'') is a checked runtime error.
	LQ does not support headless lists\footnote{
	Though its documentation does not mention the headless use case, this fact is due to one of its insert-before or insert-after routines being unusable in every list model.
	For \lstinline{tailq}, the API requires a head.
	For \lstinline{list}, this usage causes an ``uncaught'' runtime crash.}.
\item[no-listed] Removing support for the @is_listed@ API query.
	Along with it goes error checking such as ``When inserting an element, it must not already be listed, \ie be referred to from somewhere else.''
	These abilities have a cost because, in order to support them, a listed element that is being removed must be written to, to record its change in state.
	In \CFA's representation, this cost is two pointer writes.
	To disable the feature, these writes, and the error checking that consumes their result, are put behind an @#ifdef@.
	The result is that a removed element sees itself as still having neighbours (though these quasi-neighbours see it differently).
	This state is how LQ leaves a removed element; LQ does not offer an is-listed query.
\item[no-iter(ation)] Removing support for well-terminating iteration.
	The \CFA list uses bit-manipulation tagging on link pointers (rather than \eg null links) to express, ``No more elements this way.''
	This tagging has the cost of submitting a retrieved value to the ALU, and awaiting this operation's completion, before dereferencing a link pointer.
	In some cases, the is-terminating bit is transferred from one link to another, or has a similar influence on a resulting link value; this logic adds register pressure and more data dependency.
	To disable the feature, the @#ifdef@-controlled tag-manipulation logic compiles in answers like, ``No, that link is not a terminator,'' ``The dereferenceable pointer is the value you read from memory,'' and ``The terminator-marked value you need to write is the pointer you started with.''
	Without this termination marking, repeated requests for a next valid item always provide a positive response; when it should be negative, the indicated next element is garbage data at an address unlikely to trigger a memory error.
	LQ has well-terminating iteration for listed elements.
	In the \noiter case, the slowdown is not inherent; it represents a \CFA optimization opportunity.
\end{description}
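As a concrete illustration of the tagging scheme behind \noiter, the following plain-C sketch shows low-bit tagging on link pointers. The names are hypothetical and the actual \CFA representation may differ; the sketch assumes node addresses are at least 2-byte aligned, so bit 0 of a valid pointer is always free for the tag.

```c
#include <stdint.h>

enum { TERMINATOR_BIT = 0x1 };

// Mark a link value as "no more elements this way".
void * mark_terminator( void * p ) {
	return (void *)( (uintptr_t)p | TERMINATOR_BIT );
}

// Ask whether a link value is a terminator rather than a plain pointer.
int is_terminator( void * p ) {
	return ( (uintptr_t)p & TERMINATOR_BIT ) != 0;
}

// Recover the dereferenceable pointer: the tag must be stripped (an ALU
// operation) before the link can be followed -- the data dependency
// described above.
void * strip_tag( void * p ) {
	return (void *)( (uintptr_t)p & ~(uintptr_t)TERMINATOR_BIT );
}
```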
\MLB{Ensure benefits are discussed earlier and cross-reference} % an LQ programmer must know not to ask, ``Who's next?'' about an unlisted element; an LQ programmer cannot write assertions about an item being listed; LQ requiring a head parameter is an opportunity for the user to provide inconsistent data

\begin{figure}
\centering
	\begin{tabular}{c}
		\includegraphics{plot-list-cfa-attrib-swift.pdf} \\
		(a) \\
		\includegraphics{plot-list-cfa-attrib-remelem-swift.pdf} \\
		(b) \\
	\end{tabular}
	\caption{Operation duration ranges for functionality-reduced \CFA list implementations. (a) has the top-level slices. (b) has the next level of slicing within the slower element-based removal operation.}
	\label{fig:plot-list-cfa-attrib}
\end{figure}

\VRef[Figure]{fig:plot-list-cfa-attrib} shows the \CFA list performance with these features, and their combinations, turned on and off.
When a series name is one of the three sacrifices above, the series shows that sacrifice in isolation.
These further series names give combinations:
\newcommand{\attribFull}{\emph{full}}
\newcommand{\attribParity}{\emph{parity}}
\newcommand{\attribStrip}{\emph{strip}}
\begin{description}
	\item[full] No sacrifices. Same as the measurements presented earlier.
	\item[parity] \mandhead{} + \nolisted. Feature parity with LQ.
	\item[strip] \mandhead{} + \nolisted{} + \noiter. All options set to ``faster.''
\end{description}
All list implementations are \CFA, possibly stripped.
The plot uses the same LQ-relative basis as earlier, so getting to zero means matching LQ's @tailq@.

\VRef[Figure]{fig:plot-list-cfa-attrib}-(a) summarizes the time attribution across the main operating scenarios.
The \attribFull series is repeated from \VRef[Figure]{fig:plot-list-cmp-overall}, part (b), while the series showing feature sacrifices are new.
Going all the way to \attribStrip at least nearly matches LQ in all operating scenarios, beats LQ often, and slightly beats LQ overall.
Except within the accessor splits, both sacrifices contribute improvements individually, \noiter helps more than \attribParity, and the total \attribStrip benefit depends on both contributions.
When the accessor is not element removal, the \attribParity shift appears to be counterproductive, leaving \noiter to deliver most of the benefit.
For element removals, \attribParity is the heavy hitter, with \noiter contributing modestly.

The counterproductive shift outside of element removals is likely due to an optimization done in the \attribFull version after implementing headless support, \ie not present in the \mandhead version.
This work streamlined both head-based operations (head-based removal being half the work of the element-insertion test).
This improvement could be ported to a \mandhead-style implementation, which would bring down the \attribParity time in these cases.

More significantly, missing this optimization affects every \attribParity result, because they all use head-based inserts or removes for at least half their operations.
It is likely a reason that \attribParity is not delivering as well overall as \noiter.
It even represents plausible further improvement in \attribStrip.

\VRef[Figure]{fig:plot-list-cfa-attrib}-(b) addresses element removal being the overall \CFA slow spot, and element removal having a peculiar shape in the (a) analysis.
Here, the \attribParity sacrifice bundle is broken out into its two constituents.
The result is the same regardless of the operation.
All three individual sacrifices contribute noteworthy improvements (\nolisted slightly less).
The fullest improvement requires all of them.

The \noiter feature sacrifice is unpalatable.
But because it is not an inherent slowdown, there may be room to pursue a \noiter-level speed improvement without the \noiter feature sacrifice.
The performance crux for \noiter is the pointer-bit tagging scheme.
Alternative designs that may offer a speedup with acceptable consequences include keeping the tag information in a separate field, and (for 64-bit architectures) keeping it in the high-order byte, \ie using byte- rather than bit-oriented instructions to access it.
The \noiter speed improvement would bring \CFA to within 5\% of LQ overall, and from the high twenties to the high teens in the worst case of element removal.

Ultimately, this analysis provides options for a future effort that needs to get the most speed out of the \CFA list.

\end{comment}


\section{Future Work}
\label{toc:lst:futwork}

Not discussed in this chapter is a \CFA type-system issue with Plan-9 @inline@ declarations: implementing the trait @embedded@ to access the contained @dlist@ link fields.
This trait defines an implicit conversion from derived to base (the safe direction).
Such a conversion exists implicitly for concrete types using the Plan-9 inheritance feature.
However, this implicit conversion is only partially implemented for polymorphic types, such as @dlist@, which prevents the straightforward list interface shown throughout the chapter.

My workaround is the macro @P9_EMBEDDED@, placed before an intrusive node is used to declare a list.
\begin{cfa}
struct node {
	int v;
	inline dlink(node);
};
@P9_EMBEDDED( node, dlink(node) );@
dlist( node ) dlist;
\end{cfa}
The macro creates specialized access functions to explicitly extract the required information for the polymorphic @dlist@ type.
These access functions could have been generated implicitly for each intrusive node by adding another compiler pass.
However, that would have been substantial temporary work, when the correct solution is a compiler fix.
Hence, the macro workaround is only a small temporary inconvenience;
otherwise, all the list API shown in this chapter works.
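As a rough illustration of what such access functions amount to, the following plain-C sketch shows the two directions of access an intrusive list needs. The names are hypothetical, and this is not the actual macro expansion, which is written in \CFA and trait-based.

```c
#include <stddef.h>

// Hypothetical C analogue of the intrusive layout: dlink(node) is
// modeled as a plain struct of two pointers, embedded in the element.
struct dlink_node { struct dlink_node * next, * prev; };
struct node { int v; struct dlink_node links; };

// Derived-to-base direction: element to its embedded link field.
struct dlink_node * node_links( struct node * n ) {
	return &n->links;
}

// Base-to-derived direction: recover the element from its link field,
// using the field's fixed offset within the element.
struct node * node_of_links( struct dlink_node * l ) {
	return (struct node *)( (char *)l - offsetof( struct node, links ) );
}
```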


\begin{comment}
An API author has decided that the intended user experience is:
\begin{itemize}
\item
offer the user an opaque type that abstracts the API's internal state
\item
tell the user to extend this type
\item
provide functions with a ``user's type in, user's type out'' style
\end{itemize}
Fig XXX shows several attempts to provide this experience.
The types in (a) give the setup, achieving the first two points, while the pair of function declarations and calls in (b) are unsuccessful attempts at achieving the third point.
Both functions @f1@ and @f2@ allow calls of the form @f( d )@, but @f1@ has the wrong return type (@Base@) for initializing a @Derived@.
The @f1@ signature fails to meet ``user's type out;'' this signature does not give the \emph{user} a usable type.
On the other hand, the signature of @f2@ offers the desired user experience, so the API author proceeds with trying to implement it.

\begin{figure}
\begin{cfa}
#ifdef SHOW_ERRS
#define E(...) __VA_ARGS__
#else
#define E(...)
#endif

// (a)
struct Base { /*...*/ }; // system offers
struct Derived { /*...*/ inline Base; }; // user writes

// (b)
// system offers
Base & f1( Base & );
forall( T ) T & f2( T & );
// user writes
void to_elide1() {
	Derived & d /* = ... */;
	f1( d );
	E( Derived & d1 = f1( d ); ) // error: return is not Derived
	Derived & d2 = f2( d );

	// (d), user could write
	Base & b = d;
}

// (c), system has
static void helper( Base & );
forall( T ) T & f2( T & param ) {
	E( helper( param ); ) // error: param is not Base
	return param;
}
static void helper( Base & ) {}

#include <list2.hfa>
// (e)
// system offers, has
forall( T | embedded(T, Base, Base) )
	T & f3( T & param ) {
	helper( param`inner ); // ok
	return param;
}
// user writes
struct DerivedRedux { /*...*/ inline Base; };
P9_EMBEDDED( DerivedRedux, Base )
void to_elide2() {
	DerivedRedux & dr /* = ... */;
	DerivedRedux & dr3 = f3( dr );

	// (f)
	// user writes
	Derived & xxx /* = ... */;
	E( Derived & yyy = f3( xxx ); ) // error: xxx is not embedded
}
\end{cfa}
\end{figure}


The \CFA list examples elide the @P9_EMBEDDED@ annotations that (TODO: xref P9E future work) proposes to obviate.
Thus, these examples illustrate a to-be state, free of what is to be historic clutter.
The elided portions are immaterial to the discussion, and the examples work with the annotations provided.
The \CFA test suite (TODO: cite?) includes equivalent demonstrations, with the annotations included.
\begin{cfa}
struct mary {
	float anotherdatum;
	inline dlink(mary);
};
struct fred {
	float adatum;
	inline struct mine { inline dlink(fred); };
	inline struct yours { inline dlink(fred); };
};
\end{cfa}
like in the thesis examples. You have to say
\begin{cfa}
struct mary {
	float anotherdatum;
	inline dlink(mary);
};
P9_EMBEDDED(mary, dlink(mary))
struct fred {
	float adatum;
	inline struct mine { inline dlink(fred); };
	inline struct yours { inline dlink(fred); };
};
P9_EMBEDDED(fred, fred.mine)
P9_EMBEDDED(fred, fred.yours)
P9_EMBEDDED(fred.mine, dlink(fred))
P9_EMBEDDED(fred.yours, dlink(fred))
\end{cfa}


The function definition in (c) gives this system-implementation attempt.
The system needs to operate on its own data, stored in the @Base@ part of the user's @d@, now called @param@.
Calling @helper@ represents this attempt to look inside.
It fails, because the @f2@ signature does not state that @param@ has any relationship to @Base@;
this signature does not give the \emph{system} a usable type.
The fact that the user's argument can be converted to @Base@ is lost when going through this signature.

Moving forward needs an @f@ signature that conveys the relationship that the argument @d@ has to @Base@.
\CFA conveys type abilities, from caller to callee, by way of traits; so the challenge is to state the right trait member.
As initialization (d) illustrates, the @d@--@Base@ capability is an implicit conversion.
Unfortunately, in its present state, \CFA does not have a first-class representation of an implicit conversion, the way operator @?{}@ (which is a value), done with the right overload, is arguably an explicit conversion.
So \CFA cannot directly convey ``@T@ compatible with @Base@'' in a trait.

This work contributes a stand-in for such an ability, tunneled through the present-state trait system, shown in (e).
On the declaration/system side, the trait @embedded@, and its member, @`inner@, convey the ability to recover the @Base@, and thereby call @helper@.
On the user side, the @P9_EMBEDDED@ macro accompanies type definitions that work with @f3@-style declarations.
An option is available in the present state to compiler-emit @P9_EMBEDDED@ annotations automatically, upon each occurrence of an @inline@ member.
The choice not to is based on conversions in \CFA being a moving target because of separate, ongoing work.

An intended finished state for this area is achieved if \CFA's future efforts with conversions include:
\begin{itemize}
\item
treat conversion as operator(s), \ie values
\item
re-frame the compiler's current Plan-9 ``magic'' as seeking an implicit conversion operator, rather than seeking an @inline@ member
\item
make Plan-9 syntax cause an implementation of implicit conversion to exist (much as @struct@ syntax causes @forall(T)@ compliance to exist)
\end{itemize}
To this end, the contributed @P9_EMBEDDED@ expansion shows how to implement this conversion.


like in tests/list/dlist-insert-remove.
Future work should autogen those @P9_EMBEDDED@ declarations whenever it sees a Plan-9 declaration.
The exact scheme chosen should harmonize with general user-defined conversions.

Today's Plan-9 scheme is: @mary@ gets a function @`inner@ returning this as @dlink(mary)@.
@fred@ gets four of them in a diamond.
They are defined so that @`inner@ is transitive; \ie @fred@ has two further ambiguous overloads mapping @fred@ to @dlink(fred)@.
The scheme allows the @dlist@ functions to give the assertion, ``we work on any @T@ that gives a @`inner@ to @dlink(T)@.''


TODO: deal with: A doubly linked list is being designed.

TODO: deal with: Link fields are system-managed.
Links in GDB.
\end{comment}