Context Navigation

source: doc/theses/mike_brooks_MMath/list.tex@ 1eea589f

Visit:

Last change on this file since 1eea589f was 16a843b, checked in by Michael Brooks <mlbrooks@…>, 2 months ago
add linked-list performance assessment
Property mode set to `100644`
File size: 50.7 KB

Line
1	\chapter{Linked List}
2
3	This chapter presents my work on designing and building a linked-list library for \CFA.
4	Due to time limitations and the needs expressed by the \CFA runtime developers, I focussed on providing a doubly-linked list, and its bidirectionally iterators for traversal.
5	Simpler data-structures, like stack and queue, can be built from the doubly-linked mechanism with only a slight storage/performance cost because of the unused link field.
6	Reducing to data-structures with a single link follows directly from the more complex doubly-links and its iterators.
7
8
9	\section{Features}
10
11	The following features directed this project, where the goal is high-performance list operations required by \CFA runtime components, like the threading library.
12
13
14	\subsection{Core Design Issues}
15
16	The doubly-linked list attaches links intrusively, supports multiple link directions, integrates with user code via the type system, treats its ends uniformly, and identifies a list using an explicit head.
17	This design covers system and data management issues stated in \VRef{toc:lst:issue}.
18
19	\VRef[Figure]{fig:lst-features-intro} continues the running @req@ example from \VRef[Figure]{fig:lst-issues-attach} using the \CFA list.
20	The \CFA link attachment is intrusive so the resulting memory layout is per user node, as for the LQ version of \VRef[Figure]{f:Intrusive}.
21	The \CFA framework provides generic type @dlink( T, T )@ for the two link fields (front and back).
22	A user inserts the links into the @req@ structure via \CFA inline-inheritance from the Plan-9 C dialect~\cite[\S~3.3]{Thompson90new}.
23	Inline inheritance is containment, where the inlined field is unnamed but the type's internal fields are hoisted into the containing structure.
24	Hence, the field names must be unique, unlike \CC nested types, but the type names are at a nested scope level, unlike aggregate nesting in C.
25	Note, the position of the containment is normally unimportant, unless there is some form of memory or @union@ overlay.
26	The key feature of inlined inheritance is that a pointer to the containing structure is automatically converted to a pointer to any anonymous inline field for assignments and function calls, providing containment inheritance with implicit subtyping.
27	Therefore, a reference to a @req@ is implicitly convertible to @dlink@ in assignments and function calls.
28	% These links have a nontrivial, user-specified location within the @req@ structure;
29	% this convention encapsulates the implied pointer arithmetic safely.
30	The links in @dlist@ point at (links) in the containing node, know the offsets of all links (data is abstract), and any field-offset arithmetic or link-value changes are safe and abstract.
31
32	\begin{figure}
33	\lstinput{20-30}{lst-features-intro.run.cfa}
34	\caption[Multiple link directions in \CFA list library]{
35	Demonstration of the running \lstinline{req} example, done using the \CFA list library.
36	This example is equivalent to the three approaches in \VRef[Figure]{fig:lst-issues-attach}.
37	}
38	\label{fig:lst-features-intro}
39	\end{figure}
40
41	\VRef[Figure]{fig:lst-features-multidir} shows how the \CFA library supports multi-inline links, so a node can be on one or more lists simultaneously.
42	The declaration of @req@ has two inline-inheriting @dlink@ occurrences.
43	The first of these gives a type named @req.by_pri@, @req@ inherits from it, and it inherits from @dlink@.
44	The second line @req.by_rqr@ is similar to @req.by_pri@.
45	Thus, there is a diamond, non-virtual, inheritance from @req@ to @dlink@, with @by_pri@ and @by_rqr@ being the mid-level types.
46
47	Disambiguation occurs in the declarations of the list-head objects: @reqs_pri_global@, @reqs_rqr_42@, @reqs_rqr_17@, and @reqs_rqr_99@.
48	The type of the variable @reqs_pri_global@ is @dlist(req, req.by_pri)@, meaning operations called on @reqs_pri_global@ are implicitly disambiguated.
49	In the example, the calls @insert_first(reqs_pri_global, ...)@ imply, ``here, we are working by priority.''
50	As in \VRef[Figure]{fig:lst-issues-multi-static}, three lists are constructed, a priority list containing all nodes, a list with only nodes containing the value 42, and a list with only nodes containing the value 17.
51
52	\begin{figure}
53	\centering
54	\begin{tabular}{@{}ll@{}}
55	\begin{tabular}{@{}l@{}}
56	\lstinput{20-31}{lst-features-multidir.run.cfa} \\
57	\lstinput{43-71}{lst-features-multidir.run.cfa}
58	\end{tabular}
59	&
60	\lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
61	\end{tabular}
62
63	\caption{
64	Demonstration of multiple static link directions done in the \CFA list library.
65	The right example is from \VRef[Figure]{fig:lst-issues-multi-static}.
66	The left \CFA example does the same job.
67	}
68	\label{fig:lst-features-multidir}
69	\end{figure}
70
71	The \CFA library also supports the common case, of single directionality, more naturally than LQ.
72	\VRef[Figure]{fig:lst-features-intro} shows a single-direction list done with no contrived name for the link direction,
73	where \VRef[Figure]{f:Intrusive} adds the unnecessary field name, @d@.
74	In \CFA, a user doing a single direction (\VRef[Figure]{fig:lst-features-intro}) sets up a simple inheritance with @dlink@, and declares a list head to have the simpler type @dlist( T )@.
75	In contrast, (\VRef[Figure]{fig:lst-features-multidir}) sets up a diamond inheritance with @dlink@, and declares a list head to have the more-informed type @dlist( T, DIR )@.
76
77	The directionality issue also has an advanced corner-case that needs treatment.
78	When working with multiple directions, calls like @insert_first@ benefit from implicit direction disambiguation;
79	however, other calls like @insert_after@ still require explicit disambiguation, \eg the call
80	\begin{cfa}
81	insert_after(r1, r2);
82	\end{cfa}
83	does not have enough information to clarify which of a request's simultaneous list directions is intended.
84	Is @r2@ supposed to be the next-priority request after @r1@, or is @r2@ supposed to join the same-requester list of @r1@?
85	As such, the \CFA compiler gives an ambiguity error for this call.
86	To resolve the ambiguity, the list library provides a hook for applying the \CFA language's scoping and priority rules.
87	It applies as:
88	\begin{cfa}
89	with ( DLINK_VIA(req, req.pri) ) insert_after(r1, r2);
90	\end{cfa}
91	Here, the @with@ statement opens the scope of the object type for the expression;
92	hence, the @DLINK_VIA@ result causes one of the list directions to become a more attractive candidate to \CFA's overload resolution.
93	This boost applies within the scope of the following statement, but could also be a custom block or an entire function body.
94	\begin{cquote}
95	\setlength{\tabcolsep}{15pt}
96	\begin{tabular}{@{}ll@{}}
97	\begin{cfa}
98	void f() @with( DLINK_VIA(req, req.pri) )@ {
99	...
100
101	insert_after(r1, r2);
102
103	...
104	}
105	\end{cfa}
106	&
107	\begin{cfa}
108	void f() {
109	...
110	@with( DLINK_VIA(req, req.pri) )@ {
111	... insert_after(r1, r2); ...
112	}
113	...
114	}
115	\end{cfa}
116	\end{tabular}
117	\end{cquote}
118	By using a larger scope, a user can put code within that acts as if there is only one list direction.
119	This boost is needed only when operating on a list with several directions, using operations that do not take the list head.
120	Otherwise, the sole applicable list direction \emph{just works}.
121
122	Unlike \CC templates container-types, the \CFA library works completely within the type system;
123	both @dlink@ and @dlist@ are ordinary types.
124	There is no textual expansion other than header-included static-inline function for performance.
125	Errors in user code are reported only with mention of the library's declarations.
126	Finally, the library is separately compiled from the usage code.
127
128	The \CFA library works in headed and headless modes. TODO: elaborate.
129
130
131	\section{List API}
132
133	\VRef[Figure]{f:ListAPI} shows the API for the doubly-link list operations, where each is explained.
134	\begin{itemize}[leftmargin=*]
135	\item
136	@isListed@ returns true if the node is an element of a list and false otherwise.
137	\item
138	@isEmpty@ returns true if the list has no nodes and false otherwise.
139	\item
140	@first@ returns a reference to the first node of the list without removing it or @0p@ if the list is empty.
141	\item
142	@last@ returns a reference to the last node of the list without removing it or @0p@ if the list is empty.
143	\item
144	@insert_before@ adds a node before the specified node and returns the added node for cascading.
145	\item
146	@insert_after@ adds a node after the specified node and returns the added node for cascading.
147	\item
148	@remove@ removes the specified node from the list (any location) and returns a reference to the node.
149	\item
150	@iter@ create an iterator for the list.
151	\item
152	@recede@ returns true if the iterator cursor is advanced to the previous (predecessor, towards first) node before the prior cursor node and false otherwise.
153	\item
154	@advance@ returns true if the iterator cursor is advanced to the next (successor, towards last) node after the prior cursor node and false otherwise.
155	\item
156	@isFirst@ returns true if the node is the first node in the list and false otherwise.
157	\item
158	@isLast@ returns true if the node is the last node in the list and false otherwise.
159	\item
160	@pred@ returns a reference to the previous (predecessor, towards first) node before the specified node or @0p@ if the specified node is the first node in the list.
161	\item
162	@next@ returns a reference to the next (successor, towards last) node after the specified node or @0p@ if the specified node is the last node in the list.
163	\item
164	@insert_first@ adds a node to the start of the list so it becomes the first node and returns a reference to the node for cascading.
165	\item
166	@insert_last@ adds a node to the end of the list so it becomes the last node and returns returns a reference to the node for cascading.
167	\item
168	@remove_first@ removes the first node and returns a reference to it or @0p@ if the list is empty.
169	\item
170	@remove_last@ removes the last node and returns a reference to it or @0p@ if the list is empty.
171	\item
172	@transfer@ transfers all nodes from the @from@ list to the end of the @to@ list; the @from@ list is empty after the transfer.
173	\item
174	@split@ transfers the @from@ list up to node to the end of the @to@ list; the @from@ list becomes the list after the node.
175	The node must be in the @from@ list.
176	\end{itemize}
177
178	\begin{figure}
179	\begin{cfa}
180	E & isListed( E & node );
181	E & isEmpty( dlist( E ) & list );
182	E & first( dlist( E ) & list );
183	E & last( dlist( E ) & list );
184	E & insert_before( E & before, E & node );
185	E & insert_after( E & after, E & node );
186	E & remove( E & node );
187	E & iter( dlist( E ) & list );
188	bool advance( E && refx );
189	bool recede( E && refx );
190	bool isFirst( E & node );
191	bool isLast( E & node );
192	E & prev( E & node );
193	E & next( E & node );
194	E & insert_first( dlist( E ) & list, E & node );
195	E & insert_last( dlist( E ) & list, E & node );
196	E & remove_first( dlist( E ) & list );
197	E & remove_last( dlist( E ) & list );
198	void transfer( dlist( E ) & to, dlist( E ) & from ) {
199	void split( dlist( E ) & to, dlist( E ) & from, E & node ) {
200	\end{cfa}
201	\caption{\CFA List API}
202	\label{f:ListAPI}
203	\end{figure}
204
205
206	\subsection{Iteration}
207
208	It is possible to iterate through a list manually or using a set of standard macros.
209	\VRef[Figure]{f:IteratorTemple} shows the iterator template, managing a list of nodes, used throughout the following iterator examples.
210	Each example assumes its loop body prints the value in the current node.
211
212	\begin{figure}
213	\begin{cfa}
214	#include <fstream.hfa>
215	#include <list.hfa>
216
217	struct node {
218	int v;
219	inline dlink(node);
220	};
221	int main() {
222	dlist(node) list;
223	node n1 = { 1 }, n2 = { 2 }, n3 = { 3 }, n4 = { 4 };
224	insert( list, n1 ); insert( list, n2 ); insert( list, n3 ); insert( list, n4 );
225	sout \| nlOff;
226	for ( ... ) @sout \| it.v \| ","; sout \| nl;@ // iterator examples in text
227	remove( n1 ); remove( n2 ); remove( n3 ); remove( n4 );
228	}
229	\end{cfa}
230	\caption{Iterator Temple}
231	\label{f:IteratorTemple}
232	\end{figure}
233
234	The manual method is low level but allows complete control of the iteration.
235	The list cursor (index) can be either a pointer or a reference to a node in the list.
236	The choice depends on how the programmer wants to access the fields: @it->f@ or @it.f@.
237	The following examples use a reference because the loop body manipulates the node values rather than the list pointers.
238	The end of iteration is denoted by the loop cursor having the null pointer, denoted @0p@ in \CFA.
239
240	\noindent
241	Iterating forward and reverse through the entire list.
242	\begin{cquote}
243	\setlength{\tabcolsep}{15pt}
244	\begin{tabular}{@{}l\|l@{}}
245	\begin{cfa}
246	for ( node & it = @first@( list ); &it /* != 0p */ ; &it = &@next@( it ) ...
247	for ( node & it = @last@( list ); ⁢ &it = &@prev@( it ) ) ...
248	\end{cfa}
249	&
250	\begin{cfa}
251	1, 2, 3, 4,
252	4, 3, 2, 1,
253	\end{cfa}
254	\end{tabular}
255	\end{cquote}
256	Iterating forward and reverse from a starting node through the remaining list.
257	\begin{cquote}
258	\setlength{\tabcolsep}{15pt}
259	\begin{tabular}{@{}l\|l@{}}
260	\begin{cfa}
261	for ( node & it = @n2@; ⁢ &it = &@next@( it ) ) ...
262	for ( node & it = @n3@; ⁢ &it = &@prev@( it ) ) ...
263	\end{cfa}
264	&
265	\begin{cfa}
266	2, 3, 4,
267	3, 2, 1,
268	\end{cfa}
269	\end{tabular}
270	\end{cquote}
271	Iterating forward and reverse from a starting node to an ending node through the contained list.
272	\begin{cquote}
273	\setlength{\tabcolsep}{15pt}
274	\begin{tabular}{@{}l\|l@{}}
275	\begin{cfa}
276	for ( node & it = @n2@; &it @!= &n4@; &it = &@next@( it ) ) ...
277	for ( node & it = @n4@; &it @!= &n2@; &it = &@prev@( it ) ) ...
278	\end{cfa}
279	&
280	\begin{cfa}
281	2, 3,
282	4, 3,
283	\end{cfa}
284	\end{tabular}
285	\end{cquote}
286	Iterating forward and reverse through the entire list using the shorthand start at the list head and pick a direction.
287	In this case, @advance@ and @recede@ return a boolean, like \CC @while ( cin >> i )@.
288	\begin{cquote}
289	\setlength{\tabcolsep}{15pt}
290	\begin{tabular}{@{}l\|l@{}}
291	\begin{cfa}
292	for ( node & it = @iter@( list ); @advance@( it ); ) ...
293	for ( node & it = @iter@( list ); @recede@( it ); ) ...
294	\end{cfa}
295	&
296	\begin{cfa}
297	1, 2, 3, 4,
298	4, 3, 2, 1,
299	\end{cfa}
300	\end{tabular}
301	\end{cquote}
302	Finally, there are convenience macros that look like @foreach@ in other programming languages.
303	Iterating forward and reverse through the entire list.
304	\begin{cquote}
305	\setlength{\tabcolsep}{15pt}
306	\begin{tabular}{@{}l\|l@{}}
307	\begin{cfa}
308	FOREACH( list, it ) ...
309	FOREACH_REV( list, it ) ...
310	\end{cfa}
311	&
312	\begin{cfa}
313	1, 2, 3, 4,
314	4, 3, 2, 1,
315	\end{cfa}
316	\end{tabular}
317	\end{cquote}
318	Iterating forward and reverse through the entire list or until a predicate is triggered.
319	\begin{cquote}
320	\setlength{\tabcolsep}{15pt}
321	\begin{tabular}{@{}l\|l@{}}
322	\begin{cfa}
323	FOREACH_COND( list, it, @it.v == 3@ ) ...
324	FOREACH_REV_COND( list, it, @it.v == 1@ ) ...
325	\end{cfa}
326	&
327	\begin{cfa}
328	1, 2,
329	4, 3, 2,
330	\end{cfa}
331	\end{tabular}
332	\end{cquote}
333	Macros are not ideal, so future work is to provide a language-level @foreach@ statement, like \CC.
334	Finally, a predicate can be added to any of the manual iteration loops.
335	\begin{cquote}
336	\setlength{\tabcolsep}{15pt}
337	\begin{tabular}{@{}l\|l@{}}
338	\begin{cfa}
339	for ( node & it = first( list ); &it @&& !(it.v == 3)@; &it = &next( it ) ) ...
340	for ( node & it = last( list ); &it @&& !(it.v == 1)@; &it = &prev( it ) ) ...
341	for ( node & it = iter( list ); advance( it ) @&& !(it.v == 3)@; ) ...
342	for ( node & it = iter( list ); recede( it ) @&& !(it.v == 1)@; ) ...
343	\end{cfa}
344	&
345	\begin{cfa}
346	1, 2,
347	4, 3, 2,
348	1, 2,
349	4, 3, 2,
350	\end{cfa}
351	\end{tabular}
352	\end{cquote}
353
354	\begin{comment}
355	Many languages offer an iterator interface for collections, and a corresponding for-each loop syntax for consuming the items through implicit interface calls.
356	\CFA does not yet have a general-purpose form of such a feature, though it has a form that addresses some use cases.
357	This section shows why the incumbent \CFA pattern does not work for linked lists and gives the alternative now offered by the linked-list library.
358	Chapter 5 [TODO: deal with optimism here] presents a design that satisfies both uses and accommodates even more complex collections.
359
360	The current \CFA extensible loop syntax is:
361	\begin{cfa}
362	for( elem; end )
363	for( elem; begin ~ end )
364	for( elem; begin ~ end ~ step )
365	\end{cfa}
366	Many derived forms of @begin ~ end@ exist, but are used for defining numeric ranges, so they are excluded from the linked-list discussion.
367	These three forms are rely on the iterative trait:
368	\begin{cfa}
369	forall( T ) trait Iterate {
370	void ?{}( T & t, zero_t );
371	int ?<?( T t1, T t2 );
372	int ?<=?( T t1, T t2 );
373	int ?>?( T t1, T t2 );
374	int ?>=?( T t1, T t2 );
375	T ?+=?( T & t1, T t2 );
376	T ?+=?( T & t, one_t );
377	T ?-=?( T & t1, T t2 );
378	T ?-=?( T & t, one_t );
379	}
380	\end{cfa}
381	where @zero_t@ and @one_t@ are constructors for the constants 0 and 1.
382	The simple loops above are abbreviates for:
383	\begin{cfa}
384	for( typeof(end) elem = @0@; elem @<@ end; elem @+=@ @1@ )
385	for( typeof(begin) elem = begin; elem @<@ end; elem @+=@ @1@ )
386	for( typeof(begin) elem = @0@; elem @<@ end; elem @+=@ @step@ )
387	\end{cfa}
388	which use a subset of the trait operations.
389	The shortened loop works well for iterating a number of times or through an array.
390	\begin{cfa}
391	for ( 20 ) // 20 iterations
392	for ( i: 1 ~= 21 ~ 2 ) // odd numbers
393	for ( i; n ) total += a[i]; // subscripts
394	\end{cfa}
395	which is similar to other languages, like JavaScript.
396	\begin{cfa}
397	for ( i in a ) total += a[i];
398	\end{cfa}
399	Albeit with different mechanisms for expressing the array's length.
400	It might be possible to take the \CC iterator:
401	\begin{c++}
402	for ( list<int>::iterator it=mylist.begin(); it != mylist.end(); ++it )
403	\end{c++}
404	and convert it to the \CFA form
405	\begin{cfa}
406	for ( it; begin() ~= end() )
407	\end{cfa}
408	by having a list operator @<=@ that just looks for equality, and @+=@ that moves to the next node, \etc.
409
410	However, the list usage is contrived, because a list does use its data values for relational comparison, only links for equality comparison.
411	Hence, the focus of a list iterator's stopping condition is fundamentally different.
412	So, iteration of a linked list via the existing loop syntax is to ask whether this syntax can also do double-duty for iterating values.
413	That is, to be an analog of JavaScript's @for..of@ syntax:
414	\begin{cfa}
415	for ( e of a ) total += e;
416	\end{cfa}
417
418	The \CFA team will likely implement an extension of this functionality that moves the @~@ syntax from being part of the loop, to being a first-class operator (with associated multi-pace operators for the elided derived forms).
419	With this change, both @begin ~ end@ and @end@ (in context of the latter ``two-place for'' expression) parse as \emph{ranges}, and the loop syntax becomes, simply:
420	\begin{cfa}
421	for( elem; rangeExpr )
422	\end{cfa}
423	The expansion and underlying API are under discussion.
424	TODO: explain pivot from ``is at done?'' to ``has more?''
425	Advantages of this change include being able to pass ranges to functions, for example, projecting a numerically regular subsequence of array entries, and being able to use the loop syntax to cover more collection types, such as looping over the keys of a hashtable.
426
427	When iterating an empty list, the question, ``Is there a further element?'' needs to be posed once, receiving the answer, ``no.''
428	When iterating an $n$-item list, the same question gets $n$ ``yes'' answers (one for each element), plus one ``no'' answer, once there are no more elements; the question is posed $n+1$ times.
429
430	When iterating an empty list, the question, ``What is the value of the current element?'' is never posed, nor is the command, ``Move to the next element,'' issued. When iterating an $n$-item list, each happens $n$ times.
431
432	So, asking about the existence of an element happens once more than retrieving an element's value and advancing the position.
433
434	Many iteration APIs deal with this fact by splitting these steps across different functions, and relying on the user's knowledge of iterator state to know when to call each. In Java, the function @hasNext@ should be called $n+1$ times and @next@ should be called $n$ times (doing the double duty of advancing the iteration and returning a value). In \CC, the jobs are split among the three actions, @it != end@, @it++@ and @*it@, the latter two being called one time more than the first.
435
436	TODO: deal with simultaneous axes: @DLINK_VIA@ just works
437
438	TODO: deal with spontaneous simultaneity, like a single-axis req, put into an array: which ``axis'' is @&req++@ navigating: array-adjacency vs link dereference. It should sick according to how you got it in the first place: navigating dlist(req, req.pri) vs navigating array(req, 42). (prob. future work)
439	\end{comment}
440
441
442	\section{Implementation}
443
444	\VRef[Figure]{fig:lst-impl-links} continues the running @req@ example, now showing the \CFA list library's internal representation.
445	The @dlink@ structure contains exactly two pointers: @next@ and @prev@, which are opaque to a user.
446	Even though the user-facing list model is linear, the CFA library implements all listing as circular.
447	This choice helps achieve uniform end treatment and TODO finish summarizing benefit.
448	A link pointer targets a neighbouring @dlink@ structure, rather than a neighbouring @req@.
449	(Recall, the running example has the user putting a @dlink@ within a @req@.)
450
451	\begin{figure}
452	\centering
453	\includegraphics{lst-impl-links.pdf}
454	\caption{
455	\CFA list library representations for the cases under discussion.
456	}
457	\label{fig:lst-impl-links}
458	\end{figure}
459
460	System-added link pointers (dashed lines) are internally tagged to indicate linear endpoints that are the circular pointers.
461	Links among neighbour nodes are not tagged.
462	Iteration reports ``has more elements'' when crossing natural links, and ``no more elements'' upon reaching a tagged link.
463
464	In a headed list, the list head (@dlist(req)@) acts as an extra element in the implementation-level circularly-linked list.
465	The content of a @dlist@ is a (private) @dlink@, with the @next@ pointer to the first element, and the @prev@ pointer to the last element.
466	Since the head wraps a @dlink@, as does a @req@ does too, and since a link-pointer targets a @dlink@, the resulting cycle is among @dlink@ structures, situated inside of header/node.
467	An untagged pointer points within a @req@, while a tagged pointer points within a list head.
468	In a headless list, the circular backing list is only among @dlink@s within @req@s.
469	The tags are set on the links that a user cannot navigate.
470
471	No distinction is made between an unlisted item under a headed model and a singleton list under a headless model.
472	Both are represented as an item referring to itself, with both tags set.
473
474
475	\section{Assessment}
476	\label{toc:lst:assess}
477
478	\subsection{Add-Remove Performance}
479
480	The fundamental job of a linked-list library is to manage the links that connect users' items.
481	Any link management is an action that causes pair(s) of elements to become, or cease to be, adjacent.
482	Thus, adding and removing an element are the sole primitive actions.
483
484	Repeated adding and removing is necessary to measure timing because these operations can be as simple as a dozen instructions.
485	These instruction sequences may have cases that proceed (in a modern, deep pipeline) without a stall.
486
487	This experiment takes the position that
488	\begin{itemize}
489	\item The total time to add and remove is relevant; an attribution of time spent adding vs.\ removing is not.
490	Any use case for which addition speed matters necessarily has removes paired with adds.
491	For otherwise, the alleged usage would exhaust the amount of work expressable as a main-memory full of nodes within a few seconds.
492	\item A relevant breakdown ``by operation'' is, rather, one that considers the structural context of these requests.
493	\begin{description}
494	\item[movement]
495	Is the add/remove order that of a stack, a queue, or something else?
496	\item[polarity]
497	In which direction does the movement's action apply? For a queue, do items flow from first to last or last to first? For a stack, is the first-end or the last-end used for adding and removing?
498	\item[accessor]
499	Is an add/remove location given by a list head's ``first''/``last'', or by a reference to an individual element?
500	\end{description}
501	\item Speed differences caused by the host machine's memory hierarchy need to be identified and explained,
502	but do not represent advantages of one linked-list implementation over another.
503	\end{itemize}
504
505	This experiment measures the mean duration of a list addition and removal.
506	Confidence bounds, on this mean, are discussed.
507	The distribution of speeds experienced by an individual add-remove pair (tail latency) is not discussed.
508
509	Space efficiency is shown only indirectly, by way of caches' impact on speed.
510
511	%~MONITOR
512	% If able to show cases with CFA doing better, reword.
513	The goal is to show the \CFA library performing comparably to other intrusive libraries,
514	in an experimental context sensitive enough to show also:
515	\begin{itemize}
516	\item intrusive lists performing (majorly) differently than wrapped lists
517	\item a space of (minor) performance differences typical of existing intrusive lists
518	\end{itemize}
519
520
521	\subsubsection{Experiment setup}
522
523	The experiment defines a user's datatype and considers
524	the speed of building, and tearing down, a list of $n$ instances of the user's type.
525
526	The timings are taken with a fixed-duration method based on checks @clock()@.
527	In a typical 5-sec run, the outer looping checks the clock about 200 times.
528	A number of experimental rounds per clock check is precalculated to be appropriate to the value of $n$.
529
530	\begin{cfa}
531	// simplified harness: CFA implementation,
532	// stack movement, insert-first polarity, head-mediated access
533	size_t totalOpsDone = 0;
534	dlist( item_t ) lst;
535	item_t items[ n ];
536	startTimer();
537	while ( SHOULD_CONTINUE ) {
538	for ( i; n ) insert_first( lst, items[i] );
539	for ( i; n ) remove_first( lst );
540	totalOpsDone += n;
541	}
542	stopTimer();
543	reportedDuration = getTimerDuration() / totalOpsDone;
544	\end{cfa}
545
546	One experimental round is, first, a tight loop of inserting $n$ elements into a list, followed by another, to remove these $n$ elements.
547	A counter is incremented by $n$ each round.
548	When the whole experiment is done, the total elapsed time, divided by final value of the operation counter,
549	is reported as the observed mean operation duration.
550	In a scatterplot presentation, each dot would be one such reported mean duration.
551	So, ``operation'' really means insert + remove + harness overhead.
552
553	The harness overheads are held constant when comparing linked-list implementations.
554	The remainder of the setup section discusses the choices that affected the harness overhead.
555
556	An \emph{iterators' array} provides support for element-level operations on non-intrusive lists.
557	As elaborated in Section \ref{toc:lst:issue:attach},
558	wrapped-attachment lists use a distinct type (at a distinct memory location) to represent ``an item that's in the list.''
559	Operations like insert-after and remove-here consume iterators.
560	In the STL implementation, an iterator is a pointer to a \lstinline{std::_List_node}.
561	For the STL case, the driver obtains an iterator value
562	at the time of adding to the list, and stores the iterator in an array, for consumption by subsequent element-oriented operations.
563	For intrusive-list cases, the driver stores the user object's address in the iterators' array.
564
565	\begin{c++}
566	// further simplified harness (bookkeeping elided): STL implementation,
567	// stack movement, insert-first polarity, element-based remove access
568	list< item_t * > lst;
569	item_t items[ n ];
570	while ( SHOULD_CONTINUE ) {
571	@list< item_t * >::iterator iters[ n ];@
572	for ( int i = 0; i < n; i += 1 ) {
573	lst.push_front( & items[i] );
574	@iters[i]@ = lst.begin();
575	}
576	for ( int i = 0; i < n; i += 1 ) {
577	lst.erase( @iters[i]@ );
578	}
579	}
580	\end{c++}
581
582	%~MONITOR
583	% If running insert-random scenarios, revise the assessment
584
585	A \emph{shuffling array} helps control the memory layout of user items.
586	The control required is when choosing a next item to insert.
587	The user items are allocated in a contiguous array.
588	Without shuffling, the driver's insert phase visits these items in order, producing a list whose adjavency links hop uniform strides.
589	With shuffling active, the driver's insert phase visits only the shuffling array in order,
590	which applies pseudo-random indirection to the selection of a next-to-insert element from the user-item array.
591	The result is a list whose links travel randomly far.
592
593	\begin{cfa}
594	// harness (bookkeeping and iterators elided): CFA implementation,
595	// stack movement, insert-first polarity, head-mediated access
596	dlist( item_t ) lst;
597	item_t items[ n ];
598	size_t insert_ord[ n ]; // elided: populate with shuffled [0,n)
599	while ( SHOULD_CONTINUE ) {
600	for ( i; n ) insert_first( lst, items[ @insert_ord[@ i @]@ ] );
601	for ( i; n ) remove_first( lst );
602	}
603	\end{cfa}
604
605	\emph{Interleaving} allows for movements other than pure stack and queue.
606	Note that the earlier example of using the iterators' array is still a pure stack: the item selected for @erase(...)@ is always the first.
607	Including a less predictable movement is important because real applications that justify doubly linked lists use them.
608	Freedom to remove from arbitrary places (and to insert under more relaxed assumptions) is the characteristic function of a doubly linked list.
609	A queue with drop-out is an example of such a movement.
610	A list implementation can show unrepresentative speed under a simple movement, for example, by enjoying unchallenged ``Is first element?'' branch predictions.
611
612	Interleaving brings ``at middle of list'' cases into a stream of add or remove invocations, which would otherwise be exclusively ``at end''.
613	A chosen split, like half middle and half end, populates a boolean array, which is then shuffled.
614	These booleans then direct the action to end-\vs-middle.
615
616	\begin{cfa}
617	// harness (bookkeeping and shuffling elided): CFA implementation,
618	// stack movement, insert-first polarity, interleaved element-based remove access
619	dlist( item_t ) lst;
620	item_t items[ n ];
621	@bool interl[ n ];@ // elided: populate with weighted, shuffled [0,1]
622	while ( SHOULD_CONTINUE ) {
623	item_t * iters[ n ];
624	for ( i; n ) {
625	insert_first( items[i] );
626	iters[i] = & items[i];
627	}
628	@item_t ** crsr[ 2 ]@ = { // two cursors into iters
629	& iters[ @0@ ], // at stack-insert-first's removal end
630	& iters[ @n / interl_frac@ ] // in middle
631	};
632	for ( i; n ) {
633	item *** crsr_use = & crsr[ interl[ i ] ]@;
634	remove( *** crsr_use ); // removing from either middle or end
635	*crsr_use += 1; // that item is done
636	}
637	assert( crsr[0] == & iters[ @n / interl_frac@ ] ); // through second's start
638	assert( crsr[1] == & iters[ @n@ ] ); // did the rest
639	}
640	\end{cfa}
641
642	By using the pair of cursors, the harness avoids branches, which could incur prediction stall times themselves, or prime a branch in the SUT.
643	This harness avoids telling the hardware what the SUT is about to do.
644
645	These experiments are single threaded. They run on a PC with a 64-bit eight-core AMD FX-8370E, with ``taskset'' pinning to core \#6. The machine has 16 GB of RAM and 8 MB of last-level cache.
646
647	The comparator linked-list implementations are:
648	\begin{description}
649	\item[lq-list] The @list@ type of LQ from glibc of GCC-11.
650	\item[lq-tailq] The @tailq@ type of the same.
651	\item[upp-upp] uC++ provided @uSequence@
652	\item[cfa-cfa] \CFA's @dlist@
653	\end{description}
654
655
656	\subsubsection{Result: Coarse comparison of styles}
657
658	This comparison establishes how an intrusive list performs, compared with a wrapped-reference list.
659	It also establishes the context within which it is meaningful to compare one intrusive list to another.
660
661	%These goals notwithstanding, the effect of the host machine's memory hierarchy is more significant here than linked-list implementation.
662
663	\begin{figure}
664	\centering
665	\begin{tabular}{c}
666	\includegraphics{plot-list-zoomout-shuf.pdf} \\
667	(a) \\
668	\includegraphics{plot-list-zoomout-noshuf.pdf} \\
669	(b) \\
670	\end{tabular}
671	\caption{Operation duration \vs list length at full spectrum of list lengths. One example operation is shown: stack movement, insert-first polarity and head-mediated access. Lengths go as large as completes without error. Version (a) uses shuffled items, while version (b) links items with their physical neighbours.}
672	\label{fig:plot-list-zoomout}
673	\end{figure}
674
675	\VRef[Figure]{fig:plot-list-zoomout} presents the speed measures at various list lengths.
676	STL's wrapped-reference list begins with operations on a length-1 list costing around 30 ns.
677	This time grows modetly as list length increases, apart from more drastic worsening at the largest lengths.
678	STL's performance is not affected by element order in memory.
679	The field of intrusive lists begins with length-1 operations costing around 10 ns and enjoys a ``sweet spot'' in lengths 10--100 of 5--7-ns operations.
680	This much is also unaffected by element order.
681	Beyond this point, shuffled-element list performance worsens drastically, losing to STL beyond about half a million elements, and never particularly leveling off.
682	In the same range, an unshuffled list sees some degradation, but holds onto a 1--2 $\times$ speedup over STL.
683
684	The apparent intrusive ``sweet spot,'' particularly its better-than-length-1 speed, is not because of list operations truly running faster.
685	Rather, the worsening as length decreases reflects the per-operation share of harness overheads incurred at the outer-loop level.
686	Disabling the harness's ability to drive interleaving, even though the current scenario is using a ``never work in middle'' interleave, made this rise disappear.
687	Subsequent analyses use length-controlled relative performance when comparing intrusive implementations, making this curiosity disappear.
688
689	In a wrapped-reference list, list nodes are allocated separately from the items put into the list.
690	Intrusive beats wrapped at the smaller lengths, and when shuffling is avoided, because intrusive avoids dynamic memory allocation for list nodes.
691
692	The remaining big-swing comparison points say more about a computer's memory hierarchy than about linked lists.
693	The tests in this chapter are only inserting and removing.
694	They are not operating on any user payload data that is being listed.
695	The drastic differences at large list lengths reflect differences in link-field storage density and in correlation of link-field order to element order.
696	These differences are inherent to the two list models.
697
698	The slowdown of shuffled intrusive occurs as the experiment's length grows from last-level cache, into main memory.
699	Insert and remove operations act on both sides of a link.
700	Both a next unlisted item to insert (found in the items' array, seen through the shuffling array), and a next listed item to remove (found by traversing list links), introduce a new user-item location.
701	Each time a next item is processed, the memory access is a hop to a randomly far address.
702	The target is not available in cache and a slowdown results.
703
704	With the unshuffled intrusive list, each link connects to an adjacent location. So, this case has high memory locality and stays fast. But the unshuffled assumption is simply not realistic: if you know items are adjacent, you don't need a linked list.
705
706	A wrapped-reference list's separate nodes are allocated right beside each other in this experiment, because no other memory allocation action is happening.
707	As a result, the interlinked nodes of the STL list are generally referenceing their immediate neighbours.
708	This pattern occurs regardless of user-item suffling because this test's ``use'' of the user-items' array is limited to storing element addresses.
709	This experiment, driving an STL list, is simply not touching the memory that holds the user data.
710	Because the interlinked nodes, being the only touched memory, are generally adjacent, this case too has high memory locality and stays fast.
711	But the user-data no-touch assumption is often unrealistic: decisions like,``Should I remove this item?'' need to look at the item.
712	In an odd scenario where this intuition is incorrect, and where furthermore the program's total use of the memory allocator is sufficiently limited to yield approximately adjacent allocations for successive list insertions, a nonintrusive list may be preferred for lists of approximately the chache's size.
713
714	Therefore, under clearly typical situational assumptions, both intrusive and wrapped-reference lists will suffer similarly from a large list overfilling the memory cache, experiencing degradation like shuffled intrusive shows here.
715
716	But the comparison of unshuffled intrusive with wrapped-reference gives the peformance of these two styles, with their the common impediment of overfilling the cache removed.
717	Intrusive consistently beats wrapped-reference by about 20 ns, at all sizes.
718	This difference is appreciable below list length 0.5 M, and enormous below 10 K.
719
720
721	\section{Result: Comparing intrusive implementations}
722
723	The preceding result shows that intrusive implementations have noteworthy performance differences below 150 nodes.
724	This analysis zooms in on this area and identifies the participants.
725
726	\begin{figure}
727	\centering
728	\begin{tabular}{c}
729	\includegraphics{plot-list-zoomin-abs.pdf} \\
730	(a) \\
731	\includegraphics{plot-list-zoomin-rel.pdf} \\
732	(b) \\
733	\end{tabular}
734	\caption{Operation duration \vs list length at small-medium lengths. One example operation is shown: stack movement, insert-first polarity and head-mediated access. (a) has absolute times. (b) has times relative to those of LQ-\lstinline{tailq}.}
735	\label{fig:plot-list-zoomin}
736	\end{figure}
737
738	In \VRef{fig:plot-list-zoomin} part (a) shows exactly this zoom-in.
739	The same scenario as the coarse comparison is used: a stack, with insertions and removals happening at the end called ``first,'' ``head'' or ``front,'' and all changes occuring through a head-provided insert/remove operation.
740	The error bars show fastest and slowest time seen on five trials, and the central point is the mean of the remaining three trials.
741	For readability, the frameworks are slightly staggered in the horizontal, but all trials near a given size were run at the same size.
742
743	For this particular operation, uC++ fares the worst, followed by \CFA, then LQ's @tailq@.
744	Its @list@ does the best at smaller lengths but loses its edge above a dozen elements.
745
746	Moving toward being able to consider several scenarios, \VRef{fig:plot-list-zoomin} part (b) shows the same result, adjusted to treat @tailq@ as a benchmark, and expressing all the results relative to it.
747	This change does not affect the who-wins statements, it just removes the ``sweet spot'' bend that the earlier discussion dismissed as incidental.
748	Runs faster than @tailq@'s are below the zero and slower runs are above; @tailq@'s mean is always zero by definition, but its error bars, representing a single scenario's re-run stability, are still meaningful.
749	With this bend straightened out, aggregating across lengths is feasible.
750
751	\begin{figure}
752	\centering
753	\begin{tabular}{c}
754	\includegraphics{plot-list-cmp-exout.pdf} \\
755	(a) \\
756	\includegraphics{plot-list-cmp-survey.pdf} \\
757	(b) \\
758	\end{tabular}
759	\caption{Operation duration ranges across operational scenarios. (a) has the supersets of the running example operation. (b) has the first-level slices of the full space of operations.}
760	\label{fig:plot-list-cmp-overall}
761	\end{figure}
762
763	\VRef{fig:plot-list-cmp-overall} introduces the resulting format.
764	Part (a)'s first column summarizes all the data of \VRef{fig:plot-list-zoomin}-(b).
765	Its x-axis label, ``stack/insfirst/allhead,'' names the concrete scenario that has been discussed until now.
766	Moving across the columns, the next three each stretch to include more scenarios on each of the operation dimensions, one at a time.
767	The second column considers the scenarios $\{\mathrm{stack}\} \times \{\mathrm{insfirst}\} \times \{\mathrm{allhead}, \mathrm{inselem}, \mathrm{remelem}\}$,
768	while the third stretches polarity and the fourth streches accessor.
769	Then next three columns each stretch two scenario dimensions and the last column stretches all three.
770	The \CFA bar in the last column is summarizing 840 test-program runs: 14 list lengths, 2 movements, 2 polarities, 3 accessors and 5 repetitions.
771
772	In the earlier plots of one scenario broken down by length, each data point, with its error bars, represents just 5 repetitions.
773	With a couple exceptions, this reproducibility error was small.
774	Now, for a \CFA bar, summarizing 70 (first column) to 840 (last column) runs, a bar's height is dominated by the different behaviours of the scenarios and list length that it summarizes.
775	Accordingly, the first column's bars are short and last one's are tall.
776	A box represents the inner 68\% of the durations, while its lines extend to cover 95\%.
777	The symbol on the bar is the mean duration.
778
779	The chosen benchmark of LQ-@tailq@ is not shown in this format because it would be trivial here.
780	With iter-scenario differences dominating the bar size, and @tailq@'s mean performance defined to be zero in all scenarios, a @tailq@ bar on this plot would only show @tailq@'s re-run stabiity, which is of no comparison value.
781
782	The LQ-@list@ implementation does not support all scenarios, only stack movement with insert-first polarity.
783	So, its 1, 3, 4 and 7\textsuperscript{th} bars all summarize the same set of points (those with accessor constrained to all-head), as do its 2, 5, 6 and 8\textsuperscript{th} (those with accessor unconstrained).
784
785	Rather than exploring from one scenario out, \VRef{fig:plot-list-cmp-overall}-(b) gives a more systematic breakdown of the entire experimental space.
786	Other than the last grand-total column, each breakdown column shows one value from one operation dimension.
787
788	LQ-@list@'s partial scenario coverage gives missing bars where it does not support the operation.
789	And, again, it gives repetition where all data points occur in several columns' intersection, such as stack// and /insfirst/.
790
791	In the grand total, and in all halves by movement or polarity, \CFA and uC++ are equivalent, while LQ-@list@ beats them slightly.
792	Splitting on accessor, \CFA has a poor result on element removal, LQ-@list@ has a great result on the other accessors, and uC++ is unaffected.
793	The unseen @tailq@ dominates across every category and beats \CFA and uC++ by 15--20\%.
794
795	% \begin{figure}
796	% \centering
797	% \begin{tabular}{c}
798	% \includegraphics{plot-list-cmp-intrl-shift.pdf} \\
799	% (a) \\
800	% \includegraphics{plot-list-cmp-intrl-outcome.pdf} \\
801	% (b) \\
802	% \end{tabular}
803	% \caption{Caption TODO}
804	% \label{fig:plot-list-cmp-intrl}
805	% \end{figure}
806
807	\section{Result: CFA cost attribution}
808
809	This comparison loosely itemizes the reasons that the \CFA implementation runs 15--20\% slower than LQ. Each reason provides for safer programming. For each reaon, a version of the \CFA list was measured that forgoes its safety and regains some performance. These potential sacrifices are:
810	\newcommand{\mandhead}{\emph{mand-head}}
811	\newcommand{\nolisted}{\emph{no-listed}}
812	\newcommand{\noiter}{\emph{no-iter}}
813	\begin{description}
814	\item[mand(atory)-head] Removing support for headless lists.
815	A specific explanation of why headless support causes a slowdown is not offered.
816	But it is reasonable for a cost to result from making one pieceof code handle multiple cases; the subset of the \CFA list API that applies to headless lists shares its implementation with headed lists.
817	In the \mandhead case, disabling the feature in \CFA means using an older version of the implementation, from before headless support was added.
818	In the pre-headless library, trying to form a headless list (instructing, ``Insert loose element B after loose element A,'') is a checked runtime error.
819	LQ does not support headless lists\footnote{
820	Though its documentation does not mention the headless use case, this fact is due to one of its insert-before or insert-after routines being unusable in every list model.
821	For \lstinline{tailq}, the API requires a head.
822	For \lstinline{list}, this usage causes an ``uncaught'' runtime crash.}.
823	\item[no-listed] Removing support for the @is_listed@ API query.
824	Along with it goes error checking such as ``When instering an element, it must not already be listed, \ie be referred to from somewhere else.''
825	These abilities have a cost because, in order to support them, a listed element that is being removed must be written to, to record its change in state.
826	In \CFA's representation, this cost is two pointer writes.
827	To disable the feature, these writes, and the error checking that consumes their result, are put behind an @#ifdef@.
828	The result is that a removed element sees itself as still having neighbours (though these quasi-neighbours see it differently).
829	This state is how LQ leaves a removed element; LQ does not offer an is-listed query.
830	\item[no-iter(ation)] Removing support for well-terminating iteration.
831	The \CFA list uses bit-manipulation tagging on link poiters (rather than \eg null links) to express, ``No more elements this way.''
832	This tagging has the cost of submitting a retrieved value to the ALU, and awaiting this operation's completion, before dereferencing a link pointer.
833	In some cases, the is-terminating bit is transferred from one link to another, or has a similar influence on a resulting link value; this logic adds register pressure and more data dependency.
834	To disable the feature, the @#ifdef@-controlled tag manipulation logic compiles in answers like, ``No, that link is not a terminator,'' ``The dereferenceable pointer is the value you read from memory,'' and ``The terminator-marked value you need to write is the pointer you started with.''
835	Without this termination marking, repeated requests for a next valid item will always provide a positive response; when it should be negative, the indicated next element is garbage data at an address unlikely to trigger a memory error.
836	LQ has a well-terminating iteration for listed elements.
837	In the \noiter case, the slowdown is not inherent; it represents a \CFA optimization opporunity.
838	\end{description}
839	\MLB{Ensure benefits are discussed earlier and cross-reference} % an LQ programmer must know not to ask, ``Who's next?'' about an unlisted element; an LQ programmer cannot write assertions about an item being listed; LQ requiring a head parameter is an opportunity for the user to provide inconsistent data
840
841	\begin{figure}
842	\centering
843	\begin{tabular}{c}
844	\includegraphics{plot-list-cfa-attrib.pdf} \\
845	(a) \\
846	\includegraphics{plot-list-cfa-attrib-remelem.pdf} \\
847	(b) \\
848	\end{tabular}
849	\caption{Operation duration ranges for functionality-reduced \CFA list implementations. (a) has the top level slices. (b) has the next level of slicing within the slower element-based removal operation.}
850	\label{fig:plot-list-cfa-attrib}
851	\end{figure}
852
853	\VRef[Figure]{fig:plot-list-cfa-attrib} shows the \CFA list performance with these features, and their combinations, turned on and off. When a series name is one of the three sacrifices above, the series is showing this sacrifice in isolation. These further series names give combinations:
854	\newcommand{\attribFull}{\emph{full}}
855	\newcommand{\attribParity}{\emph{parity}}
856	\newcommand{\attribStrip}{\emph{strip}}
857	\begin{description}
858	\item[full] No sacrifices. Same as measurements presented earlier.
859	\item[parity] \mandhead + \nolisted. Feature parity with LQ.
860	\item[strip] \mandhead + \nolisted + \noiter. All options set to ``faster.''
861	\end{description}
862	All list implementations are \CFA, possibly stripped.
863	The plot uses the same LQ-relative basis as earlier.
864	So getting to zero means matching LQ's @tailq@.
865
866	\VRef[Figure]{fig:plot-list-cfa-attrib}-(a) summarizes the time attribution across the main operating scenarios.
867	The \attribFull series is repeated from \VRef[Figure]{fig:plot-list-cmp-overall}, part (b), while the series showing feature sacrifices are new.
868	Going all the way to \attribStrip at least nearly matches LQ in all operating scenarios, beats LQ often, and slightly beats LQ overall.
869	Except within the accessor splits, both sacrifices contribute improvements individually, \noiter helps more than \attribParity, and the total \attribStrip benefit depends on both contributions.
870	When the accessor is not element removal, the \attribParity shift appears to be counterproductive, leaving \noiter to deliver most of the benefit.
871	For element removals, \attribParity is the heavy hitter, with \noiter contributing modestly.
872
873	The couterproductive shift outside of element removals is likely due to optimization done in the \attribFull version after implementing headless support, \ie not present in the \mandhead version.
874	This work streamlined both head-based operations (head-based removal being half the work of the element-insertion test).
875	This improvement could be ported to a \mandhead-style implementation, which would bring down the \attribParity time in these cases.
876
877	More significantly, missing this optimization affects every \attribParity result because they all use head-based inserts or removes for at least half their operations.
878	It is likely a reason that \attribParity is not delivering as well overall as \noiter.
879	It even represents plausible further improvements in \attribStrip.
880
881	\VRef[Figure]{fig:plot-list-cfa-attrib}-(b) addresses element removal being the overall \CFA slow spot and element removal having a peculiar shape in the (a) analysis.
882	Here, the \attribParity sacrifice bundle is broken out into its two consituents.
883	The result is the same regardless of the operation.
884	All three individual sacrifices contribute noteworthy improvements (\nolisted slightly less).
885	The fullest improvement requires all of them.
886
887	The \noiter feature sacrifice is unpalatable.
888	But because it is not an inherent slowdown, there may be room to pursue a \noiter-level speed improvement without the \noiter feature sacrifice.
889	The performance crux for \noiter is the pointer-bit tagging scheme.
890	Alternative designs that may offer speedup with acceptable consequences include keeping the tag information in a separate field, and (for 64-bit architectures) keeping it in the high-order byte \ie using byte- rather than bit-oriented instructions to access it.
891	The \noiter speed improvement would bring \CFA to +5\% of LQ overall, and from high twenties to high teens, in the worst case of element removal.
892
893	Utimately, this analysis provides options for a future effort that needs to get the most speed out of the \CFA list.
894
895
896
897
898	\section{Future Work}
899	\label{toc:lst:futwork}
900
901	The \CFA list examples elide the \lstinline{P9_EMBEDDED} annotations that (TODO: xref P9E future work) proposes to obviate.
902	Thus, these examples illustrate a to-be state, free of what is to be historic clutter.
903	The elided portions are immaterial to the discussion and the examples work with the annotations provided.
904	The \CFA test suite (TODO:cite?) includes equivalent demonstrations, with the annotations included.
905	\begin{cfa}
906	struct mary {
907	float anotherdatum;
908	inline dlink(mary);
909	};
910	struct fred {
911	float adatum;
912	inline struct mine { inline dlink(fred); };
913	inline struct yours { inline dlink(fred); };
914	};
915	\end{cfa}
916	like in the thesis examples. You have to say
917	\begin{cfa}
918	struct mary {
919	float anotherdatum;
920	inline dlink(mary);
921	};
922	P9_EMBEDDED(mary, dlink(mary))
923	struct fred {
924	float adatum;
925	inline struct mine { inline dlink(fred); };
926	inline struct yours { inline dlink(fred); };
927	};
928	P9_EMBEDDED(fred, fred.mine)
929	P9_EMBEDDED(fred, fred.yours)
930	P9_EMBEDDED(fred.mine, dlink(fred))
931	P9_EMBEDDED(fred.yours, dlink(fred))
932	\end{cfa}
933	like in tests/list/dlist-insert-remove.
934	Future work should autogen those @P9_EMBEDDED@ declarations whenever it sees a plan-9 declaration.
935	The exact scheme chosen should harmonize with general user-defined conversions.
936
937	Today's P9 scheme is: mary gets a function `inner returning this as dlink(mary).
938	Fred gets four of them in a diamond.
939	They're defined so that `inner is transitive; i.e. fred has two further ambiguous overloads mapping fred to dlink(fred).
940	The scheme allows the dlist functions to give the assertion, "we work on any T that gives a `inner to dlink(T)."
941
942
943	TODO: deal with: A doubly linked list is being designed.
944
945	TODO: deal with: Link fields are system-managed.
946	Links in GDB.
947

Note: See TracBrowser for help on using the repository browser.

Download in other formats: