| [5717495] | 1 | \chapter{Linked List}
|
|---|
| [74accc6] | 2 | \label{ch:list}
|
|---|
| [5717495] | 3 |
|
|---|
| [4740281] | 4 | This chapter presents my work on designing and building a linked-list library for \CFA.
|
|---|
| [0f9c67bf] | 5 | Due to time limitations and the needs expressed by the \CFA runtime developers, I focussed on providing a doubly-linked list, and its bidirectional iterators for traversal.
|
|---|
| [4740281] | 6 | Simpler data-structures, like stack and queue, can be built from the doubly-linked mechanism with only a slight storage/performance cost because of the unused link field.
|
|---|
| [0f9c67bf] | 7 | Reducing to data-structures with a single link follows directly from the more complex double links and its iterators.
|
|---|
| [5717495] | 8 |
|
|---|
| 9 |
|
|---|
| [9367cd6] | 10 | \section{Plan-9 Inheritance}
|
|---|
| 11 | \label{s:Plan9Inheritance}
|
|---|
| 12 |
|
|---|
| 13 | This chapter uses a form of inheritance from the Plan-9 C dialect~\cite[\S~3.3]{Thompson90new}, which is supported by @gcc@ and @clang@ using @-fplan9-extensions@.
|
|---|
| 14 | \CFA has its own variation of the Plan-9 mechanism, where the nested type is denoted using @inline@.
|
|---|
| 15 | \begin{cfa}
|
|---|
| 16 | union U { int x; double y; char z; };
|
|---|
| 17 | struct W { int i; double j; char k; };
|
|---|
| 18 | struct S {
|
|---|
| 19 | @inline@ struct W; $\C{// extended Plan-9 inheritance}$
|
|---|
| 20 | unsigned int tag;
|
|---|
| 21 | @inline@ U; $\C{// extended Plan-9 inheritance}$
|
|---|
| 22 | } s;
|
|---|
| 23 | \end{cfa}
|
|---|
| 24 | Inline inheritance is containment, where the inlined field is unnamed and the type's internal fields are hoisted into the containing structure.
|
|---|
| 25 | Hence, the field names must be unique, unlike \CC nested types, but any inlined type-names are at a nested scope level, unlike aggregate nesting in C.
|
|---|
| 26 | Note, the position of the containment is normally unimportant, unless there is some form of memory or @union@ overlay.
|
|---|
| 27 | Finally, the inheritance declaration of @U@ is not prefixed with @union@.
|
|---|
| 28 | Like \CC, \CFA allows optional prefixing of type names with their kind, \eg @struct@, @union@, and @enum@, unless there is ambiguity with variable names in the same scope.
|
|---|
| 29 |
|
|---|
| 30 | \VRef[Figure]{f:Plan9Polymorphism} shows the key polymorphic feature of Plan-9 inheritance: implicit conversion of values and pointers for nested types.
|
|---|
| 31 | In the example, there are implicit conversions from @S@ to @U@ and @S@ to @W@, extracting the appropriate value or pointer for the substructure.
|
|---|
| 32 | \VRef[Figure]{f:DiamondInheritancePattern} shows complex multiple inheritance patterns are possible, like the \newterm{diamond pattern}~\cite[\S~6.1]{Stroustrup89}\cite[\S~4]{Cargill91}.
|
|---|
| [bb9897c] | 33 | Currently, the \CFA type-system does not support @virtual@ inheritance.
|
|---|
| [9367cd6] | 34 |
|
|---|
| 35 | \begin{figure}
|
|---|
| 36 | \begin{cfa}
|
|---|
| 37 | void f( U, U * ); $\C{// value, pointer}$
|
|---|
| 38 | void g( W, W * ); $\C{// value, pointer}$
|
|---|
| 39 | U u, * up; W w, * wp; S s, * sp;
|
|---|
| 40 | u = s; up = sp; $\C{// value, pointer}$
|
|---|
| 41 | w = s; wp = sp; $\C{// value, pointer}$
|
|---|
| 42 | f( s, &s ); $\C{// value, pointer}$
|
|---|
| 43 | g( s, &s ); $\C{// value, pointer}$
|
|---|
| 44 | \end{cfa}
|
|---|
| [9c12dd0] | 45 | \caption{Plan-9 polymorphism}
|
|---|
| [9367cd6] | 46 | \label{f:Plan9Polymorphism}
|
|---|
| 47 | \end{figure}
|
|---|
| 48 |
|
|---|
| 49 | \begin{figure}
|
|---|
| 50 | \setlength{\tabcolsep}{10pt}
|
|---|
| 51 | \begin{tabular}{ll@{}}
|
|---|
| 52 | \multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{\CFA} \\
|
|---|
| 53 | \begin{c++}
|
|---|
| 54 | struct B { int b; };
|
|---|
| 55 | struct L : @public B@ { int l, w; };
|
|---|
| 56 | struct R : @public B@ { int r, w; };
|
|---|
| 57 | struct T : @public L, R@ { int t; };
|
|---|
| [bb9897c] | 58 | T
|
|---|
| 59 | B L B R
|
|---|
| [9367cd6] | 60 | T t = { { { 5 }, 3, 4 }, { { 8 }, 6, 7 }, 3 };
|
|---|
| 61 | t.t = 42;
|
|---|
| 62 | t.l = 42;
|
|---|
| 63 | t.r = 42;
|
|---|
| 64 | ((L &)t).b = 42; // disambiguate
|
|---|
| 65 | \end{c++}
|
|---|
| 66 | &
|
|---|
| 67 | \begin{cfa}
|
|---|
| 68 | struct B { int b; };
|
|---|
| 69 | struct L { @inline B;@ int l, w; };
|
|---|
| 70 | struct R { @inline B;@ int r, w; };
|
|---|
| 71 | struct T { @inline L; inline R;@ int t; };
|
|---|
| [bb9897c] | 72 | T
|
|---|
| 73 | B L B R
|
|---|
| [9367cd6] | 74 | T t = { { { 5 }, 3, 4 }, { { 8 }, 6, 7 }, 3 };
|
|---|
| 75 | t.t = 42;
|
|---|
| 76 | t.l = 42;
|
|---|
| 77 | t.r = 42;
|
|---|
| 78 | ((L &)t).b = 42; // disambiguate, proposed solution t.L.b = 42;
|
|---|
| 79 | \end{cfa}
|
|---|
| 80 | \end{tabular}
|
|---|
| [9c12dd0] | 81 | \caption{Diamond non-virtual inheritance pattern}
|
|---|
| [9367cd6] | 82 | \label{f:DiamondInheritancePattern}
|
|---|
| 83 | \end{figure}
|
|---|
| 84 |
|
|---|
| 85 |
|
|---|
| [4740281] | 86 | \section{Features}
|
|---|
| [5717495] | 87 |
|
|---|
| [4740281] | 88 | The following features directed this project, where the goal is high-performance list operations required by \CFA runtime components, like the threading library.
|
|---|
| [5717495] | 89 |
|
|---|
| 90 |
|
|---|
| 91 | \subsection{Core Design Issues}
|
|---|
| 92 |
|
|---|
| [bb9897c] | 93 | The doubly-linked list attaches links intrusively in a node, allows a node to appear on multiple lists (axes) simultaneously, integrates with user code via the type system, treats its ends uniformly, and identifies a list using an explicit head.
|
|---|
| [b195498] | 94 | This design covers system and data management issues stated in \VRef{toc:lst:issue}.
|
|---|
| [4740281] | 95 |
|
|---|
| [9367cd6] | 96 | \VRef[Figure]{fig:lst-features-intro} continues the running @req@ example from \VRef[Figure]{fig:lst-issues-attach} using the \CFA list @dlist@.
|
|---|
| [b195498] | 97 | The \CFA link attachment is intrusive so the resulting memory layout is per user node, as for the LQ version of \VRef[Figure]{f:Intrusive}.
|
|---|
| [9367cd6] | 98 | The \CFA framework provides generic type @dlink( T )@ (not to be confused with @dlist@) for the two link fields (front and back).
|
|---|
| 99 | A user inserts the links into the @req@ structure via \CFA inline-inheritance \see{\VRef{s:Plan9Inheritance}}.
|
|---|
| 100 | Lists leverage the automatic conversion of a pointer to anonymous inline field for assignments and function calls.
|
|---|
| 101 | Therefore, a reference to a @req@ is implicitly convertible to @dlink@ in both contexts.
|
|---|
| 102 | The links in @dlist@ point at another embedded @dlist@ node, know the offsets of all links (data is abstract but accessible), and any field-offset arithmetic or link-value changes are safe and abstract.
|
|---|
| [5717495] | 103 |
|
|---|
| 104 | \begin{figure}
|
|---|
| [4740281] | 105 | \lstinput{20-30}{lst-features-intro.run.cfa}
|
|---|
| [fa8d17f] | 106 | \caption[Multiple link axes in \CFA list library]{
|
|---|
| [5d9c4bb] | 107 | Demonstration of the running \lstinline{req} example, done using the \CFA list library.
|
|---|
| [b195498] | 108 | This example is equivalent to the three approaches in \VRef[Figure]{fig:lst-issues-attach}.
|
|---|
| [5717495] | 109 | }
|
|---|
| [5d9c4bb] | 110 | \label{fig:lst-features-intro}
|
|---|
| [5717495] | 111 | \end{figure}
|
|---|
| 112 |
|
|---|
| [9367cd6] | 113 | \VRef[Figure]{f:dlistOutline} shows the outline for types @dlink@ and @dlist@.
|
|---|
| 114 | Note, the first @forall@ clause is distributed across all the declaration in its containing block, eliminating repetition on each declaration.
|
|---|
| 115 | The second nested @forall@ on @dlist@ is also distributed and adds an optional second type parameter, @tLinks@, denoting the linking axis \see{\VRef{s:Axis}}, \ie the kind of list this node can appear on.
|
|---|
| 116 |
|
|---|
| 117 | \begin{figure}
|
|---|
| 118 | \begin{cfa}
|
|---|
| 119 | forall( tE & ) { $\C{// distributed}$
|
|---|
| 120 | struct @dlink@ { ... }; $\C{// abstract type}$
|
|---|
| 121 | static inline void ?{}( dlink( tE ) & this ); $\C{// constructor}$
|
|---|
| 122 |
|
|---|
| 123 | forall( tLinks & = dlink( tE ) ) { $\C{// distributed, default type for axis}$
|
|---|
| 124 | struct @dlist@ { ... }; $\C{// abstract type}$
|
|---|
| 125 | static inline void ?{}( dlist( tE, tLinks ) & this ); $\C{// constructor}$
|
|---|
| 126 | }
|
|---|
| 127 | }
|
|---|
| 128 | \end{cfa}
|
|---|
| [9c12dd0] | 129 | \caption[Sketch of the dlink and dlist type definitions]{Sketch of the \lstinline{dlink} and \lstinline{dlist} type definitions }
|
|---|
| [9367cd6] | 130 | \label{f:dlistOutline}
|
|---|
| 131 | \end{figure}
|
|---|
| 132 |
|
|---|
| [bb9897c] | 133 | \VRef[Figure]{fig:lst-features-multidir} shows how the \CFA library supports multi-inline links, so a node has multiple axes.
|
|---|
| [4740281] | 134 | The declaration of @req@ has two inline-inheriting @dlink@ occurrences.
|
|---|
| 135 | The first of these gives a type named @req.by_pri@, @req@ inherits from it, and it inherits from @dlink@.
|
|---|
| 136 | The second line @req.by_rqr@ is similar to @req.by_pri@.
|
|---|
| [9367cd6] | 137 | Thus, there is a diamond, non-virtual, inheritance from @req@ to @dlink@, with @by_pri@ and @by_rqr@ being the mid-level types \see{\VRef[Figure]{f:DiamondInheritancePattern}}.
|
|---|
| [4740281] | 138 |
|
|---|
| [9367cd6] | 139 | The declarations of the list-head objects, @reqs_pri@, @reqs_rqr_42@, @reqs_rqr_17@, and @reqs_rqr_99@, bind which link nodes in @req@ are used by this list.
|
|---|
| [bb9897c] | 140 | Hence, the type of the variable @reqs_pri@, @dlist(req, req.by_pri)@, means operations on @reqs_pri@ implicitly select (disambiguate) the correct @dlink@s, \eg the calls @insert_first(reqs_pri, ...)@ imply, ``here, we are working by priority.''
|
|---|
| [b195498] | 141 | As in \VRef[Figure]{fig:lst-issues-multi-static}, three lists are constructed, a priority list containing all nodes, a list with only nodes containing the value 42, and a list with only nodes containing the value 17.
|
|---|
| [4740281] | 142 |
|
|---|
| [5717495] | 143 | \begin{figure}
|
|---|
| [5d9c4bb] | 144 | \centering
|
|---|
| [bb9897c] | 145 |
|
|---|
| [5d9c4bb] | 146 | \begin{tabular}{@{}ll@{}}
|
|---|
| [bb9897c] | 147 | \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{C} \\
|
|---|
| 148 | \begin{tabular}{@{}l@{}}
|
|---|
| 149 | \lstinput{20-31}{lst-features-multidir.run.cfa} \\
|
|---|
| 150 | \lstinput{43-71}{lst-features-multidir.run.cfa}
|
|---|
| [5d9c4bb] | 151 | \end{tabular}
|
|---|
| [bb9897c] | 152 | &
|
|---|
| 153 | \lstinput[language=C++]{20-60}{lst-issues-multi-static.run.c}
|
|---|
| 154 | \end{tabular}
|
|---|
| [5d9c4bb] | 155 |
|
|---|
| [9c12dd0] | 156 | \caption[Demonstration of multiple static link axes in \CFA]{
|
|---|
| 157 | Demonstration of multiple static link axes in \CFA.
|
|---|
| [bb9897c] | 158 | The right example is from \VRef[Figure]{fig:lst-issues-multi-static}.
|
|---|
| 159 | The left \CFA example does the same job.
|
|---|
| 160 | }
|
|---|
| 161 | \label{fig:lst-features-multidir}
|
|---|
| [5717495] | 162 | \end{figure}
|
|---|
| 163 |
|
|---|
| [9367cd6] | 164 | The list library also supports the common case of single directionality more naturally than LQ.
|
|---|
| [fa8d17f] | 165 | Returning to \VRef[Figure]{fig:lst-features-intro}, the single-axis list has no contrived name for the link axis as it uses the default type in the definition of @dlist@;
|
|---|
| [9367cd6] | 166 | in contrast, the LQ list in \VRef[Figure]{f:Intrusive} adds the unnecessary field name @d@.
|
|---|
| [fa8d17f] | 167 | In \CFA, a single axis list sets up a single inheritance with @dlink@, and the default list axis is to itself.
|
|---|
| [5717495] | 168 |
|
|---|
| [fa8d17f] | 169 | When operating on a list with several axes and operations that do not take the list head, the list axis can be ambiguous.
|
|---|
| [9367cd6] | 170 | For example, a call like @insert_after( r1, r2 )@ does not have enough information to know which axes to select implicitly.
|
|---|
| [4740281] | 171 | Is @r2@ supposed to be the next-priority request after @r1@, or is @r2@ supposed to join the same-requester list of @r1@?
|
|---|
| [9367cd6] | 172 | As such, the \CFA type-system gives an ambiguity error for this call.
|
|---|
| 173 | There are multiple ways to resolve the ambiguity.
|
|---|
| 174 | The simplest is an explicit cast on each call to select the specific axis, \eg @insert_after( (by_pri)r1, r2 )@.
|
|---|
| 175 | However, multiple explicit casts are tedious and error-prone.
|
|---|
| 176 | To mitigate this issue, the list library provides a hook for applying the \CFA language's scoping and priority rules.
|
|---|
| [0d41e600] | 177 | \begin{cfa}
|
|---|
| [fa8d17f] | 178 | with ( DLINK_VIA( req, req.pri ) ) insert_after( r1, r2 );
|
|---|
| [0d41e600] | 179 | \end{cfa}
|
|---|
| [4740281] | 180 | Here, the @with@ statement opens the scope of the object type for the expression;
|
|---|
| [fa8d17f] | 181 | hence, the @DLINK_VIA@ result causes one of the list axes to become a more attractive candidate to \CFA's overload resolution.
|
|---|
| [9367cd6] | 182 | This boost can be applied across multiple statements in a block or an entire function body.
|
|---|
| [4740281] | 183 | \begin{cquote}
|
|---|
| 184 | \setlength{\tabcolsep}{15pt}
|
|---|
| 185 | \begin{tabular}{@{}ll@{}}
|
|---|
| [0d41e600] | 186 | \begin{cfa}
|
|---|
| [9367cd6] | 187 | @with( DLINK_VIA( req, req.pri ) ) {@
|
|---|
| 188 | ... insert_after( r1, r2 ); ...
|
|---|
| 189 | @}@
|
|---|
| [0d41e600] | 190 | \end{cfa}
|
|---|
| 191 | &
|
|---|
| 192 | \begin{cfa}
|
|---|
| [9367cd6] | 193 | void f() @with( DLINK_VIA( req, req.pri ) ) {@
|
|---|
| 194 | ... insert_after( r1, r2 ); ...
|
|---|
| 195 | @}@
|
|---|
| [0d41e600] | 196 | \end{cfa}
|
|---|
| 197 | \end{tabular}
|
|---|
| [4740281] | 198 | \end{cquote}
|
|---|
| [fa8d17f] | 199 | Within the @with@, the code acts as if there is only one list axis, without explicit casting.
|
|---|
| [0d41e600] | 200 |
|
|---|
| [6224eeb] | 201 | Unlike the \CC template collection types, the \CFA library works completely within the type system;
|
|---|
| [9367cd6] | 202 | both @dlink@ and @dlist@ are ordinary types, not language macros.
|
|---|
| [4740281] | 203 | There is no textual expansion other than header-included static-inline function for performance.
|
|---|
| [fa8d17f] | 204 | Hence, errors in user code are reported only with mention of the library's declarations, versus long template names in error messages.
|
|---|
| [9367cd6] | 205 | Finally, the library is separately compiled from the usage code, modulo inlining.
|
|---|
| [5717495] | 206 |
|
|---|
| 207 |
|
|---|
| [0e4e43d] | 208 | \section{List API}
|
|---|
| 209 |
|
|---|
| [9d9fd81] | 210 | \VRef[Figure]{f:ListAPI} shows the API for the doubly-link list operations, where each is explained.
|
|---|
| 211 | \begin{itemize}[leftmargin=*]
|
|---|
| [0e4e43d] | 212 | \item
|
|---|
| [bb9897c] | 213 | @listed@ returns true if the node is an element of a list and false otherwise.
|
|---|
| [0e4e43d] | 214 | \item
|
|---|
| [bb9897c] | 215 | @empty@ returns true if the list has no nodes and false otherwise.
|
|---|
| [0e4e43d] | 216 | \item
|
|---|
| [b195498] | 217 | @first@ returns a reference to the first node of the list without removing it or @0p@ if the list is empty.
|
|---|
| [0e4e43d] | 218 | \item
|
|---|
| [b195498] | 219 | @last@ returns a reference to the last node of the list without removing it or @0p@ if the list is empty.
|
|---|
| [0e4e43d] | 220 | \item
|
|---|
| [9367cd6] | 221 | @insert_before@ adds a node before a specified node \see{\lstinline{insert_last} for insertion at the end}\footnote{
|
|---|
| 222 | Some list packages allow \lstinline{0p} (\lstinline{nullptr}) for the before/after node implying insert/remove at the start/end of the list, respectively.
|
|---|
| 223 | However, this inserts an \lstinline{if} statement in the fastpath of a potentially commonly used list operation.}
|
|---|
| [0e4e43d] | 224 | \item
|
|---|
| [9367cd6] | 225 | @insert_after@ adds a node after a specified node \see{\lstinline{insert_first} for insertion at the start}\footnotemark[\value{footnote}].
|
|---|
| [0e4e43d] | 226 | \item
|
|---|
| [9367cd6] | 227 | @remove@ removes a specified node from the list (any location) and returns a reference to the node.
|
|---|
| [b195498] | 228 | \item
|
|---|
| 229 | @iter@ create an iterator for the list.
|
|---|
| 230 | \item
|
|---|
| 231 | @recede@ returns true if the iterator cursor is advanced to the previous (predecessor, towards first) node before the prior cursor node and false otherwise.
|
|---|
| 232 | \item
|
|---|
| 233 | @advance@ returns true if the iterator cursor is advanced to the next (successor, towards last) node after the prior cursor node and false otherwise.
|
|---|
| 234 | \item
|
|---|
| [bb9897c] | 235 | @first@ returns true if the node is the first node in the list and false otherwise.
|
|---|
| [b195498] | 236 | \item
|
|---|
| [bb9897c] | 237 | @last@ returns true if the node is the last node in the list and false otherwise.
|
|---|
| [0e4e43d] | 238 | \item
|
|---|
| [9367cd6] | 239 | @pred@ returns a reference to the previous (predecessor, towards first) node before a specified node or @0p@ if a specified node is the first node in the list.
|
|---|
| [0e4e43d] | 240 | \item
|
|---|
| [9367cd6] | 241 | @next@ returns a reference to the next (successor, towards last) node after a specified node or @0p@ if a specified node is the last node in the list.
|
|---|
| [0e4e43d] | 242 | \item
|
|---|
| [b195498] | 243 | @insert_first@ adds a node to the start of the list so it becomes the first node and returns a reference to the node for cascading.
|
|---|
| [0e4e43d] | 244 | \item
|
|---|
| [b195498] | 245 | @insert_last@ adds a node to the end of the list so it becomes the last node and returns returns a reference to the node for cascading.
|
|---|
| [0e4e43d] | 246 | \item
|
|---|
| [b195498] | 247 | @remove_first@ removes the first node and returns a reference to it or @0p@ if the list is empty.
|
|---|
| 248 | \item
|
|---|
| 249 | @remove_last@ removes the last node and returns a reference to it or @0p@ if the list is empty.
|
|---|
| [0e4e43d] | 250 | \item
|
|---|
| [9d9fd81] | 251 | @transfer@ transfers all nodes from the @from@ list to the end of the @to@ list; the @from@ list is empty after the transfer.
|
|---|
| [0e4e43d] | 252 | \item
|
|---|
| [9d9fd81] | 253 | @split@ transfers the @from@ list up to node to the end of the @to@ list; the @from@ list becomes the list after the node.
|
|---|
| 254 | The node must be in the @from@ list.
|
|---|
| [0e4e43d] | 255 | \end{itemize}
|
|---|
| [bb9897c] | 256 | For operations @insert_*@, @insert@, and @remove@, a variable-sized list of nodes can be specified using \CFA's tuple type~\cite[\S~4.7]{Moss18} (not discussed), \eg @insert( list, n1, n2, n3 )@, recursively invokes @insert( list, n@$_i$@ )@.\footnote{
|
|---|
| 257 | Currently, a resolver bug between tuple types and references means tuple routines must use pointer parameters.
|
|---|
| 258 | Nevertheless, the imaginary reference versions are used here as the code is cleaner, \eg no \lstinline{&} on call arguments.}
|
|---|
| [0e4e43d] | 259 |
|
|---|
| [9d9fd81] | 260 | \begin{figure}
|
|---|
| 261 | \begin{cfa}
|
|---|
| [bb9897c] | 262 | E & listed( E & node );
|
|---|
| 263 | E & empty( dlist( E ) & list );
|
|---|
| [b195498] | 264 | E & first( dlist( E ) & list );
|
|---|
| 265 | E & last( dlist( E ) & list );
|
|---|
| [9d9fd81] | 266 | E & insert_before( E & before, E & node );
|
|---|
| 267 | E & insert_after( E & after, E & node );
|
|---|
| 268 | E & remove( E & node );
|
|---|
| [b195498] | 269 | E & iter( dlist( E ) & list );
|
|---|
| 270 | bool advance( E && refx );
|
|---|
| 271 | bool recede( E && refx );
|
|---|
| [bb9897c] | 272 | bool first( E & node );
|
|---|
| 273 | bool last( E & node );
|
|---|
| [b195498] | 274 | E & prev( E & node );
|
|---|
| 275 | E & next( E & node );
|
|---|
| [9d9fd81] | 276 | E & insert_first( dlist( E ) & list, E & node );
|
|---|
| [bb9897c] | 277 | E & insert_last( dlist( E ) & list, E & node ); // synonym insert
|
|---|
| [9d9fd81] | 278 | E & remove_first( dlist( E ) & list );
|
|---|
| 279 | E & remove_last( dlist( E ) & list );
|
|---|
| [9c12dd0] | 280 | void transfer( dlist( E ) & to, dlist( E ) & from );
|
|---|
| 281 | void split( dlist( E ) & to, dlist( E ) & from, E & node );
|
|---|
| [9d9fd81] | 282 | \end{cfa}
|
|---|
| [9c12dd0] | 283 | \caption{\CFA list API}
|
|---|
| [9d9fd81] | 284 | \label{f:ListAPI}
|
|---|
| 285 | \end{figure}
|
|---|
| 286 |
|
|---|
| [5717495] | 287 |
|
|---|
| 288 | \subsection{Iteration}
|
|---|
| 289 |
|
|---|
| [0e4e43d] | 290 | It is possible to iterate through a list manually or using a set of standard macros.
|
|---|
| [bb9897c] | 291 | \VRef[Figure]{f:IteratorDriver} shows the iterator outline, managing a list of nodes, used throughout the following iterator examples.
|
|---|
| [0e4e43d] | 292 | Each example assumes its loop body prints the value in the current node.
|
|---|
| 293 |
|
|---|
| [9d9fd81] | 294 | \begin{figure}
|
|---|
| 295 | \begin{cfa}
|
|---|
| 296 | #include <fstream.hfa>
|
|---|
| 297 | #include <list.hfa>
|
|---|
| 298 | struct node {
|
|---|
| 299 | int v;
|
|---|
| 300 | inline dlink(node);
|
|---|
| 301 | };
|
|---|
| 302 | int main() {
|
|---|
| 303 | dlist(node) list;
|
|---|
| 304 | node n1 = { 1 }, n2 = { 2 }, n3 = { 3 }, n4 = { 4 };
|
|---|
| [bb9897c] | 305 | insert( list, n1, n2, n3, n4 ); $\C{// insert in any order}$
|
|---|
| [9d9fd81] | 306 | sout | nlOff;
|
|---|
| [bb9897c] | 307 | @for ( ... )@ sout | it.v | ","; sout | nl; $\C{// iterator examples in text}$
|
|---|
| 308 | remove( n1, n2, n3, n4 ); $\C{// remove in any order}$
|
|---|
| [9d9fd81] | 309 | }
|
|---|
| 310 | \end{cfa}
|
|---|
| [9c12dd0] | 311 | \caption{Iterator driver}
|
|---|
| [bb9897c] | 312 | \label{f:IteratorDriver}
|
|---|
| [9d9fd81] | 313 | \end{figure}
|
|---|
| 314 |
|
|---|
| [0e4e43d] | 315 | The manual method is low level but allows complete control of the iteration.
|
|---|
| 316 | The list cursor (index) can be either a pointer or a reference to a node in the list.
|
|---|
| 317 | The choice depends on how the programmer wants to access the fields: @it->f@ or @it.f@.
|
|---|
| 318 | The following examples use a reference because the loop body manipulates the node values rather than the list pointers.
|
|---|
| [9367cd6] | 319 | The end of iteration is denoted by the loop cursor returning @0p@.
|
|---|
| [0e4e43d] | 320 |
|
|---|
| 321 | \noindent
|
|---|
| 322 | Iterating forward and reverse through the entire list.
|
|---|
| 323 | \begin{cquote}
|
|---|
| 324 | \setlength{\tabcolsep}{15pt}
|
|---|
| 325 | \begin{tabular}{@{}l|l@{}}
|
|---|
| 326 | \begin{cfa}
|
|---|
| [b195498] | 327 | for ( node & it = @first@( list ); &it /* != 0p */ ; &it = &@next@( it ) ...
|
|---|
| 328 | for ( node & it = @last@( list ); ⁢ &it = &@prev@( it ) ) ...
|
|---|
| [0e4e43d] | 329 | \end{cfa}
|
|---|
| 330 | &
|
|---|
| 331 | \begin{cfa}
|
|---|
| 332 | 1, 2, 3, 4,
|
|---|
| 333 | 4, 3, 2, 1,
|
|---|
| 334 | \end{cfa}
|
|---|
| 335 | \end{tabular}
|
|---|
| 336 | \end{cquote}
|
|---|
| 337 | Iterating forward and reverse from a starting node through the remaining list.
|
|---|
| 338 | \begin{cquote}
|
|---|
| 339 | \setlength{\tabcolsep}{15pt}
|
|---|
| 340 | \begin{tabular}{@{}l|l@{}}
|
|---|
| 341 | \begin{cfa}
|
|---|
| [b195498] | 342 | for ( node & it = @n2@; ⁢ &it = &@next@( it ) ) ...
|
|---|
| 343 | for ( node & it = @n3@; ⁢ &it = &@prev@( it ) ) ...
|
|---|
| [0e4e43d] | 344 | \end{cfa}
|
|---|
| 345 | &
|
|---|
| 346 | \begin{cfa}
|
|---|
| 347 | 2, 3, 4,
|
|---|
| 348 | 3, 2, 1,
|
|---|
| 349 | \end{cfa}
|
|---|
| 350 | \end{tabular}
|
|---|
| 351 | \end{cquote}
|
|---|
| [b195498] | 352 | Iterating forward and reverse from a starting node to an ending node through the contained list.
|
|---|
| [0e4e43d] | 353 | \begin{cquote}
|
|---|
| 354 | \setlength{\tabcolsep}{15pt}
|
|---|
| 355 | \begin{tabular}{@{}l|l@{}}
|
|---|
| 356 | \begin{cfa}
|
|---|
| [b195498] | 357 | for ( node & it = @n2@; &it @!= &n4@; &it = &@next@( it ) ) ...
|
|---|
| 358 | for ( node & it = @n4@; &it @!= &n2@; &it = &@prev@( it ) ) ...
|
|---|
| [0e4e43d] | 359 | \end{cfa}
|
|---|
| 360 | &
|
|---|
| 361 | \begin{cfa}
|
|---|
| 362 | 2, 3,
|
|---|
| 363 | 4, 3,
|
|---|
| 364 | \end{cfa}
|
|---|
| 365 | \end{tabular}
|
|---|
| 366 | \end{cquote}
|
|---|
| [bb9897c] | 367 | Iterating forward and reverse through the entire list using the shorthand start at the list head and pick an axis.
|
|---|
| [b195498] | 368 | In this case, @advance@ and @recede@ return a boolean, like \CC @while ( cin >> i )@.
|
|---|
| [0e4e43d] | 369 | \begin{cquote}
|
|---|
| 370 | \setlength{\tabcolsep}{15pt}
|
|---|
| 371 | \begin{tabular}{@{}l|l@{}}
|
|---|
| 372 | \begin{cfa}
|
|---|
| [b195498] | 373 | for ( node & it = @iter@( list ); @advance@( it ); ) ...
|
|---|
| 374 | for ( node & it = @iter@( list ); @recede@( it ); ) ...
|
|---|
| [0e4e43d] | 375 | \end{cfa}
|
|---|
| 376 | &
|
|---|
| 377 | \begin{cfa}
|
|---|
| 378 | 1, 2, 3, 4,
|
|---|
| 379 | 4, 3, 2, 1,
|
|---|
| 380 | \end{cfa}
|
|---|
| 381 | \end{tabular}
|
|---|
| 382 | \end{cquote}
|
|---|
| 383 | Finally, there are convenience macros that look like @foreach@ in other programming languages.
|
|---|
| 384 | Iterating forward and reverse through the entire list.
|
|---|
| 385 | \begin{cquote}
|
|---|
| 386 | \setlength{\tabcolsep}{15pt}
|
|---|
| 387 | \begin{tabular}{@{}l|l@{}}
|
|---|
| 388 | \begin{cfa}
|
|---|
| 389 | FOREACH( list, it ) ...
|
|---|
| 390 | FOREACH_REV( list, it ) ...
|
|---|
| 391 | \end{cfa}
|
|---|
| 392 | &
|
|---|
| 393 | \begin{cfa}
|
|---|
| 394 | 1, 2, 3, 4,
|
|---|
| 395 | 4, 3, 2, 1,
|
|---|
| 396 | \end{cfa}
|
|---|
| 397 | \end{tabular}
|
|---|
| 398 | \end{cquote}
|
|---|
| 399 | Iterating forward and reverse through the entire list or until a predicate is triggered.
|
|---|
| 400 | \begin{cquote}
|
|---|
| 401 | \setlength{\tabcolsep}{15pt}
|
|---|
| 402 | \begin{tabular}{@{}l|l@{}}
|
|---|
| 403 | \begin{cfa}
|
|---|
| 404 | FOREACH_COND( list, it, @it.v == 3@ ) ...
|
|---|
| 405 | FOREACH_REV_COND( list, it, @it.v == 1@ ) ...
|
|---|
| 406 | \end{cfa}
|
|---|
| 407 | &
|
|---|
| 408 | \begin{cfa}
|
|---|
| 409 | 1, 2,
|
|---|
| 410 | 4, 3, 2,
|
|---|
| 411 | \end{cfa}
|
|---|
| 412 | \end{tabular}
|
|---|
| 413 | \end{cquote}
|
|---|
| [b195498] | 414 | Macros are not ideal, so future work is to provide a language-level @foreach@ statement, like \CC.
|
|---|
| [9d9fd81] | 415 | Finally, a predicate can be added to any of the manual iteration loops.
|
|---|
| 416 | \begin{cquote}
|
|---|
| 417 | \setlength{\tabcolsep}{15pt}
|
|---|
| 418 | \begin{tabular}{@{}l|l@{}}
|
|---|
| [0e4e43d] | 419 | \begin{cfa}
|
|---|
| [b195498] | 420 | for ( node & it = first( list ); &it @&& !(it.v == 3)@; &it = &next( it ) ) ...
|
|---|
| 421 | for ( node & it = last( list ); &it @&& !(it.v == 1)@; &it = &prev( it ) ) ...
|
|---|
| 422 | for ( node & it = iter( list ); advance( it ) @&& !(it.v == 3)@; ) ...
|
|---|
| 423 | for ( node & it = iter( list ); recede( it ) @&& !(it.v == 1)@; ) ...
|
|---|
| [0e4e43d] | 424 | \end{cfa}
|
|---|
| [9d9fd81] | 425 | &
|
|---|
| 426 | \begin{cfa}
|
|---|
| 427 | 1, 2,
|
|---|
| 428 | 4, 3, 2,
|
|---|
| 429 | 1, 2,
|
|---|
| 430 | 4, 3, 2,
|
|---|
| 431 | \end{cfa}
|
|---|
| 432 | \end{tabular}
|
|---|
| 433 | \end{cquote}
|
|---|
| [0e4e43d] | 434 |
|
|---|
| 435 | \begin{comment}
|
|---|
| [4740281] | 436 | Many languages offer an iterator interface for collections, and a corresponding for-each loop syntax for consuming the items through implicit interface calls.
|
|---|
| [0d41e600] | 437 | \CFA does not yet have a general-purpose form of such a feature, though it has a form that addresses some use cases.
|
|---|
| [4740281] | 438 | This section shows why the incumbent \CFA pattern does not work for linked lists and gives the alternative now offered by the linked-list library.
|
|---|
| [0d41e600] | 439 | Chapter 5 [TODO: deal with optimism here] presents a design that satisfies both uses and accommodates even more complex collections.
|
|---|
| 440 |
|
|---|
| [4740281] | 441 | The current \CFA extensible loop syntax is:
|
|---|
| [0d41e600] | 442 | \begin{cfa}
|
|---|
| 443 | for( elem; end )
|
|---|
| [4740281] | 444 | for( elem; begin ~ end )
|
|---|
| 445 | for( elem; begin ~ end ~ step )
|
|---|
| [0d41e600] | 446 | \end{cfa}
|
|---|
| [4740281] | 447 | Many derived forms of @begin ~ end@ exist, but are used for defining numeric ranges, so they are excluded from the linked-list discussion.
|
|---|
| 448 | These three forms are rely on the iterative trait:
|
|---|
| [0d41e600] | 449 | \begin{cfa}
|
|---|
| [4740281] | 450 | forall( T ) trait Iterate {
|
|---|
| 451 | void ?{}( T & t, zero_t );
|
|---|
| 452 | int ?<?( T t1, T t2 );
|
|---|
| 453 | int ?<=?( T t1, T t2 );
|
|---|
| 454 | int ?>?( T t1, T t2 );
|
|---|
| 455 | int ?>=?( T t1, T t2 );
|
|---|
| 456 | T ?+=?( T & t1, T t2 );
|
|---|
| 457 | T ?+=?( T & t, one_t );
|
|---|
| 458 | T ?-=?( T & t1, T t2 );
|
|---|
| 459 | T ?-=?( T & t, one_t );
|
|---|
| 460 | }
|
|---|
| [0d41e600] | 461 | \end{cfa}
|
|---|
| [4740281] | 462 | where @zero_t@ and @one_t@ are constructors for the constants 0 and 1.
|
|---|
| 463 | The simple loops above are abbreviates for:
|
|---|
| [0d41e600] | 464 | \begin{cfa}
|
|---|
| [4740281] | 465 | for( typeof(end) elem = @0@; elem @<@ end; elem @+=@ @1@ )
|
|---|
| 466 | for( typeof(begin) elem = begin; elem @<@ end; elem @+=@ @1@ )
|
|---|
| 467 | for( typeof(begin) elem = @0@; elem @<@ end; elem @+=@ @step@ )
|
|---|
| [0d41e600] | 468 | \end{cfa}
|
|---|
| [4740281] | 469 | which use a subset of the trait operations.
|
|---|
| 470 | The shortened loop works well for iterating a number of times or through an array.
|
|---|
| 471 | \begin{cfa}
|
|---|
| 472 | for ( 20 ) // 20 iterations
|
|---|
| 473 | for ( i: 1 ~= 21 ~ 2 ) // odd numbers
|
|---|
| 474 | for ( i; n ) total += a[i]; // subscripts
|
|---|
| 475 | \end{cfa}
|
|---|
| 476 | which is similar to other languages, like JavaScript.
|
|---|
| [0d41e600] | 477 | \begin{cfa}
|
|---|
| 478 | for ( i in a ) total += a[i];
|
|---|
| 479 | \end{cfa}
|
|---|
| [4740281] | 480 | Albeit with different mechanisms for expressing the array's length.
|
|---|
| 481 | It might be possible to take the \CC iterator:
|
|---|
| 482 | \begin{c++}
|
|---|
| 483 | for ( list<int>::iterator it=mylist.begin(); it != mylist.end(); ++it )
|
|---|
| 484 | \end{c++}
|
|---|
| 485 | and convert it to the \CFA form
|
|---|
| 486 | \begin{cfa}
|
|---|
| 487 | for ( it; begin() ~= end() )
|
|---|
| 488 | \end{cfa}
|
|---|
| 489 | by having a list operator @<=@ that just looks for equality, and @+=@ that moves to the next node, \etc.
|
|---|
| 490 |
|
|---|
| [0e4e43d] | 491 | However, the list usage is contrived, because a list does use its data values for relational comparison, only links for equality comparison.
|
|---|
| [4740281] | 492 | Hence, the focus of a list iterator's stopping condition is fundamentally different.
|
|---|
| 493 | So, iteration of a linked list via the existing loop syntax is to ask whether this syntax can also do double-duty for iterating values.
|
|---|
| [0d41e600] | 494 | That is, to be an analog of JavaScript's @for..of@ syntax:
|
|---|
| 495 | \begin{cfa}
|
|---|
| 496 | for ( e of a ) total += e;
|
|---|
| 497 | \end{cfa}
|
|---|
| 498 |
|
|---|
| 499 | The \CFA team will likely implement an extension of this functionality that moves the @~@ syntax from being part of the loop, to being a first-class operator (with associated multi-pace operators for the elided derived forms).
|
|---|
| 500 | With this change, both @begin ~ end@ and @end@ (in context of the latter ``two-place for'' expression) parse as \emph{ranges}, and the loop syntax becomes, simply:
|
|---|
| 501 | \begin{cfa}
|
|---|
| [bb9897c] | 502 | for( elem; rangeExpr )
|
|---|
| [0d41e600] | 503 | \end{cfa}
|
|---|
| 504 | The expansion and underlying API are under discussion.
|
|---|
| 505 | TODO: explain pivot from ``is at done?'' to ``has more?''
|
|---|
| [eeefc0c] | 506 | Advantages of this change include being able to pass ranges to functions, for example, projecting a numerically regular subsequence of array entries, and being able to use the loop syntax to cover more collection types, such as looping over the keys of a hash table.
|
|---|
| [0d41e600] | 507 |
|
|---|
| [4740281] | 508 | When iterating an empty list, the question, ``Is there a further element?'' needs to be posed once, receiving the answer, ``no.''
|
|---|
| [0d41e600] | 509 | When iterating an $n$-item list, the same question gets $n$ ``yes'' answers (one for each element), plus one ``no'' answer, once there are no more elements; the question is posed $n+1$ times.
|
|---|
| 510 |
|
|---|
| [4740281] | 511 | When iterating an empty list, the question, ``What is the value of the current element?'' is never posed, nor is the command, ``Move to the next element,'' issued. When iterating an $n$-item list, each happens $n$ times.
|
|---|
| [0d41e600] | 512 |
|
|---|
| 513 | So, asking about the existence of an element happens once more than retrieving an element's value and advancing the position.
|
|---|
| 514 |
|
|---|
| 515 | Many iteration APIs deal with this fact by splitting these steps across different functions, and relying on the user's knowledge of iterator state to know when to call each. In Java, the function @hasNext@ should be called $n+1$ times and @next@ should be called $n$ times (doing the double duty of advancing the iteration and returning a value). In \CC, the jobs are split among the three actions, @it != end@, @it++@ and @*it@, the latter two being called one time more than the first.
|
|---|
| 516 |
|
|---|
| 517 | TODO: deal with simultaneous axes: @DLINK_VIA@ just works
|
|---|
| 518 |
|
|---|
| 519 | TODO: deal with spontaneous simultaneity, like a single-axis req, put into an array: which ``axis'' is @&req++@ navigating: array-adjacency vs link dereference. It should sick according to how you got it in the first place: navigating dlist(req, req.pri) vs navigating array(req, 42). (prob. future work)
|
|---|
| [0e4e43d] | 520 | \end{comment}
|
|---|
| 521 |
|
|---|
| [0d41e600] | 522 |
|
|---|
| [bb9897c] | 523 | \section[C++ Lists]{\CC Lists}
|
|---|
| 524 |
|
|---|
| 525 | It is worth addressing two API issues in \CC lists avoided in \CFA.
|
|---|
| 526 | First, \CC lists require two steps to remove a node versus one in \CFA.
|
|---|
| 527 | \begin{cquote}
|
|---|
| 528 | \begin{tabular}{@{}ll@{}}
|
|---|
| 529 | \multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{\CFA} \\
|
|---|
| 530 | \begin{c++}
|
|---|
| 531 | list<node> li;
|
|---|
| 532 | node n = li.first(); // assignment could raise exception
|
|---|
| 533 | li.pop_front();
|
|---|
| 534 | \end{c++}
|
|---|
| 535 | &
|
|---|
| 536 | \begin{cfa}
|
|---|
| 537 | dlist(node) list;
|
|---|
| 538 | node n = remove_first( list );
|
|---|
| 539 |
|
|---|
| 540 | \end{cfa}
|
|---|
| 541 | \end{tabular}
|
|---|
| 542 | \end{cquote}
|
|---|
| 543 | The argument for two steps is exception safety: returning an unknown T by value/move might throw an exception from T's copy/move constructor.
|
|---|
| 544 | Hence, to be \emph{exception safe}, all internal list operations must complete before the copy/move so the list is consistent should the return fail.
|
|---|
| 545 | This coding style can result in contrived code, but is usually possible;
|
|---|
| [6224eeb] | 546 | however, it requires the collection designer to anticipate the potential throw.
|
|---|
| 547 | (Note, this anticipation issue is pervasive in exception systems, not just with collections.)
|
|---|
| 548 | The solution moves the coding complexity from the collection designer to the programer~\cite[ch~10, part 3]{Sutter99}.
|
|---|
| 549 | First, obtain the node, which might fail, but the collection is unmodified.
|
|---|
| 550 | Second, remove the node, which modifies the collection without the possibly of an exception.
|
|---|
| [bb9897c] | 551 | This movement of responsibility increases the cognitive effort for programmers.
|
|---|
| 552 | Unfortunately, this \emph{single-responsibility principle}, \ie preferring separate operations, is often repeated as a necessary requirement rather an an optional one.
|
|---|
| 553 | Separate operations should always be available, but their composition should also be available.
|
|---|
| 554 | Interestingly, this issue does not apply to intrusive lists, because the node data is never copied/moved in or out of a list;
|
|---|
| 555 | only the link fields are accessed in list operations.
|
|---|
| 556 |
|
|---|
| 557 | Second, \VRef[Figure]{f:CCvsCFAListIssues} shows an example where a \CC list operation is $O(N$) rather than $O(1)$ in \CFA.
|
|---|
| 558 | This issue is inherent to wrapped (non-intrusive) lists.
|
|---|
| 559 | Specifically, to remove a node requires access to the links that materialize its membership.
|
|---|
| 560 | In a wrapped list, there is no access from node to links, and for abstract reasons, no direct pointers to wrapped nodes, so the links must be found indirectly by navigating the list.
|
|---|
| 561 | The \CC iterator is the abstraction to navigate wrapped links.
|
|---|
| 562 | So an iterator is needed, not because it offers go-next, but for managing the list membership.
|
|---|
| 563 | Note, attempting to keep an array of iterators to each node requires high-complexity to ensure the list and array are harmonized.
|
|---|
| 564 |
|
|---|
| 565 | \begin{figure}
|
|---|
| 566 | \begin{tabular}{@{}ll@{}}
|
|---|
| 567 | \multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{\CFA} \\
|
|---|
| 568 | \begin{c++}
|
|---|
| 569 | list<node *> list;
|
|---|
| 570 | node n1 = { 1 }, n2 = { 2 }, n3 = { 3 }, n4 = { 4 };
|
|---|
| 571 | list.push_back( &n1 ); list.push_back( &n2 );
|
|---|
| 572 | list.push_back( &n3 ); list.push_back( &n4 );
|
|---|
| 573 | list<node *>::iterator it;
|
|---|
| 574 | for ( it = list.begin(); it != list.end(); it ++ )
|
|---|
| 575 | if ( *it == &n3 ) { dl.erase( it ); break; }
|
|---|
| 576 | \end{c++}
|
|---|
| 577 | &
|
|---|
| 578 | \begin{cfa}
|
|---|
| 579 | dlist(node) list;
|
|---|
| 580 | node n1 = { 1 }, n2 = { 2 }, n3 = { 3 }, n4 = { 4 };
|
|---|
| 581 | insert( list, n1, n2, n3, n4 );
|
|---|
| 582 |
|
|---|
| 583 |
|
|---|
| 584 |
|
|---|
| 585 | remove( list, n3 );
|
|---|
| 586 | \end{cfa}
|
|---|
| 587 | \end{tabular}
|
|---|
| [9c12dd0] | 588 | \caption{Obtaining a linked-list iterator in \CC \vs \CFA}
|
|---|
| [bb9897c] | 589 | \label{f:CCvsCFAListIssues}
|
|---|
| 590 | \end{figure}
|
|---|
| 591 |
|
|---|
| 592 | \begin{comment}
|
|---|
| 593 | Dave Dice:
|
|---|
| 594 | There are what I'd call the "dark days" of C++, where the language \& libraries seemed to be getting progressively uglier - requiring more tokens to express something simple, and lots of arcana.
|
|---|
| 595 | But over time, somehow, they seem to have mostly righted the ship and now I can write C++ code that's fairly terse, like python, by ignoring all the old constructs.
|
|---|
| 596 | (They carry around the legacy baggage, of course, but seemed to have found a way to evolve away from it).
|
|---|
| 597 |
|
|---|
| 598 | If you just want to traverse a std::list, then, using modern "for" loops, you never need to see an iterator.
|
|---|
| 599 | I try hard never to need to write X.begin() or X.end().
|
|---|
| 600 | (There are situations where I'll expose iterators for my own types, however, to enable modern "for" loops).
|
|---|
| 601 | If I'm implementing simple linked lists, I'll usually skip std:: collections and do it myself, as it's less grief.
|
|---|
| 602 | I just don't get that much advantage from std::list. And my code is certainly not any shorter.
|
|---|
| 603 | On the other hand, @std::map@, @unordered_map@, @set@, and friends, are terrific, and I can usually still avoid seeing any iterators, which are blight to the eye.
|
|---|
| 604 | So those are a win and I get to move up an abstraction level, and write terse but easily understood code that still performs well.
|
|---|
| 605 | (One slight concern here is that all the C++ collection/container code is templated and lives in include files, and not traditional libraries.
|
|---|
| 606 | So the only way to replace something - say with a better algorithm, or you have a bug in the collections code - is to boil the oceans and recompile everything.
|
|---|
| 607 | But on the other hand the compiler can specialize to the specific use case, which is often a nice performance win).
|
|---|
| 608 |
|
|---|
| 609 | And yeah, the method names are pretty terrible. I think they boxed themselves in early with a set of conventions that didn't age well, and they tried to stick with it and force regularity over the collection types.
|
|---|
| 610 |
|
|---|
| 611 | I've never seen anything written up about the history, although lots of the cppcon talks make note of the "bad old days" idea and how collection \& container library design evolved. The issue is certainly recognized.
|
|---|
| 612 | \end{comment}
|
|---|
| 613 |
|
|---|
| 614 |
|
|---|
| [0d41e600] | 615 | \section{Implementation}
|
|---|
| 616 |
|
|---|
| [bb9897c] | 617 | \VRef[Figure]{fig:lst-impl-links} continues the running @req@ example, showing the \CFA list library's internal representation.
|
|---|
| [9367cd6] | 618 | The @dlink@ structure contains exactly two pointers: @next@ and @prev@, which are opaque.
|
|---|
| [bb9897c] | 619 | Even though the user-facing list model is linear, the \CFA library implements all listing as circular.
|
|---|
| 620 | This choice helps achieve uniform end treatment.
|
|---|
| 621 | % and \PAB{TODO finish summarizing benefit}.
|
|---|
| [4740281] | 622 | A link pointer targets a neighbouring @dlink@ structure, rather than a neighbouring @req@.
|
|---|
| 623 | (Recall, the running example has the user putting a @dlink@ within a @req@.)
|
|---|
| [0d41e600] | 624 |
|
|---|
| 625 | \begin{figure}
|
|---|
| 626 | \centering
|
|---|
| [9367cd6] | 627 | \includegraphics[width=\textwidth]{lst-impl-links.pdf}
|
|---|
| [0d41e600] | 628 | \caption{
|
|---|
| [9c12dd0] | 629 | \CFA list library representations for headed and headless lists
|
|---|
| [0d41e600] | 630 | }
|
|---|
| 631 | \label{fig:lst-impl-links}
|
|---|
| 632 | \end{figure}
|
|---|
| 633 |
|
|---|
| [9367cd6] | 634 | Circular link-pointers (dashed lines) are tagged internally in the pointer to indicate linear endpoints.
|
|---|
| [b195498] | 635 | Links among neighbour nodes are not tagged.
|
|---|
| [9367cd6] | 636 | Iteration reports ``has more elements'' when accessing untagged links, and ``no more elements'' when accessing tagged links.
|
|---|
| 637 | Hence, the tags are set on the links that a user cannot navigate.
|
|---|
| [0d41e600] | 638 |
|
|---|
| [bb9897c] | 639 | The \CFA library works in headed and headless modes.
|
|---|
| [0d41e600] | 640 | In a headed list, the list head (@dlist(req)@) acts as an extra element in the implementation-level circularly-linked list.
|
|---|
| [9367cd6] | 641 | The content of a @dlist@ header is a (private) @dlink@, with the @next@ pointer to the first element, and the @prev@ pointer to the last element.
|
|---|
| 642 | Since the head wraps a @dlink@, as does @req@, and since a link-pointer targets a @dlink@, the resulting cycle is among @dlink@ structures, situated inside of header/node.
|
|---|
| [b195498] | 643 | An untagged pointer points within a @req@, while a tagged pointer points within a list head.
|
|---|
| 644 | In a headless list, the circular backing list is only among @dlink@s within @req@s.
|
|---|
| 645 |
|
|---|
| [9367cd6] | 646 | No distinction is made between an unlisted node (top left and middle) under a headed model and a singleton list under a headless model.
|
|---|
| [b195498] | 647 | Both are represented as an item referring to itself, with both tags set.
|
|---|
| [0d41e600] | 648 |
|
|---|
| [5717495] | 649 |
|
|---|
| [16a843b] | 650 | \section{Assessment}
|
|---|
| 651 | \label{toc:lst:assess}
|
|---|
| 652 |
|
|---|
| [9367cd6] | 653 | This section examines the performance of the discussed list implementations.
|
|---|
| 654 | The goal is to show the \CFA lists are competitive with other designs, but the different list designs may not have equivalent functionality, so it is impossible to select a winner encompassing both functionality and execution performance.
|
|---|
| 655 |
|
|---|
| [eeefc0c] | 656 |
|
|---|
| [4cf8832] | 657 | \subsection{Experiment Design}
|
|---|
| [9367cd6] | 658 |
|
|---|
| [eeefc0c] | 659 | This section explains how the experiment is built.
|
|---|
| 660 | Many of the following parts define terminology concerning tuning knobs.
|
|---|
| 661 | \VRef[Figure]{f:ListPerfGlossary} provides a consolidated reference.
|
|---|
| 662 |
|
|---|
| [4cf8832] | 663 | \begin{figure}
|
|---|
| 664 | \noindent
|
|---|
| 665 | \begin{tabular}{p{1.75in}@{\ }p{4.5in}}
|
|---|
| 666 | Insert-Remove (IR)
|
|---|
| 667 | & The atomic unit of work being measured: one insertion plus one remove (plus all looping/tracking overheads) \\
|
|---|
| 668 | Use Case
|
|---|
| 669 | & Pattern of add-remove calls. \\
|
|---|
| 670 | -- Movement & \\
|
|---|
| 671 | \quad $\ni$ stack
|
|---|
| [eeefc0c] | 672 | & IRs happen at the same end. \\
|
|---|
| [4cf8832] | 673 | \quad $\ni$ queue
|
|---|
| [eeefc0c] | 674 | & IRs happen at opposite ends. \\
|
|---|
| [4cf8832] | 675 | -- Polarity
|
|---|
| 676 | & Which of the two orientations in which the movement happens. \\
|
|---|
| 677 | \quad $\ni$ insert-first
|
|---|
| 678 | & All inserts at front; stack removes at front; queue removes at back. \\
|
|---|
| 679 | \quad $\ni$ insert-last
|
|---|
| 680 | & All inserts at back; stack removes at back; queue removes at front. \\
|
|---|
| 681 | -- Accessor
|
|---|
| 682 | & How an insertion position, or removal element, is specified. The same position/element is picked either way. \\
|
|---|
| 683 | \quad $\ni$ all head
|
|---|
| [eeefc0c] | 684 | & IRs both through the head \\
|
|---|
| [4cf8832] | 685 | \quad $\ni$ insert element
|
|---|
| 686 | & insert by element and remove through the head \\
|
|---|
| 687 | \quad $\ni$ remove element
|
|---|
| 688 | & insert through head and remove by element\\
|
|---|
| 689 | Physical Context & \\
|
|---|
| 690 | -- Size (number) & Number of nodes being linked. Unless specified, equals the \emph{length} of the program's sole list. \emph{Width}, rarely used, is the number of lists. \\
|
|---|
| 691 | -- Size Zone
|
|---|
| 692 | & Contiguous range of sizes, chosen to avoid known anomalies and to sample a brief plateau. Each zone buckets four specific sizes. \\
|
|---|
| 693 | \quad $\ni$ small
|
|---|
| 694 | & lists of 4--16 elements \\
|
|---|
| 695 | \quad $\ni$ medium
|
|---|
| 696 | & lists of 50--200 elements \\
|
|---|
| 697 | \quad $\ni$ (other)
|
|---|
| 698 | & Not used for comparing intrusive frameworks. \\
|
|---|
| 699 | -- machine
|
|---|
| 700 | & Computer running the experiment \\
|
|---|
| 701 | \quad $\ni$ AMD
|
|---|
| 702 | & smaller cache \\
|
|---|
| 703 | \quad $\ni$ Intel
|
|---|
| 704 | & bigger cache \\
|
|---|
| 705 | Framework & A particular linked-list implementation (within its host language) \\
|
|---|
| 706 | $\ni$ \CC & The @std::list@ type of g++. \\
|
|---|
| 707 | $\ni$ lq-list & The @list@ type of LQ from glibc of gcc. \\
|
|---|
| 708 | $\ni$ lq-tailq & The @tailq@ type of the same. \\
|
|---|
| 709 | $\ni$ \uCpp & \uCpp's @uSequence@ \\
|
|---|
| 710 | $\ni$ \CFA & \CFA's @dlist@ \\
|
|---|
| 711 | Explanation being
|
|---|
| 712 | & How independent explanatory variable X is analyzed \\
|
|---|
| 713 | -- Marginalized
|
|---|
| 714 | & Left alone, allowed to vary, yielding a more absolute measure. Shows the effect that X causes. If all explanations are marginalized, then absolute times are available and a relative time has a peer group that is the entire population. \\
|
|---|
| 715 | -- Conditioned
|
|---|
| 716 | & Held constant, yielding a more relative measure. Hides the effect that X causes. Conditioning on X creates more, smaller relative-measure peer groups, by isolating each X-domain value. Resulting interpretation is, ``Assuming no change in X.'' \\
|
|---|
| 717 | \end{tabular}
|
|---|
| 718 | \caption{
|
|---|
| [9c12dd0] | 719 | Glossary of terms used in the list performance evaluation
|
|---|
| [4cf8832] | 720 | }
|
|---|
| 721 | \label{f:ListPerfGlossary}
|
|---|
| 722 | \end{figure}
|
|---|
| 723 |
|
|---|
| 724 |
|
|---|
| 725 | \subsubsection{Add-Remove Performance}
|
|---|
| [6762d46] | 726 | \label{s:AddRemovePerformance}
|
|---|
| [16a843b] | 727 |
|
|---|
| [9367cd6] | 728 | The fundamental job of a linked-list library is to manage the links that connect nodes.
|
|---|
| 729 | Any link management is an action that causes pair(s) of elements to become, or cease to be, adjacent in the list.
|
|---|
| [16a843b] | 730 | Thus, adding and removing an element are the sole primitive actions.
|
|---|
| 731 |
|
|---|
| [4cf8832] | 732 | Repeated adding and removing is necessary to measure timing because these operations can be as short as a dozen instructions.
|
|---|
| [16a843b] | 733 | These instruction sequences may have cases that proceed (in a modern, deep pipeline) without a stall.
|
|---|
| 734 |
|
|---|
| [9367cd6] | 735 | This experiment takes the position that:
|
|---|
| [eeefc0c] | 736 | \begin{enumerate}[leftmargin=*]
|
|---|
| [4cf8832] | 737 | \item The total time to add and remove is relevant, as opposed to having one time for adding and a separate time for removing.
|
|---|
| [bb9897c] | 738 | Adds without removes quickly fill memory;
|
|---|
| [4cf8832] | 739 | removing without adding is impossible.
|
|---|
| 740 | \item A relevant breakdown ``by operation'' is, rather, the usage pattern of the add/remove calls.
|
|---|
| 741 | A example pattern choice is adding and removing at the same end, making a stack, or opposite ends, for a queue.
|
|---|
| 742 | Another is pushing on the front by calling @insert_first(lst, e)@ \vs @insert(e, old_first_elm)@; this aspect provides the test's API coverage.
|
|---|
| 743 | \VRef[Section]{s:UseCases} gives the full breakdown.
|
|---|
| [bb9897c] | 744 | \item Speed differences caused by the host machine's memory hierarchy need to be identified and explained,
|
|---|
| [4cf8832] | 745 | but do not represent advantages of one framework over another.
|
|---|
| [eeefc0c] | 746 | \end{enumerate}
|
|---|
| [16a843b] | 747 |
|
|---|
| [eeefc0c] | 748 | The experiment used to measure IR cost measures the mean duration of a sequence of additions and removals.
|
|---|
| [16a843b] | 749 | The distribution of speeds experienced by an individual add-remove pair (tail latency) is not discussed.
|
|---|
| 750 | Space efficiency is shown only indirectly, by way of caches' impact on speed.
|
|---|
| [9367cd6] | 751 | The experiment is sensitive enough to show:
|
|---|
| [16a843b] | 752 | \begin{itemize}
|
|---|
| [bb9897c] | 753 | \item intrusive lists performing (majorly) differently than wrapped lists,
|
|---|
| [0f9c67bf] | 754 | \item a space of (lesser) performance differences among the intrusive lists.
|
|---|
| [16a843b] | 755 | \end{itemize}
|
|---|
| 756 |
|
|---|
| [4cf8832] | 757 | In all cases, the quantity discussed is the duration of one insert-remove (IR).
|
|---|
| [eeefc0c] | 758 | An IR is the time taken to do one innermost insertion-loop iteration, one innermost removal-loop iteration, and its share of all overheads, amortized.
|
|---|
| 759 | Lower IR is better.
|
|---|
| [4cf8832] | 760 | This experiment typically does an IR in 1--10 ns.
|
|---|
| 761 | The short end of this range has durations of single-digit clock-cycle counts.
|
|---|
| 762 | Therefore, the situations that achieve the best times are saturating the instruction pipeline successfully.
|
|---|
| 763 |
|
|---|
| 764 | Often, an IR duration value needs to be considered relatively.
|
|---|
| [eeefc0c] | 765 | For example, \VRef{s:SweetSoreSpots} asks whether one linked list implementation is more sensitive than another to the computer architecture.
|
|---|
| 766 | A finding might be that a machine slows implementation A by 10\% and B by 20\%.
|
|---|
| [4cf8832] | 767 | This finding is not saying that A is faster than B (on either machine).
|
|---|
| [eeefc0c] | 768 | The finding could stand if B starts faster and then levels off, if B starts slower and gets worse, or in myriad other cases.
|
|---|
| 769 | The finding asserts that such distinctions are not what is immediately relevant.
|
|---|
| 770 | The arithmetic producing different answers is removing the information about which one starts or ends up faster.
|
|---|
| [4cf8832] | 771 | Each implementation's to-machine duration is stated relatively to \emph{the same implementation's} from-machine duration.
|
|---|
| 772 | The resulting measure is still about a duration.
|
|---|
| 773 | The framework with the lower from-machine-relative duration handles the change better.
|
|---|
| [16a843b] | 774 |
|
|---|
| [68af77b] | 775 |
|
|---|
| [4cf8832] | 776 | \subsubsection{Test Program}
|
|---|
| 777 | \label{s:TetProgram}
|
|---|
| [16a843b] | 778 |
|
|---|
| [9367cd6] | 779 | The experiment driver defines a (intrusive) node type:
|
|---|
| 780 | \begin{cfa}
|
|---|
| 781 | struct Node {
|
|---|
| 782 | int i, j, k; // fields
|
|---|
| 783 | // possible intrusive links
|
|---|
| 784 | };
|
|---|
| 785 | \end{cfa}
|
|---|
| 786 | and considers the speed of building and tearing down a list of $n$ instances of it.
|
|---|
| 787 | % A number of experimental rounds per clock check is precalculated to be appropriate to the value of $n$.
|
|---|
| [16a843b] | 788 | \begin{cfa}
|
|---|
| 789 | // simplified harness: CFA implementation,
|
|---|
| 790 | // stack movement, insert-first polarity, head-mediated access
|
|---|
| 791 | size_t totalOpsDone = 0;
|
|---|
| [9367cd6] | 792 | dlist( node_t ) lst;
|
|---|
| 793 | node_t nodes[ n ]; $\C{// preallocated list storage}$
|
|---|
| [16a843b] | 794 | startTimer();
|
|---|
| [9367cd6] | 795 | while ( CONTINUE ) { $\C{// \(\approx\) 20 second duration}$
|
|---|
| 796 | for ( i; n ) insert_first( lst, nodes[i] ); $\C{// build up}$
|
|---|
| 797 | for ( i; n ) remove_first( lst ); $\C{// tear down}$
|
|---|
| [16a843b] | 798 | totalOpsDone += n;
|
|---|
| 799 | }
|
|---|
| 800 | stopTimer();
|
|---|
| [eeefc0c] | 801 | reportedDuration = getTimerDuration() / totalOpsDone; // throughput per IR operation
|
|---|
| [16a843b] | 802 | \end{cfa}
|
|---|
| [9367cd6] | 803 | To reduce administrative overhead, the $n$ nodes for each experiment list are preallocated in an array (on the stack), which removes dynamic allocations for this storage.
|
|---|
| 804 | These nodes contain an intrusive pointer for intrusive lists, or a pointer to it is stored in a dynamically-allocated internal-node for a wrapper list.
|
|---|
| 805 | Copying the node for wrapped lists skews the results with administration costs.
|
|---|
| 806 | The list insertion/removal operations are repeated for a typical 20+ second duration.
|
|---|
| 807 | After each round, a counter is incremented by $n$ (for throughput).
|
|---|
| 808 | Time is measured outside the loop because a large $n$ can overrun the time duration before the @CONTINUE@ flag is tested.
|
|---|
| [eeefc0c] | 809 | Hence, there is a minimum of one outer (@CONTINUE@) loop iteration for large lists.
|
|---|
| [9367cd6] | 810 | The loop duration is divided by the counter and this throughput is reported.
|
|---|
| 811 | In a scatter-plot, each dot is one throughput, which means insert + remove + harness overhead.
|
|---|
| [4cf8832] | 812 | The harness overhead is constant when comparing linked-list frameworks and is kept as small as possible.
|
|---|
| [9367cd6] | 813 | % The remainder of the setup section discusses the choices that affected the harness overhead.
|
|---|
| 814 |
|
|---|
| 815 | To test list operations, the experiment performs the inserts/removes in different patterns, \eg insert and remove from front, insert from front and remove from back, random insert and remove, \etc.
|
|---|
| [eeefc0c] | 816 | Unfortunately, the @std::list@ does \emph{not} support direct IR from a node without an iterator, \ie no @erase( node )@, even though the list is doubly-linked.
|
|---|
| [bb9897c] | 817 | To eliminate the iterator, a trick is used for random insertions without replacement, which takes advantage of the array nature of the nodes.
|
|---|
| [fa8d17f] | 818 | The @i@ fields in each node are initialized from @0..n-1@.
|
|---|
| 819 | These @i@ values are then shuffled in the nodes, and the @i@ value is used to represent an indirection to that node for insertion.
|
|---|
| [a0b7ef5] | 820 | Hence, the nodes are inserted in random order and removed in the same random order.
|
|---|
| [9367cd6] | 821 | $\label{p:Shuffle}$
|
|---|
| [a0b7ef5] | 822 | \begin{cfa}
|
|---|
| 823 | for ( i; n ) @nodes[i].i = i@; $\C[3.25in]{// indirection}$
|
|---|
| [9367cd6] | 824 | shuffle( nodes, n ); $\C{// random shuffle indirects within nodes}$
|
|---|
| [16a843b] | 825 |
|
|---|
| [9367cd6] | 826 | while ( CONTINUE ) {
|
|---|
| [fa8d17f] | 827 | for ( i; n ) {
|
|---|
| [a0b7ef5] | 828 | node_t & temp = nodes[ nodes[i].i ]; $\C{// select random node in array}$
|
|---|
| 829 | @temp.j = 0;@ $\C{// only touch random node for wrapped nodes}$
|
|---|
| 830 | insert_first( lst, temp ); $\C{// build up}$
|
|---|
| [fa8d17f] | 831 | }
|
|---|
| [a0b7ef5] | 832 | for ( i; n ) pass( &remove_first( lst ) ); $\C{// tear down}\CRT$
|
|---|
| [9367cd6] | 833 | totalOpsDone += n;
|
|---|
| [16a843b] | 834 | }
|
|---|
| [a0b7ef5] | 835 | \end{cfa}
|
|---|
| [fa8d17f] | 836 | Note, insertion is traversing the list of nodes linearly, @node[i]@.
|
|---|
| 837 | For intrusive lists, the inserted (random) node is always touched because its link fields are read/written for insertion into the list.
|
|---|
| 838 | Hence, the array of nodes is being accessed both linearly and randomly during the traversal.
|
|---|
| 839 | For wrapped lists, the wrapped nodes are traversed linearly but the random node is not accessed, only a pointer to it is inserted into the linearly accessed wrapped node.
|
|---|
| 840 | Hence, the traversal is the same as the non-random traversal above.
|
|---|
| 841 | To level the experiments, an explicit access to the random node is inserted after the insertion, @temp.j = 0@, for the wrapped experiment.
|
|---|
| [eeefc0c] | 842 | Furthermore, it is rare to IR nodes and not access them.
|
|---|
| [9367cd6] | 843 |
|
|---|
| 844 | % \emph{Interleaving} allows for movements other than pure stack and queue.
|
|---|
| 845 | % Note that the earlier example of using the iterators' array is still a pure stack: the item selected for @erase(...)@ is always the first.
|
|---|
| 846 | % Including a less predictable movement is important because real applications that justify doubly linked lists use them.
|
|---|
| 847 | % Freedom to remove from arbitrary places (and to insert under more relaxed assumptions) is the characteristic function of a doubly linked list.
|
|---|
| 848 | % A queue with drop-out is an example of such a movement.
|
|---|
| 849 | % A list implementation can show unrepresentative speed under a simple movement, for example, by enjoying unchallenged ``Is first element?'' branch predictions.
|
|---|
| 850 |
|
|---|
| 851 | % Interleaving brings ``at middle of list'' cases into a stream of add or remove invocations, which would otherwise be exclusively ``at end''.
|
|---|
| 852 | % A chosen split, like half middle and half end, populates a boolean array, which is then shuffled.
|
|---|
| 853 | % These booleans then direct the action to end-\vs-middle.
|
|---|
| 854 | %
|
|---|
| 855 | % \begin{cfa}
|
|---|
| 856 | % // harness (bookkeeping and shuffling elided): CFA implementation,
|
|---|
| 857 | % // stack movement, insert-first polarity, interleaved element-based remove access
|
|---|
| 858 | % dlist( item_t ) lst;
|
|---|
| 859 | % item_t items[ n ];
|
|---|
| 860 | % @bool interl[ n ];@ // elided: populate with weighted, shuffled [0,1]
|
|---|
| 861 | % while ( CONTINUE ) {
|
|---|
| 862 | % item_t * iters[ n ];
|
|---|
| 863 | % for ( i; n ) {
|
|---|
| 864 | % insert_first( items[i] );
|
|---|
| 865 | % iters[i] = & items[i];
|
|---|
| 866 | % }
|
|---|
| 867 | % @item_t ** crsr[ 2 ]@ = { // two cursors into iters
|
|---|
| 868 | % & iters[ @0@ ], // at stack-insert-first's removal end
|
|---|
| 869 | % & iters[ @n / interl_frac@ ] // in middle
|
|---|
| 870 | % };
|
|---|
| 871 | % for ( i; n ) {
|
|---|
| 872 | % item *** crsr_use = & crsr[ interl[ i ] ]@;
|
|---|
| 873 | % remove( *** crsr_use ); // removing from either middle or end
|
|---|
| 874 | % *crsr_use += 1; // that item is done
|
|---|
| 875 | % }
|
|---|
| 876 | % assert( crsr[0] == & iters[ @n / interl_frac@ ] ); // through second's start
|
|---|
| 877 | % assert( crsr[1] == & iters[ @n@ ] ); // did the rest
|
|---|
| 878 | % }
|
|---|
| 879 | % \end{cfa}
|
|---|
| 880 | %
|
|---|
| 881 | % By using the pair of cursors, the harness avoids branches, which could incur prediction stall times themselves, or prime a branch in the SUT.
|
|---|
| 882 | % This harness avoids telling the hardware what the SUT is about to do.
|
|---|
| [16a843b] | 883 |
|
|---|
| [4cf8832] | 884 |
|
|---|
| 885 | \subsubsection{Use Cases}
|
|---|
| 886 | \label{s:UseCases}
|
|---|
| 887 |
|
|---|
| [eeefc0c] | 888 | Where \VRef[Figure]{f:ListPerfGlossary} enumerates the specific values, recall the use-case dimensions are:
|
|---|
| 889 | \begin{description}
|
|---|
| 890 | \item[movement ($\times 2$)]
|
|---|
| 891 | In these experiments, strict stack and queue patterns are tested.
|
|---|
| 892 | \item[polarity ($\times 2$)]
|
|---|
| 893 | Obtain one polarity from the other by reversing uses of first/last.
|
|---|
| 894 | \item[accessor ($\times 3$)]
|
|---|
| 895 | Giving an add/remove location by a list head's first/last, \vs by a preexisting reference to an individual element.
|
|---|
| 896 | \end{description}
|
|---|
| 897 |
|
|---|
| [4cf8832] | 898 | \begin{figure}
|
|---|
| 899 | \begin{comment}
|
|---|
| 900 | \centering
|
|---|
| 901 | \setlength{\tabcolsep}{8pt}
|
|---|
| 902 | \begin{tabular}{@{}ll@{}}
|
|---|
| 903 | \begin{tabular}{@{}c|c|c@{}}
|
|---|
| 904 | movement & polarity & accessor \\
|
|---|
| 905 | \hline
|
|---|
| 906 | \hline
|
|---|
| 907 | stack &
|
|---|
| 908 | \begin{tabular}{@{}l@{}}
|
|---|
| 909 | insert-first \\
|
|---|
| 910 | \hline
|
|---|
| 911 | insert-last
|
|---|
| 912 | \end{tabular}
|
|---|
| 913 | &
|
|---|
| 914 | \begin{tabular}{@{}l@{}}
|
|---|
| 915 | insert-head / remove-head \\
|
|---|
| 916 | \hline
|
|---|
| 917 | insert-list / remove-head \\
|
|---|
| 918 | \hline
|
|---|
| 919 | insert-head / remove-list
|
|---|
| 920 | \end{tabular}
|
|---|
| 921 | \\
|
|---|
| 922 | \hline
|
|---|
| 923 | queue &
|
|---|
| 924 | \begin{tabular}{@{}l@{}}
|
|---|
| 925 | insert-first \\
|
|---|
| 926 | \hline
|
|---|
| 927 | insert-last
|
|---|
| 928 | \end{tabular}
|
|---|
| 929 | &
|
|---|
| 930 | \begin{tabular}{@{}l@{}}
|
|---|
| 931 | insert-head / remove-head \\
|
|---|
| 932 | \hline
|
|---|
| 933 | insert-list / remove-head \\
|
|---|
| 934 | \hline
|
|---|
| 935 | insert-head / remove-list
|
|---|
| 936 | \end{tabular}
|
|---|
| 937 | \end{tabular}
|
|---|
| 938 | &
|
|---|
| 939 | \setlength{\tabcolsep}{3pt}
|
|---|
| 940 | \small
|
|---|
| 941 | \begin{tabular}{@{}ll@{}}
|
|---|
| 942 | I: & stack, insert first, all head \\
|
|---|
| 943 | II: & stack, insert first, insert element \\
|
|---|
| 944 | III:& stack, insert first, remove element \\
|
|---|
| 945 | IV: & stack, insert last, all head \\
|
|---|
| 946 | V: & stack, insert last, insert element \\
|
|---|
| 947 | VI: & stack, insert last, remove element \\
|
|---|
| 948 | VII:& queue, insert first, all head \\
|
|---|
| 949 | VIII:& queue, insert first, insert element \\
|
|---|
| 950 | IX: & queue, insert first, remove element \\
|
|---|
| 951 | X: & queue, insert last, all head \\
|
|---|
| 952 | XI: & queue, insert last, iinsert element \\
|
|---|
| 953 | XII:& queue, insert last, remove element \\
|
|---|
| 954 | \end{tabular}
|
|---|
| 955 | \end{tabular}
|
|---|
| 956 | \end{comment}
|
|---|
| 957 | \setlength{\tabcolsep}{5pt}
|
|---|
| 958 | \small
|
|---|
| 959 | \begin{tabular}{rcccccccccccc}
|
|---|
| 960 | & I & II & III & IV & V & VI & VII & VIII & IX & X & XI & XII \\
|
|---|
| 961 | Movement &
|
|---|
| 962 | stack & stack & stack & stack & stack & stack &
|
|---|
| 963 | queue & queue & queue & queue & queue & queue \\
|
|---|
| 964 | Polarity &
|
|---|
| 965 | i-first & i-first & i-first & i-last & i-last & i-last &
|
|---|
| 966 | i-first & i-first & i-first & i-last & i-last & i-last \\
|
|---|
| 967 | Acessor &
|
|---|
| 968 | all hd & ins-e & rem-e & all hd & ins-e & rem-e &
|
|---|
| 969 | all hd & ins-e & rem-e & all hd & ins-e & rem-e
|
|---|
| 970 | \end{tabular}
|
|---|
| [9c12dd0] | 971 | \caption{Experiment use cases, numbered}
|
|---|
| [4cf8832] | 972 | \label{f:ExperimentOperations}
|
|---|
| 973 | \end{figure}
|
|---|
| 974 |
|
|---|
| 975 | A use case is a specific selection of movement, polarity and accessor.
|
|---|
| 976 | These experiments run twelve use cases.
|
|---|
| 977 | When a comparison is showing only what can happen when switching among use cases (as opposed to \eg how stacks are different from queues), the numbering scheme of \VRef[Figure]{f:ExperimentOperations} is used.
|
|---|
| 978 |
|
|---|
| 979 | With accessor, when an action names its insertion position or removal element, the harness either
|
|---|
| 980 | \begin{itemize}
|
|---|
| 981 | \item defers to the list-head's tracking of first/last (``through the head''), or
|
|---|
| 982 | \item applies its own knowledge of the current pattern, to name a position/element that happens to be first/last (``of known element'').
|
|---|
| 983 | \end{itemize}
|
|---|
| 984 |
|
|---|
| 985 | The accessor patterns, at the (\CFA) API level, are:
|
|---|
| [16a843b] | 986 | \begin{description}
|
|---|
| [eeefc0c] | 987 | \item[all (through the) head:] Both IRs happen through the list head. The list head operations are @insert_first@, @insert_last@, @remove_first@ and @remove_last@.
|
|---|
| 988 | \begin{sloppypar}
|
|---|
| [4cf8832] | 989 | \item[insert (of known) element] \dots and remove through head: Inserts use @insert_before(e, first)@ or @insert_after(e, last)@, where @e@ is being inserted and @first@/@last@ are element references known by list-independent means.
|
|---|
| [eeefc0c] | 990 | \end{sloppypar}
|
|---|
| [4cf8832] | 991 | \item[remove (of known) element] \dots and insert through the head: Removes use @remove(e)@, where @e@ is being removed. List-independent knowledge establishes that @e@ is first or last, as appropriate.
|
|---|
| [16a843b] | 992 | \end{description}
|
|---|
| 993 |
|
|---|
| [4cf8832] | 994 | Comparing all-head with insert-element gives the relative performance of head-mediated \vs element-oriented insertion, because both use the same removal style.
|
|---|
| 995 | Comparing all-head with remove-element gives the relative performance of head-mediated \vs element-oriented removal, because both use the same insertion style.
|
|---|
| 996 |
|
|---|
| 997 |
|
|---|
| [eeefc0c] | 998 | \subsubsection{Sizing}
|
|---|
| [4cf8832] | 999 |
|
|---|
| [eeefc0c] | 1000 | Intuition suggests measuring IR for different sized lists should just be a multiple of the single linking/unlinking of a node.
|
|---|
| 1001 | However, there is a scaling issue as more memory is being read and written, where caching comes into play.
|
|---|
| 1002 | But caching is more than the amount of memory being accessed;
|
|---|
| 1003 | the access pattern is equally important.
|
|---|
| 1004 | Aggressive instruction-level parallelism scheduling, which enables short IR times, is the amplifier, \eg a data dependency is a critical path in one situation but not in another.
|
|---|
| 1005 | Therefore, the duration response to size is not a steady worsening as size increases.
|
|---|
| 1006 | Often, each size-independent configuration responds to size increases in steps of slowdown.
|
|---|
| 1007 | Occasionally a slowdown step is followed by some perforamnce increase, where an incurred penalty begins to amortize away.
|
|---|
| 1008 | Hence, performance results can have interesting jitter as size increases.
|
|---|
| [4cf8832] | 1009 | The analysis treats these behaviours as incidental.
|
|---|
| 1010 | It does not try to characterize various exact-size responses.
|
|---|
| [eeefc0c] | 1011 | Rather, size zones are picked, specific effects inside of a zone are averaged away, and the story at one zone is compared to that at another.
|
|---|
| 1012 |
|
|---|
| 1013 | % It is true, but perhaps not obvious, that buildind and destroying long lists is slower than building and destroying short lists.
|
|---|
| 1014 | % Obviously, indeed, it takes longer to fuse and divide a hundred neighbours than five.
|
|---|
| 1015 | % But the key metric in this work, AII, is about a single link--unlink.
|
|---|
| 1016 | % So, critically, linking and unlinking a hundred neighbours actually takes \emph{more} than $20\times$ the time for five neighbours.
|
|---|
| 1017 | % The main reason is caching; when more neighbours are being manipulated, more memory is being read and written.
|
|---|
| 1018 | %
|
|---|
| 1019 | % But caching success is about more than the amount of memory worked on.
|
|---|
| 1020 | % Subtle changes in pattern become butterfly effects.
|
|---|
| 1021 | % Aggressive ILP scheduling, which enables short AIR times, is the amplifier.
|
|---|
| 1022 | % A data dependency, present in one framework but not another, is critical path in one situation but in not another.
|
|---|
| 1023 | % So, duration's response to size is not a steady worsening as size increases.
|
|---|
| 1024 | % Rather, each size-independent configuration often responds to size increases with leaps of worsening.
|
|---|
| 1025 | % Occasionally a leap is even followed size-run of retrograde response, where a suddenly incurred penalty has a chance to ammortize away.
|
|---|
| 1026 | % The frameworks tend to leapfrog over each other, at different points, as size increases.
|
|---|
| 1027 | %
|
|---|
| 1028 | % The analysis treats these behaviours as incidental.
|
|---|
| 1029 | % It does not try to characterize various exact-size responses.
|
|---|
| 1030 | % Rather, size zones are picked, specific effects inside of a zone are averaged away, and the story at one zone is compared to that at another zone.
|
|---|
| [4cf8832] | 1031 |
|
|---|
| [bf73608] | 1032 | \begin{figure}
|
|---|
| 1033 | \centering
|
|---|
| 1034 | \setlength{\tabcolsep}{0pt}
|
|---|
| 1035 | \begin{tabular}{p{3.4375in}p{0.125in}p{2.9375in}}
|
|---|
| 1036 | \subfloat[ ]{\label{f:zoomin-abs-i-swift}
|
|---|
| 1037 | \includegraphics{plot-list-zoomin-abs-i-swift.pdf}
|
|---|
| 1038 | } & &
|
|---|
| 1039 | \subfloat[ ]{\label{f:zoomin-abs-viii-java}
|
|---|
| 1040 | \includegraphics{plot-list-zoomin-abs-viii-java.pdf}
|
|---|
| 1041 | }
|
|---|
| 1042 | \end{tabular}
|
|---|
| [9c12dd0] | 1043 | \caption[Variety of IR duration responses to list length, at small--medium lengths]{Variety of IR duration responses to list length, at small--medium lengths. Two example use cases are shown: I, stack movement with head-only access (plot a); VIII, queue movement with element-oriented removal access (plot b); both use cases have insert-first polarity. One example is run on each machine: UC-I on AMD (ploat a); UC-VIII on Intel (plot b). Lower is better.}
|
|---|
| [bf73608] | 1044 | \label{fig:plot-list-zoomin-abs}
|
|---|
| 1045 | \end{figure}
|
|---|
| 1046 |
|
|---|
| 1047 | \VRef[Figure]{fig:plot-list-zoomin-abs} gives two example responses to size.
|
|---|
| [eeefc0c] | 1048 | % The dataset here is a small portion of the overall result and it is premature to attempt conclusions about framework differences from it.
|
|---|
| 1049 | These two example cases show how differently a pair of individual configurations behave.
|
|---|
| 1050 | % Of more immediate significance, they also have a pattern repeated, in all eight of their size responses.
|
|---|
| 1051 | % Note the ``small'' and ``medium'' overlaid boxes, which call out the size zones' definitions.
|
|---|
| [bf73608] | 1052 | Outside of an identified box, size response is erratic.
|
|---|
| [eeefc0c] | 1053 | Inside a box, size response is relatively smooth.
|
|---|
| 1054 | Within and among boxes there are identifiable patterns, which occur throughout all the experimental results.
|
|---|
| 1055 | Each individual configuration is tested by five trials, giving the error bars at min and max.
|
|---|
| 1056 | The amount of error here is typical across the configurations.
|
|---|
| 1057 | With a few exceptions, it is modest, so experiments are repeatable.
|
|---|
| 1058 |
|
|---|
| 1059 | To preview, \VRef{s:ResultCoarseComparisonStyles} dismisses large sizes (above 150 elements) and wrapped lists, because the performance story is dominated by the amount of memory touched not by intrusive \vs wrapped lists.
|
|---|
| 1060 | At smaller sizes, \VRef{s:ComparingIntrusiveImplementations} shows differences appear among the intrusive-list implementations.
|
|---|
| 1061 | Among the ``not Large'' sizes ($\le$ 150), two zones, Small and Medium, are selected as representatives of what can vary when the scale is changed.
|
|---|
| 1062 | These particular ranges are chosen because each range tends to have one story repeated across its constituent sizes.
|
|---|
| 1063 | For example, if \CFA's duration increases across Small, then the other frameworks' usually do too, or if \CFA is beating \uCpp across Medium, then it usually is at the high end too.
|
|---|
| 1064 | % The leapfrogging tends to happen outside of these two ranges.
|
|---|
| 1065 |
|
|---|
| 1066 | Finally, on the AMD architecture, \CFA performed poorly at size 1, on queue movements only, and no other framework saw the same effect.
|
|---|
| 1067 | This extreme outlier is not plotted in graphs.
|
|---|
| 1068 | After exploring the phenomenon in depth (not presented), the conclusion is a quirky interaction between the hardware and the testing harness.
|
|---|
| 1069 | A side experiment (that does not enrich the overall comparisons) saw user-induced gaps of $\approx$10 ns between same-list operations hide the effect completely.
|
|---|
| 1070 | These gaps are realistic because when an item goes on a list another action comes back to it \emph{later}.
|
|---|
| 1071 | The pattern that the general harness uses, concentrating time-adjacent operations on one list, is useful for measuring the ``small'' size zone, but is contrived, from the perspective of a data hazard that only this pattern exposes.
|
|---|
| 1072 | The general comparisons do not see the effect at all, because they use only the Small and Medium zones, with shortest length of 4.
|
|---|
| 1073 |
|
|---|
| 1074 | % A spot of poor performance appears in the general results for \CFA at size 1.
|
|---|
| 1075 | % Section \MLB{TODO:xref} explores the phenomenon and concludes that it is an anomaly due to a quirky interaction with the testing rig.
|
|---|
| 1076 | % To do so, two it considers size as either length or width.
|
|---|
| 1077 | % Length is the number of elements in a list.
|
|---|
| 1078 | % Width is a number of these lists being kept, worked upon in round-robin order.
|
|---|
| 1079 | % Outside of \MLB{TODO:xref}, size always means length, and width is 1.
|
|---|
| [4cf8832] | 1080 |
|
|---|
| [16a843b] | 1081 |
|
|---|
| [8eb85de] | 1082 | \subsubsection{Execution Environment}
|
|---|
| [9367cd6] | 1083 | \label{s:ExperimentalEnvironment}
|
|---|
| 1084 |
|
|---|
| 1085 | The performance experiments are run on:
|
|---|
| 1086 | \begin{description}[leftmargin=*,topsep=3pt,itemsep=2pt,parsep=0pt]
|
|---|
| 1087 | %\item[PC]
|
|---|
| 1088 | %with a 64-bit eight-core AMD FX-8370E, with ``taskset'' pinning to core \#6. The machine has 16 GB of RAM and 8 MB of last-level cache.
|
|---|
| 1089 | %\item[ARM]
|
|---|
| 1090 | %Gigabyte E252-P31 128-core socket 3.0 GHz, WO memory model
|
|---|
| 1091 | \item[AMD]
|
|---|
| [d6ce310] | 1092 | Supermicro AS--1125HS--TNR EPYC 9754 128--core socket, hyper-threading $\times$ 2 sockets (512 processing units) 2.25 GHz, TSO memory model, with cache structure 32KB L1i/L1d, 1024KB L2, 16MB L3, where each L3 cache covers 1 NUMA node and 8 cores (16 processors).
|
|---|
| [8eb85de] | 1093 | \item[Intel]
|
|---|
| [d6ce310] | 1094 | Supermicro SYS-121H-TNR Xeon Gold 6530 32--core, hyper-threading $\times$ 2 sockets (128 processing units) 2.1 GHz, TSO memory model, with cache structure 32KB L1i/L1d, 20248KB L2, 160MB L3, where each L3 cache covers 2 NUMA node and 32 cores (64 processors).
|
|---|
| [9367cd6] | 1095 | \end{description}
|
|---|
| 1096 | The experiments are single threaded and pinned to single core to prevent any OS movement, which might cause cache or NUMA effects perturbing the experiment.
|
|---|
| 1097 |
|
|---|
| 1098 | The compiler is gcc/g++-14.2.0 running on the Linux v6.8.0-52-generic OS.
|
|---|
| 1099 | Switching between the default memory allocators @glibc@ and @llheap@ is done with @LD_PRELOAD@.
|
|---|
| 1100 | To prevent eliding certain code patterns, crucial parts of a test are wrapped by the function @pass@
|
|---|
| 1101 | \begin{cfa}
|
|---|
| 1102 | // prevent eliding, cheaper than volatile
|
|---|
| 1103 | static inline void * pass( void * v ) { __asm__ __volatile__( "" : "+r"(v) ); return v; }
|
|---|
| 1104 | ...
|
|---|
| 1105 | pass( &remove_first( lst ) ); // wrap call to prevent elision, insert cannot be elided now
|
|---|
| 1106 | \end{cfa}
|
|---|
| 1107 | The call to @pass@ can prevent a small number of compiler optimizations but this cost is the same for all lists.
|
|---|
| 1108 |
|
|---|
| [4cf8832] | 1109 | The main difference in the machines is their cache structure.
|
|---|
| 1110 | The AMD has smaller caches that are shared less, while the Intel shares larger caches among more processors.
|
|---|
| [eeefc0c] | 1111 | This difference, while an interesting tradeoff for highly concurrent use, is rather one-sided for sequential use, such as this experiment.
|
|---|
| 1112 | Specifically, the Intel offers a single processor a bigger cache.
|
|---|
| 1113 |
|
|---|
| [4cf8832] | 1114 |
|
|---|
| 1115 | \subsubsection{Recap and Master Legend}
|
|---|
| 1116 |
|
|---|
| [eeefc0c] | 1117 | For experiments performed in later section, there are 12 use cases, which are all combinations of 2 movents, 2 polarities and 3 accessors.
|
|---|
| 1118 | There are 4 pysical contexts, which are all combinations of 2 machines and 2 size (length) zones.
|
|---|
| [4cf8832] | 1119 | Each physical context samples 4 specific sizes.
|
|---|
| 1120 | There are 3.25 frameworks.
|
|---|
| 1121 | This accounting considers how LQ-list supports only the movement--polarity combination "stack, insert first."
|
|---|
| 1122 | So LQ-list fills a quarter of the otherwise-orthogonal space.
|
|---|
| 1123 | Use case, physical context and framework are the explanatory factors.
|
|---|
| [eeefc0c] | 1124 | Taking all combinations of the explanatory factors gives 12 $\times$ 4 $\times$ 4 $\times$ 3.25 = 624 individual configurations.
|
|---|
| 1125 |
|
|---|
| 1126 | % \[
|
|---|
| 1127 | % \textrm{624 individual configurations} =
|
|---|
| 1128 | % \sum_{\substack{
|
|---|
| 1129 | % \textrm{12 use cases}\\
|
|---|
| 1130 | % \textrm{4 physical contexts}\\
|
|---|
| 1131 | % \textrm{4 specific sizes}\\
|
|---|
| 1132 | % \textrm{3.25 frameworks}
|
|---|
| 1133 | % }}
|
|---|
| 1134 | % \textrm{1 individual configuration}
|
|---|
| 1135 | % \]
|
|---|
| [4cf8832] | 1136 |
|
|---|
| [bf73608] | 1137 | \begin{figure}
|
|---|
| 1138 | \centering
|
|---|
| 1139 | \setlength{\tabcolsep}{0pt}
|
|---|
| 1140 | \includegraphics[trim={0in, 1.75in, 0in, 0in}, clip]{plot-list-legend-rel-i-swift.pdf}
|
|---|
| 1141 | \begin{tabular}{p{2.75in}p{0.125in}p{3.625in}}
|
|---|
| 1142 | \subfloat[ ]{\label{f:zoomin-rel-i-swift}
|
|---|
| 1143 | \includegraphics{plot-list-zoomin-rel-i-swift.pdf}
|
|---|
| 1144 | } & &
|
|---|
| 1145 | \subfloat[ ]{\label{f:zoomin-histo-i-swift}
|
|---|
| 1146 | \includegraphics{plot-list-histo-rel-i-swift.pdf}
|
|---|
| 1147 | }
|
|---|
| 1148 | \end{tabular}
|
|---|
| [9c12dd0] | 1149 | \caption[IR duration, transformed for general anaysis]{
|
|---|
| 1150 | IR duration, transformed for general anaysis.
|
|---|
| 1151 | The analysis follows the single example setup of \VRef[Figure]{f:zoomin-abs-i-swift}, \ie Use Case I on AMD, where IR is given as absolute duration.
|
|---|
| 1152 | Plot (a) transforms the source dataset by conditioning on specific size.
|
|---|
| 1153 | Plot (b) takes the results from only the identified size zones, discards their specific-size information, and shows the resulting distribution.
|
|---|
| 1154 | Lower is better.}
|
|---|
| [bf73608] | 1155 | \label{fig:plot-list-rel}
|
|---|
| 1156 | \end{figure}
|
|---|
| [4cf8832] | 1157 |
|
|---|
| [eeefc0c] | 1158 | \begin{comment}
|
|---|
| [bf73608] | 1159 | 32 of these individual configurations, plucked from \VRef[Figure]{f:zoomin-abs-i-swift}, are the subject of \VRef[Figure]{fig:plot-list-rel}, where they are now transformed into the format used for general analysis.
|
|---|
| 1160 | In \subref*{f:zoomin-rel-i-swift}, each of the 56 data points is an individual configuration; the subset within the two boxes has the 32 of interest.
|
|---|
| 1161 | In \subref*{f:zoomin-histo-i-swift}, the very leftmost histogram (Small, \CFA) summarizes the 4 Small--\CFA individual configurations.
|
|---|
| 1162 | Each remaining framework, at size zone Small, similarly summarizes 4 individual configurations.
|
|---|
| 1163 | This statement is the interpretation of the x-category label, ``Small (/4).''
|
|---|
| 1164 | Each line under the ``/4'' label has 4 individual configurations; so, this group has 16 individual configurations.
|
|---|
| 1165 | The interpretation of ``Medium (/4)'' is similar; it groups the remaining 16 configurations.
|
|---|
| 1166 | When both size zones are pooled together, ``Both (/8)'' results; this group revisits all 32 configurations.
|
|---|
| 1167 |
|
|---|
| 1168 | This example's use of 32 individual configurations is in contrast with full-universe comparisons like the forthcoming \VRef[Figure]{fig:plot-list-1ord}, whose coverage further includes all use cases and both machines.
|
|---|
| 1169 |
|
|---|
| 1170 | The transformation of \VRef[Figure]{f:zoomin-abs-i-swift} into \VRef[Figure]{f:zoomin-rel-i-swift} conditions on the configuration's specific size.
|
|---|
| 1171 | At a particular size, the average duration is computed, across the three frameworks that work on all use cases: \CFA, \uCpp and LQ-@tailq@.
|
|---|
| 1172 | \VRef[Figure]{fig:plot-list-rel} shows a particular configuration's duration relative to this average.
|
|---|
| 1173 |
|
|---|
| 1174 | The effect of conditioning on specific size erases the fact that \VRef[Figure]{fig:plot-list-zoomin-abs} shows, aside from the coarse hops, all frameworks getting smoothly slower as size increases.
|
|---|
| 1175 | This effect is partiularly relevant \emph{within} a size zone, most noticeable as the data lines all going up across the Small box.
|
|---|
| 1176 | Now, with size conditioned, \VRef[Figure]{f:zoomin-rel-i-swift} has the trends inside a zone box being flat.
|
|---|
| 1177 | This flatness gives \subref*{f:histo-rel-i-swift} nicely separated histograms.
|
|---|
| 1178 |
|
|---|
| 1179 | Specific size is the only factor conditioned in this view.
|
|---|
| 1180 | This choice was made to keep the relationship between \VRef[Figures]{f:zoomin-abs-i-swift} and \VRef[]{:zoomin-rel-i-swift} perceptible.
|
|---|
| 1181 | By contrast, general comparisons like \VRef[Figure]{fig:plot-list-1ord} condition on more, generally, everyting not presented.
|
|---|
| 1182 | Its physical-factor breakdown conditions on use case and framework, but not on physical factors; its other two breakdowns are defined similarly.
|
|---|
| 1183 |
|
|---|
| 1184 | The noteworthy performance hop, in this example, is LQ-@list@, which \VRef[Figures]{f:zoomin-abs-i-swift} has as consistently slow in the Small range, and consistently fast in the Medium range.
|
|---|
| 1185 | Therefore, in \VRef{f:zoomin-histo-i-swift}, its two per-size-zone histograms are far apart, and its cross-size-zone histogram is bimdal.
|
|---|
| 1186 | Hops and distribution contributions, like this one, are common.
|
|---|
| 1187 | They are attention-grabbing curiosities when comparing nearly any pair of individual configurations.
|
|---|
| 1188 | They are the background noise of the all-in inter-configuration comparisons following.
|
|---|
| 1189 | With this one, \VRef[Figure]{fig:plot-list-1ord}'s LQ-@list@ distribution (farthest right) does have a perceptible bump at $1.8\times$, corresponding to the upper mode seen here.
|
|---|
| 1190 | But this UC-I--Intel contribution is only $1/6 = 8/48$ of the configurations summarized there.
|
|---|
| 1191 |
|
|---|
| 1192 | Each individual configuration is tested by five trials, giving the error bars of \VRef[Figure]{:zoomin-rel-i-swift} (at min and max).
|
|---|
| 1193 | This trial error is unaffected by the size conditioning.
|
|---|
| 1194 | Though this error is presented only for the narrow slice of the current examples, the amount of it seen here is typical across all the configurations.
|
|---|
| 1195 | With a few exceptions, it is modest; so, the experiment is repeatable.
|
|---|
| 1196 | The data points in \subref{f:zoomin-rel-i-swift} show mean excluding min and max.
|
|---|
| 1197 | This value, alone, is the contribution to the histogram in \subref{f:zoomin-histo-i-swift}.
|
|---|
| 1198 | That is, inter-configuration rollups discard the modest trial-repeatability error.
|
|---|
| 1199 | The girth of a histogram's distribution is entirely the inter-configuration variance, of its configurations' expected performance.
|
|---|
| [eeefc0c] | 1200 | \end{comment}
|
|---|
| [bf73608] | 1201 |
|
|---|
| [eeefc0c] | 1202 | It is impossible to present this large amount of information in graphs.
|
|---|
| 1203 | Therefore, a condensed graphing style is used in subsequent plots.
|
|---|
| 1204 | \VRef[Figure]{fig:plot-list-rel} shows how the condensed graphing style is generated from raw data.
|
|---|
| 1205 | \VRef[Figure]{f:zoomin-rel-i-swift} is formed from the data in \VRef[Figure]{f:zoomin-abs-i-swift}, restructured on the Y-axis using a relative duration.
|
|---|
| 1206 | \VRef[Figure]{f:zoomin-histo-i-swift} shows the interesting data within the two boxes (Small/Medium) and their combination (Both).
|
|---|
| 1207 | This graph plots a vertical histogram for each of the 4 lists.
|
|---|
| 1208 | The light-shaded histogram is the raw data (similar data values overlap), and the dark histogram is the goemean average when there are multiple experiments condensed in a column.
|
|---|
| 1209 | The caption indicates the number of values condensed into this histogram, e.g., ``/4'' $\Rightarrow$ 4 data points.
|
|---|
| 1210 | The vertical relationship among the averages gives a quick result for a specific experiment, where lower is better.
|
|---|
| 1211 | The relative duration smooths the results, where smoothness diminishes as size increases.
|
|---|
| 1212 | This flatness gives nicely separated histograms.
|
|---|
| [bf73608] | 1213 | Thus, in the forthcoming comparison plots:
|
|---|
| [eeefc0c] | 1214 |
|
|---|
| 1215 | % \begin{itemize}[leftmargin=*]
|
|---|
| 1216 | % \item
|
|---|
| 1217 | % The measure is mean IR among the middle 3 trials out of 5, that occurred for an individual configuration.
|
|---|
| 1218 | % \item
|
|---|
| 1219 | % The number of individual configurations per histogram is stated as ``/N,'' at a relevant granularity.
|
|---|
| 1220 | % \item
|
|---|
| 1221 | % All reported averages are geometric means and all IR duration axes (verticals) are logarithmic.
|
|---|
| 1222 | % \item
|
|---|
| 1223 | % Unless indicated otherwise, all explanatory factors appearing on a plot are marginalized, while those not appearing on the plot are conditioned.
|
|---|
| 1224 | % \end{itemize}
|
|---|
| [4cf8832] | 1225 |
|
|---|
| 1226 |
|
|---|
| [f1ffc47] | 1227 | \subsection{Result: Coarse Comparison of Styles}
|
|---|
| [eeefc0c] | 1228 | \label{s:ResultCoarseComparisonStyles}
|
|---|
| [16a843b] | 1229 |
|
|---|
| [9367cd6] | 1230 | This comparison establishes how an intrusive list performs compared with a wrapped-reference list.
|
|---|
| [eeefc0c] | 1231 | \VRef[Figure]{fig:plot-list-zoomout} presents throughput at various list lengths for a linear and random (shuffled) IR test.
|
|---|
| [9367cd6] | 1232 | Other kinds of scans were made, but the results are similar in many cases, so it is sufficient to discuss these two scans, representing difference ends of the access spectrum.
|
|---|
| [eeefc0c] | 1233 | In the graphs, all four intrusive lists (lq-list, lq-tailq, \uCpp, \CFA, see Framework in \VRef[Figure]{f:ListPerfGlossary}) are plotted with the same symbol;
|
|---|
| [75ba2fa6] | 1234 | sometimes theses symbols clump on top of each other, showing the performance difference among intrusive lists is small in comparison to the wrapped list (std::list).
|
|---|
| 1235 | See~\VRef{s:ComparingIntrusiveImplementations} for details among intrusive lists.
|
|---|
| [16a843b] | 1236 |
|
|---|
| [eeefc0c] | 1237 | The list lengths start at 10 due to the short IR times of 2--4 ns, for intrusive lists \vs STL's wrapped-reference list of 15--20 ns.
|
|---|
| 1238 | For very short lists, like 4, the experiment time of 4 $\times$ 2.5 ns and experiment overhead (loops) of 2--4 ns, results in an artificial administrative bump at the start of the graph having nothing to do with the IR times.
|
|---|
| [9367cd6] | 1239 | As the list size grows, the administrative overhead for intrusive lists quickly disappears.
|
|---|
| [16a843b] | 1240 |
|
|---|
| 1241 | \begin{figure}
|
|---|
| [9367cd6] | 1242 | \centering
|
|---|
| [8eb85de] | 1243 | \setlength{\tabcolsep}{0pt}
|
|---|
| 1244 | \begin{tabular}{p{0.75in}p{2.75in}p{3in}}
|
|---|
| 1245 | &
|
|---|
| 1246 | \subfloat[Linear List Nodes, AMD]{\label{f:Linear-swift}
|
|---|
| [bb9897c] | 1247 | \hspace*{-0.75in}
|
|---|
| 1248 | \includegraphics{plot-list-zoomout-noshuf-swift.pdf}
|
|---|
| [8eb85de] | 1249 | } % subfigure
|
|---|
| 1250 | &
|
|---|
| 1251 | \subfloat[Linear List Nodes, Intel]{\label{f:Linear-java}
|
|---|
| [bb9897c] | 1252 | \includegraphics{plot-list-zoomout-noshuf-java.pdf}
|
|---|
| [9367cd6] | 1253 | } % subfigure
|
|---|
| 1254 | \\
|
|---|
| [8eb85de] | 1255 | &
|
|---|
| [75ba2fa6] | 1256 | \subfloat[Random List Nodes, AMD]{\label{f:Random-swift}
|
|---|
| [bb9897c] | 1257 | \hspace*{-0.75in}
|
|---|
| 1258 | \includegraphics{plot-list-zoomout-shuf-swift.pdf}
|
|---|
| [8eb85de] | 1259 | } % subfigure
|
|---|
| 1260 | &
|
|---|
| [75ba2fa6] | 1261 | \subfloat[Random List Nodes, Intel]{\label{f:Random-java}
|
|---|
| [bb9897c] | 1262 | \includegraphics{plot-list-zoomout-shuf-java.pdf}
|
|---|
| [9367cd6] | 1263 | } % subfigure
|
|---|
| [8eb85de] | 1264 | \end{tabular}
|
|---|
| [9c12dd0] | 1265 | \caption[IR duration \vs list length, all sizes]{IR duration \vs list length, all sizes.
|
|---|
| [9367cd6] | 1266 | Lengths go as large possible without error.
|
|---|
| [eeefc0c] | 1267 | One example use case is shown: stack movement, insert-first polarity and head-mediated access. Lower is better.}
|
|---|
| [16a843b] | 1268 | \label{fig:plot-list-zoomout}
|
|---|
| 1269 | \end{figure}
|
|---|
| 1270 |
|
|---|
| [eeefc0c] | 1271 | The key performance factor between intrusive and wrapped-reference lists is the dynamic allocation for the wrapped nodes.
|
|---|
| 1272 | Hence, this experiment is largely measuring the cost of @malloc@/\-@free@ rather than IR, and is sensitive to the layout of memory by the allocator.
|
|---|
| 1273 | For intrusive-list IR, the cost is manipulating the link fields, which is seen by the relatively similar results for the different intrusive lists.
|
|---|
| 1274 | For wrapped-reference IR, the costs are: dynamically allocating/deallocating a wrapped node, copying a external-node pointer into the wrapped node for insertion, and linking the wrapped node to/from the list;
|
|---|
| [75ba2fa6] | 1275 | the allocation dominates these costs.
|
|---|
| 1276 | For example, the experiment was run with both glibc and llheap memory allocators, where llheap performance reduced the cost from 20 to 16 ns, still far from the 2--4 ns for linking an intrusive node.
|
|---|
| 1277 | Hence, there is no way to tease apart the allocation, copying, and linking costs for wrapped lists, as there is no way to preallocate the list nodes without writing a mini-allocator to manage that storage.
|
|---|
| [9367cd6] | 1278 |
|
|---|
| [8eb85de] | 1279 | In detail, \VRef[Figure]{f:Linear-swift}--\subref*{f:Linear-java} shows linear insertion of all the nodes and then linear removal, both in the same direction.
|
|---|
| [75ba2fa6] | 1280 | For intrusive lists, the nodes are adjacent in memory from being preallocated in an array.
|
|---|
| 1281 | For wrapped lists, the wrapped nodes happen to be adjacent because the memory allocator uses bump allocation during the initial phase of allocation.
|
|---|
| [9367cd6] | 1282 | As a result, these memory layouts result in high spatial and temporal locality for both kinds of lists during the linear array traversal.
|
|---|
| 1283 | With address look-ahead, the hardware does an excellent job of managing the multi-level cache.
|
|---|
| [fa8d17f] | 1284 | Hence, performance is largely constant for both kinds of lists, until L3 cache and NUMA boundaries are crossed for longer lists and the costs increase consistently for both kinds of lists.
|
|---|
| [75ba2fa6] | 1285 | For example, on AMD (\VRef[Figure]{f:Linear-swift}), there is one NUMA node but many small L3 caches, so performance slows down quickly as multiple L3 caches come into play, and remains constant at that level, except for some anomalies for very large lists.
|
|---|
| 1286 | On Intel (\VRef[Figure]{f:Linear-java}), there are four NUMA nodes and four slowdown steps as list-length increase.
|
|---|
| 1287 | At each step, the difference between the kinds of lists decreases as the NUMA effect increases.
|
|---|
| 1288 |
|
|---|
| 1289 | In detail, \VRef[Figure]{f:Random-swift}--\subref*{f:Random-java} shows random insertion and removal of the nodes.
|
|---|
| 1290 | As for linear, there is the issue of memory allocation for the wrapped list.
|
|---|
| 1291 | As well, the consecutive storage-layout is the same (array and bump allocation).
|
|---|
| 1292 | Hence, the difference is the random linking among nodes, resulting in random accesses, even though the list is traversed linearly, resulting in similar cache events for both kinds of lists.
|
|---|
| 1293 | Both \VRef[Figures]{f:Random-swift}--\subref*{f:Random-java} show the slowdown of random access as the list-length grows resulting from stepping out of caches into main memory and crossing NUMA nodes.
|
|---|
| [9367cd6] | 1294 | % Insert and remove operations act on both sides of a link.
|
|---|
| 1295 | %Both a next unlisted item to insert (found in the items' array, seen through the shuffling array), and a next listed item to remove (found by traversing list links), introduce a new user-item location.
|
|---|
| [75ba2fa6] | 1296 | As for linear, the Intel (\VRef[Figure]{f:Random-java}) graph shows steps from the four NUMA nodes.
|
|---|
| 1297 | Interestingly, after $10^6$ nodes, intrusive lists are slower than wrapped.
|
|---|
| 1298 | I did not have time to track down this anomaly, but I speculate it results from the difference in touching the data in the accessed node, as the data and links are together for intrusive and separated for wrapped.
|
|---|
| 1299 | For the llheap memory-allocator and the two tested architectures, intrusive lists out perform wrapped lists up to size $10^3$ for both linear and random, and performance begins to converge around $10^6$ nodes as architectural issues begin to dominate.
|
|---|
| 1300 | Clearly, memory allocator and hardware architecture plays a large factor in the total cost and the crossover points as list-size increases.
|
|---|
| [9367cd6] | 1301 | % In an odd scenario where this intuition is incorrect, and where furthermore the program's total use of the memory allocator is sufficiently limited to yield approximately adjacent allocations for successive list insertions, a non-intrusive list may be preferred for lists of approximately the cache's size.
|
|---|
| 1302 |
|
|---|
| 1303 | The takeaway from this experiment is that wrapped-list operations are expensive because memory allocation is expense at this fine-grained level of execution.
|
|---|
| 1304 | Hence, when possible, using intrusive links can produce a significant performance gain, even if nodes must be dynamically allocated, because the wrapping allocations are eliminated.
|
|---|
| [75ba2fa6] | 1305 | Even when space is a consideration, intrusive links may not use more storage if a node is often linked.
|
|---|
| 1306 | Unfortunately, many programmers are unaware of intrusive lists for dynamically-sized data-structures or their tool-set does not provide them.
|
|---|
| [9367cd6] | 1307 |
|
|---|
| 1308 | % Note, linear access may not be realistic unless dynamic size changes may occur;
|
|---|
| 1309 | % if the nodes are known to be adjacent, use an array.
|
|---|
| 1310 |
|
|---|
| 1311 | % In a wrapped-reference list, list nodes are allocated separately from the items put into the list.
|
|---|
| 1312 | % Intrusive beats wrapped at the smaller lengths, and when shuffling is avoided, because intrusive avoids dynamic memory allocation for list nodes.
|
|---|
| 1313 |
|
|---|
| 1314 | % STL's performance is not affected by element order in memory.
|
|---|
| 1315 | %The field of intrusive lists begins with length-1 operations costing around 10 ns and enjoys a ``sweet spot'' in lengths 10--100 of 5--7-ns operations.
|
|---|
| 1316 | % This much is also unaffected by element order.
|
|---|
| 1317 | % Beyond this point, shuffled-element list performance worsens drastically, losing to STL beyond about half a million elements, and never particularly leveling off.
|
|---|
| 1318 | % In the same range, an unshuffled list sees some degradation, but holds onto a 1--2 $\times$ speedup over STL.
|
|---|
| 1319 |
|
|---|
| 1320 | % The apparent intrusive ``sweet spot,'' particularly its better-than-length-1 speed, is not because of list operations truly running faster.
|
|---|
| 1321 | % Rather, the worsening as length decreases reflects the per-operation share of harness overheads incurred at the outer-loop level.
|
|---|
| 1322 | % Disabling the harness's ability to drive interleaving, even though the current scenario is using a ``never work in middle'' interleave, made this rise disappear.
|
|---|
| 1323 | % Subsequent analyses use length-controlled relative performance when comparing intrusive implementations, making this curiosity disappear.
|
|---|
| 1324 |
|
|---|
| 1325 | % The remaining big-swing comparison points say more about a computer's memory hierarchy than about linked lists.
|
|---|
| 1326 | % The tests in this chapter are only inserting and removing.
|
|---|
| 1327 | % They are not operating on any user payload data that is being listed.
|
|---|
| 1328 | % The drastic differences at large list lengths reflect differences in link-field storage density and in correlation of link-field order to element order.
|
|---|
| 1329 | % These differences are inherent to the two list models.
|
|---|
| 1330 |
|
|---|
| 1331 | % A wrapped-reference list's separate nodes are allocated right beside each other in this experiment, because no other memory allocation action is happening.
|
|---|
| 1332 | % As a result, the interlinked nodes of the STL list are generally referencing their immediate neighbours.
|
|---|
| 1333 | % This pattern occurs regardless of user-item shuffling because this test's ``use'' of the user-items' array is limited to storing element addresses.
|
|---|
| 1334 | % This experiment, driving an STL list, is simply not touching the memory that holds the user data.
|
|---|
| 1335 | % Because the interlinked nodes, being the only touched memory, are generally adjacent, this case too has high memory locality and stays fast.
|
|---|
| 1336 |
|
|---|
| 1337 | % But the comparison of unshuffled intrusive with wrapped-reference gives the performance of these two styles, with their the common impediment of overfilling the cache removed.
|
|---|
| 1338 | % Intrusive consistently beats wrapped-reference by about 20 ns, at all sizes.
|
|---|
| 1339 | % This difference is appreciable below list length 0.5 M, and enormous below 10 K.
|
|---|
| [16a843b] | 1340 |
|
|---|
| 1341 |
|
|---|
| [4cf8832] | 1342 | \subsection{Result: Intrusive Winners and Losers}
|
|---|
| [9367cd6] | 1343 | \label{s:ComparingIntrusiveImplementations}
|
|---|
| [16a843b] | 1344 |
|
|---|
| [4cf8832] | 1345 | The preceding result shows the intrusive frameworks have better performance than the wrapped lists for small to medium sized lists.
|
|---|
| [6762d46] | 1346 | This analysis covers the experiment position taken in \VRef{s:AddRemovePerformance} for movement, polarity, and accessor.
|
|---|
| [4cf8832] | 1347 | \VRef[Figure]{f:ExperimentOperations} shows the experiment use cases tested, which results in 12 experiments (I--XII) for comparing intrusive frameworks.
|
|---|
| 1348 | To preclude hardware interference, only list sizes below 150 are examined to differentiate among the intrusive frameworks,
|
|---|
| [75ba2fa6] | 1349 | The data is selected from the start of \VRef[Figures]{f:Linear-swift}--\subref*{f:Linear-java}, but the start of \VRef[Figures]{f:Random-swift}--\subref*{f:Random-java} is largely the same.
|
|---|
| [16a843b] | 1350 |
|
|---|
| [6762d46] | 1351 | \begin{figure}
|
|---|
| 1352 | \centering
|
|---|
| [eeefc0c] | 1353 | \includegraphics{plot-list-1ord.pdf}
|
|---|
| [bf73608] | 1354 | \small{\textsuperscript{\textdagger} LQ-@list@ is (/48) by its incomplete (25\%) use case coverage. Its bars are scaled to match.}
|
|---|
| [9c12dd0] | 1355 | \caption[IR duration, decomposed by all first-order effects]{
|
|---|
| 1356 | IR duration, decomposed by all first-order effects.
|
|---|
| 1357 | Each of the three breakdowns divides the entire population of test results into its mutually disjoint constituents.
|
|---|
| 1358 | Lower is better.
|
|---|
| [408f954] | 1359 | }
|
|---|
| [6762d46] | 1360 | \label{fig:plot-list-1ord}
|
|---|
| 1361 | \end{figure}
|
|---|
| 1362 |
|
|---|
| [68af77b] | 1363 | \VRef[Figure]{fig:plot-list-1ord} gives the first-order effects.
|
|---|
| [eeefc0c] | 1364 | The first breakdown, architecture/size-zone (left), shows the overall performance of all 12 experiment on the two different hardware architectures for small and medium lists (624 / 4 = 156 experiments per column).
|
|---|
| 1365 | % The relative experiment duration for each experiment is shown as a bar in each column and the black bar in that column shows the average of all 12 experiments.
|
|---|
| 1366 | By inspection of the averages, Intel runs faster than AMD.
|
|---|
| 1367 | Within an architecture, the small zone (lists of 4--16 elements) runs faster than the medium zone (lists of 50--200 elements).
|
|---|
| 1368 | The overall slower execution on the AMD results from its smaller L3 cache \vs the larger cache on the Intel.
|
|---|
| [68af77b] | 1369 | (No NUMA effects for these list sizes.)
|
|---|
| [4cf8832] | 1370 | Specifically, a 20\% standard deviation exists here, between the means of the four physical-effect categories.
|
|---|
| [eeefc0c] | 1371 | These hardware effects are accounted for when interpreting the following framework comparisons.
|
|---|
| 1372 | % The key takeaway for this comparison is the context it establishes for interpreting the following framework comparisons.
|
|---|
| 1373 | % Both the particulars of a machine's cache design, and a list length's effect on the program's cache friendliness, affect IR speed in the manner illustrated in this breakdown.
|
|---|
| 1374 | % That is, if you are running on an unknown machine, at a scale above anomaly-prone individuals, and below where major LLC caching effects take over the general intrusive-list advantage, but with an unknown relationship to the sizing of your fickle low-level caches, you are likely to experience an unpredictable speed impact on the order of 20\%.
|
|---|
| 1375 |
|
|---|
| 1376 | The second breakdown, use case (middle), shows the overall performance for each of the 12 use cases from \VRef[Figure]{f:ExperimentOperations} (624 / 12 = 52 experiments per column).
|
|---|
| 1377 | % A similar situation comes from \VRef[Figure]{fig:plot-list-1ord}'s second comparison, by use case.
|
|---|
| 1378 | While specific differences do occur, like framework X doing better on stacks than on queues, the overall range of the standard deviation of the individual use cases' means is only 9\%, indicating no unusual cases.
|
|---|
| 1379 | A more detailed analysis occurs in the discussion of \VRef[Figure]{fig:plot-list-2ord}.
|
|---|
| 1380 | % But they are so irrelevant to the issue of picking a winning framework that it is sufficient here to number the use cases opaquely.
|
|---|
| 1381 | % Whether a given list framework is suitable for a language's general library succeeds or fails without knowledge of whether your use will have stack or queue movement.
|
|---|
| 1382 | % So you face another lottery, with a likely win-loss range of the standard deviation of the individual use cases' means: 9\%.
|
|---|
| 1383 |
|
|---|
| 1384 | The third breakdown, framework (right), shows the overall performance of the 4 list implementations (624 / 3.25 = 192).
|
|---|
| 1385 | Here, \CFA runs similarly to \uCpp and LQ-@list@ runs similarly to @tailq@.
|
|---|
| [6762d46] | 1386 | The standard deviation of the frameworks' means is 8\%.
|
|---|
| [eeefc0c] | 1387 | % Framework choice has, therefore, less impact on your speed than the lottery tickets you already hold.
|
|---|
| 1388 | Now, \CFA/\uCpp run slower than LQ-@list@/@tailq@ by 15\%, a fact explored further in \VRef{s:SweetSoreSpots}.
|
|---|
| [4cf8832] | 1389 | But so too does use case VIII typically beat use case IV by 38\%.
|
|---|
| [6762d46] | 1390 | As does a small size on the Intel typically beat a medium size on the AMD by 66\%.
|
|---|
| [eeefc0c] | 1391 | Hence, architecture and usage patterns have a significant affect on the specific framework.
|
|---|
| [6762d46] | 1392 |
|
|---|
| [4cf8832] | 1393 |
|
|---|
| [bf73608] | 1394 | \subsection{Result: Sweet and Sore Spots}
|
|---|
| [eeefc0c] | 1395 | \label{s:SweetSoreSpots}
|
|---|
| [4cf8832] | 1396 |
|
|---|
| [1abcec9b] | 1397 | \begin{figure}
|
|---|
| 1398 | \centering
|
|---|
| [bf73608] | 1399 | \includegraphics{plot-list-2ord.pdf}\\
|
|---|
| 1400 | \small{
|
|---|
| 1401 | \textsuperscript{\textdagger} LQ-@list@ is absent from Movement and Polarity comparisons because it does not support queue and insert-last, respectively.\\
|
|---|
| 1402 | \textsuperscript{\textdaggerdbl} LQ-@list@ is (/24) by its incomplete (25\%) use case coverage. Its bars are scaled to match.\\
|
|---|
| 1403 | \textsuperscript{*} The full population of 192 individual configurations applies (48 for LQ-@list@), but this analysis summarizes pairs of them, giving each histogram's 96 contributions (24 for LQ-@list@).
|
|---|
| 1404 | }
|
|---|
| [9c12dd0] | 1405 | \caption[IR duration where framework selection interacts with other factors]{
|
|---|
| 1406 | IR duration where framework selection interacts with other factors.
|
|---|
| 1407 | Higher favours top option; lower favours bottom option.
|
|---|
| [1abcec9b] | 1408 | }
|
|---|
| 1409 | \label{fig:plot-list-2ord}
|
|---|
| 1410 | \end{figure}
|
|---|
| 1411 |
|
|---|
| [eeefc0c] | 1412 | % \VRef[Figure]{fig:plot-list-1ord} is focused on only first-order effects in order to contextualize a winner/loser framework observation.
|
|---|
| 1413 | % But this perspective cannot address questions like, ``Where are \CFA's sore spots?''
|
|---|
| 1414 | % Moreover, the shallow treatment of use cases by ordinals said nothing about how stack usage compares with queues.
|
|---|
| 1415 |
|
|---|
| 1416 | \VRef[Figure]{fig:plot-list-2ord} shows how frameworks react to a single other factor being varied across one pair of options.
|
|---|
| 1417 | Every (binned and mean-contributing) individual data point represents a pair of test setups, one with the criterion set to the option labelled at the top; the other setup uses the bottom option.
|
|---|
| 1418 | This point's y-axis score is the ratio of these setups' durations.
|
|---|
| 1419 | The point lands in a bin closer to the label of the option that performs better.
|
|---|
| [1abcec9b] | 1420 |
|
|---|
| [eeefc0c] | 1421 | The first breakdown, size zone (left), refines the notion that a small size runs faster than a big size;
|
|---|
| 1422 | this issue is by how much.
|
|---|
| [1abcec9b] | 1423 | Indeed, all means favour small and few tails favour medium.
|
|---|
| 1424 | But the various frameworks do no respond to the different sizes and machines uniformly.
|
|---|
| [eeefc0c] | 1425 | On the AMD, \CFA and \uCpp have a modest size sensitivity, LQ-tailq's is moderate, and LQ-list seems unaffected.
|
|---|
| 1426 | On the Intel, \CFA's increases to moderate, while \uCpp is now unaffected, and both LQs have a large effect.
|
|---|
| 1427 | Hence, the Intel is more sensitive to size than the AMD.
|
|---|
| [1abcec9b] | 1428 |
|
|---|
| [eeefc0c] | 1429 | The second breakdown, movement and polarity (middle), the responses are more subdued.
|
|---|
| 1430 | Note, LQ-list has no represntation in these comparisons because it only supports stacks that push and pop with the first element.
|
|---|
| [1abcec9b] | 1431 | \CFA is completely stable under movement and polarity changes.
|
|---|
| 1432 | \uCpp and LQ show modest responses favouring queues and insertion at last.
|
|---|
| 1433 |
|
|---|
| [eeefc0c] | 1434 | The third breakdown, accessor (right), the responses are close, except for \CFA.
|
|---|
| [1abcec9b] | 1435 | Note the pair of two-way comparisons pulled from the three experiment setups used.
|
|---|
| [eeefc0c] | 1436 | First, the all-head/insert-element addresses which insertion style is better---by-head (top) and by-element (bottom).
|
|---|
| 1437 | Then, the all-head/remove-element addresses which removal style is better---by-head (top) and by-element (bottom).
|
|---|
| [1abcec9b] | 1438 | The LQs favour insertion by head and removal by element.
|
|---|
| 1439 | \CFA and \uCpp favour both operations by head.
|
|---|
| 1440 | The strongest effect is \CFA's aversion to removal by element---certainly an opportunity for improvement.
|
|---|
| 1441 |
|
|---|
| [4cf8832] | 1442 |
|
|---|
| [eeefc0c] | 1443 | \begin{comment}
|
|---|
| [4cf8832] | 1444 | \subsection{\CFA Tiny-Size Anomaly}
|
|---|
| 1445 |
|
|---|
| 1446 | The \CFA list occasionally showed a concerning slowdown at length 1.
|
|---|
| 1447 | The issue, seen in \VRef[Figure]{fig:plot-list-short} (top-left corner), has \CFA taking above 10 ns per IR (top-left corner).
|
|---|
| 1448 | It occurs only for the queue movement, only on the AMD machine, and only for the \CFA framework.
|
|---|
| 1449 |
|
|---|
| [bf73608] | 1450 | Length-1 performance is an important case.
|
|---|
| [4cf8832] | 1451 | Lists like those of waiting threads are frequently left empty, with the occasional thread (or few) momentarily joining.
|
|---|
| 1452 | These scenarios need to work.
|
|---|
| 1453 |
|
|---|
| [bf73608] | 1454 | A cause of the slowdown was never determined.
|
|---|
| 1455 | Speculation is that \CFA's increased data dependency, a result of the tagging scheme, pairs poorly with the aliasing implied by queue movement.
|
|---|
| [4cf8832] | 1456 | The aliasing, at length 1, is: the head's first element is the head's last element.
|
|---|
| [bf73608] | 1457 | With stack movement, one of these names for the first element is reaused for both insert and remove.
|
|---|
| 1458 | While with queue movement, both names are used in alternation.
|
|---|
| [4cf8832] | 1459 |
|
|---|
| [bf73608] | 1460 | The breakdowns earlier in the performance assessment vary length only.
|
|---|
| [4cf8832] | 1461 | That is, they see the story down the leftmost column in a triangle.
|
|---|
| 1462 | The insight for contextualizing this issue was to inspect both length and width.
|
|---|
| 1463 |
|
|---|
| 1464 | The issue is seen as practically mitigated by noticing that the difficutly fades away as width increases.
|
|---|
| 1465 | This effect is seen both in \VRef[Figure]{fig:plot-list-short}'s easement across the top triangle rows, and, zoomed farther out, in \VRef[Figure]{fig:plot-list-wide}.
|
|---|
| 1466 |
|
|---|
| 1467 | Increasing the width matters to the aliasing hypothesis.
|
|---|
| 1468 | In a narrow experiment, one element's insert and remove happen in rapid succession.
|
|---|
| 1469 | So, the two aliases are exercied closer together, making a data hazard (that lacks ideal hardware treatment) stretch the instruction-pipeline schedule more significantly.
|
|---|
| 1470 | Increasing the width adds harness-induced gaps between the uses of each alias, behind which a potential hazard can hide.
|
|---|
| 1471 |
|
|---|
| 1472 | In the practical scenario that judges length-1 performance as relevant, width 1 is contrived.
|
|---|
| 1473 | A thread putting itself on an often-empty waiters' list is not doing so on one such list repeatedly, at least not without taking other situation-iduced pauses.
|
|---|
| 1474 |
|
|---|
| 1475 | Thus, the congestion at low width + length comes from the harness using repetition (in order to obtain a measurable time).
|
|---|
| 1476 | It does not reflect the situation that motivates the legitimate desire for good length-1 performance.
|
|---|
| 1477 |
|
|---|
| 1478 | There likely is a real hazard, unique to the \CFA framework, when a queue movement is repeated on a tiny list \emph{without other interventing action}.
|
|---|
| 1479 | Doing so is believed to occur only in contrived situations.
|
|---|
| 1480 |
|
|---|
| 1481 |
|
|---|
| 1482 | \begin{figure}
|
|---|
| 1483 | \centering
|
|---|
| 1484 | \includegraphics[trim={00in, 5.5in, 0in, 0in}, clip, scale=0.8]{plot-list-short-temp.pdf}
|
|---|
| 1485 | \caption{Behaviour at very short lengths.}
|
|---|
| 1486 | \label{fig:plot-list-short}
|
|---|
| 1487 | \end{figure}
|
|---|
| 1488 |
|
|---|
| 1489 | \begin{figure}
|
|---|
| 1490 | \centering
|
|---|
| 1491 | \includegraphics[trim={0.25in, 1in, 0.25in, 1in}, clip, scale=0.5]{plot-list-wide-temp.pdf}
|
|---|
| 1492 | \caption{Length-1 anomaly resolving at modest width. Points are for varying widths, at fixed length 1.}
|
|---|
| 1493 | \label{fig:plot-list-wide}
|
|---|
| 1494 | \end{figure}
|
|---|
| [eeefc0c] | 1495 | \end{comment}
|
|---|
| [4cf8832] | 1496 |
|
|---|
| [1abcec9b] | 1497 | \begin{comment}
|
|---|
| 1498 | These remarks are mostly about 3ord over 2ord.
|
|---|
| 1499 | This analysis does not provide more detail about one framework beating another; it offers different benefits.
|
|---|
| 1500 | These interactions further illustrate the lottery-ticket unpredictability that a linked-list user inevitably faces, by revealing stronger-still performance swings hidden from the first-order view.
|
|---|
| 1501 | They illustrate the difficult signal-to-noise ratio that I had to overcome in preparing this data.
|
|---|
| 1502 | They may serve as a reference guiding future \CFA linked-list work by informing on where to target improvements.
|
|---|
| [4cf8832] | 1503 | Finally, the findings offer the conclusion that \CFA's list offers more consistent performance across usage scenarios, than the other lists.
|
|---|
| [1abcec9b] | 1504 | \end{comment}
|
|---|
| 1505 |
|
|---|
| 1506 | \begin{comment}
|
|---|
| 1507 | Further to the interpretation guidance of \VRef[Figure]{fig:plot-list-2ord}'s caption, a comparison with the construction of \VRef[Figure]{fig:plot-list-1ord} may be helpful.
|
|---|
| 1508 | In the first-order graph, the factors being compared had many options: four, twelve and four.
|
|---|
| 1509 | # XXX Each option contributes its own, seemingly independent, mean and distribution.
|
|---|
| 1510 | # XXX But, in fact, they are not totally independent.
|
|---|
| 1511 | # WRONG: it's not just about binary, you also need a split on a conditioned factor
|
|---|
| 1512 | I'm trying to get to:
|
|---|
| 1513 | Side by side in earlier style, but they're opposites, so they mirror each other, so you take option B, flip it over, and have option A---I'll just show A
|
|---|
| 1514 | \end{comment}
|
|---|
| 1515 |
|
|---|
| [4cf8832] | 1516 | \begin{comment}
|
|---|
| [17f2a7f4] | 1517 | \VRef[Figure]{fig:plot-list-zoomin} shows the sizes below 150 blown up.
|
|---|
| [eeefc0c] | 1518 | % The same scenario as the coarse comparison is used: a stack, with insertions and removals happening at the end called ``first,'' ``head'' or ``front,'' and all changes occurring through a head-provided IR operation.
|
|---|
| [16a843b] | 1519 | The error bars show fastest and slowest time seen on five trials, and the central point is the mean of the remaining three trials.
|
|---|
| [9367cd6] | 1520 | For readability, the points are slightly staggered at a given horizontal value, where the points might otherwise appear on top of each other.
|
|---|
| [4cf8832] | 1521 | The experiment runs twelve use cases;
|
|---|
| [75ba2fa6] | 1522 | the ones chosen for their variety are scenarios I and VIII from the listing of \VRef[Figure]{fig:plot-list-mchn-szz}, and their results appear in the rows.
|
|---|
| 1523 | As in the previous experiment, each hardware architecture appears in a column.
|
|---|
| [17f2a7f4] | 1524 | Note that LQ-list does not support queue operations, so this framework is absent from Operation VIII.
|
|---|
| 1525 |
|
|---|
| 1526 | At lengths 1 and 2, winner patterns seen at larger sizes generally do not apply.
|
|---|
| [75ba2fa6] | 1527 | Indeed, an issue with \CFA giving terrible on queues at length 1 is evident in \VRef[Figure]{f:AbsoluteTime-viii-swift}.
|
|---|
| [17f2a7f4] | 1528 | This phenomenon is elaborated in \MLB{TODO: xref}.
|
|---|
| 1529 | For the remainder of this section, these sizes are disregarded.
|
|---|
| 1530 |
|
|---|
| 1531 | Even after the very-small anomalies, the selections of operation and machine significantly affect how speed responds to size.
|
|---|
| 1532 | For example, Operation I on the AMD (\VRef[Figure]{f:AbsoluteTime-i-swift}) has \CFA generally winning over LQ, while the opposite is seen by switching either to Operation VIII (\VRef[Figure]{f:AbsoluteTime-viii-swift}) or to the Intel (\VRef[Figure]{f:AbsoluteTime-i-java}).
|
|---|
| 1533 | For another, Operation I has sore spots at lengths in the middle for \uCpp on AMD and LQ-list on Intel, which resolve at larger lengths; yet no such pattern presents with Operation VIII.
|
|---|
| 1534 |
|
|---|
| 1535 | In spite of these complex interactions, a couple spots of stability can be analyzed.
|
|---|
| [0f9c67bf] | 1536 | In these examples, the two defined Size Zones (4--16 being ``small,'' and above 50 being ``medium,'') covering four specific sizes apiece, each tends to show a simple winner/loser story.
|
|---|
| [17f2a7f4] | 1537 | Manual inspection of other such plots (not detailed) showed that this quality is generally upheld.
|
|---|
| 1538 | So these zones are used for basing comparison.
|
|---|
| 1539 |
|
|---|
| [e2e927e] | 1540 | \MLB{Peter, caution beyond here. I am reconsidering if this first-dismiss-physical approach to comparison is simplest.}
|
|---|
| 1541 |
|
|---|
| [17f2a7f4] | 1542 | \begin{figure}
|
|---|
| 1543 | \centering
|
|---|
| 1544 | \includegraphics{plot-list-mchn-szz.pdf}
|
|---|
| [e2e927e] | 1545 | \caption{Histogram of operation durations, decomposed by physical factors.
|
|---|
| 1546 | Measurements are included from only the sizes in the ``small'' and ``medium'' stable zones.
|
|---|
| [1abcec9b] | 1547 | This breakdown divides the entire population of test results into four mutually disjoint constituents.
|
|---|
| 1548 | \MLB{I see that I broke it. But we might be getting rid of it.}
|
|---|
| 1549 | }
|
|---|
| [17f2a7f4] | 1550 | \label{fig:plot-list-mchn-szz}
|
|---|
| 1551 | \end{figure}
|
|---|
| 1552 |
|
|---|
| 1553 | \VRef[Figure]{fig:plot-list-mchn-szz} shows the effects of the physical factors of size zone and machine.
|
|---|
| 1554 | Each of these four histograms shows variation in duration coming from the four specific sizes in a size zone, from combining results of all twelve operations and all four frameworks.
|
|---|
| 1555 | Among the means of the four histograms, there is a standard deviation of 0.9 ns, which is 20\% of the global mean.
|
|---|
| [1abcec9b] | 1556 | This variability is due solely to physical factors.
|
|---|
| [17f2a7f4] | 1557 |
|
|---|
| 1558 | From the perspective of assessing winning/losing frameworks, these physical effects are noise.
|
|---|
| 1559 | So, subsequent analysis conditions on the phisical effects.
|
|---|
| [e2e927e] | 1560 | That is, it supposes you are put into an unknown physical situation (that is one of the four being tested), then presents all the ways your outcome could change as a result of non-physical factors, assuming that the physical situation is kept constant.
|
|---|
| 1561 | It does do by presenting results relative to the mean of the physical quadrant (\VRef[fig]{fig:plot-list-mchn-szz} histogram) to which it belogs.
|
|---|
| [17f2a7f4] | 1562 | With this adjustment, absolute duration values (in nonsecods) are lost.
|
|---|
| 1563 | In return, the physical quadrants are re-combined, enabling assessment of the non-physical factors.
|
|---|
| [4cf8832] | 1564 | \end{comment}
|
|---|
| [17f2a7f4] | 1565 |
|
|---|
| 1566 | \begin{comment}
|
|---|
| [9367cd6] | 1567 | While preparing experiment results, I first tested on my old office PC, AMD FX-8370E Eight-Core, before switching to the large new server for final testing.
|
|---|
| 1568 | For this experiment, the results flipped in my favour when running on the server.
|
|---|
| 1569 | New CPU architectures are now amazingly good at branch prediction and micro-parallelism in the pipelines.
|
|---|
| [39ffa5e] | 1570 | Specifically, on the PC, my \CFA and companion \uCpp lists are slower than lq-tail and lq-list by 10\% to 20\%.
|
|---|
| 1571 | On the server, \CFA and \uCpp lists are can be fast by up to 100\%.
|
|---|
| [9367cd6] | 1572 | Overall, LQ-tailq does the best at short lengths but loses out above a dozen elements.
|
|---|
| [17f2a7f4] | 1573 | \end{comment}
|
|---|
| [16a843b] | 1574 |
|
|---|
| 1575 | % \begin{figure}
|
|---|
| 1576 | % \centering
|
|---|
| 1577 | % \begin{tabular}{c}
|
|---|
| 1578 | % \includegraphics{plot-list-cmp-intrl-shift.pdf} \\
|
|---|
| 1579 | % (a) \\
|
|---|
| 1580 | % \includegraphics{plot-list-cmp-intrl-outcome.pdf} \\
|
|---|
| 1581 | % (b) \\
|
|---|
| 1582 | % \end{tabular}
|
|---|
| 1583 | % \caption{Caption TODO}
|
|---|
| 1584 | % \label{fig:plot-list-cmp-intrl}
|
|---|
| 1585 | % \end{figure}
|
|---|
| 1586 |
|
|---|
| [8eb85de] | 1587 | \begin{comment}
|
|---|
| [bb9897c] | 1588 | \subsection{Result: \CFA cost attribution}
|
|---|
| [16a843b] | 1589 |
|
|---|
| [9367cd6] | 1590 | This comparison loosely itemizes the reasons that the \CFA implementation runs 15--20\% slower than LQ.
|
|---|
| 1591 | Each reason provides for safer programming.
|
|---|
| 1592 | For each reason, a version of the \CFA list was measured that forgoes its safety and regains some performance.
|
|---|
| 1593 | These potential sacrifices are:
|
|---|
| [16a843b] | 1594 | \newcommand{\mandhead}{\emph{mand-head}}
|
|---|
| 1595 | \newcommand{\nolisted}{\emph{no-listed}}
|
|---|
| 1596 | \newcommand{\noiter}{\emph{no-iter}}
|
|---|
| 1597 | \begin{description}
|
|---|
| 1598 | \item[mand(atory)-head] Removing support for headless lists.
|
|---|
| 1599 | A specific explanation of why headless support causes a slowdown is not offered.
|
|---|
| [9367cd6] | 1600 | But it is reasonable for a cost to result from making one piece of code handle multiple cases; the subset of the \CFA list API that applies to headless lists shares its implementation with headed lists.
|
|---|
| [16a843b] | 1601 | In the \mandhead case, disabling the feature in \CFA means using an older version of the implementation, from before headless support was added.
|
|---|
| 1602 | In the pre-headless library, trying to form a headless list (instructing, ``Insert loose element B after loose element A,'') is a checked runtime error.
|
|---|
| 1603 | LQ does not support headless lists\footnote{
|
|---|
| 1604 | Though its documentation does not mention the headless use case, this fact is due to one of its insert-before or insert-after routines being unusable in every list model.
|
|---|
| 1605 | For \lstinline{tailq}, the API requires a head.
|
|---|
| 1606 | For \lstinline{list}, this usage causes an ``uncaught'' runtime crash.}.
|
|---|
| 1607 | \item[no-listed] Removing support for the @is_listed@ API query.
|
|---|
| [9367cd6] | 1608 | Along with it goes error checking such as ``When inserting an element, it must not already be listed, \ie be referred to from somewhere else.''
|
|---|
| [16a843b] | 1609 | These abilities have a cost because, in order to support them, a listed element that is being removed must be written to, to record its change in state.
|
|---|
| 1610 | In \CFA's representation, this cost is two pointer writes.
|
|---|
| 1611 | To disable the feature, these writes, and the error checking that consumes their result, are put behind an @#ifdef@.
|
|---|
| 1612 | The result is that a removed element sees itself as still having neighbours (though these quasi-neighbours see it differently).
|
|---|
| 1613 | This state is how LQ leaves a removed element; LQ does not offer an is-listed query.
|
|---|
| 1614 | \item[no-iter(ation)] Removing support for well-terminating iteration.
|
|---|
| 1615 | The \CFA list uses bit-manipulation tagging on link poiters (rather than \eg null links) to express, ``No more elements this way.''
|
|---|
| 1616 | This tagging has the cost of submitting a retrieved value to the ALU, and awaiting this operation's completion, before dereferencing a link pointer.
|
|---|
| 1617 | In some cases, the is-terminating bit is transferred from one link to another, or has a similar influence on a resulting link value; this logic adds register pressure and more data dependency.
|
|---|
| 1618 | To disable the feature, the @#ifdef@-controlled tag manipulation logic compiles in answers like, ``No, that link is not a terminator,'' ``The dereferenceable pointer is the value you read from memory,'' and ``The terminator-marked value you need to write is the pointer you started with.''
|
|---|
| 1619 | Without this termination marking, repeated requests for a next valid item will always provide a positive response; when it should be negative, the indicated next element is garbage data at an address unlikely to trigger a memory error.
|
|---|
| 1620 | LQ has a well-terminating iteration for listed elements.
|
|---|
| [9367cd6] | 1621 | In the \noiter case, the slowdown is not inherent; it represents a \CFA optimization opportunity.
|
|---|
| [16a843b] | 1622 | \end{description}
|
|---|
| 1623 | \MLB{Ensure benefits are discussed earlier and cross-reference} % an LQ programmer must know not to ask, ``Who's next?'' about an unlisted element; an LQ programmer cannot write assertions about an item being listed; LQ requiring a head parameter is an opportunity for the user to provide inconsistent data
|
|---|
| 1624 |
|
|---|
| 1625 | \begin{figure}
|
|---|
| 1626 | \centering
|
|---|
| 1627 | \begin{tabular}{c}
|
|---|
| [8eb85de] | 1628 | \includegraphics{plot-list-cfa-attrib-swift.pdf} \\
|
|---|
| [16a843b] | 1629 | (a) \\
|
|---|
| [8eb85de] | 1630 | \includegraphics{plot-list-cfa-attrib-remelem-swift.pdf} \\
|
|---|
| [16a843b] | 1631 | (b) \\
|
|---|
| 1632 | \end{tabular}
|
|---|
| 1633 | \caption{Operation duration ranges for functionality-reduced \CFA list implementations. (a) has the top level slices. (b) has the next level of slicing within the slower element-based removal operation.}
|
|---|
| 1634 | \label{fig:plot-list-cfa-attrib}
|
|---|
| 1635 | \end{figure}
|
|---|
| 1636 |
|
|---|
| 1637 | \VRef[Figure]{fig:plot-list-cfa-attrib} shows the \CFA list performance with these features, and their combinations, turned on and off. When a series name is one of the three sacrifices above, the series is showing this sacrifice in isolation. These further series names give combinations:
|
|---|
| 1638 | \newcommand{\attribFull}{\emph{full}}
|
|---|
| 1639 | \newcommand{\attribParity}{\emph{parity}}
|
|---|
| 1640 | \newcommand{\attribStrip}{\emph{strip}}
|
|---|
| 1641 | \begin{description}
|
|---|
| 1642 | \item[full] No sacrifices. Same as measurements presented earlier.
|
|---|
| 1643 | \item[parity] \mandhead + \nolisted. Feature parity with LQ.
|
|---|
| 1644 | \item[strip] \mandhead + \nolisted + \noiter. All options set to ``faster.''
|
|---|
| 1645 | \end{description}
|
|---|
| 1646 | All list implementations are \CFA, possibly stripped.
|
|---|
| 1647 | The plot uses the same LQ-relative basis as earlier.
|
|---|
| 1648 | So getting to zero means matching LQ's @tailq@.
|
|---|
| 1649 |
|
|---|
| 1650 | \VRef[Figure]{fig:plot-list-cfa-attrib}-(a) summarizes the time attribution across the main operating scenarios.
|
|---|
| 1651 | The \attribFull series is repeated from \VRef[Figure]{fig:plot-list-cmp-overall}, part (b), while the series showing feature sacrifices are new.
|
|---|
| 1652 | Going all the way to \attribStrip at least nearly matches LQ in all operating scenarios, beats LQ often, and slightly beats LQ overall.
|
|---|
| 1653 | Except within the accessor splits, both sacrifices contribute improvements individually, \noiter helps more than \attribParity, and the total \attribStrip benefit depends on both contributions.
|
|---|
| 1654 | When the accessor is not element removal, the \attribParity shift appears to be counterproductive, leaving \noiter to deliver most of the benefit.
|
|---|
| 1655 | For element removals, \attribParity is the heavy hitter, with \noiter contributing modestly.
|
|---|
| 1656 |
|
|---|
| [9367cd6] | 1657 | The counterproductive shift outside of element removals is likely due to optimization done in the \attribFull version after implementing headless support, \ie not present in the \mandhead version.
|
|---|
| [16a843b] | 1658 | This work streamlined both head-based operations (head-based removal being half the work of the element-insertion test).
|
|---|
| 1659 | This improvement could be ported to a \mandhead-style implementation, which would bring down the \attribParity time in these cases.
|
|---|
| 1660 |
|
|---|
| 1661 | More significantly, missing this optimization affects every \attribParity result because they all use head-based inserts or removes for at least half their operations.
|
|---|
| 1662 | It is likely a reason that \attribParity is not delivering as well overall as \noiter.
|
|---|
| 1663 | It even represents plausible further improvements in \attribStrip.
|
|---|
| 1664 |
|
|---|
| 1665 | \VRef[Figure]{fig:plot-list-cfa-attrib}-(b) addresses element removal being the overall \CFA slow spot and element removal having a peculiar shape in the (a) analysis.
|
|---|
| [9367cd6] | 1666 | Here, the \attribParity sacrifice bundle is broken out into its two constituents.
|
|---|
| [16a843b] | 1667 | The result is the same regardless of the operation.
|
|---|
| 1668 | All three individual sacrifices contribute noteworthy improvements (\nolisted slightly less).
|
|---|
| 1669 | The fullest improvement requires all of them.
|
|---|
| 1670 |
|
|---|
| 1671 | The \noiter feature sacrifice is unpalatable.
|
|---|
| 1672 | But because it is not an inherent slowdown, there may be room to pursue a \noiter-level speed improvement without the \noiter feature sacrifice.
|
|---|
| 1673 | The performance crux for \noiter is the pointer-bit tagging scheme.
|
|---|
| 1674 | Alternative designs that may offer speedup with acceptable consequences include keeping the tag information in a separate field, and (for 64-bit architectures) keeping it in the high-order byte \ie using byte- rather than bit-oriented instructions to access it.
|
|---|
| 1675 | The \noiter speed improvement would bring \CFA to +5\% of LQ overall, and from high twenties to high teens, in the worst case of element removal.
|
|---|
| 1676 |
|
|---|
| [9367cd6] | 1677 | Ultimately, this analysis provides options for a future effort that needs to get the most speed out of the \CFA list.
|
|---|
| [8eb85de] | 1678 | \end{comment}
|
|---|
| [16a843b] | 1679 |
|
|---|
| 1680 |
|
|---|
| [5717495] | 1681 | \section{Future Work}
|
|---|
| 1682 | \label{toc:lst:futwork}
|
|---|
| 1683 |
|
|---|
| [75ba2fa6] | 1684 | Not discussed in the chapter is a \CFA type-system issue with Plan-9 @inline@ declarations, implementing the trait @embedded@ to access the contained @dlist@ link-fields.
|
|---|
| 1685 | This trait defines an implicit conversion from derived to base (the safe direction).
|
|---|
| 1686 | Such a conversion exists implicitly for concrete types using the Plan-9 inheritance feature.
|
|---|
| 1687 | However, this implicit conversion is only partially implemented for polymorphism types, such as @dlist@, which prevents the straightforward list interface shown throughout the chapter.
|
|---|
| [bb9897c] | 1688 |
|
|---|
| [75ba2fa6] | 1689 | My workaround is the macro @P9_EMBEDDED@ placed before an intrusive node is used to declare a list.
|
|---|
| 1690 | \begin{cfa}
|
|---|
| 1691 | struct node {
|
|---|
| 1692 | int v;
|
|---|
| 1693 | inline dlink(node);
|
|---|
| 1694 | };
|
|---|
| 1695 | @P9_EMBEDDED( node, dlink(node) );@
|
|---|
| 1696 | dlist( node ) dlist;
|
|---|
| 1697 | \end{cfa}
|
|---|
| 1698 | The macro creates specialized access functions to explicitly extract the required information for the polymorphic @dlist@ type.
|
|---|
| 1699 | These access functions could have been generated implicitly for each intrusive node by adding another compiler pass.
|
|---|
| 1700 | However, this would have been substantial temporary work, when the correct solution is the compiler fix.
|
|---|
| 1701 | Hence, the macro workaround is only a small temporary inconvenience;
|
|---|
| 1702 | otherwise, all the list API shown in this chapter works.
|
|---|
| 1703 |
|
|---|
| 1704 |
|
|---|
| 1705 | \begin{comment}
|
|---|
| [bb9897c] | 1706 | An API author has decided that the intended user experience is:
|
|---|
| 1707 | \begin{itemize}
|
|---|
| 1708 | \item
|
|---|
| 1709 | offer the user an opaque type that abstracts the API's internal state
|
|---|
| 1710 | \item
|
|---|
| 1711 | tell the user to extend this type
|
|---|
| 1712 | \item
|
|---|
| 1713 | provide functions with a ``user's type in, user's type out'' style
|
|---|
| 1714 | \end{itemize}
|
|---|
| 1715 | Fig XXX shows several attempts to provide this experience.
|
|---|
| 1716 | The types in (a) give the setup, achieving the first two points, while the pair of function declarations and calls of (b) are unsuccessful attempts at achieving the third point.
|
|---|
| 1717 | Both functions @f1@ and @f2@ allow calls of the form @f( d )@, but @f1@ has the wrong return type (@Base@) for initializing a @Derived@.
|
|---|
| 1718 | The @f1@ signature fails to meet ``user's type out;'' this signature does not give the \emph{user} a usable type.
|
|---|
| 1719 | On the other hand, the signature @f2@ offers the desired user experience, so the API author proceeds with trying to implement it.
|
|---|
| 1720 |
|
|---|
| 1721 | \begin{figure}
|
|---|
| 1722 | \begin{cfa}
|
|---|
| 1723 | #ifdef SHOW_ERRS
|
|---|
| 1724 | #define E(...) __VA_ARGS__
|
|---|
| 1725 | #else
|
|---|
| 1726 | #define E(...)
|
|---|
| 1727 | #endif
|
|---|
| 1728 |
|
|---|
| 1729 | // (a)
|
|---|
| 1730 | struct Base { /*...*/ }; // system offers
|
|---|
| 1731 | struct Derived { /*...*/ inline Base; }; // user writes
|
|---|
| 1732 |
|
|---|
| 1733 | // (b)
|
|---|
| 1734 | // system offers
|
|---|
| 1735 | Base & f1( Base & );
|
|---|
| 1736 | forall( T ) T & f2( T & );
|
|---|
| 1737 | // user writes
|
|---|
| 1738 | void to_elide1() {
|
|---|
| 1739 | Derived & d /* = ... */;
|
|---|
| 1740 | f1( d );
|
|---|
| 1741 | E( Derived & d1 = f1( d ); ) // error: return is not Derived
|
|---|
| 1742 | Derived & d2 = f2( d );
|
|---|
| 1743 |
|
|---|
| 1744 | // (d), user could write
|
|---|
| 1745 | Base & b = d;
|
|---|
| 1746 | }
|
|---|
| 1747 |
|
|---|
| 1748 | // (c), system has
|
|---|
| 1749 | static void helper( Base & );
|
|---|
| 1750 | forall( T ) T & f2( T & param ) {
|
|---|
| 1751 | E( helper( param ); ) // error: param is not Base
|
|---|
| 1752 | return param;
|
|---|
| 1753 | }
|
|---|
| 1754 | static void helper( Base & ) {}
|
|---|
| 1755 |
|
|---|
| 1756 | #include <list2.hfa>
|
|---|
| 1757 | // (e)
|
|---|
| 1758 | // system offers, has
|
|---|
| 1759 | forall( T | embedded(T, Base, Base) )
|
|---|
| 1760 | T & f3( T & param ) {
|
|---|
| 1761 | helper( param`inner ); // ok
|
|---|
| 1762 | return param;
|
|---|
| 1763 | }
|
|---|
| 1764 | // user writes
|
|---|
| 1765 | struct DerivedRedux { /*...*/ inline Base; };
|
|---|
| 1766 | P9_EMBEDDED( DerivedRedux, Base )
|
|---|
| 1767 | void to_elide2() {
|
|---|
| 1768 | DerivedRedux & dr /* = ... */;
|
|---|
| 1769 | DerivedRedux & dr3 = f3( dr );
|
|---|
| 1770 |
|
|---|
| 1771 | // (f)
|
|---|
| 1772 | // user writes
|
|---|
| 1773 | Derived & xxx /* = ... */;
|
|---|
| 1774 | E( Derived & yyy = f3( xxx ); ) // error: xxx is not embedded
|
|---|
| 1775 | }
|
|---|
| 1776 | \end{cfa}
|
|---|
| 1777 | \end{figure}
|
|---|
| 1778 |
|
|---|
| 1779 |
|
|---|
| 1780 | The \CFA list examples elide the @P9_EMBEDDED@ annotations that (TODO: xref P9E future work) proposes to obviate.
|
|---|
| [4740281] | 1781 | Thus, these examples illustrate a to-be state, free of what is to be historic clutter.
|
|---|
| 1782 | The elided portions are immaterial to the discussion and the examples work with the annotations provided.
|
|---|
| 1783 | The \CFA test suite (TODO:cite?) includes equivalent demonstrations, with the annotations included.
|
|---|
| 1784 | \begin{cfa}
|
|---|
| 1785 | struct mary {
|
|---|
| 1786 | float anotherdatum;
|
|---|
| 1787 | inline dlink(mary);
|
|---|
| 1788 | };
|
|---|
| 1789 | struct fred {
|
|---|
| 1790 | float adatum;
|
|---|
| 1791 | inline struct mine { inline dlink(fred); };
|
|---|
| 1792 | inline struct yours { inline dlink(fred); };
|
|---|
| 1793 | };
|
|---|
| 1794 | \end{cfa}
|
|---|
| 1795 | like in the thesis examples. You have to say
|
|---|
| 1796 | \begin{cfa}
|
|---|
| 1797 | struct mary {
|
|---|
| 1798 | float anotherdatum;
|
|---|
| 1799 | inline dlink(mary);
|
|---|
| 1800 | };
|
|---|
| 1801 | P9_EMBEDDED(mary, dlink(mary))
|
|---|
| 1802 | struct fred {
|
|---|
| 1803 | float adatum;
|
|---|
| 1804 | inline struct mine { inline dlink(fred); };
|
|---|
| 1805 | inline struct yours { inline dlink(fred); };
|
|---|
| 1806 | };
|
|---|
| 1807 | P9_EMBEDDED(fred, fred.mine)
|
|---|
| 1808 | P9_EMBEDDED(fred, fred.yours)
|
|---|
| 1809 | P9_EMBEDDED(fred.mine, dlink(fred))
|
|---|
| 1810 | P9_EMBEDDED(fred.yours, dlink(fred))
|
|---|
| 1811 | \end{cfa}
|
|---|
| [bb9897c] | 1812 |
|
|---|
| 1813 |
|
|---|
| 1814 | The function definition in (c) gives this system-implementation attempt.
|
|---|
| 1815 | The system needs to operate on its own data, stored in the @Base@ part of the user's @d@, now called @param@.
|
|---|
| 1816 | Calling @helper@ represents this attempt to look inside.
|
|---|
| 1817 | It fails, because the @f2@ signature does not state that @param@ has any relationship to @Base@;
|
|---|
| 1818 | this signature does not give the \emph{system} a usable type.
|
|---|
| 1819 | The fact that the user's argument can be converted to @Base@ is lost when going through this signature.
|
|---|
| 1820 |
|
|---|
| 1821 | Moving forward needs an @f@ signature that conveys the relationship that the argument @d@ has to @Base@.
|
|---|
| 1822 | \CFA conveys type abilities, from caller to callee, by way of traits; so the challenge is to state the right trait member.
|
|---|
| 1823 | As initialization (d) illustrates, the @d@--@Base@ capability is an implicit conversion.
|
|---|
| 1824 | Unfortunately, in present state, \CFA does not have a first-class representation of an implicit conversion, the way operator @?{}@ (which is a value), done with the right overload, is arguably an explicit conversion.
|
|---|
| 1825 | So \CFA cannot directly convey ``@T@ compatible with @Base@'' in a trait.
|
|---|
| 1826 |
|
|---|
| 1827 | This work contributes a stand-in for such an ability, tunneled through the present-state trait system, shown in (e).
|
|---|
| 1828 | On the declaration/system side, the trait @embedded@, and its member, @`inner@, convey the ability to recover the @Base@, and thereby call @helper@.
|
|---|
| 1829 | On the user side, the @P9_EMBEDDED@ macro accompanies type definitions that work with @f3@-style declarations.
|
|---|
| 1830 | An option is presemt-state-available to compiler-emit @P9_EMBEDDED@ annotations automatically, upon each occurrence of an `inline` member.
|
|---|
| 1831 | The choice not to is based on conversions in \CFA being a moving target because of separate, ongoing work.
|
|---|
| 1832 |
|
|---|
| 1833 | An intended finished state for this area is achieved if \CFA's future efforts with conversions include:
|
|---|
| 1834 | \begin{itemize}
|
|---|
| 1835 | \item
|
|---|
| 1836 | treat conversion as operator(s), \ie values
|
|---|
| 1837 | \item
|
|---|
| 1838 | re-frame the compiler's current Plan-9 ``magic'' as seeking an implicit conversion operator, rather than seeking an @inline@ member
|
|---|
| 1839 | \item
|
|---|
| 1840 | make Plan-9 syntax cause an implementation of implicit conversion to exist (much as @struct@ syntax causes @forall(T)@ compliance to exist)
|
|---|
| 1841 | \end{itemize}
|
|---|
| 1842 | To this end, the contributed @P9_EMBEDDED@ expansion shows how to implement this conversion.
|
|---|
| 1843 |
|
|---|
| 1844 |
|
|---|
| [4740281] | 1845 | like in tests/list/dlist-insert-remove.
|
|---|
| 1846 | Future work should autogen those @P9_EMBEDDED@ declarations whenever it sees a plan-9 declaration.
|
|---|
| 1847 | The exact scheme chosen should harmonize with general user-defined conversions.
|
|---|
| 1848 |
|
|---|
| 1849 | Today's P9 scheme is: mary gets a function `inner returning this as dlink(mary).
|
|---|
| 1850 | Fred gets four of them in a diamond.
|
|---|
| 1851 | They're defined so that `inner is transitive; i.e. fred has two further ambiguous overloads mapping fred to dlink(fred).
|
|---|
| 1852 | The scheme allows the dlist functions to give the assertion, "we work on any T that gives a `inner to dlink(T)."
|
|---|
| 1853 |
|
|---|
| [5717495] | 1854 |
|
|---|
| 1855 | TODO: deal with: A doubly linked list is being designed.
|
|---|
| 1856 |
|
|---|
| 1857 | TODO: deal with: Link fields are system-managed.
|
|---|
| 1858 | Links in GDB.
|
|---|
| [75ba2fa6] | 1859 | \end{comment}
|
|---|