Timestamp: Apr 11, 2026, 6:39:52 PM (17 hours ago)
Branches: master
Parents: e8a7b66d
Location: doc/theses/mike_brooks_MMath
Files: 9 edited

- list.tex (modified) (9 diffs)
- plots/list-zoomin-abs-i-java.gp (modified) (2 diffs)
- plots/list-zoomin-abs-i-swift.gp (modified) (2 diffs)
- plots/list-zoomin-abs-viii-java.gp (modified) (2 diffs)
- plots/list-zoomin-abs-viii-swift.gp (modified) (2 diffs)
- plots/list-zoomout-noshuf-swift.gp (modified) (1 diff)
- plots/list-zoomout-shuf-java.gp (modified) (2 diffs)
- plots/list-zoomout-shuf-swift.gp (modified) (2 diffs)
- uw-ethesis-frontpgs.tex (modified) (3 diffs)
doc/theses/mike_brooks_MMath/list.tex
re8a7b66d r75ba2fa6

@@ 695 @@
  \subsubsection{Experiment setup}
+ \label{s:ExperimentSetup}

  The experiment driver defines a (intrusive) node type:

@@ 844 @@
  \VRef[Figure]{fig:plot-list-zoomout} presents throughput at various list lengths for a linear and random (shuffled) insert/remove test.
  Other kinds of scans were made, but the results are similar in many cases, so it is sufficient to discuss these two scans, representing difference ends of the access spectrum.
- In the graphs, all four intrusive lists are plotted with the same symbol;
- theses symbols clumped on top of each, showing the performance difference among intrusive lists is small in comparison to the wrapped list \see{\VRef{s:ComparingIntrusiveImplementations} for details among intrusive lists}.
+ In the graphs, all four intrusive lists (lq-list, lq-tailq, upp-upp, cfa-cfa, see end of \VRef{s:ExperimentSetup}) are plotted with the same symbol;
+ sometimes theses symbols clump on top of each other, showing the performance difference among intrusive lists is small in comparison to the wrapped list (std::list).
+ See~\VRef{s:ComparingIntrusiveImplementations} for details among intrusive lists.

  The list lengths start at 10 due to the short insert/remove times of 2--4 ns, for intrusive lists, \vs STL's wrapped-reference list of 15--20 ns.

@@ 866 @@
  \\
  &
- \subfloat[Shuffled List Nodes, AMD]{\label{f:Shuffled-swift}
+ \subfloat[Random List Nodes, AMD]{\label{f:Random-swift}
  \hspace*{-0.75in}
  \includegraphics{plot-list-zoomout-shuf-swift.pdf}
  } % subfigure
  &
- \subfloat[Shuffled List Nodes, Intel]{\label{f:Shuffled-java}
+ \subfloat[Random List Nodes, Intel]{\label{f:Random-java}
  \includegraphics{plot-list-zoomout-shuf-java.pdf}
  } % subfigure

@@ 877 @@
  \caption{Insert/remove duration \vs list length.
  Lengths go as large possible without error.
- One example operation is shown: stack movement, insert-first polarity and head-mediated access.}
+ One example operation is shown: stack movement, insert-first polarity and head-mediated access. Lower is better.}
  \label{fig:plot-list-zoomout}
  \end{figure}

- The large difference in performance between intrusive and wrapped-reference lists results from the dynamic allocation for the wrapped nodes.
- In effect, this experiment is largely measuring the cost of malloc/free rather than the insert/remove, and is affected by the layout of memory by the allocator.
- Fundamentally, insert/remove for a doubly linked-list has a basic cost, which is seen by the similar results for the intrusive lists.
- Hence, the costs for a wrapped-reference list are: allocating/deallocating a wrapped node, copying a external-node pointer into the wrapped node for insertion, and linking the wrapped node to/from the list;
- the dynamic allocation dominates these costs.
- For example, the experiment was run with both glibc and llheap memory allocators, where performance from llheap reduced the cost from 20 to 16 ns, which is still far from the 2--4 ns for intrusive lists.
- Unfortunately, there is no way to tease apart the linking and allocation costs for wrapped lists, as there is no way to preallocate the list nodes without writing a mini-allocator to manage the storage.
- Hence, dynamic allocation cost is fundamental for wrapped lists.
+ The key performance factor between the intrusive and the wrapped-reference lists is the dynamic allocation for the wrapped nodes.
+ Hence, this experiment is largely measuring the cost of @malloc@/\-@free@ rather than insert/remove, and is sensitive to the layout of memory by the allocator.
+ For insert/remove of an intrusive list, the cost is manipulating the link fields, which is seen by the relatively similar results for the different intrusive lists.
+ For insert/remove of a wrapped-reference list, the costs are: dynamically allocating/deallocating a wrapped node, copying a external-node pointer into the wrapped node for insertion, and linking the wrapped node to/from the list;
+ the allocation dominates these costs.
+ For example, the experiment was run with both glibc and llheap memory allocators, where llheap performance reduced the cost from 20 to 16 ns, still far from the 2--4 ns for linking an intrusive node.
+ Hence, there is no way to tease apart the allocation, copying, and linking costs for wrapped lists, as there is no way to preallocate the list nodes without writing a mini-allocator to manage that storage.

  In detail, \VRef[Figure]{f:Linear-swift}--\subref*{f:Linear-java} shows linear insertion of all the nodes and then linear removal, both in the same direction.
- For intrusive lists, the nodes are adjacent because they are preallocated in an array.
- For wrapped lists, the wrapped nodes are still adjacent because the memory allocator happens to use bump allocation for the small fixed-sized wrapped nodes.
+ For intrusive lists, the nodes are adjacent in memory from being preallocated in an array.
+ For wrapped lists, the wrapped nodes happen to be adjacent because the memory allocator uses bump allocation during the initial phase of allocation.
  As a result, these memory layouts result in high spatial and temporal locality for both kinds of lists during the linear array traversal.
  With address look-ahead, the hardware does an excellent job of managing the multi-level cache.
  Hence, performance is largely constant for both kinds of lists, until L3 cache and NUMA boundaries are crossed for longer lists and the costs increase consistently for both kinds of lists.
- In detail, \VRef[Figure]{f:Shuffled-swift}--\subref*{f:Shuffled-java} shows shuffled insertion and removal of the nodes.
- As for linear, there are issues with the wrapped list and memory allocation.
- For intrusive lists, it is possible to link the nodes randomly, so consecutive nodes in memory seldom point at adjacent nodes.
- For wrapped lists, the placement of wrapped nodes is dictated by the memory allocator, and results in memory layout virtually the same as for linear, \ie the wrapped nodes are laid out consecutively in memory, but each wrapped node points at a randomly located external node.
- As seen on \VPageref{p:Shuffle}, the random access reads data from external node, which are located in random order.
- So while the wrapped nodes are accessed linearly, external nodes are touched randomly for both kinds of lists resulting in similar cache events.
- The slowdown of shuffled occurs as the experiment's length grows from last-level cache into main memory.
+ For example, on AMD (\VRef[Figure]{f:Linear-swift}), there is one NUMA node but many small L3 caches, so performance slows down quickly as multiple L3 caches come into play, and remains constant at that level, except for some anomalies for very large lists.
+ On Intel (\VRef[Figure]{f:Linear-java}), there are four NUMA nodes and four slowdown steps as list-length increase.
+ At each step, the difference between the kinds of lists decreases as the NUMA effect increases.
+
+ In detail, \VRef[Figure]{f:Random-swift}--\subref*{f:Random-java} shows random insertion and removal of the nodes.
+ As for linear, there is the issue of memory allocation for the wrapped list.
+ As well, the consecutive storage-layout is the same (array and bump allocation).
+ Hence, the difference is the random linking among nodes, resulting in random accesses, even though the list is traversed linearly, resulting in similar cache events for both kinds of lists.
+ Both \VRef[Figures]{f:Random-swift}--\subref*{f:Random-java} show the slowdown of random access as the list-length grows resulting from stepping out of caches into main memory and crossing NUMA nodes.
  % Insert and remove operations act on both sides of a link.
  %Both a next unlisted item to insert (found in the items' array, seen through the shuffling array), and a next listed item to remove (found by traversing list links), introduce a new user-item location.
- Each time a next item is processed, the memory access is a hop to a randomly far address.
- The target is unavailable in the cache and a slowdown results.
- Note, the external-data no-touch assumption is often unrealistic: decisions like, ``Should I remove this item?'' need to look at the item.
- Therefore, under realistic assumptions, both intrusive and wrapped-reference lists suffer similar caching issues for very large lists.
+ As for linear, the Intel (\VRef[Figure]{f:Random-java}) graph shows steps from the four NUMA nodes.
+ Interestingly, after $10^6$ nodes, intrusive lists are slower than wrapped.
+ I did not have time to track down this anomaly, but I speculate it results from the difference in touching the data in the accessed node, as the data and links are together for intrusive and separated for wrapped.
+ For the llheap memory-allocator and the two tested architectures, intrusive lists out perform wrapped lists up to size $10^3$ for both linear and random, and performance begins to converge around $10^6$ nodes as architectural issues begin to dominate.
+ Clearly, memory allocator and hardware architecture plays a large factor in the total cost and the crossover points as list-size increases.
  % In an odd scenario where this intuition is incorrect, and where furthermore the program's total use of the memory allocator is sufficiently limited to yield approximately adjacent allocations for successive list insertions, a non-intrusive list may be preferred for lists of approximately the cache's size.

  The takeaway from this experiment is that wrapped-list operations are expensive because memory allocation is expense at this fine-grained level of execution.
  Hence, when possible, using intrusive links can produce a significant performance gain, even if nodes must be dynamically allocated, because the wrapping allocations are eliminated.
- Even when space is a consideration, intrusive links may not use any more storage if a node is mostly linked.
- Unfortunately, many programmers are unaware of intrusive lists for dynamically-sized data-structures.
+ Even when space is a consideration, intrusive links may not use more storage if a node is often linked.
+ Unfortunately, many programmers are unaware of intrusive lists for dynamically-sized data-structures or their tool-set does not provide them.
  % Note, linear access may not be realistic unless dynamic size changes may occur;

@@ 954 @@
  \label{s:ComparingIntrusiveImplementations}

- The preceding result shows that all the intrusive implementations examined have noteworthy performance compared to wrapped lists.
- This analysis picks the sizes below 150 and zooms in to differentiate among the different intrusive implementations.
- The data is selected from the start of \VRef[Figure]{f:Linear}, but the start of \VRef[Figure]{f:Shuffled} would be largely the same.
+ The preceding result shows the intrusive implementations examined have better performance compared to wrapped lists for small to medium sized lists.
+ This analysis picks list sizes below 150 and zooms in to differentiate among the intrusive implementations.
+ The data is selected from the start of \VRef[Figures]{f:Linear-swift}--\subref*{f:Linear-java}, but the start of \VRef[Figures]{f:Random-swift}--\subref*{f:Random-java} is largely the same.

  \begin{figure}

@@ 982 @@
  } % subfigure
  \end{tabular}
- \caption{Operation duration \vs list length at small-medium lengths. Two example operations are shown: [I] stack movement with head-only access (plots a and b); [VIII] queue movement with element-oriented removal access (plots c and d); both operations use insert-first polarity.}
+ \caption{Operation duration \vs list length at small-medium lengths. Two example operations are shown: [I] stack movement with head-only access (plots a and b); [VIII] queue movement with element-oriented removal access (plots c and d); both operations use insert-first polarity. Lower is better.}
  \label{fig:plot-list-zoomin}
  \end{figure}

@@ 990 @@
  The error bars show fastest and slowest time seen on five trials, and the central point is the mean of the remaining three trials.
  For readability, the points are slightly staggered at a given horizontal value, where the points might otherwise appear on top of each other.
-
- \VRef[Figure]{fig:plot-list-zoomin} has the pair machines, each in a column.
- It also has a pair of operating scenarios, each in a row.
- The experiment runs twelve operating scenarios; the ones chosen for their variety happen to be items I and VIII from the listing of \VRef[Figure]{fig:plot-list-mchn-szz}.
+ The experiment runs twelve operating scenarios;
+ the ones chosen for their variety are scenarios I and VIII from the listing of \VRef[Figure]{fig:plot-list-mchn-szz}, and their results appear in the rows.
+ As in the previous experiment, each hardware architecture appears in a column.
  Note that LQ-list does not support queue operations, so this framework is absent from Operation VIII.

  At lengths 1 and 2, winner patterns seen at larger sizes generally do not apply.
- Indeed, an issue with \CFA giving terribel on queues at length 1 is evident in \VRef[Figure]{f:AbsoluteTime-viii-swift}.
+ Indeed, an issue with \CFA giving terrible on queues at length 1 is evident in \VRef[Figure]{f:AbsoluteTime-viii-swift}.
  This phenomenon is elaborated in \MLB{TODO: xref}.
  For the remainder of this section, these sizes are disregarded.

@@ 1197 @@
  \label{toc:lst:futwork}

- Not discussed in the chapter is an issue in the \CFA type system with Plan-9 @inline@ declarations.
- It implements the trait 'embedded' for the derived-base pair named in the arguments.
- This trait defines a pseudo conversion called backtick-inner, from derived to base (the safe direction).
- Such a real conversion exists for the concrete types, due to our compiler having the p9 language feature.
- But when trying to be polymorphic over the types, there is no way to access this built-in conversion, so my macro stands in until Fangren does more with conversions.
+ Not discussed in the chapter is a \CFA type-system issue with Plan-9 @inline@ declarations, implementing the trait @embedded@ to access the contained @dlist@ link-fields.
+ This trait defines an implicit conversion from derived to base (the safe direction).
+ Such a conversion exists implicitly for concrete types using the Plan-9 inheritance feature.
+ However, this implicit conversion is only partially implemented for polymorphism types, such as @dlist@, which prevents the straightforward list interface shown throughout the chapter.
+
+ My workaround is the macro @P9_EMBEDDED@ placed before an intrusive node is used to declare a list.
+ \begin{cfa}
+ struct node {
+     int v;
+     inline dlink(node);
+ };
+ @P9_EMBEDDED( node, dlink(node) );@
+ dlist( node ) dlist;
+ \end{cfa}
+ The macro creates specialized access functions to explicitly extract the required information for the polymorphic @dlist@ type.
+ These access functions could have been generated implicitly for each intrusive node by adding another compiler pass.
+ However, this would have been substantial temporary work, when the correct solution is the compiler fix.
+ Hence, the macro workaround is only a small temporary inconvenience;
+ otherwise, all the list API shown in this chapter works.
+
+ \begin{comment}
  An API author has decided that the intended user experience is:
  \begin{itemize}

@@ 1356 @@
  TODO: deal with: Link fields are system-managed.
  Links in GDB.
+ \end{comment}
doc/theses/mike_brooks_MMath/plots/list-zoomin-abs-i-java.gp
re8a7b66d r75ba2fa6

@@ 1 @@
- set terminal pdfcairo color enhanced size 3.0in,2.0in font "Times,17"
+ set terminal pdfcairo color enhanced size … font "Times,17"

@@ 16 @@
  unset key
  set logscale x 2
- set logscale y
- set yrange [1:15];
- set ytics (1,2,3,4,5,6,7,8,10,12,14)
- set format y ""
+ #set logscale y
+ set yrange [0:7];
+ #set ytics (1,2,3,4,5,6,7,8,10,12,14)
+ #set format y ""
  set xrange [0.75:180];
  # set xlabel "List length (item count)" offset 2,0
  set format x ""
  # set ylabel "Duration (ns)" offset -1.0,0
- set format y ""
  set errorbars 2.0
  set pointintervalbox 0
doc/theses/mike_brooks_MMath/plots/list-zoomin-abs-i-swift.gp
re8a7b66d r75ba2fa6

@@ 1 @@
- set terminal pdfcairo color enhanced size 3.625in,2.0in font "Times,17"
+ set terminal pdfcairo color enhanced size … font "Times,17"

@@ 15 @@
  unset key
  set logscale x 2
- set logscale y
- set yrange [1:15];
- set ytics (1,2,3,4,5,6,7,8,10,12)
+ #set logscale y
+ set yrange [0:12];
+ #set ytics (1,2,3,4,5,6,7,8,10,12)
  set xrange [0.75:180];
  # set xlabel "List length (item count)" offset 2,0
  set format x ""
- set ylabel "Duration (ns)" offset -3.0,0
+ set ylabel "Duration (ns)" # offset -1.0,0
  set errorbars 2.0
  set pointintervalbox 0
doc/theses/mike_brooks_MMath/plots/list-zoomin-abs-viii-java.gp
re8a7b66d r75ba2fa6

@@ 1 @@
- set terminal pdfcairo color enhanced size 3.0in,2.5in font "Times,17"
+ set terminal pdfcairo color enhanced size … font "Times,17"

@@ 13 @@
  unset key
  set logscale x 2
- set logscale y
- set yrange [1:15];
- set ytics (1,2,3,4,5,6,7,8,10,12,14)
- set format y ""
+ #set logscale y
+ set yrange [0:7];
+ set ytics (1,2,3,4,5,6,7)
+ #set format y ""
  set xrange [0.75:180];
  set xlabel "List length (item count)" offset 2,0
  # set ylabel "Duration (ns)" offset -1.0,0
- set format y ""
  set errorbars 2.0
  set pointintervalbox 0
doc/theses/mike_brooks_MMath/plots/list-zoomin-abs-viii-swift.gp
re8a7b66d r75ba2fa6

@@ 1 @@
- set terminal pdfcairo color enhanced size 3.625in,2.5in font "Times,17"
+ set terminal pdfcairo color enhanced size … font "Times,17"

@@ 15 @@
  unset key
  set logscale x 2
- set logscale y
- set yrange [1:15];
- set ytics (1,2,3,4,5,6,7,8,10,12,14)
+ #set logscale y
+ set yrange [0:10];
+ #set ytics (1,2,3,4,5,6,7,8,10,12,14)
  set xrange [0.75:180];
  set xlabel "List length (item count)" offset 2,0
- set ylabel "Duration (ns)" offset -3.0,0
+ set ylabel "Duration (ns)" # offset -3.0,0
  set errorbars 2.0
  set pointintervalbox 0
doc/theses/mike_brooks_MMath/plots/list-zoomout-noshuf-swift.gp
re8a7b66d r75ba2fa6

@@ 24 @@
  # set xlabel "List length (item count)" offset 2,0
  unset xlabel
- set ylabel "Duration (ns)"
+ set ylabel "Duration (ns), log scale"
  set linetype 3 dashtype 2
  set linetype 4 dashtype 2
doc/theses/mike_brooks_MMath/plots/list-zoomout-shuf-java.gp
re8a7b66d r75ba2fa6

@@ 1 @@
- set terminal pdfcairo color enhanced size 3.0in,3.0in font "Times,17"
+ set terminal pdfcairo color enhanced size 3.0in,… font "Times,17"

@@ 20 @@
  #set yrange [1:1000];
  set xlabel "List length (item count)" offset 2,0
- #set ylabel "Duration (ns)"
+ #set ylabel "Duration (ns), log scale"
  unset ylabel
  set format y ""
doc/theses/mike_brooks_MMath/plots/list-zoomout-shuf-swift.gp
re8a7b66d r75ba2fa6

@@ 1 @@
- set terminal pdfcairo color enhanced size 3.625in,3.0in font "Times,17"
+ set terminal pdfcairo color enhanced size 3.625in,… font "Times,17"

@@ 22 @@
  #set yrange [1:1000];
  set xlabel "List length (item count)" offset 2,0
- set ylabel "Duration (ns)" offset 1,0
+ set ylabel "Duration (ns), log scale" offset 1,0
  set linetype 3 dashtype 2
  set linetype 4 dashtype 2
doc/theses/mike_brooks_MMath/uw-ethesis-frontpgs.tex
re8a7b66d r75ba2fa6

@@ 129 @@
  \begin{center}\textbf{Abstract}\end{center}

- \CFA strives to fix mistakes in C, chief among them, safety.
+ \CFA strives to fix issues in C, chief among them safety.
  This thesis presents a significant step forward in \CFA's goal to remove unsafe pointer operations.
  It describes improvements to the \CFA language design to support advanced container features.

@@ 136 @@
  To achieve these goals, this work leverages preexisting \CFA contributions by prior students, particularly novel applications of the compiler's type system.

- All modern programming languages provide at least these three high-level containers (collections): array, linked-list, and string.
+ All modern programming languages provide these three high-level containers (collections): array, linked-list, and string.
  Often, the array is part of the programming language, while linked lists are built from (recursive) pointer types, and strings from arrays and/or linked lists.
  For all three types, languages and/or their libraries supply varying degrees of high-level mechanisms for manipulating these objects at the bulk and component levels, such as copying, slicing, extracting, and iterating among elements.

@@ 142 @@
  Therefore, hardening these three C types and suggesting programers use them as their default types goes a long way to increase memory safety in the majority of C programs.

- Specifically, an array utility is provided that tracks length internally, relieving the user of managing explicit length parameters and stopping buffer-overrun errors.
+ Specifically, an array is provided that tracks its length internally, relieving the user and implementor from managing explicit length arguments/parameters and stopping buffer-overrun errors.
  This feature requires augmenting the \CFA type system, making array length available at compile and runtime.
- A linked-list utility is provided, which obviates many user-managed recursive pointers while catering directly to system-programming using intrusive linking.
+ A linked-list utility is provided that obviates many user-managed recursive pointers, while catering directly to system-programming using intrusive linking.
  Finally, a string utility is provided with implicit memory management of text in a specialized heap, removing error-prone buffer management, including overrun, and providing a copy-on-write speed boost.
- For all three utilities, performance is argued to be on-par with, and occasionally surpassing, relevant comparators.
+ For all three utilities, performance is argued to be on-par or surpass those in other comparable languages.
  With the array, this case is made by showing complete erasure down to a naked C array, modulo runtime bound checks, which are removable more often than with Java-style length management.
  With the linked list and string, empirical measures are compared with C and \CC comparable libraries.
- These utilities offer programmers workable alternatives to hand-rolling specialized libraries, which is a huge safety benefit, eliminating many system vulnerabilities.
+ These containers offer programmers workable alternatives to hand-rolling specialized libraries, which is a huge safety benefit, eliminating many system vulnerabilities.
  The results establish \CFA's position as a safety-forward programming alternative.