Changes in / [0bd6a14:e9c5db2]


  • doc/theses/mubeen_zulfiqar_MMath/performance.tex

    r0bd6a14 re9c5db2  
    140140
    141141All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM.
    142 \textsf{dl}'s performance decreases, and the gap with the other allocators widens, as the number of worker threads increases.
    143 \textsf{je} was the fastest, although there is not much difference between \textsf{je} and the rest of the allocators.
    144 
    145142llheap is slightly slower because it uses ownership, where many of the allocations have remote frees, which requires locking.
    146143When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).
    147144
     145
    148146%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    149147%% THRASH
     
    151149
    152150\subsection{Cache Thrash}
    153 \label{sec:cache-thrash-perf}
    154151
    155152Thrash tests memory allocators for active false sharing (see \VRef{sec:benchThrashSec}).
     
    182179\end{figure}
    183180
    184 All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3}.
    185 \textsf{dl} uses a single heap for all threads, so it is understandable that it generates so much active false-sharing.
    186 Requests from different threads are handled sequentially by a single heap using locks, which can allocate objects for different threads on the same cache line.
    187 \textsf{pt3} uses multiple heaps, but they are not exactly per-thread heaps.
    188 So, it is possible that multiple threads using one heap get objects allocated on the same cache line, which might be causing active false-sharing.
    189 The rest of the memory allocators generate little or no active false-sharing.
     181All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3} on the x64.
     182Either the memory allocators generate little active false-sharing or the micro-benchmark is not generating scenarios that cause active false-sharing.
    190183
    191184%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    224217\end{figure}
    225218
    226 This micro-benchmark divided the allocators into two groups.
    227 The first is the group of best performers: \textsf{llh}, \textsf{je}, and \textsf{rp}.
    228 These memory allocators generate little or no passive false-sharing and their performance difference is negligible.
    229 The second is the group of low performers, which includes the rest of the memory allocators.
    230 These memory allocators seem to preserve program-induced passive false-sharing.
    231 \textsf{hrd}'s performance keeps getting worse as the number of threads increases.
    232 
    233 Interestingly, allocators such as \textsf{hrd} and \textsf{glc} were among the best performers in the cache-thrash micro-benchmark, as described in Section~\ref{sec:cache-thrash-perf}.
    234 However, these allocators were among the low performers in this micro-benchmark.
    235 This suggests that these allocators do not actively produce false sharing, but they may preserve program-induced passive false sharing.
     219All allocators did well in this micro-benchmark on the ARM.
     220Allocators \textsf{llh}, \textsf{je}, and \textsf{rp} did well on the x64, while the remaining allocators experienced significant slowdowns from the false sharing.
    236221
    237222%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
    289274
    290275All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl} and \textsf{pt3}.
    291 \textsf{dl} performed the worst overall and its performance kept getting worse as the number of threads increased.
    292 \textsf{dl} uses a single heap with a global lock that can become a bottleneck.
    293 Multiple threads doing memory allocation in parallel can create contention on \textsf{dl}'s single heap.
    294 \textsf{pt3}, which is a modification of \textsf{dl} for multi-threaded applications, does not use per-thread heaps and may also have similar bottlenecks.
    295 
    296 There is a sudden increase in program completion time for chains that include \textsf{calloc}, and all allocators are relatively slower on these chains.
    297 \textsf{calloc} uses \textsf{memset} to set the allocated memory to zero.
    298 \textsf{memset} takes a long time compared to the actual memory allocation.
    299 So, a major part of the time in chains that include \textsf{calloc} is spent in \textsf{memset}.
    300 However, the relative difference among the memory allocators running the same chain of memory allocation operations still gives an idea of their relative performance.
    301276
    302277%speed-3-malloc.eps
     
    527502First, the differences in the shape of the curves between architectures (top ARM, bottom x64) are small; the differences are in the amount of memory used.
    528503Hence, it is possible to focus on either the top or bottom graph.
    529 The heap curve remains zero for 4 memory allocators: \textsf{hrd}, \textsf{je}, \textsf{pt3}, and \textsf{rp}.
    530 These memory allocators do not use the sbrk area; instead, they only use mmap to get memory from the system.
    531 
    532 \textsf{hrd} and \textsf{tbb} have a higher memory footprint than the others, as they use more total dynamic memory.
    533 One reason may be the use of superblocks: both of these memory allocators create superblocks, where each block contains objects of the same size.
    534 These superblocks are maintained throughout the life of the program.
    535 
    536 \textsf{pt3} is the only memory allocator for which the total dynamic memory goes down in the second half of the program lifetime, when the memory is freed by the benchmark program.
    537 This makes \textsf{pt3} the only memory allocator that gives memory back to the operating system as it is freed by the program.
    538 
    539 % FOR 1 THREAD
     504The heap curve suggests that glibc, hoard, jemalloc, ptmalloc3, and rpmalloc may not use the sbrk area and instead only use mmap.
     505
     506hoard and tbbmalloc use more total memory.
     507
     508ptmalloc3 gives memory back to the operating system.
    540509
    541510%mem-1-prod-1-cons-100-llh.eps
     
    544513    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-llh} }
    545514    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-llh} }
    546 \caption{Memory benchmark results with Configuration-1 for llh memory allocator}
     515\caption{Memory benchmark results with 1 producer for llh memory allocator}
    547516\label{fig:mem-1-prod-1-cons-100-llh}
     517\end{figure}
     518
     519%mem-4-prod-4-cons-100-llh.eps
     520\begin{figure}
     521\centering
     522    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }
     523    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }
     524\caption{Memory benchmark results with 4 producers for llh memory allocator}
     525\label{fig:mem-4-prod-4-cons-100-llh}
    548526\end{figure}
    549527
     
    553531    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
    554532    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
    555 \caption{Memory benchmark results with Configuration-1 for dl memory allocator}
     533\caption{Memory benchmark results with 1 producer for dl memory allocator}
    556534\label{fig:mem-1-prod-1-cons-100-dl}
     535\end{figure}
     536
     537%mem-4-prod-4-cons-100-dl.eps
     538\begin{figure}
     539\centering
     540    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
     541    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
     542\caption{Memory benchmark results with 4 producers for dl memory allocator}
     543\label{fig:mem-4-prod-4-cons-100-dl}
    557544\end{figure}
    558545
     
    562549    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
    563550    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
    564 \caption{Memory benchmark results with Configuration-1 for glibc memory allocator}
     551\caption{Memory benchmark results with 1 producer for glibc memory allocator}
    565552\label{fig:mem-1-prod-1-cons-100-glc}
     553\end{figure}
     554
     555%mem-4-prod-4-cons-100-glc.eps
     556\begin{figure}
     557\centering
     558    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
     559    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
     560\caption{Memory benchmark results with 4 producers for glibc memory allocator}
     561\label{fig:mem-4-prod-4-cons-100-glc}
    566562\end{figure}
    567563
     
    571567    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
    572568    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
    573 \caption{Memory benchmark results with Configuration-1 for hoard memory allocator}
     569\caption{Memory benchmark results with 1 producer for hoard memory allocator}
    574570\label{fig:mem-1-prod-1-cons-100-hrd}
     571\end{figure}
     572
     573%mem-4-prod-4-cons-100-hrd.eps
     574\begin{figure}
     575\centering
     576    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
     577    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
     578\caption{Memory benchmark results with 4 producers for hoard memory allocator}
     579\label{fig:mem-4-prod-4-cons-100-hrd}
    575580\end{figure}
    576581
     
    580585    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
    581586    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
    582 \caption{Memory benchmark results with Configuration-1 for je memory allocator}
     587\caption{Memory benchmark results with 1 producer for je memory allocator}
    583588\label{fig:mem-1-prod-1-cons-100-je}
     589\end{figure}
     590
     591%mem-4-prod-4-cons-100-je.eps
     592\begin{figure}
     593\centering
     594    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
     595    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
     596\caption{Memory benchmark results with 4 producers for je memory allocator}
     597\label{fig:mem-4-prod-4-cons-100-je}
    584598\end{figure}
    585599
     
    589603    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
    590604    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
    591 \caption{Memory benchmark results with Configuration-1 for pt3 memory allocator}
     605\caption{Memory benchmark results with 1 producer for pt3 memory allocator}
    592606\label{fig:mem-1-prod-1-cons-100-pt3}
     607\end{figure}
     608
     609%mem-4-prod-4-cons-100-pt3.eps
     610\begin{figure}
     611\centering
     612    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
     613    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
     614\caption{Memory benchmark results with 4 producers for pt3 memory allocator}
     615\label{fig:mem-4-prod-4-cons-100-pt3}
    593616\end{figure}
    594617
     
    598621    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
    599622    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
    600 \caption{Memory benchmark results with Configuration-1 for rp memory allocator}
     623\caption{Memory benchmark results with 1 producer for rp memory allocator}
    601624\label{fig:mem-1-prod-1-cons-100-rp}
     625\end{figure}
     626
     627%mem-4-prod-4-cons-100-rp.eps
     628\begin{figure}
     629\centering
     630    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
     631    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
     632\caption{Memory benchmark results with 4 producers for rp memory allocator}
     633\label{fig:mem-4-prod-4-cons-100-rp}
    602634\end{figure}
    603635
     
    607639    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
    608640    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
    609 \caption{Memory benchmark results with Configuration-1 for tbb memory allocator}
     641\caption{Memory benchmark results with 1 producer for tbb memory allocator}
    610642\label{fig:mem-1-prod-1-cons-100-tbb}
    611 \end{figure}
    612 
    613 % FOR 4 THREADS
    614 
    615 %mem-4-prod-4-cons-100-llh.eps
    616 \begin{figure}
    617 \centering
    618     \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }
    619     \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }
    620 \caption{Memory benchmark results with Configuration-2 for llh memory allocator}
    621 \label{fig:mem-4-prod-4-cons-100-llh}
    622 \end{figure}
    623 
    624 %mem-4-prod-4-cons-100-dl.eps
    625 \begin{figure}
    626 \centering
    627     \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
    628     \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
    629 \caption{Memory benchmark results with Configuration-2 for dl memory allocator}
    630 \label{fig:mem-4-prod-4-cons-100-dl}
    631 \end{figure}
    632 
    633 %mem-4-prod-4-cons-100-glc.eps
    634 \begin{figure}
    635 \centering
    636     \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
    637     \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
    638 \caption{Memory benchmark results with Configuration-2 for glibc memory allocator}
    639 \label{fig:mem-4-prod-4-cons-100-glc}
    640 \end{figure}
    641 
    642 %mem-4-prod-4-cons-100-hrd.eps
    643 \begin{figure}
    644 \centering
    645     \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
    646     \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
    647 \caption{Memory benchmark results with Configuration-2 for hoard memory allocator}
    648 \label{fig:mem-4-prod-4-cons-100-hrd}
    649 \end{figure}
    650 
    651 %mem-4-prod-4-cons-100-je.eps
    652 \begin{figure}
    653 \centering
    654     \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
    655     \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
    656 \caption{Memory benchmark results with Configuration-2 for je memory allocator}
    657 \label{fig:mem-4-prod-4-cons-100-je}
    658 \end{figure}
    659 
    660 %mem-4-prod-4-cons-100-pt3.eps
    661 \begin{figure}
    662 \centering
    663     \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
    664     \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
    665 \caption{Memory benchmark results with Configuration-2 for pt3 memory allocator}
    666 \label{fig:mem-4-prod-4-cons-100-pt3}
    667 \end{figure}
    668 
    669 %mem-4-prod-4-cons-100-rp.eps
    670 \begin{figure}
    671 \centering
    672     \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
    673     \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
    674 \caption{Memory benchmark results with Configuration-2 for rp memory allocator}
    675 \label{fig:mem-4-prod-4-cons-100-rp}
    676643\end{figure}
    677644
     
    681648    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
    682649    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
    683 \caption{Memory benchmark results with Configuration-2 for tbb memory allocator}
     650\caption{Memory benchmark results with 4 producers for tbb memory allocator}
    684651\label{fig:mem-4-prod-4-cons-100-tbb}
    685652\end{figure}
     653
     654%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     655%% ANALYSIS
     656%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%