% Timestamp: Apr 27, 2022, 7:36:49 PM
% Author: m3zulfiq <m3zulfiq@…>
% Branches: ADT, ast-experimental, master, pthread-emulation, qualifiedEnum
% Parents: a6c10de
% Children: 0bd6a14
% Message: added analysis for benchmark results
% File: doc/theses/mubeen_zulfiqar_MMath/performance.tex (revision a6c10de to 4b2ea0d)

All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM.
\textsf{dl}'s performance decreases, and its gap to the other allocators widens, as the number of worker threads increases.
\textsf{je} was the fastest, although the difference between \textsf{je} and the rest of the allocators is small.

llheap is slightly slower because it uses ownership: many of its allocations have remote frees, which require locking.
When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).
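
A minimal C sketch can show why ownership needs locking. It is hypothetical and is not llheap's code. The sketch assumes that each thread owns a heap with two free lists. The local list is unlocked. The remote list is guarded by a spin lock. A free by the owning thread needs no lock. A remote free must acquire the owner's lock, and the owner also locks when it reclaims the remote list.

```c
#include <stdatomic.h>
#include <stddef.h>

/* A freed object is reused as a free-list node. */
typedef struct node { struct node *next; } node_t;

/* Hypothetical per-thread heap with ownership. */
typedef struct heap {
    node_t *local;      /* frees by the owner; only the owner touches this list */
    node_t *remote;     /* frees by other threads (remote frees) */
    atomic_flag lock;   /* guards `remote` only */
} heap_t;

#define HEAP_INIT { NULL, NULL, ATOMIC_FLAG_INIT }

static void heap_lock(heap_t *h) {
    while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
        ;   /* spin until the lock is free */
}

static void heap_unlock(heap_t *h) {
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Free `obj`, which was allocated from heap `owner`, on behalf of heap `self`. */
void heap_free(heap_t *self, heap_t *owner, void *obj) {
    node_t *n = obj;
    if (self == owner) {            /* local free: no locking */
        n->next = owner->local;
        owner->local = n;
    } else {                        /* remote free: must lock the owner's list */
        heap_lock(owner);
        n->next = owner->remote;
        owner->remote = n;
        heap_unlock(owner);
    }
}

/* Allocate from the owner's heap; when the local list is empty,
   reclaim all remote frees at once (which also needs the lock). */
void *heap_alloc(heap_t *h) {
    if (h->local == NULL) {
        heap_lock(h);
        h->local = h->remote;
        h->remote = NULL;
        heap_unlock(h);
    }
    node_t *n = h->local;
    if (n != NULL)
        h->local = n->next;
    return n;   /* NULL: a real allocator would now get fresh memory */
}
```

Suppose thread B frees an object that thread A allocated. Then \texttt{heap\_free(\&b, \&a, obj)} takes A's lock. Without ownership, B would push the object onto its own list and no lock would be needed. This difference is consistent with llheap matching the other allocators when ownership is disabled.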
    144147
    145 
    146148%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    147149%% THRASH
     

\subsection{Cache Thrash}
\label{sec:cache-thrash-perf}

Thrash tests memory allocators for active false sharing (see \VRef{sec:benchThrashSec}).
     
\end{figure}

All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3}.
\textsf{dl} uses a single heap for all threads, so it is understandable that it generates significant active false-sharing.
The single heap handles requests from different threads sequentially using locks, so objects allocated for different threads can be placed on the same cache line.
\textsf{pt3} uses multiple heaps, but not strictly one heap per thread.
Hence, multiple threads sharing one heap can receive objects on the same cache line, which may cause active false-sharing.
The remaining memory allocators generate little or no active false-sharing.
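
The following minimal C sketch shows how a single shared heap can produce active false-sharing. It assumes a hypothetical bump allocator over one arena and a 64-byte cache line, and it omits locking. Two consecutive small requests, for example one from each of two threads, are placed next to each other. They therefore share a cache line, and each thread's writes to its own object contend for that line.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* Hypothetical single heap shared by all threads: a bump pointer over one
   cache-line-aligned arena. A real single-heap allocator would hold its lock
   around each request; locking is omitted here. */
static alignas(CACHE_LINE) unsigned char arena[4096];
static size_t top = 0;

void *single_heap_alloc(size_t size) {
    size = (size + 15) & ~(size_t)15;   /* round up to 16-byte alignment */
    if (size > sizeof arena - top)
        return NULL;                    /* arena exhausted */
    void *p = &arena[top];
    top += size;
    return p;
}

/* Nonzero if the two addresses lie on the same cache line. */
int same_cache_line(const void *a, const void *b) {
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

A per-thread heap would serve the second thread from different memory, so the two objects would not share a line.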
    183190
    184191%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     
\end{figure}

This micro-benchmark divides the allocators into two groups.
The first group contains the best performers: \textsf{llh}, \textsf{je}, and \textsf{rp}.
These memory allocators generate little or no passive false-sharing, and the performance differences among them are negligible.
The second group contains the low performers, i.e., the remaining memory allocators.
These memory allocators appear to preserve program-induced passive false-sharing.
\textsf{hrd}'s performance keeps getting worse as the number of threads increases.

Interestingly, allocators such as \textsf{hrd} and \textsf{glc} were among the best performers in the cache-thrash micro-benchmark (see Section~\ref{sec:cache-thrash-perf}), but they are among the low performers in this micro-benchmark.
This result suggests that these allocators do not actively produce false-sharing, but they may preserve program-induced passive false-sharing.
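
Passive false-sharing can be sketched in C under a few hypothetical assumptions: two threads with ids 0 and 1, one free list per thread heap, and a 64-byte cache line. Thread 0 allocates two objects on the same cache line and passes one of them to thread 1. Suppose thread 1 frees that object into its own heap (no ownership) and later reuses it. Then the program-induced false-sharing is preserved. With ownership, the freed object returns to the owner's heap instead.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define NTHREADS 2      /* hypothetical threads with ids 0 and 1 */

typedef struct node { struct node *next; } node_t;

static node_t *free_list[NTHREADS];   /* one free list per thread heap */

/* Thread `self` frees `obj`, which thread `owner` allocated.
   ownership != 0: the object returns to the owner's heap.
   ownership == 0: the object stays in the freeing thread's heap. */
void sketch_free(int self, int owner, void *obj, int ownership) {
    int heap = ownership ? owner : self;
    node_t *n = obj;
    n->next = free_list[heap];
    free_list[heap] = n;
}

/* Thread `self` reuses a freed object from its own heap, if any;
   NULL means a real allocator would hand out fresh memory instead. */
void *sketch_alloc(int self) {
    node_t *n = free_list[self];
    if (n != NULL)
        free_list[self] = n->next;
    return n;
}

/* Nonzero if the two addresses lie on the same cache line. */
int same_cache_line(const void *a, const void *b) {
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

Without ownership, thread 1's next allocation gets back the object on thread 0's cache line. With ownership, thread 1 receives nothing from its own list, and the object is reused only by thread 0.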
    221236
    222237%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     

All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl} and \textsf{pt3}.
\textsf{dl} has the lowest performance overall, and its performance gets worse as the number of threads increases.
\textsf{dl} uses a single heap protected by a global lock, which can become a bottleneck: multiple threads allocating memory in parallel contend for this single heap.
\textsf{pt3}, which is a modification of \textsf{dl} for multi-threaded applications, does not use per-thread heaps and may have similar bottlenecks.
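
A minimal C sketch of the single-heap, global-lock organization follows. It is hypothetical and is not \textsf{dl}'s code. The system \textsf{malloc} stands in for the single heap, and a C11 \texttt{atomic\_flag} spin lock stands in for the global lock. Every request from every thread must pass through the same lock, so parallel allocations are serialized. A counter records how often a thread found the lock already held.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical single-heap organization: one heap guarded by one global lock.
   The system malloc stands in for the single heap. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;
static atomic_long contended = 0;   /* acquisitions that found the lock held */

static void acquire(void) {
    if (atomic_flag_test_and_set_explicit(&global_lock, memory_order_acquire)) {
        atomic_fetch_add(&contended, 1);   /* another thread holds the heap */
        while (atomic_flag_test_and_set_explicit(&global_lock, memory_order_acquire))
            ;                              /* spin until it is released */
    }
}

static void release(void) {
    atomic_flag_clear_explicit(&global_lock, memory_order_release);
}

/* Every allocation and free from every thread serializes on global_lock. */
void *locked_malloc(size_t size) {
    acquire();
    void *p = malloc(size);
    release();
    return p;
}

void locked_free(void *p) {
    acquire();
    free(p);
    release();
}

long contention_count(void) {
    return atomic_load(&contended);
}
```

With one thread the counter stays at zero. As more threads allocate concurrently, each request is more likely to wait on the lock. Per-thread heaps avoid this lock for thread-local requests.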

There is a sudden increase in program completion time for the chains that include \textsf{calloc}, and all allocators are relatively slower on these chains.
\textsf{calloc} uses \textsf{memset} to set the allocated memory to zero.
\textsf{memset} writes every byte of the allocated object, so it takes a long time compared to the actual memory allocation.
Hence, \textsf{memset} accounts for a major part of the completion time of chains that include \textsf{calloc}.
Nevertheless, the relative difference among the memory allocators running the same chain of memory-allocation operations still indicates their relative performance.
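
A minimal C sketch of \textsf{calloc} built on \textsf{malloc} shows where the extra cost comes from. It is illustrative only and is not any tested allocator's implementation. After an overflow check on the element count times the element size, the only additional work is the \textsf{memset}, whose cost grows linearly with the request size. Production allocators often skip the \textsf{memset} when the memory comes directly from fresh \textsf{mmap} pages, because the operating system already zero-fills those pages. This sketch always zeroes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative calloc: malloc plus an overflow check and a zero fill. */
void *sketch_calloc(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                      /* nmemb * size would overflow */
    size_t total = nmemb * size;
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);              /* touches every byte: O(total) */
    return p;
}
```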

%speed-3-malloc.eps
     
First, the differences in the shape of the curves between architectures (top ARM, bottom x64) are small; the differences are mainly in the amount of memory used.
Hence, it is possible to focus on either the top or the bottom graph.
The heap curve remains zero for four memory allocators: \textsf{hrd}, \textsf{je}, \textsf{pt3}, and \textsf{rp}.
These memory allocators do not use the sbrk area; instead, they only use mmap to obtain memory from the operating system.
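
A minimal C sketch contrasts the two ways of obtaining memory from the operating system. It assumes Linux with glibc-style feature macros. Memory from \textsf{sbrk} extends the heap (data segment) and therefore appears in the heap curve. Anonymous \textsf{mmap} memory lives in separate mappings, so an allocator that only uses \textsf{mmap} leaves the heap curve at zero.

```c
#define _DEFAULT_SOURCE      /* glibc: declare sbrk and MAP_ANONYMOUS */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Extend the heap (data segment) by `size` bytes; this memory is what the
   heap curve measures. Returns the old program break, or NULL on failure. */
void *from_sbrk(size_t size) {
    void *p = sbrk((intptr_t)size);
    return p == (void *)-1 ? NULL : p;
}

/* Map `size` bytes of anonymous, zero-filled pages outside the sbrk area. */
void *from_mmap(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```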

\textsf{hrd} and \textsf{tbb} have a higher memory footprint than the others, i.e., they use more total dynamic memory.
One possible reason is their use of superblocks: both memory allocators create superblocks, where each superblock contains objects of the same size.
These superblocks are kept for the entire lifetime of the program.
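
The superblock idea can be sketched in C. The structure below is hypothetical and is not the \textsf{hrd} or \textsf{tbb} implementation. A superblock is one contiguous block carved into objects of a single size. Freed objects return to the superblock's free list, not to the operating system. The whole block stays reserved even when only a few of its objects are live. For that reason, keeping superblocks for the program lifetime can raise total dynamic memory.

```c
#include <stddef.h>
#include <stdlib.h>

#define SUPERBLOCK_SIZE 4096   /* assumed superblock size in bytes */

/* Hypothetical superblock: one contiguous block holding objects of a single
   size, with free objects linked through their first word. */
typedef struct superblock {
    size_t obj_size;      /* the one object size this superblock serves */
    size_t in_use;        /* objects currently allocated */
    void *free_list;      /* free objects in this superblock */
    unsigned char *mem;   /* SUPERBLOCK_SIZE bytes of object storage */
} superblock_t;

int sb_init(superblock_t *sb, size_t obj_size) {
    if (obj_size == 0)
        obj_size = 1;
    /* Round up so every object can hold an aligned free-list pointer. */
    obj_size = (obj_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    sb->mem = malloc(SUPERBLOCK_SIZE);   /* stands in for memory from the OS */
    if (sb->mem == NULL)
        return -1;
    sb->obj_size = obj_size;
    sb->in_use = 0;
    sb->free_list = NULL;
    for (size_t off = 0; off + obj_size <= SUPERBLOCK_SIZE; off += obj_size) {
        void *obj = sb->mem + off;
        *(void **)obj = sb->free_list;   /* push every object onto the free list */
        sb->free_list = obj;
    }
    return 0;
}

void *sb_alloc(superblock_t *sb) {
    void *obj = sb->free_list;
    if (obj != NULL) {
        sb->free_list = *(void **)obj;   /* pop the first free object */
        sb->in_use++;
    }
    return obj;   /* NULL: superblock full; a real allocator would use another */
}

void sb_free(superblock_t *sb, void *obj) {
    *(void **)obj = sb->free_list;   /* the object stays in this superblock; */
    sb->free_list = obj;             /* nothing is returned to the OS */
    sb->in_use--;
}
```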

\textsf{pt3} is the only memory allocator whose total dynamic memory goes down in the second half of the program lifetime, when the benchmark program frees memory.
Hence, among the tested allocators, \textsf{pt3} is the only one that gives memory back to the operating system as the program frees it.
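
The C sketch below shows one way an allocator can give memory back to the operating system. It assumes Linux \textsf{mmap}/\textsf{munmap} and a hypothetical 1~MiB region that counts its live objects. It does not claim that \textsf{pt3} releases memory this way. When the last live object in a region is freed, the region is unmapped and the process's total dynamic memory drops. An allocator that caches empty regions keeps its total memory flat.

```c
#define _DEFAULT_SOURCE      /* glibc: declare MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>

#define REGION_SIZE ((size_t)1 << 20)   /* assumed 1 MiB region */

/* Hypothetical region of memory obtained from the OS with mmap. */
typedef struct region {
    void *base;    /* start of the mapping, NULL once released */
    size_t live;   /* objects from this region that are still allocated */
} region_t;

int region_open(region_t *r) {
    void *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    r->base = p == MAP_FAILED ? NULL : p;
    r->live = 0;
    return r->base == NULL ? -1 : 0;
}

/* Record that the program freed one object from `r`. When the last live
   object is freed, unmap the region so its pages go back to the OS.
   Returns 1 if the region was released, 0 otherwise. */
int region_object_freed(region_t *r) {
    if (r->live > 0)
        r->live--;
    if (r->live == 0 && r->base != NULL) {
        munmap(r->base, REGION_SIZE);
        r->base = NULL;
        return 1;
    }
    return 0;
}
```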

% FOR 1 THREAD

%mem-1-prod-1-cons-100-llh.eps
     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-llh} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-llh} }
\caption{Memory benchmark results with Configuration-1 for llh memory allocator}
\label{fig:mem-1-prod-1-cons-100-llh}
\end{figure}

%mem-1-prod-1-cons-100-dl.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
\caption{Memory benchmark results with Configuration-1 for dl memory allocator}
\label{fig:mem-1-prod-1-cons-100-dl}
\end{figure}

%mem-1-prod-1-cons-100-glc.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
\caption{Memory benchmark results with Configuration-1 for glibc memory allocator}
\label{fig:mem-1-prod-1-cons-100-glc}
\end{figure}

%mem-1-prod-1-cons-100-hrd.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
\caption{Memory benchmark results with Configuration-1 for hoard memory allocator}
\label{fig:mem-1-prod-1-cons-100-hrd}
\end{figure}

%mem-1-prod-1-cons-100-je.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
\caption{Memory benchmark results with Configuration-1 for je memory allocator}
\label{fig:mem-1-prod-1-cons-100-je}
\end{figure}

%mem-1-prod-1-cons-100-pt3.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
\caption{Memory benchmark results with Configuration-1 for pt3 memory allocator}
\label{fig:mem-1-prod-1-cons-100-pt3}
\end{figure}

%mem-1-prod-1-cons-100-rp.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
\caption{Memory benchmark results with Configuration-1 for rp memory allocator}
\label{fig:mem-1-prod-1-cons-100-rp}
\end{figure}

%mem-1-prod-1-cons-100-tbb.eps
\begin{figure}
\centering
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
\caption{Memory benchmark results with Configuration-1 for tbb memory allocator}
\label{fig:mem-1-prod-1-cons-100-tbb}
\end{figure}

% FOR 4 THREADS

%mem-4-prod-4-cons-100-llh.eps
     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }
\caption{Memory benchmark results with Configuration-2 for llh memory allocator}
\label{fig:mem-4-prod-4-cons-100-llh}
\end{figure}

     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
\caption{Memory benchmark results with Configuration-2 for dl memory allocator}
\label{fig:mem-4-prod-4-cons-100-dl}
\end{figure}

     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
\caption{Memory benchmark results with Configuration-2 for glibc memory allocator}
\label{fig:mem-4-prod-4-cons-100-glc}
\end{figure}

     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
\caption{Memory benchmark results with Configuration-2 for hoard memory allocator}
\label{fig:mem-4-prod-4-cons-100-hrd}
\end{figure}

     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
\caption{Memory benchmark results with Configuration-2 for je memory allocator}
\label{fig:mem-4-prod-4-cons-100-je}
\end{figure}

     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
\caption{Memory benchmark results with Configuration-2 for pt3 memory allocator}
\label{fig:mem-4-prod-4-cons-100-pt3}
\end{figure}

     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
\caption{Memory benchmark results with Configuration-2 for rp memory allocator}
\label{fig:mem-4-prod-4-cons-100-rp}
\end{figure}

     
    \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
    \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
\caption{Memory benchmark results with Configuration-2 for tbb memory allocator}
\label{fig:mem-4-prod-4-cons-100-tbb}
\end{figure}