Changes in / [0bd6a14:e9c5db2]
doc/theses/mubeen_zulfiqar_MMath/performance.tex
r0bd6a14 re9c5db2 140 140 141 141 All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM. 142 \textsf{dl}'s performace decreases and the difference with the other allocators starts increases as the number of worker threads increase.143 \textsf{je} was the fastest, although there is not much difference between \textsf{je} and rest of the allocators.144 145 142 llheap is slightly slower because it uses ownership, where many of the allocations have remote frees, which requires locking. 146 143 When llheap is compiled without ownership, its performance is the same as the other allocators (not shown). 147 144 145 148 146 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 149 147 %% THRASH … … 151 149 152 150 \subsection{Cache Thrash} 153 \label{sec:cache-thrash-perf}154 151 155 152 Thrash tests memory allocators for active false sharing (see \VRef{sec:benchThrashSec}). … … 182 179 \end{figure} 183 180 184 All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3}. 185 \textsf{dl} uses a single heap for all threads so it is understable that it is generating so much active false-sharing. 186 Requests from different threads will be dealt with sequientially by a single heap using locks which can allocate objects to different threads on the same cache line. 187 \textsf{pt3} uses multiple heaps but it is not exactly per-thread heap. 188 So, it is possible that multiple threads using one heap can get objects allocated on the same cache line which might be causing active false-sharing. 189 Rest of the memory allocators generate little or no active false-sharing. 181 All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3} on the x64. 182 Either the memory allocators generate little active false-sharing or the micro-benchmark is not generating scenarios that cause active false-sharing. 
190 183 191 184 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% … … 224 217 \end{figure} 225 218 226 This micro-benchmark divided the allocators in 2 groups. 227 First is the group of best performers \textsf{llh}, \textsf{je}, and \textsf{rp}. 228 These memory alloctors generate little or no passive false-sharing and their performance difference is negligible. 229 Second is the group of the low performers which includes rest of the memory allocators. 230 These memory allocators seem to preserve program-induced passive false-sharing. 231 \textsf{hrd}'s performance keeps getting worst as the number of threads increase. 232 233 Interestingly, allocators such as \textsf{hrd} and \textsf{glc} were among the best performers in micro-benchmark cache thrash as described in section \ref{sec:cache-thrash-perf}. 234 But, these allocators were among the low performers in this micro-benchmark. 235 It tells us that these allocators do not actively produce false-sharing but they may preserve program-induced passive false sharing. 219 All allocators did well in this micro-benchmark on the ARM. 220 Allocators \textsf{llh}, \textsf{je}, and \textsf{rp} did well on the x64, while the remaining allocators experienced significant slowdowns from the false sharing. 236 221 237 222 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% … … 289 274 290 275 All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl} and \textsf{pt3}. 
291 \textsf{dl} performed the lowest overall and its performce kept getting worse with increasing number of threads.292 \textsf{dl} uses a single heap with a global lock that can become a bottleneck.293 Multiple threads doing memory allocation in parallel can create contention on \textsf{dl}'s single heap.294 \textsf{pt3} which is a modification of \textsf{dl} for multi-threaded applications does not use per-thread heaps and may also have similar bottlenecks.295 296 There's a sudden increase in program completion time of chains that include \textsf{calloc} and all allocators perform relatively slower in these chains including \textsf{calloc}.297 \textsf{calloc} uses \textsf{memset} to set the allocated memory to zero.298 \textsf{memset} is a slow routine which takes a long time compared to the actual memory allocation.299 So, a major part of the time is taken for \textsf{memset} in performance of chains that include \textsf{calloc}.300 But the relative difference among the different memory allocators running the same chain of memory allocation operations still gives us an idea of theor relative performance.301 276 302 277 %speed-3-malloc.eps … … 527 502 First, the differences in the shape of the curves between architectures (top ARM, bottom x64) is small, where the differences are in the amount of memory used. 528 503 Hence, it is possible to focus on either the top or bottom graph. 529 The heap curve is remains zero for 4 memory allocators: \textsf{hrd}, \textsf{je}, \textsf{pt3}, and \textsf{rp}. 530 These memory allocators are not using the sbrk area, instead they only use mmap to get memory from the system. 531 532 \textsf{hrd}, and \textsf{tbb} have higher memory footprint than the others as they use more total dynamic memory. 533 One reason for that can be the usage of superblocks as both of these memory allocators create superblocks where each block contains objects of the same size. 534 These superblocks are maintained throughout the life of the program. 
535 536 \textsf{pt3} is the only memory allocator for which the total dynamic memory goes down in the second half of the program lifetime when the memory is freed by the benchmark program. 537 It makes pt3 the only memory allocator that gives memory back to operating system as it is freed by the program. 538 539 % FOR 1 THREAD 504 The heap curve It is possible glib, hoard, jemalloc, ptmalloc3, rpmalloc do not use the sbrk area => only uses mmap. 505 506 hoard, tbbmalloc uses more total memory 507 508 ptmalloc3 gives memory back to operating system 540 509 541 510 %mem-1-prod-1-cons-100-llh.eps … … 544 513 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-llh} } 545 514 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-llh} } 546 \caption{Memory benchmark results with Configuration-1for llh memory allocator}515 \caption{Memory benchmark results with 1 producer for llh memory allocator} 547 516 \label{fig:mem-1-prod-1-cons-100-llh} 517 \end{figure} 518 519 %mem-4-prod-4-cons-100-llh.eps 520 \begin{figure} 521 \centering 522 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} } 523 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} } 524 \caption{Memory benchmark results with 4 producers for llh memory allocator} 525 \label{fig:mem-4-prod-4-cons-100-llh} 548 526 \end{figure} 549 527 … … 553 531 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} } 554 532 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} } 555 \caption{Memory benchmark results with Configuration-1for dl memory allocator}533 \caption{Memory benchmark results with 1 producer for dl memory allocator} 556 534 \label{fig:mem-1-prod-1-cons-100-dl} 535 
\end{figure} 536 537 %mem-4-prod-4-cons-100-dl.eps 538 \begin{figure} 539 \centering 540 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} } 541 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} } 542 \caption{Memory benchmark results with 4 producers for dl memory allocator} 543 \label{fig:mem-4-prod-4-cons-100-dl} 557 544 \end{figure} 558 545 … … 562 549 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} } 563 550 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} } 564 \caption{Memory benchmark results with Configuration-1for glibc memory allocator}551 \caption{Memory benchmark results with 1 producer for glibc memory allocator} 565 552 \label{fig:mem-1-prod-1-cons-100-glc} 553 \end{figure} 554 555 %mem-4-prod-4-cons-100-glc.eps 556 \begin{figure} 557 \centering 558 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} } 559 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} } 560 \caption{Memory benchmark results with 4 producers for glibc memory allocator} 561 \label{fig:mem-4-prod-4-cons-100-glc} 566 562 \end{figure} 567 563 … … 571 567 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} } 572 568 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} } 573 \caption{Memory benchmark results with Configuration-1for hoard memory allocator}569 \caption{Memory benchmark results with 1 producer for hoard memory allocator} 574 570 \label{fig:mem-1-prod-1-cons-100-hrd} 571 \end{figure} 572 573 %mem-4-prod-4-cons-100-hrd.eps 574 \begin{figure} 575 \centering 576 \subfigure[Algol]{ 
\includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} } 577 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} } 578 \caption{Memory benchmark results with 4 producers for hoard memory allocator} 579 \label{fig:mem-4-prod-4-cons-100-hrd} 575 580 \end{figure} 576 581 … … 580 585 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} } 581 586 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} } 582 \caption{Memory benchmark results with Configuration-1for je memory allocator}587 \caption{Memory benchmark results with 1 producer for je memory allocator} 583 588 \label{fig:mem-1-prod-1-cons-100-je} 589 \end{figure} 590 591 %mem-4-prod-4-cons-100-je.eps 592 \begin{figure} 593 \centering 594 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} } 595 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} } 596 \caption{Memory benchmark results with 4 producers for je memory allocator} 597 \label{fig:mem-4-prod-4-cons-100-je} 584 598 \end{figure} 585 599 … … 589 603 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} } 590 604 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} } 591 \caption{Memory benchmark results with Configuration-1for pt3 memory allocator}605 \caption{Memory benchmark results with 1 producer for pt3 memory allocator} 592 606 \label{fig:mem-1-prod-1-cons-100-pt3} 607 \end{figure} 608 609 %mem-4-prod-4-cons-100-pt3.eps 610 \begin{figure} 611 \centering 612 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} } 613 \subfigure[Nasus]{ 
\includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} } 614 \caption{Memory benchmark results with 4 producers for pt3 memory allocator} 615 \label{fig:mem-4-prod-4-cons-100-pt3} 593 616 \end{figure} 594 617 … … 598 621 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} } 599 622 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} } 600 \caption{Memory benchmark results with Configuration-1for rp memory allocator}623 \caption{Memory benchmark results with 1 producer for rp memory allocator} 601 624 \label{fig:mem-1-prod-1-cons-100-rp} 625 \end{figure} 626 627 %mem-4-prod-4-cons-100-rp.eps 628 \begin{figure} 629 \centering 630 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} } 631 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} } 632 \caption{Memory benchmark results with 4 producers for rp memory allocator} 633 \label{fig:mem-4-prod-4-cons-100-rp} 602 634 \end{figure} 603 635 … … 607 639 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} } 608 640 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} } 609 \caption{Memory benchmark results with Configuration-1for tbb memory allocator}641 \caption{Memory benchmark results with 1 producer for tbb memory allocator} 610 642 \label{fig:mem-1-prod-1-cons-100-tbb} 611 \end{figure}612 613 % FOR 4 THREADS614 615 %mem-4-prod-4-cons-100-llh.eps616 \begin{figure}617 \centering618 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }619 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }620 \caption{Memory 
benchmark results with Configuration-2 for llh memory allocator}621 \label{fig:mem-4-prod-4-cons-100-llh}622 \end{figure}623 624 %mem-4-prod-4-cons-100-dl.eps625 \begin{figure}626 \centering627 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }628 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }629 \caption{Memory benchmark results with Configuration-2 for dl memory allocator}630 \label{fig:mem-4-prod-4-cons-100-dl}631 \end{figure}632 633 %mem-4-prod-4-cons-100-glc.eps634 \begin{figure}635 \centering636 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }637 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }638 \caption{Memory benchmark results with Configuration-2 for glibc memory allocator}639 \label{fig:mem-4-prod-4-cons-100-glc}640 \end{figure}641 642 %mem-4-prod-4-cons-100-hrd.eps643 \begin{figure}644 \centering645 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }646 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }647 \caption{Memory benchmark results with Configuration-2 for hoard memory allocator}648 \label{fig:mem-4-prod-4-cons-100-hrd}649 \end{figure}650 651 %mem-4-prod-4-cons-100-je.eps652 \begin{figure}653 \centering654 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }655 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }656 \caption{Memory benchmark results with Configuration-2 for je memory allocator}657 \label{fig:mem-4-prod-4-cons-100-je}658 \end{figure}659 660 %mem-4-prod-4-cons-100-pt3.eps661 \begin{figure}662 \centering663 \subfigure[Algol]{ 
\includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }664 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }665 \caption{Memory benchmark results with Configuration-2 for pt3 memory allocator}666 \label{fig:mem-4-prod-4-cons-100-pt3}667 \end{figure}668 669 %mem-4-prod-4-cons-100-rp.eps670 \begin{figure}671 \centering672 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }673 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }674 \caption{Memory benchmark results with Configuration-2 for rp memory allocator}675 \label{fig:mem-4-prod-4-cons-100-rp}676 643 \end{figure} 677 644 … … 681 648 \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} } 682 649 \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} } 683 \caption{Memory benchmark results with Configuration-2for tbb memory allocator}650 \caption{Memory benchmark results with 4 producers for tbb memory allocator} 684 651 \label{fig:mem-4-prod-4-cons-100-tbb} 685 652 \end{figure} 653 654 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 655 %% ANALYSIS 656 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
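The Cache Thrash hunk in this changeset discusses active false sharing, where an allocator hands objects to different threads that land on the same cache line. The following is a minimal C sketch of that idea, not the thesis's micro-benchmark: two threads each allocate one small object and the addresses are compared. The names `same_cache_line`, `worker`, and `probe_active_false_sharing`, and the 64-byte `CACHE_LINE` value, are assumptions made for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: 64-byte cache lines; real code should query the hardware. */
#define CACHE_LINE 64

/* True when two addresses fall on the same cache line. */
static int same_cache_line(const void *a, const void *b) {
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}

/* Hypothetical worker: allocates one small object and records its address. */
static void *worker(void *arg) {
    void **slot = arg;
    *slot = malloc(sizeof(int));
    return NULL;
}

/* Returns 1 if the two threads' objects share a cache line,
   0 if they do not, and -1 if a thread or allocation failed. */
static int probe_active_false_sharing(void) {
    void *obj[2] = { NULL, NULL };
    pthread_t t[2];
    int started = 0;

    for (; started < 2; started++)
        if (pthread_create(&t[started], NULL, worker, &obj[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    int result = -1;
    if (started == 2 && obj[0] != NULL && obj[1] != NULL) {
        /* Both objects are still live, so their addresses must differ. */
        assert(obj[0] != obj[1]);
        result = same_cache_line(obj[0], obj[1]);
    }
    free(obj[0]);
    free(obj[1]);
    return result;
}
```

A single probe says little on its own; calling `probe_active_false_sharing()` many times under each allocator (for example via `LD_PRELOAD`) and counting how often it returns 1 would give a rough indication of whether that allocator tends to place different threads' objects on one line.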