Changeset 4b2ea0d for doc/theses/mubeen_zulfiqar_MMath
- Timestamp: Apr 27, 2022, 7:36:49 PM
- Branches: ADT, ast-experimental, master, pthread-emulation, qualifiedEnum
- Children: 0bd6a14
- Parents: a6c10de
- File: 1 edited
doc/theses/mubeen_zulfiqar_MMath/performance.tex
Line numbers refer to ra6c10de (left) and r4b2ea0d (right); + marks an added line, - a removed line, and … an elided run of unchanged lines.

140 140
141 141   All allocators did well in this micro-benchmark, except for \textsf{dl} on the ARM.
    142 + \textsf{dl}'s performace decreases and the difference with the other allocators starts increases as the number of worker threads increase.
    143 + \textsf{je} was the fastest, although there is not much difference between \textsf{je} and rest of the allocators.
    144 +
142 145   llheap is slightly slower because it uses ownership, where many of the allocations have remote frees, which requires locking.
143 146   When llheap is compiled without ownership, its performance is the same as the other allocators (not shown).
144 147
145     -
146 148   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
147 149   %% THRASH
…
149 151
150 152   \subsection{Cache Thrash}
    153 + \label{sec:cache-thrash-perf}
151 154
152 155   Thrash tests memory allocators for active false sharing (see \VRef{sec:benchThrashSec}).
…
179 182   \end{figure}
180 183
181     - All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3} on the x64.
182     - Either the memory allocators generate little active false-sharing or the micro-benchmark is not generating scenarios that cause active false-sharing.
    184 + All allocators did well in this micro-benchmark, except for \textsf{dl} and \textsf{pt3}.
    185 + \textsf{dl} uses a single heap for all threads so it is understable that it is generating so much active false-sharing.
    186 + Requests from different threads will be dealt with sequientially by a single heap using locks which can allocate objects to different threads on the same cache line.
    187 + \textsf{pt3} uses multiple heaps but it is not exactly per-thread heap.
    188 + So, it is possible that multiple threads using one heap can get objects allocated on the same cache line which might be causing active false-sharing.
    189 + Rest of the memory allocators generate little or no active false-sharing.
183 190
184 191   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
…
217 224   \end{figure}
218 225
219     - All allocators did well in this micro-benchmark on the ARM.
220     - Allocators \textsf{llh}, \textsf{je}, and \textsf{rp} did well on the x64, while the remaining allocators experienced significant slowdowns from the false sharing.
    226 + This micro-benchmark divided the allocators in 2 groups.
    227 + First is the group of best performers \textsf{llh}, \textsf{je}, and \textsf{rp}.
    228 + These memory alloctors generate little or no passive false-sharing and their performance difference is negligible.
    229 + Second is the group of the low performers which includes rest of the memory allocators.
    230 + These memory allocators seem to preserve program-induced passive false-sharing.
    231 + \textsf{hrd}'s performance keeps getting worst as the number of threads increase.
    232 +
    233 + Interestingly, allocators such as \textsf{hrd} and \textsf{glc} were among the best performers in micro-benchmark cache thrash as described in section \ref{sec:cache-thrash-perf}.
    234 + But, these allocators were among the low performers in this micro-benchmark.
    235 + It tells us that these allocators do not actively produce false-sharing but they may preserve program-induced passive false sharing.
221 236
222 237   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
…
274 289
275 290   All allocators did well in this micro-benchmark across all allocation chains, except for \textsf{dl} and \textsf{pt3}.
    291 + \textsf{dl} performed the lowest overall and its performce kept getting worse with increasing number of threads.
    292 + \textsf{dl} uses a single heap with a global lock that can become a bottleneck.
    293 + Multiple threads doing memory allocation in parallel can create contention on \textsf{dl}'s single heap.
    294 + \textsf{pt3} which is a modification of \textsf{dl} for multi-threaded applications does not use per-thread heaps and may also have similar bottlenecks.
    295 +
    296 + There's a sudden increase in program completion time of chains that include \textsf{calloc} and all allocators perform relatively slower in these chains including \textsf{calloc}.
    297 + \textsf{calloc} uses \textsf{memset} to set the allocated memory to zero.
    298 + \textsf{memset} is a slow routine which takes a long time compared to the actual memory allocation.
    299 + So, a major part of the time is taken for \textsf{memset} in performance of chains that include \textsf{calloc}.
    300 + But the relative difference among the different memory allocators running the same chain of memory allocation operations still gives us an idea of theor relative performance.
276 301
277 302   %speed-3-malloc.eps
…
502 527   First, the differences in the shape of the curves between architectures (top ARM, bottom x64) is small, where the differences are in the amount of memory used.
503 528   Hence, it is possible to focus on either the top or bottom graph.
504     - The heap curve It is possible glib, hoard, jemalloc, ptmalloc3, rpmalloc do not use the sbrk area => only uses mmap.
505     -
506     - hoard, tbbmalloc uses more total memory
507     -
508     - ptmalloc3 gives memory back to operating system
    529 + The heap curve is remains zero for 4 memory allocators: \textsf{hrd}, \textsf{je}, \textsf{pt3}, and \textsf{rp}.
    530 + These memory allocators are not using the sbrk area, instead they only use mmap to get memory from the system.
    531 +
    532 + \textsf{hrd}, and \textsf{tbb} have higher memory footprint than the others as they use more total dynamic memory.
    533 + One reason for that can be the usage of superblocks as both of these memory allocators create superblocks where each block contains objects of the same size.
    534 + These superblocks are maintained throughout the life of the program.
    535 +
    536 + \textsf{pt3} is the only memory allocator for which the total dynamic memory goes down in the second half of the program lifetime when the memory is freed by the benchmark program.
    537 + It makes pt3 the only memory allocator that gives memory back to operating system as it is freed by the program.
    538 +
    539 + % FOR 1 THREAD
509 540
510 541   %mem-1-prod-1-cons-100-llh.eps
…
513 544   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-llh} }
514 545   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-llh} }
515     - \caption{Memory benchmark results with 1 producer for llh memory allocator}
    546 + \caption{Memory benchmark results with Configuration-1 for llh memory allocator}
516 547   \label{fig:mem-1-prod-1-cons-100-llh}
517 548   \end{figure}
    549 +
    550 + %mem-1-prod-1-cons-100-dl.eps
    551 + \begin{figure}
    552 + \centering
    553 + \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
    554 + \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
    555 + \caption{Memory benchmark results with Configuration-1 for dl memory allocator}
    556 + \label{fig:mem-1-prod-1-cons-100-dl}
    557 + \end{figure}
    558 +
    559 + %mem-1-prod-1-cons-100-glc.eps
    560 + \begin{figure}
    561 + \centering
    562 + \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
    563 + \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
    564 + \caption{Memory benchmark results with Configuration-1 for glibc memory allocator}
    565 + \label{fig:mem-1-prod-1-cons-100-glc}
    566 + \end{figure}
    567 +
    568 + %mem-1-prod-1-cons-100-hrd.eps
    569 + \begin{figure}
    570 + \centering
    571 + \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
    572 + \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
    573 + \caption{Memory benchmark results with Configuration-1 for hoard memory allocator}
    574 + \label{fig:mem-1-prod-1-cons-100-hrd}
    575 + \end{figure}
    576 +
    577 + %mem-1-prod-1-cons-100-je.eps
    578 + \begin{figure}
    579 + \centering
    580 + \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
    581 + \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
    582 + \caption{Memory benchmark results with Configuration-1 for je memory allocator}
    583 + \label{fig:mem-1-prod-1-cons-100-je}
    584 + \end{figure}
    585 +
    586 + %mem-1-prod-1-cons-100-pt3.eps
    587 + \begin{figure}
    588 + \centering
    589 + \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
    590 + \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
    591 + \caption{Memory benchmark results with Configuration-1 for pt3 memory allocator}
    592 + \label{fig:mem-1-prod-1-cons-100-pt3}
    593 + \end{figure}
    594 +
    595 + %mem-1-prod-1-cons-100-rp.eps
    596 + \begin{figure}
    597 + \centering
    598 + \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
    599 + \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
    600 + \caption{Memory benchmark results with Configuration-1 for rp memory allocator}
    601 + \label{fig:mem-1-prod-1-cons-100-rp}
    602 + \end{figure}
    603 +
    604 + %mem-1-prod-1-cons-100-tbb.eps
    605 + \begin{figure}
    606 + \centering
    607 + \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
    608 + \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
    609 + \caption{Memory benchmark results with Configuration-1 for tbb memory allocator}
    610 + \label{fig:mem-1-prod-1-cons-100-tbb}
    611 + \end{figure}
    612 +
    613 + % FOR 4 THREADS
518 614
519 615   %mem-4-prod-4-cons-100-llh.eps
…
522 618   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-llh} }
523 619   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-llh} }
524     - \caption{Memory benchmark results with 4 producers for llh memory allocator}
    620 + \caption{Memory benchmark results with Configuration-2 for llh memory allocator}
525 621   \label{fig:mem-4-prod-4-cons-100-llh}
526     - \end{figure}
527     -
528     - %mem-1-prod-1-cons-100-dl.eps
529     - \begin{figure}
530     - \centering
531     - \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-dl} }
532     - \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-dl} }
533     - \caption{Memory benchmark results with 1 producer for dl memory allocator}
534     - \label{fig:mem-1-prod-1-cons-100-dl}
535 622   \end{figure}
536 623
…
540 627   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-dl} }
541 628   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-dl} }
542     - \caption{Memory benchmark results with 4 producers for dl memory allocator}
    629 + \caption{Memory benchmark results with Configuration-2 for dl memory allocator}
543 630   \label{fig:mem-4-prod-4-cons-100-dl}
544     - \end{figure}
545     -
546     - %mem-1-prod-1-cons-100-glc.eps
547     - \begin{figure}
548     - \centering
549     - \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-glc} }
550     - \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-glc} }
551     - \caption{Memory benchmark results with 1 producer for glibc memory allocator}
552     - \label{fig:mem-1-prod-1-cons-100-glc}
553 631   \end{figure}
554 632
…
558 636   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-glc} }
559 637   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-glc} }
560     - \caption{Memory benchmark results with 4 producers for glibc memory allocator}
    638 + \caption{Memory benchmark results with Configuration-2 for glibc memory allocator}
561 639   \label{fig:mem-4-prod-4-cons-100-glc}
562     - \end{figure}
563     -
564     - %mem-1-prod-1-cons-100-hrd.eps
565     - \begin{figure}
566     - \centering
567     - \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-hrd} }
568     - \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-hrd} }
569     - \caption{Memory benchmark results with 1 producer for hoard memory allocator}
570     - \label{fig:mem-1-prod-1-cons-100-hrd}
571 640   \end{figure}
572 641
…
576 645   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-hrd} }
577 646   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-hrd} }
578     - \caption{Memory benchmark results with 4 producers for hoard memory allocator}
    647 + \caption{Memory benchmark results with Configuration-2 for hoard memory allocator}
579 648   \label{fig:mem-4-prod-4-cons-100-hrd}
580     - \end{figure}
581     -
582     - %mem-1-prod-1-cons-100-je.eps
583     - \begin{figure}
584     - \centering
585     - \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-je} }
586     - \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-je} }
587     - \caption{Memory benchmark results with 1 producer for je memory allocator}
588     - \label{fig:mem-1-prod-1-cons-100-je}
589 649   \end{figure}
590 650
…
594 654   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-je} }
595 655   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-je} }
596     - \caption{Memory benchmark results with 4 producers for je memory allocator}
    656 + \caption{Memory benchmark results with Configuration-2 for je memory allocator}
597 657   \label{fig:mem-4-prod-4-cons-100-je}
598     - \end{figure}
599     -
600     - %mem-1-prod-1-cons-100-pt3.eps
601     - \begin{figure}
602     - \centering
603     - \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-pt3} }
604     - \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-pt3} }
605     - \caption{Memory benchmark results with 1 producer for pt3 memory allocator}
606     - \label{fig:mem-1-prod-1-cons-100-pt3}
607 658   \end{figure}
608 659
…
612 663   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-pt3} }
613 664   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-pt3} }
614     - \caption{Memory benchmark results with 4 producers for pt3 memory allocator}
    665 + \caption{Memory benchmark results with Configuration-2 for pt3 memory allocator}
615 666   \label{fig:mem-4-prod-4-cons-100-pt3}
616     - \end{figure}
617     -
618     - %mem-1-prod-1-cons-100-rp.eps
619     - \begin{figure}
620     - \centering
621     - \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-rp} }
622     - \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-rp} }
623     - \caption{Memory benchmark results with 1 producer for rp memory allocator}
624     - \label{fig:mem-1-prod-1-cons-100-rp}
625 667   \end{figure}
626 668
…
630 672   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-rp} }
631 673   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-rp} }
632     - \caption{Memory benchmark results with 4 producers for rp memory allocator}
    674 + \caption{Memory benchmark results with Configuration-2 for rp memory allocator}
633 675   \label{fig:mem-4-prod-4-cons-100-rp}
634     - \end{figure}
635     -
636     - %mem-1-prod-1-cons-100-tbb.eps
637     - \begin{figure}
638     - \centering
639     - \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-1-prod-1-cons-100-tbb} }
640     - \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-1-prod-1-cons-100-tbb} }
641     - \caption{Memory benchmark results with 1 producer for tbb memory allocator}
642     - \label{fig:mem-1-prod-1-cons-100-tbb}
643 676   \end{figure}
644 677
…
648 681   \subfigure[Algol]{ \includegraphics[width=0.95\textwidth]{evaluations/algol-perf-eps/mem-4-prod-4-cons-100-tbb} }
649 682   \subfigure[Nasus]{ \includegraphics[width=0.95\textwidth]{evaluations/nasus-perf-eps/mem-4-prod-4-cons-100-tbb} }
650     - \caption{Memory benchmark results with 4 producers for tbb memory allocator}
    683 + \caption{Memory benchmark results with Configuration-2 for tbb memory allocator}
651 684   \label{fig:mem-4-prod-4-cons-100-tbb}
652 685   \end{figure}
653     -
654     - %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
655     - %% ANALYSIS
656     - %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
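The Cache Thrash text added in this changeset attributes dl's active false sharing to a single heap that serves requests from all threads one after another, so objects handed to different threads can land on the same cache line. A minimal C sketch of that effect follows. It is a hypothetical toy, not dl's code: shared_heap, shared_alloc, and same_cache_line are invented names, the 16-byte object size and 64-byte cache line are assumptions, and the global lock a real single-heap allocator would take around shared_alloc is omitted because it only serializes requests and does not change which addresses they receive.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */
#define OBJ_SIZE   16  /* assumed fixed object size for this toy heap */

/* Hypothetical single heap shared by every thread.  A real allocator
   would guard 'next' with a global lock; the lock is left out because
   it does not change the addresses handed out. */
typedef struct {
    alignas(CACHE_LINE) unsigned char arena[4096];
    size_t next;  /* offset of the next unallocated byte */
} shared_heap;

/* Bump allocation in arrival order: consecutive requests, whichever
   thread issued them, receive adjacent addresses. */
static void *shared_alloc(shared_heap *h) {
    assert(h != NULL);
    if (h->next + OBJ_SIZE > sizeof h->arena) return NULL;  /* arena exhausted */
    void *p = &h->arena[h->next];
    h->next += OBJ_SIZE;
    return p;
}

/* Nonzero when both addresses fall in the same cache line. */
static int same_cache_line(const void *a, const void *b) {
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

If thread A's request is served first and thread B's second, the two objects occupy bytes 0 to 15 and 16 to 31 of one cache line, so A and B writing their own objects still contend for that line. A per-thread heap would carve each thread's objects from separate memory instead.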
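The Memory Sharing text added here separates allocators that merely preserve program-induced passive false sharing from those that avoid it, and the unchanged context notes that llheap uses ownership, so remote frees require locking. Passive false sharing typically arises when an object allocated by one thread is freed by another and the freeing thread's heap later reuses that memory. One common remedy is ownership: a remote free returns the object to the heap that allocated it. The C sketch below models only that difference. It is hypothetical (single-threaded, fixed-size objects, invented names heap, object, free_no_ownership, free_with_ownership) and is not llheap's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-thread heaps and fixed-size objects.  Each
   object header records the heap that allocated it (its owner). */
typedef struct heap heap;

typedef struct object {
    heap *owner;              /* heap that originally handed out this object */
    struct object *next_free; /* free-list link while the object is free */
} object;

struct heap {
    object *free_list;        /* LIFO list of free objects */
};

/* Give a heap n objects from caller-provided storage, all owned by it. */
static void heap_init(heap *h, object *objs, size_t n) {
    h->free_list = NULL;
    for (size_t i = 0; i < n; i++) {
        objs[i].owner = h;
        objs[i].next_free = h->free_list;
        h->free_list = &objs[i];
    }
}

/* Pop the most recently freed object, or NULL when the heap is empty. */
static object *heap_alloc(heap *h) {
    object *o = h->free_list;
    if (o != NULL) h->free_list = o->next_free;
    return o;
}

/* No ownership: the freeing thread's heap keeps the object, so that
   thread's next allocation can reuse memory adjacent to the allocating
   thread's objects. */
static void free_no_ownership(heap *freeing_heap, object *o) {
    o->next_free = freeing_heap->free_list;
    freeing_heap->free_list = o;
}

/* Ownership: a remote free goes back to the owning heap.  With real
   threads this list is shared with its owner, so this step needs a lock
   or an atomic push, the cost the text attributes to remote frees. */
static void free_with_ownership(object *o) {
    assert(o->owner != NULL);
    o->next_free = o->owner->free_list;
    o->owner->free_list = o;
}
```

In the model, an object from heap a freed through free_no_ownership into heap b is the next object b hands out; freed through free_with_ownership, it returns to a instead.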
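The Speed text added here explains the jump for chains containing calloc by calloc clearing the block with memset, whose cost grows with the request size. A minimal C sketch of that model follows. The name zeroed_alloc is hypothetical, and real allocators may skip the memset when the block comes from a fresh mmap region, which the kernel already zero-fills, so this models the expensive case rather than every calloc implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical calloc-style routine following the model in the text:
   allocate, then clear every byte with memset.  The memset touches all
   n * size bytes, so its cost grows with the request, while the
   allocation step itself often does not. */
static void *zeroed_alloc(size_t n, size_t size) {
    if (size != 0 && n > SIZE_MAX / size) return NULL;  /* n * size would overflow */
    size_t bytes = n * size;
    void *p = malloc(bytes);
    if (p != NULL) memset(p, 0, bytes);
    return p;
}
```

For example, zeroed_alloc(100, sizeof(unsigned char)) returns 100 zero bytes, and a request whose byte count would overflow size_t yields NULL, as calloc implementations do.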
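The Memory text added here says hrd, je, pt3, and rp obtain memory only through mmap, and that pt3 alone returns memory to the operating system while the program frees it. The system calls involved can be sketched as follows. os_acquire and os_release are hypothetical wrappers, MAP_ANONYMOUS is assumed to be available (it is on Linux and the BSDs), and the sketch says nothing about when any of the measured allocators actually calls munmap. The relevant contrast is that sbrk can only hand memory back by lowering the program break, so only the top of the heap area can be released that way, whereas each mmap region can be unmapped on its own.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS in glibc headers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper: request zero-filled, private pages directly
   from the kernel instead of extending the sbrk heap. */
static void *os_acquire(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical wrapper: hand the pages back to the kernel.  Once this
   succeeds they no longer count toward the process's memory use.
   Returns 0 on success and -1 on error, like munmap. */
static int os_release(void *p, size_t bytes) {
    assert(p != NULL);
    return munmap(p, bytes);
}
```

An allocator built only on calls like these shows no growth in the sbrk heap curve, and its total dynamic memory can drop only if it releases regions the way os_release does.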