Timestamp:
Jun 2, 2022, 3:11:21 PM (23 months ago)
Author:
caparsons <caparson@…>
Branches:
ADT, ast-experimental, master, pthread-emulation, qualifiedEnum
Children:
ced5e2a
Parents:
015925a (diff), fc134a48 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:
1 edited

  • doc/theses/mubeen_zulfiqar_MMath/benchmarks.tex

    r015925a re5d9274  
    1212\item[Benchmarks]
    1313are a suite of application programs (SPEC CPU/WEB) that are exercised in a common way (inputs) to find differences among underlying software implementations associated with an application (compiler, memory allocator, web server, \etc).
    14 The applications are suppose to represent common execution patterns that need to perform well with respect to an underlying software implementation.
     14The applications are supposed to represent common execution patterns that need to perform well with respect to an underlying software implementation.
    1515Benchmarks are often criticized for having overlapping patterns, insufficient patterns, or extraneous code that masks patterns.
    1616\item[Micro-Benchmarks]
     
    2626
    2727This thesis designs and examines a new set of micro-benchmarks for memory allocators that test a variety of allocation patterns, each with multiple tuning parameters.
    28 The aim of the micro-benchmark suite is to create a set of programs that can evaluate a memory allocator based on the key performance matrices such as speed, memory overhead, and cache performance.
     28The aim of the micro-benchmark suite is to create a set of programs that can evaluate a memory allocator based on the key performance metrics such as speed, memory overhead, and cache performance.
    2929% These programs can be taken as a standard to benchmark an allocator's basic goals.
    3030These programs give details of an allocator's memory overhead and speed under certain allocation patterns.
    31 The allocation patterns are configurable (adjustment knobs) to observe an allocator's performance across a spectrum of events for a desired allocation pattern, which is seldom possible with benchmark programs.
     31The allocation patterns are configurable (adjustment knobs) to observe an allocator's performance across a spectrum of allocation patterns, which is seldom possible with benchmark programs.
    3232Each micro-benchmark program has multiple control knobs specified by command-line arguments.
    3333
    34 The new micro-benchmark suite measures performance by allocating dynamic objects and measuring specific matrices.
     34The new micro-benchmark suite measures performance by allocating dynamic objects and measuring specific metrics.
    3535An allocator's speed is benchmarked in different ways, as are issues like false sharing.
    3636
     
    4040Modern memory allocators, such as llheap, must handle multi-threaded programs at the KT and UT level.
    4141The following multi-threaded micro-benchmarks are presented to give a sense of prior work~\cite{Berger00} at the KT level.
    42 None of the prior work address multi-threading at the UT level.
     42None of the prior work addresses multi-threading at the UT level.
    4343
    4444
     
    4747This benchmark stresses the ability of the allocator to handle different threads allocating and deallocating independently.
    4848There is no interaction among threads, \ie no object sharing.
    49 Each thread repeatedly allocate 100,000 \emph{8-byte} objects then deallocates them in the order they were allocated.
    50 Runtime of the benchmark evaluates its efficiency.
     49Each thread repeatedly allocates 100,000 \emph{8-byte} objects then deallocates them in the order they were allocated.
     50The execution time of the benchmark evaluates its efficiency.
    5151
    5252
     
    6363Before the thread terminates, it passes its array of 10,000 objects to a new child thread to continue the process.
    6464The number of thread generations varies depending on the thread speed.
    65 It calculates memory operations per second as an indicator of memory allocator's performance.
     65It calculates memory operations per second as an indicator of the memory allocator's performance.
    6666
    6767
     
    7575\label{s:ChurnBenchmark}
    7676
    77 The churn benchmark measures the runtime speed of an allocator in a multi-threaded scenerio, where each thread extensively allocates and frees dynamic memory.
     77The churn benchmark measures the runtime speed of an allocator in a multi-threaded scenario, where each thread extensively allocates and frees dynamic memory.
    7878Only @malloc@ and @free@ are used to eliminate any extra cost, such as @memcpy@ in @calloc@ or @realloc@.
    79 Churn simulates a memory intensive program that can be tuned to create different scenarios.
     79Churn simulates a memory-intensive program and can be tuned to create different scenarios.
    8080
    8181\VRef[Figure]{fig:ChurnBenchFig} shows the pseudo code for the churn micro-benchmark.
     
    133133When threads share a cache line, frequent reads/writes to their cache-line object cause cache misses, which cause escalating delays as cache distance increases.
    134134
    135 Cache thrash tries to create a scenerio that leads to false sharing, if the underlying memory allocator is allocating dynamic memory to multiple threads on the same cache lines.
     135Cache thrash tries to create a scenario that leads to false sharing if the underlying memory allocator allocates dynamic memory for multiple threads on the same cache lines.
    136136Ideally, a memory allocator should distance the dynamic memory region of one thread from another.
    137137Having multiple threads allocating small objects simultaneously can cause a memory allocator to allocate objects on the same cache line, if it is not distancing the memory among different threads.
     
    141141Each worker thread allocates an object and intensively reads/writes it M times to possibly invalidate cache lines that may interfere with other threads sharing the same cache line.
    142142Each thread repeats this for N times.
    143 The main thread measures the total time taken to for all worker threads to complete.
    144 Worker threads sharing cache lines with each other will take longer.
     143The main thread measures the total time taken for all worker threads to complete.
     144Worker threads sharing cache lines with each other are expected to take longer.
    145145
    146146\begin{figure}
     
    156156        signal workers to free
    157157        ...
    158         print addresses from each $thread$
    159158Worker Thread$\(_1\)$
    160         allocate, write, read, free
    161         warmup memory in chunkc of 16 bytes
    162         ...
    163         malloc N objects
    164         ...
    165         free objects
    166         return object address to Main Thread
     159        warm up memory in chunks of 16 bytes
     160        ...
     161        For N
     162                malloc an object
     163                read/write the object M times
     164                free the object
     165        ...
    167166Worker Thread$\(_2\)$
    168167        // same as Worker Thread$\(_1\)$
     
    191190
    192191The cache-scratch micro-benchmark measures allocator-induced passive false-sharing as illustrated in \VRef{s:AllocatorInducedPassiveFalseSharing}.
    193 As for cache thrash, if memory is allocated for multiple threads on the same cache line, this can significantly slow down program performance.
     192As with cache thrash, if memory is allocated for multiple threads on the same cache line, this can significantly slow down program performance.
    194193In this scenario, the false sharing is caused by the memory allocator, although it is initiated by the program sharing an object.
    195194
     
    202201Cache scratch tries to create a scenario that leads to false sharing and makes the memory allocator preserve the program-induced false sharing if it does not return a freed object to its owner thread and instead reuses it immediately.
    203202An allocator using object ownership, as described in section \VRef{s:Ownership}, is less susceptible to allocator-induced passive false-sharing.
    204 If the object is returned to the thread who owns it, then the thread that gets a new object is less likely to be on the same cache line.
     203If the object is returned to the thread that owns it, then the new object that the thread gets is less likely to be on the same cache line.
    205204
    206205\VRef[Figure]{fig:benchScratchFig} shows the pseudo code for the cache-scratch micro-benchmark.
     
    224223        signal workers to free
    225224        ...
    226         print addresses from each $thread$
    227225Worker Thread$\(_1\)$
    228         allocate, write, read, free
    229         warmup memory in chunkc of 16 bytes
    230         ...
    231         for ( N )
    232                 free an object passed by Main Thread
     226        warm up memory in chunks of 16 bytes
     227        ...
     228        free the object passed by the Main Thread
     229        For N
    233230                malloc new object
    234         ...
    235         free objects
    236         return new object addresses to Main Thread
     231                read/write the object M times
     232                free the object
     233        ...
    237234Worker Thread$\(_2\)$
    238235        // same as Worker Thread$\(_1\)$
     
    248245
    249246Similar to the cache-thrash benchmark in section \VRef{sec:benchThrashSec}, different cache-access scenarios can be created using the following command-line arguments.
    250 \begin{description}[itemsep=0pt,parsep=0pt]
     247\begin{description}[topsep=0pt,itemsep=0pt,parsep=0pt]
    251248\item[threads:]
    252249number of threads (K).
     
    262259\subsection{Speed Micro-Benchmark}
    263260\label{s:SpeedMicroBenchmark}
     261\vspace*{-4pt}
    264262
    265263The speed benchmark measures the runtime speed of individual and sequences of memory allocation routines:
    266 \begin{enumerate}[itemsep=0pt,parsep=0pt]
     264\begin{enumerate}[topsep=-5pt,itemsep=0pt,parsep=0pt]
    267265\item malloc
    268266\item realloc
     
    332330\VRef[Figure]{fig:MemoryBenchFig} shows the pseudo code for the memory micro-benchmark.
    333331It creates a producer-consumer scenario with K producer threads, where each producer has M consumer threads.
    334 A producer has a separate buffer for each consumer and allocates N objects of random sizes following a settable distribution for each consumer.
     332A producer has a separate buffer for each consumer and allocates N objects of random sizes following a configurable distribution for each consumer.
    335333A consumer frees these objects.
    336334Program memory usage is recorded after every memory operation throughout the run.