Timestamp:
Jul 13, 2023, 5:52:20 PM
Author:
caparsons <caparson@…>
Branches:
master
Children:
4acf56d, ac09751
Parents:
60a9164
Message:

reworked part of actor perf section

File:
1 edited

  • doc/theses/colby_parsons_MMAth/text/actors.tex

    r60a9164 ra3c7bac  
    12071207The results are spread out more, and there is a difference between AMD and Intel.
    12081208Again, CAF is significantly slower than the other actor systems.
    1209 On the AMD there is a tight grouping of uC++, ProroActor, and Akka;
    1210 on the Intel, uC++, ProroActor, and Akka are spread out.
     1209To keep the graphs readable, the y-axis was cut at 100 seconds; as the core count increases from 8-32, CAF ranges around 200 seconds on AMD and between 300-1000 seconds on the Intel.
     1210On the AMD there is a tight grouping of uC++, ProtoActor, and Akka;
     1211on the Intel, uC++, ProtoActor, and Akka are spread out.
    12111212Finally, \CFA runs consistently on both of the AMD and Intel, and is faster than \uC on the AMD, but slightly slower on the Intel.
    1212 This benchmark is a pathological case for work stealing actor systems, as the majority of work is being performed by the single actor conducting the scatter/gather.
    1213 The impact of work stealing on this benchmark is discussed further in Section~\ref{s:steal_perf}.
    1214 Here, gains from using the copy queue are much less apparent, due to the costs of stealing.
     1213Here, gains from using the copy queue are much less apparent.
    12151214
    12161215\begin{figure}
     
    13641363
    13651364Figures~\ref{f:cfaRepeatAMD} and~\ref{f:cfaRepeatIntel} show the effects of the stealing heuristics for the repeat benchmark.
    1366 As mentioned, the repeat benchmark is a pathological case for work stealing systems since there is one actor with the majority of the work, and not enough other work to go around.
    1367 The worst-case scenario is if the actor doing the majority of work or its mail queue is stolen by the work stealing system, as this incurs a huge cost to move the work and refill the local cache.
     1365This benchmark is a pathological case for work stealing actor systems, as the majority of work is being performed by the single actor conducting the scatter/gather.
     1366The single actor (the client) of this experiment is long running and maintains a lot of state, as it needs to know the handles of all the servers.
     1367When stealing the client or its respective queue (in \CFA's inverted model), moving the client incurs a high cost due to cache invalidation.
    13681368This worst-case steal is likely to happen since there is little other work in the system between scatter/gather rounds.
    13691369However, all heuristics are comparable in performance on the repeat benchmark.
    1370 This result is surprising especially for the No-Stealing variant, which should have better performance than the stealing variants.
    1371 It is likely the No-Stealing variant is impacted by other design decisions in the \CFA actor system related to work stealing.
     1370This result is surprising especially for the No-Stealing variant, which one would expect to have better performance than the stealing variants.
     1371This is not the case, since the stealing happens lazily and fails fast, the queue containing the long-running client actor is rarely stolen.
    13721372
    13731373Work stealing performance can be further analyzed by reexamining the executor and repeat benchmarks in Figures~\ref{f:ExecutorBenchmark} and \ref{f:RepeatBenchmark}, respectively.
    1374 In both, benchmarks CAF performs poorly.
     1374In both benchmarks, CAF performs poorly.
    13751375It is hypothesized that CAF has an aggressive work stealing algorithm that eagerly attempts to steal.
    13761376This results in the poor performance with small messages containing little work per message in both of these benchmarks.
    13771377In comparison with the other systems, \uC does well on both benchmarks since it does not have work stealing.
    13781378
    1379 \PAB{In particular, on the Intel machine in Figure~\ref{f:RepeatIntel}, the cost of stealing is significantly higher, which can be seen in the vertical shift of Akka, CAF and \CFA compared to the AMD results in Figure~\ref{f:RepeatAMD} (\uC and ProtoActor do not have work stealing).
    1380 The shift for CAF is particularly large, which supports the hypothesis that CAF's work stealing is particularly eager.
    1381 The client of this experiment is long running and maintains a lot of state, as it needs to know the handles of all the servers.
    1382 When stealing the client or its respective queue (in \CFA's inverted model), moving the client incurs a high cost due to cache invalidation.
    1383 As such stealing the client can result in a hit in performance.}
    1384 
    13851379Finally, Figures~\ref{f:cfaMatrixAMD} and~\ref{f:cfaMatrixIntel} show the effects of the stealing heuristics for the matrix-multiply benchmark.
    13861380Here, there is negligible performance difference across stealing heuristics, likely due to the long running workload of each message.
    13871381
    13881382Stealing can still improve performance marginally in the matrix-multiply benchmark.
    1389 In \ref{f:MatrixAMD} CAF performs much better; few messages are sent, so the eager work stealing allows for the clean up of loose ends to occur faster.
     1383In \ref{f:MatrixAMD} CAF performs better; few messages are sent, so the eager work stealing allows for the clean up of loose ends to occur faster.
    13901384This hypothesis stems from experimentation with \CFA.
    13911385CAF uses a randomized work stealing heuristic.