-                      r60a9164
+                      ra3c7bac
 The results are spread out more, and there is a difference between AMD and Intel.
 Again, CAF is significantly slower than the other actor systems.
-On the AMD there is a tight grouping of uC++, ProroActor, and Akka;
-on the Intel, uC++, ProroActor, and Akka are spread out.
+To keep the graphs readable, the y-axis was cut at 100 seconds; as the core count increases from 8 to 32, CAF runs at around 200 seconds on the AMD and between 300 and 1000 seconds on the Intel.
+On the AMD there is a tight grouping of uC++, ProtoActor, and Akka;
+on the Intel, uC++, ProtoActor, and Akka are spread out.
 Finally, \CFA runs consistently on both of the AMD and Intel, and is faster than \uC on the AMD, but slightly slower on the Intel.
-This benchmark is a pathological case for work stealing actor systems, as the majority of work is being performed by the single actor conducting the scatter/gather.
-The impact of work stealing on this benchmark is discussed further in Section~\ref{s:steal_perf}.
-Here, gains from using the copy queue are much less apparent, due to the costs of stealing.
+Here, gains from using the copy queue are much less apparent.
 \begin{figure}
 …
 Figures~\ref{f:cfaRepeatAMD} and~\ref{f:cfaRepeatIntel} show the effects of the stealing heuristics for the repeat benchmark.
-As mentioned, the repeat benchmark is a pathological case for work stealing systems since there is one actor with the majority of the work, and not enough other work to go around.
-The worst-case scenario is if the actor doing the majority of work or its mail queue is stolen by the work stealing system, as this incurs a huge cost to move the work and refill the local cache.
+This benchmark is a pathological case for work stealing actor systems, as the majority of work is being performed by the single actor conducting the scatter/gather.
+The single actor (the client) of this experiment is long running and maintains a lot of state, as it needs to know the handles of all the servers.
+When stealing the client or its respective queue (in \CFA's inverted model), moving the client incurs a high cost due to cache invalidation.
 This worst-case steal is likely to happen since there is little other work in the system between scatter/gather rounds.
 However, all heuristics are comparable in performance on the repeat benchmark.
-This result is surprising especially for the No-Stealing variant, which should have better performance than the stealing variants.
-It is likely the No-Stealing variant is impacted by other design decisions in the \CFA actor system related to work stealing.
+This result is surprising especially for the No-Stealing variant, which one would expect to have better performance than the stealing variants.
+This is not the case: because stealing happens lazily and fails fast, the queue containing the long-running client actor is rarely stolen.
 Work stealing performance can be further analyzed by reexamining the executor and repeat benchmarks in Figures~\ref{f:ExecutorBenchmark} and \ref{f:RepeatBenchmark}, respectively.
-In both, benchmarks CAF performs poorly.
+In both benchmarks, CAF performs poorly.
 It is hypothesized that CAF has an aggressive work stealing algorithm that eagerly attempts to steal.
 This results in the poor performance with small messages containing little work per message in both of these benchmarks.
 In comparison with the other systems, \uC does well on both benchmarks since it does not have work stealing.
-\PAB{In particular, on the Intel machine in Figure~\ref{f:RepeatIntel}, the cost of stealing is significantly higher, which can be seen in the vertical shift of Akka, CAF and \CFA compared to the AMD results in Figure~\ref{f:RepeatAMD} (\uC and ProtoActor do not have work stealing).
-The shift for CAF is particularly large, which supports the hypothesis that CAF's work stealing is particularly eager.
-The client of this experiment is long running and maintains a lot of state, as it needs to know the handles of all the servers.
-When stealing the client or its respective queue (in \CFA's inverted model), moving the client incurs a high cost due to cache invalidation.
-As such stealing the client can result in a hit in performance.}
 Finally, Figures~\ref{f:cfaMatrixAMD} and~\ref{f:cfaMatrixIntel} show the effects of the stealing heuristics for the matrix-multiply benchmark.
 Here, there is negligible performance difference across stealing heuristics, likely due to the long running workload of each message.
 Stealing can still improve performance marginally in the matrix-multiply benchmark.
-In \ref{f:MatrixAMD} CAF performs much better; few messages are sent, so the eager work stealing allows for the clean up of loose ends to occur faster.
+In Figure~\ref{f:MatrixAMD}, CAF performs better; few messages are sent, so the eager work stealing allows loose ends to be cleaned up faster.
 This hypothesis stems from experimentation with \CFA.
 CAF uses a randomized work stealing heuristic.

Changeset a3c7bac for doc/theses/colby_parsons_MMAth/text/actors.tex
