Changeset 0e34a14 for doc/theses/thierry_delisle_PhD/thesis/text
- Timestamp: Aug 14, 2022, 4:15:10 PM
- Branches: ADT, ast-experimental, master, pthread-emulation
- Children: 41a6a78
- Parents: 2ae6a99
- File: 1 edited
doc/theses/thierry_delisle_PhD/thesis/text/eval_micro.tex
r2ae6a99 → r0e34a14

  Beyond that performance starts to suffer from increased caching costs.

- Indeed on Figures~\ref{fig:churn:jax:ops} and \ref{fig:churn:jax:ns} show that with 100 \ats per \proc, \CFA, libfibre, and tokio achieve effectively equivalent performance for most \proc count.
- Interestingly, Go starts with better scaling at very low \proc counts but then performance quickly plateaus, resulting in worse performance at higher \proc counts.
- This performance difference disappears in Figures~\ref{fig:churn:jax:low:ops} and \ref{fig:churn:jax:low:ns}, where the performance of all runtimes is equivalent.
-
- Figure~\ref{fig:churn:nasus} again shows a similar story.
- \CFA, libfibre, and tokio achieve effectively equivalent performance for most \proc count.
- Go still shows different scaling than the other 3 runtimes.
- The distinction is that on AMD the difference between Go and the other runtime is more significant.
- Indeed, even with only 1 \at per \proc, Go achieves notably different scaling than the other runtimes.
-
- One possible explanation for this difference is that since Go has very few available concurrent primitives, a channel was used instead of a semaphore.
- On paper a semaphore can be replaced by a channel and with zero-sized objects passed along equivalent performance could be expected.
- However, in practice there can be implementation difference between the two.
- This is especially true if the semaphore count can get somewhat high.
- Note that this replacement is also made in the cycle benchmark, however in that context it did not seem to have a notable impact.
+ Indeed, Figures~\ref{fig:churn:jax:ops} and \ref{fig:churn:jax:ns} show that with 1 and 100 \ats per \proc, \CFA, libfibre, Go, and tokio achieve effectively equivalent performance for most \proc counts.
+
+ However, Figure~\ref{fig:churn:nasus} again shows a somewhat different story on AMD.
+ While \CFA, libfibre, and tokio achieve effectively equivalent performance for most \proc counts, Go starts with better scaling at very low \proc counts but its performance quickly plateaus, resulting in worse performance at higher \proc counts.
+ This performance difference is visible at both high and low \at counts.
+
+ One possible explanation for this difference is that, since Go has very few available concurrency primitives, a channel was used instead of a semaphore.
+ On paper, a semaphore can be replaced by a channel passing zero-sized objects, and equivalent performance could be expected.
+ However, in practice there can be implementation differences between the two.
+ This is especially true if the semaphore count can get somewhat high.
+ Note that this replacement is also made in the cycle benchmark; however, in that context it did not seem to have a notable impact.
+
+ A second possible explanation is that Go may sometimes allocate variables on the heap based on the result of escape analysis of the code.
+ It is possible that variables that should be placed on the stack are instead placed on the heap.
+ This could cause extra pointer chasing in the benchmark, heightening locality effects.
+ Depending on how the heap is structured, this could also lead to false sharing.

  The objective of this benchmark is to demonstrate that unparking \ats from remote \procs do not cause too much contention on the local queues.
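The added text above notes that the Go version of the benchmark uses a channel in place of a semaphore, passing zero-sized objects. As a minimal, hypothetical sketch of that substitution (not taken from the benchmark source; the semaphore, P, and V names are illustrative only), a counting semaphore can be emulated with a buffered Go channel of empty structs:

    package main

    // semaphore emulated with a buffered channel of zero-sized values
    type semaphore chan struct{}

    func newSemaphore(capacity int) semaphore {
    	return make(semaphore, capacity) // buffer length acts as the maximum count
    }

    func (s semaphore) V() { s <- struct{}{} } // post: send an empty token
    func (s semaphore) P() { <-s }             // wait: receive a token, blocking if none is available

    func main() {
    	sem := newSemaphore(100)
    	done := make(chan struct{})
    	go func() {
    		sem.P() // blocks until main posts
    		close(done)
    	}()
    	sem.V()
    	<-done
    }

One concrete way such a channel-backed semaphore can diverge from a true semaphore, consistent with the remark about high semaphore counts, is that the count is bounded by the channel's capacity: once the buffer is full, further V (send) operations block.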
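The second added explanation concerns Go's escape analysis moving variables to the heap. The following hypothetical sketch (again not from the benchmark; the names are illustrative) shows the kind of code where this happens; building with go build -gcflags=-m reports which variables escape:

    package main

    type node struct {
    	value int
    	next  *node
    }

    // stays on the stack: the node is only used inside the call
    func sumLocal() int {
    	n := node{value: 1}
    	return n.value
    }

    // escapes to the heap: the node's address outlives the call,
    // so the compiler heap-allocates it ("moved to heap")
    func newNode(v int) *node {
    	n := node{value: v}
    	return &n
    }

    func main() {
    	_ = sumLocal()
    	_ = newNode(2) // each call costs a heap allocation plus a pointer to chase later
    }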
…

  In both cases, the graphs on the left column show the results for the @share@ variation and the graphs on the right column show the results for the @noshare@.

- that the results somewhat follow the expectation.
- On the left of the figure showing the results for the shared variation, where \CFA and tokio outperform libfibre as expected.
+ On Intel, Figure~\ref{fig:locality:jax} shows Go trailing behind the 3 other runtimes.
+ The left of the figure shows the results for the shared variation, where \CFA and tokio slightly outperform libfibre as expected.
  And correspondingly on the right, we see the expected performance inversion where libfibre now outperforms \CFA and tokio.
  Otherwise the results are similar to the churn benchmark, with lower throughtput due to the array processing.
- It is unclear why Go's performance is notably worst than the other runtimes.
+ Presumably the reasons why Go trails behind are the same as in Figure~\ref{fig:churn:nasus}.

  Figure~\ref{fig:locality:nasus} shows the same experiment on AMD.