Changeset 4d8fbf4 for doc/theses/andrew_beach_MMath/performance.tex
- Timestamp:
- Sep 16, 2021, 2:22:01 PM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, master, pthread-emulation, qualifiedEnum
- Children:
- 432bffe, 7e7a076
- Parents:
- a8367eb (diff), 140eb16 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
ra8367eb r4d8fbf4 5 5 Instead, the focus was to get the features working. The only performance 6 6 requirement is to ensure the tests for correctness run in a reasonable 7 amount of time. Hence, a few basic performance tests were performed to7 amount of time. Hence, only a few basic performance tests were performed to 8 8 check this requirement. 9 9 … … 13 13 one with termination and one with resumption. 14 14 15 C++ is the most comparable language because both it and \CFA use the same15 GCC C++ is the most comparable language because both it and \CFA use the same 16 16 framework, libunwind. 17 17 In fact, the comparison is almost entirely in quality of implementation. … … 46 46 the number used in the timing runs is given with the results per test. 47 47 The Java tests run the main loop 1000 times before 48 beginning the actual test to ``warm -up" the JVM.48 beginning the actual test to ``warm up" the JVM. 49 49 % All other languages are precompiled or interpreted. 50 50 … … 54 54 unhandled exceptions in \Cpp and Java as that would cause the process to 55 55 terminate. 56 Luckily, performance on the ``give -up and kill the process" path is not56 Luckily, performance on the ``give up and kill the process" path is not 57 57 critical. 58 58 … … 76 76 using gcc-10 10.3.0 as a backend. 77 77 g++-10 10.3.0 is used for \Cpp. 78 Java tests are complied and run with version 11.0.11.79 Python used version 3.8.10.78 Java tests are complied and run with Oracle OpenJDK version 11.0.11. 79 Python used CPython version 3.8.10. 80 80 The machines used to run the tests are: 81 81 \begin{itemize}[nosep] … … 85 85 \lstinline{@} 2.5 GHz running Linux v5.11.0-25 86 86 \end{itemize} 87 Representingthe two major families of hardware architecture.87 These represent the two major families of hardware architecture. 88 88 89 89 \section{Tests} … … 93 93 94 94 \paragraph{Stack Traversal} 95 This group measures the cost of traversing the stack,95 This group of tests measures the cost of traversing the stack 96 96 (and in termination, unwinding it). 97 97 Inside the main loop is a call to a recursive function. … … 147 147 This group of tests measures the cost for setting up exception handling, 148 148 if it is 149 not used (because the exceptional case did not occur).149 not used because the exceptional case did not occur. 150 150 Tests repeatedly cross (enter, execute and leave) a try statement but never 151 151 perform a raise. … … 222 222 for that language and the result is marked N/A. 223 223 There are also cases where the feature is supported but measuring its 224 cost is impossible. This happened with Java, which uses a JIT that optimize 225 away the tests and itcannot be stopped.\cite{Dice21}224 cost is impossible. This happened with Java, which uses a JIT that optimizes 225 away the tests and cannot be stopped.\cite{Dice21} 226 226 These tests are marked N/C. 227 227 To get results in a consistent range (1 second to 1 minute is ideal, … … 230 230 results and has a value in the millions. 231 231 232 An anomaly in some results came from \CFA's use of gccnested functions.232 An anomaly in some results came from \CFA's use of GCC nested functions. 233 233 These nested functions are used to create closures that can access stack 234 234 variables in their lexical scope. 235 However, if they do so, then they can cause the benchmark's run -time to235 However, if they do so, then they can cause the benchmark's run time to 236 236 increase by an order of magnitude. 237 237 The simplest solution is to make those values global variables instead 238 of function 238 of function-local variables. 239 239 % Do we know if editing a global inside nested function is a problem? 240 240 Tests that had to be modified to avoid this problem have been marked … … 255 255 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 256 256 \hline 257 Empty Traversal (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8\\258 D'tor Traversal (1M) & 48. 4 & 23.6 & N/A & N/A & 64.2 & 29.0& N/A & N/A \\259 Finally Traversal (1M) & 3. 4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6& 19.0 \\260 Other Traversal (1M) & 3. 6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4\\261 Cross Handler (1 00M) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2\\262 Cross Finally (1 00M) & 0.9 & N/A & N/C & 44.1 & 0.8& N/A & N/C & 37.3 \\263 Match All (10M) & 3 2.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0& 3.1 \\264 Match None (10M) & 3 2.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2\\257 Empty Traversal (1M) & 23.0 & 9.6 & 17.6 & 23.4 & 30.6 & 13.6 & 15.5 & 14.7 \\ 258 D'tor Traversal (1M) & 48.1 & 23.5 & N/A & N/A & 64.2 & 29.2 & N/A & N/A \\ 259 Finally Traversal (1M) & 3.2* & N/A & 17.6 & 29.2 & 3.9* & N/A & 15.5 & 19.0 \\ 260 Other Traversal (1M) & 3.3* & 23.9 & 17.7 & 32.8 & 3.9* & 24.5 & 15.5 & 21.6 \\ 261 Cross Handler (1B) & 6.5 & 0.9 & N/C & 38.0 & 9.6 & 0.8 & N/C & 32.1 \\ 262 Cross Finally (1B) & 0.8 & N/A & N/C & 44.6 & 0.6 & N/A & N/C & 37.3 \\ 263 Match All (10M) & 30.5 & 20.6 & 11.2 & 3.9 & 36.9 & 24.6 & 10.7 & 3.1 \\ 264 Match None (10M) & 30.6 & 50.9 & 11.2 & 5.0 & 36.9 & 71.9 & 10.7 & 4.1 \\ 265 265 \hline 266 266 \end{tabular} … … 276 276 & AMD & ARM \\ 277 277 \hline 278 Empty Traversal (10M) & 0.2 & 0.3\\278 Empty Traversal (10M) & 1.4 & 1.2 \\ 279 279 D'tor Traversal (10M) & 1.8 & 1.0 \\ 280 Finally Traversal (10M) & 1. 7& 1.0 \\281 Other Traversal (10M) & 22.6 & 25. 9\\282 Cross Handler (1 00M) & 8.4& 11.9 \\280 Finally Traversal (10M) & 1.8 & 1.0 \\ 281 Other Traversal (10M) & 22.6 & 25.8 \\ 282 Cross Handler (1B) & 9.0 & 11.9 \\ 283 283 Match All (100M) & 2.3 & 3.2 \\ 284 Match None (100M) & 2.9 & 3.9\\284 Match None (100M) & 3.0 & 3.8 \\ 285 285 \hline 286 286 \end{tabular} … … 300 300 \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 301 301 \hline 302 Resume Empty (10M) & 1. 5 & 1.5 & 14.7 & 2.3 & 176.1 & 1.0 & 1.4 & 8.9 & 1.2 & 119.9\\302 Resume Empty (10M) & 1.4 & 1.4 & 15.4 & 2.3 & 178.0 & 1.2 & 1.2 & 8.9 & 1.2 & 118.4 \\ 303 303 \hline 304 304 \end{tabular} … … 312 312 \CFA, \Cpp and Java. 313 313 % To be exact, the Match All and Match None cases. 314 %\todo{Not true in Python.} 314 315 The most likely explanation is that, since exceptions 315 316 are rarely considered to be the common case, the more optimized languages … … 346 347 Performance is similar to Empty Traversal in all languages that support finally 347 348 clauses. Only Python seems to have a larger than random noise change in 348 its run -time and it is still not large.349 its run time and it is still not large. 349 350 Despite the similarity between finally clauses and destructors, 350 finally clauses seem to avoid the spike that run -time destructors have.351 finally clauses seem to avoid the spike that run time destructors have. 351 352 Possibly some optimization removes the cost of changing contexts. 352 353 … … 356 357 This results in a significant jump. 357 358 358 Other languages experience a small increase in run -time.359 Other languages experience a small increase in run time. 359 360 The small increase likely comes from running the checks, 360 361 but they could avoid the spike by not having the same kind of overhead for … … 362 363 363 364 \item[Cross Handler] 364 Here \CFA falls behind \Cpp by a much more significant margin.365 This is likely due to the fact \CFA has to insert two extra function366 calls, while \Cpp does not have to doexecute any other instructions.365 Here, \CFA falls behind \Cpp by a much more significant margin. 366 This is likely due to the fact that \CFA has to insert two extra function 367 calls, while \Cpp does not have to execute any other instructions. 367 368 Python is much further behind. 368 369 … … 375 376 \item[Conditional Match] 376 377 Both of the conditional matching tests can be considered on their own. 377 However for evaluating the value of conditional matching itself, the378 However, for evaluating the value of conditional matching itself, the 378 379 comparison of the two sets of results is useful. 379 Consider the massive jump in run -time for \Cpp going from match all to match380 Consider the massive jump in run time for \Cpp going from match all to match 380 381 none, which none of the other languages have. 381 Some strange interaction is causing run -time to more than double for doing382 Some strange interaction is causing run time to more than double for doing 382 383 twice as many raises. 383 Java and Python avoid this problem and have similar run -time for both tests,384 Java and Python avoid this problem and have similar run time for both tests, 384 385 possibly through resource reuse or their program representation. 385 However \CFA is built like \Cpp and avoids the problem as well, this matches 386 However, \CFA is built like \Cpp, and avoids the problem as well. 387 This matches 386 388 the pattern of the conditional match, which makes the two execution paths 387 389 very similar. … … 391 393 \subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}} 392 394 393 Moving on to resumption, there is one general note ,395 Moving on to resumption, there is one general note: 394 396 resumption is \textit{fast}. The only test where it fell 395 397 behind termination is Cross Handler. 396 398 In every other case, the number of iterations had to be increased by a 397 factor of 10 to get the run -time in an appropriate range399 factor of 10 to get the run time in an appropriate range 398 400 and in some cases resumption still took less time. 399 401 … … 408 410 409 411 \item[D'tor Traversal] 410 Resumption does have the same spike in run -time that termination has.411 The run -time is actually very similar to Finally Traversal.412 Resumption does have the same spike in run time that termination has. 413 The run time is actually very similar to Finally Traversal. 412 414 As resumption does not unwind the stack, both destructors and finally 413 415 clauses are run while walking down the stack during the recursive returns. … … 416 418 \item[Finally Traversal] 417 419 Same as D'tor Traversal, 418 except termination did not have a spike in run -time on this test case.420 except termination did not have a spike in run time on this test case. 419 421 420 422 \item[Other Traversal] … … 427 429 The only test case where resumption could not keep up with termination, 428 430 although the difference is not as significant as many other cases. 429 It is simply a matter of where the costs come from ,430 both termination and resumption have some work to set -up or tear-down a431 It is simply a matter of where the costs come from: 432 both termination and resumption have some work to set up or tear down a 431 433 handler. It just so happens that resumption's work is slightly slower. 432 434 … … 434 436 Resumption shows a slight slowdown if the exception is not matched 435 437 by the first handler, which follows from the fact the second handler now has 436 to be checked. However the difference is not large.438 to be checked. However, the difference is not large. 437 439 438 440 \end{description} … … 448 450 More experiments could try to tease out the exact trade-offs, 449 451 but the prototype's only performance goal is to be reasonable. 450 It has already in that range, and \CFA's fixup routine simulation is452 It is already in that range, and \CFA's fixup routine simulation is 451 453 one of the faster simulations as well. 452 Plus exceptions add features and remove syntactic overhead,453 so even at similar performance resumptions have advantages454 Plus, exceptions add features and remove syntactic overhead, 455 so even at similar performance, resumptions have advantages 454 456 over fixup routines.
Note: See TracChangeset
for help on using the changeset viewer.