Changeset 949339b for doc/theses/andrew_beach_MMath/performance.tex
- Timestamp:
- Sep 27, 2021, 2:09:55 PM (4 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, master, pthread-emulation, qualifiedEnum
- Children:
- cc287800
- Parents:
- 4e28d2e9 (diff), 056cbdb (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)links above to see all the changes relative to each parent. - File:
-
- 1 edited
-
doc/theses/andrew_beach_MMath/performance.tex (modified) (25 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
r4e28d2e9 r949339b 5 5 Instead, the focus was to get the features working. The only performance 6 6 requirement is to ensure the tests for correctness run in a reasonable 7 amount of time. Hence, a few basic performance tests were performed to7 amount of time. Hence, only a few basic performance tests were performed to 8 8 check this requirement. 9 9 … … 13 13 one with termination and one with resumption. 14 14 15 C++ is the most comparable language because both it and \CFA use the same15 GCC C++ is the most comparable language because both it and \CFA use the same 16 16 framework, libunwind. 17 17 In fact, the comparison is almost entirely in quality of implementation. … … 37 37 resumption exceptions. Even the older programming languages with resumption 38 38 seem to be notable only for having resumption. 39 On the other hand, the functional equivalents to resumption are too new. 40 There does not seem to be any standard implementations in well-known 41 languages; so far, they seem confined to extensions and research languages. 42 % There was some maybe interesting comparison to an OCaml extension 43 % but I'm not sure how to get that working if it is interesting. 39 44 Instead, resumption is compared to its simulation in other programming 40 45 languages: fixup functions that are explicitly passed into a function. … … 46 51 the number used in the timing runs is given with the results per test. 47 52 The Java tests run the main loop 1000 times before 48 beginning the actual test to ``warm -up" the JVM.53 beginning the actual test to ``warm up" the JVM. 49 54 % All other languages are precompiled or interpreted. 50 55 … … 54 59 unhandled exceptions in \Cpp and Java as that would cause the process to 55 60 terminate. 56 Luckily, performance on the ``give -up and kill the process" path is not61 Luckily, performance on the ``give up and kill the process" path is not 57 62 critical. 58 63 … … 76 81 using gcc-10 10.3.0 as a backend. 77 82 g++-10 10.3.0 is used for \Cpp. 78 Java tests are complied and run with version 11.0.11.79 Python used version 3.8.10.83 Java tests are complied and run with Oracle OpenJDK version 11.0.11. 84 Python used CPython version 3.8.10. 80 85 The machines used to run the tests are: 81 86 \begin{itemize}[nosep] … … 85 90 \lstinline{@} 2.5 GHz running Linux v5.11.0-25 86 91 \end{itemize} 87 Representingthe two major families of hardware architecture.92 These represent the two major families of hardware architecture. 88 93 89 94 \section{Tests} … … 93 98 94 99 \paragraph{Stack Traversal} 95 This group measures the cost of traversing the stack,100 This group of tests measures the cost of traversing the stack 96 101 (and in termination, unwinding it). 97 102 Inside the main loop is a call to a recursive function. … … 147 152 This group of tests measures the cost for setting up exception handling, 148 153 if it is 149 not used (because the exceptional case did not occur).154 not used because the exceptional case did not occur. 150 155 Tests repeatedly cross (enter, execute and leave) a try statement but never 151 156 perform a raise. … … 222 227 for that language and the result is marked N/A. 223 228 There are also cases where the feature is supported but measuring its 224 cost is impossible. This happened with Java, which uses a JIT that optimize 225 away the tests and itcannot be stopped.\cite{Dice21}229 cost is impossible. This happened with Java, which uses a JIT that optimizes 230 away the tests and cannot be stopped.\cite{Dice21} 226 231 These tests are marked N/C. 227 232 To get results in a consistent range (1 second to 1 minute is ideal, … … 230 235 results and has a value in the millions. 231 236 232 An anomaly in some results came from \CFA's use of gccnested functions.237 An anomaly in some results came from \CFA's use of GCC nested functions. 233 238 These nested functions are used to create closures that can access stack 234 239 variables in their lexical scope. 235 However, if they do so, then they can cause the benchmark's run -time to240 However, if they do so, then they can cause the benchmark's run time to 236 241 increase by an order of magnitude. 237 242 The simplest solution is to make those values global variables instead 238 of function local variables.243 of function-local variables. 239 244 % Do we know if editing a global inside nested function is a problem? 240 245 Tests that had to be modified to avoid this problem have been marked … … 255 260 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 256 261 \hline 257 Empty Traversal (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8\\258 D'tor Traversal (1M) & 48. 4 & 23.6 & N/A & N/A & 64.2 & 29.0& N/A & N/A \\259 Finally Traversal (1M) & 3. 4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6& 19.0 \\260 Other Traversal (1M) & 3. 6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4\\261 Cross Handler (1 00M) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2\\262 Cross Finally (1 00M) & 0.9 & N/A & N/C & 44.1 & 0.8& N/A & N/C & 37.3 \\263 Match All (10M) & 3 2.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0& 3.1 \\264 Match None (10M) & 3 2.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2\\262 Empty Traversal (1M) & 23.0 & 9.6 & 17.6 & 23.4 & 30.6 & 13.6 & 15.5 & 14.7 \\ 263 D'tor Traversal (1M) & 48.1 & 23.5 & N/A & N/A & 64.2 & 29.2 & N/A & N/A \\ 264 Finally Traversal (1M) & 3.2* & N/A & 17.6 & 29.2 & 3.9* & N/A & 15.5 & 19.0 \\ 265 Other Traversal (1M) & 3.3* & 23.9 & 17.7 & 32.8 & 3.9* & 24.5 & 15.5 & 21.6 \\ 266 Cross Handler (1B) & 6.5 & 0.9 & N/C & 38.0 & 9.6 & 0.8 & N/C & 32.1 \\ 267 Cross Finally (1B) & 0.8 & N/A & N/C & 44.6 & 0.6 & N/A & N/C & 37.3 \\ 268 Match All (10M) & 30.5 & 20.6 & 11.2 & 3.9 & 36.9 & 24.6 & 10.7 & 3.1 \\ 269 Match None (10M) & 30.6 & 50.9 & 11.2 & 5.0 & 36.9 & 71.9 & 10.7 & 4.1 \\ 265 270 \hline 266 271 \end{tabular} … … 276 281 & AMD & ARM \\ 277 282 \hline 278 Empty Traversal (10M) & 0.2 & 0.3\\283 Empty Traversal (10M) & 1.4 & 1.2 \\ 279 284 D'tor Traversal (10M) & 1.8 & 1.0 \\ 280 Finally Traversal (10M) & 1. 7& 1.0 \\281 Other Traversal (10M) & 22.6 & 25. 9\\282 Cross Handler (1 00M) & 8.4& 11.9 \\285 Finally Traversal (10M) & 1.8 & 1.0 \\ 286 Other Traversal (10M) & 22.6 & 25.8 \\ 287 Cross Handler (1B) & 9.0 & 11.9 \\ 283 288 Match All (100M) & 2.3 & 3.2 \\ 284 Match None (100M) & 2.9 & 3.9\\289 Match None (100M) & 3.0 & 3.8 \\ 285 290 \hline 286 291 \end{tabular} … … 300 305 \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 301 306 \hline 302 Resume Empty (10M) & 1. 5 & 1.5 & 14.7 & 2.3 & 176.1 & 1.0 & 1.4 & 8.9 & 1.2 & 119.9\\307 Resume Empty (10M) & 1.4 & 1.4 & 15.4 & 2.3 & 178.0 & 1.2 & 1.2 & 8.9 & 1.2 & 118.4 \\ 303 308 \hline 304 309 \end{tabular} … … 312 317 \CFA, \Cpp and Java. 313 318 % To be exact, the Match All and Match None cases. 314 The most likely explanation is that, since exceptions 315 are rarely considered to be the common case, the more optimized languages 316 make that case expensive to improve other cases. 319 The most likely explanation is that 320 the generally faster languages have made ``common cases fast" at the expense 321 of the rarer cases. Since exceptions are considered rare, they are made 322 expensive to help speed up common actions, such as entering and leaving try 323 statements. 324 Python, on the other hand, while generally slower than the other languages, 325 uses exceptions more and has not sacrificed their performance. 317 326 In addition, languages with high-level representations have a much 318 327 easier time scanning the stack as there is less to decode. … … 346 355 Performance is similar to Empty Traversal in all languages that support finally 347 356 clauses. Only Python seems to have a larger than random noise change in 348 its run -time and it is still not large.357 its run time and it is still not large. 349 358 Despite the similarity between finally clauses and destructors, 350 finally clauses seem to avoid the spike that run -time destructors have.359 finally clauses seem to avoid the spike that run time destructors have. 351 360 Possibly some optimization removes the cost of changing contexts. 352 361 … … 356 365 This results in a significant jump. 357 366 358 Other languages experience a small increase in run -time.367 Other languages experience a small increase in run time. 359 368 The small increase likely comes from running the checks, 360 369 but they could avoid the spike by not having the same kind of overhead for … … 362 371 363 372 \item[Cross Handler] 364 Here \CFA falls behind \Cpp by a much more significant margin.365 This is likely due to the fact \CFA has to insert two extra function366 calls, while \Cpp does not have to doexecute any other instructions.373 Here, \CFA falls behind \Cpp by a much more significant margin. 374 This is likely due to the fact that \CFA has to insert two extra function 375 calls, while \Cpp does not have to execute any other instructions. 367 376 Python is much further behind. 368 377 … … 375 384 \item[Conditional Match] 376 385 Both of the conditional matching tests can be considered on their own. 377 However for evaluating the value of conditional matching itself, the386 However, for evaluating the value of conditional matching itself, the 378 387 comparison of the two sets of results is useful. 379 Consider the massive jump in run -time for \Cpp going from match all to match388 Consider the massive jump in run time for \Cpp going from match all to match 380 389 none, which none of the other languages have. 381 Some strange interaction is causing run -time to more than double for doing390 Some strange interaction is causing run time to more than double for doing 382 391 twice as many raises. 383 Java and Python avoid this problem and have similar run -time for both tests,392 Java and Python avoid this problem and have similar run time for both tests, 384 393 possibly through resource reuse or their program representation. 385 However \CFA is built like \Cpp and avoids the problem as well, this matches 394 However, \CFA is built like \Cpp, and avoids the problem as well. 395 This matches 386 396 the pattern of the conditional match, which makes the two execution paths 387 397 very similar. … … 391 401 \subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}} 392 402 393 Moving on to resumption, there is one general note ,403 Moving on to resumption, there is one general note: 394 404 resumption is \textit{fast}. The only test where it fell 395 405 behind termination is Cross Handler. 396 406 In every other case, the number of iterations had to be increased by a 397 factor of 10 to get the run -time in an appropriate range407 factor of 10 to get the run time in an appropriate range 398 408 and in some cases resumption still took less time. 399 409 … … 408 418 409 419 \item[D'tor Traversal] 410 Resumption does have the same spike in run -time that termination has.411 The run -time is actually very similar to Finally Traversal.420 Resumption does have the same spike in run time that termination has. 421 The run time is actually very similar to Finally Traversal. 412 422 As resumption does not unwind the stack, both destructors and finally 413 423 clauses are run while walking down the stack during the recursive returns. … … 416 426 \item[Finally Traversal] 417 427 Same as D'tor Traversal, 418 except termination did not have a spike in run -time on this test case.428 except termination did not have a spike in run time on this test case. 419 429 420 430 \item[Other Traversal] … … 427 437 The only test case where resumption could not keep up with termination, 428 438 although the difference is not as significant as many other cases. 429 It is simply a matter of where the costs come from ,430 both termination and resumption have some work to set -up or tear-down a439 It is simply a matter of where the costs come from: 440 both termination and resumption have some work to set up or tear down a 431 441 handler. It just so happens that resumption's work is slightly slower. 432 442 … … 434 444 Resumption shows a slight slowdown if the exception is not matched 435 445 by the first handler, which follows from the fact the second handler now has 436 to be checked. However the difference is not large.446 to be checked. However, the difference is not large. 437 447 438 448 \end{description} … … 448 458 More experiments could try to tease out the exact trade-offs, 449 459 but the prototype's only performance goal is to be reasonable. 450 It has already in that range, and \CFA's fixup routine simulation is460 It is already in that range, and \CFA's fixup routine simulation is 451 461 one of the faster simulations as well. 452 Plus exceptions add features and remove syntactic overhead,453 so even at similar performance resumptions have advantages462 Plus, exceptions add features and remove syntactic overhead, 463 so even at similar performance, resumptions have advantages 454 464 over fixup routines.
Note:
See TracChangeset
for help on using the changeset viewer.