Changeset 7737c29
- Timestamp:
- Aug 30, 2021, 11:51:36 AM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
- Children:
- 0477127
- Parents:
- 9cb6514
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
r9cb6514 r7737c29 11 11 Tests were run in \CFA, C++, Java and Python. 12 12 In addition there are two sets of tests for \CFA, 13 one with termination and one with resumption.13 one for termination and once with resumption. 14 14 15 15 C++ is the most comparable language because both it and \CFA use the same … … 21 21 but otherwise \Cpp should have a significant advantage. 22 22 23 Java ,a popular language with similar termination semantics, but23 Java a popular language with similar termination semantics, but 24 24 it is implemented in a very different environment, a virtual machine with 25 25 garbage collection. … … 38 38 seem to be notable only for having resumption. 39 39 Instead, resumption is compared to its simulation in other programming 40 languages: fixup functions that are explicit ly passed into a function.40 languages: fixup functions that are explicity passed into a function. 41 41 42 42 All tests are run inside a main loop that repeatedly performs a test. 43 43 This approach avoids start-up or tear-down time from 44 44 affecting the timing results. 45 The number of times the loop is run is configurable from the command line ;45 The number of times the loop is run is configurable from the command line, 46 46 the number used in the timing runs is given with the results per test. 47 %Tests ran their main loop a million times.48 The Java tests run the main loop 1000 times before47 Tests ran their main loop a million times. 48 The Java tests runs the main loop 1000 times before 49 49 beginning the actual test to ``warm-up" the JVM. 50 50 % All other languages are precompiled or interpreted. … … 72 72 % \code{C++}{catch(...)}). 73 73 74 When collecting data ,each test is run eleven times. The top three and bottom74 When collecting data each test is run eleven times. The top three and bottom 75 75 three results are discarded and the remaining five values are averaged. 76 The test are run with the latest (still pre-release) \CFA compiler ,76 The test are run with the latest (still pre-release) \CFA compiler was used, 77 77 using gcc-10 as a backend. 78 78 g++-10 is used for \Cpp. … … 113 113 } 114 114 \end{cfa} 115 Other test cases have additional code around the recursive call add ing115 Other test cases have additional code around the recursive call add 116 116 something besides simple stack frames to the stack. 117 Note that both termination and resumption have to traverse over117 Note that both termination and resumption will have to traverse over 118 118 the stack but only termination has to unwind it. 119 119 \begin{itemize}[nosep] … … 124 124 \item Empty: 125 125 The repeating function is empty except for the necessary control code. 126 As other traversal tests add to this, it is the baseline for the group126 As other traversal tests add to this, so it is the baseline for the group 127 127 as the cost comes from traversing over and unwinding a stack frame 128 128 that has no other interactions with the exception system. … … 130 130 The repeating function creates an object with a destructor before calling 131 131 itself. 132 Comparing this to the empty test gives the time to traverse over and 132 Comparing this to the empty test gives the time to traverse over and/or 133 133 unwind a destructor. 134 134 \item Finally: 135 135 The repeating function calls itself inside a try block with a finally clause 136 136 attached. 137 Comparing this to the empty test gives the time to traverse over and 137 Comparing this to the empty test gives the time to traverse over and/or 138 138 unwind a finally clause. 139 139 \item Other Handler: 140 140 The repeating function calls itself inside a try block with a handler that 141 doesnot match the raised exception, but is of the same kind of handler.142 This means that the EHM has to check each handler, butcontinue143 over all of the muntil it reaches the base of the stack.144 Comparing this to the empty test gives the time to traverse over and 141 will not match the raised exception, but is of the same kind of handler. 142 This means that the EHM will have to check each handler, but will continue 143 over all of the until it reaches the base of the stack. 144 Comparing this to the empty test gives the time to traverse over and/or 145 145 unwind a handler. 146 146 \end{itemize} 147 147 148 148 \paragraph{Cross Try Statement} 149 This group of tests measures the cost for setting up exception handling,if it is149 This group of tests measures the cost setting up exception handling if it is 150 150 not used (because the exceptional case did not occur). 151 Tests repeatedly cross (enter , execute, and leave) a try statement but never152 p erform a raise.151 Tests repeatedly cross (enter and leave, execute) a try statement but never 152 preform a raise. 153 153 \begin{itemize}[nosep] 154 154 \item Handler: … … 165 165 to handle that exception. 166 166 167 Here is the pattern shown in \CFA and \Cpp. Java and Python use the same167 There is the pattern shown in \CFA and \Cpp. Java and Python use the same 168 168 pattern as \Cpp, but with their own syntax. 169 169 … … 229 229 going higher is better than going low) N, the number of iterations of the 230 230 main loop in each test, is varied between tests. It is also given in the 231 results and hasa value in the millions.231 results and usually have a value in the millions. 232 232 233 233 An anomaly in some results came from \CFA's use of gcc nested functions. 234 234 These nested functions are used to create closures that can access stack 235 235 variables in their lexical scope. 236 However, if they do so ,then they can cause the benchmark's run-time to236 However, if they do so then they can cause the benchmark's run-time to 237 237 increase by an order of magnitude. 238 238 The simplest solution is to make those values global variables instead … … 309 309 % Now discuss the results in the tables. 310 310 One result not directly related to \CFA but important to keep in mind is that, 311 for exceptions ,the standard intuition about which languages should go311 for exceptions the standard intuition about which languages should go 312 312 faster often does not hold. 313 313 For example, there are a few cases where Python out-performs … … 324 324 The only performance requirement is to insure the \CFA EHM has reasonable 325 325 performance for prototyping. 326 Although that may be hard to exactly quantify, Ibelieve it has succeeded326 Although that may be hard to exactly quantify, we believe it has succeeded 327 327 in that regard. 328 328 Details on the different test cases follow. 329 330 \subsection{Termination, \autoref{t:PerformanceTermination}}331 329 332 330 \begin{description} … … 334 332 \CFA is slower than \Cpp, but is still faster than the other languages 335 333 and closer to \Cpp than other languages. 336 This resultis to be expected as \CFA is closer to \Cpp than the other languages.334 This is to be expected as \CFA is closer to \Cpp than the other languages. 337 335 338 336 \item[D'tor Traversal] 339 Running destructors causes a huge slowdown in the two languages that support337 Running destructors causes huge slowdown in every language that supports 340 338 them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's. 341 Considering the amount of work done in destructors is virtually zero (asm instruction),the cost342 must come from the change of context required to trigger the destructor.339 Considering the amount of work done in destructors is so low the cost 340 likely comes from the change of context required to do that work. 343 341 344 342 \item[Finally Traversal] 345 Performanceis similar to Empty Traversal in all languages that support finally343 Speed is similar to Empty Traversal in all languages that support finally 346 344 clauses. Only Python seems to have a larger than random noise change in 347 345 its run-time and it is still not large. 348 346 Despite the similarity between finally clauses and destructors, 349 finally clauses seem to avoid the spike thatrun-time destructors have.347 finally clauses seem to avoid the spike in run-time destructors have. 350 348 Possibly some optimization removes the cost of changing contexts. 351 349 \todo{OK, I think the finally clause may have been optimized out.} … … 356 354 This results in a significant jump. 357 355 358 Other languages experi ence a small increase in run-time.356 Other languages experiance a small increase in run-time. 359 357 The small increase likely comes from running the checks, 360 358 but they could avoid the spike by not having the same kind of overhead for 361 359 switching to the check's context. 362 360 363 \todo{Could revis it Other Traversal, after Finally Traversal.}361 \todo{Could revist Other Traversal, after Finally Traversal.} 364 362 365 363 \item[Cross Handler] 366 364 Here \CFA falls behind \Cpp by a much more significant margin. 367 365 This is likely due to the fact \CFA has to insert two extra function 368 calls , while \Cpp does not have to execute any other instructions.366 calls while \Cpp doesn't have to do execute any other instructions. 369 367 Python is much further behind. 370 368 … … 372 370 \CFA's performance now matches \Cpp's from Cross Handler. 373 371 If the code from the finally clause is being inlined, 374 which is just a nasm comment, than there are no additional instructions372 which is just a asm comment, than there are no additional instructions 375 373 to execute again when exiting the try statement normally. 376 374 377 375 \item[Conditional Match] 378 Both of the conditional matching tests can be considered on their own .379 However for evaluating the value of conditional matching itself,the376 Both of the conditional matching tests can be considered on their own, 377 however for evaluating the value of conditional matching itself the 380 378 comparison of the two sets of results is useful. 381 379 Consider the massive jump in run-time for \Cpp going from match all to match … … 386 384 possibly through resource reuse or their program representation. 387 385 However \CFA is built like \Cpp and avoids the problem as well, this matches 388 the pattern of the conditional match ,which makes the two execution paths389 verysimilar.386 the pattern of the conditional match which makes the two execution paths 387 much more similar. 390 388 391 389 \end{description} 392 390 393 \subsection{Resumption, \autoref{t:PerformanceResumption}} 394 395 Moving on to resumption, there is one general note, 396 resumption is \textit{fast}. The only test where it fell 391 Moving on to resumption there is one general note, 392 resumption is \textit{fast}, the only test where it fell 397 393 behind termination is Cross Handler. 398 394 In every other case, the number of iterations had to be increased by a 399 factor of 10 to get the run-time in an appropr iate range395 factor of 10 to get the run-time in an approprate range 400 396 and in some cases resumption still took less time. 401 397 … … 405 401 \item[Empty Traversal] 406 402 See above for the general speed-up notes. 407 This result is not surprising as resumption's link ed-list approach403 This result is not surprising as resumption's link list approach 408 404 means that traversing over stack frames without a resumption handler is 409 405 $O(1)$. … … 412 408 Resumption does have the same spike in run-time that termination has. 413 409 The run-time is actually very similar to Finally Traversal. 414 As resumption does not unwind the stack ,both destructors and finally415 clauses are run while walking down the stack during the recursion returns.410 As resumption does not unwind the stack both destructors and finally 411 clauses are run while walking down the stack normally. 416 412 So it follows their performance is similar. 417 413 418 414 \item[Finally Traversal] 419 % The increase in run-time from Empty Traversal (once adjusted for 420 % the number of iterations) is roughly the same as for termination. 421 % This suggests that the 422 See D'tor Traversal discussion. 415 The increase in run-time fromm Empty Traversal (once adjusted for 416 the number of iterations) roughly the same as for termination. 417 This suggests that the 423 418 424 419 \item[Other Traversal] … … 431 426 The only test case where resumption could not keep up with termination, 432 427 although the difference is not as significant as many other cases. 433 It is simply a matter of where the costs come from. \PAB{What does this mean? 434 Even if \CFA termination 435 is not ``zero-cost", passing through an empty function still seems to be 436 cheaper than updating global values.} 428 It is simply a matter of where the costs come from. Even if \CFA termination 429 is not ``zero-cost" passing through an empty function still seems to be 430 cheaper than updating global values. 437 431 438 432 \item[Conditional Match] … … 443 437 \end{description} 444 438 445 \subsection{Resumption/Fixup, \autoref{t:PerformanceFixupRoutines}}446 447 439 Finally are the results of the resumption/fixup routine comparison. 448 These results are surprisingly varied . It is possible that creating a closure440 These results are surprisingly varied, it is possible that creating a closure 449 441 has more to do with performance than passing the argument through layers of 450 442 calls. 451 443 Even with 100 stack frames though, resumption is only about as fast as 452 444 manually passing a fixup routine. 453 However, as the number of fixup routines is increased, the cost of passing them454 should make the resumption dynamic-search cheaper.455 445 So there is a cost for the additional power and flexibility exceptions 456 446 provide.
Note: See TracChangeset
for help on using the changeset viewer.