Changeset 0477127
- Timestamp:
- Aug 30, 2021, 6:24:40 PM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
- Children:
- f93d7fc
- Parents:
- 7737c29
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
r7737c29 r0477127 11 11 Tests were run in \CFA, C++, Java and Python. 12 12 In addition there are two sets of tests for \CFA, 13 one for termination and once with resumption.13 one with termination and one with resumption. 14 14 15 15 C++ is the most comparable language because both it and \CFA use the same … … 21 21 but otherwise \Cpp should have a significant advantage. 22 22 23 Java a popular language with similar termination semantics, but24 i t is implemented in a very different environment, a virtual machine with23 Java, a popular language with similar termination semantics, 24 is implemented in a very different environment, a virtual machine with 25 25 garbage collection. 26 26 It also implements the finally clause on try blocks allowing for a direct … … 38 38 seem to be notable only for having resumption. 39 39 Instead, resumption is compared to its simulation in other programming 40 languages: fixup functions that are explicit y passed into a function.40 languages: fixup functions that are explicitly passed into a function. 41 41 42 42 All tests are run inside a main loop that repeatedly performs a test. 43 43 This approach avoids start-up or tear-down time from 44 44 affecting the timing results. 45 The number of times the loop is run is configurable from the command line ,45 The number of times the loop is run is configurable from the command line; 46 46 the number used in the timing runs is given with the results per test. 47 Tests ran their main loop a million times. 48 The Java tests runs the main loop 1000 times before 47 The Java tests run the main loop 1000 times before 49 48 beginning the actual test to ``warm-up" the JVM. 50 49 % All other languages are precompiled or interpreted. … … 72 71 % \code{C++}{catch(...)}). 73 72 74 When collecting data each test is run eleven times. The top three and bottom73 When collecting data, each test is run eleven times. The top three and bottom 75 74 three results are discarded and the remaining five values are averaged. 76 The test are run with the latest (still pre-release) \CFA compiler was used,75 The test are run with the latest (still pre-release) \CFA compiler, 77 76 using gcc-10 as a backend. 78 77 g++-10 is used for \Cpp. … … 113 112 } 114 113 \end{cfa} 115 Other test cases have additional code around the recursive call add 114 Other test cases have additional code around the recursive call adding 116 115 something besides simple stack frames to the stack. 117 Note that both termination and resumption willhave to traverse over116 Note that both termination and resumption have to traverse over 118 117 the stack but only termination has to unwind it. 119 118 \begin{itemize}[nosep] … … 124 123 \item Empty: 125 124 The repeating function is empty except for the necessary control code. 126 As other traversal tests add to this, soit is the baseline for the group125 As other traversal tests add to this, it is the baseline for the group 127 126 as the cost comes from traversing over and unwinding a stack frame 128 127 that has no other interactions with the exception system. … … 130 129 The repeating function creates an object with a destructor before calling 131 130 itself. 132 Comparing this to the empty test gives the time to traverse over and /or131 Comparing this to the empty test gives the time to traverse over and 133 132 unwind a destructor. 134 133 \item Finally: 135 134 The repeating function calls itself inside a try block with a finally clause 136 135 attached. 137 Comparing this to the empty test gives the time to traverse over and /or136 Comparing this to the empty test gives the time to traverse over and 138 137 unwind a finally clause. 139 138 \item Other Handler: 140 139 The repeating function calls itself inside a try block with a handler that 141 willnot match the raised exception, but is of the same kind of handler.142 This means that the EHM will have to check each handler, but willcontinue143 over all of the until it reaches the base of the stack.144 Comparing this to the empty test gives the time to traverse over and /or140 does not match the raised exception, but is of the same kind of handler. 141 This means that the EHM has to check each handler, and continue 142 over all of them until it reaches the base of the stack. 143 Comparing this to the empty test gives the time to traverse over and 145 144 unwind a handler. 146 145 \end{itemize} 147 146 148 147 \paragraph{Cross Try Statement} 149 This group of tests measures the cost setting up exception handling if it is 148 This group of tests measures the cost for setting up exception handling, 149 if it is 150 150 not used (because the exceptional case did not occur). 151 Tests repeatedly cross (enter and leave, execute) a try statement but never152 p reform a raise.151 Tests repeatedly cross (enter, execute and leave) a try statement but never 152 perform a raise. 153 153 \begin{itemize}[nosep] 154 154 \item Handler: … … 165 165 to handle that exception. 166 166 167 There is the pattern shown in \CFA and \Cpp. Java and Python use the same167 Here is the pattern shown in \CFA and \Cpp. Java and Python use the same 168 168 pattern as \Cpp, but with their own syntax. 169 169 … … 229 229 going higher is better than going low) N, the number of iterations of the 230 230 main loop in each test, is varied between tests. It is also given in the 231 results and usually havea value in the millions.231 results and has a value in the millions. 232 232 233 233 An anomaly in some results came from \CFA's use of gcc nested functions. 234 234 These nested functions are used to create closures that can access stack 235 235 variables in their lexical scope. 236 However, if they do so then they can cause the benchmark's run-time to236 However, if they do so, then they can cause the benchmark's run-time to 237 237 increase by an order of magnitude. 238 238 The simplest solution is to make those values global variables instead … … 301 301 \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 302 302 \hline 303 Resume Empty (10M) & 3.8 & 3.5 & 14.7 & 2.3 & 176.1 & 0.3 & 0.1 & 8.9 & 1.2 & 119.9 \\ 304 %Resume Other (10M) & 4.0* & 0.1* & 21.9 & 6.2 & 381.0 & 0.3* & 0.1* & 13.2 & 5.0 & 290.7 \\ 303 Resume Empty (10M) & 1.5 & 1.5 & 14.7 & 2.3 & 176.1 & 1.0 & 1.4 & 8.9 & 1.2 & 119.9 \\ 305 304 \hline 306 305 \end{tabular} … … 309 308 % Now discuss the results in the tables. 310 309 One result not directly related to \CFA but important to keep in mind is that, 311 for exceptions the standard intuition about which languages should go310 for exceptions, the standard intuition about which languages should go 312 311 faster often does not hold. 313 312 For example, there are a few cases where Python out-performs 314 313 \CFA, \Cpp and Java. 314 \todo{Make sure there are still cases where Python wins.} 315 315 The most likely explanation is that, since exceptions 316 316 are rarely considered to be the common case, the more optimized languages … … 324 324 The only performance requirement is to insure the \CFA EHM has reasonable 325 325 performance for prototyping. 326 Although that may be hard to exactly quantify, webelieve it has succeeded326 Although that may be hard to exactly quantify, I believe it has succeeded 327 327 in that regard. 328 328 Details on the different test cases follow. 329 330 \subsection{Termination \texorpdfstring{(\autoref{t:PerformanceTermination})}{}} 329 331 330 332 \begin{description} … … 332 334 \CFA is slower than \Cpp, but is still faster than the other languages 333 335 and closer to \Cpp than other languages. 334 This is to be expected as \CFA is closer to \Cpp than the other languages. 336 This result is to be expected, 337 as \CFA is closer to \Cpp than the other languages. 335 338 336 339 \item[D'tor Traversal] 337 Running destructors causes huge slowdown in every language that supports340 Running destructors causes a huge slowdown in the two languages that support 338 341 them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's. 339 Considering the amount of work done in destructors is so low the cost 340 likely comes from the change of context required to do that work. 342 Considering the amount of work done in destructors is effectively zero 343 (an assembly comment), the cost 344 must come from the change of context required to run the destructor. 341 345 342 346 \item[Finally Traversal] 343 Speedis similar to Empty Traversal in all languages that support finally347 Performance is similar to Empty Traversal in all languages that support finally 344 348 clauses. Only Python seems to have a larger than random noise change in 345 349 its run-time and it is still not large. 346 350 Despite the similarity between finally clauses and destructors, 347 finally clauses seem to avoid the spike inrun-time destructors have.351 finally clauses seem to avoid the spike that run-time destructors have. 348 352 Possibly some optimization removes the cost of changing contexts. 349 353 \todo{OK, I think the finally clause may have been optimized out.} … … 354 358 This results in a significant jump. 355 359 356 Other languages experi ance a small increase in run-time.360 Other languages experience a small increase in run-time. 357 361 The small increase likely comes from running the checks, 358 362 but they could avoid the spike by not having the same kind of overhead for 359 363 switching to the check's context. 360 361 \todo{Could revist Other Traversal, after Finally Traversal.} 364 \todo{Could revisit Other Traversal, after Finally Traversal.} 362 365 363 366 \item[Cross Handler] 364 367 Here \CFA falls behind \Cpp by a much more significant margin. 365 368 This is likely due to the fact \CFA has to insert two extra function 366 calls while \Cpp doesn't have to do execute any other instructions.369 calls, while \Cpp does not have to do execute any other instructions. 367 370 Python is much further behind. 368 371 … … 370 373 \CFA's performance now matches \Cpp's from Cross Handler. 371 374 If the code from the finally clause is being inlined, 372 which is just a asm comment, than there are no additional instructions375 which is just an asm comment, than there are no additional instructions 373 376 to execute again when exiting the try statement normally. 374 377 375 378 \item[Conditional Match] 376 Both of the conditional matching tests can be considered on their own ,377 however for evaluating the value of conditional matching itselfthe379 Both of the conditional matching tests can be considered on their own. 380 However for evaluating the value of conditional matching itself, the 378 381 comparison of the two sets of results is useful. 379 382 Consider the massive jump in run-time for \Cpp going from match all to match … … 384 387 possibly through resource reuse or their program representation. 385 388 However \CFA is built like \Cpp and avoids the problem as well, this matches 386 the pattern of the conditional match which makes the two execution paths387 much moresimilar.389 the pattern of the conditional match, which makes the two execution paths 390 very similar. 388 391 389 392 \end{description} 390 393 391 Moving on to resumption there is one general note, 392 resumption is \textit{fast}, the only test where it fell 394 \subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}} 395 396 Moving on to resumption, there is one general note, 397 resumption is \textit{fast}. The only test where it fell 393 398 behind termination is Cross Handler. 394 399 In every other case, the number of iterations had to be increased by a 395 factor of 10 to get the run-time in an appropr ate range400 factor of 10 to get the run-time in an appropriate range 396 401 and in some cases resumption still took less time. 397 402 … … 401 406 \item[Empty Traversal] 402 407 See above for the general speed-up notes. 403 This result is not surprising as resumption's link 408 This result is not surprising as resumption's linked-list approach 404 409 means that traversing over stack frames without a resumption handler is 405 410 $O(1)$. … … 408 413 Resumption does have the same spike in run-time that termination has. 409 414 The run-time is actually very similar to Finally Traversal. 410 As resumption does not unwind the stack both destructors and finally411 clauses are run while walking down the stack normally.415 As resumption does not unwind the stack, both destructors and finally 416 clauses are run while walking down the stack during the recursive returns. 412 417 So it follows their performance is similar. 413 418 414 419 \item[Finally Traversal] 415 The increase in run-time fromm Empty Traversal (once adjusted for 416 the number of iterations) roughly the same as for termination. 417 This suggests that the 420 Same as D'tor Traversal, 421 except termination did not have a spike in run-time on this test case. 418 422 419 423 \item[Other Traversal] … … 426 430 The only test case where resumption could not keep up with termination, 427 431 although the difference is not as significant as many other cases. 428 It is simply a matter of where the costs come from . Even if \CFA termination429 is not ``zero-cost" passing through an empty function still seems to be 430 cheaper than updating global values.432 It is simply a matter of where the costs come from, 433 both termination and resumption have some work to set-up or tear-down a 434 handler. It just so happens that resumption's work is slightly slower. 431 435 432 436 \item[Conditional Match] … … 437 441 \end{description} 438 442 443 \subsection{Resumption/Fixup \texorpdfstring{(\autoref{t:PerformanceFixupRoutines})}{}} 444 439 445 Finally are the results of the resumption/fixup routine comparison. 440 These results are surprisingly varied , it is possible that creating a closure446 These results are surprisingly varied. It is possible that creating a closure 441 447 has more to do with performance than passing the argument through layers of 442 448 calls. 443 Even with 100 stack frames though, resumption is only about as fast as 444 manually passing a fixup routine. 445 So there is a cost for the additional power and flexibility exceptions 446 provide. 449 At 100 stack frames, resumption and manual fixup routines have similar 450 performance in \CFA. 451 More experiments could try to tease out the exact trade-offs, 452 but the prototype's only performance goal is to be reasonable. 453 It has already in that range, and \CFA's fixup routine simulation is 454 one of the faster simulations as well. 455 Plus exceptions add features and remove syntactic overhead, 456 so even at similar performance resumptions have advantages 457 over fixup routines.
Note: See TracChangeset
for help on using the changeset viewer.