Ignore:
Timestamp:
Sep 10, 2021, 10:43:15 AM (4 months ago)
Author:
Andrew Beach <ajbeach@…>
Branches:
master
Children:
63b3279
Parents:
d0b9247
Message:

Andrew MMath: Used (most of) Gregor's feedback to update the thesis. There are still a few \todo items as well as a general request for examples.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/andrew_beach_MMath/performance.tex

    rd0b9247 r9cdfa5fb  
    55Instead, the focus was to get the features working. The only performance
    66requirement is to ensure the tests for correctness run in a reasonable
    7 amount of time. Hence, a few basic performance tests were performed to
     7amount of time. Hence, only a few basic performance tests were performed to
    88check this requirement.
    99
     
    1313one with termination and one with resumption.
    1414
    15 C++ is the most comparable language because both it and \CFA use the same
     15GCC C++ is the most comparable language because both it and \CFA use the same
    1616framework, libunwind.
    1717In fact, the comparison is almost entirely in quality of implementation.
     
    4646the number used in the timing runs is given with the results per test.
    4747The Java tests run the main loop 1000 times before
    48 beginning the actual test to ``warm-up" the JVM.
     48beginning the actual test to ``warm up" the JVM.
    4949% All other languages are precompiled or interpreted.
    5050
     
    5454unhandled exceptions in \Cpp and Java as that would cause the process to
    5555terminate.
    56 Luckily, performance on the ``give-up and kill the process" path is not
     56Luckily, performance on the ``give up and kill the process" path is not
    5757critical.
    5858
     
    7676using gcc-10 10.3.0 as a backend.
    7777g++-10 10.3.0 is used for \Cpp.
    78 Java tests are complied and run with version 11.0.11.
    79 Python used version 3.8.10.
     78Java tests are complied and run with Oracle OpenJDK version 11.0.11.
     79Python used CPython version 3.8.10.
    8080The machines used to run the tests are:
    8181\begin{itemize}[nosep]
     
    8585      \lstinline{@} 2.5 GHz running Linux v5.11.0-25
    8686\end{itemize}
    87 Representing the two major families of hardware architecture.
     87These represent the two major families of hardware architecture.
    8888
    8989\section{Tests}
     
    9393
    9494\paragraph{Stack Traversal}
    95 This group measures the cost of traversing the stack,
     95This group of tests measures the cost of traversing the stack
    9696(and in termination, unwinding it).
    9797Inside the main loop is a call to a recursive function.
     
    147147This group of tests measures the cost for setting up exception handling,
    148148if it is
    149 not used (because the exceptional case did not occur).
     149not used because the exceptional case did not occur.
    150150Tests repeatedly cross (enter, execute and leave) a try statement but never
    151151perform a raise.
     
    222222for that language and the result is marked N/A.
    223223There are also cases where the feature is supported but measuring its
    224 cost is impossible. This happened with Java, which uses a JIT that optimize
    225 away the tests and it cannot be stopped.\cite{Dice21}
     224cost is impossible. This happened with Java, which uses a JIT that optimizes
     225away the tests and cannot be stopped.\cite{Dice21}
    226226These tests are marked N/C.
    227227To get results in a consistent range (1 second to 1 minute is ideal,
     
    230230results and has a value in the millions.
    231231
    232 An anomaly in some results came from \CFA's use of gcc nested functions.
     232An anomaly in some results came from \CFA's use of GCC nested functions.
    233233These nested functions are used to create closures that can access stack
    234234variables in their lexical scope.
    235 However, if they do so, then they can cause the benchmark's run-time to
     235However, if they do so, then they can cause the benchmark's run time to
    236236increase by an order of magnitude.
    237237The simplest solution is to make those values global variables instead
    238 of function local variables.
     238of function-local variables.
    239239% Do we know if editing a global inside nested function is a problem?
    240240Tests that had to be modified to avoid this problem have been marked
     
    312312\CFA, \Cpp and Java.
    313313% To be exact, the Match All and Match None cases.
     314%\todo{Not true in Python.}
    314315The most likely explanation is that, since exceptions
    315316are rarely considered to be the common case, the more optimized languages
     
    346347Performance is similar to Empty Traversal in all languages that support finally
    347348clauses. Only Python seems to have a larger than random noise change in
    348 its run-time and it is still not large.
     349its run time and it is still not large.
    349350Despite the similarity between finally clauses and destructors,
    350 finally clauses seem to avoid the spike that run-time destructors have.
     351finally clauses seem to avoid the spike that run time destructors have.
    351352Possibly some optimization removes the cost of changing contexts.
    352353
     
    356357This results in a significant jump.
    357358
    358 Other languages experience a small increase in run-time.
     359Other languages experience a small increase in run time.
    359360The small increase likely comes from running the checks,
    360361but they could avoid the spike by not having the same kind of overhead for
     
    362363
    363364\item[Cross Handler]
    364 Here \CFA falls behind \Cpp by a much more significant margin.
    365 This is likely due to the fact \CFA has to insert two extra function
    366 calls, while \Cpp does not have to do execute any other instructions.
     365Here, \CFA falls behind \Cpp by a much more significant margin.
     366This is likely due to the fact that \CFA has to insert two extra function
     367calls, while \Cpp does not have to execute any other instructions.
    367368Python is much further behind.
    368369
     
    375376\item[Conditional Match]
    376377Both of the conditional matching tests can be considered on their own.
    377 However for evaluating the value of conditional matching itself, the
     378However, for evaluating the value of conditional matching itself, the
    378379comparison of the two sets of results is useful.
    379 Consider the massive jump in run-time for \Cpp going from match all to match
     380Consider the massive jump in run time for \Cpp going from match all to match
    380381none, which none of the other languages have.
    381 Some strange interaction is causing run-time to more than double for doing
     382Some strange interaction is causing run time to more than double for doing
    382383twice as many raises.
    383 Java and Python avoid this problem and have similar run-time for both tests,
     384Java and Python avoid this problem and have similar run time for both tests,
    384385possibly through resource reuse or their program representation.
    385 However \CFA is built like \Cpp and avoids the problem as well, this matches
     386However, \CFA is built like \Cpp, and avoids the problem as well.
     387This matches
    386388the pattern of the conditional match, which makes the two execution paths
    387389very similar.
     
    391393\subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}}
    392394
    393 Moving on to resumption, there is one general note,
     395Moving on to resumption, there is one general note:
    394396resumption is \textit{fast}. The only test where it fell
    395397behind termination is Cross Handler.
    396398In every other case, the number of iterations had to be increased by a
    397 factor of 10 to get the run-time in an appropriate range
     399factor of 10 to get the run time in an appropriate range
    398400and in some cases resumption still took less time.
    399401
     
    408410
    409411\item[D'tor Traversal]
    410 Resumption does have the same spike in run-time that termination has.
    411 The run-time is actually very similar to Finally Traversal.
     412Resumption does have the same spike in run time that termination has.
     413The run time is actually very similar to Finally Traversal.
    412414As resumption does not unwind the stack, both destructors and finally
    413415clauses are run while walking down the stack during the recursive returns.
     
    416418\item[Finally Traversal]
    417419Same as D'tor Traversal,
    418 except termination did not have a spike in run-time on this test case.
     420except termination did not have a spike in run time on this test case.
    419421
    420422\item[Other Traversal]
     
    427429The only test case where resumption could not keep up with termination,
    428430although the difference is not as significant as many other cases.
    429 It is simply a matter of where the costs come from,
    430 both termination and resumption have some work to set-up or tear-down a
     431It is simply a matter of where the costs come from:
     432both termination and resumption have some work to set up or tear down a
    431433handler. It just so happens that resumption's work is slightly slower.
    432434
     
    434436Resumption shows a slight slowdown if the exception is not matched
    435437by the first handler, which follows from the fact the second handler now has
    436 to be checked. However the difference is not large.
     438to be checked. However, the difference is not large.
    437439
    438440\end{description}
     
    448450More experiments could try to tease out the exact trade-offs,
    449451but the prototype's only performance goal is to be reasonable.
    450 It has already in that range, and \CFA's fixup routine simulation is
     452It is already in that range, and \CFA's fixup routine simulation is
    451453one of the faster simulations as well.
    452 Plus exceptions add features and remove syntactic overhead,
    453 so even at similar performance resumptions have advantages
     454Plus, exceptions add features and remove syntactic overhead,
     455so even at similar performance, resumptions have advantages
    454456over fixup routines.
Note: See TracChangeset for help on using the changeset viewer.