# Changeset 9cdfa5fb for doc/theses/andrew_beach_MMath/performance.tex

Ignore:
Timestamp:
Sep 10, 2021, 10:43:15 AM (4 months ago)
Branches:
master
Children:
63b3279
Parents:
d0b9247
Message:

Andrew MMath: Used (most of) Gregor's feedback to update the thesis. There are still a few \todo items as well as a general request for examples.

File:
1 edited

### Legend:

Unmodified
 rd0b9247 Instead, the focus was to get the features working. The only performance requirement is to ensure the tests for correctness run in a reasonable amount of time. Hence, a few basic performance tests were performed to amount of time. Hence, only a few basic performance tests were performed to check this requirement. one with termination and one with resumption. C++ is the most comparable language because both it and \CFA use the same GCC C++ is the most comparable language because both it and \CFA use the same framework, libunwind. In fact, the comparison is almost entirely in quality of implementation. the number used in the timing runs is given with the results per test. The Java tests run the main loop 1000 times before beginning the actual test to warm-up" the JVM. beginning the actual test to warm up" the JVM. % All other languages are precompiled or interpreted. unhandled exceptions in \Cpp and Java as that would cause the process to terminate. Luckily, performance on the give-up and kill the process" path is not Luckily, performance on the give up and kill the process" path is not critical. using gcc-10 10.3.0 as a backend. g++-10 10.3.0 is used for \Cpp. Java tests are complied and run with version 11.0.11. Python used version 3.8.10. Java tests are complied and run with Oracle OpenJDK version 11.0.11. Python used CPython version 3.8.10. The machines used to run the tests are: \begin{itemize}[nosep] \lstinline{@} 2.5 GHz running Linux v5.11.0-25 \end{itemize} Representing the two major families of hardware architecture. These represent the two major families of hardware architecture. \section{Tests} \paragraph{Stack Traversal} This group measures the cost of traversing the stack, This group of tests measures the cost of traversing the stack (and in termination, unwinding it). Inside the main loop is a call to a recursive function. This group of tests measures the cost for setting up exception handling, if it is not used (because the exceptional case did not occur). not used because the exceptional case did not occur. Tests repeatedly cross (enter, execute and leave) a try statement but never perform a raise. for that language and the result is marked N/A. There are also cases where the feature is supported but measuring its cost is impossible. This happened with Java, which uses a JIT that optimize away the tests and it cannot be stopped.\cite{Dice21} cost is impossible. This happened with Java, which uses a JIT that optimizes away the tests and cannot be stopped.\cite{Dice21} These tests are marked N/C. To get results in a consistent range (1 second to 1 minute is ideal, results and has a value in the millions. An anomaly in some results came from \CFA's use of gcc nested functions. An anomaly in some results came from \CFA's use of GCC nested functions. These nested functions are used to create closures that can access stack variables in their lexical scope. However, if they do so, then they can cause the benchmark's run-time to However, if they do so, then they can cause the benchmark's run time to increase by an order of magnitude. The simplest solution is to make those values global variables instead of function local variables. of function-local variables. % Do we know if editing a global inside nested function is a problem? Tests that had to be modified to avoid this problem have been marked \CFA, \Cpp and Java. % To be exact, the Match All and Match None cases. %\todo{Not true in Python.} The most likely explanation is that, since exceptions are rarely considered to be the common case, the more optimized languages Performance is similar to Empty Traversal in all languages that support finally clauses. Only Python seems to have a larger than random noise change in its run-time and it is still not large. its run time and it is still not large. Despite the similarity between finally clauses and destructors, finally clauses seem to avoid the spike that run-time destructors have. finally clauses seem to avoid the spike that run time destructors have. Possibly some optimization removes the cost of changing contexts. This results in a significant jump. Other languages experience a small increase in run-time. Other languages experience a small increase in run time. The small increase likely comes from running the checks, but they could avoid the spike by not having the same kind of overhead for \item[Cross Handler] Here \CFA falls behind \Cpp by a much more significant margin. This is likely due to the fact \CFA has to insert two extra function calls, while \Cpp does not have to do execute any other instructions. Here, \CFA falls behind \Cpp by a much more significant margin. This is likely due to the fact that \CFA has to insert two extra function calls, while \Cpp does not have to execute any other instructions. Python is much further behind. \item[Conditional Match] Both of the conditional matching tests can be considered on their own. However for evaluating the value of conditional matching itself, the However, for evaluating the value of conditional matching itself, the comparison of the two sets of results is useful. Consider the massive jump in run-time for \Cpp going from match all to match Consider the massive jump in run time for \Cpp going from match all to match none, which none of the other languages have. Some strange interaction is causing run-time to more than double for doing Some strange interaction is causing run time to more than double for doing twice as many raises. Java and Python avoid this problem and have similar run-time for both tests, Java and Python avoid this problem and have similar run time for both tests, possibly through resource reuse or their program representation. However \CFA is built like \Cpp and avoids the problem as well, this matches However, \CFA is built like \Cpp, and avoids the problem as well. This matches the pattern of the conditional match, which makes the two execution paths very similar. \subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}} Moving on to resumption, there is one general note, Moving on to resumption, there is one general note: resumption is \textit{fast}. The only test where it fell behind termination is Cross Handler. In every other case, the number of iterations had to be increased by a factor of 10 to get the run-time in an appropriate range factor of 10 to get the run time in an appropriate range and in some cases resumption still took less time. \item[D'tor Traversal] Resumption does have the same spike in run-time that termination has. The run-time is actually very similar to Finally Traversal. Resumption does have the same spike in run time that termination has. The run time is actually very similar to Finally Traversal. As resumption does not unwind the stack, both destructors and finally clauses are run while walking down the stack during the recursive returns. \item[Finally Traversal] Same as D'tor Traversal, except termination did not have a spike in run-time on this test case. except termination did not have a spike in run time on this test case. \item[Other Traversal] The only test case where resumption could not keep up with termination, although the difference is not as significant as many other cases. It is simply a matter of where the costs come from, both termination and resumption have some work to set-up or tear-down a It is simply a matter of where the costs come from: both termination and resumption have some work to set up or tear down a handler. It just so happens that resumption's work is slightly slower. Resumption shows a slight slowdown if the exception is not matched by the first handler, which follows from the fact the second handler now has to be checked. However the difference is not large. to be checked. However, the difference is not large. \end{description} More experiments could try to tease out the exact trade-offs, but the prototype's only performance goal is to be reasonable. It has already in that range, and \CFA's fixup routine simulation is It is already in that range, and \CFA's fixup routine simulation is one of the faster simulations as well. Plus exceptions add features and remove syntactic overhead, so even at similar performance resumptions have advantages Plus, exceptions add features and remove syntactic overhead, so even at similar performance, resumptions have advantages over fixup routines.