# Changeset 7372065

Ignore:
Timestamp:
Aug 21, 2021, 5:49:45 PM (19 months ago)
Branches:
enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
Children:
c2a9d88
Parents:
d8f8d08
Message:

Saved and reverted another set of Peter's changes.

Location:
doc/theses/andrew_beach_MMath
Files:
4 edited

Unmodified
Removed
• ## doc/theses/andrew_beach_MMath/conclusion.tex

 rd8f8d08 \chapter{Conclusion} \label{c:conclusion} % Just a little knot to tie the paper together. In the previous chapters this thesis presents the design and implementation of \CFA's EHM.  Both the design and implementation are based off of tools and techniques developed for other programming languages but they were adapted to better fit \CFA's feature set and add a few features that do not exist in other EHMs, like conditional catch, default handlers, implicitly changing resumption into termination in the resumption default handler, and cancellation through coroutines and threads back to program main. In the previous chapters this thesis presents the design and implementation of \CFA's exception handling mechanism (EHM). Both the design and implementation are based off of tools and techniques developed for other programming languages but they were adapted to better fit \CFA's feature set. The resulting features cover all of the major use cases of the most popular termination EHMs of today, along with reintroducing resumption exceptions and creating some new features that fit with \CFA's larger programming patterns, such as virtuals independent of traditional objects. creating some new features that fix with \CFA's larger programming patterns. The implementation has been tested through a set of small but interesting micro-benchmarks and compared to other implementations. The implementation has been tested and compared to other implementations. The results, while not cutting edge, are good enough for prototyping, which is \CFA's current stage of development. is \CFA's stage of development. This initial EHM is a valuable new feature for \CFA in its own right but also serves as a tool and motivation for other developments in the language. This is a valuable new feature for \CFA in its own right but also serves as a tool (and motivation) for other developments in the language.
• ## doc/theses/andrew_beach_MMath/future.tex

 rd8f8d08 \label{c:future} The following discussion covers both missing language features that affected my work and research based improvements. \section{Language Improvements} \todo{Future/Language Improvements seems to have gotten mixed up. It is presented as waiting on language improvements" but really its more non-research based impovements.} \CFA is a developing programming language. As such, there are partially or unimplemented features (including several broken components) that I had to workaround while building an EHM largely in unimplemented features of the language (including several broken components) that I had to workaround while building an exception handling system largely in the \CFA language (some C components).  The following are a few of these issues, and once implemented/fixed, how they would affect the exception system. \item The implementation of termination is not portable because it includes hand-crafted assembly statements for each architecture, where the ARM processor was just added. % The existing compilers cannot translate that for other platforms and those % sections must be ported by hand to Supporting more hardware architectures in a general way is important. hand-crafted assembly statements. The existing compilers cannot translate that for other platforms and those sections must be ported by hand to support more hardware architectures, such as the ARM processor. \item Due to a type-system problem, the catch clause cannot bind the exception to a @return@, \etc. The reason is that current code generation hoists a handler into a nested function for convenience (versus assemble-code generation at the @try@ statement). Hence, when the handler runs, its can access local variable in the lexical scope of the @try@ statement, but the closure does not capture local control-flow points so it cannot perform non-local transfers in the hoisted function. @try@ statement). Hence, when the handler runs, its code is not in the lexical scope of the @try@ statement, where the local control-flow transfers are meaningful. \item There is no detection of colliding unwinds. It is possible for clean-up code run during an unwind to trigger another unwind that escapes the clean-up code itself; such as a termination exception caught further down the stack or a cancellation. There do exist ways to handle this case, but currently there is no detection and the first unwind is simply forgotten, often leaving cancellation. There do exist ways to handle this but currently they are not even detected and the first unwind will simply be forgotten, often leaving it in a bad state. \item Finally, the exception system has not have a lot programmer testing. More time with encouraged usage will reveal new quality of life upgrades that can be made. Also the exception system did not have a lot of time to be tried and tested. So just letting people use the exception system more will reveal new quality of life upgrades that can be made with time. \end{itemize} project, but was thrust upon it to do exception inheritance; hence, only minimal work is done. A draft for a complete virtual system is available but not finalized.  A future \CFA project is to complete that work and then it is not finalized.  A future \CFA project is to complete that work and then update the exception system that uses the current version. exception traits. The most important one is an assertion to check one virtual type is a child of another. This check precisely captures many of the current ad-hoc correctness requirements. correctness requirements. The full virtual system might also include other improvement like associated types to allow traits to refer to types not listed in their header. This feature allows exception traits to not refer to the virtual-table type explicitly. %, removing the need for the current interface macros. explicitly, removing the need for the current interface macros. \section{Additional Raises} Checked exceptions make exceptions part of a function's type by adding an exception signature. An exception signature must declare all checked exceptions that could propagate from the function, either because they were raised inside the function or a call to a sub-function. This improves safety exceptions that could propagate from the function (either because they were raised inside the function or came from a sub-function). This improves safety by making sure every checked exception is either handled or consciously passed on. However checked exceptions were never seriously considered for this project because they have significant trade-offs in usability and code reuse in because they have significant trade-offs in usablity and code reuse in exchange for the increased safety. These trade-offs are most problematic when trying to pass exceptions through not support a successful-exiting stack-search without doing an unwind. Workarounds are possible but awkward. Ideally an extension to libunwind could be made, but that would either require separate maintenance or gaining enough support to have it folded into the code base. be made, but that would either require separate maintenance or gain enough support to have it folded into the standard. Also new techniques to skip previously searched parts of the stack need to be to leave the handler. Currently, mimicking this behaviour in \CFA is possible by throwing a termination exception inside a resumption handler. termination inside a resumption handler. % Maybe talk about the escape; and escape CONTROL_STMT; statements or how
• ## doc/theses/andrew_beach_MMath/implement.tex

 rd8f8d08 \label{c:performance} Performance is of secondary importance for most of this project. Instead, the focus was to get the features working. The only performance requirement is to ensure the tests for correctness run in a reasonable amount of time. Hence, a few basic performance tests were performed to check this requirement. Performance has been of secondary importance for most of this project. Instead, the focus has been to get the features working. The only performance requirements is to ensure the tests for correctness run in a reasonable amount of time. \section{Test Set-Up} Tests were run in \CFA, C++, Java and Python. Tests will be run in \CFA, C++, Java and Python. In addition there are two sets of tests for \CFA, one for termination and one for resumption exceptions. one for termination exceptions and once with resumption exceptions. C++ is the most comparable language because both it and \CFA use the same framework, libunwind. In fact, the comparison is almost entirely a quality of implementation. Specifically, \CFA's EHM has had significantly less time to be optimized and In fact, the comparison is almost entirely a quality of implementation comparison. \CFA's EHM has had significantly less time to be optimized and does not generate its own assembly. It does have a slight advantage in that there are some features it handles directly instead of through utility functions, but otherwise \Cpp should have a significant advantage. Java is a popular language with similar termination semantics, but it is implemented in a very different environment, a virtual machine with there are some features it does not handle, through utility functions, but otherwise \Cpp has a significant advantage. Java is another very popular language with similar termination semantics. It is implemented in a very different environment, a virtual machine with garbage collection. It also implements the @finally@ clause on @try@ blocks allowing for a direct It also implements the finally clause on try blocks allowing for a direct feature-to-feature comparison. As with \Cpp, Java's implementation is mature, optimized and has extra features. Python is used as an alternative comparison because of the \CFA EHM's current performance goals, which is not to be prohibitively slow while the As with \Cpp, Java's implementation is more mature, has more optimizations and more extra features. Python was used as a point of comparison because of the \CFA EHM's current performance goals, which is not be prohibitively slow while the features are designed and examined. Python has similar performance goals for creating quick scripts and its wide use suggests it has achieved those goals. Unfortunately, there are no notable modern programming languages with resumption exceptions. Even the older programming languages with resumption seem to be notable only for having resumption. So instead, resumption is compared to its simulation in other programming languages using fixup functions that are explicitly passed for correction or logging purposes. % So instead, resumption is compared to a less similar but much more familiar %feature, termination exceptions. All tests are run inside a main loop that repeatedly performs a test. This approach avoids start-up or tear-down time from Unfortunately there are no notable modern programming languages with resumption exceptions. Even the older programming languages with resumptions seem to be notable only for having resumptions. So instead resumptions are compared to a less similar but much more familiar feature, termination exceptions. All tests are run inside a main loop which will perform the test repeatedly. This is to avoids start-up or tear-down time from affecting the timing results. Each test is run a N times (configurable from the command line). The Java tests runs the main loop 1000 times before beginning the actual test to warm-up" the JVM. Tests ran their main loop a million times. The Java versions of the test also run this loop an extra 1000 times before beginning to time the results to warm-up" the JVM. Timing is done internally, with time measured immediately before and after the test loop. The difference is calculated and printed. immediately after the test loop. The difference is calculated and printed. The loop structure and internal timing means it is impossible to test unhandled exceptions in \Cpp and Java as that would cause the process to critical. The exceptions used in these tests are always based off of a base exception. This requirement minimizes performance differences based on the object model used to represent the exception. All tests are designed to be as minimal as possible, while still preventing excessive optimizations. The exceptions used in these tests will always be a exception based off of the base exception. This requirement minimizes performance differences based on the object model used to repersent the exception. All tests were designed to be as minimal as possible while still preventing exessive optimizations. For example, empty inline assembly blocks are used in \CFA and \Cpp to prevent excessive optimizations while adding no actual work. Each test was run eleven times. The top three and bottom three results were discarded and the remaining five values are averaged. The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is compiled with version 11.0.11. Python with version 3.8. The tests were run on: \begin{itemize}[nosep] \item ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25 \item AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25 \end{itemize} Two kinds of hardware architecture allows discriminating any implementation and architectural effects. % We don't use catch-alls but if we did: The following tests were selected to test the performance of different components of the exception system. They should provide a guide as to where the EHM's costs are found. The should provide a guide as to where the EHM's costs can be found. \paragraph{Raise and Handle} This group measures the cost of a try statement when exceptions are raised and the stack is unwound (termination) or not unwound (resumption).  Each test has has a repeating function like the following \begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}] void unwind_empty(unsigned int frames) { if (frames) { @unwind_empty(frames - 1);@ // AUGMENTED IN OTHER EXPERIMENTS } else throw (empty_exception){&empty_vt}; } \end{lstlisting} which is called N times, where each call recurses to a depth of R (configurable from the command line), an exception is raised, the stack is a unwound, and the exception caught. The first group of tests involve setting up So there is three layers to the test. The first is set up and a loop, which configures the test and then runs it repeatedly to reduce the impact of start-up and shutdown on the results. Each iteration of the main loop \begin{itemize}[nosep] \item Empty: For termination, this test measures the cost of raising (stack walking) an exception through empty stack frames from the bottom of the recursion to an empty handler, and unwinding the stack. (see above code) \medskip For resumption, this test measures the same raising cost but does not unwind the stack. For languages without resumption, a fixup function is to the bottom of the recursion and called to simulate a fixup operation at that point. \begin{cfa} void nounwind_fixup(unsigned int frames, void (*raised_rtn)(int &)) { if (frames) { nounwind_fixup(frames - 1, raised_rtn); } else { int fixup = 17; raised_rtn(fixup); } } \end{cfa} where the passed fixup function is: \begin{cfa} void raised(int & fixup) { fixup = 42; } \end{cfa} For comparison, a \CFA version passing a function is also included. \item Empty Function: The repeating function is empty except for the necessary control code. \item Destructor: This test measures the cost of raising an exception through non-empty frames, where each frame has an object requiring destruction, from the bottom of the recursion to an empty handler. Hence, there are N destructor calls during unwinding. \medskip This test is not meaningful for resumption because the stack is only unwound as the recursion returns. \begin{cfa} WithDestructor object; unwind_destructor(frames - 1); \end{cfa} The repeating function creates an object with a destructor before calling itself. \item Finally: This test measures the cost of establishing a try block with an empty finally clause on the front side of the recursion and running the empty finally clauses during stack unwinding from the bottom of the recursion to an empty handler. \begin{cfa} try { unwind_finally(frames - 1); } finally {} \end{cfa} \medskip This test is not meaningful for resumption because the stack is only unwound as the recursion returns. The repeating function calls itself inside a try block with a finally clause attached. \item Other Handler: For termination, this test is like the finally test but the try block has a catch clause for an exception that is not raised, so catch matching is executed during stack unwinding but the match never successes until the catch at the bottom of the recursion. \begin{cfa} try { unwind_other(frames - 1); } catch (not_raised_exception *) {} \end{cfa} \medskip For resumption, this test measures the same raising cost but does not unwind the stack. For languages without resumption, the same fixup function is passed and called. The repeating function calls itself inside a try block with a handler that will not match the raised exception. (But is of the same kind of handler.) \end{itemize} \paragraph{Try/Handle/Finally Statement} This group measures just the cost of executing a try statement so \emph{there is no stack unwinding}.  Hence, the program main loops N times around: \begin{cfa} try { } catch (not_raised_exception *) {} \end{cfa} \paragraph{Cross Try Statement} The next group measures the cost of a try statement when no exceptions are raised. The test is set-up, then there is a loop to reduce the impact of start-up and shutdown on the results. In each iteration, a try statement is executed. Entering and leaving a loop is all the test wants to do. \begin{itemize}[nosep] \item Handler: The try statement has a handler (catch/resume). The try statement has a handler (of the matching kind). \item Finally: The try statement has a finally clause. \paragraph{Conditional Matching} This group measures the cost of conditional matching. This group of tests checks the cost of conditional matching. Only \CFA implements the language level conditional match, the other languages mimic with an unconditional" match (it still checks the exception's type) and conditional re-raise if it is not suppose the other languages must mimic with an unconditional" match (it still checks the exception's type) and conditional re-raise if it was not supposed to handle that exception. \begin{center} \begin{tabular}{ll} \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\ \begin{cfa} try { throw_exception(); } catch (empty_exception * exc; should_catch) { } \end{cfa} & \begin{cfa} try { throw_exception(); } catch (EmptyException & exc) { if (!should_catch) throw; } \end{cfa} \end{tabular} \end{center} \begin{itemize}[nosep] \item Match All: \end{itemize} \medskip \noindent All omitted test code for other languages is functionally identical to the \CFA tests or simulated, and available online~\cite{CforallExceptionBenchmarks}. %\section{Cost in Size} %Using exceptions also has a cost in the size of the executable. \section{Results} One result not directly related to \CFA but important to keep in mind is that, for exceptions, the standard intuition about which languages should go faster often does not hold. For example, there are a few cases where Python out-performs \CFA, \Cpp and Java. The most likely explanation is that, since exceptions are rarely considered to be the common case, the more optimized languages make that case expense. In addition, languages with high-level representations have a much easier time scanning the stack as there is less to decode. Tables~\ref{t:PerformanceTermination} and~\ref{t:PerformanceResumption} show the test results for termination and resumption, respectively.  In cases where a feature is not supported by a language, the test is skipped for that language (marked N/A).  For some Java experiments it was impossible to measure certain effects because the JIT corrupted the test (marked N/C). No workaround was possible~\cite{Dice21}.  To get experiments in the range of 1--100 seconds, the number of times an experiment is run (N) is varied (N is marked beside each experiment, e.g., 1M $\Rightarrow$ 1 million test iterations). An anomaly exists with gcc nested functions used as thunks for implementing much of the \CFA EHM. If a nested-function closure captures local variables in its lexical scope, performance dropped by a factor of 10.  Specifically, in try statements of the form: \begin{cfa} try { unwind_other(frames - 1); } catch (not_raised_exception *) {} \end{cfa} the try block is hoisted into a nested function and the variable @frames@ is the local parameter to the recursive function, which triggers the anomaly. The workaround is to remove the recursion parameter and make it a global variable that is explicitly decremented outside of the try block (marked with a *''): \begin{cfa} frames -= 1; try { unwind_other(); } catch (not_raised_exception *) {} \end{cfa} To make comparisons fair, a dummy parameter is added and the dummy value passed in the recursion. Note, nested functions in gcc are rarely used (if not completely unknown) and must follow the C calling convention, unlike \Cpp lambdas, so it is not surprising if there are performance issues efficiently capturing closures. % Similarly, if a test does not change between resumption % and termination in \CFA, then only one test is written and the result % was put into the termination column. Each test was run eleven times. The top three and bottom three results were discarded and the remaining five values are averaged. In cases where a feature is not supported by a language the test is skipped for that language. Similarly, if a test is does not change between resumption and termination in \CFA, then only one test is written and the result was put into the termination column. % Raw Data: % Match None    & 0.0 & 0.0 &  9476060146 & 0.0 & 0.0 \\ \begin{tabular}{|l|c c c c c|} \hline & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ \hline Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\ Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ \hline \end{tabular} % run-plg7a-a.sat % --------------- % Match None    & 0.0 & 0.0 &  7829059869 & 0.0 & 0.0 \\ \begin{table} \centering \caption{Performance Results Termination (sec)} \label{t:PerformanceTermination} \begin{tabular}{|r|*{2}{|r r r r|}} \hline & \multicolumn{4}{c||}{AMD}             & \multicolumn{4}{c|}{ARM}      \\ \cline{2-9} N\hspace{8pt}           & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ \hline Throw Empty (1M)        & 3.4   & 2.8   & 18.3  & 23.4          & 3.7   & 3.2   & 15.5  & 14.8  \\ Throw D'tor (1M)        & 48.4  & 23.6  & N/A   & N/A           & 64.2  & 29.0  & N/A   & N/A   \\ Throw Finally (1M)      & 3.4*  & N/A   & 17.9  & 29.0          & 4.1*  & N/A   & 15.6  & 19.0  \\ Throw Other (1M)        & 3.6*  & 23.2  & 18.2  & 32.7          & 4.0*  & 24.5  & 15.5  & 21.4  \\ Try/Catch (100M)        & 6.0   & 0.9   & N/C   & 37.4          & 10.0  & 0.8   & N/C   & 32.2  \\ Try/Finally (100M)      & 0.9   & N/A   & N/C   & 44.1          & 0.8   & N/A   & N/C   & 37.3  \\ Match All (10M)         & 32.9  & 20.7  & 13.4  & 4.9           & 36.2  & 24.5  & 12.0  & 3.1   \\ Match None (10M)        & 32.7  & 50.3  & 11.0  & 5.1           & 36.3  & 71.9  & 12.3  & 4.2   \\ % PLG7A (in seconds) \begin{tabular}{|l|c c c c c|} \hline & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ \hline Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\ Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ \hline \end{tabular} \end{table} \begin{table} \centering \small \caption{Performance Results Resumption (sec)} \label{t:PerformanceResumption} \setlength{\tabcolsep}{5pt} \begin{tabular}{|r|*{2}{|r r r r|}} \hline & \multicolumn{4}{c||}{AMD}             & \multicolumn{4}{c|}{ARM}      \\ \cline{2-9} N\hspace{8pt}           & \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ \hline Resume Empty (10M)      & 3.8/3.5       & 14.7  & 2.3   & 176.1 & 0.3/0.1       & 8.9   & 1.2   & 119.9 \\ Resume Other (10M)      & 4.0*/0.1*     & 21.9  & 6.2   & 381.0 & 0.3*/0.1*     & 13.2  & 5.0   & 290.7 \\ Try/Resume (100M)       & 8.8           & N/A   & N/A   & N/A   & 12.3          & N/A   & N/A   & N/A   \\ Match All (10M)         & 0.3           & N/A   & N/A   & N/A   & 0.3           & N/A   & N/A   & N/A   \\ Match None (10M)        & 0.3           & N/A   & N/A   & N/A   & 0.4           & N/A   & N/A   & N/A   \\ \hline \end{tabular} \end{table} As stated, the performance tests are not attempting to compare exception handling across languages.  The only performance requirement is to ensure the \CFA EHM implementation runs in a reasonable amount of time, given its constraints. In general, the \CFA implement did very well. Each of the tests is analysed. \begin{description} \item[Throw/Resume Empty] For termination, \CFA is close to \Cpp, where other languages have a higher cost. For resumption, \CFA is better than the fixup simulations in the other languages, except Java. The \CFA results on the ARM computer for both resumption and function simulation are particularly low; I have no explanation for this anomaly, except the optimizer has managed to remove part of the experiment. Python has a high cost for passing the lambda during the recursion. \item[Throw D'tor] For termination, \CFA is twice the cost of \Cpp. The higher cost for \CFA must be related to how destructors are handled. \item[Throw Finally] \CFA is better than the other languages with a @finally@ clause, which is the same for termination and resumption. \item[Throw/Resume Other] For termination, \CFA is better than the other languages. For resumption, \CFA is equal to or better the other languages. Again, the \CFA results on the ARM computer for both resumption and function simulation are particularly low. Python has a high cost for passing the lambda during the recursion. \item[Try/Catch/Resume] For termination, installing a try statement is more expressive than \Cpp because the try components are hoisted into local functions.  At runtime, these functions are than passed to libunwind functions to set up the try statement. \Cpp zero-cost try-entry accounts for its performance advantage. For resumption, there are similar costs to termination to set up the try statement but libunwind is not used. \item[Try/Finally] Setting up a try finally is less expensive in \CFA than setting up handlers, and is significantly less than other languages. \item[Throw/Resume Match All] For termination, \CFA is close to the other language simulations. For resumption, the stack unwinding is much faster because it does not use libunwind.  Instead resumption is just traversing a linked list with each node being the next stack frame with the try block. \item[Throw/Resume Match None] The same results as for Match All. \end{description} \begin{comment} This observation means that while \CFA does not actually keep up with Python in every case, it is usually no worse than roughly half the speed of \Cpp. This performance is good enough for the prototyping purposes of the project. One result that is not directly related to \CFA but is important to keep in mind is that in exceptions the standard intuitions about which languages should go faster often do not hold. There are cases where Python out-preforms \Cpp and Java. The most likely explination is that, since exceptions are rarely considered to be the common case, the more optimized langages have optimized at their expence. In addition languages with high level repersentations have a much easier time scanning the stack as there is less to decode. This means that while \CFA does not actually keep up with Python in every case it is usually no worse than roughly half the speed of \Cpp. This is good enough for the prototyping purposes of the project. The test case where \CFA falls short is Raise Other, the case where the stack is unwound including a bunch of non-matching handlers. This slowdown seems to come from missing optimizations. This slowdown seems to come from missing optimizations, the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp) but the results change if they are run in gcc/g++ 9 instead. Importantly, there is a huge slowdown in \Cpp's results bringing that brings \CFA's performace back in that roughly half speed area. However many other \CFA benchmarks increase their run-time by a similar amount falling far behind their \Cpp counter-parts. This suggests that the performance issue in Raise Other is just an Resumption exception handling is also incredibly fast. Often an order of magnitude or two better than the best termination speed. There is a simple explanation for this; traversing a linked list is much There is a simple explination for this; traversing a linked list is much faster than examining and unwinding the stack. When resumption does not do as well its when more try statements are used per raise. Updating the internal linked list is not very expensive but it does add up. well its when more try statements are used per raise. Updating the interal linked list is not very expencive but it does add up. The relative speed of the Match All and Match None tests (within each \item Java and Python get similar values in both tests. Between the interpreted code, a higher level representation of the call Between the interperated code, a higher level repersentation of the call stack and exception reuse it it is possible the cost for a second throw can be folded into the first. % Is this due to optimization? \item Both types of \CFA are slightly slower if there is not a match. Both types of \CFA are slighly slower if there is not a match. For termination this likely comes from unwinding a bit more stack through libunwind instead of executing the code normally. The difference in relative performance does show that there are savings to be made by performing the check without catching the exception. \end{comment} \begin{comment} From: Dave Dice To: "Peter A. Buhr" Subject: Re: [External] : JIT Date: Mon, 16 Aug 2021 01:21:56 +0000 > On 2021-8-15, at 7:14 PM, Peter A. Buhr wrote: > > My student is trying to measure the cost of installing a try block with a > finally clause in Java. > > We tried the random trick (see below). But if the try block is comment out, the > results are the same. So the program measures the calls to the random number > generator and there is no cost for installing the try block. > > Maybe there is no cost for a try block with an empty finally, i.e., the try is > optimized away from the get-go. There's quite a bit of optimization magic behind the HotSpot curtains for try-finally.  (I sound like the proverbial broken record (:>)). In many cases we can determine that the try block can't throw any exceptions, so we can elide all try-finally plumbing.  In other cases, we can convert the try-finally to normal if-then control flow, in the case where the exception is thrown into the same method.  This makes exceptions _almost cost-free.  If we actually need to "physically" rip down stacks, then things get expensive, impacting both the throw cost, and inhibiting other useful optimizations at the catch point.  Such "true" throws are not just expensive, they're _very expensive.  The extremely aggressive inlining used by the JIT helps, because we can convert cases where a heavy rip-down would normally needed back into simple control flow. Other quirks involve the thrown exception object.  If it's never accessed then we're apply a nice set of optimizations to avoid its construction.  If it's accessed but never escapes the catch frame (common) then we can also cheat. And if we find we're hitting lots of heavy rip-down cases, the JIT will consider recompilation - better inlining -- to see if we can merge the throw and catch into the same physical frame, and shift to simple branches. In your example below, System.out.print() can throw, I believe.  (I could be wrong, but most IO can throw).  Native calls that throw will "unwind" normally in C++ code until they hit the boundary where they reenter java emitted code, at which point the JIT-ed code checks for a potential pending exception.  So in a sense the throw point is implicitly after the call to the native method, so we can usually make those cases efficient. Also, when we're running in the interpreter and warming up, we'll notice that the == 42 case never occurs, and so when we start to JIT the code, we elide the call to System.out.print(), replacing it (and anything else which appears in that if x == 42 block) with a bit of code we call an "uncommon trap".  I'm presuming we encounter 42 rarely.  So if we ever hit the x == 42 case, control hits the trap, which triggers synchronous recompilation of the method, this time with the call to System.out.print() and, because of that, we now to adapt the new code to handle any traps thrown by print().  This is tricky stuff, as we may need to rebuild stack frames to reflect the newly emitted method.  And we have to construct a weird bit of "thunk" code that allows us to fall back directly into the newly emitted "if" block.  So there's a large one-time cost when we bump into the uncommon trap and recompile, and subsequent execution might get slightly slower as the exception could actually be generated, whereas before we hit the trap, we knew the exception could never be raised. Oh, and things also get expensive if we need to actually fill in the stack trace associated with the exception object.  Walking stacks is hellish. Quite a bit of effort was put into all this as some of the specjvm benchmarks showed significant benefit. It's hard to get sensible measurements as the JIT is working against you at every turn.  What's good for the normal user is awful for anybody trying to benchmark.  Also, all the magic results in fairly noisy and less reproducible results. Regards Dave p.s., I think I've mentioned this before, but throwing in C++ is grim as unrelated throws in different threads take common locks, so nothing scales as you might expect. \end{comment}