Index: doc/theses/andrew_beach_MMath/performance.tex
===================================================================
--- doc/theses/andrew_beach_MMath/performance.tex	(revision e8bad5c86203a25b54b5a781370676cdfbb2146f)
+++ doc/theses/andrew_beach_MMath/performance.tex	(revision cfbab079bede7e2b9a9c69c159d6182db091eff9)
@@ -2,51 +2,54 @@
 \label{c:performance}
 
-Performance has been of secondary importance for most of this project.
-Instead, the focus has been to get the features working. The only performance
-requirements is to ensure the tests for correctness run in a reasonable
-amount of time.
+Performance is of secondary importance for most of this project.
+Instead, the focus was to get the features working. The only performance
+requirement is to ensure the tests for correctness run in a reasonable
+amount of time. Hence, a few basic performance tests were performed to
+check this requirement.
 
 \section{Test Set-Up}
-Tests will be run in \CFA, C++, Java and Python.
+Tests were run in \CFA, \Cpp, Java and Python.
 In addition there are two sets of tests for \CFA,
-one for termination exceptions and once with resumption exceptions.
+one for termination and one for resumption.
 
 C++ is the most comparable language because both it and \CFA use the same
 framework, libunwind.
-In fact, the comparison is almost entirely a quality of implementation
-comparison. \CFA's EHM has had significantly less time to be optimized and
+In fact, the comparison is almost entirely one of quality of implementation.
+Specifically, \CFA's EHM has had significantly less time to be optimized and
 does not generate its own assembly. It does have a slight advantage in that
-there are some features it does not handle, through utility functions,
-but otherwise \Cpp has a significant advantage.
-
-Java is another very popular language with similar termination semantics.
-It is implemented in a very different environment, a virtual machine with
+\Cpp has to do some extra bookkeeping to support its utility functions,
+but otherwise \Cpp should have a significant advantage.
+
+Java is a popular language with similar termination semantics, but
+it is implemented in a very different environment, a virtual machine with
 garbage collection.
 It also implements the finally clause on try blocks allowing for a direct
 feature-to-feature comparison.
-As with \Cpp, Java's implementation is more mature, has more optimizations
-and more extra features.
-
-Python was used as a point of comparison because of the \CFA EHM's
-current performance goals, which is not be prohibitively slow while the
+As with \Cpp, Java's implementation is mature, with more optimizations
+and extra features than \CFA's.
+
+Python is used as an alternative comparison because of the \CFA EHM's
+current performance goal, which is to not be prohibitively slow while the
 features are designed and examined. Python has similar performance goals for
 creating quick scripts and its wide use suggests it has achieved those goals.
 
-Unfortunately there are no notable modern programming languages with
-resumption exceptions. Even the older programming languages with resumptions
-seem to be notable only for having resumptions.
-So instead resumptions are compared to a less similar but much more familiar
-feature, termination exceptions.
-
-All tests are run inside a main loop which will perform the test
-repeatedly. This is to avoids start-up or tear-down time from
+Unfortunately, there are no notable modern programming languages with
+resumption exceptions. Even the older programming languages with resumption
+seem to be notable only for having resumption.
+Instead, resumption is compared to its simulation in other programming
+languages: fixup functions that are explicitly passed into a function.
+
+All tests are run inside a main loop that repeatedly performs a test.
+This approach avoids start-up or tear-down time from
 affecting the timing results.
+The number of times the loop is run is configurable from the command line;
+the number used in the timing runs is given with the results for each test.
 Tests usually ran their main loop millions of times.
-The Java versions of the test also run this loop an extra 1000 times before
-beginning to time the results to ``warm-up" the JVM.
+The Java tests run the main loop 1000 times before
+beginning the actual test to ``warm up'' the JVM.
+% All other languages are precompiled or interpreted.
 
 Timing is done internally, with time measured immediately before and
-immediately after the test loop. The difference is calculated and printed.
-
+after the test loop. The difference is calculated and printed.
 The loop structure and internal timing mean it is impossible to test
 unhandled exceptions in \Cpp and Java as that would cause the process to
@@ -55,10 +58,11 @@
 critical.
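+
+A minimal \Cpp sketch of the timed main loop follows; it is an
+illustration, not the actual test harness. The function
+\lstinline{time_main_loop} and its function-pointer parameter are
+hypothetical, and \lstinline{std::chrono::steady_clock} is assumed as the
+clock. In a full program, the iteration count would be read from the
+command line.
+\begin{lstlisting}[language=C++]
+#include <cassert>
+#include <chrono>
+
+// Times "times" iterations of the main loop. Only the loop is timed, so
+// start-up and tear-down costs are excluded from the result.
+double time_main_loop(unsigned long times, void (*test)()) {
+	auto start = std::chrono::steady_clock::now(); // immediately before the loop
+	for (unsigned long i = 0; i < times; ++i) {
+		test();
+	}
+	auto end = std::chrono::steady_clock::now();   // immediately after the loop
+	return std::chrono::duration<double>(end - start).count(); // in seconds
+}
+\end{lstlisting}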
 
-The exceptions used in these tests will always be a exception based off of
-the base exception. This requirement minimizes performance differences based
-on the object model used to repersent the exception.
-
-All tests were designed to be as minimal as possible while still preventing
-exessive optimizations.
+The exceptions used in these tests always derive from
+the base exception type of the language.
+This requirement minimizes performance differences based
+on the object model used to represent the exception.
+
+All tests are designed to be as minimal as possible, while still preventing
+excessive optimizations.
 For example, empty inline assembly blocks are used in \CFA and \Cpp to
 prevent excessive optimizations while adding no actual work.
@@ -68,38 +72,86 @@
 % \code{C++}{catch(...)}).
 
+When collecting data, each test is run eleven times. The top three and bottom
+three results are discarded and the remaining five values are averaged.
+The tests are run with the latest (still pre-release) \CFA compiler,
+using gcc-10 as a backend.
+g++-10 is used for \Cpp.
+Java tests are compiled and run with version 11.0.11.
+Python tests use version 3.8.
+The machines used to run the tests are:
+\todo{Get patch versions for python, gcc and g++.}
+\begin{itemize}[nosep]
+\item ARM 2280 Kunpeng 920 48-core 2$\times$socket
+      \lstinline{@} 2.6 GHz running Linux v5.11.0-25
+\item AMD 6380 Abu Dhabi 16-core 4$\times$socket
+      \lstinline{@} 2.5 GHz running Linux v5.11.0-25
+\end{itemize}
+These represent the two major families of hardware architecture.
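+
+The data reduction described above is a trimmed mean. A \Cpp sketch,
+assuming the eleven run times are collected into a vector (the function
+name \lstinline{trimmed_mean} is hypothetical):
+\begin{lstlisting}[language=C++]
+#include <algorithm>
+#include <cassert>
+#include <cstddef>
+#include <numeric>
+#include <vector>
+
+// Sorts the run times, discards the "trim" fastest and "trim" slowest runs,
+// and returns the mean of the remaining values.
+double trimmed_mean(std::vector<double> runs, std::size_t trim = 3) {
+	assert(runs.size() > 2 * trim);  // something must remain to average
+	std::sort(runs.begin(), runs.end());
+	auto first = runs.begin() + trim;
+	auto last = runs.end() - trim;
+	return std::accumulate(first, last, 0.0) / static_cast<double>(last - first);
+}
+\end{lstlisting}
+The vector is taken by value, so sorting does not disturb the caller's data;
+with eleven runs and the default trim of three, exactly the middle five
+values are averaged.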
+
 \section{Tests}
 The following tests were selected to test the performance of different
 components of the exception system.
-The should provide a guide as to where the EHM's costs can be found.
-
-\paragraph{Raise and Handle}
-The first group of tests involve setting up
-So there is three layers to the test. The first is set up and a loop, which
-configures the test and then runs it repeatedly to reduce the impact of
-start-up and shutdown on the results.
-Each iteration of the main loop
+They should provide a guide as to where the EHM's costs are found.
+
+\paragraph{Stack Traversal}
+This group measures the cost of traversing the stack
+(and, for termination, unwinding it).
+Inside the main loop is a call to a recursive function.
+This function calls itself F times before raising an exception.
+F is configurable from the command line, but is usually 100.
+This builds up many stack frames, and any contents they may have,
+before the raise.
+The exception is always handled at the base of the stack.
+For example, the Empty test for \CFA resumption looks like:
+\begin{cfa}
+void unwind_empty(unsigned int frames) {
+	if (frames) {
+		unwind_empty(frames - 1);
+	} else {
+		throwResume (empty_exception){&empty_vt};
+	}
+}
+\end{cfa}
+Other test cases add code around the recursive call to put
+something besides simple stack frames on the stack.
+Note that both termination and resumption have to traverse
+the stack, but only termination has to unwind it.
 \begin{itemize}[nosep]
-\item Empty Function:
+% \item None:
+% Reuses the empty test code (see below) except that the number of frames
+% is set to 0 (this is the only test for which the number of frames is not
+% 100). This isolates the start-up and shut-down time of a throw.
+\item Empty:
 The repeating function is empty except for the necessary control code.
+As the other traversal tests add to this one, it is the baseline for the group:
+its cost comes only from traversing over and unwinding stack frames
+that have no other interactions with the exception system.
 \item Destructor:
 The repeating function creates an object with a destructor before calling
 itself.
+Comparing this to the empty test gives the time to traverse over and/or
+unwind a destructor.
 \item Finally:
 The repeating function calls itself inside a try block with a finally clause
 attached.
+Comparing this to the empty test gives the time to traverse over and/or
+unwind a finally clause.
 \item Other Handler:
 The repeating function calls itself inside a try block with a handler that
-will not match the raised exception. (But is of the same kind of handler.)
+will not match the raised exception, but is of the same kind of handler.
+This means that the EHM has to check each handler, but continues
+past all of them until it reaches the base of the stack.
+Comparing this to the empty test gives the time to traverse over and/or
+unwind a handler.
 \end{itemize}
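+
+For comparison, \Cpp versions of three of these tests might look like the
+following sketch. It mirrors the \CFA example above but is not the benchmark
+source: the types \lstinline{empty_exception}, \lstinline{not_raised_exception}
+and \lstinline{with_destructor}, and the helper \lstinline{one_iteration},
+are hypothetical.
+\begin{lstlisting}[language=C++]
+#include <cassert>
+#include <exception>
+
+// Raised by every test; derives from the language's base exception.
+struct empty_exception : std::exception {};
+// Named by the Other Handler test's handlers, but never raised.
+struct not_raised_exception : std::exception {};
+
+// Empty: recurse "frames" times, then raise.
+void unwind_empty(unsigned int frames) {
+	if (frames) {
+		unwind_empty(frames - 1);
+	} else {
+		throw empty_exception();
+	}
+}
+
+// An object whose destructor must run while the stack is unwound.
+struct with_destructor {
+	~with_destructor() { asm volatile (""); }  // no actual work
+};
+
+// Destructor: each frame holds an object with a destructor.
+void unwind_destructor(unsigned int frames) {
+	if (frames) {
+		with_destructor object;
+		unwind_destructor(frames - 1);
+	} else {
+		throw empty_exception();
+	}
+}
+
+// Other Handler: each frame has a handler that never matches.
+void unwind_other(unsigned int frames) {
+	if (frames) {
+		try {
+			unwind_other(frames - 1);
+		} catch (not_raised_exception &) {
+			asm volatile ("");
+		}
+	} else {
+		throw empty_exception();
+	}
+}
+
+// One iteration of the main loop: the exception is handled at the base.
+bool one_iteration(void (*test)(unsigned int), unsigned int frames) {
+	try {
+		test(frames);
+	} catch (empty_exception &) {
+		return true;
+	}
+	return false;
+}
+\end{lstlisting}
+\Cpp has no finally clause, which is why the Finally test has no \Cpp
+version.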
 
 \paragraph{Cross Try Statement}
-The next group measures the cost of a try statement when no exceptions are
-raised. The test is set-up, then there is a loop to reduce the impact of
-start-up and shutdown on the results.
-In each iteration, a try statement is executed. Entering and leaving a loop
-is all the test wants to do.
+This group of tests measures the cost of setting up exception handling when it
+is not used (because the exceptional case did not occur).
+Tests repeatedly cross (enter and leave, execute) a try statement but never
+perform a raise.
 \begin{itemize}[nosep]
 \item Handler:
-The try statement has a handler (of the matching kind).
+The try statement has a handler (of the appropriate kind).
 \item Finally:
 The try statement has a finally clause.
@@ -107,9 +159,33 @@
 
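+A \Cpp sketch of the Cross Handler test, again with hypothetical names,
+shows that nothing is raised: each iteration only enters and leaves a try
+statement. With the table-driven unwinding \Cpp uses, the handler is reached
+only through the unwinder's tables, so this normal path executes no extra
+instructions for it. The empty inline assembly blocks keep the optimizer
+from removing the otherwise empty loop body.
+\begin{lstlisting}[language=C++]
+#include <cassert>
+#include <exception>
+
+// A handler type that the test never raises.
+struct not_raised_exception : std::exception {};
+
+// Cross Handler: each iteration enters and leaves a try statement with a
+// handler; nothing is ever raised, so the handler never runs.
+unsigned long cross_handler(unsigned long times) {
+	unsigned long crossed = 0;
+	for (unsigned long i = 0; i < times; ++i) {
+		try {
+			asm volatile ("");  // empty inline assembly: no actual work
+			++crossed;
+		} catch (not_raised_exception &) {
+			asm volatile ("");
+		}
+	}
+	return crossed;
+}
+\end{lstlisting}
+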
 \paragraph{Conditional Matching}
-This group of tests checks the cost of conditional matching.
+This group measures the cost of conditional matching.
 Only \CFA implements the language level conditional match,
-the other languages must mimic with an ``unconditional" match (it still
-checks the exception's type) and conditional re-raise if it was not supposed
+the other languages mimic it with an ``unconditional'' match (it still
+checks the exception's type) and conditional re-raise if it is not supposed
 to handle that exception.
+
+The pattern is shown below in \CFA and \Cpp. Java and Python use the same
+pattern as \Cpp, but with their own syntax.
+
+\begin{minipage}{0.45\textwidth}
+\begin{cfa}
+try {
+	...
+} catch (exception_t * e ;
+		should_catch(e)) {
+	...
+}
+\end{cfa}
+\end{minipage}
+\begin{minipage}{0.55\textwidth}
+\begin{lstlisting}[language=C++]
+try {
+	...
+} catch (std::exception & e) {
+	if (!should_catch(e)) throw;
+	...
+}
+\end{lstlisting}
+\end{minipage}
 \begin{itemize}[nosep]
 \item Match All:
@@ -118,4 +194,13 @@
 The condition is always false. (Never matches or always re-raises.)
 \end{itemize}
+
+\paragraph{Resumption Simulation}
+A slightly altered version of the Empty Traversal test is used when comparing
+resumption to fix-up routines.
+The handler, either the actual resumption handler or the fix-up routine,
+always captures a variable at the base of the loop
+and receives a reference to a variable at the raise site, either as a
+field on the exception or as an argument to the fix-up routine.
+% I don't actually know why that is here but not anywhere else.
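+
+The fix-up routine version can be sketched in \Cpp as follows. This is an
+illustration under assumptions, not the benchmark code:
+\lstinline{fixup_empty}, \lstinline{one_fixup_iteration} and the use of
+\lstinline{std::function} to hold the closure are hypothetical choices.
+The routine is created at the base of the stack, captures a variable there,
+and is passed explicitly through every frame; at the raise site it is called
+with a reference to a local variable, and execution then continues at the
+raise site, as it would after a resumption handler returns.
+\begin{lstlisting}[language=C++]
+#include <cassert>
+#include <functional>
+
+// A fix-up routine receives a reference to a variable at the "raise" site.
+using fixup_t = std::function<void(int &)>;
+
+// Same recursion as the Empty traversal test, except that the fix-up
+// routine has to be passed explicitly through every frame.
+void fixup_empty(unsigned int frames, const fixup_t & fixup) {
+	if (frames) {
+		fixup_empty(frames - 1, fixup);
+	} else {
+		int raise_site_value = 0;  // variable at the raise site
+		fixup(raise_site_value);   // "handle" the condition, then continue here
+	}
+}
+
+// One iteration of the main loop: the closure is created at the base of
+// the stack and captures a variable there.
+int one_fixup_iteration(unsigned int frames) {
+	int base_value = 0;  // captured by reference at the base
+	fixup_empty(frames, [&base_value](int & value) {
+		base_value += 1;     // update the captured variable
+		value = base_value;  // update the raise-site variable through the reference
+	});
+	return base_value;
+}
+\end{lstlisting}
+Holding the closure in \lstinline{std::function} adds type erasure and an
+indirect call; a template parameter, or a function pointer paired with a
+context pointer, are alternatives with different costs.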
 
 %\section{Cost in Size}
@@ -130,174 +215,232 @@
 
 \section{Results}
-Each test was run eleven times. The top three and bottom three results were
-discarded and the remaining five values are averaged.
-
-In cases where a feature is not supported by a language the test is skipped
-for that language. Similarly, if a test is does not change between resumption
-and termination in \CFA, then only one test is written and the result
-was put into the termination column.
-
-% Raw Data:
-% run-algol-a.sat
-% ---------------
-% Raise Empty   &  82687046678 &  291616256 &   3252824847 & 15422937623 & 14736271114 \\
-% Raise D'tor   & 219933199603 &  297897792 & 223602799362 &         N/A &         N/A \\
-% Raise Finally & 219703078448 &  298391745 &          N/A &         ... & 18923060958 \\
-% Raise Other   & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
-% Cross Handler &      9256648 &   13518430 &       769328 &     3486252 &    31790804 \\
-% Cross Finally &       769319 &        N/A &          N/A &     2272831 &    37491962 \\
-% Match All     &   3654278402 &   47518560 &   3218907794 &  1296748192 &   624071886 \\
-% Match None    &   4788861754 &   58418952 &   9458936430 &  1318065020 &   625200906 \\
-%
-% run-algol-thr-c
-% ---------------
-% Raise Empty   &   3757606400 &   36472972 &   3257803337 & 15439375452 & 14717808642 \\
-% Raise D'tor   &  64546302019 &  102148375 & 223648121635 &         N/A &         N/A \\
-% Raise Finally &  64671359172 &  103285005 &          N/A & 15442729458 & 18927008844 \\
-% Raise Other   & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
-% Cross Handler &      9646462 &   11955668 &       769328 &     3453707 &    31864074 \\
-% Cross Finally &       773412 &        N/A &          N/A &     2253825 &    37266476 \\
-% Match All     &   3719462155 &   43294042 &   3223004977 &  1286054154 &   623887874 \\
-% Match None    &   4971630929 &   55311709 &   9481225467 &  1310251289 &   623752624 \\
-%
-% run-algol-04-a
-% --------------
-% Raise Empty   & 0.0 & 0.0 &  3250260945 & 0.0 & 0.0 \\
-% Raise D'tor   & 0.0 & 0.0 & 29017675113 & N/A & N/A \\
-% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
-% Raise Other   & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\
-% Cross Handler & 0.0 & 0.0 &      769334 & 0.0 & 0.0 \\
-% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
-% Match All     & 0.0 & 0.0 &  3254283504 & 0.0 & 0.0 \\
-% Match None    & 0.0 & 0.0 &  9476060146 & 0.0 & 0.0 \\
-
-\begin{tabular}{|l|c c c c c|}
-\hline
-              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
-\hline
-Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
-Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
-Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
-Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+% First, introduce the tables.
+\autoref{t:PerformanceTermination},
+\autoref{t:PerformanceResumption}
+and~\autoref{t:PerformanceFixupRoutines}
+show the test results.
+In cases where a feature is not supported by a language, the test is skipped
+for that language and the result is marked N/A.
+There are also cases where the feature is supported but measuring its
+cost is impossible. This happened with Java, which uses a JIT that optimizes
+away the tests, and this optimization cannot be prevented~\cite{Dice21}.
+These tests are marked N/C.
+To get results in a consistent range (1 second to 1 minute is ideal;
+going higher is better than going lower), N, the number of iterations of the
+main loop in each test, is varied between tests. N is also given in the
+results and is usually in the millions.
+
+An anomaly in some results came from \CFA's use of gcc nested functions.
+These nested functions are used to create closures that can access stack
+variables in their lexical scope.
+However, if they do so, they can cause the benchmark's run-time to
+increase by an order of magnitude.
+The simplest solution is to make those values global variables instead
+of function-local variables.
+% Do we know if editing a global inside nested function is a problem?
+Tests that had to be modified to avoid this problem have been marked
+with a ``*'' in the results.
+
+% Now come the tables themselves:
+% You might need a wider window for this.
+
+\begin{table}[htb]
+\centering
+\caption{Termination Performance Results (sec)}
+\label{t:PerformanceTermination}
+\begin{tabular}{|r|*{2}{|r r r r|}}
+\hline
+                       & \multicolumn{4}{c||}{AMD}         & \multicolumn{4}{c|}{ARM}  \\
+\cline{2-9}
+N\hspace{8pt}          & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
+                         \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
+\hline
+Empty Traversal (1M)   & 3.4   & 2.8   & 18.3  & 23.4      & 3.7   & 3.2   & 15.5  & 14.8  \\
+D'tor Traversal (1M)   & 48.4  & 23.6  & N/A   & N/A       & 64.2  & 29.0  & N/A   & N/A   \\
+Finally Traversal (1M) & 3.4*  & N/A   & 17.9  & 29.0      & 4.1*  & N/A   & 15.6  & 19.0  \\
+Other Traversal (1M)   & 3.6*  & 23.2  & 18.2  & 32.7      & 4.0*  & 24.5  & 15.5  & 21.4  \\
+Cross Handler (100M)   & 6.0   & 0.9   & N/C   & 37.4      & 10.0  & 0.8   & N/C   & 32.2  \\
+Cross Finally (100M)   & 0.9   & N/A   & N/C   & 44.1      & 0.8   & N/A   & N/C   & 37.3  \\
+Match All (10M)        & 32.9  & 20.7  & 13.4  & 4.9       & 36.2  & 24.5  & 12.0  & 3.1   \\
+Match None (10M)       & 32.7  & 50.3  & 11.0  & 5.1       & 36.3  & 71.9  & 12.3  & 4.2   \\
 \hline
 \end{tabular}
-
-% run-plg7a-a.sat
-% ---------------
-% Raise Empty   &  57169011329 &  296612564 &   2788557155 & 17511466039 & 23324548496 \\
-% Raise D'tor   & 150599858014 &  318443709 & 149651693682 &         N/A &         N/A \\
-% Raise Finally & 148223145000 &  373325807 &          N/A &         ... & 29074552998 \\
-% Raise Other   & 189463708732 & 3017109322 &  85819281694 & 17584295487 & 32602686679 \\
-% Cross Handler &      8001654 &   13584858 &      1555995 &     6626775 &    41927358 \\
-% Cross Finally &      1002473 &        N/A &          N/A &     4554344 &    51114381 \\
-% Match All     &   3162460860 &   37315018 &   2649464591 &  1523205769 &   742374509 \\
-% Match None    &   4054773797 &   47052659 &   7759229131 &  1555373654 &   744656403 \\
-%
-% run-plg7a-thr-a
-% ---------------
-% Raise Empty   &   3604235388 &   29829965 &   2786931833 & 17576506385 & 23352975105 \\
-% Raise D'tor   &  46552380948 &  178709605 & 149834207219 &         N/A &         N/A \\
-% Raise Finally &  46265157775 &  177906320 &          N/A & 17493045092 & 29170962959 \\
-% Raise Other   & 195659245764 & 2376968982 &  86070431924 & 17552979675 & 32501882918 \\
-% Cross Handler &    397031776 &   12503552 &      1451225 &     6658628 &    42304965 \\
-% Cross Finally &      1136746 &        N/A &          N/A &     4468799 &    46155817 \\
-% Match All     &   3189512499 &   39124453 &   2667795989 &  1525889031 &   733785613 \\
-% Match None    &   4094675477 &   48749857 &   7850618572 &  1566713577 &   733478963 \\
-%
-% run-plg7a-04-a
-% --------------
-% 0.0 are unfilled.
-% Raise Empty   & 0.0 & 0.0 &  2770781479 & 0.0 & 0.0 \\
-% Raise D'tor   & 0.0 & 0.0 & 23530084907 & N/A & N/A \\
-% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
-% Raise Other   & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\
-% Cross Handler & 0.0 & 0.0 &     1422188 & 0.0 & 0.0 \\
-% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
-% Match All     & 0.0 & 0.0 &  2671989778 & 0.0 & 0.0 \\
-% Match None    & 0.0 & 0.0 &  7829059869 & 0.0 & 0.0 \\
-
-% PLG7A (in seconds)
-\begin{tabular}{|l|c c c c c|}
-\hline
-              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
-\hline
-Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
-Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
-Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
-Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
-Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+\end{table}
+
+\begin{table}[htb]
+\centering
+\caption{Resumption Performance Results (sec)}
+\label{t:PerformanceResumption}
+\begin{tabular}{|r||r||r|}
+\hline
+N\hspace{8pt}
+                        & AMD     & ARM  \\
+\hline
+Empty Traversal (10M)   & 0.2     & 0.3  \\
+D'tor Traversal (10M)   & 1.8     & 1.0  \\
+Finally Traversal (10M) & 1.7     & 1.0  \\
+Other Traversal (10M)   & 22.6    & 25.9 \\
+Cross Handler (100M)    & 8.4     & 11.9 \\
+Match All (100M)        & 2.3     & 3.2  \\
+Match None (100M)       & 2.9     & 3.9  \\
 \hline
 \end{tabular}
-
-One result that is not directly related to \CFA but is important to keep in
-mind is that in exceptions the standard intuitions about which languages
-should go faster often do not hold. There are cases where Python out-preforms
-\Cpp and Java. The most likely explination is that, since exceptions are
-rarely considered to be the common case, the more optimized langages have 
-optimized at their expence. In addition languages with high level            
-repersentations have a much easier time scanning the stack as there is less
-to decode.
-
-This means that while \CFA does not actually keep up with Python in every
-case it is usually no worse than roughly half the speed of \Cpp. This is good
-enough for the prototyping purposes of the project.
-
-The test case where \CFA falls short is Raise Other, the case where the
-stack is unwound including a bunch of non-matching handlers.
-This slowdown seems to come from missing optimizations,
-the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp)
-but the results change if they are run in gcc/g++ 9 instead.
-Importantly, there is a huge slowdown in \Cpp's results bringing that brings
-\CFA's performace back in that roughly half speed area. However many other
-\CFA benchmarks increase their run-time by a similar amount falling far
-behind their \Cpp counter-parts.
-
-This suggests that the performance issue in Raise Other is just an
-optimization not being applied. Later versions of gcc may be able to
-optimize this case further, at least down to the half of \Cpp mark.
-A \CFA compiler that directly produced assembly could do even better as it
-would not have to work across some of \CFA's current abstractions, like
-the try terminate function.
-
-Resumption exception handling is also incredibly fast. Often an order of
-magnitude or two better than the best termination speed.
-There is a simple explination for this; traversing a linked list is much   
-faster than examining and unwinding the stack. When resumption does not do as
-well its when more try statements are used per raise. Updating the interal
-linked list is not very expencive but it does add up.
-
-The relative speed of the Match All and Match None tests (within each
-language) can also show the effectiveness conditional matching as compared
-to catch and rethrow.
-\begin{itemize}[nosep]
-\item
-Java and Python get similar values in both tests.
-Between the interperated code, a higher level repersentation of the call
-stack and exception reuse it it is possible the cost for a second
-throw can be folded into the first.
-% Is this due to optimization?
-\item
-Both types of \CFA are slighly slower if there is not a match.
-For termination this likely comes from unwinding a bit more stack through
-libunwind instead of executing the code normally.
-For resumption there is extra work in traversing more of the list and running
-more checks for a matching exceptions.
-% Resumption is a bit high for that but this is my best theory.
-\item
-Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
-just the catch. This is very high, but it does have to repeat the same
-process of unwinding the stack and may have to parse the LSDA of the function
-with the catch and rethrow twice, once before the catch and once after the
-rethrow.
-% I spent a long time thinking of what could push it over twice, this is all
-% I have to explain it.
-\end{itemize}
-The difference in relative performance does show that there are savings to
-be made by performing the check without catching the exception.
+\end{table}
+
+\begin{table}[htb]
+\centering
+\small
+\caption{Resumption/Fixup Routine Comparison (sec)}
+\label{t:PerformanceFixupRoutines}
+\setlength{\tabcolsep}{5pt}
+\begin{tabular}{|r|*{2}{|r r r r r|}}
+\hline
+            & \multicolumn{5}{c||}{AMD}     & \multicolumn{5}{c|}{ARM}  \\
+\cline{2-11}
+N\hspace{8pt}       & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
+              \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
+\hline
+Resume Empty (10M)  & 3.8  & 3.5  & 14.7  & 2.3   & 176.1 & 0.3  & 0.1  & 8.9   & 1.2   & 119.9 \\
+%Resume Other (10M)  & 4.0* & 0.1* & 21.9  & 6.2   & 381.0 & 0.3* & 0.1* & 13.2  & 5.0   & 290.7 \\
+\hline
+\end{tabular}
+\end{table}
+
+% Now discuss the results in the tables.
+One result not directly related to \CFA, but important to keep in mind, is
+that for exceptions the standard intuition about which languages should go
+faster often does not hold.
+For example, there are a few cases where Python outperforms
+\CFA, \Cpp and Java.
+The most likely explanation is that, since exceptions
+are rarely considered to be the common case, the more optimized languages
+make that case expensive in order to improve other cases.
+In addition, languages with high-level representations have a much
+easier time scanning the stack as there is less to decode.
+
+As stated,
+the performance tests are not attempting to show \CFA has a new competitive
+way of implementing exception handling.
+The only performance requirement is to ensure the \CFA EHM has reasonable
+performance for prototyping.
+Although that may be hard to quantify exactly, we believe it has succeeded
+in that regard.
+Details on the different test cases follow.
+
+\begin{description}
+\item[Empty Traversal]
+\CFA is slower than \Cpp, but is still faster than the other languages
+and closer to \Cpp than to them.
+This is expected, as \CFA and \Cpp use the same unwinding framework, libunwind.
+
+\item[D'tor Traversal]
+Running destructors causes a huge slowdown in every language that supports
+them. \CFA has a higher proportional slowdown, but it is similar to \Cpp's.
+Considering how little work is done in the destructors, the cost
+likely comes from the change of context required to do that work.
+
+\item[Finally Traversal]
+Speed is similar to Empty Traversal in all languages that support finally
+clauses. Only Python shows a change in run-time larger than random noise,
+and even that change is not large.
+Despite the similarity between finally clauses and destructors,
+finally clauses seem to avoid the run-time spike that destructors have.
+Possibly some optimization removes the cost of changing contexts.
+\todo{OK, I think the finally clause may have been optimized out.}
+
+\item[Other Traversal]
+For \Cpp, stopping to check if a handler applies seems to be about as
+expensive as stopping to run a destructor.
+This results in a significant jump.
+
+Other languages experience a small increase in run-time.
+The small increase likely comes from running the checks;
+these languages may avoid the spike because they lack the same kind of
+overhead for switching to the check's context.
+
+\todo{Could revisit Other Traversal, after Finally Traversal.}
+
+\item[Cross Handler]
+Here \CFA falls behind \Cpp by a much more significant margin.
+This is likely because \CFA has to insert two extra function
+calls, while \Cpp does not have to execute any extra instructions.
+Python is much further behind.
+
+\item[Cross Finally]
+\CFA's performance now matches \Cpp's from Cross Handler.
+If the code from the finally clause, which is just an asm comment,
+is being inlined, then there are no additional instructions
+to execute when exiting the try statement normally.
+
+\item[Conditional Match]
+Both conditional matching tests can be considered on their own;
+however, for evaluating the value of conditional matching itself, the
+comparison of the two sets of results is useful.
+Consider the massive jump in run-time for \Cpp going from Match All to Match
+None, which none of the other languages have.
+Some strange interaction is causing run-time to more than double for doing
+twice as many raises.
+Java and Python avoid this problem and have similar run-time for both tests,
+possibly through resource reuse or their program representation.
+\CFA, although built like \Cpp, avoids the problem as well:
+conditional matching skips a non-matching handler instead of catching and
+re-raising, which makes the two execution paths much more similar.
+
+\end{description}
+
+Moving on to resumption, there is one general note:
+resumption is \textit{fast}. The only test where it fell
+behind termination is Cross Handler.
+In every other case, the number of iterations had to be increased by a
+factor of 10 to get the run-time into an appropriate range,
+and in some cases resumption still took less time.
+
+% I tried \paragraph and \subparagraph, maybe if I could adjust spacing
+% between paragraphs those would work.
+\begin{description}
+\item[Empty Traversal]
+See above for the general speed-up notes.
+This result is not surprising, as resumption's linked-list approach
+means that traversing over stack frames without a resumption handler is
+$O(1)$.
+
+\item[D'tor Traversal]
+Resumption does have the same spike in run-time that termination has.
+The run-time is actually very similar to Finally Traversal.
+As resumption does not unwind the stack, both destructors and finally
+clauses are run while walking down the stack normally,
+so it follows that their performance is similar.
+
+\item[Finally Traversal]
+The increase in run-time from Empty Traversal (once adjusted for
+the number of iterations) is roughly the same as for termination.
+This suggests that the
+
+\item[Other Traversal]
+Traversing across handlers reduces resumption's advantage as it actually
+has to stop and check each one.
+Resumption still came out ahead (adjusting for iterations), but by much less
+than in the other cases.
+
+\item[Cross Handler]
+This is the only test case where resumption could not keep up with
+termination, although the difference is not as large as in many other cases.
+It is simply a matter of where the costs come from: even if \CFA termination
+is not ``zero-cost'', passing through an empty function still seems to be
+cheaper than updating global values.
+
+\item[Conditional Match]
+Resumption shows a slight slowdown if the exception is not matched
+by the first handler, which follows from the fact that the second handler now
+has to be checked. However, the difference is not large.
+
+\end{description}
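+
+The linked-list approach mentioned above can be illustrated with a small
+\Cpp model. It is a sketch under stated assumptions, not \CFA's
+implementation: handlers live on a hypothetical per-thread list
+(\lstinline{handler_list}), a try statement is modelled by a guard object
+(\lstinline{try_guard}) that pushes a node on entry and pops it on exit, and
+a raise (\lstinline{resume}) walks the list and runs the first matching
+handler without unwinding. Hiding handlers while one of them runs is omitted.
+\begin{lstlisting}[language=C++]
+#include <cassert>
+
+// Hypothetical exception type used by the model.
+struct exception_t { int code; };
+
+// One entry per active try statement with a resumption handler.
+struct handler_node {
+	bool (*matches)(const exception_t &);  // (conditional) match check
+	void (*handle)(exception_t &);         // handler body
+	handler_node * next;                   // next older handler
+};
+
+// Head of this thread's handler list (newest handler first).
+thread_local handler_node * handler_list = nullptr;
+
+// Models entering and leaving a try statement: two list updates.
+struct try_guard {
+	handler_node node;
+	try_guard(bool (*m)(const exception_t &), void (*h)(exception_t &))
+		: node{m, h, handler_list} { handler_list = &node; }
+	~try_guard() { handler_list = node.next; }
+	try_guard(const try_guard &) = delete;
+	try_guard & operator=(const try_guard &) = delete;
+};
+
+// Models a resumption raise: only registered handlers are visited, so stack
+// frames without handlers cost nothing, and the stack is not unwound.
+// Simplification: handlers are not hidden while a handler runs.
+bool resume(exception_t & ex) {
+	for (handler_node * n = handler_list; n != nullptr; n = n->next) {
+		if (n->matches(ex)) {
+			n->handle(ex);  // runs on top of the stack; control returns here
+			return true;
+		}
+	}
+	return false;
+}
+\end{lstlisting}
+In this model a frame without a handler adds no work to a raise, while every
+crossing of a try statement pays for two list updates, which matches the
+pattern seen in the Empty Traversal and Cross Handler results.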
+
+Finally, there are the results of the resumption/fixup routine comparison.
+These results are surprisingly varied; it is possible that creating a closure
+has more to do with performance than passing the argument through layers of
+calls.
+Even with 100 stack frames, though, resumption is only about as fast as
+manually passing a fixup routine,
+so there is a cost for the additional power and flexibility exceptions
+provide.
