\chapter{Performance}
\label{c:performance}

Performance has been of secondary importance for most of this project.
Instead, the focus has been to get the features working. The only performance
requirements is to ensure the tests for correctness run in a reasonable
amount of time.

\section{Test Set-Up}
Tests will be run in \CFA, C++, Java and Python.
In addition there are two sets of tests for \CFA,
one for termination exceptions and once with resumption exceptions.

C++ is the most comparable language because both it and \CFA use the same
framework, libunwind.
In fact, the comparison is almost entirely a quality of implementation
comparison. \CFA's EHM has had significantly less time to be optimized and
does not generate its own assembly. It does have a slight advantage in that
there are some features it does not handle, through utility functions,
but otherwise \Cpp has a significant advantage.

Java is another very popular language with similar termination semantics.
It is implemented in a very different environment, a virtual machine with
garbage collection.
It also implements the finally clause on try blocks allowing for a direct
feature-to-feature comparison.
As with \Cpp, Java's implementation is more mature, has more optimizations
and more extra features.

Python was used as a point of comparison because of the \CFA EHM's
current performance goals, which is not be prohibitively slow while the
features are designed and examined. Python has similar performance goals for
creating quick scripts and its wide use suggests it has achieved those goals.

Unfortunately there are no notable modern programming languages with
resumption exceptions. Even the older programming languages with resumptions
seem to be notable only for having resumptions.
So instead resumptions are compared to a less similar but much more familiar
feature, termination exceptions.

All tests are run inside a main loop which will perform the test
repeatedly. This is to avoids start-up or tear-down time from
affecting the timing results.
Tests ran their main loop a million times.
The Java versions of the test also run this loop an extra 1000 times before
beginning to time the results to ``warm-up" the JVM.

Timing is done internally, with time measured immediately before and
immediately after the test loop. The difference is calculated and printed.

The loop structure and internal timing means it is impossible to test
unhandled exceptions in \Cpp and Java as that would cause the process to
terminate.
Luckily, performance on the ``give-up and kill the process" path is not
critical.

The exceptions used in these tests will always be a exception based off of
the base exception. This requirement minimizes performance differences based
on the object model used to repersent the exception.

All tests were designed to be as minimal as possible while still preventing
exessive optimizations.
For example, empty inline assembly blocks are used in \CFA and \Cpp to
prevent excessive optimizations while adding no actual work.

% We don't use catch-alls but if we did:
% Catch-alls are done by catching the root exception type (not using \Cpp's
% \code{C++}{catch(...)}).

\section{Tests}
The following tests were selected to test the performance of different
components of the exception system.
The should provide a guide as to where the EHM's costs can be found.

\paragraph{Raise and Handle}
The first group of tests involve setting up
So there is three layers to the test. The first is set up and a loop, which
configures the test and then runs it repeatedly to reduce the impact of
start-up and shutdown on the results.
Each iteration of the main loop
\begin{itemize}[nosep]
\item Empty Function:
The repeating function is empty except for the necessary control code.
\item Destructor:
The repeating function creates an object with a destructor before calling
itself.
\item Finally:
The repeating function calls itself inside a try block with a finally clause
attached.
\item Other Handler:
The repeating function calls itself inside a try block with a handler that
will not match the raised exception. (But is of the same kind of handler.)
\end{itemize}

\paragraph{Cross Try Statement}
The next group measures the cost of a try statement when no exceptions are
raised. The test is set-up, then there is a loop to reduce the impact of
start-up and shutdown on the results.
In each iteration, a try statement is executed. Entering and leaving a loop
is all the test wants to do.
\begin{itemize}[nosep]
\item Handler:
The try statement has a handler (of the matching kind).
\item Finally:
The try statement has a finally clause.
\end{itemize}

\paragraph{Conditional Matching}
This group of tests checks the cost of conditional matching.
Only \CFA implements the language level conditional match,
the other languages must mimic with an ``unconditional" match (it still
checks the exception's type) and conditional re-raise if it was not supposed
to handle that exception.
\begin{itemize}[nosep]
\item Match All:
The condition is always true. (Always matches or never re-raises.)
\item Match None:
The condition is always false. (Never matches or always re-raises.)
\end{itemize}

%\section{Cost in Size}
%Using exceptions also has a cost in the size of the executable.
%Although it is sometimes ignored
%
%There is a size cost to defining a personality function but the later problem
%is the LSDA which will be generated for every function.
%
%(I haven't actually figured out how to compare this, probably using something
%related to -fexceptions.)

\section{Results}
Each test was run eleven times. The top three and bottom three results were
discarded and the remaining five values are averaged.

In cases where a feature is not supported by a language the test is skipped
for that language. Similarly, if a test is does not change between resumption
and termination in \CFA, then only one test is written and the result
was put into the termination column.

% Raw Data:
% run-algol-a.sat
% ---------------
% Raise Empty   &  82687046678 &  291616256 &   3252824847 & 15422937623 & 14736271114 \\
% Raise D'tor   & 219933199603 &  297897792 & 223602799362 &         N/A &         N/A \\
% Raise Finally & 219703078448 &  298391745 &          N/A &         ... & 18923060958 \\
% Raise Other   & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
% Cross Handler &      9256648 &   13518430 &       769328 &     3486252 &    31790804 \\
% Cross Finally &       769319 &        N/A &          N/A &     2272831 &    37491962 \\
% Match All     &   3654278402 &   47518560 &   3218907794 &  1296748192 &   624071886 \\
% Match None    &   4788861754 &   58418952 &   9458936430 &  1318065020 &   625200906 \\
%
% run-algol-thr-c
% ---------------
% Raise Empty   &   3757606400 &   36472972 &   3257803337 & 15439375452 & 14717808642 \\
% Raise D'tor   &  64546302019 &  102148375 & 223648121635 &         N/A &         N/A \\
% Raise Finally &  64671359172 &  103285005 &          N/A & 15442729458 & 18927008844 \\
% Raise Other   & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
% Cross Handler &      9646462 &   11955668 &       769328 &     3453707 &    31864074 \\
% Cross Finally &       773412 &        N/A &          N/A &     2253825 &    37266476 \\
% Match All     &   3719462155 &   43294042 &   3223004977 &  1286054154 &   623887874 \\
% Match None    &   4971630929 &   55311709 &   9481225467 &  1310251289 &   623752624 \\
%
% run-algol-04-a
% --------------
% Raise Empty   & 0.0 & 0.0 &  3250260945 & 0.0 & 0.0 \\
% Raise D'tor   & 0.0 & 0.0 & 29017675113 & N/A & N/A \\
% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
% Raise Other   & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\
% Cross Handler & 0.0 & 0.0 &      769334 & 0.0 & 0.0 \\
% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
% Match All     & 0.0 & 0.0 &  3254283504 & 0.0 & 0.0 \\
% Match None    & 0.0 & 0.0 &  9476060146 & 0.0 & 0.0 \\

\begin{tabular}{|l|c c c c c|}
\hline
              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
\hline
Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
\hline
\end{tabular}

% run-plg7a-a.sat
% ---------------
% Raise Empty   &  57169011329 &  296612564 &   2788557155 & 17511466039 & 23324548496 \\
% Raise D'tor   & 150599858014 &  318443709 & 149651693682 &         N/A &         N/A \\
% Raise Finally & 148223145000 &  373325807 &          N/A &         ... & 29074552998 \\
% Raise Other   & 189463708732 & 3017109322 &  85819281694 & 17584295487 & 32602686679 \\
% Cross Handler &      8001654 &   13584858 &      1555995 &     6626775 &    41927358 \\
% Cross Finally &      1002473 &        N/A &          N/A &     4554344 &    51114381 \\
% Match All     &   3162460860 &   37315018 &   2649464591 &  1523205769 &   742374509 \\
% Match None    &   4054773797 &   47052659 &   7759229131 &  1555373654 &   744656403 \\
%
% run-plg7a-thr-a
% ---------------
% Raise Empty   &   3604235388 &   29829965 &   2786931833 & 17576506385 & 23352975105 \\
% Raise D'tor   &  46552380948 &  178709605 & 149834207219 &         N/A &         N/A \\
% Raise Finally &  46265157775 &  177906320 &          N/A & 17493045092 & 29170962959 \\
% Raise Other   & 195659245764 & 2376968982 &  86070431924 & 17552979675 & 32501882918 \\
% Cross Handler &    397031776 &   12503552 &      1451225 &     6658628 &    42304965 \\
% Cross Finally &      1136746 &        N/A &          N/A &     4468799 &    46155817 \\
% Match All     &   3189512499 &   39124453 &   2667795989 &  1525889031 &   733785613 \\
% Match None    &   4094675477 &   48749857 &   7850618572 &  1566713577 &   733478963 \\
%
% run-plg7a-04-a
% --------------
% 0.0 are unfilled.
% Raise Empty   & 0.0 & 0.0 &  2770781479 & 0.0 & 0.0 \\
% Raise D'tor   & 0.0 & 0.0 & 23530084907 & N/A & N/A \\
% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
% Raise Other   & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\
% Cross Handler & 0.0 & 0.0 &     1422188 & 0.0 & 0.0 \\
% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
% Match All     & 0.0 & 0.0 &  2671989778 & 0.0 & 0.0 \\
% Match None    & 0.0 & 0.0 &  7829059869 & 0.0 & 0.0 \\

% PLG7A (in seconds)
\begin{tabular}{|l|c c c c c|}
\hline
              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
\hline
Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
\hline
\end{tabular}

One result that is not directly related to \CFA but is important to keep in
mind is that in exceptions the standard intuitions about which languages
should go faster often do not hold. There are cases where Python out-preforms
\Cpp and Java. The most likely explination is that, since exceptions are
rarely considered to be the common case, the more optimized langages have 
optimized at their expence. In addition languages with high level            
repersentations have a much easier time scanning the stack as there is less
to decode.

This means that while \CFA does not actually keep up with Python in every
case it is usually no worse than roughly half the speed of \Cpp. This is good
enough for the prototyping purposes of the project.

The test case where \CFA falls short is Raise Other, the case where the
stack is unwound including a bunch of non-matching handlers.
This slowdown seems to come from missing optimizations,
the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp)
but the results change if they are run in gcc/g++ 9 instead.
Importantly, there is a huge slowdown in \Cpp's results bringing that brings
\CFA's performace back in that roughly half speed area. However many other
\CFA benchmarks increase their run-time by a similar amount falling far
behind their \Cpp counter-parts.

This suggests that the performance issue in Raise Other is just an
optimization not being applied. Later versions of gcc may be able to
optimize this case further, at least down to the half of \Cpp mark.
A \CFA compiler that directly produced assembly could do even better as it
would not have to work across some of \CFA's current abstractions, like
the try terminate function.

Resumption exception handling is also incredibly fast. Often an order of
magnitude or two better than the best termination speed.
There is a simple explination for this; traversing a linked list is much   
faster than examining and unwinding the stack. When resumption does not do as
well its when more try statements are used per raise. Updating the interal
linked list is not very expencive but it does add up.

The relative speed of the Match All and Match None tests (within each
language) can also show the effectiveness conditional matching as compared
to catch and rethrow.
\begin{itemize}[nosep]
\item
Java and Python get similar values in both tests.
Between the interperated code, a higher level repersentation of the call
stack and exception reuse it it is possible the cost for a second
throw can be folded into the first.
% Is this due to optimization?
\item
Both types of \CFA are slighly slower if there is not a match.
For termination this likely comes from unwinding a bit more stack through
libunwind instead of executing the code normally.
For resumption there is extra work in traversing more of the list and running
more checks for a matching exceptions.
% Resumption is a bit high for that but this is my best theory.
\item
Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
just the catch. This is very high, but it does have to repeat the same
process of unwinding the stack and may have to parse the LSDA of the function
with the catch and rethrow twice, once before the catch and once after the
rethrow.
% I spent a long time thinking of what could push it over twice, this is all
% I have to explain it.
\end{itemize}
The difference in relative performance does show that there are savings to
be made by performing the check without catching the exception.