Index: doc/theses/andrew_beach_MMath/features.tex
===================================================================
--- doc/theses/andrew_beach_MMath/features.tex	(revision be497c6a13beb0f6db241a5dc3448ff27dadb004)
+++ doc/theses/andrew_beach_MMath/features.tex	(revision 3b8acfbe95c3db3794731829fed72ce38aee0c4f)
@@ -124,4 +124,5 @@
 
 \section{Virtuals}
+\label{s:virtuals}
 Virtual types and casts are not part of \CFA's EHM nor are they required for
 any EHM.
Index: doc/theses/andrew_beach_MMath/performance.tex
===================================================================
--- doc/theses/andrew_beach_MMath/performance.tex	(revision be497c6a13beb0f6db241a5dc3448ff27dadb004)
+++ doc/theses/andrew_beach_MMath/performance.tex	(revision 3b8acfbe95c3db3794731829fed72ce38aee0c4f)
@@ -2,20 +2,20 @@
 \label{c:performance}
 
-Performance has been of secondary importance for most of this project.
-Instead, the focus has been to get the features working. The only performance
-requirements is to ensure the tests for correctness run in a reasonable
+Performance is of secondary importance for most of this project.
+Instead, the focus is to get the features working. The only performance
+requirement is to ensure the tests for correctness run in a reasonable
 amount of time.
 
 \section{Test Set-Up}
-Tests will be run in \CFA, C++, Java and Python.
+Tests were run in \CFA, C++, Java and Python.
 In addition there are two sets of tests for \CFA,
-one for termination exceptions and once with resumption exceptions.
+one for termination and one for resumption exceptions.
 
 C++ is the most comparable language because both it and \CFA use the same
 framework, libunwind.
 In fact, the comparison is almost entirely a quality of implementation
-comparison. \CFA's EHM has had significantly less time to be optimized and
+comparison: \CFA's EHM has had significantly less time to be optimized and
 does not generate its own assembly. It does have a slight advantage in that
-there are some features it does not handle, through utility functions,
+there are some features it handles directly instead of through utility functions,
 but otherwise \Cpp has a significant advantage.
 
@@ -23,30 +23,29 @@
 It is implemented in a very different environment, a virtual machine with
 garbage collection.
-It also implements the finally clause on try blocks allowing for a direct
+It also implements the @finally@ clause on @try@ blocks allowing for a direct
 feature-to-feature comparison.
-As with \Cpp, Java's implementation is more mature, has more optimizations
-and more extra features.
-
-Python was used as a point of comparison because of the \CFA EHM's
-current performance goals, which is not be prohibitively slow while the
+As with \Cpp, Java's implementation is mature, has more optimizations
+and has extra features.
+
+Python is used as an alternative point of comparison because of the \CFA EHM's
+current performance goal, which is not to be prohibitively slow while the
 features are designed and examined. Python has similar performance goals for
 creating quick scripts and its wide use suggests it has achieved those goals.
 
-Unfortunately there are no notable modern programming languages with
-resumption exceptions. Even the older programming languages with resumptions
-seem to be notable only for having resumptions.
-So instead resumptions are compared to a less similar but much more familiar
+Unfortunately, there are no notable modern programming languages with
+resumption exceptions. Even the older programming languages with resumption
+seem to be notable only for having resumption.
+So instead, resumption is compared to a less similar but much more familiar
 feature, termination exceptions.
 
-All tests are run inside a main loop which will perform the test
-repeatedly. This is to avoids start-up or tear-down time from
+All tests are run inside a main loop that repeatedly performs a test.
+This approach avoids start-up or tear-down time from
 affecting the timing results.
-Tests ran their main loop a million times.
-The Java versions of the test also run this loop an extra 1000 times before
-beginning to time the results to ``warm-up" the JVM.
+Each test runs its main loop a million times.
+The Java versions of the test run this loop an extra 1000 times before
+beginning the actual test to ``warm up'' the JVM.
 
 Timing is done internally, with time measured immediately before and
-immediately after the test loop. The difference is calculated and printed.
-
+after the test loop. The difference is calculated and printed.
 The loop structure and internal timing means it is impossible to test
 unhandled exceptions in \Cpp and Java as that would cause the process to
@@ -55,12 +54,23 @@
 critical.
 
-The exceptions used in these tests will always be a exception based off of
-the base exception. This requirement minimizes performance differences based
-on the object model used to repersent the exception.
-
-All tests were designed to be as minimal as possible while still preventing
-exessive optimizations.
+The exceptions used in these tests are always based on
+a base exception. This requirement minimizes performance differences based
+on the object model used to represent the exception.
+
+All tests are designed to be as minimal as possible, while still preventing
+excessive optimizations.
 For example, empty inline assembly blocks are used in \CFA and \Cpp to
 prevent excessive optimizations while adding no actual work.
+Each test was run eleven times. The top three and bottom three results were
+discarded and the remaining five values were averaged.
+
+The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. The Java tests
+use Java 11.0.11 and the Python tests use Python 3.8. The tests were run on:
+\begin{itemize}[nosep]
+\item
+ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25
+\item
+AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25
+\end{itemize}
 
 % We don't use catch-alls but if we did:
@@ -71,45 +81,97 @@
 The following tests were selected to test the performance of different
 components of the exception system.
-The should provide a guide as to where the EHM's costs can be found.
+They should provide a guide as to where the EHM's costs are found.
 
 \paragraph{Raise and Handle}
-The first group of tests involve setting up
-So there is three layers to the test. The first is set up and a loop, which
-configures the test and then runs it repeatedly to reduce the impact of
-start-up and shutdown on the results.
-Each iteration of the main loop
-\begin{itemize}[nosep]
-\item Empty Function:
-The repeating function is empty except for the necessary control code.
+The first group measures the cost of a try statement when exceptions are raised
+and \emph{the stack is unwound}.  Each test has a repeating function like
+the following
+\begin{cfa}
+void unwind_empty(unsigned int frames) {
+	if (frames) {
+		unwind_empty(frames - 1);
+	} else throw (empty_exception){&empty_vt};
+}
+\end{cfa}
+which is called M times, where each call recurses to a depth of N, an
+exception is raised, the stack is unwound, and the exception is caught.
+\begin{itemize}[nosep]
+\item Empty:
+This test measures the cost of raising (stack walking) an exception through
+empty stack frames to an empty handler (see above).
 \item Destructor:
-The repeating function creates an object with a destructor before calling
-itself.
+
+This test measures the cost of raising an exception through non-empty frames
+where each frame has an object requiring destruction, to an empty
+handler. Hence, there are N destructor calls during unwinding.
+\begin{cfa}
+if (frames) {
+	WithDestructor object;
+	unwind_destructor(frames - 1);
+\end{cfa}
 \item Finally:
-The repeating function calls itself inside a try block with a finally clause
-attached.
+This test measures the cost of establishing a try block with an empty finally
+clause on the front side of the recursion and running the empty finally clause
+on the back side of the recursion during stack unwinding.
+\begin{cfa}
+if (frames) {
+	try {
+		unwind_finally(frames - 1);
+	} finally {}
+\end{cfa}
 \item Other Handler:
-The repeating function calls itself inside a try block with a handler that
-will not match the raised exception. (But is of the same kind of handler.)
+This test is like the finally test but the try block has a catch clause for an
+exception that is not raised, so catch matching is executed during stack
+unwinding but the match never succeeds until the catch at the bottom of the
+stack.
+\begin{cfa}
+if (frames) {
+	try {
+		unwind_other(frames - 1);
+	} catch (not_raised_exception *) {}
+\end{cfa}
 \end{itemize}
 
 \paragraph{Cross Try Statement}
-The next group measures the cost of a try statement when no exceptions are
-raised. The test is set-up, then there is a loop to reduce the impact of
-start-up and shutdown on the results.
-In each iteration, a try statement is executed. Entering and leaving a loop
-is all the test wants to do.
+The next group measures just the cost of executing a try statement so
+\emph{there is no stack unwinding}.  Hence, the program's main loop iterates N
+times around:
+\begin{cfa}
+try {
+} catch (not_raised_exception *) {}
+\end{cfa}
 \begin{itemize}[nosep]
 \item Handler:
-The try statement has a handler (of the matching kind).
+The try statement has a handler.
 \item Finally:
-The try statement has a finally clause.
+The try statement replaces the handler with a finally clause.
 \end{itemize}
 
 \paragraph{Conditional Matching}
-This group of tests checks the cost of conditional matching.
+This final group measures the cost of conditional matching.
 Only \CFA implements the language level conditional match,
 the other languages must mimic with an ``unconditional" match (it still
 checks the exception's type) and conditional re-raise if it was not supposed
 to handle that exception.
+\begin{center}
+\begin{tabular}{ll}
+\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\
+\begin{cfa}
+try {
+	throw_exception();
+} catch (empty_exception * exc;
+		 should_catch) {
+}
+\end{cfa}
+&
+\begin{cfa}
+try {
+	throw_exception();
+} catch (EmptyException & exc) {
+	if (!should_catch) throw;
+}
+\end{cfa}
+\end{tabular}
+\end{center}
 \begin{itemize}[nosep]
 \item Match All:
@@ -130,11 +192,12 @@
 
 \section{Results}
-Each test was run eleven times. The top three and bottom three results were
-discarded and the remaining five values are averaged.
-
 In cases where a feature is not supported by a language the test is skipped
-for that language. Similarly, if a test is does not change between resumption
+for that language.
+\PAB{Report all values.
+
+Similarly, if a test does not change between resumption
 and termination in \CFA, then only one test is written and the result
 was put into the termination column.
+}
 
 % Raw Data:
@@ -237,24 +300,23 @@
 \end{tabular}
 
-One result that is not directly related to \CFA but is important to keep in
-mind is that in exceptions the standard intuitions about which languages
-should go faster often do not hold. There are cases where Python out-preforms
-\Cpp and Java. The most likely explination is that, since exceptions are
-rarely considered to be the common case, the more optimized langages have 
-optimized at their expence. In addition languages with high level            
-repersentations have a much easier time scanning the stack as there is less
+One result not directly related to \CFA but important to keep in
+mind is that, for exceptions, the standard intuition about which languages
+should go faster often does not hold. For example, there are cases where Python out-performs
+\Cpp and Java. The most likely explanation is that, since exceptions are
+rarely considered to be the common case, the more optimized languages
+make that case expensive. In addition, languages with high-level
+representations have a much easier time scanning the stack as there is less
 to decode.
 
-This means that while \CFA does not actually keep up with Python in every
-case it is usually no worse than roughly half the speed of \Cpp. This is good
+This observation means that while \CFA does not actually keep up with Python in every
+case, it is usually no worse than roughly half the speed of \Cpp. This performance is good
 enough for the prototyping purposes of the project.
 
 The test case where \CFA falls short is Raise Other, the case where the
 stack is unwound including a bunch of non-matching handlers.
-This slowdown seems to come from missing optimizations,
-the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp)
-but the results change if they are run in gcc/g++ 9 instead.
+This slowdown seems to come from missing optimizations.
+
 Importantly, there is a huge slowdown in \Cpp's results bringing that brings
-\CFA's performace back in that roughly half speed area. However many other
+\CFA's performance back in that roughly half speed area. However, many other
 \CFA benchmarks increase their run-time by a similar amount falling far
 behind their \Cpp counter-parts.
@@ -269,8 +331,8 @@
 Resumption exception handling is also incredibly fast. Often an order of
 magnitude or two better than the best termination speed.
-There is a simple explination for this; traversing a linked list is much   
+There is a simple explanation for this: traversing a linked list is much
 faster than examining and unwinding the stack. When resumption does not do as
-well its when more try statements are used per raise. Updating the interal
-linked list is not very expencive but it does add up.
+well, it is when more try statements are used per raise. Updating the internal
+linked list is not very expensive but it does add up.
 
 The relative speed of the Match All and Match None tests (within each
@@ -280,10 +342,10 @@
 \item
 Java and Python get similar values in both tests.
-Between the interperated code, a higher level repersentation of the call
+Between the interpreted code, a higher-level representation of the call
 stack and exception reuse it it is possible the cost for a second
 throw can be folded into the first.
 % Is this due to optimization?
 \item
-Both types of \CFA are slighly slower if there is not a match.
+Both types of \CFA are slightly slower if there is not a match.
 For termination this likely comes from unwinding a bit more stack through
 libunwind instead of executing the code normally.
Index: doc/theses/andrew_beach_MMath/vtable.fig
===================================================================
--- doc/theses/andrew_beach_MMath/vtable.fig	(revision 3b8acfbe95c3db3794731829fed72ce38aee0c4f)
+++ doc/theses/andrew_beach_MMath/vtable.fig	(revision 3b8acfbe95c3db3794731829fed72ce38aee0c4f)
@@ -0,0 +1,42 @@
+#FIG 3.2  Produced by xfig version 3.2.7b
+Landscape
+Center
+Metric
+A4
+100.00
+Single
+-2
+1200 2
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 1260 1350 1485 1665
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 1260 1350 1035 1665
+2 1 1 1 0 7 50 -1 -1 4.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 1263 1346 1578 1571
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 2520 1350 2520 1665
+2 1 0 1 0 7 50 -1 -1 0.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 2520 1800 2520 2115
+2 1 1 1 0 7 50 -1 -1 4.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 2520 1350 2835 1575
+2 1 1 1 0 7 50 -1 -1 4.000 0 0 -1 1 0 2
+	1 1 1.00 45.00 90.00
+	 2517 1804 2832 2029
+4 1 0 50 -1 5 12 0.0000 2 120 240 1035 1800 V1\001
+4 1 0 50 -1 5 12 0.0000 2 120 240 1485 1800 V2\001
+4 1 0 50 -1 5 12 0.0000 2 120 240 1260 1350 V0\001
+4 0 0 50 -1 0 11 0.0000 2 135 420 1620 1665 vtable\001
+4 1 0 50 -1 5 12 0.0000 2 120 240 2520 1350 W0\001
+4 1 0 50 -1 5 12 0.0000 2 120 240 2520 2250 W2\001
+4 1 0 50 -1 5 12 0.0000 2 120 240 2520 1800 W1\001
+4 0 0 50 -1 0 11 0.0000 2 135 420 2880 1620 vtable\001
+4 0 0 50 -1 0 11 0.0000 2 135 420 2880 2070 vtable\001
+4 1 0 50 -1 0 12 0.0000 2 180 1365 1935 1080 virtual type trees\001
+4 0 0 50 -1 5 11 0.0000 2 150 735 3060 1755 Id; <,+\001
+4 0 0 50 -1 5 11 0.0000 2 150 1155 3060 2250 Id; <,+,w,-\001
