Index: doc/theses/andrew_beach_MMath/performance.tex
===================================================================
--- doc/theses/andrew_beach_MMath/performance.tex	(revision 7737c299f2304d8ecd2afae199a2a19ed04d633b)
+++ doc/theses/andrew_beach_MMath/performance.tex	(revision 04771277d3a9bb7a7cde6cf031b5cf7f14085235)
@@ -11,5 +11,5 @@
 Tests were run in \CFA, C++, Java and Python.
 In addition there are two sets of tests for \CFA,
-one for termination and once with resumption.
+one with termination and one with resumption.
 
 C++ is the most comparable language because both it and \CFA use the same
@@ -21,6 +21,6 @@
 but otherwise \Cpp should have a significant advantage.
 
-Java a popular language with similar termination semantics, but
-it is implemented in a very different environment, a virtual machine with
+Java, a popular language with similar termination semantics,
+is implemented in a very different environment, a virtual machine with
 garbage collection.
 It also implements the finally clause on try blocks allowing for a direct
@@ -38,13 +38,12 @@
 seem to be notable only for having resumption.
 Instead, resumption is compared to its simulation in other programming
-languages: fixup functions that are explicity passed into a function.
+languages: fixup functions that are explicitly passed into a function.
 
 All tests are run inside a main loop that repeatedly performs a test.
 This approach avoids start-up or tear-down time from
 affecting the timing results.
-The number of times the loop is run is configurable from the command line,
+The number of times the loop is run is configurable from the command line;
 the number used in the timing runs is given with the results per test.
-Tests ran their main loop a million times.
-The Java tests runs the main loop 1000 times before
+The Java tests run the main loop 1000 times before
 beginning the actual test to ``warm-up" the JVM.
 % All other languages are precompiled or interpreted.
@@ -72,7 +71,7 @@
 % \code{C++}{catch(...)}).
 
-When collecting data each test is run eleven times. The top three and bottom
+When collecting data, each test is run eleven times. The top three and bottom
 three results are discarded and the remaining five values are averaged.
-The test are run with the latest (still pre-release) \CFA compiler was used,
+The tests are run with the latest (still pre-release) \CFA compiler,
 using gcc-10 as a backend.
 g++-10 is used for \Cpp.
@@ -113,7 +112,7 @@
 }
 \end{cfa}
-Other test cases have additional code around the recursive call add
+Other test cases have additional code around the recursive call adding
 something besides simple stack frames to the stack.
-Note that both termination and resumption will have to traverse over
+Note that both termination and resumption have to traverse over
 the stack but only termination has to unwind it.
 \begin{itemize}[nosep]
@@ -124,5 +123,5 @@
 \item Empty:
 The repeating function is empty except for the necessary control code.
-As other traversal tests add to this, so it is the baseline for the group
+As other traversal tests add to this, it is the baseline for the group
 as the cost comes from traversing over and unwinding a stack frame
 that has no other interactions with the exception system.
@@ -130,25 +129,26 @@
 The repeating function creates an object with a destructor before calling
 itself.
-Comparing this to the empty test gives the time to traverse over and/or
+Comparing this to the empty test gives the time to traverse over and
 unwind a destructor.
 \item Finally:
 The repeating function calls itself inside a try block with a finally clause
 attached.
-Comparing this to the empty test gives the time to traverse over and/or
+Comparing this to the empty test gives the time to traverse over and
 unwind a finally clause.
 \item Other Handler:
 The repeating function calls itself inside a try block with a handler that
-will not match the raised exception, but is of the same kind of handler.
-This means that the EHM will have to check each handler, but will continue
-over all of the until it reaches the base of the stack.
-Comparing this to the empty test gives the time to traverse over and/or
+does not match the raised exception, but is of the same kind of handler.
+This means that the EHM has to check each handler, and continue
+over all of them until it reaches the base of the stack.
+Comparing this to the empty test gives the time to traverse over and
 unwind a handler.
 \end{itemize}
 
 \paragraph{Cross Try Statement}
-This group of tests measures the cost setting up exception handling if it is
+This group of tests measures the cost of setting up exception handling,
+if it is
 not used (because the exceptional case did not occur).
-Tests repeatedly cross (enter and leave, execute) a try statement but never
-preform a raise.
+Tests repeatedly cross (enter, execute and leave) a try statement but never
+perform a raise.
 \begin{itemize}[nosep]
 \item Handler:
@@ -165,5 +165,5 @@
 to handle that exception.
 
-There is the pattern shown in \CFA and \Cpp. Java and Python use the same
+Here is the pattern shown in \CFA and \Cpp. Java and Python use the same
 pattern as \Cpp, but with their own syntax.
 
@@ -229,10 +229,10 @@
 going higher is better than going low) N, the number of iterations of the
 main loop in each test, is varied between tests. It is also given in the
-results and usually have a value in the millions.
+results and has a value in the millions.
 
 An anomaly in some results came from \CFA's use of gcc nested functions.
 These nested functions are used to create closures that can access stack
 variables in their lexical scope.
-However, if they do so then they can cause the benchmark's run-time to
+However, if they do so, then they can cause the benchmark's run-time to
 increase by an order of magnitude.
 The simplest solution is to make those values global variables instead
@@ -301,6 +301,5 @@
               \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
 \hline
-Resume Empty (10M)  & 3.8  & 3.5  & 14.7  & 2.3   & 176.1 & 0.3  & 0.1  & 8.9   & 1.2   & 119.9 \\
-%Resume Other (10M)  & 4.0* & 0.1* & 21.9  & 6.2   & 381.0 & 0.3* & 0.1* & 13.2  & 5.0   & 290.7 \\
+Resume Empty (10M)  & 1.5 & 1.5 & 14.7 & 2.3 & 176.1  & 1.0 & 1.4 & 8.9 & 1.2 & 119.9 \\
 \hline
 \end{tabular}
@@ -309,8 +308,9 @@
 % Now discuss the results in the tables.
 One result not directly related to \CFA but important to keep in mind is that,
-for exceptions the standard intuition about which languages should go
+for exceptions, the standard intuition about which languages should go
 faster often does not hold.
 For example, there are a few cases where Python out-performs
 \CFA, \Cpp and Java.
+\todo{Make sure there are still cases where Python wins.}
 The most likely explanation is that, since exceptions
 are rarely considered to be the common case, the more optimized languages
@@ -324,7 +324,9 @@
 The only performance requirement is to insure the \CFA EHM has reasonable
 performance for prototyping.
-Although that may be hard to exactly quantify, we believe it has succeeded
+Although that may be hard to exactly quantify, I believe it has succeeded
 in that regard.
 Details on the different test cases follow.
+
+\subsection{Termination \texorpdfstring{(\autoref{t:PerformanceTermination})}{}}
 
 \begin{description}
@@ -332,18 +334,20 @@
 \CFA is slower than \Cpp, but is still faster than the other languages
 and closer to \Cpp than other languages.
-This is to be expected as \CFA is closer to \Cpp than the other languages.
+This result is to be expected,
+as \CFA is closer to \Cpp than the other languages.
 
 \item[D'tor Traversal]
-Running destructors causes huge slowdown in every language that supports
+Running destructors causes a huge slowdown in the two languages that support
 them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's.
-Considering the amount of work done in destructors is so low the cost
-likely comes from the change of context required to do that work.
+Considering the amount of work done in destructors is effectively zero
+(an assembly comment), the cost
+must come from the change of context required to run the destructor.
 
 \item[Finally Traversal]
-Speed is similar to Empty Traversal in all languages that support finally
+Performance is similar to Empty Traversal in all languages that support finally
 clauses. Only Python seems to have a larger than random noise change in
 its run-time and it is still not large.
 Despite the similarity between finally clauses and destructors,
-finally clauses seem to avoid the spike in run-time destructors have.
+finally clauses seem to avoid the spike in run-time that destructors have.
 Possibly some optimization removes the cost of changing contexts.
 \todo{OK, I think the finally clause may have been optimized out.}
@@ -354,15 +358,14 @@
 This results in a significant jump.
 
-Other languages experiance a small increase in run-time.
+Other languages experience a small increase in run-time.
 The small increase likely comes from running the checks,
 but they could avoid the spike by not having the same kind of overhead for
 switching to the check's context.
-
-\todo{Could revist Other Traversal, after Finally Traversal.}
+\todo{Could revisit Other Traversal, after Finally Traversal.}
 
 \item[Cross Handler]
 Here \CFA falls behind \Cpp by a much more significant margin.
 This is likely due to the fact \CFA has to insert two extra function
-calls while \Cpp doesn't have to do execute any other instructions.
+calls, while \Cpp does not have to execute any other instructions.
 Python is much further behind.
 
@@ -370,10 +373,10 @@
 \CFA's performance now matches \Cpp's from Cross Handler.
 If the code from the finally clause is being inlined,
-which is just a asm comment, than there are no additional instructions
+which is just an asm comment, then there are no additional instructions
 to execute again when exiting the try statement normally.
 
 \item[Conditional Match]
-Both of the conditional matching tests can be considered on their own,
-however for evaluating the value of conditional matching itself the
+Both of the conditional matching tests can be considered on their own.
+However, for evaluating the value of conditional matching itself, the
 comparison of the two sets of results is useful.
 Consider the massive jump in run-time for \Cpp going from match all to match
@@ -384,14 +387,16 @@
 possibly through resource reuse or their program representation.
 However \CFA is built like \Cpp and avoids the problem as well, this matches
-the pattern of the conditional match which makes the two execution paths
-much more similar.
+the pattern of the conditional match, which makes the two execution paths
+very similar.
 
 \end{description}
 
-Moving on to resumption there is one general note,
-resumption is \textit{fast}, the only test where it fell
+\subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}}
+
+Moving on to resumption, there is one general note:
+resumption is \textit{fast}. The only test where it fell
 behind termination is Cross Handler.
 In every other case, the number of iterations had to be increased by a
-factor of 10 to get the run-time in an approprate range
+factor of 10 to get the run-time in an appropriate range
 and in some cases resumption still took less time.
 
@@ -401,5 +406,5 @@
 \item[Empty Traversal]
 See above for the general speed-up notes.
-This result is not surprising as resumption's link list approach
+This result is not surprising as resumption's linked-list approach
 means that traversing over stack frames without a resumption handler is
 $O(1)$.
@@ -408,12 +413,11 @@
 Resumption does have the same spike in run-time that termination has.
 The run-time is actually very similar to Finally Traversal.
-As resumption does not unwind the stack both destructors and finally
-clauses are run while walking down the stack normally.
+As resumption does not unwind the stack, both destructors and finally
+clauses are run while walking down the stack during the recursive returns.
 So it follows their performance is similar.
 
 \item[Finally Traversal]
-The increase in run-time fromm Empty Traversal (once adjusted for
-the number of iterations) roughly the same as for termination.
-This suggests that the
+Same as D'tor Traversal,
+except termination did not have a spike in run-time on this test case.
 
 \item[Other Traversal]
@@ -426,7 +430,7 @@
 The only test case where resumption could not keep up with termination,
 although the difference is not as significant as many other cases.
-It is simply a matter of where the costs come from. Even if \CFA termination
-is not ``zero-cost" passing through an empty function still seems to be
-cheaper than updating global values.
+It is simply a matter of where the costs come from:
+both termination and resumption have some work to set up or tear down a
+handler. It just so happens that resumption's work is slightly slower.
 
 \item[Conditional Match]
@@ -437,10 +441,17 @@
 \end{description}
 
+\subsection{Resumption/Fixup \texorpdfstring{(\autoref{t:PerformanceFixupRoutines})}{}}
+
 Finally are the results of the resumption/fixup routine comparison.
-These results are surprisingly varied, it is possible that creating a closure
+These results are surprisingly varied. It is possible that creating a closure
 has more to do with performance than passing the argument through layers of
 calls.
-Even with 100 stack frames though, resumption is only about as fast as
-manually passing a fixup routine.
-So there is a cost for the additional power and flexibility exceptions
-provide.
+At 100 stack frames, resumption and manual fixup routines have similar
+performance in \CFA.
+More experiments could try to tease out the exact trade-offs,
+but the prototype's only performance goal is to be reasonable.
+It is already in that range, and \CFA's fixup routine simulation is
+one of the faster simulations as well.
+Plus exceptions add features and remove syntactic overhead,
+so even at similar performance resumptions have advantages
+over fixup routines.