\chapter{Performance}
\label{c:performance}

Performance is of secondary importance for most of this project.
Instead, the focus was to get the features working. The only performance
requirement is to ensure the tests for correctness run in a reasonable
amount of time. Hence, only a few basic performance tests were performed to
check this requirement.

\section{Test Set-Up}
Tests were run in \CFA, C++, Java and Python.
In addition, there are two sets of tests for \CFA,
one with termination and one with resumption.

GCC C++ is the most comparable language because both it and \CFA use the same
framework, libunwind.
In fact, the comparison is almost entirely in quality of implementation.
Specifically, \CFA's EHM has had significantly less time to be optimized and
does not generate its own assembly. It does have a slight advantage in that
\Cpp has to do some extra bookkeeping to support its utility functions,
but otherwise \Cpp should have a significant advantage.

Java, a popular language with similar termination semantics,
is implemented in a very different environment, a virtual machine with
garbage collection.
It also implements the finally clause on try blocks, allowing for a direct
feature-to-feature comparison.
As with \Cpp, Java's implementation is mature and has more optimizations
and extra features compared to \CFA.

Python is used as an alternative comparison because of the \CFA EHM's
current performance goal, which is to not be prohibitively slow while the
features are designed and examined. Python has similar performance goals for
creating quick scripts and its wide use suggests it has achieved those goals.

Unfortunately, there are no notable modern programming languages with
resumption exceptions. Even the older programming languages with resumption
seem to be notable only for having resumption.
On the other hand, the functional equivalents to resumption are too new.
There do not seem to be any standard implementations in well-known
languages; so far, they seem confined to extensions and research languages.
% There was some maybe interesting comparison to an OCaml extension
% but I'm not sure how to get that working if it is interesting.
Instead, resumption is compared to its simulation in other programming
languages: fixup functions that are explicitly passed into a function.

All tests are run inside a main loop that repeatedly performs a test.
This approach avoids start-up or tear-down time from
affecting the timing results.
The number of times the loop is run is configurable from the command line;
the number used in the timing runs is given with the results per test.
The Java tests run the main loop 1000 times before
beginning the actual test to ``warm up'' the JVM.
% All other languages are precompiled or interpreted.

Timing is done internally, with time measured immediately before and
after the test loop. The difference is calculated and printed.
The loop structure and internal timing mean it is impossible to test
unhandled exceptions in \Cpp and Java, as that would cause the process to
terminate.
Luckily, performance on the ``give up and kill the process'' path is not
critical.

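To make the set-up concrete, the following is a minimal sketch of the loop
and timing structure in \CFA; the function single_test and the fixed
iteration count stand in for the per-test body and the command-line
argument, and are assumptions rather than the actual harness code.
\begin{cfa}
#include <stdio.h>
#include <time.h>

// Placeholder for one iteration of a benchmark body.
extern void single_test(void);

int main() {
	unsigned long times = 1000000;      // iteration count; really read from the command line
	clock_t start = clock();            // time immediately before the test loop
	for (unsigned long i = 0; i < times; i += 1) {
		single_test();
	}
	clock_t end = clock();              // time immediately after the test loop
	printf("%f\n", (double)(end - start) / CLOCKS_PER_SEC);
}
\end{cfa}
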
The exceptions used in these tests are always based on
the base exception for the language.
This requirement minimizes performance differences based
on the object model used to represent the exception.

All tests are designed to be as minimal as possible, while still preventing
excessive optimizations.
For example, empty inline assembly blocks are used in \CFA and \Cpp to
block those optimizations while adding no actual work.

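As an illustration (a sketch rather than the exact benchmark code), an empty
asm statement acts as an optimization barrier without contributing any
instructions to the measured work:
\begin{cfa}
// A body consisting only of an empty asm statement: the compiler cannot
// remove or reorder it away, yet it performs no work at run time.
void do_work() {
	asm volatile ("" : : : "memory");
}
\end{cfa}
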
% We don't use catch-alls but if we did:
% Catch-alls are done by catching the root exception type (not using \Cpp's
% \code{C++}{catch(...)}).

When collecting data, each test is run eleven times. The top three and bottom
three results are discarded and the remaining five values are averaged.
The tests are run with the latest (still pre-release) \CFA compiler,
using gcc-10 10.3.0 as a backend.
g++-10 10.3.0 is used for \Cpp.
Java tests are compiled and run with Oracle OpenJDK version 11.0.11.
The Python tests are run with CPython version 3.8.10.
The machines used to run the tests are:
\begin{itemize}[nosep]
\item ARM 2280 Kunpeng 920 48-core 2$\times$socket
\lstinline{@} 2.6 GHz running Linux v5.11.0-25
\item AMD 6380 Abu Dhabi 16-core 4$\times$socket
\lstinline{@} 2.5 GHz running Linux v5.11.0-25
\end{itemize}
These represent the two major families of hardware architecture.

\section{Tests}
The following tests were selected to test the performance of different
components of the exception system.
They should provide a guide as to where the EHM's costs are found.

\paragraph{Stack Traversal}
This group of tests measures the cost of traversing the stack
(and in termination, unwinding it).
Inside the main loop is a call to a recursive function.
This function calls itself F times before raising an exception.
F is configurable from the command line, but is usually 100.
This builds up many stack frames, and any contents they may have,
before the raise.
The exception is always handled at the base of the stack.
For example, the Empty test for \CFA resumption looks like:
\begin{cfa}
void unwind_empty(unsigned int frames) {
	if (frames) {
		unwind_empty(frames - 1);
	} else {
		throwResume (empty_exception){&empty_vt};
	}
}
\end{cfa}
Other test cases have additional code around the recursive call adding
something besides simple stack frames to the stack.
Note that both termination and resumption have to traverse over
the stack but only termination has to unwind it.
\begin{itemize}[nosep]
% \item None:
% Reuses the empty test code (see below) except that the number of frames
% is set to 0 (this is the only test for which the number of frames is not
% 100). This isolates the start-up and shut-down time of a throw.
\item Empty:
The repeating function is empty except for the necessary control code.
As other traversal tests add to this, it is the baseline for the group,
as the cost comes from traversing over and unwinding a stack frame
that has no other interactions with the exception system.
\item Destructor:
The repeating function creates an object with a destructor before calling
itself.
Comparing this to the empty test gives the time to traverse over and
unwind a destructor.
\item Finally:
The repeating function calls itself inside a try block with a finally clause
attached (a sketch of this variant is given after this list).
Comparing this to the empty test gives the time to traverse over and
unwind a finally clause.
\item Other Handler:
The repeating function calls itself inside a try block with a handler that
does not match the raised exception, but is of the same kind of handler.
This means that the EHM has to check each handler and continue
over all of them until it reaches the base of the stack.
Comparing this to the empty test gives the time to traverse over and
unwind a handler.
\end{itemize}
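
The sketch below shows what the Finally variant of the repeating function
might look like for termination; it is illustrative only, and details such
as the function and exception names are assumptions rather than the actual
test code.
\begin{cfa}
void unwind_finally(unsigned int frames) {
	if (frames) {
		try {
			unwind_finally(frames - 1);
		} finally {
			asm volatile ("" : : : "memory");   // no real work in the clause
		}
	} else {
		throw (empty_exception){&empty_vt};
	}
}
\end{cfa}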

\paragraph{Cross Try Statement}
This group of tests measures the cost of setting up exception handling when
it is not used, because the exceptional case did not occur.
Tests repeatedly cross (enter, execute and leave) a try statement but never
perform a raise; a sketch of one such crossing follows the list.
\begin{itemize}[nosep]
\item Handler:
The try statement has a handler (of the appropriate kind).
\item Finally:
The try statement has a finally clause.
\end{itemize}
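
The following is a rough sketch of one iteration of the Cross Handler test
in \CFA; the empty asm statement stands in for the (empty) body and the
exception name is an assumption, not the actual benchmark code.
\begin{cfa}
void cross_handler() {
	try {
		asm volatile ("" : : : "memory");   // stand-in for the empty body
	} catch (empty_exception * e) {
		// Never executed: no exception is raised in this test.
	}
}
\end{cfa}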

\paragraph{Conditional Matching}
This group measures the cost of conditional matching.
Only \CFA implements the language-level conditional match;
the other languages mimic it with an ``unconditional'' match (it still
checks the exception's type) and a conditional re-raise if it is not supposed
to handle that exception.

Here is the pattern shown in \CFA and \Cpp. Java and Python use the same
pattern as \Cpp, but with their own syntax.

\begin{minipage}{0.45\textwidth}
\begin{cfa}
try {
	...
} catch (exception_t * e ;
		should_catch(e)) {
	...
}
\end{cfa}
\end{minipage}
\begin{minipage}{0.55\textwidth}
\begin{lstlisting}[language=C++]
try {
	...
} catch (std::exception & e) {
	if (!should_catch(e)) throw;
	...
}
\end{lstlisting}
\end{minipage}
\begin{itemize}[nosep]
\item Match All:
The condition is always true. (Always matches or never re-raises.)
\item Match None:
The condition is always false. (Never matches or always re-raises.)
\end{itemize}

\paragraph{Resumption Simulation}
A slightly altered version of the Empty Traversal test is used when comparing
resumption to fix-up routines.
The handler, either the actual resumption handler or the fix-up routine,
always captures a variable at the base of the loop,
and receives a reference to a variable at the raise site, either as a
field on the exception or an argument to the fix-up routine.
% I don't actually know why that is here but not anywhere else.

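As a rough illustration of the simulation, the fix-up routine is passed down
the recursion and called at the bottom in place of a raise; the sketch below
is hypothetical (the names and exact signature are assumptions), not the
code used in the benchmarks.
\begin{cfa}
// Fix-up simulation of Empty Traversal: the fixup routine plays the role of
// the resumption handler and is called instead of raising an exception.
void nounwind_fixup(unsigned int frames, void (*fixup)(int &)) {
	if (frames) {
		nounwind_fixup(frames - 1, fixup);
	} else {
		int value_at_raise_site = 42;
		fixup(value_at_raise_site);     // hands the raise-site variable to the handler
	}
}
\end{cfa}
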
%\section{Cost in Size}
%Using exceptions also has a cost in the size of the executable.
%Although it is sometimes ignored
%
%There is a size cost to defining a personality function but the later problem
%is the LSDA which will be generated for every function.
%
%(I haven't actually figured out how to compare this, probably using something
%related to -fexceptions.)

\section{Results}
% First, introduce the tables.
\autoref{t:PerformanceTermination},
\autoref{t:PerformanceResumption}
and~\autoref{t:PerformanceFixupRoutines}
show the test results.
In cases where a feature is not supported by a language, the test is skipped
for that language and the result is marked N/A.
There are also cases where the feature is supported but measuring its
cost is impossible. This happened with Java, which uses a JIT that optimizes
away the tests and cannot be stopped~\cite{Dice21}.
These tests are marked N/C.
To get results in a consistent range (1 second to 1 minute is ideal;
going higher is better than going lower), N, the number of iterations of the
main loop in each test, is varied between tests. It is also given in the
results and has a value in the millions.

An anomaly in some results came from \CFA's use of GCC nested functions.
These nested functions are used to create closures that can access stack
variables in their lexical scope.
However, if they do so, then they can cause the benchmark's run time to
increase by an order of magnitude.
The simplest solution is to make those values global variables instead
of function-local variables.
% Do we know if editing a global inside nested function is a problem?
Tests that had to be modified to avoid this problem have been marked
with a ``*'' in the results.

% Now come the tables themselves:
% You might need a wider window for this.

\begin{table}[htb]
\centering
\caption{Termination Performance Results (sec)}
\label{t:PerformanceTermination}
\begin{tabular}{|r|*{2}{|r r r r|}}
\hline
                       & \multicolumn{4}{c||}{AMD}         & \multicolumn{4}{c|}{ARM}  \\
\cline{2-9}
N\hspace{8pt}          & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
\hline
Empty Traversal (1M)   & 23.0  & 9.6   & 17.6  & 23.4      & 30.6  & 13.6  & 15.5  & 14.7  \\
D'tor Traversal (1M)   & 48.1  & 23.5  & N/A   & N/A       & 64.2  & 29.2  & N/A   & N/A   \\
Finally Traversal (1M) & 3.2*  & N/A   & 17.6  & 29.2      & 3.9*  & N/A   & 15.5  & 19.0  \\
Other Traversal (1M)   & 3.3*  & 23.9  & 17.7  & 32.8      & 3.9*  & 24.5  & 15.5  & 21.6  \\
Cross Handler (1B)     & 6.5   & 0.9   & N/C   & 38.0      & 9.6   & 0.8   & N/C   & 32.1  \\
Cross Finally (1B)     & 0.8   & N/A   & N/C   & 44.6      & 0.6   & N/A   & N/C   & 37.3  \\
Match All (10M)        & 30.5  & 20.6  & 11.2  & 3.9       & 36.9  & 24.6  & 10.7  & 3.1   \\
Match None (10M)       & 30.6  & 50.9  & 11.2  & 5.0       & 36.9  & 71.9  & 10.7  & 4.1   \\
\hline
\end{tabular}
\end{table}

\begin{table}[htb]
\centering
\caption{Resumption Performance Results (sec)}
\label{t:PerformanceResumption}
\begin{tabular}{|r||r||r|}
\hline
N\hspace{8pt}
                        & AMD     & ARM  \\
\hline
Empty Traversal (10M)   & 1.4     & 1.2  \\
D'tor Traversal (10M)   & 1.8     & 1.0  \\
Finally Traversal (10M) & 1.8     & 1.0  \\
Other Traversal (10M)   & 22.6    & 25.8 \\
Cross Handler (1B)      & 9.0     & 11.9 \\
Match All (100M)        & 2.3     & 3.2  \\
Match None (100M)       & 3.0     & 3.8  \\
\hline
\end{tabular}
\end{table}

\begin{table}[htb]
\centering
\small
\caption{Resumption/Fixup Routine Comparison (sec)}
\label{t:PerformanceFixupRoutines}
\setlength{\tabcolsep}{5pt}
\begin{tabular}{|r|*{2}{|r r r r r|}}
\hline
                    & \multicolumn{5}{c||}{AMD}     & \multicolumn{5}{c|}{ARM}  \\
\cline{2-11}
N\hspace{8pt}       & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
\multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
\hline
Resume Empty (10M)  & 1.4 & 1.4 & 15.4 & 2.3 & 178.0  & 1.2 & 1.2 & 8.9 & 1.2 & 118.4 \\
\hline
\end{tabular}
\end{table}


% Now discuss the results in the tables.
One result not directly related to \CFA but important to keep in mind is that,
for exceptions, the standard intuition about which languages should go
faster often does not hold.
For example, there are a few cases where Python out-performs
\CFA, \Cpp and Java.
% To be exact, the Match All and Match None cases.
The most likely explanation is that
the generally faster languages have made ``common cases fast'' at the expense
of the rarer cases. Since exceptions are considered rare, they are made
expensive to help speed up common actions, such as entering and leaving try
statements.
Python, on the other hand, while generally slower than the other languages,
uses exceptions more and has not sacrificed their performance.
In addition, languages with high-level representations have a much
easier time scanning the stack as there is less to decode.

As stated,
the performance tests are not attempting to show \CFA has a new competitive
way of implementing exception handling.
The only performance requirement is to ensure the \CFA EHM has reasonable
performance for prototyping.
Although that may be hard to exactly quantify, I believe it has succeeded
in that regard.
Details on the different test cases follow.

\subsection{Termination \texorpdfstring{(\autoref{t:PerformanceTermination})}{}}

\begin{description}
\item[Empty Traversal]
\CFA is slower than \Cpp, but is still faster than the other languages
and closer to \Cpp's time than to theirs.
This result is to be expected,
as \CFA's implementation is closer to \Cpp's than to those of the other languages.

\item[D'tor Traversal]
Running destructors causes a huge slowdown in the two languages that support
them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's.
Considering the amount of work done in destructors is effectively zero
(an assembly comment), the cost
must come from the change of context required to run the destructor.

\item[Finally Traversal]
Performance is similar to Empty Traversal in all languages that support finally
clauses. Only Python shows a change in run time beyond random noise, and even
that change is not large.
Despite the similarity between finally clauses and destructors,
finally clauses seem to avoid the spike in run time that destructors cause.
Possibly some optimization removes the cost of changing contexts.

\item[Other Traversal]
For \Cpp, stopping to check if a handler applies seems to be about as
expensive as stopping to run a destructor.
This results in a significant jump.

The other languages experience only a small increase in run time.
The small increase likely comes from running the checks, and they likely
avoid the spike because they do not have the same kind of overhead for
switching to the check's context.

\item[Cross Handler]
Here, \CFA falls behind \Cpp by a much more significant margin.
This is likely because \CFA has to insert two extra function
calls, while \Cpp does not have to execute any extra instructions.
Python is much further behind.

\item[Cross Finally]
\CFA's performance now matches \Cpp's from Cross Handler.
If the code from the finally clause is being inlined,
which is just an asm comment, then there are no additional instructions
to execute when exiting the try statement normally.

\item[Conditional Match]
Both of the conditional matching tests can be considered on their own.
However, for evaluating the value of conditional matching itself, the
comparison of the two sets of results is useful.
Consider the massive jump in run time for \Cpp going from Match All to Match
None, which none of the other languages have.
Some strange interaction is causing run time to more than double for doing
twice as many raises.
Java and Python avoid this problem and have similar run time for both tests,
possibly through resource reuse or their program representation.
\CFA, although built like \Cpp, avoids the problem as well.
This matches
the pattern of the conditional match, which makes the two execution paths
very similar.

\end{description}

\subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}}

Moving on to resumption, there is one general note:
resumption is \textit{fast}. The only test where it fell
behind termination is Cross Handler.
In every other case, the number of iterations had to be increased by a
factor of 10 to get the run time in an appropriate range
and in some cases resumption still took less time.

% I tried \paragraph and \subparagraph, maybe if I could adjust spacing
% between paragraphs those would work.
\begin{description}
\item[Empty Traversal]
See above for the general speed-up notes.
This result is not surprising as resumption's linked-list approach
means that traversing over stack frames without a resumption handler is
$O(1)$.

\item[D'tor Traversal]
Resumption does not have the same spike in run time that termination has.
The run time is actually very similar to Finally Traversal.
As resumption does not unwind the stack, both destructors and finally
clauses are run while walking down the stack during the recursive returns.
So it follows that their performance is similar.

\item[Finally Traversal]
Same as D'tor Traversal,
except termination did not have a spike in run time on this test case.

\item[Other Traversal]
Traversing across handlers reduces resumption's advantage as it actually
has to stop and check each one.
Resumption still came out ahead (adjusting for iterations), but by much less
than in the other cases.

\item[Cross Handler]
This is the only test case where resumption could not keep up with
termination, although the difference is not as significant as in many other
cases.
It is simply a matter of where the costs come from:
both termination and resumption have some work to set up or tear down a
handler. It just so happens that resumption's work is slightly slower.

\item[Conditional Match]
Resumption shows a slight slowdown if the exception is not matched
by the first handler, which follows from the fact that the second handler now
has to be checked. However, the difference is not large.

\end{description}

\subsection{Resumption/Fixup \texorpdfstring{(\autoref{t:PerformanceFixupRoutines})}{}}

Finally, there are the results of the resumption/fixup routine comparison.
These results are surprisingly varied. It is possible that creating a closure
has more to do with performance than passing the argument through layers of
calls.
At 100 stack frames, resumption and manual fixup routines have similar
performance in \CFA.
More experiments could try to tease out the exact trade-offs,
but the prototype's only performance goal is to be reasonable.
It is already in that range, and \CFA's fixup routine simulation is
one of the faster simulations as well.
Plus, exceptions add features and remove syntactic overhead,
so even at similar performance, resumption has advantages
over fixup routines.
|---|