[dac16a0] | 1 | \chapter{Performance} |
---|
| 2 | \label{c:performance} |
---|
| 3 | |
---|
[cfbab07] | 4 | Performance is of secondary importance for most of this project. |
---|
| 5 | Instead, the focus was to get the features working. The only performance |
---|
| 6 | requirement is to ensure the tests for correctness run in a reasonable |
---|
| 7 | amount of time. Hence, a few basic performance tests were performed to |
---|
| 8 | check this requirement. |
---|
[b51e389c] | 9 | |
---|
| 10 | \section{Test Set-Up} |
---|
[cfbab07] | 11 | Tests were run in \CFA, C++, Java and Python. |
---|
[9698690] | 12 | In addition there are two sets of tests for \CFA, |
---|
[cfbab07] | 13 | one for termination and once with resumption. |
---|
[dac16a0] | 14 | |
---|
| 15 | C++ is the most comparable language because both it and \CFA use the same |
---|
| 16 | framework, libunwind. |
---|
[cfbab07] | 17 | In fact, the comparison is almost entirely in quality of implementation. |
---|
| 18 | Specifically, \CFA's EHM has had significantly less time to be optimized and |
---|
[dac16a0] | 19 | does not generate its own assembly. It does have a slight advantage in that |
---|
[cfbab07] | 20 | \Cpp has to do some extra bookkeeping to support its utility functions, |
---|
| 21 | but otherwise \Cpp should have a significant advantage. |
---|
[dac16a0] | 22 | |
---|
[cfbab07] | 23 | Java a popular language with similar termination semantics, but |
---|
| 24 | it is implemented in a very different environment, a virtual machine with |
---|
[029cbc0] | 25 | garbage collection. |
---|
[7372065] | 26 | It also implements the finally clause on try blocks allowing for a direct |
---|
[029cbc0] | 27 | feature-to-feature comparison. |
---|
[cfbab07] | 28 | As with \Cpp, Java's implementation is mature, has more optimizations |
---|
| 29 | and extra features as compared to \CFA. |
---|
[9698690] | 30 | |
---|
[cfbab07] | 31 | Python is used as an alternative comparison because of the \CFA EHM's |
---|
| 32 | current performance goals, which is to not be prohibitively slow while the |
---|
[9698690] | 33 | features are designed and examined. Python has similar performance goals for |
---|
| 34 | creating quick scripts and its wide use suggests it has achieved those goals. |
---|
| 35 | |
---|
[cfbab07] | 36 | Unfortunately, there are no notable modern programming languages with |
---|
| 37 | resumption exceptions. Even the older programming languages with resumption |
---|
| 38 | seem to be notable only for having resumption. |
---|
| 39 | Instead, resumption is compared to its simulation in other programming |
---|
| 40 | languages: fixup functions that are explicity passed into a function. |
---|
[dac16a0] | 41 | |
---|
[cfbab07] | 42 | All tests are run inside a main loop that repeatedly performs a test. |
---|
| 43 | This approach avoids start-up or tear-down time from |
---|
[dac16a0] | 44 | affecting the timing results. |
---|
[cfbab07] | 45 | The number of times the loop is run is configurable from the command line, |
---|
| 46 | the number used in the timing runs is given with the results per test. |
---|
[7372065] | 47 | Tests ran their main loop a million times. |
---|
[cfbab07] | 48 | The Java tests runs the main loop 1000 times before |
---|
| 49 | beginning the actual test to ``warm-up" the JVM. |
---|
| 50 | % All other languages are precompiled or interpreted. |
---|
[9698690] | 51 | |
---|
| 52 | Timing is done internally, with time measured immediately before and |
---|
[cfbab07] | 53 | after the test loop. The difference is calculated and printed. |
---|
[9698690] | 54 | The loop structure and internal timing means it is impossible to test |
---|
| 55 | unhandled exceptions in \Cpp and Java as that would cause the process to |
---|
| 56 | terminate. |
---|
| 57 | Luckily, performance on the ``give-up and kill the process" path is not |
---|
| 58 | critical. |
---|
[dac16a0] | 59 | |
---|
[cfbab07] | 60 | The exceptions used in these tests are always based off of |
---|
| 61 | the base exception for the language. |
---|
| 62 | This requirement minimizes performance differences based |
---|
| 63 | on the object model used to represent the exception. |
---|
[9698690] | 64 | |
---|
[cfbab07] | 65 | All tests are designed to be as minimal as possible, while still preventing |
---|
| 66 | excessive optimizations. |
---|
[9698690] | 67 | For example, empty inline assembly blocks are used in \CFA and \Cpp to |
---|
| 68 | prevent excessive optimizations while adding no actual work. |
---|
[dac16a0] | 69 | |
---|
[9698690] | 70 | % We don't use catch-alls but if we did: |
---|
| 71 | % Catch-alls are done by catching the root exception type (not using \Cpp's |
---|
| 72 | % \code{C++}{catch(...)}). |
---|
[dac16a0] | 73 | |
---|
[cfbab07] | 74 | When collecting data each test is run eleven times. The top three and bottom |
---|
| 75 | three results are discarded and the remaining five values are averaged. |
---|
| 76 | The test are run with the latest (still pre-release) \CFA compiler was used, |
---|
| 77 | using gcc-10 as a backend. |
---|
| 78 | g++-10 is used for \Cpp. |
---|
| 79 | Java tests are complied and run with version 11.0.11. |
---|
| 80 | Python used version 3.8. |
---|
| 81 | The machines used to run the tests are: |
---|
| 82 | \todo{Get patch versions for python, gcc and g++.} |
---|
| 83 | \begin{itemize}[nosep] |
---|
| 84 | \item ARM 2280 Kunpeng 920 48-core 2$\times$socket |
---|
| 85 | \lstinline{@} 2.6 GHz running Linux v5.11.0-25 |
---|
| 86 | \item AMD 6380 Abu Dhabi 16-core 4$\times$socket |
---|
| 87 | \lstinline{@} 2.5 GHz running Linux v5.11.0-25 |
---|
| 88 | \end{itemize} |
---|
| 89 | Representing the two major families of hardware architecture. |
---|
| 90 | |
---|
[b51e389c] | 91 | \section{Tests} |
---|
[029cbc0] | 92 | The following tests were selected to test the performance of different |
---|
| 93 | components of the exception system. |
---|
[cfbab07] | 94 | They should provide a guide as to where the EHM's costs are found. |
---|
| 95 | |
---|
| 96 | \paragraph{Stack Traversal} |
---|
| 97 | This group measures the cost of traversing the stack, |
---|
| 98 | (and in termination, unwinding it). |
---|
| 99 | Inside the main loop is a call to a recursive function. |
---|
| 100 | This function calls itself F times before raising an exception. |
---|
| 101 | F is configurable from the command line, but is usually 100. |
---|
| 102 | This builds up many stack frames, and any contents they may have, |
---|
| 103 | before the raise. |
---|
| 104 | The exception is always handled at the base of the stack. |
---|
| 105 | For example the Empty test for \CFA resumption looks like: |
---|
| 106 | \begin{cfa} |
---|
| 107 | void unwind_empty(unsigned int frames) { |
---|
| 108 | if (frames) { |
---|
| 109 | unwind_empty(frames - 1); |
---|
| 110 | } else { |
---|
| 111 | throwResume (empty_exception){&empty_vt}; |
---|
| 112 | } |
---|
| 113 | } |
---|
| 114 | \end{cfa} |
---|
| 115 | Other test cases have additional code around the recursive call add |
---|
| 116 | something besides simple stack frames to the stack. |
---|
| 117 | Note that both termination and resumption will have to traverse over |
---|
| 118 | the stack but only termination has to unwind it. |
---|
[9698690] | 119 | \begin{itemize}[nosep] |
---|
[cfbab07] | 120 | % \item None: |
---|
| 121 | % Reuses the empty test code (see below) except that the number of frames |
---|
| 122 | % is set to 0 (this is the only test for which the number of frames is not |
---|
| 123 | % 100). This isolates the start-up and shut-down time of a throw. |
---|
| 124 | \item Empty: |
---|
[7372065] | 125 | The repeating function is empty except for the necessary control code. |
---|
[cfbab07] | 126 | As other traversal tests add to this, so it is the baseline for the group |
---|
| 127 | as the cost comes from traversing over and unwinding a stack frame |
---|
| 128 | that has no other interactions with the exception system. |
---|
[ea593a3] | 129 | \item Destructor: |
---|
[7372065] | 130 | The repeating function creates an object with a destructor before calling |
---|
| 131 | itself. |
---|
[cfbab07] | 132 | Comparing this to the empty test gives the time to traverse over and/or |
---|
| 133 | unwind a destructor. |
---|
[ea593a3] | 134 | \item Finally: |
---|
[7372065] | 135 | The repeating function calls itself inside a try block with a finally clause |
---|
| 136 | attached. |
---|
[cfbab07] | 137 | Comparing this to the empty test gives the time to traverse over and/or |
---|
| 138 | unwind a finally clause. |
---|
[ea593a3] | 139 | \item Other Handler: |
---|
[7372065] | 140 | The repeating function calls itself inside a try block with a handler that |
---|
[cfbab07] | 141 | will not match the raised exception, but is of the same kind of handler. |
---|
| 142 | This means that the EHM will have to check each handler, but will continue |
---|
| 143 | over all of the until it reaches the base of the stack. |
---|
| 144 | Comparing this to the empty test gives the time to traverse over and/or |
---|
| 145 | unwind a handler. |
---|
[ea593a3] | 146 | \end{itemize} |
---|
| 147 | |
---|
[7372065] | 148 | \paragraph{Cross Try Statement} |
---|
[cfbab07] | 149 | This group of tests measures the cost setting up exception handling if it is |
---|
| 150 | not used (because the exceptional case did not occur). |
---|
| 151 | Tests repeatedly cross (enter and leave, execute) a try statement but never |
---|
| 152 | preform a raise. |
---|
[9698690] | 153 | \begin{itemize}[nosep] |
---|
[ea593a3] | 154 | \item Handler: |
---|
[cfbab07] | 155 | The try statement has a handler (of the appropriate kind). |
---|
[ea593a3] | 156 | \item Finally: |
---|
[262deda0] | 157 | The try statement has a finally clause. |
---|
[ea593a3] | 158 | \end{itemize} |
---|
| 159 | |
---|
| 160 | \paragraph{Conditional Matching} |
---|
[cfbab07] | 161 | This group measures the cost of conditional matching. |
---|
[ea593a3] | 162 | Only \CFA implements the language level conditional match, |
---|
[cfbab07] | 163 | the other languages mimic it with an ``unconditional" match (it still |
---|
| 164 | checks the exception's type) and conditional re-raise if it is not supposed |
---|
[9698690] | 165 | to handle that exception. |
---|
[cfbab07] | 166 | |
---|
| 167 | There is the pattern shown in \CFA and \Cpp. Java and Python use the same |
---|
| 168 | pattern as \Cpp, but with their own syntax. |
---|
| 169 | |
---|
| 170 | \begin{minipage}{0.45\textwidth} |
---|
| 171 | \begin{cfa} |
---|
| 172 | try { |
---|
| 173 | ... |
---|
| 174 | } catch (exception_t * e ; |
---|
| 175 | should_catch(e)) { |
---|
| 176 | ... |
---|
| 177 | } |
---|
| 178 | \end{cfa} |
---|
| 179 | \end{minipage} |
---|
| 180 | \begin{minipage}{0.55\textwidth} |
---|
| 181 | \begin{lstlisting}[language=C++] |
---|
| 182 | try { |
---|
| 183 | ... |
---|
| 184 | } catch (std::exception & e) { |
---|
| 185 | if (!should_catch(e)) throw; |
---|
| 186 | ... |
---|
| 187 | } |
---|
| 188 | \end{lstlisting} |
---|
| 189 | \end{minipage} |
---|
[9698690] | 190 | \begin{itemize}[nosep] |
---|
| 191 | \item Match All: |
---|
[ea593a3] | 192 | The condition is always true. (Always matches or never re-raises.) |
---|
[9698690] | 193 | \item Match None: |
---|
[ea593a3] | 194 | The condition is always false. (Never matches or always re-raises.) |
---|
| 195 | \end{itemize} |
---|
[dac16a0] | 196 | |
---|
[cfbab07] | 197 | \paragraph{Resumption Simulation} |
---|
| 198 | A slightly altered version of the Empty Traversal test is used when comparing |
---|
| 199 | resumption to fix-up routines. |
---|
| 200 | The handler, the actual resumption handler or the fix-up routine, |
---|
| 201 | always captures a variable at the base of the loop, |
---|
| 202 | and receives a reference to a variable at the raise site, either as a |
---|
| 203 | field on the exception or an argument to the fix-up routine. |
---|
| 204 | % I don't actually know why that is here but not anywhere else. |
---|
| 205 | |
---|
[dac16a0] | 206 | %\section{Cost in Size} |
---|
| 207 | %Using exceptions also has a cost in the size of the executable. |
---|
| 208 | %Although it is sometimes ignored |
---|
| 209 | % |
---|
| 210 | %There is a size cost to defining a personality function but the later problem |
---|
| 211 | %is the LSDA which will be generated for every function. |
---|
| 212 | % |
---|
| 213 | %(I haven't actually figured out how to compare this, probably using something |
---|
| 214 | %related to -fexceptions.) |
---|
[029cbc0] | 215 | |
---|
[9698690] | 216 | \section{Results} |
---|
[cfbab07] | 217 | % First, introduce the tables. |
---|
| 218 | \autoref{t:PerformanceTermination}, |
---|
| 219 | \autoref{t:PerformanceResumption} |
---|
| 220 | and~\autoref{t:PerformanceFixupRoutines} |
---|
| 221 | show the test results. |
---|
| 222 | In cases where a feature is not supported by a language, the test is skipped |
---|
| 223 | for that language and the result is marked N/A. |
---|
| 224 | There are also cases where the feature is supported but measuring its |
---|
| 225 | cost is impossible. This happened with Java, which uses a JIT that optimize |
---|
| 226 | away the tests and it cannot be stopped.\cite{Dice21} |
---|
| 227 | These tests are marked N/C. |
---|
| 228 | To get results in a consistent range (1 second to 1 minute is ideal, |
---|
| 229 | going higher is better than going low) N, the number of iterations of the |
---|
| 230 | main loop in each test, is varied between tests. It is also given in the |
---|
| 231 | results and usually have a value in the millions. |
---|
| 232 | |
---|
| 233 | An anomaly in some results came from \CFA's use of gcc nested functions. |
---|
| 234 | These nested functions are used to create closures that can access stack |
---|
| 235 | variables in their lexical scope. |
---|
| 236 | However, if they do so then they can cause the benchmark's run-time to |
---|
| 237 | increase by an order of magnitude. |
---|
| 238 | The simplest solution is to make those values global variables instead |
---|
| 239 | of function local variables. |
---|
| 240 | % Do we know if editing a global inside nested function is a problem? |
---|
| 241 | Tests that had to be modified to avoid this problem have been marked |
---|
| 242 | with a ``*'' in the results. |
---|
| 243 | |
---|
| 244 | % Now come the tables themselves: |
---|
| 245 | % You might need a wider window for this. |
---|
| 246 | |
---|
| 247 | \begin{table}[htb] |
---|
| 248 | \centering |
---|
| 249 | \caption{Termination Performance Results (sec)} |
---|
| 250 | \label{t:PerformanceTermination} |
---|
| 251 | \begin{tabular}{|r|*{2}{|r r r r|}} |
---|
[7372065] | 252 | \hline |
---|
[cfbab07] | 253 | & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\ |
---|
| 254 | \cline{2-9} |
---|
| 255 | N\hspace{8pt} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & |
---|
| 256 | \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ |
---|
[7372065] | 257 | \hline |
---|
[cfbab07] | 258 | Empty Traversal (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8 \\ |
---|
| 259 | D'tor Traversal (1M) & 48.4 & 23.6 & N/A & N/A & 64.2 & 29.0 & N/A & N/A \\ |
---|
| 260 | Finally Traversal (1M) & 3.4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6 & 19.0 \\ |
---|
| 261 | Other Traversal (1M) & 3.6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4 \\ |
---|
| 262 | Cross Handler (100M) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2 \\ |
---|
| 263 | Cross Finally (100M) & 0.9 & N/A & N/C & 44.1 & 0.8 & N/A & N/C & 37.3 \\ |
---|
| 264 | Match All (10M) & 32.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0 & 3.1 \\ |
---|
| 265 | Match None (10M) & 32.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2 \\ |
---|
[7372065] | 266 | \hline |
---|
| 267 | \end{tabular} |
---|
[cfbab07] | 268 | \end{table} |
---|
[7372065] | 269 | |
---|
[cfbab07] | 270 | \begin{table}[htb] |
---|
| 271 | \centering |
---|
| 272 | \caption{Resumption Performance Results (sec)} |
---|
| 273 | \label{t:PerformanceResumption} |
---|
| 274 | \begin{tabular}{|r||r||r|} |
---|
[0b67a19] | 275 | \hline |
---|
[cfbab07] | 276 | N\hspace{8pt} |
---|
| 277 | & AMD & ARM \\ |
---|
[0b67a19] | 278 | \hline |
---|
[cfbab07] | 279 | Empty Traversal (10M) & 0.2 & 0.3 \\ |
---|
| 280 | D'tor Traversal (10M) & 1.8 & 1.0 \\ |
---|
| 281 | Finally Traversal (10M) & 1.7 & 1.0 \\ |
---|
| 282 | Other Traversal (10M) & 22.6 & 25.9 \\ |
---|
| 283 | Cross Handler (100M) & 8.4 & 11.9 \\ |
---|
| 284 | Match All (100M) & 2.3 & 3.2 \\ |
---|
| 285 | Match None (100M) & 2.9 & 3.9 \\ |
---|
[0b67a19] | 286 | \hline |
---|
| 287 | \end{tabular} |
---|
[cfbab07] | 288 | \end{table} |
---|
[262deda0] | 289 | |
---|
[cfbab07] | 290 | \begin{table}[htb] |
---|
| 291 | \centering |
---|
| 292 | \small |
---|
| 293 | \caption{Resumption/Fixup Routine Comparison (sec)} |
---|
| 294 | \label{t:PerformanceFixupRoutines} |
---|
| 295 | \setlength{\tabcolsep}{5pt} |
---|
| 296 | \begin{tabular}{|r|*{2}{|r r r r r|}} |
---|
| 297 | \hline |
---|
| 298 | & \multicolumn{5}{c||}{AMD} & \multicolumn{5}{c|}{ARM} \\ |
---|
| 299 | \cline{2-11} |
---|
| 300 | N\hspace{8pt} & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & |
---|
| 301 | \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ |
---|
| 302 | \hline |
---|
| 303 | Resume Empty (10M) & 3.8 & 3.5 & 14.7 & 2.3 & 176.1 & 0.3 & 0.1 & 8.9 & 1.2 & 119.9 \\ |
---|
| 304 | %Resume Other (10M) & 4.0* & 0.1* & 21.9 & 6.2 & 381.0 & 0.3* & 0.1* & 13.2 & 5.0 & 290.7 \\ |
---|
| 305 | \hline |
---|
| 306 | \end{tabular} |
---|
| 307 | \end{table} |
---|
| 308 | |
---|
| 309 | % Now discuss the results in the tables. |
---|
| 310 | One result not directly related to \CFA but important to keep in mind is that, |
---|
| 311 | for exceptions the standard intuition about which languages should go |
---|
| 312 | faster often does not hold. |
---|
| 313 | For example, there are a few cases where Python out-performs |
---|
| 314 | \CFA, \Cpp and Java. |
---|
| 315 | The most likely explanation is that, since exceptions |
---|
| 316 | are rarely considered to be the common case, the more optimized languages |
---|
| 317 | make that case expensive to improve other cases. |
---|
| 318 | In addition, languages with high-level representations have a much |
---|
| 319 | easier time scanning the stack as there is less to decode. |
---|
| 320 | |
---|
| 321 | As stated, |
---|
| 322 | the performance tests are not attempting to show \CFA has a new competitive |
---|
| 323 | way of implementing exception handling. |
---|
| 324 | The only performance requirement is to insure the \CFA EHM has reasonable |
---|
| 325 | performance for prototyping. |
---|
| 326 | Although that may be hard to exactly quantify, we believe it has succeeded |
---|
| 327 | in that regard. |
---|
| 328 | Details on the different test cases follow. |
---|
| 329 | |
---|
| 330 | \begin{description} |
---|
| 331 | \item[Empty Traversal] |
---|
| 332 | \CFA is slower than \Cpp, but is still faster than the other languages |
---|
| 333 | and closer to \Cpp than other languages. |
---|
| 334 | This is to be expected as \CFA is closer to \Cpp than the other languages. |
---|
| 335 | |
---|
| 336 | \item[D'tor Traversal] |
---|
| 337 | Running destructors causes huge slowdown in every language that supports |
---|
| 338 | them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's. |
---|
| 339 | Considering the amount of work done in destructors is so low the cost |
---|
| 340 | likely comes from the change of context required to do that work. |
---|
| 341 | |
---|
| 342 | \item[Finally Traversal] |
---|
| 343 | Speed is similar to Empty Traversal in all languages that support finally |
---|
| 344 | clauses. Only Python seems to have a larger than random noise change in |
---|
| 345 | its run-time and it is still not large. |
---|
| 346 | Despite the similarity between finally clauses and destructors, |
---|
| 347 | finally clauses seem to avoid the spike in run-time destructors have. |
---|
| 348 | Possibly some optimization removes the cost of changing contexts. |
---|
| 349 | \todo{OK, I think the finally clause may have been optimized out.} |
---|
| 350 | |
---|
| 351 | \item[Other Traversal] |
---|
| 352 | For \Cpp, stopping to check if a handler applies seems to be about as |
---|
| 353 | expensive as stopping to run a destructor. |
---|
| 354 | This results in a significant jump. |
---|
| 355 | |
---|
| 356 | Other languages experiance a small increase in run-time. |
---|
| 357 | The small increase likely comes from running the checks, |
---|
| 358 | but they could avoid the spike by not having the same kind of overhead for |
---|
| 359 | switching to the check's context. |
---|
| 360 | |
---|
| 361 | \todo{Could revist Other Traversal, after Finally Traversal.} |
---|
| 362 | |
---|
| 363 | \item[Cross Handler] |
---|
| 364 | Here \CFA falls behind \Cpp by a much more significant margin. |
---|
| 365 | This is likely due to the fact \CFA has to insert two extra function |
---|
| 366 | calls while \Cpp doesn't have to do execute any other instructions. |
---|
| 367 | Python is much further behind. |
---|
| 368 | |
---|
| 369 | \item[Cross Finally] |
---|
| 370 | \CFA's performance now matches \Cpp's from Cross Handler. |
---|
| 371 | If the code from the finally clause is being inlined, |
---|
| 372 | which is just a asm comment, than there are no additional instructions |
---|
| 373 | to execute again when exiting the try statement normally. |
---|
| 374 | |
---|
| 375 | \item[Conditional Match] |
---|
| 376 | Both of the conditional matching tests can be considered on their own, |
---|
| 377 | however for evaluating the value of conditional matching itself the |
---|
| 378 | comparison of the two sets of results is useful. |
---|
| 379 | Consider the massive jump in run-time for \Cpp going from match all to match |
---|
| 380 | none, which none of the other languages have. |
---|
| 381 | Some strange interaction is causing run-time to more than double for doing |
---|
| 382 | twice as many raises. |
---|
| 383 | Java and Python avoid this problem and have similar run-time for both tests, |
---|
| 384 | possibly through resource reuse or their program representation. |
---|
| 385 | However \CFA is built like \Cpp and avoids the problem as well, this matches |
---|
| 386 | the pattern of the conditional match which makes the two execution paths |
---|
| 387 | much more similar. |
---|
| 388 | |
---|
| 389 | \end{description} |
---|
| 390 | |
---|
| 391 | Moving on to resumption there is one general note, |
---|
| 392 | resumption is \textit{fast}, the only test where it fell |
---|
| 393 | behind termination is Cross Handler. |
---|
| 394 | In every other case, the number of iterations had to be increased by a |
---|
| 395 | factor of 10 to get the run-time in an approprate range |
---|
| 396 | and in some cases resumption still took less time. |
---|
| 397 | |
---|
| 398 | % I tried \paragraph and \subparagraph, maybe if I could adjust spacing |
---|
| 399 | % between paragraphs those would work. |
---|
| 400 | \begin{description} |
---|
| 401 | \item[Empty Traversal] |
---|
| 402 | See above for the general speed-up notes. |
---|
| 403 | This result is not surprising as resumption's link list approach |
---|
| 404 | means that traversing over stack frames without a resumption handler is |
---|
| 405 | $O(1)$. |
---|
| 406 | |
---|
| 407 | \item[D'tor Traversal] |
---|
| 408 | Resumption does have the same spike in run-time that termination has. |
---|
| 409 | The run-time is actually very similar to Finally Traversal. |
---|
| 410 | As resumption does not unwind the stack both destructors and finally |
---|
| 411 | clauses are run while walking down the stack normally. |
---|
| 412 | So it follows their performance is similar. |
---|
| 413 | |
---|
| 414 | \item[Finally Traversal] |
---|
| 415 | The increase in run-time fromm Empty Traversal (once adjusted for |
---|
| 416 | the number of iterations) roughly the same as for termination. |
---|
| 417 | This suggests that the |
---|
| 418 | |
---|
| 419 | \item[Other Traversal] |
---|
| 420 | Traversing across handlers reduces resumption's advantage as it actually |
---|
| 421 | has to stop and check each one. |
---|
| 422 | Resumption still came out ahead (adjusting for iterations) but by much less |
---|
| 423 | than the other cases. |
---|
| 424 | |
---|
| 425 | \item[Cross Handler] |
---|
| 426 | The only test case where resumption could not keep up with termination, |
---|
| 427 | although the difference is not as significant as many other cases. |
---|
| 428 | It is simply a matter of where the costs come from. Even if \CFA termination |
---|
| 429 | is not ``zero-cost" passing through an empty function still seems to be |
---|
| 430 | cheaper than updating global values. |
---|
| 431 | |
---|
| 432 | \item[Conditional Match] |
---|
| 433 | Resumption shows a slight slowdown if the exception is not matched |
---|
| 434 | by the first handler, which follows from the fact the second handler now has |
---|
| 435 | to be checked. However the difference is not large. |
---|
| 436 | |
---|
| 437 | \end{description} |
---|
| 438 | |
---|
| 439 | Finally are the results of the resumption/fixup routine comparison. |
---|
| 440 | These results are surprisingly varied, it is possible that creating a closure |
---|
| 441 | has more to do with performance than passing the argument through layers of |
---|
| 442 | calls. |
---|
| 443 | Even with 100 stack frames though, resumption is only about as fast as |
---|
| 444 | manually passing a fixup routine. |
---|
| 445 | So there is a cost for the additional power and flexibility exceptions |
---|
| 446 | provide. |
---|