\chapter{Performance}
\label{c:performance}

Performance is of secondary importance for most of this project.
Instead, the focus was to get the features working. The only performance
requirement is to ensure the tests for correctness run in a reasonable
amount of time. Hence, a few basic performance tests were run to
check this requirement.

\section{Test Set-Up}
Tests were run in \CFA, \Cpp, Java and Python.
In addition, there are two sets of tests for \CFA,
one with termination and one with resumption.

\Cpp is the most comparable language because both it and \CFA use the same
framework, libunwind.
In fact, the comparison is almost entirely in quality of implementation.
Specifically, \CFA's EHM has had significantly less time to be optimized and
does not generate its own assembly. It does have a slight advantage in that
\Cpp has to do some extra bookkeeping to support its utility functions,
but otherwise \Cpp should have a significant advantage.

Java, a popular language with similar termination semantics,
is implemented in a very different environment, a virtual machine with
garbage collection.
It also implements the finally clause on try blocks, allowing for a direct
feature-to-feature comparison.
As with \Cpp, Java's implementation is mature and has more optimizations
and extra features than \CFA.

Python is used as an alternative comparison because of the \CFA EHM's
current performance goal, which is to not be prohibitively slow while the
features are designed and examined. Python has similar performance goals for
creating quick scripts and its wide use suggests it has achieved those goals.

Unfortunately, there are no notable modern programming languages with
resumption exceptions. Even the older programming languages with resumption
seem to be notable only for having resumption.
Instead, resumption is compared to its simulation in other programming
languages: fixup functions that are explicitly passed into a function.

All tests are run inside a main loop that repeatedly performs a test.
This approach avoids start-up or tear-down time from
affecting the timing results.
The number of times the loop is run is configurable from the command line;
the number used in the timing runs is given with the results per test.
The Java tests run the main loop 1000 times before
beginning the actual test to ``warm up'' the JVM.
% All other languages are precompiled or interpreted.

Timing is done internally, with time measured immediately before and
after the test loop. The difference is calculated and printed.
The loop structure and internal timing mean it is impossible to test
unhandled exceptions in \Cpp and Java as that would cause the process to
terminate.
Luckily, performance on the ``give up and kill the process'' path is not
critical.
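
The main loop and internal timing described above can be sketched in \Cpp as
follows. This is only an illustration under assumptions, not the harness used
for the results: the name \lstinline{time_main_loop}, the function-pointer
parameter and the choice of \lstinline{std::chrono::steady_clock} are all
hypothetical.
\begin{lstlisting}[language=C++]
#include <chrono>

// Hypothetical harness: run `test` `times` times and return the elapsed
// time in seconds, measured immediately before and after the loop.
double time_main_loop(unsigned long times, void (*test)()) {
	using clock = std::chrono::steady_clock;
	auto start = clock::now();
	for (unsigned long i = 0; i < times; ++i) {
		test();
	}
	auto end = clock::now();
	// Only the difference is reported; start-up and tear-down of the
	// process fall outside the measured interval.
	return std::chrono::duration<double>(end - start).count();
}
\end{lstlisting}
In such a sketch, the iteration count would be read from the command line
(for example with \lstinline{std::strtoul}) and the returned difference printed.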

The exceptions used in these tests are always derived from
the base exception for the language.
This requirement minimizes performance differences based
on the object model used to represent the exception.

All tests are designed to be as minimal as possible, while still preventing
excessive optimizations.
For example, empty inline assembly blocks are used in \CFA and \Cpp to
prevent excessive optimizations while adding no actual work.
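
As an illustration of the empty inline assembly technique, the following
\Cpp sketch uses the GCC extended-asm syntax. The function names
\lstinline{opaque_work} and \lstinline{spin} are hypothetical and the actual
tests may place the blocks differently.
\begin{lstlisting}[language=C++]
// An empty volatile asm emits no instructions, but GCC treats it as
// having side effects and does not delete it.
static inline void opaque_work() {
	asm volatile ("");
}

// Hypothetical loop body: without the asm, the compiler would be free
// to reduce this loop to a single assignment.
unsigned long spin(unsigned long times) {
	unsigned long count = 0;
	for (unsigned long i = 0; i < times; ++i) {
		opaque_work();
		++count;
	}
	return count;
}
\end{lstlisting}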

% We don't use catch-alls but if we did:
% Catch-alls are done by catching the root exception type (not using \Cpp's
% \code{C++}{catch(...)}).

When collecting data, each test is run eleven times. The top three and bottom
three results are discarded and the remaining five values are averaged.
The tests are run with the latest (still pre-release) \CFA compiler,
using gcc-10 as a backend.
g++-10 is used for \Cpp.
Java tests are compiled and run with version 11.0.11.
Python tests use version 3.8.
The machines used to run the tests are:
\todo{Get patch versions for python, gcc and g++.}
\begin{itemize}[nosep]
\item ARM 2280 Kunpeng 920 48-core 2$\times$socket
      \lstinline{@} 2.6 GHz running Linux v5.11.0-25
\item AMD 6380 Abu Dhabi 16-core 4$\times$socket
      \lstinline{@} 2.5 GHz running Linux v5.11.0-25
\end{itemize}
These machines represent the two major families of hardware architecture.
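
The rule for combining the eleven runs (discard the top three and bottom
three, average the remaining five) amounts to a trimmed mean. A minimal \Cpp
sketch follows; the function name \lstinline{trimmed_mean} and its
\lstinline{trim} parameter are hypothetical, and this is not the script used
to produce the tables.
\begin{lstlisting}[language=C++]
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Sort the run times, drop `trim` values from each end and average
// what is left. With eleven runs and trim = 3, five values remain.
double trimmed_mean(std::vector<double> runs, std::size_t trim = 3) {
	if (runs.size() <= 2 * trim) {
		throw std::invalid_argument("not enough runs to trim");
	}
	std::sort(runs.begin(), runs.end());
	auto first = runs.begin() + trim;
	auto last = runs.end() - trim;
	double sum = std::accumulate(first, last, 0.0);
	return sum / static_cast<double>(last - first);
}
\end{lstlisting}
Trimming both ends keeps a single unusually slow or fast run from shifting
the reported value.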

\section{Tests}
The following tests were selected to test the performance of different
components of the exception system.
They should provide a guide as to where the EHM's costs are found.

\paragraph{Stack Traversal}
This group measures the cost of traversing the stack
(and in termination, unwinding it).
Inside the main loop is a call to a recursive function.
This function calls itself F times before raising an exception.
F is configurable from the command line, but is usually 100.
This builds up many stack frames, and any contents they may have,
before the raise.
The exception is always handled at the base of the stack.
For example, the Empty test for \CFA resumption looks like:
\begin{cfa}
void unwind_empty(unsigned int frames) {
	if (frames) {
		unwind_empty(frames - 1);
	} else {
		throwResume (empty_exception){&empty_vt};
	}
}
\end{cfa}
Other test cases have additional code around the recursive call, adding
something besides simple stack frames to the stack.
Note that both termination and resumption have to traverse over
the stack but only termination has to unwind it.
\begin{itemize}[nosep]
% \item None:
% Reuses the empty test code (see below) except that the number of frames
% is set to 0 (this is the only test for which the number of frames is not
% 100). This isolates the start-up and shut-down time of a throw.
\item Empty:
The repeating function is empty except for the necessary control code.
Because the other traversal tests add to this code, it is the baseline for
the group: its cost comes from traversing over and unwinding stack frames
that have no other interaction with the exception system.
\item Destructor:
The repeating function creates an object with a destructor before calling
itself.
Comparing this to the empty test gives the time to traverse over and
unwind a destructor.
\item Finally:
The repeating function calls itself inside a try block with a finally clause
attached.
Comparing this to the empty test gives the time to traverse over and
unwind a finally clause.
\item Other Handler:
The repeating function calls itself inside a try block with a handler that
does not match the raised exception, but is of the same kind of handler.
This means that the EHM has to check each handler, and continue
over all of them until it reaches the base of the stack.
Comparing this to the empty test gives the time to traverse over and
unwind a handler.
\end{itemize}
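
For comparison, the traversal tests that \Cpp supports might look like the
following sketch. It is written from the descriptions above rather than taken
from the test suite, so every name in it is hypothetical, and the real tests
may need further measures to stop the compiler from collapsing the recursion.
\begin{lstlisting}[language=C++]
#include <exception>

// Both exception types derive from the language's base exception.
struct EmptyException : std::exception {};
struct OtherException : std::exception {};

// Object whose destructor does no real work (an empty asm block).
struct WithDestructor {
	~WithDestructor() { asm volatile (""); }
};

// Empty: only the control code needed to build the frames and raise.
void unwind_empty(unsigned int frames) {
	if (frames) {
		unwind_empty(frames - 1);
	} else {
		throw EmptyException();
	}
}

// Destructor: each frame holds an object that must be destroyed.
void unwind_destructor(unsigned int frames) {
	if (frames) {
		WithDestructor object;
		unwind_destructor(frames - 1);
	} else {
		throw EmptyException();
	}
}

// Other Handler: each frame has a handler that never matches.
void unwind_other(unsigned int frames) {
	if (frames) {
		try {
			unwind_other(frames - 1);
		} catch (OtherException &) {
			asm volatile ("");
		}
	} else {
		throw EmptyException();
	}
}

// The exception is handled at the base of the stack.
bool raise_and_handle(void (*test)(unsigned int), unsigned int frames) {
	try {
		test(frames);
	} catch (EmptyException &) {
		return true;
	}
	return false;
}
\end{lstlisting}
One iteration of the main loop would then be a call such as
\lstinline{raise_and_handle(unwind_empty, 100)}.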

\paragraph{Cross Try Statement}
This group of tests measures the cost of setting up exception handling
when it is not used (because the exceptional case did not occur).
Tests repeatedly cross (enter, execute and leave) a try statement but never
perform a raise.
\begin{itemize}[nosep]
\item Handler:
The try statement has a handler (of the appropriate kind).
\item Finally:
The try statement has a finally clause.
\end{itemize}
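
A sketch of the Handler variant in \Cpp is given below, assuming the loop
simply wraps an empty body in a try statement; the names
\lstinline{cross_handler} and \lstinline{NotRaisedException} are hypothetical.
\Cpp has no finally clause, so the Finally variant has no \Cpp equivalent.
\begin{lstlisting}[language=C++]
#include <exception>

struct NotRaisedException : std::exception {};

// Enter, execute and leave a try statement `times` times.
// Nothing in the try block ever raises, so the handler never runs.
unsigned long cross_handler(unsigned long times) {
	unsigned long crossed = 0;
	for (unsigned long i = 0; i < times; ++i) {
		try {
			asm volatile ("");
			++crossed;
		} catch (NotRaisedException &) {
			asm volatile ("");
		}
	}
	return crossed;
}
\end{lstlisting}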

\paragraph{Conditional Matching}
This group measures the cost of conditional matching.
Only \CFA implements the language-level conditional match;
the other languages mimic it with an ``unconditional'' match (it still
checks the exception's type) and a conditional re-raise if it is not supposed
to handle that exception.

Here is the pattern shown in \CFA and \Cpp. Java and Python use the same
pattern as \Cpp, but with their own syntax.

\begin{minipage}{0.45\textwidth}
\begin{cfa}
try {
	...
} catch (exception_t * e ;
		should_catch(e)) {
	...
}
\end{cfa}
\end{minipage}
\begin{minipage}{0.55\textwidth}
\begin{lstlisting}[language=C++]
try {
	...
} catch (std::exception & e) {
	if (!should_catch(e)) throw;
	...
}
\end{lstlisting}
\end{minipage}
\begin{itemize}[nosep]
\item Match All:
The condition is always true. (Always matches or never re-raises.)
\item Match None:
The condition is always false. (Never matches or always re-raises.)
\end{itemize}
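
To make the two cases concrete, the \Cpp pattern can be expanded into a
complete function. This is a sketch under assumptions: the flag
\lstinline{match_all}, the outer handler and the return codes are all
hypothetical additions used to show which handler ends up handling the
exception.
\begin{lstlisting}[language=C++]
#include <exception>

struct TestException : std::exception {};

// Hypothetical switch: true for Match All, false for Match None.
static bool match_all = true;

bool should_catch(const std::exception &) {
	return match_all;
}

// Returns 1 if the inner handler handles the exception and
// 2 if it re-raises and the outer handler handles it instead.
int conditional_match() {
	int handled_by = 0;
	try {
		try {
			throw TestException();
		} catch (std::exception & e) {
			if (!should_catch(e)) throw;  // conditional re-raise
			handled_by = 1;
		}
	} catch (std::exception &) {
		handled_by = 2;
	}
	return handled_by;
}
\end{lstlisting}
In the Match None case the simulation performs two raises per iteration, the
original and the re-raise, whereas a language-level conditional match
performs only one.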

\paragraph{Resumption Simulation}
A slightly altered version of the Empty Traversal test is used when comparing
resumption to fix-up routines.
The handler, either the actual resumption handler or the fix-up routine,
always captures a variable at the base of the loop
and receives a reference to a variable at the raise site, either as a
field on the exception or as an argument to the fix-up routine.
% I don't actually know why that is here but not anywhere else.
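
The fix-up routine side of this comparison can be sketched in \Cpp as
follows. The use of \lstinline{std::function}, the names, and what the
routine does with the two variables are assumptions made for illustration;
the source only states that one variable is captured at the base and one is
passed by reference from the raise site.
\begin{lstlisting}[language=C++]
#include <functional>

// Recurse `frames` deep, passing the fix-up routine explicitly through
// every call; at the bottom, call it where a resumption would be raised.
void fixup_empty(unsigned int frames,
		const std::function<void(int &)> & fixup) {
	if (frames) {
		fixup_empty(frames - 1, fixup);
	} else {
		int raise_site_value = 0;  // variable at the raise site
		fixup(raise_site_value);
	}
}

// One iteration: the closure captures a variable at the base of the
// loop and receives a reference to the variable at the raise site.
int run_fixup_once(unsigned int frames) {
	int base_value = 0;
	auto fixup = [&base_value](int & at_raise) {
		at_raise = 1;
		base_value += at_raise;
	};
	fixup_empty(frames, fixup);
	return base_value;
}
\end{lstlisting}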

%\section{Cost in Size}
%Using exceptions also has a cost in the size of the executable.
%Although it is sometimes ignored
%
%There is a size cost to defining a personality function but the later problem
%is the LSDA which will be generated for every function.
%
%(I haven't actually figured out how to compare this, probably using something
%related to -fexceptions.)

\section{Results}
% First, introduce the tables.
\autoref{t:PerformanceTermination},
\autoref{t:PerformanceResumption}
and~\autoref{t:PerformanceFixupRoutines}
show the test results.
In cases where a feature is not supported by a language, the test is skipped
for that language and the result is marked N/A.
There are also cases where the feature is supported but measuring its
cost is impossible. This happened with Java, which uses a JIT that optimizes
away the tests and cannot be stopped~\cite{Dice21}.
These tests are marked N/C.
To get results in a consistent range (1 second to 1 minute is ideal,
going higher is better than going lower), N, the number of iterations of the
main loop in each test, is varied between tests. It is also given in the
results and has a value in the millions.

An anomaly in some results came from \CFA's use of gcc nested functions.
These nested functions are used to create closures that can access stack
variables in their lexical scope.
However, if they do so, then they can cause the benchmark's run-time to
increase by an order of magnitude.
The simplest solution is to make those values global variables instead
of function local variables.
% Do we know if editing a global inside nested function is a problem?
Tests that had to be modified to avoid this problem have been marked
with a ``*'' in the results.

% Now come the tables themselves:
% You might need a wider window for this.

\begin{table}[htb]
\centering
\caption{Termination Performance Results (sec)}
\label{t:PerformanceTermination}
\begin{tabular}{|r|*{2}{|r r r r|}}
\hline
                       & \multicolumn{4}{c||}{AMD}         & \multicolumn{4}{c|}{ARM}  \\
\cline{2-9}
N\hspace{8pt}          & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
                         \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
\hline
Empty Traversal (1M)   & 3.4   & 2.8   & 18.3  & 23.4      & 3.7   & 3.2   & 15.5  & 14.8  \\
D'tor Traversal (1M)   & 48.4  & 23.6  & N/A   & N/A       & 64.2  & 29.0  & N/A   & N/A   \\
Finally Traversal (1M) & 3.4*  & N/A   & 17.9  & 29.0      & 4.1*  & N/A   & 15.6  & 19.0  \\
Other Traversal (1M)   & 3.6*  & 23.2  & 18.2  & 32.7      & 4.0*  & 24.5  & 15.5  & 21.4  \\
Cross Handler (100M)   & 6.0   & 0.9   & N/C   & 37.4      & 10.0  & 0.8   & N/C   & 32.2  \\
Cross Finally (100M)   & 0.9   & N/A   & N/C   & 44.1      & 0.8   & N/A   & N/C   & 37.3  \\
Match All (10M)        & 32.9  & 20.7  & 13.4  & 4.9       & 36.2  & 24.5  & 12.0  & 3.1   \\
Match None (10M)       & 32.7  & 50.3  & 11.0  & 5.1       & 36.3  & 71.9  & 12.3  & 4.2   \\
\hline
\end{tabular}
\end{table}

\begin{table}[htb]
\centering
\caption{Resumption Performance Results (sec)}
\label{t:PerformanceResumption}
\begin{tabular}{|r||r||r|}
\hline
N\hspace{8pt}
                        & AMD     & ARM  \\
\hline
Empty Traversal (10M)   & 0.2     & 0.3  \\
D'tor Traversal (10M)   & 1.8     & 1.0  \\
Finally Traversal (10M) & 1.7     & 1.0  \\
Other Traversal (10M)   & 22.6    & 25.9 \\
Cross Handler (100M)    & 8.4     & 11.9 \\
Match All (100M)        & 2.3     & 3.2  \\
Match None (100M)       & 2.9     & 3.9  \\
\hline
\end{tabular}
\end{table}

\begin{table}[htb]
\centering
\small
\caption{Resumption/Fixup Routine Comparison (sec)}
\label{t:PerformanceFixupRoutines}
\setlength{\tabcolsep}{5pt}
\begin{tabular}{|r|*{2}{|r r r r r|}}
\hline
                    & \multicolumn{5}{c||}{AMD}     & \multicolumn{5}{c|}{ARM}  \\
\cline{2-11}
N\hspace{8pt}       & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
                      \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
\hline
Resume Empty (10M)  & 1.5 & 1.5 & 14.7 & 2.3 & 176.1  & 1.0 & 1.4 & 8.9 & 1.2 & 119.9 \\
\hline
\end{tabular}
\end{table}

% Now discuss the results in the tables.
One result not directly related to \CFA but important to keep in mind is that,
for exceptions, the standard intuition about which languages should go
faster often does not hold.
For example, there are a few cases where Python out-performs
\CFA, \Cpp and Java.
\todo{Make sure there are still cases where Python wins.}
The most likely explanation is that, since exceptions
are rarely considered to be the common case, the more optimized languages
make that case expensive to improve other cases.
In addition, languages with high-level representations have a much
easier time scanning the stack as there is less to decode.

As stated,
the performance tests are not attempting to show \CFA has a new competitive
way of implementing exception handling.
The only performance requirement is to ensure the \CFA EHM has reasonable
performance for prototyping.
Although that may be hard to exactly quantify, I believe it has succeeded
in that regard.
Details on the different test cases follow.

\subsection{Termination \texorpdfstring{(\autoref{t:PerformanceTermination})}{}}

\begin{description}
\item[Empty Traversal]
\CFA is slower than \Cpp, but is still faster than the other languages
and closer to \Cpp than the other languages are.
This result is to be expected,
as \CFA is closer to \Cpp than the other languages.

\item[D'tor Traversal]
Running destructors causes a huge slowdown in the two languages that support
them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's.
Considering the amount of work done in destructors is effectively zero
(an assembly comment), the cost
must come from the change of context required to run the destructor.

\item[Finally Traversal]
Performance is similar to Empty Traversal in all languages that support finally
clauses. Only Python seems to have a larger-than-random-noise change in
its run-time and it is still not large.
Despite the similarity between finally clauses and destructors,
finally clauses seem to avoid the spike that run-time destructors have.
Possibly some optimization removes the cost of changing contexts.
\todo{OK, I think the finally clause may have been optimized out.}

\item[Other Traversal]
For \Cpp, stopping to check if a handler applies seems to be about as
expensive as stopping to run a destructor.
This results in a significant jump.

Other languages experience a small increase in run-time.
The small increase likely comes from running the checks,
but they could avoid the spike by not having the same kind of overhead for
switching to the check's context.
\todo{Could revisit Other Traversal, after Finally Traversal.}

\item[Cross Handler]
Here \CFA falls behind \Cpp by a much more significant margin.
This is likely due to the fact that \CFA has to insert two extra function
calls, while \Cpp does not have to execute any other instructions.
Python is much further behind.

\item[Cross Finally]
\CFA's performance now matches \Cpp's from Cross Handler.
If the code from the finally clause is being inlined,
which is just an asm comment, then there are no additional instructions
to execute again when exiting the try statement normally.

\item[Conditional Match]
Both of the conditional matching tests can be considered on their own.
However, for evaluating the value of conditional matching itself, the
comparison of the two sets of results is useful.
Consider the massive jump in run-time for \Cpp going from match all to match
none, which none of the other languages have.
Some strange interaction is causing run-time to more than double for doing
twice as many raises.
Java and Python avoid this problem and have similar run-time for both tests,
possibly through resource reuse or their program representation.
However, \CFA is built like \Cpp and avoids the problem as well. This matches
the pattern of the conditional match, which makes the two execution paths
very similar.

\end{description}

\subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}}

Moving on to resumption, there is one general note:
resumption is \textit{fast}. The only test where it fell
behind termination is Cross Handler.
In every other case, the number of iterations had to be increased by a
factor of 10 to get the run-time in an appropriate range
and in some cases resumption still took less time.
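
As a rough worked example of this note, the \CFA results on AMD from
\autoref{t:PerformanceTermination} and \autoref{t:PerformanceResumption}
can be divided by their iteration counts to give a time per iteration.
The figures are only approximate, since the tables are rounded to one decimal
place and each iteration also includes loop overhead.
For Empty Traversal, termination and resumption respectively give
\[
\frac{3.4\,\mathrm{s}}{10^{6}} = 3.4\,\mu\mathrm{s}
\qquad\mbox{and}\qquad
\frac{0.2\,\mathrm{s}}{10^{7}} = 0.02\,\mu\mathrm{s},
\]
a difference of roughly two orders of magnitude per iteration.
For Cross Handler, where both tests use the same iteration count, they give
\[
\frac{6.0\,\mathrm{s}}{10^{8}} = 60\,\mathrm{ns}
\qquad\mbox{and}\qquad
\frac{8.4\,\mathrm{s}}{10^{8}} = 84\,\mathrm{ns},
\]
which is the one case where resumption is slower.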

% I tried \paragraph and \subparagraph, maybe if I could adjust spacing
% between paragraphs those would work.
\begin{description}
\item[Empty Traversal]
See above for the general speed-up notes.
This result is not surprising as resumption's linked-list approach
means that traversing over stack frames without a resumption handler is
$O(1)$.

\item[D'tor Traversal]
Resumption does have the same spike in run-time that termination has.
The run-time is actually very similar to Finally Traversal.
As resumption does not unwind the stack, both destructors and finally
clauses are run while walking down the stack during the recursive returns.
So it follows that their performance is similar.

\item[Finally Traversal]
Same as D'tor Traversal,
except termination did not have a spike in run-time on this test case.

\item[Other Traversal]
Traversing across handlers reduces resumption's advantage as it actually
has to stop and check each one.
Resumption still came out ahead (adjusting for iterations) but by much less
than in the other cases.

\item[Cross Handler]
The only test case where resumption could not keep up with termination,
although the difference is not as significant as in many other cases.
It is simply a matter of where the costs come from:
both termination and resumption have some work to set up or tear down a
handler. It just so happens that resumption's work is slightly slower.

\item[Conditional Match]
Resumption shows a slight slowdown if the exception is not matched
by the first handler, which follows from the fact that the second handler
now has to be checked. However, the difference is not large.

\end{description}

\subsection{Resumption/Fixup \texorpdfstring{(\autoref{t:PerformanceFixupRoutines})}{}}

Finally, there are the results of the resumption/fixup routine comparison.
These results are surprisingly varied. It is possible that creating a closure
has more to do with performance than passing the argument through layers of
calls.
At 100 stack frames, resumption and manual fixup routines have similar
performance in \CFA.
More experiments could try to tease out the exact trade-offs,
but the prototype's only performance goal is to be reasonable.
It is already in that range, and \CFA's fixup routine simulation is
one of the faster simulations as well.
Plus, exceptions add features and remove syntactic overhead,
so even at similar performance, resumptions have advantages
over fixup routines.