1 | \chapter{Performance} |
---|
2 | \label{c:performance} |
---|
3 | |
---|
4 | Performance is of secondary importance for most of this project. |
---|
5 | Instead, the focus was to get the features working. The only performance |
---|
6 | requirement is to ensure the tests for correctness run in a reasonable |
---|
7 | amount of time. Hence, only a few basic performance tests were performed to |
---|
8 | check this requirement. |
---|
9 | |
---|
10 | \section{Test Set-Up} |
---|
11 | Tests were run in \CFA, C++, Java and Python. |
---|
12 | In addition there are two sets of tests for \CFA, |
---|
13 | one with termination and one with resumption. |
---|
14 | |
---|
15 | GCC C++ is the most comparable language because both it and \CFA use the same |
---|
16 | framework, libunwind. |
---|
17 | In fact, the comparison is almost entirely in quality of implementation. |
---|
18 | Specifically, \CFA's EHM has had significantly less time to be optimized and |
---|
19 | does not generate its own assembly. It does have a slight advantage in that |
---|
20 | \Cpp has to do some extra bookkeeping to support its utility functions, |
---|
21 | but otherwise \Cpp should have a significant advantage. |
---|
22 | |
---|
23 | Java, a popular language with similar termination semantics, |
---|
24 | is implemented in a very different environment, a virtual machine with |
---|
25 | garbage collection. |
---|
26 | It also implements the finally clause on try blocks allowing for a direct |
---|
27 | feature-to-feature comparison. |
---|
28 | As with \Cpp, Java's implementation is mature, has more optimizations |
---|
29 | and extra features as compared to \CFA. |
---|
30 | |
---|
31 | Python is used as an alternative comparison because of the \CFA EHM's |
---|
32 | current performance goals, which is to not be prohibitively slow while the |
---|
33 | features are designed and examined. Python has similar performance goals for |
---|
34 | creating quick scripts and its wide use suggests it has achieved those goals. |
---|
35 | |
---|
36 | Unfortunately, there are no notable modern programming languages with |
---|
37 | resumption exceptions. Even the older programming languages with resumption |
---|
38 | seem to be notable only for having resumption. |
---|
39 | On the other hand, the functional equivalents to resumption are too new. |
---|
40 | There does not seem to be any standard implementations in well-known |
---|
41 | languages, so far they seem confined to extensions and research languages. |
---|
42 | % There was some maybe interesting comparison to an OCaml extension |
---|
43 | % but I'm not sure how to get that working if it is interesting. |
---|
44 | Instead, resumption is compared to its simulation in other programming |
---|
45 | languages: fixup functions that are explicitly passed into a function. |
---|
46 | |
---|
47 | All tests are run inside a main loop that repeatedly performs a test. |
---|
48 | This approach avoids start-up or tear-down time from |
---|
49 | affecting the timing results. |
---|
50 | The number of times the loop is run is configurable from the command line; |
---|
51 | the number used in the timing runs is given with the results per test. |
---|
52 | The Java tests run the main loop 1000 times before |
---|
53 | beginning the actual test to ``warm up" the JVM. |
---|
54 | % All other languages are precompiled or interpreted. |
---|
55 | |
---|
56 | Timing is done internally, with time measured immediately before and |
---|
57 | after the test loop. The difference is calculated and printed. |
---|
58 | The loop structure and internal timing means it is impossible to test |
---|
59 | unhandled exceptions in \Cpp and Java as that would cause the process to |
---|
60 | terminate. |
---|
61 | Luckily, performance on the ``give up and kill the process" path is not |
---|
62 | critical. |
---|
63 | |
---|
64 | The exceptions used in these tests are always based off of |
---|
65 | the base exception for the language. |
---|
66 | This requirement minimizes performance differences based |
---|
67 | on the object model used to represent the exception. |
---|
68 | |
---|
69 | All tests are designed to be as minimal as possible, while still preventing |
---|
70 | excessive optimizations. |
---|
71 | For example, empty inline assembly blocks are used in \CFA and \Cpp to |
---|
72 | prevent excessive optimizations while adding no actual work. |
---|
73 | |
---|
74 | % We don't use catch-alls but if we did: |
---|
75 | % Catch-alls are done by catching the root exception type (not using \Cpp's |
---|
76 | % \code{C++}{catch(...)}). |
---|
77 | |
---|
78 | When collecting data, each test is run eleven times. The top three and bottom |
---|
79 | three results are discarded and the remaining five values are averaged. |
---|
80 | The test are run with the latest (still pre-release) \CFA compiler, |
---|
81 | using gcc-10 10.3.0 as a backend. |
---|
82 | g++-10 10.3.0 is used for \Cpp. |
---|
83 | Java tests are complied and run with Oracle OpenJDK version 11.0.11. |
---|
84 | Python used CPython version 3.8.10. |
---|
85 | The machines used to run the tests are: |
---|
86 | \begin{itemize}[nosep] |
---|
87 | \item ARM 2280 Kunpeng 920 48-core 2$\times$socket |
---|
88 | \lstinline{@} 2.6 GHz running Linux v5.11.0-25 |
---|
89 | \item AMD 6380 Abu Dhabi 16-core 4$\times$socket |
---|
90 | \lstinline{@} 2.5 GHz running Linux v5.11.0-25 |
---|
91 | \end{itemize} |
---|
92 | These represent the two major families of hardware architecture. |
---|
93 | |
---|
94 | \section{Tests} |
---|
95 | The following tests were selected to test the performance of different |
---|
96 | components of the exception system. |
---|
97 | They should provide a guide as to where the EHM's costs are found. |
---|
98 | |
---|
99 | \paragraph{Stack Traversal} |
---|
100 | This group of tests measures the cost of traversing the stack |
---|
101 | (and in termination, unwinding it). |
---|
102 | Inside the main loop is a call to a recursive function. |
---|
103 | This function calls itself F times before raising an exception. |
---|
104 | F is configurable from the command line, but is usually 100. |
---|
105 | This builds up many stack frames, and any contents they may have, |
---|
106 | before the raise. |
---|
107 | The exception is always handled at the base of the stack. |
---|
108 | For example the Empty test for \CFA resumption looks like: |
---|
109 | \begin{cfa} |
---|
110 | void unwind_empty(unsigned int frames) { |
---|
111 | if (frames) { |
---|
112 | unwind_empty(frames - 1); |
---|
113 | } else { |
---|
114 | throwResume (empty_exception){&empty_vt}; |
---|
115 | } |
---|
116 | } |
---|
117 | \end{cfa} |
---|
118 | Other test cases have additional code around the recursive call adding |
---|
119 | something besides simple stack frames to the stack. |
---|
120 | Note that both termination and resumption have to traverse over |
---|
121 | the stack but only termination has to unwind it. |
---|
122 | \begin{itemize}[nosep] |
---|
123 | % \item None: |
---|
124 | % Reuses the empty test code (see below) except that the number of frames |
---|
125 | % is set to 0 (this is the only test for which the number of frames is not |
---|
126 | % 100). This isolates the start-up and shut-down time of a throw. |
---|
127 | \item Empty: |
---|
128 | The repeating function is empty except for the necessary control code. |
---|
129 | As other traversal tests add to this, it is the baseline for the group |
---|
130 | as the cost comes from traversing over and unwinding a stack frame |
---|
131 | that has no other interactions with the exception system. |
---|
132 | \item Destructor: |
---|
133 | The repeating function creates an object with a destructor before calling |
---|
134 | itself. |
---|
135 | Comparing this to the empty test gives the time to traverse over and |
---|
136 | unwind a destructor. |
---|
137 | \item Finally: |
---|
138 | The repeating function calls itself inside a try block with a finally clause |
---|
139 | attached. |
---|
140 | Comparing this to the empty test gives the time to traverse over and |
---|
141 | unwind a finally clause. |
---|
142 | \item Other Handler: |
---|
143 | The repeating function calls itself inside a try block with a handler that |
---|
144 | does not match the raised exception, but is of the same kind of handler. |
---|
145 | This means that the EHM has to check each handler, and continue |
---|
146 | over all of them until it reaches the base of the stack. |
---|
147 | Comparing this to the empty test gives the time to traverse over and |
---|
148 | unwind a handler. |
---|
149 | \end{itemize} |
---|
150 | |
---|
151 | \paragraph{Cross Try Statement} |
---|
152 | This group of tests measures the cost for setting up exception handling, |
---|
153 | if it is |
---|
154 | not used because the exceptional case did not occur. |
---|
155 | Tests repeatedly cross (enter, execute and leave) a try statement but never |
---|
156 | perform a raise. |
---|
157 | \begin{itemize}[nosep] |
---|
158 | \item Handler: |
---|
159 | The try statement has a handler (of the appropriate kind). |
---|
160 | \item Finally: |
---|
161 | The try statement has a finally clause. |
---|
162 | \end{itemize} |
---|
163 | |
---|
164 | \paragraph{Conditional Matching} |
---|
165 | This group measures the cost of conditional matching. |
---|
166 | Only \CFA implements the language level conditional match, |
---|
167 | the other languages mimic it with an ``unconditional" match (it still |
---|
168 | checks the exception's type) and conditional re-raise if it is not supposed |
---|
169 | to handle that exception. |
---|
170 | |
---|
171 | Here is the pattern shown in \CFA and \Cpp. Java and Python use the same |
---|
172 | pattern as \Cpp, but with their own syntax. |
---|
173 | |
---|
174 | \begin{minipage}{0.45\textwidth} |
---|
175 | \begin{cfa} |
---|
176 | try { |
---|
177 | ... |
---|
178 | } catch (exception_t * e ; |
---|
179 | should_catch(e)) { |
---|
180 | ... |
---|
181 | } |
---|
182 | \end{cfa} |
---|
183 | \end{minipage} |
---|
184 | \begin{minipage}{0.55\textwidth} |
---|
185 | \begin{lstlisting}[language=C++] |
---|
186 | try { |
---|
187 | ... |
---|
188 | } catch (std::exception & e) { |
---|
189 | if (!should_catch(e)) throw; |
---|
190 | ... |
---|
191 | } |
---|
192 | \end{lstlisting} |
---|
193 | \end{minipage} |
---|
194 | \begin{itemize}[nosep] |
---|
195 | \item Match All: |
---|
196 | The condition is always true. (Always matches or never re-raises.) |
---|
197 | \item Match None: |
---|
198 | The condition is always false. (Never matches or always re-raises.) |
---|
199 | \end{itemize} |
---|
200 | |
---|
201 | \paragraph{Resumption Simulation} |
---|
202 | A slightly altered version of the Empty Traversal test is used when comparing |
---|
203 | resumption to fix-up routines. |
---|
204 | The handler, the actual resumption handler or the fix-up routine, |
---|
205 | always captures a variable at the base of the loop, |
---|
206 | and receives a reference to a variable at the raise site, either as a |
---|
207 | field on the exception or an argument to the fix-up routine. |
---|
208 | % I don't actually know why that is here but not anywhere else. |
---|
209 | |
---|
210 | %\section{Cost in Size} |
---|
211 | %Using exceptions also has a cost in the size of the executable. |
---|
212 | %Although it is sometimes ignored |
---|
213 | % |
---|
214 | %There is a size cost to defining a personality function but the later problem |
---|
215 | %is the LSDA which will be generated for every function. |
---|
216 | % |
---|
217 | %(I haven't actually figured out how to compare this, probably using something |
---|
218 | %related to -fexceptions.) |
---|
219 | |
---|
220 | \section{Results} |
---|
221 | % First, introduce the tables. |
---|
222 | \autoref{t:PerformanceTermination}, |
---|
223 | \autoref{t:PerformanceResumption} |
---|
224 | and~\autoref{t:PerformanceFixupRoutines} |
---|
225 | show the test results. |
---|
226 | In cases where a feature is not supported by a language, the test is skipped |
---|
227 | for that language and the result is marked N/A. |
---|
228 | There are also cases where the feature is supported but measuring its |
---|
229 | cost is impossible. This happened with Java, which uses a JIT that optimizes |
---|
230 | away the tests and cannot be stopped.\cite{Dice21} |
---|
231 | These tests are marked N/C. |
---|
232 | To get results in a consistent range (1 second to 1 minute is ideal, |
---|
233 | going higher is better than going low) N, the number of iterations of the |
---|
234 | main loop in each test, is varied between tests. It is also given in the |
---|
235 | results and has a value in the millions. |
---|
236 | |
---|
237 | An anomaly in some results came from \CFA's use of GCC nested functions. |
---|
238 | These nested functions are used to create closures that can access stack |
---|
239 | variables in their lexical scope. |
---|
240 | However, if they do so, then they can cause the benchmark's run time to |
---|
241 | increase by an order of magnitude. |
---|
242 | The simplest solution is to make those values global variables instead |
---|
243 | of function-local variables. |
---|
244 | % Do we know if editing a global inside nested function is a problem? |
---|
245 | Tests that had to be modified to avoid this problem have been marked |
---|
246 | with a ``*'' in the results. |
---|
247 | |
---|
248 | % Now come the tables themselves: |
---|
249 | % You might need a wider window for this. |
---|
250 | |
---|
251 | \begin{table}[htb] |
---|
252 | \centering |
---|
253 | \caption{Termination Performance Results (sec)} |
---|
254 | \label{t:PerformanceTermination} |
---|
255 | \begin{tabular}{|r|*{2}{|r r r r|}} |
---|
256 | \hline |
---|
257 | & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\ |
---|
258 | \cline{2-9} |
---|
259 | N\hspace{8pt} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & |
---|
260 | \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ |
---|
261 | \hline |
---|
262 | Empty Traversal (1M) & 23.0 & 9.6 & 17.6 & 23.4 & 30.6 & 13.6 & 15.5 & 14.7 \\ |
---|
263 | D'tor Traversal (1M) & 48.1 & 23.5 & N/A & N/A & 64.2 & 29.2 & N/A & N/A \\ |
---|
264 | Finally Traversal (1M) & 3.2* & N/A & 17.6 & 29.2 & 3.9* & N/A & 15.5 & 19.0 \\ |
---|
265 | Other Traversal (1M) & 3.3* & 23.9 & 17.7 & 32.8 & 3.9* & 24.5 & 15.5 & 21.6 \\ |
---|
266 | Cross Handler (1B) & 6.5 & 0.9 & N/C & 38.0 & 9.6 & 0.8 & N/C & 32.1 \\ |
---|
267 | Cross Finally (1B) & 0.8 & N/A & N/C & 44.6 & 0.6 & N/A & N/C & 37.3 \\ |
---|
268 | Match All (10M) & 30.5 & 20.6 & 11.2 & 3.9 & 36.9 & 24.6 & 10.7 & 3.1 \\ |
---|
269 | Match None (10M) & 30.6 & 50.9 & 11.2 & 5.0 & 36.9 & 71.9 & 10.7 & 4.1 \\ |
---|
270 | \hline |
---|
271 | \end{tabular} |
---|
272 | \end{table} |
---|
273 | |
---|
274 | \begin{table}[htb] |
---|
275 | \centering |
---|
276 | \caption{Resumption Performance Results (sec)} |
---|
277 | \label{t:PerformanceResumption} |
---|
278 | \begin{tabular}{|r||r||r|} |
---|
279 | \hline |
---|
280 | N\hspace{8pt} |
---|
281 | & AMD & ARM \\ |
---|
282 | \hline |
---|
283 | Empty Traversal (10M) & 1.4 & 1.2 \\ |
---|
284 | D'tor Traversal (10M) & 1.8 & 1.0 \\ |
---|
285 | Finally Traversal (10M) & 1.8 & 1.0 \\ |
---|
286 | Other Traversal (10M) & 22.6 & 25.8 \\ |
---|
287 | Cross Handler (1B) & 9.0 & 11.9 \\ |
---|
288 | Match All (100M) & 2.3 & 3.2 \\ |
---|
289 | Match None (100M) & 3.0 & 3.8 \\ |
---|
290 | \hline |
---|
291 | \end{tabular} |
---|
292 | \end{table} |
---|
293 | |
---|
294 | \begin{table}[htb] |
---|
295 | \centering |
---|
296 | \small |
---|
297 | \caption{Resumption/Fixup Routine Comparison (sec)} |
---|
298 | \label{t:PerformanceFixupRoutines} |
---|
299 | \setlength{\tabcolsep}{5pt} |
---|
300 | \begin{tabular}{|r|*{2}{|r r r r r|}} |
---|
301 | \hline |
---|
302 | & \multicolumn{5}{c||}{AMD} & \multicolumn{5}{c|}{ARM} \\ |
---|
303 | \cline{2-11} |
---|
304 | N\hspace{8pt} & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & |
---|
305 | \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ |
---|
306 | \hline |
---|
307 | Resume Empty (10M) & 1.4 & 1.4 & 15.4 & 2.3 & 178.0 & 1.2 & 1.2 & 8.9 & 1.2 & 118.4 \\ |
---|
308 | \hline |
---|
309 | \end{tabular} |
---|
310 | \end{table} |
---|
311 | |
---|
312 | % Now discuss the results in the tables. |
---|
313 | One result not directly related to \CFA but important to keep in mind is that, |
---|
314 | for exceptions, the standard intuition about which languages should go |
---|
315 | faster often does not hold. |
---|
316 | For example, there are a few cases where Python out-performs |
---|
317 | \CFA, \Cpp and Java. |
---|
318 | % To be exact, the Match All and Match None cases. |
---|
319 | The most likely explination is that, |
---|
320 | the generally faster languages have made ``common cases fast" at the expense |
---|
321 | of the rarer cases. Since exceptions are considered rare, they are made |
---|
322 | expensive to help speed up common actions, such as entering and leaving try |
---|
323 | statements. |
---|
324 | Python on the other hand, while generally slower than the other languages, |
---|
325 | uses exceptions more and has not scarified their performance. |
---|
326 | In addition, languages with high-level representations have a much |
---|
327 | easier time scanning the stack as there is less to decode. |
---|
328 | |
---|
329 | As stated, |
---|
330 | the performance tests are not attempting to show \CFA has a new competitive |
---|
331 | way of implementing exception handling. |
---|
332 | The only performance requirement is to insure the \CFA EHM has reasonable |
---|
333 | performance for prototyping. |
---|
334 | Although that may be hard to exactly quantify, I believe it has succeeded |
---|
335 | in that regard. |
---|
336 | Details on the different test cases follow. |
---|
337 | |
---|
338 | \subsection{Termination \texorpdfstring{(\autoref{t:PerformanceTermination})}{}} |
---|
339 | |
---|
340 | \begin{description} |
---|
341 | \item[Empty Traversal] |
---|
342 | \CFA is slower than \Cpp, but is still faster than the other languages |
---|
343 | and closer to \Cpp than other languages. |
---|
344 | This result is to be expected, |
---|
345 | as \CFA is closer to \Cpp than the other languages. |
---|
346 | |
---|
347 | \item[D'tor Traversal] |
---|
348 | Running destructors causes a huge slowdown in the two languages that support |
---|
349 | them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's. |
---|
350 | Considering the amount of work done in destructors is effectively zero |
---|
351 | (an assembly comment), the cost |
---|
352 | must come from the change of context required to run the destructor. |
---|
353 | |
---|
354 | \item[Finally Traversal] |
---|
355 | Performance is similar to Empty Traversal in all languages that support finally |
---|
356 | clauses. Only Python seems to have a larger than random noise change in |
---|
357 | its run time and it is still not large. |
---|
358 | Despite the similarity between finally clauses and destructors, |
---|
359 | finally clauses seem to avoid the spike that run time destructors have. |
---|
360 | Possibly some optimization removes the cost of changing contexts. |
---|
361 | |
---|
362 | \item[Other Traversal] |
---|
363 | For \Cpp, stopping to check if a handler applies seems to be about as |
---|
364 | expensive as stopping to run a destructor. |
---|
365 | This results in a significant jump. |
---|
366 | |
---|
367 | Other languages experience a small increase in run time. |
---|
368 | The small increase likely comes from running the checks, |
---|
369 | but they could avoid the spike by not having the same kind of overhead for |
---|
370 | switching to the check's context. |
---|
371 | |
---|
372 | \item[Cross Handler] |
---|
373 | Here, \CFA falls behind \Cpp by a much more significant margin. |
---|
374 | This is likely due to the fact that \CFA has to insert two extra function |
---|
375 | calls, while \Cpp does not have to execute any other instructions. |
---|
376 | Python is much further behind. |
---|
377 | |
---|
378 | \item[Cross Finally] |
---|
379 | \CFA's performance now matches \Cpp's from Cross Handler. |
---|
380 | If the code from the finally clause is being inlined, |
---|
381 | which is just an asm comment, than there are no additional instructions |
---|
382 | to execute again when exiting the try statement normally. |
---|
383 | |
---|
384 | \item[Conditional Match] |
---|
385 | Both of the conditional matching tests can be considered on their own. |
---|
386 | However, for evaluating the value of conditional matching itself, the |
---|
387 | comparison of the two sets of results is useful. |
---|
388 | Consider the massive jump in run time for \Cpp going from match all to match |
---|
389 | none, which none of the other languages have. |
---|
390 | Some strange interaction is causing run time to more than double for doing |
---|
391 | twice as many raises. |
---|
392 | Java and Python avoid this problem and have similar run time for both tests, |
---|
393 | possibly through resource reuse or their program representation. |
---|
394 | However, \CFA is built like \Cpp, and avoids the problem as well. |
---|
395 | This matches |
---|
396 | the pattern of the conditional match, which makes the two execution paths |
---|
397 | very similar. |
---|
398 | |
---|
399 | \end{description} |
---|
400 | |
---|
401 | \subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}} |
---|
402 | |
---|
403 | Moving on to resumption, there is one general note: |
---|
404 | resumption is \textit{fast}. The only test where it fell |
---|
405 | behind termination is Cross Handler. |
---|
406 | In every other case, the number of iterations had to be increased by a |
---|
407 | factor of 10 to get the run time in an appropriate range |
---|
408 | and in some cases resumption still took less time. |
---|
409 | |
---|
410 | % I tried \paragraph and \subparagraph, maybe if I could adjust spacing |
---|
411 | % between paragraphs those would work. |
---|
412 | \begin{description} |
---|
413 | \item[Empty Traversal] |
---|
414 | See above for the general speed-up notes. |
---|
415 | This result is not surprising as resumption's linked-list approach |
---|
416 | means that traversing over stack frames without a resumption handler is |
---|
417 | $O(1)$. |
---|
418 | |
---|
419 | \item[D'tor Traversal] |
---|
420 | Resumption does have the same spike in run time that termination has. |
---|
421 | The run time is actually very similar to Finally Traversal. |
---|
422 | As resumption does not unwind the stack, both destructors and finally |
---|
423 | clauses are run while walking down the stack during the recursive returns. |
---|
424 | So it follows their performance is similar. |
---|
425 | |
---|
426 | \item[Finally Traversal] |
---|
427 | Same as D'tor Traversal, |
---|
428 | except termination did not have a spike in run time on this test case. |
---|
429 | |
---|
430 | \item[Other Traversal] |
---|
431 | Traversing across handlers reduces resumption's advantage as it actually |
---|
432 | has to stop and check each one. |
---|
433 | Resumption still came out ahead (adjusting for iterations) but by much less |
---|
434 | than the other cases. |
---|
435 | |
---|
436 | \item[Cross Handler] |
---|
437 | The only test case where resumption could not keep up with termination, |
---|
438 | although the difference is not as significant as many other cases. |
---|
439 | It is simply a matter of where the costs come from: |
---|
440 | both termination and resumption have some work to set up or tear down a |
---|
441 | handler. It just so happens that resumption's work is slightly slower. |
---|
442 | |
---|
443 | \item[Conditional Match] |
---|
444 | Resumption shows a slight slowdown if the exception is not matched |
---|
445 | by the first handler, which follows from the fact the second handler now has |
---|
446 | to be checked. However, the difference is not large. |
---|
447 | |
---|
448 | \end{description} |
---|
449 | |
---|
450 | \subsection{Resumption/Fixup \texorpdfstring{(\autoref{t:PerformanceFixupRoutines})}{}} |
---|
451 | |
---|
452 | Finally are the results of the resumption/fixup routine comparison. |
---|
453 | These results are surprisingly varied. It is possible that creating a closure |
---|
454 | has more to do with performance than passing the argument through layers of |
---|
455 | calls. |
---|
456 | At 100 stack frames, resumption and manual fixup routines have similar |
---|
457 | performance in \CFA. |
---|
458 | More experiments could try to tease out the exact trade-offs, |
---|
459 | but the prototype's only performance goal is to be reasonable. |
---|
460 | It is already in that range, and \CFA's fixup routine simulation is |
---|
461 | one of the faster simulations as well. |
---|
462 | Plus, exceptions add features and remove syntactic overhead, |
---|
463 | so even at similar performance, resumptions have advantages |
---|
464 | over fixup routines. |
---|