1 | \chapter{Performance} |
---|
2 | \label{c:performance} |
---|
3 | |
---|
4 | Performance is of secondary importance for most of this project. |
---|
5 | Instead, the focus is to get the features working. The only performance |
---|
6 | requirement is to ensure the tests for correctness run in a reasonable |
---|
7 | amount of time. |
---|
8 | |
---|
9 | \section{Test Set-Up} |
---|
10 | Tests were run in \CFA, C++, Java and Python. |
---|
11 | In addition there are two sets of tests for \CFA, |
---|
12 | one for termination and one for resumption exceptions. |
---|
13 | |
---|
14 | C++ is the most comparable language because both it and \CFA use the same |
---|
15 | framework, libunwind. |
---|
16 | In fact, the comparison is almost entirely a quality of implementation |
---|
17 | comparison: \CFA's EHM has had significantly less time to be optimized and |
---|
18 | does not generate its own assembly. It does have a slight advantage in that |
---|
19 | there are some features it handles directly instead of through utility functions, |
---|
20 | but otherwise \Cpp has a significant advantage. |
---|
21 | |
---|
22 | Java is another very popular language with similar termination semantics. |
---|
23 | It is implemented in a very different environment, a virtual machine with |
---|
24 | garbage collection. |
---|
25 | It also implements the @finally@ clause on @try@ blocks allowing for a direct |
---|
26 | feature-to-feature comparison. |
---|
27 | As with \Cpp, Java's implementation is mature, optimizations |
---|
28 | and has extra features. |
---|
29 | |
---|
30 | Python is used as an alternative point of comparison because of the \CFA EHM's |
---|
31 | current performance goals, which is not to be prohibitively slow while the |
---|
32 | features are designed and examined. Python has similar performance goals for |
---|
33 | creating quick scripts and its wide use suggests it has achieved those goals. |
---|
34 | |
---|
35 | Unfortunately, there are no notable modern programming languages with |
---|
36 | resumption exceptions. Even the older programming languages with resumption |
---|
37 | seem to be notable only for having resumption. |
---|
38 | So instead, resumption is compared to a less similar but much more familiar |
---|
39 | feature, termination exceptions. |
---|
40 | |
---|
41 | All tests are run inside a main loop that repeatedly performs a test. |
---|
42 | This approach avoids start-up or tear-down time from |
---|
43 | affecting the timing results. |
---|
44 | Each test is run a million times. |
---|
45 | The Java versions of the test run this loop an extra 1000 times before |
---|
46 | beginning to actual test to ``warm-up" the JVM. |
---|
47 | |
---|
48 | Timing is done internally, with time measured immediately before and |
---|
49 | after the test loop. The difference is calculated and printed. |
---|
50 | The loop structure and internal timing means it is impossible to test |
---|
51 | unhandled exceptions in \Cpp and Java as that would cause the process to |
---|
52 | terminate. |
---|
53 | Luckily, performance on the ``give-up and kill the process" path is not |
---|
54 | critical. |
---|
55 | |
---|
56 | The exceptions used in these tests are always based off of |
---|
57 | a base exception. This requirement minimizes performance differences based |
---|
58 | on the object model used to represent the exception. |
---|
59 | |
---|
60 | All tests are designed to be as minimal as possible, while still preventing |
---|
61 | excessive optimizations. |
---|
62 | For example, empty inline assembly blocks are used in \CFA and \Cpp to |
---|
63 | prevent excessive optimizations while adding no actual work. |
---|
64 | Each test was run eleven times. The top three and bottom three results were |
---|
65 | discarded and the remaining five values are averaged. |
---|
66 | |
---|
67 | The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is |
---|
68 | compiled with 11.0.11. Python with 3.8. The tests were run on: |
---|
69 | \begin{itemize}[nosep] |
---|
70 | \item |
---|
71 | ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25 |
---|
72 | \item |
---|
73 | AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25 |
---|
74 | \end{itemize} |
---|
75 | |
---|
76 | % We don't use catch-alls but if we did: |
---|
77 | % Catch-alls are done by catching the root exception type (not using \Cpp's |
---|
78 | % \code{C++}{catch(...)}). |
---|
79 | |
---|
80 | \section{Tests} |
---|
81 | The following tests were selected to test the performance of different |
---|
82 | components of the exception system. |
---|
83 | They should provide a guide as to where the EHM's costs are found. |
---|
84 | |
---|
85 | \paragraph{Raise and Handle} |
---|
86 | The first group measures the cost of a try statement when exceptions are raised |
---|
87 | and \emph{the stack is unwound}. Each test has has a repeating function like |
---|
88 | the following |
---|
89 | \begin{cfa} |
---|
90 | void unwind_empty(unsigned int frames) { |
---|
91 | if (frames) { |
---|
92 | unwind_empty(frames - 1); |
---|
93 | } else throw (empty_exception){&empty_vt}; |
---|
94 | } |
---|
95 | \end{cfa} |
---|
96 | which is called M times, where each call recurses to a depth of N, an |
---|
97 | exception is raised, the stack is a unwound, and the exception caught. |
---|
98 | \begin{itemize}[nosep] |
---|
99 | \item Empty: |
---|
100 | This test measures the cost of raising (stack walking) an exception through empty |
---|
101 | empty stack frames to an empty handler. (see above) |
---|
102 | \item Destructor: |
---|
103 | |
---|
104 | This test measures the cost of raising an exception through non-empty frames |
---|
105 | where each frame has an object requiring destruction, to an empty |
---|
106 | handler. Hence, there are N destructor calls during unwinding. |
---|
107 | \begin{cfa} |
---|
108 | if (frames) { |
---|
109 | WithDestructor object; |
---|
110 | unwind_empty(frames - 1); |
---|
111 | \end{cfa} |
---|
112 | \item Finally: |
---|
113 | This test measures the cost of establishing a try block with an empty finally |
---|
114 | clause on the front side of the recursion and running the empty finally clause |
---|
115 | on the back side of the recursion during stack unwinding. |
---|
116 | \begin{cfa} |
---|
117 | if (frames) { |
---|
118 | try { |
---|
119 | unwind_finally(frames - 1); |
---|
120 | } finally {} |
---|
121 | \end{cfa} |
---|
122 | \item Other Handler: |
---|
123 | This test is like the finally test but the try block has a catch clause for an |
---|
124 | exception that is not raised, so catch matching is executed during stack |
---|
125 | unwinding but the match never successes until the catch at the bottom of the |
---|
126 | stack. |
---|
127 | \begin{cfa} |
---|
128 | if (frames) { |
---|
129 | try { |
---|
130 | unwind_other(frames - 1); |
---|
131 | } catch (not_raised_exception *) {} |
---|
132 | \end{cfa} |
---|
133 | \end{itemize} |
---|
134 | |
---|
135 | \paragraph{Cross Try Statement} |
---|
136 | The next group measures just the cost of executing a try statement so |
---|
137 | \emph{there is no stack unwinding}. Hence, the program main loops N times |
---|
138 | around: |
---|
139 | \begin{cfa} |
---|
140 | try { |
---|
141 | } catch (not_raised_exception *) {} |
---|
142 | \end{cfa} |
---|
143 | \begin{itemize}[nosep] |
---|
144 | \item Handler: |
---|
145 | The try statement has a handler. |
---|
146 | \item Finally: |
---|
147 | The try statement replaces the handler with a finally clause. |
---|
148 | \end{itemize} |
---|
149 | |
---|
150 | \paragraph{Conditional Matching} |
---|
151 | This final group measures the cost of conditional matching. |
---|
152 | Only \CFA implements the language level conditional match, |
---|
153 | the other languages must mimic with an ``unconditional" match (it still |
---|
154 | checks the exception's type) and conditional re-raise if it was not supposed |
---|
155 | to handle that exception. |
---|
156 | \begin{center} |
---|
157 | \begin{tabular}{ll} |
---|
158 | \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\ |
---|
159 | \begin{cfa} |
---|
160 | try { |
---|
161 | throw_exception(); |
---|
162 | } catch (empty_exception * exc; |
---|
163 | should_catch) { |
---|
164 | } |
---|
165 | \end{cfa} |
---|
166 | & |
---|
167 | \begin{cfa} |
---|
168 | try { |
---|
169 | throw_exception(); |
---|
170 | } catch (EmptyException & exc) { |
---|
171 | if (!should_catch) throw; |
---|
172 | } |
---|
173 | \end{cfa} |
---|
174 | \end{tabular} |
---|
175 | \end{center} |
---|
176 | \begin{itemize}[nosep] |
---|
177 | \item Match All: |
---|
178 | The condition is always true. (Always matches or never re-raises.) |
---|
179 | \item Match None: |
---|
180 | The condition is always false. (Never matches or always re-raises.) |
---|
181 | \end{itemize} |
---|
182 | |
---|
183 | %\section{Cost in Size} |
---|
184 | %Using exceptions also has a cost in the size of the executable. |
---|
185 | %Although it is sometimes ignored |
---|
186 | % |
---|
187 | %There is a size cost to defining a personality function but the later problem |
---|
188 | %is the LSDA which will be generated for every function. |
---|
189 | % |
---|
190 | %(I haven't actually figured out how to compare this, probably using something |
---|
191 | %related to -fexceptions.) |
---|
192 | |
---|
193 | \section{Results} |
---|
194 | In cases where a feature is not supported by a language the test is skipped |
---|
195 | for that language. |
---|
196 | \PAB{Report all values. |
---|
197 | |
---|
198 | Similarly, if a test does not change between resumption |
---|
199 | and termination in \CFA, then only one test is written and the result |
---|
200 | was put into the termination column. |
---|
201 | } |
---|
202 | |
---|
203 | % Raw Data: |
---|
204 | % run-algol-a.sat |
---|
205 | % --------------- |
---|
206 | % Raise Empty & 82687046678 & 291616256 & 3252824847 & 15422937623 & 14736271114 \\ |
---|
207 | % Raise D'tor & 219933199603 & 297897792 & 223602799362 & N/A & N/A \\ |
---|
208 | % Raise Finally & 219703078448 & 298391745 & N/A & ... & 18923060958 \\ |
---|
209 | % Raise Other & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\ |
---|
210 | % Cross Handler & 9256648 & 13518430 & 769328 & 3486252 & 31790804 \\ |
---|
211 | % Cross Finally & 769319 & N/A & N/A & 2272831 & 37491962 \\ |
---|
212 | % Match All & 3654278402 & 47518560 & 3218907794 & 1296748192 & 624071886 \\ |
---|
213 | % Match None & 4788861754 & 58418952 & 9458936430 & 1318065020 & 625200906 \\ |
---|
214 | % |
---|
215 | % run-algol-thr-c |
---|
216 | % --------------- |
---|
217 | % Raise Empty & 3757606400 & 36472972 & 3257803337 & 15439375452 & 14717808642 \\ |
---|
218 | % Raise D'tor & 64546302019 & 102148375 & 223648121635 & N/A & N/A \\ |
---|
219 | % Raise Finally & 64671359172 & 103285005 & N/A & 15442729458 & 18927008844 \\ |
---|
220 | % Raise Other & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\ |
---|
221 | % Cross Handler & 9646462 & 11955668 & 769328 & 3453707 & 31864074 \\ |
---|
222 | % Cross Finally & 773412 & N/A & N/A & 2253825 & 37266476 \\ |
---|
223 | % Match All & 3719462155 & 43294042 & 3223004977 & 1286054154 & 623887874 \\ |
---|
224 | % Match None & 4971630929 & 55311709 & 9481225467 & 1310251289 & 623752624 \\ |
---|
225 | % |
---|
226 | % run-algol-04-a |
---|
227 | % -------------- |
---|
228 | % Raise Empty & 0.0 & 0.0 & 3250260945 & 0.0 & 0.0 \\ |
---|
229 | % Raise D'tor & 0.0 & 0.0 & 29017675113 & N/A & N/A \\ |
---|
230 | % Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ |
---|
231 | % Raise Other & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\ |
---|
232 | % Cross Handler & 0.0 & 0.0 & 769334 & 0.0 & 0.0 \\ |
---|
233 | % Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ |
---|
234 | % Match All & 0.0 & 0.0 & 3254283504 & 0.0 & 0.0 \\ |
---|
235 | % Match None & 0.0 & 0.0 & 9476060146 & 0.0 & 0.0 \\ |
---|
236 | |
---|
237 | \begin{tabular}{|l|c c c c c|} |
---|
238 | \hline |
---|
239 | & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ |
---|
240 | \hline |
---|
241 | Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
242 | Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ |
---|
243 | Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ |
---|
244 | Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
245 | Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
246 | Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ |
---|
247 | Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
248 | Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
249 | \hline |
---|
250 | \end{tabular} |
---|
251 | |
---|
252 | % run-plg7a-a.sat |
---|
253 | % --------------- |
---|
254 | % Raise Empty & 57169011329 & 296612564 & 2788557155 & 17511466039 & 23324548496 \\ |
---|
255 | % Raise D'tor & 150599858014 & 318443709 & 149651693682 & N/A & N/A \\ |
---|
256 | % Raise Finally & 148223145000 & 373325807 & N/A & ... & 29074552998 \\ |
---|
257 | % Raise Other & 189463708732 & 3017109322 & 85819281694 & 17584295487 & 32602686679 \\ |
---|
258 | % Cross Handler & 8001654 & 13584858 & 1555995 & 6626775 & 41927358 \\ |
---|
259 | % Cross Finally & 1002473 & N/A & N/A & 4554344 & 51114381 \\ |
---|
260 | % Match All & 3162460860 & 37315018 & 2649464591 & 1523205769 & 742374509 \\ |
---|
261 | % Match None & 4054773797 & 47052659 & 7759229131 & 1555373654 & 744656403 \\ |
---|
262 | % |
---|
263 | % run-plg7a-thr-a |
---|
264 | % --------------- |
---|
265 | % Raise Empty & 3604235388 & 29829965 & 2786931833 & 17576506385 & 23352975105 \\ |
---|
266 | % Raise D'tor & 46552380948 & 178709605 & 149834207219 & N/A & N/A \\ |
---|
267 | % Raise Finally & 46265157775 & 177906320 & N/A & 17493045092 & 29170962959 \\ |
---|
268 | % Raise Other & 195659245764 & 2376968982 & 86070431924 & 17552979675 & 32501882918 \\ |
---|
269 | % Cross Handler & 397031776 & 12503552 & 1451225 & 6658628 & 42304965 \\ |
---|
270 | % Cross Finally & 1136746 & N/A & N/A & 4468799 & 46155817 \\ |
---|
271 | % Match All & 3189512499 & 39124453 & 2667795989 & 1525889031 & 733785613 \\ |
---|
272 | % Match None & 4094675477 & 48749857 & 7850618572 & 1566713577 & 733478963 \\ |
---|
273 | % |
---|
274 | % run-plg7a-04-a |
---|
275 | % -------------- |
---|
276 | % 0.0 are unfilled. |
---|
277 | % Raise Empty & 0.0 & 0.0 & 2770781479 & 0.0 & 0.0 \\ |
---|
278 | % Raise D'tor & 0.0 & 0.0 & 23530084907 & N/A & N/A \\ |
---|
279 | % Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ |
---|
280 | % Raise Other & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\ |
---|
281 | % Cross Handler & 0.0 & 0.0 & 1422188 & 0.0 & 0.0 \\ |
---|
282 | % Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ |
---|
283 | % Match All & 0.0 & 0.0 & 2671989778 & 0.0 & 0.0 \\ |
---|
284 | % Match None & 0.0 & 0.0 & 7829059869 & 0.0 & 0.0 \\ |
---|
285 | |
---|
286 | % PLG7A (in seconds) |
---|
287 | \begin{tabular}{|l|c c c c c|} |
---|
288 | \hline |
---|
289 | & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ |
---|
290 | \hline |
---|
291 | Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
292 | Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ |
---|
293 | Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ |
---|
294 | Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
295 | Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
296 | Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ |
---|
297 | Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
298 | Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ |
---|
299 | \hline |
---|
300 | \end{tabular} |
---|
301 | |
---|
302 | One result not directly related to \CFA but important to keep in |
---|
303 | mind is that, for exceptions, the standard intuition about which languages |
---|
304 | should go faster often does not hold. For example, there are cases where Python out-performs |
---|
305 | \Cpp and Java. The most likely explanation is that, since exceptions are |
---|
306 | rarely considered to be the common case, the more optimized languages |
---|
307 | make that case expense. In addition, languages with high-level |
---|
308 | representations have a much easier time scanning the stack as there is less |
---|
309 | to decode. |
---|
310 | |
---|
311 | This observation means that while \CFA does not actually keep up with Python in every |
---|
312 | case, it is usually no worse than roughly half the speed of \Cpp. This performance is good |
---|
313 | enough for the prototyping purposes of the project. |
---|
314 | |
---|
315 | The test case where \CFA falls short is Raise Other, the case where the |
---|
316 | stack is unwound including a bunch of non-matching handlers. |
---|
317 | This slowdown seems to come from missing optimizations. |
---|
318 | |
---|
319 | Importantly, there is a huge slowdown in \Cpp's results bringing that brings |
---|
320 | \CFA's performance back in that roughly half speed area. However many other |
---|
321 | \CFA benchmarks increase their run-time by a similar amount falling far |
---|
322 | behind their \Cpp counter-parts. |
---|
323 | |
---|
324 | This suggests that the performance issue in Raise Other is just an |
---|
325 | optimization not being applied. Later versions of gcc may be able to |
---|
326 | optimize this case further, at least down to the half of \Cpp mark. |
---|
327 | A \CFA compiler that directly produced assembly could do even better as it |
---|
328 | would not have to work across some of \CFA's current abstractions, like |
---|
329 | the try terminate function. |
---|
330 | |
---|
331 | Resumption exception handling is also incredibly fast. Often an order of |
---|
332 | magnitude or two better than the best termination speed. |
---|
333 | There is a simple explanation for this; traversing a linked list is much |
---|
334 | faster than examining and unwinding the stack. When resumption does not do as |
---|
335 | well its when more try statements are used per raise. Updating the internal |
---|
336 | linked list is not very expensive but it does add up. |
---|
337 | |
---|
338 | The relative speed of the Match All and Match None tests (within each |
---|
339 | language) can also show the effectiveness conditional matching as compared |
---|
340 | to catch and rethrow. |
---|
341 | \begin{itemize}[nosep] |
---|
342 | \item |
---|
343 | Java and Python get similar values in both tests. |
---|
344 | Between the interpreted code, a higher level representation of the call |
---|
345 | stack and exception reuse it it is possible the cost for a second |
---|
346 | throw can be folded into the first. |
---|
347 | % Is this due to optimization? |
---|
348 | \item |
---|
349 | Both types of \CFA are slightly slower if there is not a match. |
---|
350 | For termination this likely comes from unwinding a bit more stack through |
---|
351 | libunwind instead of executing the code normally. |
---|
352 | For resumption there is extra work in traversing more of the list and running |
---|
353 | more checks for a matching exceptions. |
---|
354 | % Resumption is a bit high for that but this is my best theory. |
---|
355 | \item |
---|
356 | Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs. |
---|
357 | just the catch. This is very high, but it does have to repeat the same |
---|
358 | process of unwinding the stack and may have to parse the LSDA of the function |
---|
359 | with the catch and rethrow twice, once before the catch and once after the |
---|
360 | rethrow. |
---|
361 | % I spent a long time thinking of what could push it over twice, this is all |
---|
362 | % I have to explain it. |
---|
363 | \end{itemize} |
---|
364 | The difference in relative performance does show that there are savings to |
---|
365 | be made by performing the check without catching the exception. |
---|