source: doc/theses/andrew_beach_MMath/performance.tex @ c9f9d4f

enumforall-pointer-decayjacob/cs343-translationnew-ast-unique-expr
Last change on this file since c9f9d4f was c9f9d4f, checked in by Peter A. Buhr <pabuhr@…>, 11 months ago

first proofread of performance chapter

  • Property mode set to 100644
File size: 15.6 KB
Line 
1\chapter{Performance}
2\label{c:performance}
3
4Performance is of secondary importance for most of this project.
5Instead, the focus is to get the features working. The only performance
6requirement is to ensure the tests for correctness run in a reasonable
7amount of time.
8
9\section{Test Set-Up}
10Tests were run in \CFA, C++, Java and Python.
11In addition there are two sets of tests for \CFA,
12one for termination and one for resumption exceptions.
13
14C++ is the most comparable language because both it and \CFA use the same
15framework, libunwind.
16In fact, the comparison is almost entirely a quality of implementation
17comparison: \CFA's EHM has had significantly less time to be optimized and
18does not generate its own assembly. It does have a slight advantage in that
19there are some features it handles directly instead of through utility functions,
20but otherwise \Cpp has a significant advantage.
21
22Java is another very popular language with similar termination semantics.
23It is implemented in a very different environment, a virtual machine with
24garbage collection.
25It also implements the @finally@ clause on @try@ blocks allowing for a direct
26feature-to-feature comparison.
27As with \Cpp, Java's implementation is mature, optimizations
28and has extra features.
29
30Python is used as an alternative point of comparison because of the \CFA EHM's
31current performance goals, which is not to be prohibitively slow while the
32features are designed and examined. Python has similar performance goals for
33creating quick scripts and its wide use suggests it has achieved those goals.
34
35Unfortunately, there are no notable modern programming languages with
36resumption exceptions. Even the older programming languages with resumption
37seem to be notable only for having resumption.
38So instead, resumption is compared to a less similar but much more familiar
39feature, termination exceptions.
40
41All tests are run inside a main loop that repeatedly performs a test.
42This approach avoids start-up or tear-down time from
43affecting the timing results.
44Each test is run a million times.
45The Java versions of the test run this loop an extra 1000 times before
46beginning to actual test to ``warm-up" the JVM.
47
48Timing is done internally, with time measured immediately before and
49after the test loop. The difference is calculated and printed.
50The loop structure and internal timing means it is impossible to test
51unhandled exceptions in \Cpp and Java as that would cause the process to
52terminate.
53Luckily, performance on the ``give-up and kill the process" path is not
54critical.
55
56The exceptions used in these tests are always based off of
57a base exception. This requirement minimizes performance differences based
58on the object model used to represent the exception.
59
60All tests are designed to be as minimal as possible, while still preventing
61excessive optimizations.
62For example, empty inline assembly blocks are used in \CFA and \Cpp to
63prevent excessive optimizations while adding no actual work.
64Each test was run eleven times. The top three and bottom three results were
65discarded and the remaining five values are averaged.
66
67The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is
68compiled with 11.0.11. Python with 3.8. The tests were run on:
69\begin{itemize}[nosep]
70\item
71ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25
72\item
73AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25
74\end{itemize}
75
76% We don't use catch-alls but if we did:
77% Catch-alls are done by catching the root exception type (not using \Cpp's
78% \code{C++}{catch(...)}).
79
80\section{Tests}
81The following tests were selected to test the performance of different
82components of the exception system.
83They should provide a guide as to where the EHM's costs are found.
84
85\paragraph{Raise and Handle}
86The first group measures the cost of a try statement when exceptions are raised
87and \emph{the stack is unwound}.  Each test has has a repeating function like
88the following
89\begin{cfa}
90void unwind_empty(unsigned int frames) {
91        if (frames) {
92                unwind_empty(frames - 1);
93        } else throw (empty_exception){&empty_vt};
94}
95\end{cfa}
96which is called M times, where each call recurses to a depth of N, an
97exception is raised, the stack is a unwound, and the exception caught.
98\begin{itemize}[nosep]
99\item Empty:
100This test measures the cost of raising (stack walking) an exception through empty
101empty stack frames to an empty handler. (see above)
102\item Destructor:
103
104This test measures the cost of raising an exception through non-empty frames
105where each frame has an object requiring destruction, to an empty
106handler. Hence, there are N destructor calls during unwinding.
107\begin{cfa}
108if (frames) {
109        WithDestructor object;
110        unwind_empty(frames - 1);
111\end{cfa}
112\item Finally:
113This test measures the cost of establishing a try block with an empty finally
114clause on the front side of the recursion and running the empty finally clause
115on the back side of the recursion during stack unwinding.
116\begin{cfa}
117if (frames) {
118        try {
119                unwind_finally(frames - 1);
120        } finally {}
121\end{cfa}
122\item Other Handler:
123This test is like the finally test but the try block has a catch clause for an
124exception that is not raised, so catch matching is executed during stack
125unwinding but the match never successes until the catch at the bottom of the
126stack.
127\begin{cfa}
128if (frames) {
129        try {
130                unwind_other(frames - 1);
131        } catch (not_raised_exception *) {}
132\end{cfa}
133\end{itemize}
134
135\paragraph{Cross Try Statement}
136The next group measures just the cost of executing a try statement so
137\emph{there is no stack unwinding}.  Hence, the program main loops N times
138around:
139\begin{cfa}
140try {
141} catch (not_raised_exception *) {}
142\end{cfa}
143\begin{itemize}[nosep]
144\item Handler:
145The try statement has a handler.
146\item Finally:
147The try statement replaces the handler with a finally clause.
148\end{itemize}
149
150\paragraph{Conditional Matching}
151This final group measures the cost of conditional matching.
152Only \CFA implements the language level conditional match,
153the other languages must mimic with an ``unconditional" match (it still
154checks the exception's type) and conditional re-raise if it was not supposed
155to handle that exception.
156\begin{center}
157\begin{tabular}{ll}
158\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\
159\begin{cfa}
160try {
161        throw_exception();
162} catch (empty_exception * exc;
163                 should_catch) {
164}
165\end{cfa}
166&
167\begin{cfa}
168try {
169        throw_exception();
170} catch (EmptyException & exc) {
171        if (!should_catch) throw;
172}
173\end{cfa}
174\end{tabular}
175\end{center}
176\begin{itemize}[nosep]
177\item Match All:
178The condition is always true. (Always matches or never re-raises.)
179\item Match None:
180The condition is always false. (Never matches or always re-raises.)
181\end{itemize}
182
183%\section{Cost in Size}
184%Using exceptions also has a cost in the size of the executable.
185%Although it is sometimes ignored
186%
187%There is a size cost to defining a personality function but the later problem
188%is the LSDA which will be generated for every function.
189%
190%(I haven't actually figured out how to compare this, probably using something
191%related to -fexceptions.)
192
193\section{Results}
194In cases where a feature is not supported by a language the test is skipped
195for that language.
196\PAB{Report all values.
197
198Similarly, if a test does not change between resumption
199and termination in \CFA, then only one test is written and the result
200was put into the termination column.
201}
202
203% Raw Data:
204% run-algol-a.sat
205% ---------------
206% Raise Empty   &  82687046678 &  291616256 &   3252824847 & 15422937623 & 14736271114 \\
207% Raise D'tor   & 219933199603 &  297897792 & 223602799362 &         N/A &         N/A \\
208% Raise Finally & 219703078448 &  298391745 &          N/A &         ... & 18923060958 \\
209% Raise Other   & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
210% Cross Handler &      9256648 &   13518430 &       769328 &     3486252 &    31790804 \\
211% Cross Finally &       769319 &        N/A &          N/A &     2272831 &    37491962 \\
212% Match All     &   3654278402 &   47518560 &   3218907794 &  1296748192 &   624071886 \\
213% Match None    &   4788861754 &   58418952 &   9458936430 &  1318065020 &   625200906 \\
214%
215% run-algol-thr-c
216% ---------------
217% Raise Empty   &   3757606400 &   36472972 &   3257803337 & 15439375452 & 14717808642 \\
218% Raise D'tor   &  64546302019 &  102148375 & 223648121635 &         N/A &         N/A \\
219% Raise Finally &  64671359172 &  103285005 &          N/A & 15442729458 & 18927008844 \\
220% Raise Other   & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
221% Cross Handler &      9646462 &   11955668 &       769328 &     3453707 &    31864074 \\
222% Cross Finally &       773412 &        N/A &          N/A &     2253825 &    37266476 \\
223% Match All     &   3719462155 &   43294042 &   3223004977 &  1286054154 &   623887874 \\
224% Match None    &   4971630929 &   55311709 &   9481225467 &  1310251289 &   623752624 \\
225%
226% run-algol-04-a
227% --------------
228% Raise Empty   & 0.0 & 0.0 &  3250260945 & 0.0 & 0.0 \\
229% Raise D'tor   & 0.0 & 0.0 & 29017675113 & N/A & N/A \\
230% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
231% Raise Other   & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\
232% Cross Handler & 0.0 & 0.0 &      769334 & 0.0 & 0.0 \\
233% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
234% Match All     & 0.0 & 0.0 &  3254283504 & 0.0 & 0.0 \\
235% Match None    & 0.0 & 0.0 &  9476060146 & 0.0 & 0.0 \\
236
237\begin{tabular}{|l|c c c c c|}
238\hline
239              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
240\hline
241Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
242Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
243Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
244Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
245Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
246Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
247Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
248Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
249\hline
250\end{tabular}
251
252% run-plg7a-a.sat
253% ---------------
254% Raise Empty   &  57169011329 &  296612564 &   2788557155 & 17511466039 & 23324548496 \\
255% Raise D'tor   & 150599858014 &  318443709 & 149651693682 &         N/A &         N/A \\
256% Raise Finally & 148223145000 &  373325807 &          N/A &         ... & 29074552998 \\
257% Raise Other   & 189463708732 & 3017109322 &  85819281694 & 17584295487 & 32602686679 \\
258% Cross Handler &      8001654 &   13584858 &      1555995 &     6626775 &    41927358 \\
259% Cross Finally &      1002473 &        N/A &          N/A &     4554344 &    51114381 \\
260% Match All     &   3162460860 &   37315018 &   2649464591 &  1523205769 &   742374509 \\
261% Match None    &   4054773797 &   47052659 &   7759229131 &  1555373654 &   744656403 \\
262%
263% run-plg7a-thr-a
264% ---------------
265% Raise Empty   &   3604235388 &   29829965 &   2786931833 & 17576506385 & 23352975105 \\
266% Raise D'tor   &  46552380948 &  178709605 & 149834207219 &         N/A &         N/A \\
267% Raise Finally &  46265157775 &  177906320 &          N/A & 17493045092 & 29170962959 \\
268% Raise Other   & 195659245764 & 2376968982 &  86070431924 & 17552979675 & 32501882918 \\
269% Cross Handler &    397031776 &   12503552 &      1451225 &     6658628 &    42304965 \\
270% Cross Finally &      1136746 &        N/A &          N/A &     4468799 &    46155817 \\
271% Match All     &   3189512499 &   39124453 &   2667795989 &  1525889031 &   733785613 \\
272% Match None    &   4094675477 &   48749857 &   7850618572 &  1566713577 &   733478963 \\
273%
274% run-plg7a-04-a
275% --------------
276% 0.0 are unfilled.
277% Raise Empty   & 0.0 & 0.0 &  2770781479 & 0.0 & 0.0 \\
278% Raise D'tor   & 0.0 & 0.0 & 23530084907 & N/A & N/A \\
279% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
280% Raise Other   & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\
281% Cross Handler & 0.0 & 0.0 &     1422188 & 0.0 & 0.0 \\
282% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
283% Match All     & 0.0 & 0.0 &  2671989778 & 0.0 & 0.0 \\
284% Match None    & 0.0 & 0.0 &  7829059869 & 0.0 & 0.0 \\
285
286% PLG7A (in seconds)
287\begin{tabular}{|l|c c c c c|}
288\hline
289              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
290\hline
291Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
292Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
293Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
294Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
295Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
296Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
297Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
298Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
299\hline
300\end{tabular}
301
302One result not directly related to \CFA but important to keep in
303mind is that, for exceptions, the standard intuition about which languages
304should go faster often does not hold. For example, there are cases where Python out-performs
305\Cpp and Java. The most likely explanation is that, since exceptions are
306rarely considered to be the common case, the more optimized languages
307make that case expense. In addition, languages with high-level
308representations have a much easier time scanning the stack as there is less
309to decode.
310
311This observation means that while \CFA does not actually keep up with Python in every
312case, it is usually no worse than roughly half the speed of \Cpp. This performance is good
313enough for the prototyping purposes of the project.
314
315The test case where \CFA falls short is Raise Other, the case where the
316stack is unwound including a bunch of non-matching handlers.
317This slowdown seems to come from missing optimizations.
318
319Importantly, there is a huge slowdown in \Cpp's results bringing that brings
320\CFA's performance back in that roughly half speed area. However many other
321\CFA benchmarks increase their run-time by a similar amount falling far
322behind their \Cpp counter-parts.
323
324This suggests that the performance issue in Raise Other is just an
325optimization not being applied. Later versions of gcc may be able to
326optimize this case further, at least down to the half of \Cpp mark.
327A \CFA compiler that directly produced assembly could do even better as it
328would not have to work across some of \CFA's current abstractions, like
329the try terminate function.
330
331Resumption exception handling is also incredibly fast. Often an order of
332magnitude or two better than the best termination speed.
333There is a simple explanation for this; traversing a linked list is much   
334faster than examining and unwinding the stack. When resumption does not do as
335well its when more try statements are used per raise. Updating the internal
336linked list is not very expensive but it does add up.
337
338The relative speed of the Match All and Match None tests (within each
339language) can also show the effectiveness conditional matching as compared
340to catch and rethrow.
341\begin{itemize}[nosep]
342\item
343Java and Python get similar values in both tests.
344Between the interpreted code, a higher level representation of the call
345stack and exception reuse it it is possible the cost for a second
346throw can be folded into the first.
347% Is this due to optimization?
348\item
349Both types of \CFA are slightly slower if there is not a match.
350For termination this likely comes from unwinding a bit more stack through
351libunwind instead of executing the code normally.
352For resumption there is extra work in traversing more of the list and running
353more checks for a matching exceptions.
354% Resumption is a bit high for that but this is my best theory.
355\item
356Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
357just the catch. This is very high, but it does have to repeat the same
358process of unwinding the stack and may have to parse the LSDA of the function
359with the catch and rethrow twice, once before the catch and once after the
360rethrow.
361% I spent a long time thinking of what could push it over twice, this is all
362% I have to explain it.
363\end{itemize}
364The difference in relative performance does show that there are savings to
365be made by performing the check without catching the exception.
Note: See TracBrowser for help on using the repository browser.