source: doc/theses/andrew_beach_MMath/performance.tex @ 50b29d9

ADTast-experimentalenumforall-pointer-decayjacob/cs343-translationnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since 50b29d9 was 5438e41, checked in by Andrew Beach <ajbeach@…>, 3 years ago

Andrew MMath: Had some bad performance numbers, updated the description as well.

  • Property mode set to 100644
File size: 14.3 KB
Line 
1\chapter{Performance}
2\label{c:performance}
3
4Performance has been of secondary importance for most of this project.
5Instead, the focus has been to get the features working. The only performance
6requirements is to ensure the tests for correctness run in a reasonable
7amount of time.
8
9\section{Test Set-Up}
10Tests will be run in \CFA, C++, Java and Python.
11In addition there are two sets of tests for \CFA,
12one for termination exceptions and once with resumption exceptions.
13
14C++ is the most comparable language because both it and \CFA use the same
15framework, libunwind.
16In fact, the comparison is almost entirely a quality of implementation
17comparison. \CFA's EHM has had significantly less time to be optimized and
18does not generate its own assembly. It does have a slight advantage in that
19there are some features it does not handle, through utility functions,
20but otherwise \Cpp has a significant advantage.
21
22Java is another very popular language with similar termination semantics.
23It is implemented in a very different environment, a virtual machine with
24garbage collection.
25It also implements the finally clause on try blocks allowing for a direct
26feature-to-feature comparison.
27As with \Cpp, Java's implementation is more mature, has more optimizations
28and more extra features.
29
30Python was used as a point of comparison because of the \CFA EHM's
31current performance goals, which is not be prohibitively slow while the
32features are designed and examined. Python has similar performance goals for
33creating quick scripts and its wide use suggests it has achieved those goals.
34
35Unfortunately there are no notable modern programming languages with
36resumption exceptions. Even the older programming languages with resumptions
37seem to be notable only for having resumptions.
38So instead resumptions are compared to a less similar but much more familiar
39feature, termination exceptions.
40
41All tests are run inside a main loop which will perform the test
42repeatedly. This is to avoids start-up or tear-down time from
43affecting the timing results.
44Tests ran their main loop a million times.
45The Java versions of the test also run this loop an extra 1000 times before
46beginning to time the results to ``warm-up" the JVM.
47
48Timing is done internally, with time measured immediately before and
49immediately after the test loop. The difference is calculated and printed.
50
51The loop structure and internal timing means it is impossible to test
52unhandled exceptions in \Cpp and Java as that would cause the process to
53terminate.
54Luckily, performance on the ``give-up and kill the process" path is not
55critical.
56
57The exceptions used in these tests will always be a exception based off of
58the base exception. This requirement minimizes performance differences based
59on the object model used to repersent the exception.
60
61All tests were designed to be as minimal as possible while still preventing
62exessive optimizations.
63For example, empty inline assembly blocks are used in \CFA and \Cpp to
64prevent excessive optimizations while adding no actual work.
65
66% We don't use catch-alls but if we did:
67% Catch-alls are done by catching the root exception type (not using \Cpp's
68% \code{C++}{catch(...)}).
69
70\section{Tests}
71The following tests were selected to test the performance of different
72components of the exception system.
73The should provide a guide as to where the EHM's costs can be found.
74
75\paragraph{Raise and Handle}
76The first group of tests involve setting up
77So there is three layers to the test. The first is set up and a loop, which
78configures the test and then runs it repeatedly to reduce the impact of
79start-up and shutdown on the results.
80Each iteration of the main loop
81\begin{itemize}[nosep]
82\item Empty Function:
83The repeating function is empty except for the necessary control code.
84\item Destructor:
85The repeating function creates an object with a destructor before calling
86itself.
87\item Finally:
88The repeating function calls itself inside a try block with a finally clause
89attached.
90\item Other Handler:
91The repeating function calls itself inside a try block with a handler that
92will not match the raised exception. (But is of the same kind of handler.)
93\end{itemize}
94
95\paragraph{Cross Try Statement}
96The next group measures the cost of a try statement when no exceptions are
97raised. The test is set-up, then there is a loop to reduce the impact of
98start-up and shutdown on the results.
99In each iteration, a try statement is executed. Entering and leaving a loop
100is all the test wants to do.
101\begin{itemize}[nosep]
102\item Handler:
103The try statement has a handler (of the matching kind).
104\item Finally:
105The try statement has a finally clause.
106\end{itemize}
107
108\paragraph{Conditional Matching}
109This group of tests checks the cost of conditional matching.
110Only \CFA implements the language level conditional match,
111the other languages must mimic with an ``unconditional" match (it still
112checks the exception's type) and conditional re-raise if it was not supposed
113to handle that exception.
114\begin{itemize}[nosep]
115\item Match All:
116The condition is always true. (Always matches or never re-raises.)
117\item Match None:
118The condition is always false. (Never matches or always re-raises.)
119\end{itemize}
120
121%\section{Cost in Size}
122%Using exceptions also has a cost in the size of the executable.
123%Although it is sometimes ignored
124%
125%There is a size cost to defining a personality function but the later problem
126%is the LSDA which will be generated for every function.
127%
128%(I haven't actually figured out how to compare this, probably using something
129%related to -fexceptions.)
130
131\section{Results}
132Each test was run eleven times. The top three and bottom three results were
133discarded and the remaining five values are averaged.
134
135In cases where a feature is not supported by a language the test is skipped
136for that language. Similarly, if a test is does not change between resumption
137and termination in \CFA, then only one test is written and the result
138was put into the termination column.
139
140% Raw Data:
141% run-algol-a.sat
142% ---------------
143% Raise Empty   &  82687046678 &  291616256 &   3252824847 & 15422937623 & 14736271114 \\
144% Raise D'tor   & 219933199603 &  297897792 & 223602799362 &         N/A &         N/A \\
145% Raise Finally & 219703078448 &  298391745 &          N/A &         ... & 18923060958 \\
146% Raise Other   & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
147% Cross Handler &      9256648 &   13518430 &       769328 &     3486252 &    31790804 \\
148% Cross Finally &       769319 &        N/A &          N/A &     2272831 &    37491962 \\
149% Match All     &   3654278402 &   47518560 &   3218907794 &  1296748192 &   624071886 \\
150% Match None    &   4788861754 &   58418952 &   9458936430 &  1318065020 &   625200906 \\
151%
152% run-algol-thr-c
153% ---------------
154% Raise Empty   &   3757606400 &   36472972 &   3257803337 & 15439375452 & 14717808642 \\
155% Raise D'tor   &  64546302019 &  102148375 & 223648121635 &         N/A &         N/A \\
156% Raise Finally &  64671359172 &  103285005 &          N/A & 15442729458 & 18927008844 \\
157% Raise Other   & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
158% Cross Handler &      9646462 &   11955668 &       769328 &     3453707 &    31864074 \\
159% Cross Finally &       773412 &        N/A &          N/A &     2253825 &    37266476 \\
160% Match All     &   3719462155 &   43294042 &   3223004977 &  1286054154 &   623887874 \\
161% Match None    &   4971630929 &   55311709 &   9481225467 &  1310251289 &   623752624 \\
162%
163% run-algol-04-a
164% --------------
165% Raise Empty   & 0.0 & 0.0 &  3250260945 & 0.0 & 0.0 \\
166% Raise D'tor   & 0.0 & 0.0 & 29017675113 & N/A & N/A \\
167% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
168% Raise Other   & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\
169% Cross Handler & 0.0 & 0.0 &      769334 & 0.0 & 0.0 \\
170% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
171% Match All     & 0.0 & 0.0 &  3254283504 & 0.0 & 0.0 \\
172% Match None    & 0.0 & 0.0 &  9476060146 & 0.0 & 0.0 \\
173
174\begin{tabular}{|l|c c c c c|}
175\hline
176              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
177\hline
178Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
179Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
180Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
181Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
182Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
183Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
184Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
185Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
186\hline
187\end{tabular}
188
189% run-plg7a-a.sat
190% ---------------
191% Raise Empty   &  57169011329 &  296612564 &   2788557155 & 17511466039 & 23324548496 \\
192% Raise D'tor   & 150599858014 &  318443709 & 149651693682 &         N/A &         N/A \\
193% Raise Finally & 148223145000 &  373325807 &          N/A &         ... & 29074552998 \\
194% Raise Other   & 189463708732 & 3017109322 &  85819281694 & 17584295487 & 32602686679 \\
195% Cross Handler &      8001654 &   13584858 &      1555995 &     6626775 &    41927358 \\
196% Cross Finally &      1002473 &        N/A &          N/A &     4554344 &    51114381 \\
197% Match All     &   3162460860 &   37315018 &   2649464591 &  1523205769 &   742374509 \\
198% Match None    &   4054773797 &   47052659 &   7759229131 &  1555373654 &   744656403 \\
199%
200% run-plg7a-thr-a
201% ---------------
202% Raise Empty   &   3604235388 &   29829965 &   2786931833 & 17576506385 & 23352975105 \\
203% Raise D'tor   &  46552380948 &  178709605 & 149834207219 &         N/A &         N/A \\
204% Raise Finally &  46265157775 &  177906320 &          N/A & 17493045092 & 29170962959 \\
205% Raise Other   & 195659245764 & 2376968982 &  86070431924 & 17552979675 & 32501882918 \\
206% Cross Handler &    397031776 &   12503552 &      1451225 &     6658628 &    42304965 \\
207% Cross Finally &      1136746 &        N/A &          N/A &     4468799 &    46155817 \\
208% Match All     &   3189512499 &   39124453 &   2667795989 &  1525889031 &   733785613 \\
209% Match None    &   4094675477 &   48749857 &   7850618572 &  1566713577 &   733478963 \\
210%
211% run-plg7a-04-a
212% --------------
213% 0.0 are unfilled.
214% Raise Empty   & 0.0 & 0.0 &  2770781479 & 0.0 & 0.0 \\
215% Raise D'tor   & 0.0 & 0.0 & 23530084907 & N/A & N/A \\
216% Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
217% Raise Other   & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\
218% Cross Handler & 0.0 & 0.0 &     1422188 & 0.0 & 0.0 \\
219% Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
220% Match All     & 0.0 & 0.0 &  2671989778 & 0.0 & 0.0 \\
221% Match None    & 0.0 & 0.0 &  7829059869 & 0.0 & 0.0 \\
222
223% PLG7A (in seconds)
224\begin{tabular}{|l|c c c c c|}
225\hline
226              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
227\hline
228Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
229Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
230Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
231Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
232Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
233Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
234Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
235Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
236\hline
237\end{tabular}
238
239One result that is not directly related to \CFA but is important to keep in
240mind is that in exceptions the standard intuitions about which languages
241should go faster often do not hold. There are cases where Python out-preforms
242\Cpp and Java. The most likely explination is that, since exceptions are
243rarely considered to be the common case, the more optimized langages have
244optimized at their expence. In addition languages with high level           
245repersentations have a much easier time scanning the stack as there is less
246to decode.
247
248This means that while \CFA does not actually keep up with Python in every
249case it is usually no worse than roughly half the speed of \Cpp. This is good
250enough for the prototyping purposes of the project.
251
252The test case where \CFA falls short is Raise Other, the case where the
253stack is unwound including a bunch of non-matching handlers.
254This slowdown seems to come from missing optimizations,
255the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp)
256but the results change if they are run in gcc/g++ 9 instead.
257Importantly, there is a huge slowdown in \Cpp's results bringing that brings
258\CFA's performace back in that roughly half speed area. However many other
259\CFA benchmarks increase their run-time by a similar amount falling far
260behind their \Cpp counter-parts.
261
262This suggests that the performance issue in Raise Other is just an
263optimization not being applied. Later versions of gcc may be able to
264optimize this case further, at least down to the half of \Cpp mark.
265A \CFA compiler that directly produced assembly could do even better as it
266would not have to work across some of \CFA's current abstractions, like
267the try terminate function.
268
269Resumption exception handling is also incredibly fast. Often an order of
270magnitude or two better than the best termination speed.
271There is a simple explination for this; traversing a linked list is much   
272faster than examining and unwinding the stack. When resumption does not do as
273well its when more try statements are used per raise. Updating the interal
274linked list is not very expencive but it does add up.
275
276The relative speed of the Match All and Match None tests (within each
277language) can also show the effectiveness conditional matching as compared
278to catch and rethrow.
279\begin{itemize}[nosep]
280\item
281Java and Python get similar values in both tests.
282Between the interperated code, a higher level repersentation of the call
283stack and exception reuse it it is possible the cost for a second
284throw can be folded into the first.
285% Is this due to optimization?
286\item
287Both types of \CFA are slighly slower if there is not a match.
288For termination this likely comes from unwinding a bit more stack through
289libunwind instead of executing the code normally.
290For resumption there is extra work in traversing more of the list and running
291more checks for a matching exceptions.
292% Resumption is a bit high for that but this is my best theory.
293\item
294Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
295just the catch. This is very high, but it does have to repeat the same
296process of unwinding the stack and may have to parse the LSDA of the function
297with the catch and rethrow twice, once before the catch and once after the
298rethrow.
299% I spent a long time thinking of what could push it over twice, this is all
300% I have to explain it.
301\end{itemize}
302The difference in relative performance does show that there are savings to
303be made by performing the check without catching the exception.
Note: See TracBrowser for help on using the repository browser.