Context Navigation

source: doc/theses/andrew_beach_MMath/performance.tex @ b2525d9

ADTast-experimentalenumforall-pointer-decayjacob/cs343-translationnew-ast-unique-exprpthread-emulationqualifiedEnum

Last change on this file since b2525d9 was 0b67a19, checked in by Andrew Beach <ajbeach@…>, 3 years ago
Andrew MMath: First draft of the performance results.
Property mode set to `100644`
File size: 13.3 KB

Line
1	\chapter{Performance}
2	\label{c:performance}
3
4	Performance has been of secondary importance for most of this project.
5	Instead, the focus has been to get the features working. The only performance
6	requirements is to ensure the tests for correctness run in a reasonable
7	amount of time.
8
9	\section{Test Set-Up}
10	Tests will be run in \CFA, C++, Java and Python.
11	In addition there are two sets of tests for \CFA,
12	one for termination exceptions and once with resumption exceptions.
13
14	C++ is the most comparable language because both it and \CFA use the same
15	framework, libunwind.
16	In fact, the comparison is almost entirely a quality of implementation
17	comparison. \CFA's EHM has had significantly less time to be optimized and
18	does not generate its own assembly. It does have a slight advantage in that
19	there are some features it does not handle, through utility functions,
20	but otherwise \Cpp has a significant advantage.
21
22	Java is another very popular language with similar termination semantics.
23	It is implemented in a very different environment, a virtual machine with
24	garbage collection.
25	It also implements the finally clause on try blocks allowing for a direct
26	feature-to-feature comparison.
27	As with \Cpp, Java's implementation is more mature, has more optimizations
28	and more extra features.
29
30	Python was used as a point of comparison because of the \CFA EHM's
31	current performance goals, which is not be prohibitively slow while the
32	features are designed and examined. Python has similar performance goals for
33	creating quick scripts and its wide use suggests it has achieved those goals.
34
35	Unfortunately there are no notable modern programming languages with
36	resumption exceptions. Even the older programming languages with resumptions
37	seem to be notable only for having resumptions.
38	So instead resumptions are compared to a less similar but much more familiar
39	feature, termination exceptions.
40
41	All tests are run inside a main loop which will perform the test
42	repeatedly. This is to avoids start-up or tear-down time from
43	affecting the timing results.
44	Tests ran their main loop a million times.
45	The Java versions of the test also run this loop an extra 1000 times before
46	beginning to time the results to ``warm-up" the JVM.
47
48	Timing is done internally, with time measured immediately before and
49	immediately after the test loop. The difference is calculated and printed.
50
51	The loop structure and internal timing means it is impossible to test
52	unhandled exceptions in \Cpp and Java as that would cause the process to
53	terminate.
54	Luckily, performance on the ``give-up and kill the process" path is not
55	critical.
56
57	The exceptions used in these tests will always be a exception based off of
58	the base exception. This requirement minimizes performance differences based
59	on the object model used to repersent the exception.
60
61	All tests were designed to be as minimal as possible while still preventing
62	exessive optimizations.
63	For example, empty inline assembly blocks are used in \CFA and \Cpp to
64	prevent excessive optimizations while adding no actual work.
65
66	% We don't use catch-alls but if we did:
67	% Catch-alls are done by catching the root exception type (not using \Cpp's
68	% \code{C++}{catch(...)}).
69
70	\section{Tests}
71	The following tests were selected to test the performance of different
72	components of the exception system.
73	The should provide a guide as to where the EHM's costs can be found.
74
75	\paragraph{Raise and Handle}
76	The first group of tests involve setting up
77	So there is three layers to the test. The first is set up and a loop, which
78	configures the test and then runs it repeatedly to reduce the impact of
79	start-up and shutdown on the results.
80	Each iteration of the main loop
81	\begin{itemize}[nosep]
82	\item Empty Function:
83	The repeating function is empty except for the necessary control code.
84	\item Destructor:
85	The repeating function creates an object with a destructor before calling
86	itself.
87	\item Finally:
88	The repeating function calls itself inside a try block with a finally clause
89	attached.
90	\item Other Handler:
91	The repeating function calls itself inside a try block with a handler that
92	will not match the raised exception. (But is of the same kind of handler.)
93	\end{itemize}
94
95	\paragraph{Cross Try Statement}
96	The next group measures the cost of a try statement when no exceptions are
97	raised. The test is set-up, then there is a loop to reduce the impact of
98	start-up and shutdown on the results.
99	In each iteration, a try statement is executed. Entering and leaving a loop
100	is all the test wants to do.
101	\begin{itemize}[nosep]
102	\item Handler:
103	The try statement has a handler (of the matching kind).
104	\item Finally:
105	The try statement has a finally clause.
106	\end{itemize}
107
108	\paragraph{Conditional Matching}
109	This group of tests checks the cost of conditional matching.
110	Only \CFA implements the language level conditional match,
111	the other languages must mimic with an ``unconditional" match (it still
112	checks the exception's type) and conditional re-raise if it was not supposed
113	to handle that exception.
114	\begin{itemize}[nosep]
115	\item Match All:
116	The condition is always true. (Always matches or never re-raises.)
117	\item Match None:
118	The condition is always false. (Never matches or always re-raises.)
119	\end{itemize}
120
121	%\section{Cost in Size}
122	%Using exceptions also has a cost in the size of the executable.
123	%Although it is sometimes ignored
124	%
125	%There is a size cost to defining a personality function but the later problem
126	%is the LSDA which will be generated for every function.
127	%
128	%(I haven't actually figured out how to compare this, probably using something
129	%related to -fexceptions.)
130
131	\section{Results}
132	Each test was run eleven times. The top three and bottom three results were
133	discarded and the remaining five values are averaged.
134
135	In cases where a feature is not supported by a language the test is skipped
136	for that language. Similarly, if a test is does not change between resumption
137	and termination in \CFA, then only one test is written and the result
138	was put into the termination column.
139
140	% Raw Data:
141	% run-algol-a.sat
142	% ---------------
143	% Raise Empty & 82687046678 & 291616256 & 3252824847 & 15422937623 & 14736271114 \\
144	% Raise D'tor & 219933199603 & 297897792 & 223602799362 & N/A & N/A \\
145	% Raise Finally & 219703078448 & 298391745 & N/A & ... & 18923060958 \\
146	% Raise Other & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
147	% Cross Handler & 9256648 & 13518430 & 769328 & 3486252 & 31790804 \\
148	% Cross Finally & 769319 & N/A & N/A & 2272831 & 37491962 \\
149	% Match All & 3654278402 & 47518560 & 3218907794 & 1296748192 & 624071886 \\
150	% Match None & 4788861754 & 58418952 & 9458936430 & 1318065020 & 625200906 \\
151	%
152	% run-algol-thr-c
153	% ---------------
154	% Raise Empty & 3757606400 & 36472972 & 3257803337 & 15439375452 & 14717808642 \\
155	% Raise D'tor & 64546302019 & 102148375 & 223648121635 & N/A & N/A \\
156	% Raise Finally & 64671359172 & 103285005 & N/A & 15442729458 & 18927008844 \\
157	% Raise Other & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
158	% Cross Handler & 9646462 & 11955668 & 769328 & 3453707 & 31864074 \\
159	% Cross Finally & 773412 & N/A & N/A & 2253825 & 37266476 \\
160	% Match All & 3719462155 & 43294042 & 3223004977 & 1286054154 & 623887874 \\
161	% Match None & 4971630929 & 55311709 & 9481225467 & 1310251289 & 623752624 \\
162	\begin{tabular}{\|l\|c c c c c\|}
163	\hline
164	& \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
165	\hline
166	Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
167	Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\
168	Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
169	Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
170	Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
171	Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
172	Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
173	Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
174	\hline
175	\end{tabular}
176
177	% run-plg7a-a.sat
178	% ---------------
179	% Raise Empty & 57169011329 & 296612564 & 2788557155 & 17511466039 & 23324548496 \\
180	% Raise D'tor & 150599858014 & 318443709 & 149651693682 & N/A & N/A \\
181	% Raise Finally & 148223145000 & 373325807 & N/A & ... & 29074552998 \\
182	% Raise Other & 189463708732 & 3017109322 & 85819281694 & 17584295487 & 32602686679 \\
183	% Cross Handler & 8001654 & 13584858 & 1555995 & 6626775 & 41927358 \\
184	% Cross Finally & 1002473 & N/A & N/A & 4554344 & 51114381 \\
185	% Match All & 3162460860 & 37315018 & 2649464591 & 1523205769 & 742374509 \\
186	% Match None & 4054773797 & 47052659 & 7759229131 & 1555373654 & 744656403 \\
187	%
188	% run-plg7a-thr-a
189	% ---------------
190	% Raise Empty & 3604235388 & 29829965 & 2786931833 & 17576506385 & 23352975105 \\
191	% Raise D'tor & 46552380948 & 178709605 & 149834207219 & N/A & N/A \\
192	% Raise Finally & 46265157775 & 177906320 & N/A & 17493045092 & 29170962959 \\
193	% Raise Other & 195659245764 & 2376968982 & 86070431924 & 17552979675 & 32501882918 \\
194	% Cross Handler & 397031776 & 12503552 & 1451225 & 6658628 & 42304965 \\
195	% Cross Finally & 1136746 & N/A & N/A & 4468799 & 46155817 \\
196	% Match All & 3189512499 & 39124453 & 2667795989 & 1525889031 & 733785613 \\
197	% Match None & 4094675477 & 48749857 & 7850618572 & 1566713577 & 733478963 \\
198
199	% PLG7A (in seconds)
200	\begin{tabular}{\|l\|c c c c c\|}
201	\hline
202	& \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
203	\hline
204	% Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
205	% Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\
206	% Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
207	% Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
208	% Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
209	% Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
210	% Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
211	% Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
212	Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
213	Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\
214	Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
215	Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
216	Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
217	Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
218	Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
219	Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
220	\hline
221	\end{tabular}
222
223	One result that is not directly related to \CFA but is important to keep in
224	mind is that in exceptions the standard intuitions about which languages
225	should go faster often do not hold. There are cases where Python out-preforms
226	\Cpp and Java. The most likely explination is that, since exceptions are
227	rarely considered to be the common case, the more optimized langages have
228	optimized at their expence. In addition languages with high level
229	repersentations have a much easier time scanning the stack as there is less
230	to decode.
231
232	This means that while \CFA does not actually keep up with Python in every
233	case it is no worse than roughly half the speed of \Cpp. This is good
234	enough for the prototyping purposes of the project.
235
236	One difference not shown is that optimizations in \CFA is very fragile.
237	The \CFA compiler uses gcc as part of its complation process and the version
238	of gcc could change the speed of some of the benchmarks by 10 times or more.
239	Similar changes to g++ for the \Cpp benchmarks had no significant changes.
240	Because of the connection between gcc and g++; this suggests it is not the
241	optimizations that are changing but how the optimizer is detecting if the
242	optimizations can be applied. So the optimizations are always applied in
243	g++, but only newer versions of gcc can detect that they can be applied in
244	the more complex \CFA code.
245
246	Resumption exception handling is also incredibly fast. Often an order of
247	magnitude or two better than the best termination speed.
248	There is a simple explination for this; traversing a linked list is much
249	faster than examining and unwinding the stack. When resumption does not do as
250	well its when more try statements are used per raise. Updating the interal
251	linked list is not very expencive but it does add up.
252
253	The relative speed of the Match All and Match None tests (within each
254	language) can also show the effectiveness conditional matching as compared
255	to catch and rethrow.
256	\begin{itemize}[nosep]
257	\item
258	Java and Python get similar values in both tests.
259	Between the interperated code, a higher level repersentation of the call
260	stack and exception reuse it it is possible the cost for a second
261	throw can be folded into the first.
262	% Is this due to optimization?
263	\item
264	Both types of \CFA are slighly slower if there is not a match.
265	For termination this likely comes from unwinding a bit more stack through
266	libunwind instead of executing the code normally.
267	For resumption there is extra work in traversing more of the list and running
268	more checks for a matching exceptions.
269	% Resumption is a bit high for that but this is my best theory.
270	\item
271	Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
272	just the catch. This is very high, but it does have to repeat the same
273	process of unwinding the stack and may have to parse the LSDA of the function
274	with the catch and rethrow twice, once before the catch and once after the
275	rethrow.
276	% I spent a long time thinking of what could push it over twice, this is all
277	% I have to explain it.
278	\end{itemize}
279	The difference in relative performance does show that there are savings to
280	be made by performing the check without catching the exception.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: