Context Navigation

source: doc/theses/andrew_beach_MMath/performance.tex@ 9f5a19fa

Visit:

ADT ast-experimental enum forall-pointer-decay jacob/cs343-translation pthread-emulation qualifiedEnum

Last change on this file since 9f5a19fa was 262deda0, checked in by Peter A. Buhr <pabuhr@…>, 4 years ago
proofread performance chapter and add local bibliography entry
Property mode set to `100644`
File size: 26.2 KB

Line
1	\chapter{Performance}
2	\label{c:performance}
3
4	Performance is of secondary importance for most of this project.
5	Instead, the focus was to get the features working. The only performance
6	requirement is to ensure the tests for correctness run in a reasonable
7	amount of time. Hence, a few basic performance tests were performed to
8	check this requirement.
9
10	\section{Test Set-Up}
11	Tests were run in \CFA, C++, Java and Python.
12	In addition there are two sets of tests for \CFA,
13	one for termination and one for resumption exceptions.
14
15	C++ is the most comparable language because both it and \CFA use the same
16	framework, libunwind.
17	In fact, the comparison is almost entirely a quality of implementation.
18	Specifically, \CFA's EHM has had significantly less time to be optimized and
19	does not generate its own assembly. It does have a slight advantage in that
20	there are some features it handles directly instead of through utility functions,
21	but otherwise \Cpp should have a significant advantage.
22
23	Java is a popular language with similar termination semantics, but
24	it is implemented in a very different environment, a virtual machine with
25	garbage collection.
26	It also implements the @finally@ clause on @try@ blocks allowing for a direct
27	feature-to-feature comparison.
28	As with \Cpp, Java's implementation is mature, optimized
29	and has extra features.
30
31	Python is used as an alternative comparison because of the \CFA EHM's
32	current performance goals, which is not to be prohibitively slow while the
33	features are designed and examined. Python has similar performance goals for
34	creating quick scripts and its wide use suggests it has achieved those goals.
35
36	Unfortunately, there are no notable modern programming languages with
37	resumption exceptions. Even the older programming languages with resumption
38	seem to be notable only for having resumption.
39	So instead, resumption is compared to its simulation in other programming
40	languages using fixup functions that are explicitly passed for correction or
41	logging purposes.
42	% So instead, resumption is compared to a less similar but much more familiar
43	%feature, termination exceptions.
44
45	All tests are run inside a main loop that repeatedly performs a test.
46	This approach avoids start-up or tear-down time from
47	affecting the timing results.
48	Each test is run a N times (configurable from the command line).
49	The Java tests runs the main loop 1000 times before
50	beginning the actual test to ``warm-up" the JVM.
51
52	Timing is done internally, with time measured immediately before and
53	after the test loop. The difference is calculated and printed.
54	The loop structure and internal timing means it is impossible to test
55	unhandled exceptions in \Cpp and Java as that would cause the process to
56	terminate.
57	Luckily, performance on the ``give-up and kill the process" path is not
58	critical.
59
60	The exceptions used in these tests are always based off of
61	a base exception. This requirement minimizes performance differences based
62	on the object model used to represent the exception.
63
64	All tests are designed to be as minimal as possible, while still preventing
65	excessive optimizations.
66	For example, empty inline assembly blocks are used in \CFA and \Cpp to
67	prevent excessive optimizations while adding no actual work.
68	Each test was run eleven times. The top three and bottom three results were
69	discarded and the remaining five values are averaged.
70
71	The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is
72	compiled with version 11.0.11. Python with version 3.8. The tests were run on:
73	\begin{itemize}[nosep]
74	\item
75	ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25
76	\item
77	AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25
78	\end{itemize}
79	Two kinds of hardware architecture allows discriminating any implementation and
80	architectural effects.
81
82
83	% We don't use catch-alls but if we did:
84	% Catch-alls are done by catching the root exception type (not using \Cpp's
85	% \code{C++}{catch(...)}).
86
87	\section{Tests}
88	The following tests were selected to test the performance of different
89	components of the exception system.
90	They should provide a guide as to where the EHM's costs are found.
91
92	\paragraph{Raise and Handle}
93	This group measures the cost of a try statement when exceptions are raised and
94	the stack is unwound (termination) or not unwound (resumption). Each test has
95	has a repeating function like the following
96	\begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}]
97	void unwind_empty(unsigned int frames) {
98	if (frames) {
99	@unwind_empty(frames - 1);@ // AUGMENTED IN OTHER EXPERIMENTS
100	} else throw (empty_exception){&empty_vt};
101	}
102	\end{lstlisting}
103	which is called N times, where each call recurses to a depth of R (configurable from the command line), an
104	exception is raised, the stack is a unwound, and the exception caught.
105	\begin{itemize}[nosep]
106	\item Empty:
107	For termination, this test measures the cost of raising (stack walking) an
108	exception through empty stack frames from the bottom of the recursion to an
109	empty handler, and unwinding the stack. (see above code)
110
111	\medskip
112	For resumption, this test measures the same raising cost but does not unwind
113	the stack. For languages without resumption, a fixup function is to the bottom
114	of the recursion and called to simulate a fixup operation at that point.
115	\begin{cfa}
116	void nounwind_fixup(unsigned int frames, void (*raised_rtn)(int &)) {
117	if (frames) {
118	nounwind_fixup(frames - 1, raised_rtn);
119	} else {
120	int fixup = 17;
121	raised_rtn(fixup);
122	}
123	}
124	\end{cfa}
125	where the passed fixup function is:
126	\begin{cfa}
127	void raised(int & fixup) {
128	fixup = 42;
129	}
130	\end{cfa}
131	For comparison, a \CFA version passing a function is also included.
132	\item Destructor:
133	This test measures the cost of raising an exception through non-empty frames,
134	where each frame has an object requiring destruction, from the bottom of the
135	recursion to an empty handler. Hence, there are N destructor calls during
136	unwinding.
137
138	\medskip
139	This test is not meaningful for resumption because the stack is only unwound as
140	the recursion returns.
141	\begin{cfa}
142	WithDestructor object;
143	unwind_destructor(frames - 1);
144	\end{cfa}
145	\item Finally:
146	This test measures the cost of establishing a try block with an empty finally
147	clause on the front side of the recursion and running the empty finally clauses
148	during stack unwinding from the bottom of the recursion to an empty handler.
149	\begin{cfa}
150	try {
151	unwind_finally(frames - 1);
152	} finally {}
153	\end{cfa}
154
155	\medskip
156	This test is not meaningful for resumption because the stack is only unwound as
157	the recursion returns.
158	\item Other Handler:
159	For termination, this test is like the finally test but the try block has a
160	catch clause for an exception that is not raised, so catch matching is executed
161	during stack unwinding but the match never successes until the catch at the
162	bottom of the recursion.
163	\begin{cfa}
164	try {
165	unwind_other(frames - 1);
166	} catch (not_raised_exception *) {}
167	\end{cfa}
168
169	\medskip
170	For resumption, this test measures the same raising cost but does not unwind
171	the stack. For languages without resumption, the same fixup function is passed
172	and called.
173	\end{itemize}
174
175	\paragraph{Try/Handle/Finally Statement}
176	This group measures just the cost of executing a try statement so
177	\emph{there is no stack unwinding}. Hence, the program main loops N times
178	around:
179	\begin{cfa}
180	try {
181	} catch (not_raised_exception *) {}
182	\end{cfa}
183	\begin{itemize}[nosep]
184	\item Handler:
185	The try statement has a handler (catch/resume).
186	\item Finally:
187	The try statement has a finally clause.
188	\end{itemize}
189
190	\paragraph{Conditional Matching}
191	This group measures the cost of conditional matching.
192	Only \CFA implements the language level conditional match,
193	the other languages mimic with an ``unconditional" match (it still
194	checks the exception's type) and conditional re-raise if it is not suppose
195	to handle that exception.
196	\begin{center}
197	\begin{tabular}{ll}
198	\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\
199	\begin{cfa}
200	try {
201	throw_exception();
202	} catch (empty_exception * exc;
203	should_catch) {
204	}
205	\end{cfa}
206	&
207	\begin{cfa}
208	try {
209	throw_exception();
210	} catch (EmptyException & exc) {
211	if (!should_catch) throw;
212	}
213	\end{cfa}
214	\end{tabular}
215	\end{center}
216	\begin{itemize}[nosep]
217	\item Match All:
218	The condition is always true. (Always matches or never re-raises.)
219	\item Match None:
220	The condition is always false. (Never matches or always re-raises.)
221	\end{itemize}
222
223	\medskip
224	\noindent
225	All omitted test code for other languages is functionally identical to the \CFA
226	tests or simulated, and available online~\cite{CforallExceptionBenchmarks}.
227
228	%\section{Cost in Size}
229	%Using exceptions also has a cost in the size of the executable.
230	%Although it is sometimes ignored
231	%
232	%There is a size cost to defining a personality function but the later problem
233	%is the LSDA which will be generated for every function.
234	%
235	%(I haven't actually figured out how to compare this, probably using something
236	%related to -fexceptions.)
237
238	\section{Results}
239	One result not directly related to \CFA but important to keep in
240	mind is that, for exceptions, the standard intuition about which languages
241	should go faster often does not hold. For example, there are a few cases where Python out-performs
242	\CFA, \Cpp and Java. The most likely explanation is that, since exceptions are
243	rarely considered to be the common case, the more optimized languages
244	make that case expense. In addition, languages with high-level
245	representations have a much easier time scanning the stack as there is less
246	to decode.
247
248	Tables~\ref{t:PerformanceTermination} and~\ref{t:PerformanceResumption} show
249	the test results for termination and resumption, respectively. In cases where
250	a feature is not supported by a language, the test is skipped for that language
251	(marked N/A). For some Java experiments it was impossible to measure certain
252	effects because the JIT corrupted the test (marked N/C). No workaround was
253	possible~\cite{Dice21}. To get experiments in the range of 1--100 seconds, the
254	number of times an experiment is run (N) is varied (N is marked beside each
255	experiment, e.g., 1M $\Rightarrow$ 1 million test iterations).
256
257	An anomaly exists with gcc nested functions used as thunks for implementing
258	much of the \CFA EHM. If a nested-function closure captures local variables in
259	its lexical scope, performance dropped by a factor of 10. Specifically, in try
260	statements of the form:
261	\begin{cfa}
262	try {
263	unwind_other(frames - 1);
264	} catch (not_raised_exception *) {}
265	\end{cfa}
266	the try block is hoisted into a nested function and the variable @frames@ is
267	the local parameter to the recursive function, which triggers the anomaly. The
268	workaround is to remove the recursion parameter and make it a global variable
269	that is explicitly decremented outside of the try block (marked with a ``*''):
270	\begin{cfa}
271	frames -= 1;
272	try {
273	unwind_other();
274	} catch (not_raised_exception *) {}
275	\end{cfa}
276	To make comparisons fair, a dummy parameter is added and the dummy value passed
277	in the recursion. Note, nested functions in gcc are rarely used (if not
278	completely unknown) and must follow the C calling convention, unlike \Cpp
279	lambdas, so it is not surprising if there are performance issues efficiently
280	capturing closures.
281
282	% Similarly, if a test does not change between resumption
283	% and termination in \CFA, then only one test is written and the result
284	% was put into the termination column.
285
286	% Raw Data:
287	% run-algol-a.sat
288	% ---------------
289	% Raise Empty & 82687046678 & 291616256 & 3252824847 & 15422937623 & 14736271114 \\
290	% Raise D'tor & 219933199603 & 297897792 & 223602799362 & N/A & N/A \\
291	% Raise Finally & 219703078448 & 298391745 & N/A & ... & 18923060958 \\
292	% Raise Other & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
293	% Cross Handler & 9256648 & 13518430 & 769328 & 3486252 & 31790804 \\
294	% Cross Finally & 769319 & N/A & N/A & 2272831 & 37491962 \\
295	% Match All & 3654278402 & 47518560 & 3218907794 & 1296748192 & 624071886 \\
296	% Match None & 4788861754 & 58418952 & 9458936430 & 1318065020 & 625200906 \\
297	%
298	% run-algol-thr-c
299	% ---------------
300	% Raise Empty & 3757606400 & 36472972 & 3257803337 & 15439375452 & 14717808642 \\
301	% Raise D'tor & 64546302019 & 102148375 & 223648121635 & N/A & N/A \\
302	% Raise Finally & 64671359172 & 103285005 & N/A & 15442729458 & 18927008844 \\
303	% Raise Other & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
304	% Cross Handler & 9646462 & 11955668 & 769328 & 3453707 & 31864074 \\
305	% Cross Finally & 773412 & N/A & N/A & 2253825 & 37266476 \\
306	% Match All & 3719462155 & 43294042 & 3223004977 & 1286054154 & 623887874 \\
307	% Match None & 4971630929 & 55311709 & 9481225467 & 1310251289 & 623752624 \\
308	%
309	% run-algol-04-a
310	% --------------
311	% Raise Empty & 0.0 & 0.0 & 3250260945 & 0.0 & 0.0 \\
312	% Raise D'tor & 0.0 & 0.0 & 29017675113 & N/A & N/A \\
313	% Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
314	% Raise Other & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\
315	% Cross Handler & 0.0 & 0.0 & 769334 & 0.0 & 0.0 \\
316	% Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
317	% Match All & 0.0 & 0.0 & 3254283504 & 0.0 & 0.0 \\
318	% Match None & 0.0 & 0.0 & 9476060146 & 0.0 & 0.0 \\
319
320	% run-plg7a-a.sat
321	% ---------------
322	% Raise Empty & 57169011329 & 296612564 & 2788557155 & 17511466039 & 23324548496 \\
323	% Raise D'tor & 150599858014 & 318443709 & 149651693682 & N/A & N/A \\
324	% Raise Finally & 148223145000 & 373325807 & N/A & ... & 29074552998 \\
325	% Raise Other & 189463708732 & 3017109322 & 85819281694 & 17584295487 & 32602686679 \\
326	% Cross Handler & 8001654 & 13584858 & 1555995 & 6626775 & 41927358 \\
327	% Cross Finally & 1002473 & N/A & N/A & 4554344 & 51114381 \\
328	% Match All & 3162460860 & 37315018 & 2649464591 & 1523205769 & 742374509 \\
329	% Match None & 4054773797 & 47052659 & 7759229131 & 1555373654 & 744656403 \\
330	%
331	% run-plg7a-thr-a
332	% ---------------
333	% Raise Empty & 3604235388 & 29829965 & 2786931833 & 17576506385 & 23352975105 \\
334	% Raise D'tor & 46552380948 & 178709605 & 149834207219 & N/A & N/A \\
335	% Raise Finally & 46265157775 & 177906320 & N/A & 17493045092 & 29170962959 \\
336	% Raise Other & 195659245764 & 2376968982 & 86070431924 & 17552979675 & 32501882918 \\
337	% Cross Handler & 397031776 & 12503552 & 1451225 & 6658628 & 42304965 \\
338	% Cross Finally & 1136746 & N/A & N/A & 4468799 & 46155817 \\
339	% Match All & 3189512499 & 39124453 & 2667795989 & 1525889031 & 733785613 \\
340	% Match None & 4094675477 & 48749857 & 7850618572 & 1566713577 & 733478963 \\
341	%
342	% run-plg7a-04-a
343	% --------------
344	% 0.0 are unfilled.
345	% Raise Empty & 0.0 & 0.0 & 2770781479 & 0.0 & 0.0 \\
346	% Raise D'tor & 0.0 & 0.0 & 23530084907 & N/A & N/A \\
347	% Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
348	% Raise Other & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\
349	% Cross Handler & 0.0 & 0.0 & 1422188 & 0.0 & 0.0 \\
350	% Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
351	% Match All & 0.0 & 0.0 & 2671989778 & 0.0 & 0.0 \\
352	% Match None & 0.0 & 0.0 & 7829059869 & 0.0 & 0.0 \\
353
354	\begin{table}
355	\centering
356	\caption{Performance Results Termination (sec)}
357	\label{t:PerformanceTermination}
358	\begin{tabular}{\|r\|*{2}{\|r r r r\|}}
359	\hline
360	& \multicolumn{4}{c\|\|}{AMD} & \multicolumn{4}{c\|}{ARM} \\
361	\cline{2-9}
362	N\hspace{8pt} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c\|\|}{Python} &
363	\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c\|}{Python} \\
364	\hline
365	Throw Empty (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8 \\
366	Throw D'tor (1M) & 48.4 & 23.6 & N/A & N/A & 64.2 & 29.0 & N/A & N/A \\
367	Throw Finally (1M) & 3.4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6 & 19.0 \\
368	Throw Other (1M) & 3.6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4 \\
369	Try/Catch (100M) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2 \\
370	Try/Finally (100M) & 0.9 & N/A & N/C & 44.1 & 0.8 & N/A & N/C & 37.3 \\
371	Match All (10M) & 32.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0 & 3.1 \\
372	Match None (10M) & 32.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2 \\
373	\hline
374	\end{tabular}
375	\end{table}
376
377	\begin{table}
378	\centering
379	\small
380	\caption{Performance Results Resumption (sec)}
381	\label{t:PerformanceResumption}
382	\setlength{\tabcolsep}{5pt}
383	\begin{tabular}{\|r\|*{2}{\|r r r r\|}}
384	\hline
385	& \multicolumn{4}{c\|\|}{AMD} & \multicolumn{4}{c\|}{ARM} \\
386	\cline{2-9}
387	N\hspace{8pt} & \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c\|\|}{Python} &
388	\multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c\|}{Python} \\
389	\hline
390	Resume Empty (10M) & 3.8/3.5 & 14.7 & 2.3 & 176.1 & 0.3/0.1 & 8.9 & 1.2 & 119.9 \\
391	Resume Other (10M) & 4.0/0.1 & 21.9 & 6.2 & 381.0 & 0.3/0.1 & 13.2 & 5.0 & 290.7 \\
392	Try/Resume (100M) & 8.8 & N/A & N/A & N/A & 12.3 & N/A & N/A & N/A \\
393	Match All (10M) & 0.3 & N/A & N/A & N/A & 0.3 & N/A & N/A & N/A \\
394	Match None (10M) & 0.3 & N/A & N/A & N/A & 0.4 & N/A & N/A & N/A \\
395	\hline
396	\end{tabular}
397	\end{table}
398
399	As stated, the performance tests are not attempting to compare exception
400	handling across languages. The only performance requirement is to ensure the
401	\CFA EHM implementation runs in a reasonable amount of time, given its
402	constraints. In general, the \CFA implement did very well. Each of the tests is
403	analysed.
404	\begin{description}
405	\item[Throw/Resume Empty]
406	For termination, \CFA is close to \Cpp, where other languages have a higher cost.
407
408	For resumption, \CFA is better than the fixup simulations in the other languages, except Java.
409	The \CFA results on the ARM computer for both resumption and function simulation are particularly low;
410	I have no explanation for this anomaly, except the optimizer has managed to remove part of the experiment.
411	Python has a high cost for passing the lambda during the recursion.
412
413	\item[Throw D'tor]
414	For termination, \CFA is twice the cost of \Cpp.
415	The higher cost for \CFA must be related to how destructors are handled.
416
417	\item[Throw Finally]
418	\CFA is better than the other languages with a @finally@ clause, which is the
419	same for termination and resumption.
420
421	\item[Throw/Resume Other]
422	For termination, \CFA is better than the other languages.
423
424	For resumption, \CFA is equal to or better the other languages.
425	Again, the \CFA results on the ARM computer for both resumption and function simulation are particularly low.
426	Python has a high cost for passing the lambda during the recursion.
427
428	\item[Try/Catch/Resume]
429	For termination, installing a try statement is more expressive than \Cpp
430	because the try components are hoisted into local functions. At runtime, these
431	functions are than passed to libunwind functions to set up the try statement.
432	\Cpp zero-cost try-entry accounts for its performance advantage.
433
434	For resumption, there are similar costs to termination to set up the try
435	statement but libunwind is not used.
436
437	\item[Try/Finally]
438	Setting up a try finally is less expensive in \CFA than setting up handlers,
439	and is significantly less than other languages.
440
441	\item[Throw/Resume Match All]
442	For termination, \CFA is close to the other language simulations.
443
444	For resumption, the stack unwinding is much faster because it does not use
445	libunwind. Instead resumption is just traversing a linked list with each node
446	being the next stack frame with the try block.
447
448	\item[Throw/Resume Match None]
449	The same results as for Match All.
450	\end{description}
451
452	\begin{comment}
453	This observation means that while \CFA does not actually keep up with Python in
454	every case, it is usually no worse than roughly half the speed of \Cpp. This
455	performance is good enough for the prototyping purposes of the project.
456
457	The test case where \CFA falls short is Raise Other, the case where the
458	stack is unwound including a bunch of non-matching handlers.
459	This slowdown seems to come from missing optimizations.
460
461	This suggests that the performance issue in Raise Other is just an
462	optimization not being applied. Later versions of gcc may be able to
463	optimize this case further, at least down to the half of \Cpp mark.
464	A \CFA compiler that directly produced assembly could do even better as it
465	would not have to work across some of \CFA's current abstractions, like
466	the try terminate function.
467
468	Resumption exception handling is also incredibly fast. Often an order of
469	magnitude or two better than the best termination speed.
470	There is a simple explanation for this; traversing a linked list is much
471	faster than examining and unwinding the stack. When resumption does not do as
472	well its when more try statements are used per raise. Updating the internal
473	linked list is not very expensive but it does add up.
474
475	The relative speed of the Match All and Match None tests (within each
476	language) can also show the effectiveness conditional matching as compared
477	to catch and rethrow.
478	\begin{itemize}[nosep]
479	\item
480	Java and Python get similar values in both tests.
481	Between the interpreted code, a higher level representation of the call
482	stack and exception reuse it it is possible the cost for a second
483	throw can be folded into the first.
484	% Is this due to optimization?
485	\item
486	Both types of \CFA are slightly slower if there is not a match.
487	For termination this likely comes from unwinding a bit more stack through
488	libunwind instead of executing the code normally.
489	For resumption there is extra work in traversing more of the list and running
490	more checks for a matching exceptions.
491	% Resumption is a bit high for that but this is my best theory.
492	\item
493	Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
494	just the catch. This is very high, but it does have to repeat the same
495	process of unwinding the stack and may have to parse the LSDA of the function
496	with the catch and rethrow twice, once before the catch and once after the
497	rethrow.
498	% I spent a long time thinking of what could push it over twice, this is all
499	% I have to explain it.
500	\end{itemize}
501	The difference in relative performance does show that there are savings to
502	be made by performing the check without catching the exception.
503	\end{comment}
504
505
506	\begin{comment}
507	From: Dave Dice <dave.dice@oracle.com>
508	To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
509	Subject: Re: [External] : JIT
510	Date: Mon, 16 Aug 2021 01:21:56 +0000
511
512	> On 2021-8-15, at 7:14 PM, Peter A. Buhr <pabuhr@uwaterloo.ca> wrote:
513	>
514	> My student is trying to measure the cost of installing a try block with a
515	> finally clause in Java.
516	>
517	> We tried the random trick (see below). But if the try block is comment out, the
518	> results are the same. So the program measures the calls to the random number
519	> generator and there is no cost for installing the try block.
520	>
521	> Maybe there is no cost for a try block with an empty finally, i.e., the try is
522	> optimized away from the get-go.
523
524	There's quite a bit of optimization magic behind the HotSpot curtains for
525	try-finally. (I sound like the proverbial broken record (:>)).
526
527	In many cases we can determine that the try block can't throw any exceptions,
528	so we can elide all try-finally plumbing. In other cases, we can convert the
529	try-finally to normal if-then control flow, in the case where the exception is
530	thrown into the same method. This makes exceptions _almost cost-free. If we
531	actually need to "physically" rip down stacks, then things get expensive,
532	impacting both the throw cost, and inhibiting other useful optimizations at the
533	catch point. Such "true" throws are not just expensive, they're _very
534	expensive. The extremely aggressive inlining used by the JIT helps, because we
535	can convert cases where a heavy rip-down would normally needed back into simple
536	control flow.
537
538	Other quirks involve the thrown exception object. If it's never accessed then
539	we're apply a nice set of optimizations to avoid its construction. If it's
540	accessed but never escapes the catch frame (common) then we can also cheat.
541	And if we find we're hitting lots of heavy rip-down cases, the JIT will
542	consider recompilation - better inlining -- to see if we can merge the throw
543	and catch into the same physical frame, and shift to simple branches.
544
545	In your example below, System.out.print() can throw, I believe. (I could be
546	wrong, but most IO can throw). Native calls that throw will "unwind" normally
547	in C++ code until they hit the boundary where they reenter java emitted code,
548	at which point the JIT-ed code checks for a potential pending exception. So in
549	a sense the throw point is implicitly after the call to the native method, so
550	we can usually make those cases efficient.
551
552	Also, when we're running in the interpreter and warming up, we'll notice that
553	the == 42 case never occurs, and so when we start to JIT the code, we elide the
554	call to System.out.print(), replacing it (and anything else which appears in
555	that if x == 42 block) with a bit of code we call an "uncommon trap". I'm
556	presuming we encounter 42 rarely. So if we ever hit the x == 42 case, control
557	hits the trap, which triggers synchronous recompilation of the method, this
558	time with the call to System.out.print() and, because of that, we now to adapt
559	the new code to handle any traps thrown by print(). This is tricky stuff, as
560	we may need to rebuild stack frames to reflect the newly emitted method. And
561	we have to construct a weird bit of "thunk" code that allows us to fall back
562	directly into the newly emitted "if" block. So there's a large one-time cost
563	when we bump into the uncommon trap and recompile, and subsequent execution
564	might get slightly slower as the exception could actually be generated, whereas
565	before we hit the trap, we knew the exception could never be raised.
566
567	Oh, and things also get expensive if we need to actually fill in the stack
568	trace associated with the exception object. Walking stacks is hellish.
569
570	Quite a bit of effort was put into all this as some of the specjvm benchmarks
571	showed significant benefit.
572
573	It's hard to get sensible measurements as the JIT is working against you at
574	every turn. What's good for the normal user is awful for anybody trying to
575	benchmark. Also, all the magic results in fairly noisy and less reproducible
576	results.
577
578	Regards
579	Dave
580
581	p.s., I think I've mentioned this before, but throwing in C++ is grim as
582	unrelated throws in different threads take common locks, so nothing scales as
583	you might expect.
584	\end{comment}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: