Context Navigation

source: doc/theses/andrew_beach_MMath/performance.tex @ c9f9d4f

ADTast-experimentalenumforall-pointer-decayjacob/cs343-translationnew-ast-unique-exprpthread-emulationqualifiedEnum

Last change on this file since c9f9d4f was c9f9d4f, checked in by Peter A. Buhr <pabuhr@…>, 3 years ago
first proofread of performance chapter
Property mode set to `100644`
File size: 15.6 KB

Rev	Line
[dac16a0]	1	\chapter{Performance}
	2	\label{c:performance}
	3
[c9f9d4f]	4	Performance is of secondary importance for most of this project.
	5	Instead, the focus is to get the features working. The only performance
	6	requirement is to ensure the tests for correctness run in a reasonable
[dac16a0]	7	amount of time.
[b51e389c]	8
	9	\section{Test Set-Up}
[c9f9d4f]	10	Tests were run in \CFA, C++, Java and Python.
[9698690]	11	In addition there are two sets of tests for \CFA,
[c9f9d4f]	12	one for termination and one for resumption exceptions.
[dac16a0]	13
	14	C++ is the most comparable language because both it and \CFA use the same
	15	framework, libunwind.
[029cbc0]	16	In fact, the comparison is almost entirely a quality of implementation
[c9f9d4f]	17	comparison: \CFA's EHM has had significantly less time to be optimized and
[dac16a0]	18	does not generate its own assembly. It does have a slight advantage in that
[c9f9d4f]	19	there are some features it handles directly instead of through utility functions,
[9698690]	20	but otherwise \Cpp has a significant advantage.
[dac16a0]	21
[029cbc0]	22	Java is another very popular language with similar termination semantics.
	23	It is implemented in a very different environment, a virtual machine with
	24	garbage collection.
[c9f9d4f]	25	It also implements the @finally@ clause on @try@ blocks allowing for a direct
[029cbc0]	26	feature-to-feature comparison.
[c9f9d4f]	27	As with \Cpp, Java's implementation is mature, optimizations
	28	and has extra features.
[9698690]	29
[c9f9d4f]	30	Python is used as an alternative point of comparison because of the \CFA EHM's
	31	current performance goals, which is not to be prohibitively slow while the
[9698690]	32	features are designed and examined. Python has similar performance goals for
	33	creating quick scripts and its wide use suggests it has achieved those goals.
	34
[c9f9d4f]	35	Unfortunately, there are no notable modern programming languages with
	36	resumption exceptions. Even the older programming languages with resumption
	37	seem to be notable only for having resumption.
	38	So instead, resumption is compared to a less similar but much more familiar
[9698690]	39	feature, termination exceptions.
[dac16a0]	40
[c9f9d4f]	41	All tests are run inside a main loop that repeatedly performs a test.
	42	This approach avoids start-up or tear-down time from
[dac16a0]	43	affecting the timing results.
[c9f9d4f]	44	Each test is run a million times.
	45	The Java versions of the test run this loop an extra 1000 times before
	46	beginning to actual test to ``warm-up" the JVM.
[9698690]	47
	48	Timing is done internally, with time measured immediately before and
[c9f9d4f]	49	after the test loop. The difference is calculated and printed.
[9698690]	50	The loop structure and internal timing means it is impossible to test
	51	unhandled exceptions in \Cpp and Java as that would cause the process to
	52	terminate.
	53	Luckily, performance on the ``give-up and kill the process" path is not
	54	critical.
[dac16a0]	55
[c9f9d4f]	56	The exceptions used in these tests are always based off of
	57	a base exception. This requirement minimizes performance differences based
	58	on the object model used to represent the exception.
[9698690]	59
[c9f9d4f]	60	All tests are designed to be as minimal as possible, while still preventing
	61	excessive optimizations.
[9698690]	62	For example, empty inline assembly blocks are used in \CFA and \Cpp to
	63	prevent excessive optimizations while adding no actual work.
[c9f9d4f]	64	Each test was run eleven times. The top three and bottom three results were
	65	discarded and the remaining five values are averaged.
	66
	67	The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is
	68	compiled with 11.0.11. Python with 3.8. The tests were run on:
	69	\begin{itemize}[nosep]
	70	\item
	71	ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25
	72	\item
	73	AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25
	74	\end{itemize}
[dac16a0]	75
[9698690]	76	% We don't use catch-alls but if we did:
	77	% Catch-alls are done by catching the root exception type (not using \Cpp's
	78	% \code{C++}{catch(...)}).
[dac16a0]	79
[b51e389c]	80	\section{Tests}
[029cbc0]	81	The following tests were selected to test the performance of different
	82	components of the exception system.
[c9f9d4f]	83	They should provide a guide as to where the EHM's costs are found.
[029cbc0]	84
[ea593a3]	85	\paragraph{Raise and Handle}
[c9f9d4f]	86	The first group measures the cost of a try statement when exceptions are raised
	87	and \emph{the stack is unwound}. Each test has has a repeating function like
	88	the following
	89	\begin{cfa}
	90	void unwind_empty(unsigned int frames) {
	91	if (frames) {
	92	unwind_empty(frames - 1);
	93	} else throw (empty_exception){&empty_vt};
	94	}
	95	\end{cfa}
	96	which is called M times, where each call recurses to a depth of N, an
	97	exception is raised, the stack is a unwound, and the exception caught.
[9698690]	98	\begin{itemize}[nosep]
[c9f9d4f]	99	\item Empty:
	100	This test measures the cost of raising (stack walking) an exception through empty
	101	empty stack frames to an empty handler. (see above)
[ea593a3]	102	\item Destructor:
[c9f9d4f]	103
	104	This test measures the cost of raising an exception through non-empty frames
	105	where each frame has an object requiring destruction, to an empty
	106	handler. Hence, there are N destructor calls during unwinding.
	107	\begin{cfa}
	108	if (frames) {
	109	WithDestructor object;
	110	unwind_empty(frames - 1);
	111	\end{cfa}
[ea593a3]	112	\item Finally:
[c9f9d4f]	113	This test measures the cost of establishing a try block with an empty finally
	114	clause on the front side of the recursion and running the empty finally clause
	115	on the back side of the recursion during stack unwinding.
	116	\begin{cfa}
	117	if (frames) {
	118	try {
	119	unwind_finally(frames - 1);
	120	} finally {}
	121	\end{cfa}
[ea593a3]	122	\item Other Handler:
[c9f9d4f]	123	This test is like the finally test but the try block has a catch clause for an
	124	exception that is not raised, so catch matching is executed during stack
	125	unwinding but the match never successes until the catch at the bottom of the
	126	stack.
	127	\begin{cfa}
	128	if (frames) {
	129	try {
	130	unwind_other(frames - 1);
	131	} catch (not_raised_exception *) {}
	132	\end{cfa}
[ea593a3]	133	\end{itemize}
	134
	135	\paragraph{Cross Try Statement}
[c9f9d4f]	136	The next group measures just the cost of executing a try statement so
	137	\emph{there is no stack unwinding}. Hence, the program main loops N times
	138	around:
	139	\begin{cfa}
	140	try {
	141	} catch (not_raised_exception *) {}
	142	\end{cfa}
[9698690]	143	\begin{itemize}[nosep]
[ea593a3]	144	\item Handler:
[c9f9d4f]	145	The try statement has a handler.
[ea593a3]	146	\item Finally:
[c9f9d4f]	147	The try statement replaces the handler with a finally clause.
[ea593a3]	148	\end{itemize}
	149
	150	\paragraph{Conditional Matching}
[c9f9d4f]	151	This final group measures the cost of conditional matching.
[ea593a3]	152	Only \CFA implements the language level conditional match,
	153	the other languages must mimic with an ``unconditional" match (it still
[9698690]	154	checks the exception's type) and conditional re-raise if it was not supposed
	155	to handle that exception.
[c9f9d4f]	156	\begin{center}
	157	\begin{tabular}{ll}
	158	\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\
	159	\begin{cfa}
	160	try {
	161	throw_exception();
	162	} catch (empty_exception * exc;
	163	should_catch) {
	164	}
	165	\end{cfa}
	166	&
	167	\begin{cfa}
	168	try {
	169	throw_exception();
	170	} catch (EmptyException & exc) {
	171	if (!should_catch) throw;
	172	}
	173	\end{cfa}
	174	\end{tabular}
	175	\end{center}
[9698690]	176	\begin{itemize}[nosep]
	177	\item Match All:
[ea593a3]	178	The condition is always true. (Always matches or never re-raises.)
[9698690]	179	\item Match None:
[ea593a3]	180	The condition is always false. (Never matches or always re-raises.)
	181	\end{itemize}
[dac16a0]	182
	183	%\section{Cost in Size}
	184	%Using exceptions also has a cost in the size of the executable.
	185	%Although it is sometimes ignored
	186	%
	187	%There is a size cost to defining a personality function but the later problem
	188	%is the LSDA which will be generated for every function.
	189	%
	190	%(I haven't actually figured out how to compare this, probably using something
	191	%related to -fexceptions.)
[029cbc0]	192
[9698690]	193	\section{Results}
	194	In cases where a feature is not supported by a language the test is skipped
[c9f9d4f]	195	for that language.
	196	\PAB{Report all values.
	197
	198	Similarly, if a test does not change between resumption
[9698690]	199	and termination in \CFA, then only one test is written and the result
	200	was put into the termination column.
[c9f9d4f]	201	}
[9698690]	202
[0b67a19]	203	% Raw Data:
	204	% run-algol-a.sat
	205	% ---------------
	206	% Raise Empty & 82687046678 & 291616256 & 3252824847 & 15422937623 & 14736271114 \\
	207	% Raise D'tor & 219933199603 & 297897792 & 223602799362 & N/A & N/A \\
	208	% Raise Finally & 219703078448 & 298391745 & N/A & ... & 18923060958 \\
	209	% Raise Other & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
	210	% Cross Handler & 9256648 & 13518430 & 769328 & 3486252 & 31790804 \\
	211	% Cross Finally & 769319 & N/A & N/A & 2272831 & 37491962 \\
	212	% Match All & 3654278402 & 47518560 & 3218907794 & 1296748192 & 624071886 \\
	213	% Match None & 4788861754 & 58418952 & 9458936430 & 1318065020 & 625200906 \\
	214	%
	215	% run-algol-thr-c
	216	% ---------------
	217	% Raise Empty & 3757606400 & 36472972 & 3257803337 & 15439375452 & 14717808642 \\
	218	% Raise D'tor & 64546302019 & 102148375 & 223648121635 & N/A & N/A \\
	219	% Raise Finally & 64671359172 & 103285005 & N/A & 15442729458 & 18927008844 \\
	220	% Raise Other & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
	221	% Cross Handler & 9646462 & 11955668 & 769328 & 3453707 & 31864074 \\
	222	% Cross Finally & 773412 & N/A & N/A & 2253825 & 37266476 \\
	223	% Match All & 3719462155 & 43294042 & 3223004977 & 1286054154 & 623887874 \\
	224	% Match None & 4971630929 & 55311709 & 9481225467 & 1310251289 & 623752624 \\
[5438e41]	225	%
	226	% run-algol-04-a
	227	% --------------
	228	% Raise Empty & 0.0 & 0.0 & 3250260945 & 0.0 & 0.0 \\
	229	% Raise D'tor & 0.0 & 0.0 & 29017675113 & N/A & N/A \\
	230	% Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
	231	% Raise Other & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\
	232	% Cross Handler & 0.0 & 0.0 & 769334 & 0.0 & 0.0 \\
	233	% Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
	234	% Match All & 0.0 & 0.0 & 3254283504 & 0.0 & 0.0 \\
	235	% Match None & 0.0 & 0.0 & 9476060146 & 0.0 & 0.0 \\
	236
[9698690]	237	\begin{tabular}{\|l\|c c c c c\|}
	238	\hline
	239	& \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
	240	\hline
	241	Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	242	Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\
	243	Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
	244	Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	245	Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	246	Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
	247	Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	248	Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	249	\hline
	250	\end{tabular}
[0b67a19]	251
	252	% run-plg7a-a.sat
	253	% ---------------
	254	% Raise Empty & 57169011329 & 296612564 & 2788557155 & 17511466039 & 23324548496 \\
	255	% Raise D'tor & 150599858014 & 318443709 & 149651693682 & N/A & N/A \\
	256	% Raise Finally & 148223145000 & 373325807 & N/A & ... & 29074552998 \\
	257	% Raise Other & 189463708732 & 3017109322 & 85819281694 & 17584295487 & 32602686679 \\
	258	% Cross Handler & 8001654 & 13584858 & 1555995 & 6626775 & 41927358 \\
	259	% Cross Finally & 1002473 & N/A & N/A & 4554344 & 51114381 \\
	260	% Match All & 3162460860 & 37315018 & 2649464591 & 1523205769 & 742374509 \\
	261	% Match None & 4054773797 & 47052659 & 7759229131 & 1555373654 & 744656403 \\
	262	%
	263	% run-plg7a-thr-a
	264	% ---------------
	265	% Raise Empty & 3604235388 & 29829965 & 2786931833 & 17576506385 & 23352975105 \\
	266	% Raise D'tor & 46552380948 & 178709605 & 149834207219 & N/A & N/A \\
	267	% Raise Finally & 46265157775 & 177906320 & N/A & 17493045092 & 29170962959 \\
	268	% Raise Other & 195659245764 & 2376968982 & 86070431924 & 17552979675 & 32501882918 \\
	269	% Cross Handler & 397031776 & 12503552 & 1451225 & 6658628 & 42304965 \\
	270	% Cross Finally & 1136746 & N/A & N/A & 4468799 & 46155817 \\
	271	% Match All & 3189512499 & 39124453 & 2667795989 & 1525889031 & 733785613 \\
	272	% Match None & 4094675477 & 48749857 & 7850618572 & 1566713577 & 733478963 \\
[5438e41]	273	%
	274	% run-plg7a-04-a
	275	% --------------
	276	% 0.0 are unfilled.
	277	% Raise Empty & 0.0 & 0.0 & 2770781479 & 0.0 & 0.0 \\
	278	% Raise D'tor & 0.0 & 0.0 & 23530084907 & N/A & N/A \\
	279	% Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
	280	% Raise Other & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\
	281	% Cross Handler & 0.0 & 0.0 & 1422188 & 0.0 & 0.0 \\
	282	% Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
	283	% Match All & 0.0 & 0.0 & 2671989778 & 0.0 & 0.0 \\
	284	% Match None & 0.0 & 0.0 & 7829059869 & 0.0 & 0.0 \\
[0b67a19]	285
	286	% PLG7A (in seconds)
	287	\begin{tabular}{\|l\|c c c c c\|}
	288	\hline
	289	& \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
	290	\hline
	291	Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	292	Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\
	293	Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
	294	Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	295	Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	296	Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
	297	Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	298	Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
	299	\hline
	300	\end{tabular}
	301
[c9f9d4f]	302	One result not directly related to \CFA but important to keep in
	303	mind is that, for exceptions, the standard intuition about which languages
	304	should go faster often does not hold. For example, there are cases where Python out-performs
	305	\Cpp and Java. The most likely explanation is that, since exceptions are
	306	rarely considered to be the common case, the more optimized languages
	307	make that case expense. In addition, languages with high-level
	308	representations have a much easier time scanning the stack as there is less
[0b67a19]	309	to decode.
	310
[c9f9d4f]	311	This observation means that while \CFA does not actually keep up with Python in every
	312	case, it is usually no worse than roughly half the speed of \Cpp. This performance is good
[0b67a19]	313	enough for the prototyping purposes of the project.
	314
[5438e41]	315	The test case where \CFA falls short is Raise Other, the case where the
	316	stack is unwound including a bunch of non-matching handlers.
[c9f9d4f]	317	This slowdown seems to come from missing optimizations.
	318
[5438e41]	319	Importantly, there is a huge slowdown in \Cpp's results bringing that brings
[c9f9d4f]	320	\CFA's performance back in that roughly half speed area. However many other
[5438e41]	321	\CFA benchmarks increase their run-time by a similar amount falling far
	322	behind their \Cpp counter-parts.
	323
	324	This suggests that the performance issue in Raise Other is just an
	325	optimization not being applied. Later versions of gcc may be able to
	326	optimize this case further, at least down to the half of \Cpp mark.
	327	A \CFA compiler that directly produced assembly could do even better as it
	328	would not have to work across some of \CFA's current abstractions, like
	329	the try terminate function.
[0b67a19]	330
	331	Resumption exception handling is also incredibly fast. Often an order of
	332	magnitude or two better than the best termination speed.
[c9f9d4f]	333	There is a simple explanation for this; traversing a linked list is much
[0b67a19]	334	faster than examining and unwinding the stack. When resumption does not do as
[c9f9d4f]	335	well its when more try statements are used per raise. Updating the internal
	336	linked list is not very expensive but it does add up.
[0b67a19]	337
	338	The relative speed of the Match All and Match None tests (within each
	339	language) can also show the effectiveness conditional matching as compared
	340	to catch and rethrow.
	341	\begin{itemize}[nosep]
	342	\item
	343	Java and Python get similar values in both tests.
[c9f9d4f]	344	Between the interpreted code, a higher level representation of the call
[0b67a19]	345	stack and exception reuse it it is possible the cost for a second
	346	throw can be folded into the first.
	347	% Is this due to optimization?
	348	\item
[c9f9d4f]	349	Both types of \CFA are slightly slower if there is not a match.
[0b67a19]	350	For termination this likely comes from unwinding a bit more stack through
	351	libunwind instead of executing the code normally.
	352	For resumption there is extra work in traversing more of the list and running
	353	more checks for a matching exceptions.
	354	% Resumption is a bit high for that but this is my best theory.
	355	\item
	356	Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
	357	just the catch. This is very high, but it does have to repeat the same
	358	process of unwinding the stack and may have to parse the LSDA of the function
	359	with the catch and rethrow twice, once before the catch and once after the
	360	rethrow.
	361	% I spent a long time thinking of what could push it over twice, this is all
	362	% I have to explain it.
	363	\end{itemize}
	364	The difference in relative performance does show that there are savings to
	365	be made by performing the check without catching the exception.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: