Changeset cfbab07 for doc/theses
- Timestamp:
- Aug 29, 2021, 11:45:49 AM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
- Children:
- eaeca5f
- Parents:
- e8bad5c8
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
re8bad5c8 rcfbab07 2 2 \label{c:performance} 3 3 4 Performance has been of secondary importance for most of this project. 5 Instead, the focus has been to get the features working. The only performance 6 requirements is to ensure the tests for correctness run in a reasonable 7 amount of time. 4 Performance is of secondary importance for most of this project. 5 Instead, the focus was to get the features working. The only performance 6 requirement is to ensure the tests for correctness run in a reasonable 7 amount of time. Hence, a few basic performance tests were performed to 8 check this requirement. 8 9 9 10 \section{Test Set-Up} 10 Tests w ill be run in \CFA, C++, Java and Python.11 Tests were run in \CFA, C++, Java and Python. 11 12 In addition there are two sets of tests for \CFA, 12 one for termination exceptions and once with resumption exceptions.13 one for termination and once with resumption. 13 14 14 15 C++ is the most comparable language because both it and \CFA use the same 15 16 framework, libunwind. 16 In fact, the comparison is almost entirely a quality of implementation17 comparison.\CFA's EHM has had significantly less time to be optimized and17 In fact, the comparison is almost entirely in quality of implementation. 18 Specifically, \CFA's EHM has had significantly less time to be optimized and 18 19 does not generate its own assembly. It does have a slight advantage in that 19 there are some features it does not handle, throughutility functions,20 but otherwise \Cpp hasa significant advantage.21 22 Java is another very popular language with similar termination semantics.23 It is implemented in a very different environment, a virtual machine with20 \Cpp has to do some extra bookkeeping to support its utility functions, 21 but otherwise \Cpp should have a significant advantage. 22 23 Java a popular language with similar termination semantics, but 24 it is implemented in a very different environment, a virtual machine with 24 25 garbage collection. 25 26 It also implements the finally clause on try blocks allowing for a direct 26 27 feature-to-feature comparison. 27 As with \Cpp, Java's implementation is m ore mature, has more optimizations28 and more extra features.29 30 Python was used as a point ofcomparison because of the \CFA EHM's31 current performance goals, which is not be prohibitively slow while the28 As with \Cpp, Java's implementation is mature, has more optimizations 29 and extra features as compared to \CFA. 30 31 Python is used as an alternative comparison because of the \CFA EHM's 32 current performance goals, which is to not be prohibitively slow while the 32 33 features are designed and examined. Python has similar performance goals for 33 34 creating quick scripts and its wide use suggests it has achieved those goals. 34 35 35 Unfortunately there are no notable modern programming languages with36 resumption exceptions. Even the older programming languages with resumption s37 seem to be notable only for having resumption s.38 So instead resumptions are compared to a less similar but much more familiar 39 feature, termination exceptions.40 41 All tests are run inside a main loop which will perform the test42 repeatedly. This is toavoids start-up or tear-down time from36 Unfortunately, there are no notable modern programming languages with 37 resumption exceptions. Even the older programming languages with resumption 38 seem to be notable only for having resumption. 39 Instead, resumption is compared to its simulation in other programming 40 languages: fixup functions that are explicity passed into a function. 41 42 All tests are run inside a main loop that repeatedly performs a test. 43 This approach avoids start-up or tear-down time from 43 44 affecting the timing results. 45 The number of times the loop is run is configurable from the command line, 46 the number used in the timing runs is given with the results per test. 44 47 Tests ran their main loop a million times. 45 The Java versions of the test also run this loop an extra 1000 times before 46 beginning to time the results to ``warm-up" the JVM. 48 The Java tests runs the main loop 1000 times before 49 beginning the actual test to ``warm-up" the JVM. 50 % All other languages are precompiled or interpreted. 47 51 48 52 Timing is done internally, with time measured immediately before and 49 immediately after the test loop. The difference is calculated and printed. 50 53 after the test loop. The difference is calculated and printed. 51 54 The loop structure and internal timing means it is impossible to test 52 55 unhandled exceptions in \Cpp and Java as that would cause the process to … … 55 58 critical. 56 59 57 The exceptions used in these tests will always be a exception based off of 58 the base exception. This requirement minimizes performance differences based 59 on the object model used to repersent the exception. 60 61 All tests were designed to be as minimal as possible while still preventing 62 exessive optimizations. 60 The exceptions used in these tests are always based off of 61 the base exception for the language. 62 This requirement minimizes performance differences based 63 on the object model used to represent the exception. 64 65 All tests are designed to be as minimal as possible, while still preventing 66 excessive optimizations. 63 67 For example, empty inline assembly blocks are used in \CFA and \Cpp to 64 68 prevent excessive optimizations while adding no actual work. … … 68 72 % \code{C++}{catch(...)}). 69 73 74 When collecting data each test is run eleven times. The top three and bottom 75 three results are discarded and the remaining five values are averaged. 76 The test are run with the latest (still pre-release) \CFA compiler was used, 77 using gcc-10 as a backend. 78 g++-10 is used for \Cpp. 79 Java tests are complied and run with version 11.0.11. 80 Python used version 3.8. 81 The machines used to run the tests are: 82 \todo{Get patch versions for python, gcc and g++.} 83 \begin{itemize}[nosep] 84 \item ARM 2280 Kunpeng 920 48-core 2$\times$socket 85 \lstinline{@} 2.6 GHz running Linux v5.11.0-25 86 \item AMD 6380 Abu Dhabi 16-core 4$\times$socket 87 \lstinline{@} 2.5 GHz running Linux v5.11.0-25 88 \end{itemize} 89 Representing the two major families of hardware architecture. 90 70 91 \section{Tests} 71 92 The following tests were selected to test the performance of different 72 93 components of the exception system. 73 The should provide a guide as to where the EHM's costs can be found. 74 75 \paragraph{Raise and Handle} 76 The first group of tests involve setting up 77 So there is three layers to the test. The first is set up and a loop, which 78 configures the test and then runs it repeatedly to reduce the impact of 79 start-up and shutdown on the results. 80 Each iteration of the main loop 94 They should provide a guide as to where the EHM's costs are found. 95 96 \paragraph{Stack Traversal} 97 This group measures the cost of traversing the stack, 98 (and in termination, unwinding it). 99 Inside the main loop is a call to a recursive function. 100 This function calls itself F times before raising an exception. 101 F is configurable from the command line, but is usually 100. 102 This builds up many stack frames, and any contents they may have, 103 before the raise. 104 The exception is always handled at the base of the stack. 105 For example the Empty test for \CFA resumption looks like: 106 \begin{cfa} 107 void unwind_empty(unsigned int frames) { 108 if (frames) { 109 unwind_empty(frames - 1); 110 } else { 111 throwResume (empty_exception){&empty_vt}; 112 } 113 } 114 \end{cfa} 115 Other test cases have additional code around the recursive call add 116 something besides simple stack frames to the stack. 117 Note that both termination and resumption will have to traverse over 118 the stack but only termination has to unwind it. 81 119 \begin{itemize}[nosep] 82 \item Empty Function: 120 % \item None: 121 % Reuses the empty test code (see below) except that the number of frames 122 % is set to 0 (this is the only test for which the number of frames is not 123 % 100). This isolates the start-up and shut-down time of a throw. 124 \item Empty: 83 125 The repeating function is empty except for the necessary control code. 126 As other traversal tests add to this, so it is the baseline for the group 127 as the cost comes from traversing over and unwinding a stack frame 128 that has no other interactions with the exception system. 84 129 \item Destructor: 85 130 The repeating function creates an object with a destructor before calling 86 131 itself. 132 Comparing this to the empty test gives the time to traverse over and/or 133 unwind a destructor. 87 134 \item Finally: 88 135 The repeating function calls itself inside a try block with a finally clause 89 136 attached. 137 Comparing this to the empty test gives the time to traverse over and/or 138 unwind a finally clause. 90 139 \item Other Handler: 91 140 The repeating function calls itself inside a try block with a handler that 92 will not match the raised exception. (But is of the same kind of handler.) 141 will not match the raised exception, but is of the same kind of handler. 142 This means that the EHM will have to check each handler, but will continue 143 over all of the until it reaches the base of the stack. 144 Comparing this to the empty test gives the time to traverse over and/or 145 unwind a handler. 93 146 \end{itemize} 94 147 95 148 \paragraph{Cross Try Statement} 96 The next group measures the cost of a try statement when no exceptions are 97 raised. The test is set-up, then there is a loop to reduce the impact of 98 start-up and shutdown on the results. 99 In each iteration, a try statement is executed. Entering and leaving a loop 100 is all the test wants to do. 149 This group of tests measures the cost setting up exception handling if it is 150 not used (because the exceptional case did not occur). 151 Tests repeatedly cross (enter and leave, execute) a try statement but never 152 preform a raise. 101 153 \begin{itemize}[nosep] 102 154 \item Handler: 103 The try statement has a handler (of the matchingkind).155 The try statement has a handler (of the appropriate kind). 104 156 \item Finally: 105 157 The try statement has a finally clause. … … 107 159 108 160 \paragraph{Conditional Matching} 109 This group of tests checks the cost of conditional matching.161 This group measures the cost of conditional matching. 110 162 Only \CFA implements the language level conditional match, 111 the other languages m ust mimicwith an ``unconditional" match (it still112 checks the exception's type) and conditional re-raise if it was not supposed163 the other languages mimic it with an ``unconditional" match (it still 164 checks the exception's type) and conditional re-raise if it is not supposed 113 165 to handle that exception. 166 167 There is the pattern shown in \CFA and \Cpp. Java and Python use the same 168 pattern as \Cpp, but with their own syntax. 169 170 \begin{minipage}{0.45\textwidth} 171 \begin{cfa} 172 try { 173 ... 174 } catch (exception_t * e ; 175 should_catch(e)) { 176 ... 177 } 178 \end{cfa} 179 \end{minipage} 180 \begin{minipage}{0.55\textwidth} 181 \begin{lstlisting}[language=C++] 182 try { 183 ... 184 } catch (std::exception & e) { 185 if (!should_catch(e)) throw; 186 ... 187 } 188 \end{lstlisting} 189 \end{minipage} 114 190 \begin{itemize}[nosep] 115 191 \item Match All: … … 118 194 The condition is always false. (Never matches or always re-raises.) 119 195 \end{itemize} 196 197 \paragraph{Resumption Simulation} 198 A slightly altered version of the Empty Traversal test is used when comparing 199 resumption to fix-up routines. 200 The handler, the actual resumption handler or the fix-up routine, 201 always captures a variable at the base of the loop, 202 and receives a reference to a variable at the raise site, either as a 203 field on the exception or an argument to the fix-up routine. 204 % I don't actually know why that is here but not anywhere else. 120 205 121 206 %\section{Cost in Size} … … 130 215 131 216 \section{Results} 132 Each test was run eleven times. The top three and bottom three results were 133 discarded and the remaining five values are averaged. 134 135 In cases where a feature is not supported by a language the test is skipped 136 for that language. Similarly, if a test is does not change between resumption 137 and termination in \CFA, then only one test is written and the result 138 was put into the termination column. 139 140 % Raw Data: 141 % run-algol-a.sat 142 % --------------- 143 % Raise Empty & 82687046678 & 291616256 & 3252824847 & 15422937623 & 14736271114 \\ 144 % Raise D'tor & 219933199603 & 297897792 & 223602799362 & N/A & N/A \\ 145 % Raise Finally & 219703078448 & 298391745 & N/A & ... & 18923060958 \\ 146 % Raise Other & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\ 147 % Cross Handler & 9256648 & 13518430 & 769328 & 3486252 & 31790804 \\ 148 % Cross Finally & 769319 & N/A & N/A & 2272831 & 37491962 \\ 149 % Match All & 3654278402 & 47518560 & 3218907794 & 1296748192 & 624071886 \\ 150 % Match None & 4788861754 & 58418952 & 9458936430 & 1318065020 & 625200906 \\ 151 % 152 % run-algol-thr-c 153 % --------------- 154 % Raise Empty & 3757606400 & 36472972 & 3257803337 & 15439375452 & 14717808642 \\ 155 % Raise D'tor & 64546302019 & 102148375 & 223648121635 & N/A & N/A \\ 156 % Raise Finally & 64671359172 & 103285005 & N/A & 15442729458 & 18927008844 \\ 157 % Raise Other & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\ 158 % Cross Handler & 9646462 & 11955668 & 769328 & 3453707 & 31864074 \\ 159 % Cross Finally & 773412 & N/A & N/A & 2253825 & 37266476 \\ 160 % Match All & 3719462155 & 43294042 & 3223004977 & 1286054154 & 623887874 \\ 161 % Match None & 4971630929 & 55311709 & 9481225467 & 1310251289 & 623752624 \\ 162 % 163 % run-algol-04-a 164 % -------------- 165 % Raise Empty & 0.0 & 0.0 & 3250260945 & 0.0 & 0.0 \\ 166 % Raise D'tor & 0.0 & 0.0 & 29017675113 & N/A & N/A \\ 167 % Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 168 % Raise Other & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\ 169 % Cross Handler & 0.0 & 0.0 & 769334 & 0.0 & 0.0 \\ 170 % Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 171 % Match All & 0.0 & 0.0 & 3254283504 & 0.0 & 0.0 \\ 172 % Match None & 0.0 & 0.0 & 9476060146 & 0.0 & 0.0 \\ 173 174 \begin{tabular}{|l|c c c c c|} 175 \hline 176 & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ 177 \hline 178 Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 179 Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ 180 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 181 Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 182 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 183 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 184 Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 185 Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 217 % First, introduce the tables. 218 \autoref{t:PerformanceTermination}, 219 \autoref{t:PerformanceResumption} 220 and~\autoref{t:PerformanceFixupRoutines} 221 show the test results. 222 In cases where a feature is not supported by a language, the test is skipped 223 for that language and the result is marked N/A. 224 There are also cases where the feature is supported but measuring its 225 cost is impossible. This happened with Java, which uses a JIT that optimize 226 away the tests and it cannot be stopped.\cite{Dice21} 227 These tests are marked N/C. 228 To get results in a consistent range (1 second to 1 minute is ideal, 229 going higher is better than going low) N, the number of iterations of the 230 main loop in each test, is varied between tests. It is also given in the 231 results and usually have a value in the millions. 232 233 An anomaly in some results came from \CFA's use of gcc nested functions. 234 These nested functions are used to create closures that can access stack 235 variables in their lexical scope. 236 However, if they do so then they can cause the benchmark's run-time to 237 increase by an order of magnitude. 238 The simplest solution is to make those values global variables instead 239 of function local variables. 240 % Do we know if editing a global inside nested function is a problem? 241 Tests that had to be modified to avoid this problem have been marked 242 with a ``*'' in the results. 243 244 % Now come the tables themselves: 245 % You might need a wider window for this. 246 247 \begin{table}[htb] 248 \centering 249 \caption{Termination Performance Results (sec)} 250 \label{t:PerformanceTermination} 251 \begin{tabular}{|r|*{2}{|r r r r|}} 252 \hline 253 & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\ 254 \cline{2-9} 255 N\hspace{8pt} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & 256 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 257 \hline 258 Empty Traversal (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8 \\ 259 D'tor Traversal (1M) & 48.4 & 23.6 & N/A & N/A & 64.2 & 29.0 & N/A & N/A \\ 260 Finally Traversal (1M) & 3.4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6 & 19.0 \\ 261 Other Traversal (1M) & 3.6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4 \\ 262 Cross Handler (100M) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2 \\ 263 Cross Finally (100M) & 0.9 & N/A & N/C & 44.1 & 0.8 & N/A & N/C & 37.3 \\ 264 Match All (10M) & 32.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0 & 3.1 \\ 265 Match None (10M) & 32.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2 \\ 186 266 \hline 187 267 \end{tabular} 188 189 % run-plg7a-a.sat 190 % --------------- 191 % Raise Empty & 57169011329 & 296612564 & 2788557155 & 17511466039 & 23324548496 \\ 192 % Raise D'tor & 150599858014 & 318443709 & 149651693682 & N/A & N/A \\ 193 % Raise Finally & 148223145000 & 373325807 & N/A & ... & 29074552998 \\ 194 % Raise Other & 189463708732 & 3017109322 & 85819281694 & 17584295487 & 32602686679 \\ 195 % Cross Handler & 8001654 & 13584858 & 1555995 & 6626775 & 41927358 \\ 196 % Cross Finally & 1002473 & N/A & N/A & 4554344 & 51114381 \\ 197 % Match All & 3162460860 & 37315018 & 2649464591 & 1523205769 & 742374509 \\ 198 % Match None & 4054773797 & 47052659 & 7759229131 & 1555373654 & 744656403 \\ 199 % 200 % run-plg7a-thr-a 201 % --------------- 202 % Raise Empty & 3604235388 & 29829965 & 2786931833 & 17576506385 & 23352975105 \\ 203 % Raise D'tor & 46552380948 & 178709605 & 149834207219 & N/A & N/A \\ 204 % Raise Finally & 46265157775 & 177906320 & N/A & 17493045092 & 29170962959 \\ 205 % Raise Other & 195659245764 & 2376968982 & 86070431924 & 17552979675 & 32501882918 \\ 206 % Cross Handler & 397031776 & 12503552 & 1451225 & 6658628 & 42304965 \\ 207 % Cross Finally & 1136746 & N/A & N/A & 4468799 & 46155817 \\ 208 % Match All & 3189512499 & 39124453 & 2667795989 & 1525889031 & 733785613 \\ 209 % Match None & 4094675477 & 48749857 & 7850618572 & 1566713577 & 733478963 \\ 210 % 211 % run-plg7a-04-a 212 % -------------- 213 % 0.0 are unfilled. 214 % Raise Empty & 0.0 & 0.0 & 2770781479 & 0.0 & 0.0 \\ 215 % Raise D'tor & 0.0 & 0.0 & 23530084907 & N/A & N/A \\ 216 % Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 217 % Raise Other & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\ 218 % Cross Handler & 0.0 & 0.0 & 1422188 & 0.0 & 0.0 \\ 219 % Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 220 % Match All & 0.0 & 0.0 & 2671989778 & 0.0 & 0.0 \\ 221 % Match None & 0.0 & 0.0 & 7829059869 & 0.0 & 0.0 \\ 222 223 % PLG7A (in seconds) 224 \begin{tabular}{|l|c c c c c|} 225 \hline 226 & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ 227 \hline 228 Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 229 Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ 230 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 231 Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 232 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 233 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 234 Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 235 Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 268 \end{table} 269 270 \begin{table}[htb] 271 \centering 272 \caption{Resumption Performance Results (sec)} 273 \label{t:PerformanceResumption} 274 \begin{tabular}{|r||r||r|} 275 \hline 276 N\hspace{8pt} 277 & AMD & ARM \\ 278 \hline 279 Empty Traversal (10M) & 0.2 & 0.3 \\ 280 D'tor Traversal (10M) & 1.8 & 1.0 \\ 281 Finally Traversal (10M) & 1.7 & 1.0 \\ 282 Other Traversal (10M) & 22.6 & 25.9 \\ 283 Cross Handler (100M) & 8.4 & 11.9 \\ 284 Match All (100M) & 2.3 & 3.2 \\ 285 Match None (100M) & 2.9 & 3.9 \\ 236 286 \hline 237 287 \end{tabular} 238 239 One result that is not directly related to \CFA but is important to keep in 240 mind is that in exceptions the standard intuitions about which languages 241 should go faster often do not hold. There are cases where Python out-preforms 242 \Cpp and Java. The most likely explination is that, since exceptions are 243 rarely considered to be the common case, the more optimized langages have 244 optimized at their expence. In addition languages with high level 245 repersentations have a much easier time scanning the stack as there is less 246 to decode. 247 248 This means that while \CFA does not actually keep up with Python in every 249 case it is usually no worse than roughly half the speed of \Cpp. This is good 250 enough for the prototyping purposes of the project. 251 252 The test case where \CFA falls short is Raise Other, the case where the 253 stack is unwound including a bunch of non-matching handlers. 254 This slowdown seems to come from missing optimizations, 255 the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp) 256 but the results change if they are run in gcc/g++ 9 instead. 257 Importantly, there is a huge slowdown in \Cpp's results bringing that brings 258 \CFA's performace back in that roughly half speed area. However many other 259 \CFA benchmarks increase their run-time by a similar amount falling far 260 behind their \Cpp counter-parts. 261 262 This suggests that the performance issue in Raise Other is just an 263 optimization not being applied. Later versions of gcc may be able to 264 optimize this case further, at least down to the half of \Cpp mark. 265 A \CFA compiler that directly produced assembly could do even better as it 266 would not have to work across some of \CFA's current abstractions, like 267 the try terminate function. 268 269 Resumption exception handling is also incredibly fast. Often an order of 270 magnitude or two better than the best termination speed. 271 There is a simple explination for this; traversing a linked list is much 272 faster than examining and unwinding the stack. When resumption does not do as 273 well its when more try statements are used per raise. Updating the interal 274 linked list is not very expencive but it does add up. 275 276 The relative speed of the Match All and Match None tests (within each 277 language) can also show the effectiveness conditional matching as compared 278 to catch and rethrow. 279 \begin{itemize}[nosep] 280 \item 281 Java and Python get similar values in both tests. 282 Between the interperated code, a higher level repersentation of the call 283 stack and exception reuse it it is possible the cost for a second 284 throw can be folded into the first. 285 % Is this due to optimization? 286 \item 287 Both types of \CFA are slighly slower if there is not a match. 288 For termination this likely comes from unwinding a bit more stack through 289 libunwind instead of executing the code normally. 290 For resumption there is extra work in traversing more of the list and running 291 more checks for a matching exceptions. 292 % Resumption is a bit high for that but this is my best theory. 293 \item 294 Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs. 295 just the catch. This is very high, but it does have to repeat the same 296 process of unwinding the stack and may have to parse the LSDA of the function 297 with the catch and rethrow twice, once before the catch and once after the 298 rethrow. 299 % I spent a long time thinking of what could push it over twice, this is all 300 % I have to explain it. 301 \end{itemize} 302 The difference in relative performance does show that there are savings to 303 be made by performing the check without catching the exception. 288 \end{table} 289 290 \begin{table}[htb] 291 \centering 292 \small 293 \caption{Resumption/Fixup Routine Comparison (sec)} 294 \label{t:PerformanceFixupRoutines} 295 \setlength{\tabcolsep}{5pt} 296 \begin{tabular}{|r|*{2}{|r r r r r|}} 297 \hline 298 & \multicolumn{5}{c||}{AMD} & \multicolumn{5}{c|}{ARM} \\ 299 \cline{2-11} 300 N\hspace{8pt} & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & 301 \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 302 \hline 303 Resume Empty (10M) & 3.8 & 3.5 & 14.7 & 2.3 & 176.1 & 0.3 & 0.1 & 8.9 & 1.2 & 119.9 \\ 304 %Resume Other (10M) & 4.0* & 0.1* & 21.9 & 6.2 & 381.0 & 0.3* & 0.1* & 13.2 & 5.0 & 290.7 \\ 305 \hline 306 \end{tabular} 307 \end{table} 308 309 % Now discuss the results in the tables. 310 One result not directly related to \CFA but important to keep in mind is that, 311 for exceptions the standard intuition about which languages should go 312 faster often does not hold. 313 For example, there are a few cases where Python out-performs 314 \CFA, \Cpp and Java. 315 The most likely explanation is that, since exceptions 316 are rarely considered to be the common case, the more optimized languages 317 make that case expensive to improve other cases. 318 In addition, languages with high-level representations have a much 319 easier time scanning the stack as there is less to decode. 320 321 As stated, 322 the performance tests are not attempting to show \CFA has a new competitive 323 way of implementing exception handling. 324 The only performance requirement is to insure the \CFA EHM has reasonable 325 performance for prototyping. 326 Although that may be hard to exactly quantify, we believe it has succeeded 327 in that regard. 328 Details on the different test cases follow. 329 330 \begin{description} 331 \item[Empty Traversal] 332 \CFA is slower than \Cpp, but is still faster than the other languages 333 and closer to \Cpp than other languages. 334 This is to be expected as \CFA is closer to \Cpp than the other languages. 335 336 \item[D'tor Traversal] 337 Running destructors causes huge slowdown in every language that supports 338 them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's. 339 Considering the amount of work done in destructors is so low the cost 340 likely comes from the change of context required to do that work. 341 342 \item[Finally Traversal] 343 Speed is similar to Empty Traversal in all languages that support finally 344 clauses. Only Python seems to have a larger than random noise change in 345 its run-time and it is still not large. 346 Despite the similarity between finally clauses and destructors, 347 finally clauses seem to avoid the spike in run-time destructors have. 348 Possibly some optimization removes the cost of changing contexts. 349 \todo{OK, I think the finally clause may have been optimized out.} 350 351 \item[Other Traversal] 352 For \Cpp, stopping to check if a handler applies seems to be about as 353 expensive as stopping to run a destructor. 354 This results in a significant jump. 355 356 Other languages experiance a small increase in run-time. 357 The small increase likely comes from running the checks, 358 but they could avoid the spike by not having the same kind of overhead for 359 switching to the check's context. 360 361 \todo{Could revist Other Traversal, after Finally Traversal.} 362 363 \item[Cross Handler] 364 Here \CFA falls behind \Cpp by a much more significant margin. 365 This is likely due to the fact \CFA has to insert two extra function 366 calls while \Cpp doesn't have to do execute any other instructions. 367 Python is much further behind. 368 369 \item[Cross Finally] 370 \CFA's performance now matches \Cpp's from Cross Handler. 371 If the code from the finally clause is being inlined, 372 which is just a asm comment, than there are no additional instructions 373 to execute again when exiting the try statement normally. 374 375 \item[Conditional Match] 376 Both of the conditional matching tests can be considered on their own, 377 however for evaluating the value of conditional matching itself the 378 comparison of the two sets of results is useful. 379 Consider the massive jump in run-time for \Cpp going from match all to match 380 none, which none of the other languages have. 381 Some strange interaction is causing run-time to more than double for doing 382 twice as many raises. 383 Java and Python avoid this problem and have similar run-time for both tests, 384 possibly through resource reuse or their program representation. 385 However \CFA is built like \Cpp and avoids the problem as well, this matches 386 the pattern of the conditional match which makes the two execution paths 387 much more similar. 388 389 \end{description} 390 391 Moving on to resumption there is one general note, 392 resumption is \textit{fast}, the only test where it fell 393 behind termination is Cross Handler. 394 In every other case, the number of iterations had to be increased by a 395 factor of 10 to get the run-time in an approprate range 396 and in some cases resumption still took less time. 397 398 % I tried \paragraph and \subparagraph, maybe if I could adjust spacing 399 % between paragraphs those would work. 400 \begin{description} 401 \item[Empty Traversal] 402 See above for the general speed-up notes. 403 This result is not surprising as resumption's link list approach 404 means that traversing over stack frames without a resumption handler is 405 $O(1)$. 406 407 \item[D'tor Traversal] 408 Resumption does have the same spike in run-time that termination has. 409 The run-time is actually very similar to Finally Traversal. 410 As resumption does not unwind the stack both destructors and finally 411 clauses are run while walking down the stack normally. 412 So it follows their performance is similar. 413 414 \item[Finally Traversal] 415 The increase in run-time fromm Empty Traversal (once adjusted for 416 the number of iterations) roughly the same as for termination. 417 This suggests that the 418 419 \item[Other Traversal] 420 Traversing across handlers reduces resumption's advantage as it actually 421 has to stop and check each one. 422 Resumption still came out ahead (adjusting for iterations) but by much less 423 than the other cases. 424 425 \item[Cross Handler] 426 The only test case where resumption could not keep up with termination, 427 although the difference is not as significant as many other cases. 428 It is simply a matter of where the costs come from. Even if \CFA termination 429 is not ``zero-cost" passing through an empty function still seems to be 430 cheaper than updating global values. 431 432 \item[Conditional Match] 433 Resumption shows a slight slowdown if the exception is not matched 434 by the first handler, which follows from the fact the second handler now has 435 to be checked. However the difference is not large. 436 437 \end{description} 438 439 Finally are the results of the resumption/fixup routine comparison. 440 These results are surprisingly varied, it is possible that creating a closure 441 has more to do with performance than passing the argument through layers of 442 calls. 443 Even with 100 stack frames though, resumption is only about as fast as 444 manually passing a fixup routine. 445 So there is a cost for the additional power and flexibility exceptions 446 provide.
Note: See TracChangeset
for help on using the changeset viewer.