Changeset 5a40e4e for doc/theses/andrew_beach_MMath/performance.tex
- Timestamp:
- Sep 9, 2021, 3:56:32 PM (4 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, master, pthread-emulation, qualifiedEnum
- Children:
- d0b9247
- Parents:
- dd1cc02 (diff), d8d512e (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)links above to see all the changes relative to each parent. - File:
-
- 1 edited
-
doc/theses/andrew_beach_MMath/performance.tex (modified) (6 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
rdd1cc02 r5a40e4e 2 2 \label{c:performance} 3 3 4 Performance has been of secondary importance for most of this project. 5 Instead, the focus has been to get the features working. The only performance 6 requirements is to ensure the tests for correctness run in a reasonable 7 amount of time. 4 Performance is of secondary importance for most of this project. 5 Instead, the focus was to get the features working. The only performance 6 requirement is to ensure the tests for correctness run in a reasonable 7 amount of time. Hence, a few basic performance tests were performed to 8 check this requirement. 8 9 9 10 \section{Test Set-Up} 10 Tests w ill be run in \CFA, C++, Java and Python.11 Tests were run in \CFA, C++, Java and Python. 11 12 In addition there are two sets of tests for \CFA, 12 one for termination exceptions and once with resumption exceptions.13 one with termination and one with resumption. 13 14 14 15 C++ is the most comparable language because both it and \CFA use the same 15 16 framework, libunwind. 16 In fact, the comparison is almost entirely a quality of implementation17 comparison.\CFA's EHM has had significantly less time to be optimized and17 In fact, the comparison is almost entirely in quality of implementation. 18 Specifically, \CFA's EHM has had significantly less time to be optimized and 18 19 does not generate its own assembly. It does have a slight advantage in that 19 there are some features it does not handle, throughutility functions,20 but otherwise \Cpp hasa significant advantage.21 22 Java is another very popular language with similar termination semantics.23 Itis implemented in a very different environment, a virtual machine with20 \Cpp has to do some extra bookkeeping to support its utility functions, 21 but otherwise \Cpp should have a significant advantage. 22 23 Java, a popular language with similar termination semantics, 24 is implemented in a very different environment, a virtual machine with 24 25 garbage collection. 25 26 It also implements the finally clause on try blocks allowing for a direct 26 27 feature-to-feature comparison. 27 As with \Cpp, Java's implementation is m ore mature, has more optimizations28 and more extra features.29 30 Python was used as a point ofcomparison because of the \CFA EHM's31 current performance goals, which is not be prohibitively slow while the28 As with \Cpp, Java's implementation is mature, has more optimizations 29 and extra features as compared to \CFA. 30 31 Python is used as an alternative comparison because of the \CFA EHM's 32 current performance goals, which is to not be prohibitively slow while the 32 33 features are designed and examined. Python has similar performance goals for 33 34 creating quick scripts and its wide use suggests it has achieved those goals. 34 35 35 Unfortunately there are no notable modern programming languages with36 resumption exceptions. Even the older programming languages with resumption s37 seem to be notable only for having resumption s.38 So instead resumptions are compared to a less similar but much more familiar 39 feature, termination exceptions.40 41 All tests are run inside a main loop which will perform the test42 repeatedly. This is toavoids start-up or tear-down time from36 Unfortunately, there are no notable modern programming languages with 37 resumption exceptions. Even the older programming languages with resumption 38 seem to be notable only for having resumption. 39 Instead, resumption is compared to its simulation in other programming 40 languages: fixup functions that are explicitly passed into a function. 41 42 All tests are run inside a main loop that repeatedly performs a test. 43 This approach avoids start-up or tear-down time from 43 44 affecting the timing results. 44 Tests ran their main loop a million times. 45 The Java versions of the test also run this loop an extra 1000 times before 46 beginning to time the results to ``warm-up" the JVM. 45 The number of times the loop is run is configurable from the command line; 46 the number used in the timing runs is given with the results per test. 47 The Java tests run the main loop 1000 times before 48 beginning the actual test to ``warm-up" the JVM. 49 % All other languages are precompiled or interpreted. 47 50 48 51 Timing is done internally, with time measured immediately before and 49 immediately after the test loop. The difference is calculated and printed. 50 52 after the test loop. The difference is calculated and printed. 51 53 The loop structure and internal timing means it is impossible to test 52 54 unhandled exceptions in \Cpp and Java as that would cause the process to … … 55 57 critical. 56 58 57 The exceptions used in these tests will always be a exception based off of 58 the base exception. This requirement minimizes performance differences based 59 on the object model used to repersent the exception. 60 61 All tests were designed to be as minimal as possible while still preventing 62 exessive optimizations. 59 The exceptions used in these tests are always based off of 60 the base exception for the language. 61 This requirement minimizes performance differences based 62 on the object model used to represent the exception. 63 64 All tests are designed to be as minimal as possible, while still preventing 65 excessive optimizations. 63 66 For example, empty inline assembly blocks are used in \CFA and \Cpp to 64 67 prevent excessive optimizations while adding no actual work. … … 68 71 % \code{C++}{catch(...)}). 69 72 73 When collecting data, each test is run eleven times. The top three and bottom 74 three results are discarded and the remaining five values are averaged. 75 The test are run with the latest (still pre-release) \CFA compiler, 76 using gcc-10 10.3.0 as a backend. 77 g++-10 10.3.0 is used for \Cpp. 78 Java tests are complied and run with version 11.0.11. 79 Python used version 3.8.10. 80 The machines used to run the tests are: 81 \begin{itemize}[nosep] 82 \item ARM 2280 Kunpeng 920 48-core 2$\times$socket 83 \lstinline{@} 2.6 GHz running Linux v5.11.0-25 84 \item AMD 6380 Abu Dhabi 16-core 4$\times$socket 85 \lstinline{@} 2.5 GHz running Linux v5.11.0-25 86 \end{itemize} 87 Representing the two major families of hardware architecture. 88 70 89 \section{Tests} 71 90 The following tests were selected to test the performance of different 72 91 components of the exception system. 73 The should provide a guide as to where the EHM's costs can be found. 74 75 \paragraph{Raise and Handle} 76 The first group of tests involve setting up 77 So there is three layers to the test. The first is set up and a loop, which 78 configures the test and then runs it repeatedly to reduce the impact of 79 start-up and shutdown on the results. 80 Each iteration of the main loop 92 They should provide a guide as to where the EHM's costs are found. 93 94 \paragraph{Stack Traversal} 95 This group measures the cost of traversing the stack, 96 (and in termination, unwinding it). 97 Inside the main loop is a call to a recursive function. 98 This function calls itself F times before raising an exception. 99 F is configurable from the command line, but is usually 100. 100 This builds up many stack frames, and any contents they may have, 101 before the raise. 102 The exception is always handled at the base of the stack. 103 For example the Empty test for \CFA resumption looks like: 104 \begin{cfa} 105 void unwind_empty(unsigned int frames) { 106 if (frames) { 107 unwind_empty(frames - 1); 108 } else { 109 throwResume (empty_exception){&empty_vt}; 110 } 111 } 112 \end{cfa} 113 Other test cases have additional code around the recursive call adding 114 something besides simple stack frames to the stack. 115 Note that both termination and resumption have to traverse over 116 the stack but only termination has to unwind it. 81 117 \begin{itemize}[nosep] 82 \item Empty Function: 118 % \item None: 119 % Reuses the empty test code (see below) except that the number of frames 120 % is set to 0 (this is the only test for which the number of frames is not 121 % 100). This isolates the start-up and shut-down time of a throw. 122 \item Empty: 83 123 The repeating function is empty except for the necessary control code. 124 As other traversal tests add to this, it is the baseline for the group 125 as the cost comes from traversing over and unwinding a stack frame 126 that has no other interactions with the exception system. 84 127 \item Destructor: 85 128 The repeating function creates an object with a destructor before calling 86 129 itself. 130 Comparing this to the empty test gives the time to traverse over and 131 unwind a destructor. 87 132 \item Finally: 88 133 The repeating function calls itself inside a try block with a finally clause 89 134 attached. 135 Comparing this to the empty test gives the time to traverse over and 136 unwind a finally clause. 90 137 \item Other Handler: 91 138 The repeating function calls itself inside a try block with a handler that 92 will not match the raised exception. (But is of the same kind of handler.) 139 does not match the raised exception, but is of the same kind of handler. 140 This means that the EHM has to check each handler, and continue 141 over all of them until it reaches the base of the stack. 142 Comparing this to the empty test gives the time to traverse over and 143 unwind a handler. 93 144 \end{itemize} 94 145 95 146 \paragraph{Cross Try Statement} 96 Th e next group measures the cost of a try statement when no exceptions are97 raised. The test is set-up, then there is a loop to reduce the impact of 98 start-up and shutdown on the results.99 In each iteration, a try statement is executed. Entering and leaving a loop 100 is all the test wants to do.147 This group of tests measures the cost for setting up exception handling, 148 if it is 149 not used (because the exceptional case did not occur). 150 Tests repeatedly cross (enter, execute and leave) a try statement but never 151 perform a raise. 101 152 \begin{itemize}[nosep] 102 153 \item Handler: 103 The try statement has a handler (of the matchingkind).154 The try statement has a handler (of the appropriate kind). 104 155 \item Finally: 105 156 The try statement has a finally clause. … … 107 158 108 159 \paragraph{Conditional Matching} 109 This group of tests checks the cost of conditional matching.160 This group measures the cost of conditional matching. 110 161 Only \CFA implements the language level conditional match, 111 the other languages m ust mimicwith an ``unconditional" match (it still112 checks the exception's type) and conditional re-raise if it was not supposed162 the other languages mimic it with an ``unconditional" match (it still 163 checks the exception's type) and conditional re-raise if it is not supposed 113 164 to handle that exception. 165 166 Here is the pattern shown in \CFA and \Cpp. Java and Python use the same 167 pattern as \Cpp, but with their own syntax. 168 169 \begin{minipage}{0.45\textwidth} 170 \begin{cfa} 171 try { 172 ... 173 } catch (exception_t * e ; 174 should_catch(e)) { 175 ... 176 } 177 \end{cfa} 178 \end{minipage} 179 \begin{minipage}{0.55\textwidth} 180 \begin{lstlisting}[language=C++] 181 try { 182 ... 183 } catch (std::exception & e) { 184 if (!should_catch(e)) throw; 185 ... 186 } 187 \end{lstlisting} 188 \end{minipage} 114 189 \begin{itemize}[nosep] 115 190 \item Match All: … … 118 193 The condition is always false. (Never matches or always re-raises.) 119 194 \end{itemize} 195 196 \paragraph{Resumption Simulation} 197 A slightly altered version of the Empty Traversal test is used when comparing 198 resumption to fix-up routines. 199 The handler, the actual resumption handler or the fix-up routine, 200 always captures a variable at the base of the loop, 201 and receives a reference to a variable at the raise site, either as a 202 field on the exception or an argument to the fix-up routine. 203 % I don't actually know why that is here but not anywhere else. 120 204 121 205 %\section{Cost in Size} … … 130 214 131 215 \section{Results} 132 Each test was run eleven times. The top three and bottom three results were 133 discarded and the remaining five values are averaged. 134 135 In cases where a feature is not supported by a language the test is skipped 136 for that language. Similarly, if a test is does not change between resumption 137 and termination in \CFA, then only one test is written and the result 138 was put into the termination column. 139 140 % Raw Data: 141 % run-algol-a.sat 142 % --------------- 143 % Raise Empty & 82687046678 & 291616256 & 3252824847 & 15422937623 & 14736271114 \\ 144 % Raise D'tor & 219933199603 & 297897792 & 223602799362 & N/A & N/A \\ 145 % Raise Finally & 219703078448 & 298391745 & N/A & ... & 18923060958 \\ 146 % Raise Other & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\ 147 % Cross Handler & 9256648 & 13518430 & 769328 & 3486252 & 31790804 \\ 148 % Cross Finally & 769319 & N/A & N/A & 2272831 & 37491962 \\ 149 % Match All & 3654278402 & 47518560 & 3218907794 & 1296748192 & 624071886 \\ 150 % Match None & 4788861754 & 58418952 & 9458936430 & 1318065020 & 625200906 \\ 151 % 152 % run-algol-thr-c 153 % --------------- 154 % Raise Empty & 3757606400 & 36472972 & 3257803337 & 15439375452 & 14717808642 \\ 155 % Raise D'tor & 64546302019 & 102148375 & 223648121635 & N/A & N/A \\ 156 % Raise Finally & 64671359172 & 103285005 & N/A & 15442729458 & 18927008844 \\ 157 % Raise Other & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\ 158 % Cross Handler & 9646462 & 11955668 & 769328 & 3453707 & 31864074 \\ 159 % Cross Finally & 773412 & N/A & N/A & 2253825 & 37266476 \\ 160 % Match All & 3719462155 & 43294042 & 3223004977 & 1286054154 & 623887874 \\ 161 % Match None & 4971630929 & 55311709 & 9481225467 & 1310251289 & 623752624 \\ 162 \begin{tabular}{|l|c c c c c|} 163 \hline 164 & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ 165 \hline 166 Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 167 Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ 168 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 169 Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 170 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 171 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 172 Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 173 Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 216 % First, introduce the tables. 217 \autoref{t:PerformanceTermination}, 218 \autoref{t:PerformanceResumption} 219 and~\autoref{t:PerformanceFixupRoutines} 220 show the test results. 221 In cases where a feature is not supported by a language, the test is skipped 222 for that language and the result is marked N/A. 223 There are also cases where the feature is supported but measuring its 224 cost is impossible. This happened with Java, which uses a JIT that optimize 225 away the tests and it cannot be stopped.\cite{Dice21} 226 These tests are marked N/C. 227 To get results in a consistent range (1 second to 1 minute is ideal, 228 going higher is better than going low) N, the number of iterations of the 229 main loop in each test, is varied between tests. It is also given in the 230 results and has a value in the millions. 231 232 An anomaly in some results came from \CFA's use of gcc nested functions. 233 These nested functions are used to create closures that can access stack 234 variables in their lexical scope. 235 However, if they do so, then they can cause the benchmark's run-time to 236 increase by an order of magnitude. 237 The simplest solution is to make those values global variables instead 238 of function local variables. 239 % Do we know if editing a global inside nested function is a problem? 240 Tests that had to be modified to avoid this problem have been marked 241 with a ``*'' in the results. 242 243 % Now come the tables themselves: 244 % You might need a wider window for this. 245 246 \begin{table}[htb] 247 \centering 248 \caption{Termination Performance Results (sec)} 249 \label{t:PerformanceTermination} 250 \begin{tabular}{|r|*{2}{|r r r r|}} 251 \hline 252 & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\ 253 \cline{2-9} 254 N\hspace{8pt} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & 255 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 256 \hline 257 Empty Traversal (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8 \\ 258 D'tor Traversal (1M) & 48.4 & 23.6 & N/A & N/A & 64.2 & 29.0 & N/A & N/A \\ 259 Finally Traversal (1M) & 3.4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6 & 19.0 \\ 260 Other Traversal (1M) & 3.6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4 \\ 261 Cross Handler (1B) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2 \\ 262 Cross Finally (1B) & 0.9 & N/A & N/C & 44.1 & 0.8 & N/A & N/C & 37.3 \\ 263 Match All (10M) & 32.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0 & 3.1 \\ 264 Match None (10M) & 32.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2 \\ 174 265 \hline 175 266 \end{tabular} 176 177 % run-plg7a-a.sat 178 % --------------- 179 % Raise Empty & 57169011329 & 296612564 & 2788557155 & 17511466039 & 23324548496 \\ 180 % Raise D'tor & 150599858014 & 318443709 & 149651693682 & N/A & N/A \\ 181 % Raise Finally & 148223145000 & 373325807 & N/A & ... & 29074552998 \\ 182 % Raise Other & 189463708732 & 3017109322 & 85819281694 & 17584295487 & 32602686679 \\ 183 % Cross Handler & 8001654 & 13584858 & 1555995 & 6626775 & 41927358 \\ 184 % Cross Finally & 1002473 & N/A & N/A & 4554344 & 51114381 \\ 185 % Match All & 3162460860 & 37315018 & 2649464591 & 1523205769 & 742374509 \\ 186 % Match None & 4054773797 & 47052659 & 7759229131 & 1555373654 & 744656403 \\ 187 % 188 % run-plg7a-thr-a 189 % --------------- 190 % Raise Empty & 3604235388 & 29829965 & 2786931833 & 17576506385 & 23352975105 \\ 191 % Raise D'tor & 46552380948 & 178709605 & 149834207219 & N/A & N/A \\ 192 % Raise Finally & 46265157775 & 177906320 & N/A & 17493045092 & 29170962959 \\ 193 % Raise Other & 195659245764 & 2376968982 & 86070431924 & 17552979675 & 32501882918 \\ 194 % Cross Handler & 397031776 & 12503552 & 1451225 & 6658628 & 42304965 \\ 195 % Cross Finally & 1136746 & N/A & N/A & 4468799 & 46155817 \\ 196 % Match All & 3189512499 & 39124453 & 2667795989 & 1525889031 & 733785613 \\ 197 % Match None & 4094675477 & 48749857 & 7850618572 & 1566713577 & 733478963 \\ 198 199 % PLG7A (in seconds) 200 \begin{tabular}{|l|c c c c c|} 201 \hline 202 & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ 203 \hline 204 % Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 205 % Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ 206 % Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 207 % Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 208 % Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 209 % Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 210 % Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 211 % Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 212 Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 213 Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ 214 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 215 Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 216 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 217 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 218 Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 219 Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 267 \end{table} 268 269 \begin{table}[htb] 270 \centering 271 \caption{Resumption Performance Results (sec)} 272 \label{t:PerformanceResumption} 273 \begin{tabular}{|r||r||r|} 274 \hline 275 N\hspace{8pt} 276 & AMD & ARM \\ 277 \hline 278 Empty Traversal (10M) & 0.2 & 0.3 \\ 279 D'tor Traversal (10M) & 1.8 & 1.0 \\ 280 Finally Traversal (10M) & 1.7 & 1.0 \\ 281 Other Traversal (10M) & 22.6 & 25.9 \\ 282 Cross Handler (1B) & 8.4 & 11.9 \\ 283 Match All (100M) & 2.3 & 3.2 \\ 284 Match None (100M) & 2.9 & 3.9 \\ 220 285 \hline 221 286 \end{tabular} 222 223 One result that is not directly related to \CFA but is important to keep in 224 mind is that in exceptions the standard intuitions about which languages 225 should go faster often do not hold. There are cases where Python out-preforms 226 \Cpp and Java. The most likely explination is that, since exceptions are 227 rarely considered to be the common case, the more optimized langages have 228 optimized at their expence. In addition languages with high level 229 repersentations have a much easier time scanning the stack as there is less 230 to decode. 231 232 This means that while \CFA does not actually keep up with Python in every 233 case it is no worse than roughly half the speed of \Cpp. This is good 234 enough for the prototyping purposes of the project. 235 236 One difference not shown is that optimizations in \CFA is very fragile. 237 The \CFA compiler uses gcc as part of its complation process and the version 238 of gcc could change the speed of some of the benchmarks by 10 times or more. 239 Similar changes to g++ for the \Cpp benchmarks had no significant changes. 240 Because of the connection between gcc and g++; this suggests it is not the 241 optimizations that are changing but how the optimizer is detecting if the 242 optimizations can be applied. So the optimizations are always applied in 243 g++, but only newer versions of gcc can detect that they can be applied in 244 the more complex \CFA code. 245 246 Resumption exception handling is also incredibly fast. Often an order of 247 magnitude or two better than the best termination speed. 248 There is a simple explination for this; traversing a linked list is much 249 faster than examining and unwinding the stack. When resumption does not do as 250 well its when more try statements are used per raise. Updating the interal 251 linked list is not very expencive but it does add up. 252 253 The relative speed of the Match All and Match None tests (within each 254 language) can also show the effectiveness conditional matching as compared 255 to catch and rethrow. 256 \begin{itemize}[nosep] 257 \item 258 Java and Python get similar values in both tests. 259 Between the interperated code, a higher level repersentation of the call 260 stack and exception reuse it it is possible the cost for a second 261 throw can be folded into the first. 262 % Is this due to optimization? 263 \item 264 Both types of \CFA are slighly slower if there is not a match. 265 For termination this likely comes from unwinding a bit more stack through 266 libunwind instead of executing the code normally. 267 For resumption there is extra work in traversing more of the list and running 268 more checks for a matching exceptions. 269 % Resumption is a bit high for that but this is my best theory. 270 \item 271 Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs. 272 just the catch. This is very high, but it does have to repeat the same 273 process of unwinding the stack and may have to parse the LSDA of the function 274 with the catch and rethrow twice, once before the catch and once after the 275 rethrow. 276 % I spent a long time thinking of what could push it over twice, this is all 277 % I have to explain it. 278 \end{itemize} 279 The difference in relative performance does show that there are savings to 280 be made by performing the check without catching the exception. 287 \end{table} 288 289 \begin{table}[htb] 290 \centering 291 \small 292 \caption{Resumption/Fixup Routine Comparison (sec)} 293 \label{t:PerformanceFixupRoutines} 294 \setlength{\tabcolsep}{5pt} 295 \begin{tabular}{|r|*{2}{|r r r r r|}} 296 \hline 297 & \multicolumn{5}{c||}{AMD} & \multicolumn{5}{c|}{ARM} \\ 298 \cline{2-11} 299 N\hspace{8pt} & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & 300 \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 301 \hline 302 Resume Empty (10M) & 1.5 & 1.5 & 14.7 & 2.3 & 176.1 & 1.0 & 1.4 & 8.9 & 1.2 & 119.9 \\ 303 \hline 304 \end{tabular} 305 \end{table} 306 307 % Now discuss the results in the tables. 308 One result not directly related to \CFA but important to keep in mind is that, 309 for exceptions, the standard intuition about which languages should go 310 faster often does not hold. 311 For example, there are a few cases where Python out-performs 312 \CFA, \Cpp and Java. 313 % To be exact, the Match All and Match None cases. 314 The most likely explanation is that, since exceptions 315 are rarely considered to be the common case, the more optimized languages 316 make that case expensive to improve other cases. 317 In addition, languages with high-level representations have a much 318 easier time scanning the stack as there is less to decode. 319 320 As stated, 321 the performance tests are not attempting to show \CFA has a new competitive 322 way of implementing exception handling. 323 The only performance requirement is to insure the \CFA EHM has reasonable 324 performance for prototyping. 325 Although that may be hard to exactly quantify, I believe it has succeeded 326 in that regard. 327 Details on the different test cases follow. 328 329 \subsection{Termination \texorpdfstring{(\autoref{t:PerformanceTermination})}{}} 330 331 \begin{description} 332 \item[Empty Traversal] 333 \CFA is slower than \Cpp, but is still faster than the other languages 334 and closer to \Cpp than other languages. 335 This result is to be expected, 336 as \CFA is closer to \Cpp than the other languages. 337 338 \item[D'tor Traversal] 339 Running destructors causes a huge slowdown in the two languages that support 340 them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's. 341 Considering the amount of work done in destructors is effectively zero 342 (an assembly comment), the cost 343 must come from the change of context required to run the destructor. 344 345 \item[Finally Traversal] 346 Performance is similar to Empty Traversal in all languages that support finally 347 clauses. Only Python seems to have a larger than random noise change in 348 its run-time and it is still not large. 349 Despite the similarity between finally clauses and destructors, 350 finally clauses seem to avoid the spike that run-time destructors have. 351 Possibly some optimization removes the cost of changing contexts. 352 353 \item[Other Traversal] 354 For \Cpp, stopping to check if a handler applies seems to be about as 355 expensive as stopping to run a destructor. 356 This results in a significant jump. 357 358 Other languages experience a small increase in run-time. 359 The small increase likely comes from running the checks, 360 but they could avoid the spike by not having the same kind of overhead for 361 switching to the check's context. 362 363 \item[Cross Handler] 364 Here \CFA falls behind \Cpp by a much more significant margin. 365 This is likely due to the fact \CFA has to insert two extra function 366 calls, while \Cpp does not have to do execute any other instructions. 367 Python is much further behind. 368 369 \item[Cross Finally] 370 \CFA's performance now matches \Cpp's from Cross Handler. 371 If the code from the finally clause is being inlined, 372 which is just an asm comment, than there are no additional instructions 373 to execute again when exiting the try statement normally. 374 375 \item[Conditional Match] 376 Both of the conditional matching tests can be considered on their own. 377 However for evaluating the value of conditional matching itself, the 378 comparison of the two sets of results is useful. 379 Consider the massive jump in run-time for \Cpp going from match all to match 380 none, which none of the other languages have. 381 Some strange interaction is causing run-time to more than double for doing 382 twice as many raises. 383 Java and Python avoid this problem and have similar run-time for both tests, 384 possibly through resource reuse or their program representation. 385 However \CFA is built like \Cpp and avoids the problem as well, this matches 386 the pattern of the conditional match, which makes the two execution paths 387 very similar. 388 389 \end{description} 390 391 \subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}} 392 393 Moving on to resumption, there is one general note, 394 resumption is \textit{fast}. The only test where it fell 395 behind termination is Cross Handler. 396 In every other case, the number of iterations had to be increased by a 397 factor of 10 to get the run-time in an appropriate range 398 and in some cases resumption still took less time. 399 400 % I tried \paragraph and \subparagraph, maybe if I could adjust spacing 401 % between paragraphs those would work. 402 \begin{description} 403 \item[Empty Traversal] 404 See above for the general speed-up notes. 405 This result is not surprising as resumption's linked-list approach 406 means that traversing over stack frames without a resumption handler is 407 $O(1)$. 408 409 \item[D'tor Traversal] 410 Resumption does have the same spike in run-time that termination has. 411 The run-time is actually very similar to Finally Traversal. 412 As resumption does not unwind the stack, both destructors and finally 413 clauses are run while walking down the stack during the recursive returns. 414 So it follows their performance is similar. 415 416 \item[Finally Traversal] 417 Same as D'tor Traversal, 418 except termination did not have a spike in run-time on this test case. 419 420 \item[Other Traversal] 421 Traversing across handlers reduces resumption's advantage as it actually 422 has to stop and check each one. 423 Resumption still came out ahead (adjusting for iterations) but by much less 424 than the other cases. 425 426 \item[Cross Handler] 427 The only test case where resumption could not keep up with termination, 428 although the difference is not as significant as many other cases. 429 It is simply a matter of where the costs come from, 430 both termination and resumption have some work to set-up or tear-down a 431 handler. It just so happens that resumption's work is slightly slower. 432 433 \item[Conditional Match] 434 Resumption shows a slight slowdown if the exception is not matched 435 by the first handler, which follows from the fact the second handler now has 436 to be checked. However the difference is not large. 437 438 \end{description} 439 440 \subsection{Resumption/Fixup \texorpdfstring{(\autoref{t:PerformanceFixupRoutines})}{}} 441 442 Finally are the results of the resumption/fixup routine comparison. 443 These results are surprisingly varied. It is possible that creating a closure 444 has more to do with performance than passing the argument through layers of 445 calls. 446 At 100 stack frames, resumption and manual fixup routines have similar 447 performance in \CFA. 448 More experiments could try to tease out the exact trade-offs, 449 but the prototype's only performance goal is to be reasonable. 450 It has already in that range, and \CFA's fixup routine simulation is 451 one of the faster simulations as well. 452 Plus exceptions add features and remove syntactic overhead, 453 so even at similar performance resumptions have advantages 454 over fixup routines.
Note:
See TracChangeset
for help on using the changeset viewer.