- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
rc9f9d4f r5438e41 2 2 \label{c:performance} 3 3 4 Performance isof secondary importance for most of this project.5 Instead, the focus isto get the features working. The only performance6 requirement is to ensure the tests for correctness run in a reasonable4 Performance has been of secondary importance for most of this project. 5 Instead, the focus has been to get the features working. The only performance 6 requirements is to ensure the tests for correctness run in a reasonable 7 7 amount of time. 8 8 9 9 \section{Test Set-Up} 10 Tests w ere run in \CFA, C++, Java and Python.10 Tests will be run in \CFA, C++, Java and Python. 11 11 In addition there are two sets of tests for \CFA, 12 one for termination and one forresumption exceptions.12 one for termination exceptions and once with resumption exceptions. 13 13 14 14 C++ is the most comparable language because both it and \CFA use the same 15 15 framework, libunwind. 16 16 In fact, the comparison is almost entirely a quality of implementation 17 comparison :\CFA's EHM has had significantly less time to be optimized and17 comparison. \CFA's EHM has had significantly less time to be optimized and 18 18 does not generate its own assembly. It does have a slight advantage in that 19 there are some features it handles directly instead ofthrough utility functions,19 there are some features it does not handle, through utility functions, 20 20 but otherwise \Cpp has a significant advantage. 21 21 … … 23 23 It is implemented in a very different environment, a virtual machine with 24 24 garbage collection. 25 It also implements the @finally@ clause on @try@blocks allowing for a direct25 It also implements the finally clause on try blocks allowing for a direct 26 26 feature-to-feature comparison. 27 As with \Cpp, Java's implementation is m ature,optimizations28 and hasextra features.29 30 Python is used as an alternativepoint of comparison because of the \CFA EHM's31 current performance goals, which is not tobe prohibitively slow while the27 As with \Cpp, Java's implementation is more mature, has more optimizations 28 and more extra features. 29 30 Python was used as a point of comparison because of the \CFA EHM's 31 current performance goals, which is not be prohibitively slow while the 32 32 features are designed and examined. Python has similar performance goals for 33 33 creating quick scripts and its wide use suggests it has achieved those goals. 34 34 35 Unfortunately ,there are no notable modern programming languages with36 resumption exceptions. Even the older programming languages with resumption 37 seem to be notable only for having resumption .38 So instead , resumption iscompared to a less similar but much more familiar35 Unfortunately there are no notable modern programming languages with 36 resumption exceptions. Even the older programming languages with resumptions 37 seem to be notable only for having resumptions. 38 So instead resumptions are compared to a less similar but much more familiar 39 39 feature, termination exceptions. 40 40 41 All tests are run inside a main loop that repeatedly performs a test.42 This approachavoids start-up or tear-down time from41 All tests are run inside a main loop which will perform the test 42 repeatedly. This is to avoids start-up or tear-down time from 43 43 affecting the timing results. 44 Each test is runa million times.45 The Java versions of the test run this loop an extra 1000 times before46 beginning to actual testto ``warm-up" the JVM.44 Tests ran their main loop a million times. 45 The Java versions of the test also run this loop an extra 1000 times before 46 beginning to time the results to ``warm-up" the JVM. 47 47 48 48 Timing is done internally, with time measured immediately before and 49 after the test loop. The difference is calculated and printed. 49 immediately after the test loop. The difference is calculated and printed. 50 50 51 The loop structure and internal timing means it is impossible to test 51 52 unhandled exceptions in \Cpp and Java as that would cause the process to … … 54 55 critical. 55 56 56 The exceptions used in these tests are alwaysbased off of57 abase exception. This requirement minimizes performance differences based58 on the object model used to rep resent the exception.59 60 All tests are designed to be as minimal as possible,while still preventing61 ex cessive optimizations.57 The exceptions used in these tests will always be a exception based off of 58 the base exception. This requirement minimizes performance differences based 59 on the object model used to repersent the exception. 60 61 All tests were designed to be as minimal as possible while still preventing 62 exessive optimizations. 62 63 For example, empty inline assembly blocks are used in \CFA and \Cpp to 63 64 prevent excessive optimizations while adding no actual work. 64 Each test was run eleven times. The top three and bottom three results were65 discarded and the remaining five values are averaged.66 67 The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is68 compiled with 11.0.11. Python with 3.8. The tests were run on:69 \begin{itemize}[nosep]70 \item71 ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-2572 \item73 AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-2574 \end{itemize}75 65 76 66 % We don't use catch-alls but if we did: … … 81 71 The following tests were selected to test the performance of different 82 72 components of the exception system. 83 The y should provide a guide as to where the EHM's costs are found.73 The should provide a guide as to where the EHM's costs can be found. 84 74 85 75 \paragraph{Raise and Handle} 86 The first group measures the cost of a try statement when exceptions are raised 87 and \emph{the stack is unwound}. Each test has has a repeating function like 88 the following 89 \begin{cfa} 90 void unwind_empty(unsigned int frames) { 91 if (frames) { 92 unwind_empty(frames - 1); 93 } else throw (empty_exception){&empty_vt}; 94 } 95 \end{cfa} 96 which is called M times, where each call recurses to a depth of N, an 97 exception is raised, the stack is a unwound, and the exception caught. 76 The first group of tests involve setting up 77 So there is three layers to the test. The first is set up and a loop, which 78 configures the test and then runs it repeatedly to reduce the impact of 79 start-up and shutdown on the results. 80 Each iteration of the main loop 98 81 \begin{itemize}[nosep] 99 \item Empty: 100 This test measures the cost of raising (stack walking) an exception through empty 101 empty stack frames to an empty handler. (see above) 82 \item Empty Function: 83 The repeating function is empty except for the necessary control code. 102 84 \item Destructor: 103 104 This test measures the cost of raising an exception through non-empty frames 105 where each frame has an object requiring destruction, to an empty 106 handler. Hence, there are N destructor calls during unwinding. 107 \begin{cfa} 108 if (frames) { 109 WithDestructor object; 110 unwind_empty(frames - 1); 111 \end{cfa} 85 The repeating function creates an object with a destructor before calling 86 itself. 112 87 \item Finally: 113 This test measures the cost of establishing a try block with an empty finally 114 clause on the front side of the recursion and running the empty finally clause 115 on the back side of the recursion during stack unwinding. 116 \begin{cfa} 117 if (frames) { 118 try { 119 unwind_finally(frames - 1); 120 } finally {} 121 \end{cfa} 88 The repeating function calls itself inside a try block with a finally clause 89 attached. 122 90 \item Other Handler: 123 This test is like the finally test but the try block has a catch clause for an 124 exception that is not raised, so catch matching is executed during stack 125 unwinding but the match never successes until the catch at the bottom of the 126 stack. 127 \begin{cfa} 128 if (frames) { 129 try { 130 unwind_other(frames - 1); 131 } catch (not_raised_exception *) {} 132 \end{cfa} 91 The repeating function calls itself inside a try block with a handler that 92 will not match the raised exception. (But is of the same kind of handler.) 133 93 \end{itemize} 134 94 135 95 \paragraph{Cross Try Statement} 136 The next group measures just the cost of executing a try statement so 137 \emph{there is no stack unwinding}. Hence, the program main loops N times 138 around: 139 \begin{cfa} 140 try { 141 } catch (not_raised_exception *) {} 142 \end{cfa} 96 The next group measures the cost of a try statement when no exceptions are 97 raised. The test is set-up, then there is a loop to reduce the impact of 98 start-up and shutdown on the results. 99 In each iteration, a try statement is executed. Entering and leaving a loop 100 is all the test wants to do. 143 101 \begin{itemize}[nosep] 144 102 \item Handler: 145 The try statement has a handler .103 The try statement has a handler (of the matching kind). 146 104 \item Finally: 147 The try statement replaces the handler witha finally clause.105 The try statement has a finally clause. 148 106 \end{itemize} 149 107 150 108 \paragraph{Conditional Matching} 151 This final group measures the cost of conditional matching.109 This group of tests checks the cost of conditional matching. 152 110 Only \CFA implements the language level conditional match, 153 111 the other languages must mimic with an ``unconditional" match (it still 154 112 checks the exception's type) and conditional re-raise if it was not supposed 155 113 to handle that exception. 156 \begin{center}157 \begin{tabular}{ll}158 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\159 \begin{cfa}160 try {161 throw_exception();162 } catch (empty_exception * exc;163 should_catch) {164 }165 \end{cfa}166 &167 \begin{cfa}168 try {169 throw_exception();170 } catch (EmptyException & exc) {171 if (!should_catch) throw;172 }173 \end{cfa}174 \end{tabular}175 \end{center}176 114 \begin{itemize}[nosep] 177 115 \item Match All: … … 192 130 193 131 \section{Results} 132 Each test was run eleven times. The top three and bottom three results were 133 discarded and the remaining five values are averaged. 134 194 135 In cases where a feature is not supported by a language the test is skipped 195 for that language. 196 \PAB{Report all values. 197 198 Similarly, if a test does not change between resumption 136 for that language. Similarly, if a test is does not change between resumption 199 137 and termination in \CFA, then only one test is written and the result 200 138 was put into the termination column. 201 }202 139 203 140 % Raw Data: … … 300 237 \end{tabular} 301 238 302 One result not directly related to \CFA butimportant to keep in303 mind is that , for exceptions, the standard intuitionabout which languages304 should go faster often do es not hold. For example, there are cases where Python out-performs305 \Cpp and Java. The most likely expl anation is that, since exceptions are306 rarely considered to be the common case, the more optimized lang uages307 make that case expense. In addition, languages with high-level 308 rep resentations have a much easier time scanning the stack as there is less239 One result that is not directly related to \CFA but is important to keep in 240 mind is that in exceptions the standard intuitions about which languages 241 should go faster often do not hold. There are cases where Python out-preforms 242 \Cpp and Java. The most likely explination is that, since exceptions are 243 rarely considered to be the common case, the more optimized langages have 244 optimized at their expence. In addition languages with high level 245 repersentations have a much easier time scanning the stack as there is less 309 246 to decode. 310 247 311 This observationmeans that while \CFA does not actually keep up with Python in every312 case , it is usually no worse than roughly half the speed of \Cpp. This performanceis good248 This means that while \CFA does not actually keep up with Python in every 249 case it is usually no worse than roughly half the speed of \Cpp. This is good 313 250 enough for the prototyping purposes of the project. 314 251 315 252 The test case where \CFA falls short is Raise Other, the case where the 316 253 stack is unwound including a bunch of non-matching handlers. 317 This slowdown seems to come from missing optimizations. 318 254 This slowdown seems to come from missing optimizations, 255 the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp) 256 but the results change if they are run in gcc/g++ 9 instead. 319 257 Importantly, there is a huge slowdown in \Cpp's results bringing that brings 320 \CFA's performa nce back in that roughly half speed area. However many other258 \CFA's performace back in that roughly half speed area. However many other 321 259 \CFA benchmarks increase their run-time by a similar amount falling far 322 260 behind their \Cpp counter-parts. … … 331 269 Resumption exception handling is also incredibly fast. Often an order of 332 270 magnitude or two better than the best termination speed. 333 There is a simple expl anation for this; traversing a linked list is much271 There is a simple explination for this; traversing a linked list is much 334 272 faster than examining and unwinding the stack. When resumption does not do as 335 well its when more try statements are used per raise. Updating the inter nal336 linked list is not very expen sive but it does add up.273 well its when more try statements are used per raise. Updating the interal 274 linked list is not very expencive but it does add up. 337 275 338 276 The relative speed of the Match All and Match None tests (within each … … 342 280 \item 343 281 Java and Python get similar values in both tests. 344 Between the interp reted code, a higher level representation of the call282 Between the interperated code, a higher level repersentation of the call 345 283 stack and exception reuse it it is possible the cost for a second 346 284 throw can be folded into the first. 347 285 % Is this due to optimization? 348 286 \item 349 Both types of \CFA are sligh tly slower if there is not a match.287 Both types of \CFA are slighly slower if there is not a match. 350 288 For termination this likely comes from unwinding a bit more stack through 351 289 libunwind instead of executing the code normally.
Note: See TracChangeset
for help on using the changeset viewer.