Changeset c9f9d4f for doc/theses
- Timestamp:
- Aug 12, 2021, 10:12:54 PM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children:
- 3b8acfb, 6cebfef, c99a0d1
- Parents:
- 93d0ed3
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/performance.tex
r93d0ed3 rc9f9d4f 2 2 \label{c:performance} 3 3 4 Performance has beenof secondary importance for most of this project.5 Instead, the focus has beento get the features working. The only performance6 requirement sis to ensure the tests for correctness run in a reasonable4 Performance is of secondary importance for most of this project. 5 Instead, the focus is to get the features working. The only performance 6 requirement is to ensure the tests for correctness run in a reasonable 7 7 amount of time. 8 8 9 9 \section{Test Set-Up} 10 Tests w ill be run in \CFA, C++, Java and Python.10 Tests were run in \CFA, C++, Java and Python. 11 11 In addition there are two sets of tests for \CFA, 12 one for termination exceptions and once withresumption exceptions.12 one for termination and one for resumption exceptions. 13 13 14 14 C++ is the most comparable language because both it and \CFA use the same 15 15 framework, libunwind. 16 16 In fact, the comparison is almost entirely a quality of implementation 17 comparison .\CFA's EHM has had significantly less time to be optimized and17 comparison: \CFA's EHM has had significantly less time to be optimized and 18 18 does not generate its own assembly. It does have a slight advantage in that 19 there are some features it does not handle,through utility functions,19 there are some features it handles directly instead of through utility functions, 20 20 but otherwise \Cpp has a significant advantage. 21 21 … … 23 23 It is implemented in a very different environment, a virtual machine with 24 24 garbage collection. 25 It also implements the finally clause on tryblocks allowing for a direct25 It also implements the @finally@ clause on @try@ blocks allowing for a direct 26 26 feature-to-feature comparison. 27 As with \Cpp, Java's implementation is m ore mature, has moreoptimizations28 and moreextra features.29 30 Python was used as apoint of comparison because of the \CFA EHM's31 current performance goals, which is not be prohibitively slow while the27 As with \Cpp, Java's implementation is mature, optimizations 28 and has extra features. 29 30 Python is used as an alternative point of comparison because of the \CFA EHM's 31 current performance goals, which is not to be prohibitively slow while the 32 32 features are designed and examined. Python has similar performance goals for 33 33 creating quick scripts and its wide use suggests it has achieved those goals. 34 34 35 Unfortunately there are no notable modern programming languages with36 resumption exceptions. Even the older programming languages with resumption s37 seem to be notable only for having resumption s.38 So instead resumptions arecompared to a less similar but much more familiar35 Unfortunately, there are no notable modern programming languages with 36 resumption exceptions. Even the older programming languages with resumption 37 seem to be notable only for having resumption. 38 So instead, resumption is compared to a less similar but much more familiar 39 39 feature, termination exceptions. 40 40 41 All tests are run inside a main loop which will perform the test42 repeatedly. This is toavoids start-up or tear-down time from41 All tests are run inside a main loop that repeatedly performs a test. 42 This approach avoids start-up or tear-down time from 43 43 affecting the timing results. 44 Tests ran their main loopa million times.45 The Java versions of the test alsorun this loop an extra 1000 times before46 beginning to time the resultsto ``warm-up" the JVM.44 Each test is run a million times. 45 The Java versions of the test run this loop an extra 1000 times before 46 beginning to actual test to ``warm-up" the JVM. 47 47 48 48 Timing is done internally, with time measured immediately before and 49 immediately after the test loop. The difference is calculated and printed. 50 49 after the test loop. The difference is calculated and printed. 51 50 The loop structure and internal timing means it is impossible to test 52 51 unhandled exceptions in \Cpp and Java as that would cause the process to … … 55 54 critical. 56 55 57 The exceptions used in these tests will always be a exceptionbased off of58 thebase exception. This requirement minimizes performance differences based59 on the object model used to rep ersent the exception.60 61 All tests were designed to be as minimal as possiblewhile still preventing62 ex essive optimizations.56 The exceptions used in these tests are always based off of 57 a base exception. This requirement minimizes performance differences based 58 on the object model used to represent the exception. 59 60 All tests are designed to be as minimal as possible, while still preventing 61 excessive optimizations. 63 62 For example, empty inline assembly blocks are used in \CFA and \Cpp to 64 63 prevent excessive optimizations while adding no actual work. 64 Each test was run eleven times. The top three and bottom three results were 65 discarded and the remaining five values are averaged. 66 67 The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is 68 compiled with 11.0.11. Python with 3.8. The tests were run on: 69 \begin{itemize}[nosep] 70 \item 71 ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25 72 \item 73 AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25 74 \end{itemize} 65 75 66 76 % We don't use catch-alls but if we did: … … 71 81 The following tests were selected to test the performance of different 72 82 components of the exception system. 73 The should provide a guide as to where the EHM's costs can be found.83 They should provide a guide as to where the EHM's costs are found. 74 84 75 85 \paragraph{Raise and Handle} 76 The first group of tests involve setting up 77 So there is three layers to the test. The first is set up and a loop, which 78 configures the test and then runs it repeatedly to reduce the impact of 79 start-up and shutdown on the results. 80 Each iteration of the main loop 81 \begin{itemize}[nosep] 82 \item Empty Function: 83 The repeating function is empty except for the necessary control code. 86 The first group measures the cost of a try statement when exceptions are raised 87 and \emph{the stack is unwound}. Each test has has a repeating function like 88 the following 89 \begin{cfa} 90 void unwind_empty(unsigned int frames) { 91 if (frames) { 92 unwind_empty(frames - 1); 93 } else throw (empty_exception){&empty_vt}; 94 } 95 \end{cfa} 96 which is called M times, where each call recurses to a depth of N, an 97 exception is raised, the stack is a unwound, and the exception caught. 98 \begin{itemize}[nosep] 99 \item Empty: 100 This test measures the cost of raising (stack walking) an exception through empty 101 empty stack frames to an empty handler. (see above) 84 102 \item Destructor: 85 The repeating function creates an object with a destructor before calling 86 itself. 103 104 This test measures the cost of raising an exception through non-empty frames 105 where each frame has an object requiring destruction, to an empty 106 handler. Hence, there are N destructor calls during unwinding. 107 \begin{cfa} 108 if (frames) { 109 WithDestructor object; 110 unwind_empty(frames - 1); 111 \end{cfa} 87 112 \item Finally: 88 The repeating function calls itself inside a try block with a finally clause 89 attached. 113 This test measures the cost of establishing a try block with an empty finally 114 clause on the front side of the recursion and running the empty finally clause 115 on the back side of the recursion during stack unwinding. 116 \begin{cfa} 117 if (frames) { 118 try { 119 unwind_finally(frames - 1); 120 } finally {} 121 \end{cfa} 90 122 \item Other Handler: 91 The repeating function calls itself inside a try block with a handler that 92 will not match the raised exception. (But is of the same kind of handler.) 123 This test is like the finally test but the try block has a catch clause for an 124 exception that is not raised, so catch matching is executed during stack 125 unwinding but the match never successes until the catch at the bottom of the 126 stack. 127 \begin{cfa} 128 if (frames) { 129 try { 130 unwind_other(frames - 1); 131 } catch (not_raised_exception *) {} 132 \end{cfa} 93 133 \end{itemize} 94 134 95 135 \paragraph{Cross Try Statement} 96 The next group measures the cost of a try statement when no exceptions are 97 raised. The test is set-up, then there is a loop to reduce the impact of 98 start-up and shutdown on the results. 99 In each iteration, a try statement is executed. Entering and leaving a loop 100 is all the test wants to do. 136 The next group measures just the cost of executing a try statement so 137 \emph{there is no stack unwinding}. Hence, the program main loops N times 138 around: 139 \begin{cfa} 140 try { 141 } catch (not_raised_exception *) {} 142 \end{cfa} 101 143 \begin{itemize}[nosep] 102 144 \item Handler: 103 The try statement has a handler (of the matching kind).145 The try statement has a handler. 104 146 \item Finally: 105 The try statement hasa finally clause.147 The try statement replaces the handler with a finally clause. 106 148 \end{itemize} 107 149 108 150 \paragraph{Conditional Matching} 109 This group of tests checks the cost of conditional matching.151 This final group measures the cost of conditional matching. 110 152 Only \CFA implements the language level conditional match, 111 153 the other languages must mimic with an ``unconditional" match (it still 112 154 checks the exception's type) and conditional re-raise if it was not supposed 113 155 to handle that exception. 156 \begin{center} 157 \begin{tabular}{ll} 158 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\ 159 \begin{cfa} 160 try { 161 throw_exception(); 162 } catch (empty_exception * exc; 163 should_catch) { 164 } 165 \end{cfa} 166 & 167 \begin{cfa} 168 try { 169 throw_exception(); 170 } catch (EmptyException & exc) { 171 if (!should_catch) throw; 172 } 173 \end{cfa} 174 \end{tabular} 175 \end{center} 114 176 \begin{itemize}[nosep] 115 177 \item Match All: … … 130 192 131 193 \section{Results} 132 Each test was run eleven times. The top three and bottom three results were133 discarded and the remaining five values are averaged.134 135 194 In cases where a feature is not supported by a language the test is skipped 136 for that language. Similarly, if a test is does not change between resumption 195 for that language. 196 \PAB{Report all values. 197 198 Similarly, if a test does not change between resumption 137 199 and termination in \CFA, then only one test is written and the result 138 200 was put into the termination column. 201 } 139 202 140 203 % Raw Data: … … 237 300 \end{tabular} 238 301 239 One result that is not directly related to \CFA but isimportant to keep in240 mind is that in exceptions the standard intuitionsabout which languages241 should go faster often do not hold. There are cases where Python out-preforms242 \Cpp and Java. The most likely expl ination is that, since exceptions are243 rarely considered to be the common case, the more optimized lang ages have244 optimized at their expence. In addition languages with high level 245 rep ersentations have a much easier time scanning the stack as there is less302 One result not directly related to \CFA but important to keep in 303 mind is that, for exceptions, the standard intuition about which languages 304 should go faster often does not hold. For example, there are cases where Python out-performs 305 \Cpp and Java. The most likely explanation is that, since exceptions are 306 rarely considered to be the common case, the more optimized languages 307 make that case expense. In addition, languages with high-level 308 representations have a much easier time scanning the stack as there is less 246 309 to decode. 247 310 248 This means that while \CFA does not actually keep up with Python in every249 case it is usually no worse than roughly half the speed of \Cpp. Thisis good311 This observation means that while \CFA does not actually keep up with Python in every 312 case, it is usually no worse than roughly half the speed of \Cpp. This performance is good 250 313 enough for the prototyping purposes of the project. 251 314 252 315 The test case where \CFA falls short is Raise Other, the case where the 253 316 stack is unwound including a bunch of non-matching handlers. 254 This slowdown seems to come from missing optimizations, 255 the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp) 256 but the results change if they are run in gcc/g++ 9 instead. 317 This slowdown seems to come from missing optimizations. 318 257 319 Importantly, there is a huge slowdown in \Cpp's results bringing that brings 258 \CFA's performa ce back in that roughly half speed area. However many other320 \CFA's performance back in that roughly half speed area. However many other 259 321 \CFA benchmarks increase their run-time by a similar amount falling far 260 322 behind their \Cpp counter-parts. … … 269 331 Resumption exception handling is also incredibly fast. Often an order of 270 332 magnitude or two better than the best termination speed. 271 There is a simple expl ination for this; traversing a linked list is much333 There is a simple explanation for this; traversing a linked list is much 272 334 faster than examining and unwinding the stack. When resumption does not do as 273 well its when more try statements are used per raise. Updating the inter al274 linked list is not very expen cive but it does add up.335 well its when more try statements are used per raise. Updating the internal 336 linked list is not very expensive but it does add up. 275 337 276 338 The relative speed of the Match All and Match None tests (within each … … 280 342 \item 281 343 Java and Python get similar values in both tests. 282 Between the interp erated code, a higher level repersentation of the call344 Between the interpreted code, a higher level representation of the call 283 345 stack and exception reuse it it is possible the cost for a second 284 346 throw can be folded into the first. 285 347 % Is this due to optimization? 286 348 \item 287 Both types of \CFA are sligh ly slower if there is not a match.349 Both types of \CFA are slightly slower if there is not a match. 288 350 For termination this likely comes from unwinding a bit more stack through 289 351 libunwind instead of executing the code normally.
Note: See TracChangeset
for help on using the changeset viewer.