Changeset 262deda0 for doc/theses/andrew_beach_MMath
- Timestamp:
- Aug 19, 2021, 1:59:53 PM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
- Children:
- fe8aa21
- Parents:
- f79ee0d
- Location:
- doc/theses/andrew_beach_MMath
- Files:
- 2 edited
doc/theses/andrew_beach_MMath/performance.tex
rf79ee0d r262deda0
  3   3
  4   4   Performance is of secondary importance for most of this project.
  5     - Instead, the focus is to get the features working. The only performance
      5 + Instead, the focus was to get the features working. The only performance
  6   6   requirement is to ensure the tests for correctness run in a reasonable
  7     - amount of time.
      7 + amount of time. Hence, a few basic performance tests were performed to
      8 + check this requirement.
  8   9
  9  10   \section{Test Set-Up}
… …
 14  15   C++ is the most comparable language because both it and \CFA use the same
 15  16   framework, libunwind.
 16     - In fact, the comparison is almost entirely a quality of implementation
 17     - comparison: \CFA's EHM has had significantly less time to be optimized and
     17 + In fact, the comparison is almost entirely a quality of implementation.
     18 + Specifically, \CFA's EHM has had significantly less time to be optimized and
 18  19   does not generate its own assembly. It does have a slight advantage in that
 19  20   there are some features it handles directly instead of through utility functions,
 20     - but otherwise \Cpp has a significant advantage.
 21     -
 22     - Java is another very popular language with similar termination semantics.
 23     - It is implemented in a very different environment, a virtual machine with
     21 + but otherwise \Cpp should have a significant advantage.
     22 +
     23 + Java is a popular language with similar termination semantics, but
     24 + it is implemented in a very different environment, a virtual machine with
 24  25   garbage collection.
 25  26   It also implements the @finally@ clause on @try@ blocks allowing for a direct
 26  27   feature-to-feature comparison.
 27     - As with \Cpp, Java's implementation is mature, optimizations
     28 + As with \Cpp, Java's implementation is mature, optimized
 28  29   and has extra features.
 29  30
 30     - Python is used as an alternative point of comparison because of the \CFA EHM's
     31 + Python is used as an alternative comparison because of the \CFA EHM's
 31  32   current performance goals, which is not to be prohibitively slow while the
 32  33   features are designed and examined. Python has similar performance goals for
… …
 36  37   resumption exceptions. Even the older programming languages with resumption
 37  38   seem to be notable only for having resumption.
 38     - So instead, resumption is compared to a less similar but much more familiar
 39     - feature, termination exceptions.
     39 + So instead, resumption is compared to its simulation in other programming
     40 + languages using fixup functions that are explicitly passed for correction or
     41 + logging purposes.
     42 + % So instead, resumption is compared to a less similar but much more familiar
     43 + %feature, termination exceptions.
 40  44
 41  45   All tests are run inside a main loop that repeatedly performs a test.
 42  46   This approach avoids start-up or tear-down time from
 43  47   affecting the timing results.
 44     - Each test is run a million times.
 45     - The Java versions of the test run this loop an extra 1000 times before
 46     - beginning to actual test to ``warm-up" the JVM.
     48 + Each test is run a N times (configurable from the command line).
     49 + The Java tests runs the main loop 1000 times before
     50 + beginning the actual test to ``warm-up" the JVM.
 47  51
 48  52   Timing is done internally, with time measured immediately before and
… …
 66  70
 67  71   The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is
 68     - compiled with 11.0.11. Python with 3.8. The tests were run on:
     72 + compiled with version 11.0.11. Python with version 3.8. The tests were run on:
 69  73   \begin{itemize}[nosep]
 70  74   \item
… …
 73  77   AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25
 74  78   \end{itemize}
     79 + Two kinds of hardware architecture allows discriminating any implementation and
     80 + architectural effects.
     81 +
 75  82
 76  83   % We don't use catch-alls but if we did:
… …
 84  91
 85  92   \paragraph{Raise and Handle}
 86     - The first group measures the cost of a try statement when exceptions are raised
 87     - and \emph{the stack is unwound}. Each test has has a repeating function like
 88     - the following
 89     - \begin{cfa}
     93 + This group measures the cost of a try statement when exceptions are raised and
     94 + the stack is unwound (termination) or not unwound (resumption). Each test has
     95 + has a repeating function like the following
     96 + \begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}]
 90  97   void unwind_empty(unsigned int frames) {
 91  98       if (frames) {
 92     -         unwind_empty(frames - 1);
     99 +         @unwind_empty(frames - 1);@ // AUGMENTED IN OTHER EXPERIMENTS
 93 100       } else throw (empty_exception){&empty_vt};
 94 101   }
 95     - \end{cfa}
 96     - which is called M times, where each call recurses to a depth of N, an
    102 + \end{lstlisting}
    103 + which is called N times, where each call recurses to a depth of R (configurable from the command line), an
 97 104   exception is raised, the stack is a unwound, and the exception caught.
 98 105   \begin{itemize}[nosep]
 99 106   \item Empty:
100     - This test measures the cost of raising (stack walking) an exception through empty
101     - empty stack frames to an empty handler. (see above)
    107 + For termination, this test measures the cost of raising (stack walking) an
    108 + exception through empty stack frames from the bottom of the recursion to an
    109 + empty handler, and unwinding the stack. (see above code)
    110 +
    111 + \medskip
    112 + For resumption, this test measures the same raising cost but does not unwind
    113 + the stack. For languages without resumption, a fixup function is to the bottom
    114 + of the recursion and called to simulate a fixup operation at that point.
    115 + \begin{cfa}
    116 + void nounwind_fixup(unsigned int frames, void (*raised_rtn)(int &)) {
    117 +     if (frames) {
    118 +         nounwind_fixup(frames - 1, raised_rtn);
    119 +     } else {
    120 +         int fixup = 17;
    121 +         raised_rtn(fixup);
    122 +     }
    123 + }
    124 + \end{cfa}
    125 + where the passed fixup function is:
    126 + \begin{cfa}
    127 + void raised(int & fixup) {
    128 +     fixup = 42;
    129 + }
    130 + \end{cfa}
    131 + For comparison, a \CFA version passing a function is also included.
102 132   \item Destructor:
103     -
104     - This test measures the cost of raising an exception through non-empty frames
105     - where each frame has an object requiring destruction, to an empty
106     - handler. Hence, there are N destructor calls during unwinding.
107     - \begin{cfa}
108     - if (frames) {
    133 + This test measures the cost of raising an exception through non-empty frames,
    134 + where each frame has an object requiring destruction, from the bottom of the
    135 + recursion to an empty handler. Hence, there are N destructor calls during
    136 + unwinding.
    137 +
    138 + \medskip
    139 + This test is not meaningful for resumption because the stack is only unwound as
    140 + the recursion returns.
    141 + \begin{cfa}
109 142   WithDestructor object;
110     - unwind_empty(frames - 1);
    143 + unwind_destructor(frames - 1);
111 144   \end{cfa}
112 145   \item Finally:
113 146   This test measures the cost of establishing a try block with an empty finally
114     - clause on the front side of the recursion and running the empty finally clause
115     - on the back side of the recursion during stack unwinding.
116     - \begin{cfa}
117     - if (frames) {
    147 + clause on the front side of the recursion and running the empty finally clauses
    148 + during stack unwinding from the bottom of the recursion to an empty handler.
    149 + \begin{cfa}
118 150   try {
119 151       unwind_finally(frames - 1);
120 152   } finally {}
121 153   \end{cfa}
    154 +
    155 + \medskip
    156 + This test is not meaningful for resumption because the stack is only unwound as
    157 + the recursion returns.
122 158   \item Other Handler:
123     - This test is like the finally test but the try block has a catch clause for an
124     - exception that is not raised, so catch matching is executed during stack
125     - unwinding but the match never successes until the catch at the bottom of the
126     - stack.
127     - \begin{cfa}
128     - if (frames) {
    159 + For termination, this test is like the finally test but the try block has a
    160 + catch clause for an exception that is not raised, so catch matching is executed
    161 + during stack unwinding but the match never successes until the catch at the
    162 + bottom of the recursion.
    163 + \begin{cfa}
129 164   try {
130 165       unwind_other(frames - 1);
131 166   } catch (not_raised_exception *) {}
132 167   \end{cfa}
    168 +
    169 + \medskip
    170 + For resumption, this test measures the same raising cost but does not unwind
    171 + the stack. For languages without resumption, the same fixup function is passed
    172 + and called.
133 173   \end{itemize}
134 174
135     - \paragraph{Cross Try Statement}
136     - The next group measures just the cost of executing a try statement so
    175 + \paragraph{Try/Handle/Finally Statement}
    176 + This group measures just the cost of executing a try statement so
137 177   \emph{there is no stack unwinding}. Hence, the program main loops N times
138 178   around:
… …
143 183   \begin{itemize}[nosep]
144 184   \item Handler:
145     - The try statement has a handler.
    185 + The try statement has a handler (catch/resume).
146 186   \item Finally:
147     - The try statement replaces the handler with a finally clause.
    187 + The try statement has a finally clause.
148 188   \end{itemize}
149 189
150 190   \paragraph{Conditional Matching}
151     - This final group measures the cost of conditional matching.
    191 + This group measures the cost of conditional matching.
152 192   Only \CFA implements the language level conditional match,
153     - the other languages must mimic with an ``unconditional" match (it still
154     - checks the exception's type) and conditional re-raise if it was not supposed
    193 + the other languages mimic with an ``unconditional" match (it still
    194 + checks the exception's type) and conditional re-raise if it is not suppose
155 195   to handle that exception.
156 196   \begin{center}
… …
180 220   The condition is always false. (Never matches or always re-raises.)
181 221   \end{itemize}
    222 +
    223 + \medskip
    224 + \noindent
    225 + All omitted test code for other languages is functionally identical to the \CFA
    226 + tests or simulated, and available online~\cite{CforallExceptionBenchmarks}.
182 227
183 228   %\section{Cost in Size}
… …
192 237
193 238   \section{Results}
194     - In cases where a feature is not supported by a language the test is skipped
195     - for that language.
196     - \PAB{Report all values.
197     -
198     - Similarly, if a test does not change between resumption
199     - and termination in \CFA, then only one test is written and the result
200     - was put into the termination column.
201     - }
    239 + One result not directly related to \CFA but important to keep in
    240 + mind is that, for exceptions, the standard intuition about which languages
    241 + should go faster often does not hold. For example, there are a few cases where Python out-performs
    242 + \CFA, \Cpp and Java. The most likely explanation is that, since exceptions are
    243 + rarely considered to be the common case, the more optimized languages
    244 + make that case expense. In addition, languages with high-level
    245 + representations have a much easier time scanning the stack as there is less
    246 + to decode.
    247 +
    248 + Tables~\ref{t:PerformanceTermination} and~\ref{t:PerformanceResumption} show
    249 + the test results for termination and resumption, respectively. In cases where
    250 + a feature is not supported by a language, the test is skipped for that language
    251 + (marked N/A). For some Java experiments it was impossible to measure certain
    252 + effects because the JIT corrupted the test (marked N/C). No workaround was
    253 + possible~\cite{Dice21}. To get experiments in the range of 1--100 seconds, the
    254 + number of times an experiment is run (N) is varied (N is marked beside each
    255 + experiment, e.g., 1M $\Rightarrow$ 1 million test iterations).
    256 +
    257 + An anomaly exists with gcc nested functions used as thunks for implementing
    258 + much of the \CFA EHM. If a nested-function closure captures local variables in
    259 + its lexical scope, performance dropped by a factor of 10. Specifically, in try
    260 + statements of the form:
    261 + \begin{cfa}
    262 + try {
    263 +     unwind_other(frames - 1);
    264 + } catch (not_raised_exception *) {}
    265 + \end{cfa}
    266 + the try block is hoisted into a nested function and the variable @frames@ is
    267 + the local parameter to the recursive function, which triggers the anomaly. The
    268 + workaround is to remove the recursion parameter and make it a global variable
    269 + that is explicitly decremented outside of the try block (marked with a ``*''):
    270 + \begin{cfa}
    271 + frames -= 1;
    272 + try {
    273 +     unwind_other();
    274 + } catch (not_raised_exception *) {}
    275 + \end{cfa}
    276 + To make comparisons fair, a dummy parameter is added and the dummy value passed
    277 + in the recursion. Note, nested functions in gcc are rarely used (if not
    278 + completely unknown) and must follow the C calling convention, unlike \Cpp
    279 + lambdas, so it is not surprising if there are performance issues efficiently
    280 + capturing closures.
    281 +
    282 + % Similarly, if a test does not change between resumption
    283 + % and termination in \CFA, then only one test is written and the result
    284 + % was put into the termination column.
202 285
203 286   % Raw Data:
… …
235 318   % Match None & 0.0 & 0.0 & 9476060146 & 0.0 & 0.0 \\
236 319
237     - \begin{tabular}{|l|c c c c c|}
238     - \hline
239     - & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
240     - \hline
241     - Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
242     - Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\
243     - Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
244     - Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
245     - Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
246     - Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
247     - Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
248     - Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
249     - \hline
250     - \end{tabular}
251     -
252 320   % run-plg7a-a.sat
253 321   % ---------------
… …
284 352   % Match None & 0.0 & 0.0 & 7829059869 & 0.0 & 0.0 \\
285 353
286     - % PLG7A (in seconds)
287     - \begin{tabular}{|l|c c c c c|}
    354 + \begin{table}
    355 + \centering
    356 + \caption{Performance Results Termination (sec)}
    357 + \label{t:PerformanceTermination}
    358 + \begin{tabular}{|r|*{2}{|r r r r|}}
288 359   \hline
289     - & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
290     - \hline
291     - Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
292     - Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\
293     - Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
294     - Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
295     - Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
296     - Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
297     - Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
298     - Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    360 + & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\
    361 + \cline{2-9}
    362 + N\hspace{8pt} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
    363 + \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
    364 + \hline
    365 + Throw Empty (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8 \\
    366 + Throw D'tor (1M) & 48.4 & 23.6 & N/A & N/A & 64.2 & 29.0 & N/A & N/A \\
    367 + Throw Finally (1M) & 3.4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6 & 19.0 \\
    368 + Throw Other (1M) & 3.6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4 \\
    369 + Try/Catch (100M) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2 \\
    370 + Try/Finally (100M) & 0.9 & N/A & N/C & 44.1 & 0.8 & N/A & N/C & 37.3 \\
    371 + Match All (10M) & 32.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0 & 3.1 \\
    372 + Match None (10M) & 32.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2 \\
299 373   \hline
300 374   \end{tabular}
301     -
302     - One result not directly related to \CFA but important to keep in
303     - mind is that, for exceptions, the standard intuition about which languages
304     - should go faster often does not hold. For example, there are cases where Python out-performs
305     - \Cpp and Java. The most likely explanation is that, since exceptions are
306     - rarely considered to be the common case, the more optimized languages
307     - make that case expense. In addition, languages with high-level
308     - representations have a much easier time scanning the stack as there is less
309     - to decode.
310     -
311     - This observation means that while \CFA does not actually keep up with Python in every
312     - case, it is usually no worse than roughly half the speed of \Cpp. This performance is good
313     - enough for the prototyping purposes of the project.
    375 + \end{table}
    376 +
    377 + \begin{table}
    378 + \centering
    379 + \small
    380 + \caption{Performance Results Resumption (sec)}
    381 + \label{t:PerformanceResumption}
    382 + \setlength{\tabcolsep}{5pt}
    383 + \begin{tabular}{|r|*{2}{|r r r r|}}
    384 + \hline
    385 + & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\
    386 + \cline{2-9}
    387 + N\hspace{8pt} & \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
    388 + \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
    389 + \hline
    390 + Resume Empty (10M) & 3.8/3.5 & 14.7 & 2.3 & 176.1 & 0.3/0.1 & 8.9 & 1.2 & 119.9 \\
    391 + Resume Other (10M) & 4.0*/0.1* & 21.9 & 6.2 & 381.0 & 0.3*/0.1* & 13.2 & 5.0 & 290.7 \\
    392 + Try/Resume (100M) & 8.8 & N/A & N/A & N/A & 12.3 & N/A & N/A & N/A \\
    393 + Match All (10M) & 0.3 & N/A & N/A & N/A & 0.3 & N/A & N/A & N/A \\
    394 + Match None (10M) & 0.3 & N/A & N/A & N/A & 0.4 & N/A & N/A & N/A \\
    395 + \hline
    396 + \end{tabular}
    397 + \end{table}
    398 +
    399 + As stated, the performance tests are not attempting to compare exception
    400 + handling across languages. The only performance requirement is to ensure the
    401 + \CFA EHM implementation runs in a reasonable amount of time, given its
    402 + constraints. In general, the \CFA implement did very well. Each of the tests is
    403 + analysed.
    404 + \begin{description}
    405 + \item[Throw/Resume Empty]
    406 + For termination, \CFA is close to \Cpp, where other languages have a higher cost.
    407 +
    408 + For resumption, \CFA is better than the fixup simulations in the other languages, except Java.
    409 + The \CFA results on the ARM computer for both resumption and function simulation are particularly low;
    410 + I have no explanation for this anomaly, except the optimizer has managed to remove part of the experiment.
    411 + Python has a high cost for passing the lambda during the recursion.
    412 +
    413 + \item[Throw D'tor]
    414 + For termination, \CFA is twice the cost of \Cpp.
    415 + The higher cost for \CFA must be related to how destructors are handled.
    416 +
    417 + \item[Throw Finally]
    418 + \CFA is better than the other languages with a @finally@ clause, which is the
    419 + same for termination and resumption.
    420 +
    421 + \item[Throw/Resume Other]
    422 + For termination, \CFA is better than the other languages.
    423 +
    424 + For resumption, \CFA is equal to or better the other languages.
    425 + Again, the \CFA results on the ARM computer for both resumption and function simulation are particularly low.
    426 + Python has a high cost for passing the lambda during the recursion.
    427 +
    428 + \item[Try/Catch/Resume]
    429 + For termination, installing a try statement is more expressive than \Cpp
    430 + because the try components are hoisted into local functions. At runtime, these
    431 + functions are than passed to libunwind functions to set up the try statement.
    432 + \Cpp zero-cost try-entry accounts for its performance advantage.
    433 +
    434 + For resumption, there are similar costs to termination to set up the try
    435 + statement but libunwind is not used.
    436 +
    437 + \item[Try/Finally]
    438 + Setting up a try finally is less expensive in \CFA than setting up handlers,
    439 + and is significantly less than other languages.
    440 +
    441 + \item[Throw/Resume Match All]
    442 + For termination, \CFA is close to the other language simulations.
    443 +
    444 + For resumption, the stack unwinding is much faster because it does not use
    445 + libunwind. Instead resumption is just traversing a linked list with each node
    446 + being the next stack frame with the try block.
    447 +
    448 + \item[Throw/Resume Match None]
    449 + The same results as for Match All.
    450 + \end{description}
    451 +
    452 + \begin{comment}
    453 + This observation means that while \CFA does not actually keep up with Python in
    454 + every case, it is usually no worse than roughly half the speed of \Cpp. This
    455 + performance is good enough for the prototyping purposes of the project.
314 456
315 457   The test case where \CFA falls short is Raise Other, the case where the
316 458   stack is unwound including a bunch of non-matching handlers.
317 459   This slowdown seems to come from missing optimizations.
318     -
319     - Importantly, there is a huge slowdown in \Cpp's results bringing that brings
320     - \CFA's performance back in that roughly half speed area. However many other
321     - \CFA benchmarks increase their run-time by a similar amount falling far
322     - behind their \Cpp counter-parts.
323 460
324 461   This suggests that the performance issue in Raise Other is just an
… …
364 501   The difference in relative performance does show that there are savings to
365 502   be made by performing the check without catching the exception.
    503 + \end{comment}
    504 +
    505 +
    506 + \begin{comment}
    507 + From: Dave Dice <dave.dice@oracle.com>
    508 + To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
    509 + Subject: Re: [External] : JIT
    510 + Date: Mon, 16 Aug 2021 01:21:56 +0000
    511 +
    512 + > On 2021-8-15, at 7:14 PM, Peter A. Buhr <pabuhr@uwaterloo.ca> wrote:
    513 + >
    514 + > My student is trying to measure the cost of installing a try block with a
    515 + > finally clause in Java.
    516 + >
    517 + > We tried the random trick (see below). But if the try block is comment out, the
    518 + > results are the same. So the program measures the calls to the random number
    519 + > generator and there is no cost for installing the try block.
    520 + >
    521 + > Maybe there is no cost for a try block with an empty finally, i.e., the try is
    522 + > optimized away from the get-go.
    523 +
    524 + There's quite a bit of optimization magic behind the HotSpot curtains for
    525 + try-finally. (I sound like the proverbial broken record (:>)).
    526 +
    527 + In many cases we can determine that the try block can't throw any exceptions,
    528 + so we can elide all try-finally plumbing. In other cases, we can convert the
    529 + try-finally to normal if-then control flow, in the case where the exception is
    530 + thrown into the same method. This makes exceptions _almost cost-free. If we
    531 + actually need to "physically" rip down stacks, then things get expensive,
    532 + impacting both the throw cost, and inhibiting other useful optimizations at the
    533 + catch point. Such "true" throws are not just expensive, they're _very
    534 + expensive. The extremely aggressive inlining used by the JIT helps, because we
    535 + can convert cases where a heavy rip-down would normally needed back into simple
    536 + control flow.
    537 +
    538 + Other quirks involve the thrown exception object. If it's never accessed then
    539 + we're apply a nice set of optimizations to avoid its construction. If it's
    540 + accessed but never escapes the catch frame (common) then we can also cheat.
    541 + And if we find we're hitting lots of heavy rip-down cases, the JIT will
    542 + consider recompilation - better inlining -- to see if we can merge the throw
    543 + and catch into the same physical frame, and shift to simple branches.
    544 +
    545 + In your example below, System.out.print() can throw, I believe. (I could be
    546 + wrong, but most IO can throw). Native calls that throw will "unwind" normally
    547 + in C++ code until they hit the boundary where they reenter java emitted code,
    548 + at which point the JIT-ed code checks for a potential pending exception. So in
    549 + a sense the throw point is implicitly after the call to the native method, so
    550 + we can usually make those cases efficient.
    551 +
    552 + Also, when we're running in the interpreter and warming up, we'll notice that
    553 + the == 42 case never occurs, and so when we start to JIT the code, we elide the
    554 + call to System.out.print(), replacing it (and anything else which appears in
    555 + that if x == 42 block) with a bit of code we call an "uncommon trap". I'm
    556 + presuming we encounter 42 rarely. So if we ever hit the x == 42 case, control
    557 + hits the trap, which triggers synchronous recompilation of the method, this
    558 + time with the call to System.out.print() and, because of that, we now to adapt
    559 + the new code to handle any traps thrown by print(). This is tricky stuff, as
    560 + we may need to rebuild stack frames to reflect the newly emitted method. And
    561 + we have to construct a weird bit of "thunk" code that allows us to fall back
    562 + directly into the newly emitted "if" block. So there's a large one-time cost
    563 + when we bump into the uncommon trap and recompile, and subsequent execution
    564 + might get slightly slower as the exception could actually be generated, whereas
    565 + before we hit the trap, we knew the exception could never be raised.
    566 +
    567 + Oh, and things also get expensive if we need to actually fill in the stack
    568 + trace associated with the exception object. Walking stacks is hellish.
    569 +
    570 + Quite a bit of effort was put into all this as some of the specjvm benchmarks
    571 + showed significant benefit.
    572 +
    573 + It's hard to get sensible measurements as the JIT is working against you at
    574 + every turn. What's good for the normal user is awful for anybody trying to
    575 + benchmark. Also, all the magic results in fairly noisy and less reproducible
    576 + results.
    577 +
    578 + Regards
    579 + Dave
    580 +
    581 + p.s., I think I've mentioned this before, but throwing in C++ is grim as
    582 + unrelated throws in different threads take common locks, so nothing scales as
    583 + you might expect.
    584 + \end{comment}
doc/theses/andrew_beach_MMath/uw-ethesis.bib
rf79ee0d r262deda0
  2   2   % For use with BibTeX
  3   3
  4     - @book{goossens.book,
  5     -     author = "Michel Goossens and Frank Mittelbach and
  6     -         Alexander Samarin",
  7     -     title = "The \LaTeX\ Companion",
  8     -     year = "1994",
  9     -     publisher = "Addison-Wesley",
 10     -     address = "Reading, Massachusetts"
      4 + @misc{Dice21,
      5 +     author = {Dave Dice},
      6 +     year = 2021,
      7 +     month = aug,
      8 +     howpublished= {personal communication}
 11   9   }
 12  10
 13     - @book{knuth.book,
 14     -     author = "Donald Knuth",
 15     -     title = "The \TeX book",
 16     -     year = "1986",
 17     -     publisher = "Addison-Wesley",
 18     -     address = "Reading, Massachusetts"
     11 + @misc{CforallExceptionBenchmarks,
     12 +     contributer = {pabuhr@plg},
     13 +     key = {Cforall Exception Benchmarks},
     14 +     author = {{\textsf{C}{$\mathbf{\forall}$} Exception Benchmarks}},
     15 +     howpublished= {\href{https://github.com/cforall/ExceptionBenchmarks_SPE20}{https://\-github.com/\-cforall/\-ExceptionBenchmarks\_SPE20}},
 19  16   }
 20     -
 21     - @book{lamport.book,
 22     -     author = "Leslie Lamport",
 23     -     title = "\LaTeX\ --- A Document Preparation System",
 24     -     edition = "Second",
 25     -     year = "1994",
 26     -     publisher = "Addison-Wesley",
 27     -     address = "Reading, Massachusetts"
 28     - }
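Reading aid for the Conditional Matching paragraph in performance.tex: languages other than \CFA mimic a conditional match with a handler that matches on type alone and then conditionally re-raises. The C++ sketch below shows one way that pattern might look. It is not taken from the cited benchmark repository; `empty_exception`, `raise_empty`, `cond_catch`, and the integer return codes are hypothetical names chosen for illustration.

```cpp
// Hypothetical exception type used only in this sketch.
struct empty_exception {};

// Raise the exception from a callee.
void raise_empty() {
    throw empty_exception{};
}

// Mimic of a conditional match. The inner handler matches on the exception
// type alone, then re-raises when the condition says it should not handle it.
// Returns 1 if the inner handler handles the exception, 2 if the outer one does.
int cond_catch(bool should_catch) {
    try {
        try {
            raise_empty();
        } catch (empty_exception &) {
            if (!should_catch) {
                throw;  // conditional re-raise of the same exception object
            }
            return 1;
        }
    } catch (empty_exception &) {
        return 2;
    }
    return 0;  // not reached: raise_empty always throws
}
```

With `should_catch` always true this corresponds to the Match All case, and with it always false to the Match None case, where every raise pays for a catch plus a re-raise.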