Changeset c9f9d4f for doc/theses


Timestamp:
Aug 12, 2021, 10:12:54 PM (3 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
3b8acfb, 6cebfef, c99a0d1
Parents:
93d0ed3
Message:

first proofread of performance chapter

File:
1 edited

  • doc/theses/andrew_beach_MMath/performance.tex

    r93d0ed3 rc9f9d4f  
    22\label{c:performance}
    33
    4 Performance has been of secondary importance for most of this project.
    5 Instead, the focus has been to get the features working. The only performance
    6 requirements is to ensure the tests for correctness run in a reasonable
     4Performance is of secondary importance for most of this project.
     5Instead, the focus is to get the features working. The only performance
     6requirement is to ensure the tests for correctness run in a reasonable
    77amount of time.
    88
    99\section{Test Set-Up}
    10 Tests will be run in \CFA, C++, Java and Python.
     10Tests were run in \CFA, C++, Java and Python.
    1111In addition there are two sets of tests for \CFA,
    12 one for termination exceptions and once with resumption exceptions.
     12one for termination and one for resumption exceptions.
    1313
    1414C++ is the most comparable language because both it and \CFA use the same
    1515framework, libunwind.
    1616In fact, the comparison is almost entirely a quality of implementation
    17 comparison. \CFA's EHM has had significantly less time to be optimized and
     17comparison: \CFA's EHM has had significantly less time to be optimized and
    1818does not generate its own assembly. It does have a slight advantage in that
    19 there are some features it does not handle, through utility functions,
     19there are some features it handles directly instead of through utility functions,
    2020but otherwise \Cpp has a significant advantage.
    2121
     
    2323It is implemented in a very different environment, a virtual machine with
    2424garbage collection.
    25 It also implements the finally clause on try blocks allowing for a direct
     25It also implements the @finally@ clause on @try@ blocks allowing for a direct
    2626feature-to-feature comparison.
    27 As with \Cpp, Java's implementation is more mature, has more optimizations
    28 and more extra features.
    29 
    30 Python was used as a point of comparison because of the \CFA EHM's
    31 current performance goals, which is not be prohibitively slow while the
     27As with \Cpp, Java's implementation is mature, has more optimizations
     28and has extra features.
     29
     30Python is used as an alternative point of comparison because of the \CFA EHM's
     31current performance goal, which is not to be prohibitively slow while the
    3232features are designed and examined. Python has similar performance goals for
    3333creating quick scripts and its wide use suggests it has achieved those goals.
    3434
    35 Unfortunately there are no notable modern programming languages with
    36 resumption exceptions. Even the older programming languages with resumptions
    37 seem to be notable only for having resumptions.
    38 So instead resumptions are compared to a less similar but much more familiar
     35Unfortunately, there are no notable modern programming languages with
     36resumption exceptions. Even the older programming languages with resumption
     37seem to be notable only for having resumption.
     38So instead, resumption is compared to a less similar but much more familiar
    3939feature, termination exceptions.
    4040
    41 All tests are run inside a main loop which will perform the test
    42 repeatedly. This is to avoids start-up or tear-down time from
     41All tests are run inside a main loop that repeatedly performs a test.
     42This approach avoids start-up or tear-down time from
    4343affecting the timing results.
    44 Tests ran their main loop a million times.
    45 The Java versions of the test also run this loop an extra 1000 times before
    46 beginning to time the results to ``warm-up" the JVM.
     44The main loop of each test runs a million times.
     45The Java versions of the test run this loop an extra 1000 times before
     46beginning the actual test to ``warm-up'' the JVM.
    4747
    4848Timing is done internally, with time measured immediately before and
    49 immediately after the test loop. The difference is calculated and printed.
    50 
     49after the test loop. The difference is calculated and printed.
    5150The loop structure and internal timing mean it is impossible to test
    5251unhandled exceptions in \Cpp and Java as that would cause the process to
     
    5554critical.
    5655
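The loop structure and internal timing described above can be sketched in \Cpp as follows. This is only an illustrative harness, not the code used for the thesis: the name \texttt{time\_main\_loop}, the use of \texttt{std::chrono::steady\_clock}, and the placement of the empty inline assembly block are all assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical harness (not from the thesis sources): runs `test`
// `iterations` times and returns the elapsed time in nanoseconds.
// The clock is read immediately before and after the loop, so process
// start-up and tear-down are excluded from the measurement.
template <typename Test>
std::int64_t time_main_loop(Test test, unsigned long iterations) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (unsigned long i = 0; i < iterations; ++i) {
        test();
        // Empty inline assembly (GCC/Clang syntax): adds no work but
        // discourages the compiler from optimizing the loop away.
        asm volatile("" ::: "memory");
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A Java version would call the same loop an extra 1000 times first and discard that timing, which is the warm-up step mentioned above.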
    57 The exceptions used in these tests will always be a exception based off of
    58 the base exception. This requirement minimizes performance differences based
    59 on the object model used to repersent the exception.
    60 
    61 All tests were designed to be as minimal as possible while still preventing
    62 exessive optimizations.
     56The exceptions used in these tests are always based off of
     57a base exception. This requirement minimizes performance differences based
     58on the object model used to represent the exception.
     59
     60All tests are designed to be as minimal as possible, while still preventing
     61excessive optimizations.
    6362For example, empty inline assembly blocks are used in \CFA and \Cpp to
    6463prevent excessive optimizations while adding no actual work.
     64Each test was run eleven times. The top three and bottom three results were
     65discarded and the remaining five values were averaged.
     66
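The averaging rule (eleven runs, discard the three highest and three lowest, average the remaining five) can be written as a small helper. This is a minimal sketch; the function name \texttt{trimmed\_average} and its \texttt{trim} parameter are hypothetical and not taken from the test scripts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sorts the run times, drops the `trim` smallest
// and `trim` largest values, and averages what is left.
// With 11 runs and trim = 3, five values are averaged.
double trimmed_average(std::vector<double> runs, std::size_t trim = 3) {
    assert(runs.size() > 2 * trim);  // something must remain to average
    std::sort(runs.begin(), runs.end());
    auto first = runs.begin() + static_cast<std::ptrdiff_t>(trim);
    auto last = runs.end() - static_cast<std::ptrdiff_t>(trim);
    return std::accumulate(first, last, 0.0) / static_cast<double>(last - first);
}
```

Discarding both tails this way keeps a single unusually slow or fast run from shifting the reported value.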
     67The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. The Java
     68tests use version 11.0.11 and the Python tests use version 3.8. The tests were run on:
     69\begin{itemize}[nosep]
     70\item
     71ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25
     72\item
     73AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25
     74\end{itemize}
    6575
    6676% We don't use catch-alls but if we did:
     
    7181The following tests were selected to test the performance of different
    7282components of the exception system.
    73 The should provide a guide as to where the EHM's costs can be found.
     83They should provide a guide as to where the EHM's costs are found.
    7484
    7585\paragraph{Raise and Handle}
    76 The first group of tests involve setting up
    77 So there is three layers to the test. The first is set up and a loop, which
    78 configures the test and then runs it repeatedly to reduce the impact of
    79 start-up and shutdown on the results.
    80 Each iteration of the main loop
    81 \begin{itemize}[nosep]
    82 \item Empty Function:
    83 The repeating function is empty except for the necessary control code.
     86The first group measures the cost of a try statement when exceptions are raised
     87and \emph{the stack is unwound}.  Each test has a repeating function like
     88the following
     89\begin{cfa}
     90void unwind_empty(unsigned int frames) {
     91        if (frames) {
     92                unwind_empty(frames - 1);
     93        } else throw (empty_exception){&empty_vt};
     94}
     95\end{cfa}
     96which is called M times, where each call recurses to a depth of N, an
     97exception is raised, the stack is unwound, and the exception is caught.
     98\begin{itemize}[nosep]
     99\item Empty:
     100This test measures the cost of raising (stack walking) an exception through
     101empty stack frames to an empty handler (see above).
    84102\item Destructor:
    85 The repeating function creates an object with a destructor before calling
    86 itself.
     103
     104This test measures the cost of raising an exception through non-empty frames
     105where each frame has an object requiring destruction, to an empty
     106handler. Hence, there are N destructor calls during unwinding.
     107\begin{cfa}
     108if (frames) {
     109        WithDestructor object;
     110        unwind_destructor(frames - 1);
     111\end{cfa}
    87112\item Finally:
    88 The repeating function calls itself inside a try block with a finally clause
    89 attached.
     113This test measures the cost of establishing a try block with an empty finally
     114clause on the front side of the recursion and running the empty finally clause
     115on the back side of the recursion during stack unwinding.
     116\begin{cfa}
     117if (frames) {
     118        try {
     119                unwind_finally(frames - 1);
     120        } finally {}
     121\end{cfa}
    90122\item Other Handler:
    91 The repeating function calls itself inside a try block with a handler that
    92 will not match the raised exception. (But is of the same kind of handler.)
     123This test is like the finally test but the try block has a catch clause for an
     124exception that is not raised, so catch matching is executed during stack
     125unwinding but the match never succeeds until the catch at the bottom of the
     126stack.
     127\begin{cfa}
     128if (frames) {
     129        try {
     130                unwind_other(frames - 1);
     131        } catch (not_raised_exception *) {}
     132\end{cfa}
    93133\end{itemize}
    94134
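For comparison, the Empty and Destructor variants can be approximated in \Cpp as below. This is a sketch under assumptions, not the benchmark source: \texttt{EmptyException}, \texttt{run\_unwind} and the global counter \texttt{destructor\_calls} are hypothetical names introduced only so the behaviour can be checked.

```cpp
#include <cassert>

struct EmptyException {};

// Hypothetical counter, used only to observe the destructor calls.
static unsigned destructor_calls = 0;
struct WithDestructor {
    ~WithDestructor() { ++destructor_calls; }
};

// Empty: recurse to the requested depth, then raise.
void unwind_empty(unsigned frames) {
    if (frames) {
        unwind_empty(frames - 1);
    } else {
        throw EmptyException{};
    }
}

// Destructor: each frame owns an object that must be destroyed
// while the stack is unwound.
void unwind_destructor(unsigned frames) {
    if (frames) {
        WithDestructor object;
        unwind_destructor(frames - 1);
    } else {
        throw EmptyException{};
    }
}

// Calls `test` `times` times (M), each to a depth of `depth` (N),
// catching the exception with an empty handler. Returns the number
// of exceptions caught.
unsigned run_unwind(void (*test)(unsigned), unsigned times, unsigned depth) {
    unsigned caught = 0;
    for (unsigned i = 0; i < times; ++i) {
        try {
            test(depth);
        } catch (EmptyException &) {
            ++caught;
        }
    }
    return caught;
}
```

With M calls to a depth of N, the Destructor variant runs M times N destructors in total, which is the extra work that test is meant to expose.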
    95135\paragraph{Cross Try Statement}
    96 The next group measures the cost of a try statement when no exceptions are
    97 raised. The test is set-up, then there is a loop to reduce the impact of
    98 start-up and shutdown on the results.
    99 In each iteration, a try statement is executed. Entering and leaving a loop
    100 is all the test wants to do.
     136The next group measures just the cost of executing a try statement so
     137\emph{there is no stack unwinding}.  Hence, the main loop of the program executes
     138N times around:
     139\begin{cfa}
     140try {
     141} catch (not_raised_exception *) {}
     142\end{cfa}
    101143\begin{itemize}[nosep]
    102144\item Handler:
    103 The try statement has a handler (of the matching kind).
     145The try statement has a handler.
    104146\item Finally:
    105 The try statement has a finally clause.
     147The try statement replaces the handler with a finally clause.
    106148\end{itemize}
    107149
    108150\paragraph{Conditional Matching}
    109 This group of tests checks the cost of conditional matching.
     151This final group measures the cost of conditional matching.
    110152Only \CFA implements the language level conditional match;
    111153the other languages must mimic it with an ``unconditional'' match (it still
    112154checks the exception's type) and conditional re-raise if it was not supposed
    113155to handle that exception.
     156\begin{center}
     157\begin{tabular}{ll}
     158\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\
     159\begin{cfa}
     160try {
     161        throw_exception();
     162} catch (empty_exception * exc;
     163                 should_catch) {
     164}
     165\end{cfa}
     166&
     167\begin{cfa}
     168try {
     169        throw_exception();
     170} catch (EmptyException & exc) {
     171        if (!should_catch) throw;
     172}
     173\end{cfa}
     174\end{tabular}
     175\end{center}
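The right-hand column can be made self-contained in \Cpp to show where a re-raised exception ends up. This is a sketch under assumptions: the outer try block and the boolean return value are additions for illustration, and \texttt{cond\_catch} is a hypothetical name.

```cpp
#include <cassert>

struct EmptyException {};

void throw_exception() { throw EmptyException{}; }

// Mimics a conditional match: the inner handler always matches on type,
// then re-raises when `should_catch` is false. Returns true if the inner
// handler kept the exception and false if it was re-raised to the outer one.
bool cond_catch(bool should_catch) {
    try {
        try {
            throw_exception();
        } catch (EmptyException &) {
            if (!should_catch) throw;  // conditional re-raise
            return true;
        }
    } catch (EmptyException &) {
        return false;  // the outer handler received the re-raised exception
    }
    return false;  // not reached: throw_exception always throws
}
```

In this emulation a failed condition costs a second raise and a second search, whereas the \CFA form evaluates the condition as part of matching.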
    114176\begin{itemize}[nosep]
    115177\item Match All:
     
    130192
    131193\section{Results}
    132 Each test was run eleven times. The top three and bottom three results were
    133 discarded and the remaining five values are averaged.
    134 
    135194In cases where a feature is not supported by a language the test is skipped
    136 for that language. Similarly, if a test is does not change between resumption
     195for that language.
     196\PAB{Report all values.
     197
     198Similarly, if a test does not change between resumption
    137199and termination in \CFA, then only one test is written and the result
    138200is put into the termination column.
     201}
    139202
    140203% Raw Data:
     
    237300\end{tabular}
    238301
    239 One result that is not directly related to \CFA but is important to keep in
    240 mind is that in exceptions the standard intuitions about which languages
    241 should go faster often do not hold. There are cases where Python out-preforms
    242 \Cpp and Java. The most likely explination is that, since exceptions are
    243 rarely considered to be the common case, the more optimized langages have
    244 optimized at their expence. In addition languages with high level           
    245 repersentations have a much easier time scanning the stack as there is less
     302One result not directly related to \CFA but important to keep in
     303mind is that, for exceptions, the standard intuition about which languages
     304should go faster often does not hold. For example, there are cases where Python out-performs
     305\Cpp and Java. The most likely explanation is that, since exceptions are
     306rarely considered to be the common case, the more optimized languages
     307make that case expensive. In addition, languages with high-level
     308representations have a much easier time scanning the stack as there is less
    246309to decode.
    247310
    248 This means that while \CFA does not actually keep up with Python in every
    249 case it is usually no worse than roughly half the speed of \Cpp. This is good
     311This observation means that while \CFA does not actually keep up with Python in every
     312case, it is usually no worse than roughly half the speed of \Cpp. This performance is good
    250313enough for the prototyping purposes of the project.
    251314
    252315The test case where \CFA falls short is Raise Other, the case where the
    253316stack is unwound including a bunch of non-matching handlers.
    254 This slowdown seems to come from missing optimizations,
    255 the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp)
    256 but the results change if they are run in gcc/g++ 9 instead.
     317This slowdown seems to come from missing optimizations.
     318
    257319Importantly, there is a huge slowdown in \Cpp's results that brings
    258 \CFA's performace back in that roughly half speed area. However many other
     320\CFA's performance back in that roughly half speed area. However, many other
    259321\CFA benchmarks increase their run-time by a similar amount falling far
    260322behind their \Cpp counter-parts.
     
    269331Resumption exception handling is also incredibly fast, often an order of
    270332magnitude or two better than the best termination speed.
    271 There is a simple explination for this; traversing a linked list is much   
     333There is a simple explanation for this: traversing a linked list is much
    272334faster than examining and unwinding the stack. When resumption does not do as
    273 well its when more try statements are used per raise. Updating the interal
    274 linked list is not very expencive but it does add up.
     335well, it is when more try statements are used per raise. Updating the internal
     336linked list is not very expensive but it does add up.
    275337
    276338The relative speed of the Match All and Match None tests (within each
     
    280342\item
    281343Java and Python get similar values in both tests.
    282 Between the interperated code, a higher level repersentation of the call
     344Between the interpreted code, a higher level representation of the call
    283345stack and exception reuse, it is possible the cost for a second
    284346throw can be folded into the first.
    285347% Is this due to optimization?
    286348\item
    287 Both types of \CFA are slighly slower if there is not a match.
     349Both types of \CFA are slightly slower if there is not a match.
    288350For termination this likely comes from unwinding a bit more stack through
    289351libunwind instead of executing the code normally.