Changeset cfbab07


Ignore:
Timestamp:
Aug 29, 2021, 11:45:49 AM (3 years ago)
Author:
Andrew Beach <ajbeach@…>
Branches:
ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
Children:
eaeca5f
Parents:
e8bad5c8
Message:

Andrew MMath: Updated performance chapter, using Peter's feedback and discussion.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/andrew_beach_MMath/performance.tex

    re8bad5c8 rcfbab07  
    22\label{c:performance}
    33
    4 Performance has been of secondary importance for most of this project.
    5 Instead, the focus has been to get the features working. The only performance
    6 requirements is to ensure the tests for correctness run in a reasonable
    7 amount of time.
     4Performance is of secondary importance for most of this project.
     5Instead, the focus was to get the features working. The only performance
     6requirement is to ensure the tests for correctness run in a reasonable
     7amount of time. Hence, a few basic performance tests were performed to
     8check this requirement.
    89
    910\section{Test Set-Up}
    10 Tests will be run in \CFA, C++, Java and Python.
     11Tests were run in \CFA, C++, Java and Python.
    1112In addition there are two sets of tests for \CFA,
    12 one for termination exceptions and once with resumption exceptions.
     13one for termination and once with resumption.
    1314
    1415C++ is the most comparable language because both it and \CFA use the same
    1516framework, libunwind.
    16 In fact, the comparison is almost entirely a quality of implementation
    17 comparison. \CFA's EHM has had significantly less time to be optimized and
     17In fact, the comparison is almost entirely in quality of implementation.
     18Specifically, \CFA's EHM has had significantly less time to be optimized and
    1819does not generate its own assembly. It does have a slight advantage in that
    19 there are some features it does not handle, through utility functions,
    20 but otherwise \Cpp has a significant advantage.
    21 
    22 Java is another very popular language with similar termination semantics.
    23 It is implemented in a very different environment, a virtual machine with
     20\Cpp has to do some extra bookkeeping to support its utility functions,
     21but otherwise \Cpp should have a significant advantage.
     22
     23Java a popular language with similar termination semantics, but
     24it is implemented in a very different environment, a virtual machine with
    2425garbage collection.
    2526It also implements the finally clause on try blocks allowing for a direct
    2627feature-to-feature comparison.
    27 As with \Cpp, Java's implementation is more mature, has more optimizations
    28 and more extra features.
    29 
    30 Python was used as a point of comparison because of the \CFA EHM's
    31 current performance goals, which is not be prohibitively slow while the
     28As with \Cpp, Java's implementation is mature, has more optimizations
     29and extra features as compared to \CFA.
     30
     31Python is used as an alternative comparison because of the \CFA EHM's
     32current performance goals, which is to not be prohibitively slow while the
    3233features are designed and examined. Python has similar performance goals for
    3334creating quick scripts and its wide use suggests it has achieved those goals.
    3435
    35 Unfortunately there are no notable modern programming languages with
    36 resumption exceptions. Even the older programming languages with resumptions
    37 seem to be notable only for having resumptions.
    38 So instead resumptions are compared to a less similar but much more familiar
    39 feature, termination exceptions.
    40 
    41 All tests are run inside a main loop which will perform the test
    42 repeatedly. This is to avoids start-up or tear-down time from
     36Unfortunately, there are no notable modern programming languages with
     37resumption exceptions. Even the older programming languages with resumption
     38seem to be notable only for having resumption.
     39Instead, resumption is compared to its simulation in other programming
     40languages: fixup functions that are explicity passed into a function.
     41
     42All tests are run inside a main loop that repeatedly performs a test.
     43This approach avoids start-up or tear-down time from
    4344affecting the timing results.
     45The number of times the loop is run is configurable from the command line,
     46the number used in the timing runs is given with the results per test.
    4447Tests ran their main loop a million times.
    45 The Java versions of the test also run this loop an extra 1000 times before
    46 beginning to time the results to ``warm-up" the JVM.
     48The Java tests runs the main loop 1000 times before
     49beginning the actual test to ``warm-up" the JVM.
     50% All other languages are precompiled or interpreted.
    4751
    4852Timing is done internally, with time measured immediately before and
    49 immediately after the test loop. The difference is calculated and printed.
    50 
     53after the test loop. The difference is calculated and printed.
    5154The loop structure and internal timing means it is impossible to test
    5255unhandled exceptions in \Cpp and Java as that would cause the process to
     
    5558critical.
    5659
    57 The exceptions used in these tests will always be a exception based off of
    58 the base exception. This requirement minimizes performance differences based
    59 on the object model used to repersent the exception.
    60 
    61 All tests were designed to be as minimal as possible while still preventing
    62 exessive optimizations.
     60The exceptions used in these tests are always based off of
     61the base exception for the language.
     62This requirement minimizes performance differences based
     63on the object model used to represent the exception.
     64
     65All tests are designed to be as minimal as possible, while still preventing
     66excessive optimizations.
    6367For example, empty inline assembly blocks are used in \CFA and \Cpp to
    6468prevent excessive optimizations while adding no actual work.
     
    6872% \code{C++}{catch(...)}).
    6973
     74When collecting data each test is run eleven times. The top three and bottom
     75three results are discarded and the remaining five values are averaged.
     76The test are run with the latest (still pre-release) \CFA compiler was used,
     77using gcc-10 as a backend.
     78g++-10 is used for \Cpp.
     79Java tests are complied and run with version 11.0.11.
     80Python used version 3.8.
     81The machines used to run the tests are:
     82\todo{Get patch versions for python, gcc and g++.}
     83\begin{itemize}[nosep]
     84\item ARM 2280 Kunpeng 920 48-core 2$\times$socket
     85      \lstinline{@} 2.6 GHz running Linux v5.11.0-25
     86\item AMD 6380 Abu Dhabi 16-core 4$\times$socket
     87      \lstinline{@} 2.5 GHz running Linux v5.11.0-25
     88\end{itemize}
     89Representing the two major families of hardware architecture.
     90
    7091\section{Tests}
    7192The following tests were selected to test the performance of different
    7293components of the exception system.
    73 The should provide a guide as to where the EHM's costs can be found.
    74 
    75 \paragraph{Raise and Handle}
    76 The first group of tests involve setting up
    77 So there is three layers to the test. The first is set up and a loop, which
    78 configures the test and then runs it repeatedly to reduce the impact of
    79 start-up and shutdown on the results.
    80 Each iteration of the main loop
     94They should provide a guide as to where the EHM's costs are found.
     95
     96\paragraph{Stack Traversal}
     97This group measures the cost of traversing the stack,
     98(and in termination, unwinding it).
     99Inside the main loop is a call to a recursive function.
     100This function calls itself F times before raising an exception.
     101F is configurable from the command line, but is usually 100.
     102This builds up many stack frames, and any contents they may have,
     103before the raise.
     104The exception is always handled at the base of the stack.
     105For example the Empty test for \CFA resumption looks like:
     106\begin{cfa}
     107void unwind_empty(unsigned int frames) {
     108        if (frames) {
     109                unwind_empty(frames - 1);
     110        } else {
     111                throwResume (empty_exception){&empty_vt};
     112        }
     113}
     114\end{cfa}
     115Other test cases have additional code around the recursive call add
     116something besides simple stack frames to the stack.
     117Note that both termination and resumption will have to traverse over
     118the stack but only termination has to unwind it.
    81119\begin{itemize}[nosep]
    82 \item Empty Function:
     120% \item None:
     121% Reuses the empty test code (see below) except that the number of frames
     122% is set to 0 (this is the only test for which the number of frames is not
     123% 100). This isolates the start-up and shut-down time of a throw.
     124\item Empty:
    83125The repeating function is empty except for the necessary control code.
     126As other traversal tests add to this, so it is the baseline for the group
     127as the cost comes from traversing over and unwinding a stack frame
     128that has no other interactions with the exception system.
    84129\item Destructor:
    85130The repeating function creates an object with a destructor before calling
    86131itself.
     132Comparing this to the empty test gives the time to traverse over and/or
     133unwind a destructor.
    87134\item Finally:
    88135The repeating function calls itself inside a try block with a finally clause
    89136attached.
     137Comparing this to the empty test gives the time to traverse over and/or
     138unwind a finally clause.
    90139\item Other Handler:
    91140The repeating function calls itself inside a try block with a handler that
    92 will not match the raised exception. (But is of the same kind of handler.)
     141will not match the raised exception, but is of the same kind of handler.
     142This means that the EHM will have to check each handler, but will continue
     143over all of the until it reaches the base of the stack.
     144Comparing this to the empty test gives the time to traverse over and/or
     145unwind a handler.
    93146\end{itemize}
    94147
    95148\paragraph{Cross Try Statement}
    96 The next group measures the cost of a try statement when no exceptions are
    97 raised. The test is set-up, then there is a loop to reduce the impact of
    98 start-up and shutdown on the results.
    99 In each iteration, a try statement is executed. Entering and leaving a loop
    100 is all the test wants to do.
     149This group of tests measures the cost setting up exception handling if it is
     150not used (because the exceptional case did not occur).
     151Tests repeatedly cross (enter and leave, execute) a try statement but never
     152preform a raise.
    101153\begin{itemize}[nosep]
    102154\item Handler:
    103 The try statement has a handler (of the matching kind).
     155The try statement has a handler (of the appropriate kind).
    104156\item Finally:
    105157The try statement has a finally clause.
     
    107159
    108160\paragraph{Conditional Matching}
    109 This group of tests checks the cost of conditional matching.
     161This group measures the cost of conditional matching.
    110162Only \CFA implements the language level conditional match,
    111 the other languages must mimic with an ``unconditional" match (it still
    112 checks the exception's type) and conditional re-raise if it was not supposed
     163the other languages mimic it with an ``unconditional" match (it still
     164checks the exception's type) and conditional re-raise if it is not supposed
    113165to handle that exception.
     166
     167There is the pattern shown in \CFA and \Cpp. Java and Python use the same
     168pattern as \Cpp, but with their own syntax.
     169
     170\begin{minipage}{0.45\textwidth}
     171\begin{cfa}
     172try {
     173        ...
     174} catch (exception_t * e ;
     175                should_catch(e)) {
     176        ...
     177}
     178\end{cfa}
     179\end{minipage}
     180\begin{minipage}{0.55\textwidth}
     181\begin{lstlisting}[language=C++]
     182try {
     183        ...
     184} catch (std::exception & e) {
     185        if (!should_catch(e)) throw;
     186        ...
     187}
     188\end{lstlisting}
     189\end{minipage}
    114190\begin{itemize}[nosep]
    115191\item Match All:
     
    118194The condition is always false. (Never matches or always re-raises.)
    119195\end{itemize}
     196
     197\paragraph{Resumption Simulation}
     198A slightly altered version of the Empty Traversal test is used when comparing
     199resumption to fix-up routines.
     200The handler, the actual resumption handler or the fix-up routine,
     201always captures a variable at the base of the loop,
     202and receives a reference to a variable at the raise site, either as a
     203field on the exception or an argument to the fix-up routine.
     204% I don't actually know why that is here but not anywhere else.
    120205
    121206%\section{Cost in Size}
     
    130215
    131216\section{Results}
    132 Each test was run eleven times. The top three and bottom three results were
    133 discarded and the remaining five values are averaged.
    134 
    135 In cases where a feature is not supported by a language the test is skipped
    136 for that language. Similarly, if a test is does not change between resumption
    137 and termination in \CFA, then only one test is written and the result
    138 was put into the termination column.
    139 
    140 % Raw Data:
    141 % run-algol-a.sat
    142 % ---------------
    143 % Raise Empty   &  82687046678 &  291616256 &   3252824847 & 15422937623 & 14736271114 \\
    144 % Raise D'tor   & 219933199603 &  297897792 & 223602799362 &         N/A &         N/A \\
    145 % Raise Finally & 219703078448 &  298391745 &          N/A &         ... & 18923060958 \\
    146 % Raise Other   & 296744104920 & 2854342084 & 112981255103 & 15475924808 & 21293137454 \\
    147 % Cross Handler &      9256648 &   13518430 &       769328 &     3486252 &    31790804 \\
    148 % Cross Finally &       769319 &        N/A &          N/A &     2272831 &    37491962 \\
    149 % Match All     &   3654278402 &   47518560 &   3218907794 &  1296748192 &   624071886 \\
    150 % Match None    &   4788861754 &   58418952 &   9458936430 &  1318065020 &   625200906 \\
    151 %
    152 % run-algol-thr-c
    153 % ---------------
    154 % Raise Empty   &   3757606400 &   36472972 &   3257803337 & 15439375452 & 14717808642 \\
    155 % Raise D'tor   &  64546302019 &  102148375 & 223648121635 &         N/A &         N/A \\
    156 % Raise Finally &  64671359172 &  103285005 &          N/A & 15442729458 & 18927008844 \\
    157 % Raise Other   & 294143497130 & 2630130385 & 112969055576 & 15448220154 & 21279953424 \\
    158 % Cross Handler &      9646462 &   11955668 &       769328 &     3453707 &    31864074 \\
    159 % Cross Finally &       773412 &        N/A &          N/A &     2253825 &    37266476 \\
    160 % Match All     &   3719462155 &   43294042 &   3223004977 &  1286054154 &   623887874 \\
    161 % Match None    &   4971630929 &   55311709 &   9481225467 &  1310251289 &   623752624 \\
    162 %
    163 % run-algol-04-a
    164 % --------------
    165 % Raise Empty   & 0.0 & 0.0 &  3250260945 & 0.0 & 0.0 \\
    166 % Raise D'tor   & 0.0 & 0.0 & 29017675113 & N/A & N/A \\
    167 % Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
    168 % Raise Other   & 0.0 & 0.0 & 24411823773 & 0.0 & 0.0 \\
    169 % Cross Handler & 0.0 & 0.0 &      769334 & 0.0 & 0.0 \\
    170 % Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
    171 % Match All     & 0.0 & 0.0 &  3254283504 & 0.0 & 0.0 \\
    172 % Match None    & 0.0 & 0.0 &  9476060146 & 0.0 & 0.0 \\
    173 
    174 \begin{tabular}{|l|c c c c c|}
    175 \hline
    176               & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
    177 \hline
    178 Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    179 Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
    180 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
    181 Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    182 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    183 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
    184 Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    185 Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
     217% First, introduce the tables.
     218\autoref{t:PerformanceTermination},
     219\autoref{t:PerformanceResumption}
     220and~\autoref{t:PerformanceFixupRoutines}
     221show the test results.
     222In cases where a feature is not supported by a language, the test is skipped
     223for that language and the result is marked N/A.
     224There are also cases where the feature is supported but measuring its
     225cost is impossible. This happened with Java, which uses a JIT that optimize
     226away the tests and it cannot be stopped.\cite{Dice21}
     227These tests are marked N/C.
     228To get results in a consistent range (1 second to 1 minute is ideal,
     229going higher is better than going low) N, the number of iterations of the
     230main loop in each test, is varied between tests. It is also given in the
     231results and usually have a value in the millions.
     232
     233An anomaly in some results came from \CFA's use of gcc nested functions.
     234These nested functions are used to create closures that can access stack
     235variables in their lexical scope.
     236However, if they do so then they can cause the benchmark's run-time to
     237increase by an order of magnitude.
     238The simplest solution is to make those values global variables instead
     239of function local variables.
     240% Do we know if editing a global inside nested function is a problem?
     241Tests that had to be modified to avoid this problem have been marked
     242with a ``*'' in the results.
     243
     244% Now come the tables themselves:
     245% You might need a wider window for this.
     246
     247\begin{table}[htb]
     248\centering
     249\caption{Termination Performance Results (sec)}
     250\label{t:PerformanceTermination}
     251\begin{tabular}{|r|*{2}{|r r r r|}}
     252\hline
     253                       & \multicolumn{4}{c||}{AMD}         & \multicolumn{4}{c|}{ARM}  \\
     254\cline{2-9}
     255N\hspace{8pt}          & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
     256                         \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
     257\hline
     258Empty Traversal (1M)   & 3.4   & 2.8   & 18.3  & 23.4      & 3.7   & 3.2   & 15.5  & 14.8  \\
     259D'tor Traversal (1M)   & 48.4  & 23.6  & N/A   & N/A       & 64.2  & 29.0  & N/A   & N/A   \\
     260Finally Traversal (1M) & 3.4*  & N/A   & 17.9  & 29.0      & 4.1*  & N/A   & 15.6  & 19.0  \\
     261Other Traversal (1M)   & 3.6*  & 23.2  & 18.2  & 32.7      & 4.0*  & 24.5  & 15.5  & 21.4  \\
     262Cross Handler (100M)   & 6.0   & 0.9   & N/C   & 37.4      & 10.0  & 0.8   & N/C   & 32.2  \\
     263Cross Finally (100M)   & 0.9   & N/A   & N/C   & 44.1      & 0.8   & N/A   & N/C   & 37.3  \\
     264Match All (10M)        & 32.9  & 20.7  & 13.4  & 4.9       & 36.2  & 24.5  & 12.0  & 3.1   \\
     265Match None (10M)       & 32.7  & 50.3  & 11.0  & 5.1       & 36.3  & 71.9  & 12.3  & 4.2   \\
    186266\hline
    187267\end{tabular}
    188 
    189 % run-plg7a-a.sat
    190 % ---------------
    191 % Raise Empty   &  57169011329 &  296612564 &   2788557155 & 17511466039 & 23324548496 \\
    192 % Raise D'tor   & 150599858014 &  318443709 & 149651693682 &         N/A &         N/A \\
    193 % Raise Finally & 148223145000 &  373325807 &          N/A &         ... & 29074552998 \\
    194 % Raise Other   & 189463708732 & 3017109322 &  85819281694 & 17584295487 & 32602686679 \\
    195 % Cross Handler &      8001654 &   13584858 &      1555995 &     6626775 &    41927358 \\
    196 % Cross Finally &      1002473 &        N/A &          N/A &     4554344 &    51114381 \\
    197 % Match All     &   3162460860 &   37315018 &   2649464591 &  1523205769 &   742374509 \\
    198 % Match None    &   4054773797 &   47052659 &   7759229131 &  1555373654 &   744656403 \\
    199 %
    200 % run-plg7a-thr-a
    201 % ---------------
    202 % Raise Empty   &   3604235388 &   29829965 &   2786931833 & 17576506385 & 23352975105 \\
    203 % Raise D'tor   &  46552380948 &  178709605 & 149834207219 &         N/A &         N/A \\
    204 % Raise Finally &  46265157775 &  177906320 &          N/A & 17493045092 & 29170962959 \\
    205 % Raise Other   & 195659245764 & 2376968982 &  86070431924 & 17552979675 & 32501882918 \\
    206 % Cross Handler &    397031776 &   12503552 &      1451225 &     6658628 &    42304965 \\
    207 % Cross Finally &      1136746 &        N/A &          N/A &     4468799 &    46155817 \\
    208 % Match All     &   3189512499 &   39124453 &   2667795989 &  1525889031 &   733785613 \\
    209 % Match None    &   4094675477 &   48749857 &   7850618572 &  1566713577 &   733478963 \\
    210 %
    211 % run-plg7a-04-a
    212 % --------------
    213 % 0.0 are unfilled.
    214 % Raise Empty   & 0.0 & 0.0 &  2770781479 & 0.0 & 0.0 \\
    215 % Raise D'tor   & 0.0 & 0.0 & 23530084907 & N/A & N/A \\
    216 % Raise Finally & 0.0 & 0.0 &         N/A & 0.0 & 0.0 \\
    217 % Raise Other   & 0.0 & 0.0 & 23816827982 & 0.0 & 0.0 \\
    218 % Cross Handler & 0.0 & 0.0 &     1422188 & 0.0 & 0.0 \\
    219 % Cross Finally & 0.0 & N/A &         N/A & 0.0 & 0.0 \\
    220 % Match All     & 0.0 & 0.0 &  2671989778 & 0.0 & 0.0 \\
    221 % Match None    & 0.0 & 0.0 &  7829059869 & 0.0 & 0.0 \\
    222 
    223 % PLG7A (in seconds)
    224 \begin{tabular}{|l|c c c c c|}
    225 \hline
    226               & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
    227 \hline
    228 Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    229 Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
    230 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
    231 Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    232 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    233 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
    234 Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
    235 Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
     268\end{table}
     269
     270\begin{table}[htb]
     271\centering
     272\caption{Resumption Performance Results (sec)}
     273\label{t:PerformanceResumption}
     274\begin{tabular}{|r||r||r|}
     275\hline
     276N\hspace{8pt}
     277                        & AMD     & ARM  \\
     278\hline
     279Empty Traversal (10M)   & 0.2     & 0.3  \\
     280D'tor Traversal (10M)   & 1.8     & 1.0  \\
     281Finally Traversal (10M) & 1.7     & 1.0  \\
     282Other Traversal (10M)   & 22.6    & 25.9 \\
     283Cross Handler (100M)    & 8.4     & 11.9 \\
     284Match All (100M)        & 2.3     & 3.2  \\
     285Match None (100M)       & 2.9     & 3.9  \\
    236286\hline
    237287\end{tabular}
    238 
    239 One result that is not directly related to \CFA but is important to keep in
    240 mind is that in exceptions the standard intuitions about which languages
    241 should go faster often do not hold. There are cases where Python out-preforms
    242 \Cpp and Java. The most likely explination is that, since exceptions are
    243 rarely considered to be the common case, the more optimized langages have
    244 optimized at their expence. In addition languages with high level           
    245 repersentations have a much easier time scanning the stack as there is less
    246 to decode.
    247 
    248 This means that while \CFA does not actually keep up with Python in every
    249 case it is usually no worse than roughly half the speed of \Cpp. This is good
    250 enough for the prototyping purposes of the project.
    251 
    252 The test case where \CFA falls short is Raise Other, the case where the
    253 stack is unwound including a bunch of non-matching handlers.
    254 This slowdown seems to come from missing optimizations,
    255 the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp)
    256 but the results change if they are run in gcc/g++ 9 instead.
    257 Importantly, there is a huge slowdown in \Cpp's results bringing that brings
    258 \CFA's performace back in that roughly half speed area. However many other
    259 \CFA benchmarks increase their run-time by a similar amount falling far
    260 behind their \Cpp counter-parts.
    261 
    262 This suggests that the performance issue in Raise Other is just an
    263 optimization not being applied. Later versions of gcc may be able to
    264 optimize this case further, at least down to the half of \Cpp mark.
    265 A \CFA compiler that directly produced assembly could do even better as it
    266 would not have to work across some of \CFA's current abstractions, like
    267 the try terminate function.
    268 
    269 Resumption exception handling is also incredibly fast. Often an order of
    270 magnitude or two better than the best termination speed.
    271 There is a simple explination for this; traversing a linked list is much   
    272 faster than examining and unwinding the stack. When resumption does not do as
    273 well its when more try statements are used per raise. Updating the interal
    274 linked list is not very expencive but it does add up.
    275 
    276 The relative speed of the Match All and Match None tests (within each
    277 language) can also show the effectiveness conditional matching as compared
    278 to catch and rethrow.
    279 \begin{itemize}[nosep]
    280 \item
    281 Java and Python get similar values in both tests.
    282 Between the interperated code, a higher level repersentation of the call
    283 stack and exception reuse it it is possible the cost for a second
    284 throw can be folded into the first.
    285 % Is this due to optimization?
    286 \item
    287 Both types of \CFA are slighly slower if there is not a match.
    288 For termination this likely comes from unwinding a bit more stack through
    289 libunwind instead of executing the code normally.
    290 For resumption there is extra work in traversing more of the list and running
    291 more checks for a matching exceptions.
    292 % Resumption is a bit high for that but this is my best theory.
    293 \item
    294 Then there is \Cpp, which takes 2--3 times longer to catch and rethrow vs.
    295 just the catch. This is very high, but it does have to repeat the same
    296 process of unwinding the stack and may have to parse the LSDA of the function
    297 with the catch and rethrow twice, once before the catch and once after the
    298 rethrow.
    299 % I spent a long time thinking of what could push it over twice, this is all
    300 % I have to explain it.
    301 \end{itemize}
    302 The difference in relative performance does show that there are savings to
    303 be made by performing the check without catching the exception.
     288\end{table}
     289
     290\begin{table}[htb]
     291\centering
     292\small
     293\caption{Resumption/Fixup Routine Comparison (sec)}
     294\label{t:PerformanceFixupRoutines}
     295\setlength{\tabcolsep}{5pt}
     296\begin{tabular}{|r|*{2}{|r r r r r|}}
     297\hline
     298            & \multicolumn{5}{c||}{AMD}     & \multicolumn{5}{c|}{ARM}  \\
     299\cline{2-11}
     300N\hspace{8pt}       & \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
     301              \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
     302\hline
     303Resume Empty (10M)  & 3.8  & 3.5  & 14.7  & 2.3   & 176.1 & 0.3  & 0.1  & 8.9   & 1.2   & 119.9 \\
     304%Resume Other (10M)  & 4.0* & 0.1* & 21.9  & 6.2   & 381.0 & 0.3* & 0.1* & 13.2  & 5.0   & 290.7 \\
     305\hline
     306\end{tabular}
     307\end{table}
     308
     309% Now discuss the results in the tables.
     310One result not directly related to \CFA but important to keep in mind is that,
     311for exceptions the standard intuition about which languages should go
     312faster often does not hold.
     313For example, there are a few cases where Python out-performs
     314\CFA, \Cpp and Java.
     315The most likely explanation is that, since exceptions
     316are rarely considered to be the common case, the more optimized languages
     317make that case expensive to improve other cases.
     318In addition, languages with high-level representations have a much
     319easier time scanning the stack as there is less to decode.
     320
     321As stated,
     322the performance tests are not attempting to show \CFA has a new competitive
     323way of implementing exception handling.
     324The only performance requirement is to insure the \CFA EHM has reasonable
     325performance for prototyping.
     326Although that may be hard to exactly quantify, we believe it has succeeded
     327in that regard.
     328Details on the different test cases follow.
     329
     330\begin{description}
     331\item[Empty Traversal]
     332\CFA is slower than \Cpp, but is still faster than the other languages
     333and closer to \Cpp than other languages.
     334This is to be expected as \CFA is closer to \Cpp than the other languages.
     335
     336\item[D'tor Traversal]
     337Running destructors causes huge slowdown in every language that supports
     338them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's.
     339Considering the amount of work done in destructors is so low the cost
     340likely comes from the change of context required to do that work.
     341
     342\item[Finally Traversal]
     343Speed is similar to Empty Traversal in all languages that support finally
     344clauses. Only Python seems to have a larger than random noise change in
     345its run-time and it is still not large.
     346Despite the similarity between finally clauses and destructors,
     347finally clauses seem to avoid the spike in run-time destructors have.
     348Possibly some optimization removes the cost of changing contexts.
     349\todo{OK, I think the finally clause may have been optimized out.}
     350
     351\item[Other Traversal]
     352For \Cpp, stopping to check if a handler applies seems to be about as
     353expensive as stopping to run a destructor.
     354This results in a significant jump.
     355
     356Other languages experiance a small increase in run-time.
     357The small increase likely comes from running the checks,
     358but they could avoid the spike by not having the same kind of overhead for
     359switching to the check's context.
     360
     361\todo{Could revist Other Traversal, after Finally Traversal.}
     362
     363\item[Cross Handler]
     364Here \CFA falls behind \Cpp by a much more significant margin.
     365This is likely due to the fact \CFA has to insert two extra function
     366calls while \Cpp doesn't have to do execute any other instructions.
     367Python is much further behind.
     368
     369\item[Cross Finally]
     370\CFA's performance now matches \Cpp's from Cross Handler.
     371If the code from the finally clause is being inlined,
     372which is just a asm comment, than there are no additional instructions
     373to execute again when exiting the try statement normally.
     374
     375\item[Conditional Match]
     376Both of the conditional matching tests can be considered on their own,
     377however for evaluating the value of conditional matching itself the
     378comparison of the two sets of results is useful.
     379Consider the massive jump in run-time for \Cpp going from match all to match
     380none, which none of the other languages have.
     381Some strange interaction is causing run-time to more than double for doing
     382twice as many raises.
     383Java and Python avoid this problem and have similar run-time for both tests,
     384possibly through resource reuse or their program representation.
     385However \CFA is built like \Cpp and avoids the problem as well, this matches
     386the pattern of the conditional match which makes the two execution paths
     387much more similar.
     388
     389\end{description}
     390
     391Moving on to resumption there is one general note,
     392resumption is \textit{fast}, the only test where it fell
     393behind termination is Cross Handler.
     394In every other case, the number of iterations had to be increased by a
     395factor of 10 to get the run-time in an approprate range
     396and in some cases resumption still took less time.
     397
     398% I tried \paragraph and \subparagraph, maybe if I could adjust spacing
     399% between paragraphs those would work.
     400\begin{description}
     401\item[Empty Traversal]
     402See above for the general speed-up notes.
     403This result is not surprising as resumption's link list approach
     404means that traversing over stack frames without a resumption handler is
     405$O(1)$.
     406
     407\item[D'tor Traversal]
     408Resumption does have the same spike in run-time that termination has.
     409The run-time is actually very similar to Finally Traversal.
     410As resumption does not unwind the stack both destructors and finally
     411clauses are run while walking down the stack normally.
     412So it follows their performance is similar.
     413
     414\item[Finally Traversal]
     415The increase in run-time fromm Empty Traversal (once adjusted for
     416the number of iterations) roughly the same as for termination.
     417This suggests that the
     418
     419\item[Other Traversal]
     420Traversing across handlers reduces resumption's advantage as it actually
     421has to stop and check each one.
     422Resumption still came out ahead (adjusting for iterations) but by much less
     423than the other cases.
     424
     425\item[Cross Handler]
     426The only test case where resumption could not keep up with termination,
     427although the difference is not as significant as many other cases.
     428It is simply a matter of where the costs come from. Even if \CFA termination
     429is not ``zero-cost" passing through an empty function still seems to be
     430cheaper than updating global values.
     431
     432\item[Conditional Match]
     433Resumption shows a slight slowdown if the exception is not matched
     434by the first handler, which follows from the fact the second handler now has
     435to be checked. However the difference is not large.
     436
     437\end{description}
     438
     439Finally are the results of the resumption/fixup routine comparison.
     440These results are surprisingly varied, it is possible that creating a closure
     441has more to do with performance than passing the argument through layers of
     442calls.
     443Even with 100 stack frames though, resumption is only about as fast as
     444manually passing a fixup routine.
     445So there is a cost for the additional power and flexibility exceptions
     446provide.
Note: See TracChangeset for help on using the changeset viewer.