Ignore:
Timestamp:
Sep 27, 2021, 2:09:55 PM (4 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, ast-experimental, enum, forall-pointer-decay, master, pthread-emulation, qualifiedEnum
Children:
cc287800
Parents:
4e28d2e9 (diff), 056cbdb (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/andrew_beach_MMath/performance.tex

    r4e28d2e9 r949339b  
    55Instead, the focus was to get the features working. The only performance
    66requirement is to ensure the tests for correctness run in a reasonable
    7 amount of time. Hence, a few basic performance tests were performed to
     7amount of time. Hence, only a few basic performance tests were performed to
    88check this requirement.
    99
     
    1313one with termination and one with resumption.
    1414
    15 C++ is the most comparable language because both it and \CFA use the same
     15GCC C++ is the most comparable language because both it and \CFA use the same
    1616framework, libunwind.
    1717In fact, the comparison is almost entirely in quality of implementation.
     
    3737resumption exceptions. Even the older programming languages with resumption
    3838seem to be notable only for having resumption.
     39On the other hand, the functional equivalents to resumption are too new.
     40There does not seem to be any standard implementations in well-known
     41languages; so far, they seem confined to extensions and research languages.
     42% There was some maybe interesting comparison to an OCaml extension
     43% but I'm not sure how to get that working if it is interesting.
    3944Instead, resumption is compared to its simulation in other programming
    4045languages: fixup functions that are explicitly passed into a function.
     
    4651the number used in the timing runs is given with the results per test.
    4752The Java tests run the main loop 1000 times before
    48 beginning the actual test to ``warm-up" the JVM.
     53beginning the actual test to ``warm up" the JVM.
    4954% All other languages are precompiled or interpreted.
    5055
     
    5459unhandled exceptions in \Cpp and Java as that would cause the process to
    5560terminate.
    56 Luckily, performance on the ``give-up and kill the process" path is not
     61Luckily, performance on the ``give up and kill the process" path is not
    5762critical.
    5863
     
    7681using gcc-10 10.3.0 as a backend.
    7782g++-10 10.3.0 is used for \Cpp.
    78 Java tests are complied and run with version 11.0.11.
    79 Python used version 3.8.10.
     83Java tests are complied and run with Oracle OpenJDK version 11.0.11.
     84Python used CPython version 3.8.10.
    8085The machines used to run the tests are:
    8186\begin{itemize}[nosep]
     
    8590      \lstinline{@} 2.5 GHz running Linux v5.11.0-25
    8691\end{itemize}
    87 Representing the two major families of hardware architecture.
     92These represent the two major families of hardware architecture.
    8893
    8994\section{Tests}
     
    9398
    9499\paragraph{Stack Traversal}
    95 This group measures the cost of traversing the stack,
     100This group of tests measures the cost of traversing the stack
    96101(and in termination, unwinding it).
    97102Inside the main loop is a call to a recursive function.
     
    147152This group of tests measures the cost for setting up exception handling,
    148153if it is
    149 not used (because the exceptional case did not occur).
     154not used because the exceptional case did not occur.
    150155Tests repeatedly cross (enter, execute and leave) a try statement but never
    151156perform a raise.
     
    222227for that language and the result is marked N/A.
    223228There are also cases where the feature is supported but measuring its
    224 cost is impossible. This happened with Java, which uses a JIT that optimize
    225 away the tests and it cannot be stopped.\cite{Dice21}
     229cost is impossible. This happened with Java, which uses a JIT that optimizes
     230away the tests and cannot be stopped.\cite{Dice21}
    226231These tests are marked N/C.
    227232To get results in a consistent range (1 second to 1 minute is ideal,
     
    230235results and has a value in the millions.
    231236
    232 An anomaly in some results came from \CFA's use of gcc nested functions.
     237An anomaly in some results came from \CFA's use of GCC nested functions.
    233238These nested functions are used to create closures that can access stack
    234239variables in their lexical scope.
    235 However, if they do so, then they can cause the benchmark's run-time to
     240However, if they do so, then they can cause the benchmark's run time to
    236241increase by an order of magnitude.
    237242The simplest solution is to make those values global variables instead
    238 of function local variables.
     243of function-local variables.
    239244% Do we know if editing a global inside nested function is a problem?
    240245Tests that had to be modified to avoid this problem have been marked
     
    255260                         \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
    256261\hline
    257 Empty Traversal (1M)   & 3.4   & 2.8   & 18.3  & 23.4      & 3.7   & 3.2   & 15.5  & 14.8  \\
    258 D'tor Traversal (1M)   & 48.4  & 23.6  & N/A   & N/A       & 64.2  & 29.0  & N/A   & N/A   \\
    259 Finally Traversal (1M) & 3.4*  & N/A   & 17.9  & 29.0      & 4.1*  & N/A   & 15.6  & 19.0  \\
    260 Other Traversal (1M)   & 3.6*  & 23.2  & 18.2  & 32.7      & 4.0*  & 24.5  & 15.5  & 21.4  \\
    261 Cross Handler (100M)   & 6.0   & 0.9   & N/C   & 37.4      & 10.0  & 0.8   & N/C   & 32.2  \\
    262 Cross Finally (100M)   & 0.9   & N/A   & N/C   & 44.1      & 0.8   & N/A   & N/C   & 37.3  \\
    263 Match All (10M)        & 32.9  & 20.7  & 13.4  & 4.9       & 36.2  & 24.5  & 12.0  & 3.1   \\
    264 Match None (10M)       & 32.7  & 50.3  & 11.0  & 5.1       & 36.3  & 71.9  & 12.3  & 4.2   \\
     262Empty Traversal (1M)   & 23.0  & 9.6   & 17.6  & 23.4      & 30.6  & 13.6  & 15.5  & 14.7  \\
     263D'tor Traversal (1M)   & 48.1  & 23.5  & N/A   & N/A       & 64.2  & 29.2  & N/A   & N/A   \\
     264Finally Traversal (1M) & 3.2*  & N/A   & 17.6  & 29.2      & 3.9*  & N/A   & 15.5  & 19.0  \\
     265Other Traversal (1M)   & 3.3*  & 23.9  & 17.7  & 32.8      & 3.9*  & 24.5  & 15.5  & 21.6  \\
     266Cross Handler (1B)     & 6.5   & 0.9   & N/C   & 38.0      & 9.6   & 0.8   & N/C   & 32.1  \\
     267Cross Finally (1B)     & 0.8   & N/A   & N/C   & 44.6      & 0.6   & N/A   & N/C   & 37.3  \\
     268Match All (10M)        & 30.5  & 20.6  & 11.2  & 3.9       & 36.9  & 24.6  & 10.7  & 3.1   \\
     269Match None (10M)       & 30.6  & 50.9  & 11.2  & 5.0       & 36.9  & 71.9  & 10.7  & 4.1   \\
    265270\hline
    266271\end{tabular}
     
    276281                        & AMD     & ARM  \\
    277282\hline
    278 Empty Traversal (10M)   & 0.2     & 0.3  \\
     283Empty Traversal (10M)   & 1.4     & 1.2  \\
    279284D'tor Traversal (10M)   & 1.8     & 1.0  \\
    280 Finally Traversal (10M) & 1.7     & 1.0  \\
    281 Other Traversal (10M)   & 22.6    & 25.9 \\
    282 Cross Handler (100M)    & 8.4     & 11.9 \\
     285Finally Traversal (10M) & 1.8     & 1.0  \\
     286Other Traversal (10M)   & 22.6    & 25.8 \\
     287Cross Handler (1B)      & 9.0     & 11.9 \\
    283288Match All (100M)        & 2.3     & 3.2  \\
    284 Match None (100M)       & 2.9     & 3.9  \\
     289Match None (100M)       & 3.0     & 3.8  \\
    285290\hline
    286291\end{tabular}
     
    300305              \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
    301306\hline
    302 Resume Empty (10M)  & 1.5 & 1.5 & 14.7 & 2.3 & 176.1  & 1.0 & 1.4 & 8.9 & 1.2 & 119.9 \\
     307Resume Empty (10M)  & 1.4 & 1.4 & 15.4 & 2.3 & 178.0  & 1.2 & 1.2 & 8.9 & 1.2 & 118.4 \\
    303308\hline
    304309\end{tabular}
     
    312317\CFA, \Cpp and Java.
    313318% To be exact, the Match All and Match None cases.
    314 The most likely explanation is that, since exceptions
    315 are rarely considered to be the common case, the more optimized languages
    316 make that case expensive to improve other cases.
     319The most likely explanation is that
     320the generally faster languages have made ``common cases fast" at the expense
     321of the rarer cases. Since exceptions are considered rare, they are made
     322expensive to help speed up common actions, such as entering and leaving try
     323statements.
     324Python, on the other hand, while generally slower than the other languages,
     325uses exceptions more and has not sacrificed their performance.
    317326In addition, languages with high-level representations have a much
    318327easier time scanning the stack as there is less to decode.
     
    346355Performance is similar to Empty Traversal in all languages that support finally
    347356clauses. Only Python seems to have a larger than random noise change in
    348 its run-time and it is still not large.
     357its run time and it is still not large.
    349358Despite the similarity between finally clauses and destructors,
    350 finally clauses seem to avoid the spike that run-time destructors have.
     359finally clauses seem to avoid the spike that run time destructors have.
    351360Possibly some optimization removes the cost of changing contexts.
    352361
     
    356365This results in a significant jump.
    357366
    358 Other languages experience a small increase in run-time.
     367Other languages experience a small increase in run time.
    359368The small increase likely comes from running the checks,
    360369but they could avoid the spike by not having the same kind of overhead for
     
    362371
    363372\item[Cross Handler]
    364 Here \CFA falls behind \Cpp by a much more significant margin.
    365 This is likely due to the fact \CFA has to insert two extra function
    366 calls, while \Cpp does not have to do execute any other instructions.
     373Here, \CFA falls behind \Cpp by a much more significant margin.
     374This is likely due to the fact that \CFA has to insert two extra function
     375calls, while \Cpp does not have to execute any other instructions.
    367376Python is much further behind.
    368377
     
    375384\item[Conditional Match]
    376385Both of the conditional matching tests can be considered on their own.
    377 However for evaluating the value of conditional matching itself, the
     386However, for evaluating the value of conditional matching itself, the
    378387comparison of the two sets of results is useful.
    379 Consider the massive jump in run-time for \Cpp going from match all to match
     388Consider the massive jump in run time for \Cpp going from match all to match
    380389none, which none of the other languages have.
    381 Some strange interaction is causing run-time to more than double for doing
     390Some strange interaction is causing run time to more than double for doing
    382391twice as many raises.
    383 Java and Python avoid this problem and have similar run-time for both tests,
     392Java and Python avoid this problem and have similar run time for both tests,
    384393possibly through resource reuse or their program representation.
    385 However \CFA is built like \Cpp and avoids the problem as well, this matches
     394However, \CFA is built like \Cpp, and avoids the problem as well.
     395This matches
    386396the pattern of the conditional match, which makes the two execution paths
    387397very similar.
     
    391401\subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}}
    392402
    393 Moving on to resumption, there is one general note,
     403Moving on to resumption, there is one general note:
    394404resumption is \textit{fast}. The only test where it fell
    395405behind termination is Cross Handler.
    396406In every other case, the number of iterations had to be increased by a
    397 factor of 10 to get the run-time in an appropriate range
     407factor of 10 to get the run time in an appropriate range
    398408and in some cases resumption still took less time.
    399409
     
    408418
    409419\item[D'tor Traversal]
    410 Resumption does have the same spike in run-time that termination has.
    411 The run-time is actually very similar to Finally Traversal.
     420Resumption does have the same spike in run time that termination has.
     421The run time is actually very similar to Finally Traversal.
    412422As resumption does not unwind the stack, both destructors and finally
    413423clauses are run while walking down the stack during the recursive returns.
     
    416426\item[Finally Traversal]
    417427Same as D'tor Traversal,
    418 except termination did not have a spike in run-time on this test case.
     428except termination did not have a spike in run time on this test case.
    419429
    420430\item[Other Traversal]
     
    427437The only test case where resumption could not keep up with termination,
    428438although the difference is not as significant as many other cases.
    429 It is simply a matter of where the costs come from,
    430 both termination and resumption have some work to set-up or tear-down a
     439It is simply a matter of where the costs come from:
     440both termination and resumption have some work to set up or tear down a
    431441handler. It just so happens that resumption's work is slightly slower.
    432442
     
    434444Resumption shows a slight slowdown if the exception is not matched
    435445by the first handler, which follows from the fact the second handler now has
    436 to be checked. However the difference is not large.
     446to be checked. However, the difference is not large.
    437447
    438448\end{description}
     
    448458More experiments could try to tease out the exact trade-offs,
    449459but the prototype's only performance goal is to be reasonable.
    450 It has already in that range, and \CFA's fixup routine simulation is
     460It is already in that range, and \CFA's fixup routine simulation is
    451461one of the faster simulations as well.
    452 Plus exceptions add features and remove syntactic overhead,
    453 so even at similar performance resumptions have advantages
     462Plus, exceptions add features and remove syntactic overhead,
     463so even at similar performance, resumptions have advantages
    454464over fixup routines.
Note: See TracChangeset for help on using the changeset viewer.