
performance.tex

Timestamp:

Sep 27, 2021, 2:09:55 PM

Author:

Thierry Delisle <tdelisle@…>

Branches:

ADT, ast-experimental, enum, forall-pointer-decay, master, pthread-emulation, qualifiedEnum

Children:

cc287800

Parents:

4e28d2e9 (diff), 056cbdb (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:

1 edited

doc/theses/andrew_beach_MMath/performance.tex (modified) (25 diffs)


doc/theses/andrew_beach_MMath/performance.tex

-              r4e28d2e9
+              r949339b
 Instead, the focus was to get the features working. The only performance
 requirement is to ensure the tests for correctness run in a reasonable
-amount of time. Hence, a few basic performance tests were performed to
+amount of time. Hence, only a few basic performance tests were performed to
 check this requirement.
 …
 one with termination and one with resumption.
-C++ is the most comparable language because both it and \CFA use the same
+GCC C++ is the most comparable language because both it and \CFA use the same
 framework, libunwind.
 In fact, the comparison is almost entirely in quality of implementation.
 …
 resumption exceptions. Even the older programming languages with resumption
 seem to be notable only for having resumption.
+On the other hand, the functional equivalents to resumption are too new.
+There does not seem to be any standard implementations in well-known
+languages; so far, they seem confined to extensions and research languages.
+% There was some maybe interesting comparison to an OCaml extension
+% but I'm not sure how to get that working if it is interesting.
 Instead, resumption is compared to its simulation in other programming
 languages: fixup functions that are explicitly passed into a function.
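
As a rough illustration of the fixup-function simulation described above, the following C++ sketch passes the "handler" explicitly as a parameter. It is not taken from the thesis's benchmarks: the names parse_digit and sum_digits, and the digit-parsing task itself, are hypothetical and chosen only to keep the example small.

```cpp
#include <cassert>
#include <functional>

// Hypothetical routine that, with resumption exceptions, would raise and
// then continue. Here it instead calls a fixup function supplied by the
// caller and carries on with the value the fixup returns.
int parse_digit(char c, const std::function<int(char)>& fixup) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    // Simulated resumption: no stack unwinding, control returns here.
    return fixup(c);
}

// The fixup has to be threaded through every intermediate call by hand,
// which is the syntactic overhead a built-in resumption raise avoids.
int sum_digits(const char* text, const std::function<int(char)>& fixup) {
    int total = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        total += parse_digit(*p, fixup);
    }
    return total;
}
```

A caller chooses the recovery policy at the call site, for example sum_digits("12x3", [](char) { return 0; }) to treat unknown characters as zero.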
 …
 the number used in the timing runs is given with the results per test.
 The Java tests run the main loop 1000 times before
-beginning the actual test to ``warm-up" the JVM.
+beginning the actual test to ``warm up" the JVM.
 % All other languages are precompiled or interpreted.
 …
 unhandled exceptions in \Cpp and Java as that would cause the process to
 terminate.
-Luckily, performance on the ``give-up and kill the process" path is not
+Luckily, performance on the ``give up and kill the process" path is not
 critical.
 …
 using gcc-10 10.3.0 as a backend.
 g++-10 10.3.0 is used for \Cpp.
-Java tests are complied and run with version 11.0.11.
-Python used version 3.8.10.
+Java tests are compiled and run with Oracle OpenJDK version 11.0.11.
+Python used CPython version 3.8.10.
 The machines used to run the tests are:
 \begin{itemize}[nosep]
 …
       \lstinline{@} 2.5 GHz running Linux v5.11.0-25
 \end{itemize}
-Representing the two major families of hardware architecture.
+These represent the two major families of hardware architecture.
 \section{Tests}
 …
 \paragraph{Stack Traversal}
-This group measures the cost of traversing the stack,
+This group of tests measures the cost of traversing the stack
 (and in termination, unwinding it).
 Inside the main loop is a call to a recursive function.
 …
 This group of tests measures the cost for setting up exception handling,
 if it is
-not used (because the exceptional case did not occur).
+not used because the exceptional case did not occur.
 Tests repeatedly cross (enter, execute and leave) a try statement but never
 perform a raise.
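
A minimal C++ sketch of what such a crossing test might look like is given below. It is an assumption-laden illustration, not the thesis's actual benchmark: the function names, the should_throw parameter, and the timing helper are all hypothetical, and an optimizing compiler may simplify the loop.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

// Hypothetical stand-in for a Cross Handler style test: enter, execute and
// leave a try statement 'times' times. With should_throw left false, the
// handler is never run, so only the cost of crossing the try is exercised.
long cross_handler(long times, bool should_throw = false) {
    long crossed = 0;
    for (long i = 0; i < times; ++i) {
        try {
            if (should_throw) {
                throw std::runtime_error("not raised in the timed run");
            }
            ++crossed;
        } catch (const std::runtime_error&) {
            // Present so the try statement has a handler to set up.
        }
    }
    return crossed;
}

// Wall-clock time, in seconds, for one run of the loop above.
double time_seconds(long times) {
    auto start = std::chrono::steady_clock::now();
    volatile long result = cross_handler(times);  // keep the call observable
    (void)result;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The iteration count is a parameter because, as the results tables show, different tests need different counts to land in a measurable range.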
 …
 for that language and the result is marked N/A.
 There are also cases where the feature is supported but measuring its
-cost is impossible. This happened with Java, which uses a JIT that optimize
-away the tests and it cannot be stopped.\cite{Dice21}
+cost is impossible. This happened with Java, which uses a JIT that optimizes
+away the tests and cannot be stopped.\cite{Dice21}
 These tests are marked N/C.
 To get results in a consistent range (1 second to 1 minute is ideal,
 …
 results and has a value in the millions.
-An anomaly in some results came from \CFA's use of gcc nested functions.
+An anomaly in some results came from \CFA's use of GCC nested functions.
 These nested functions are used to create closures that can access stack
 variables in their lexical scope.
-However, if they do so, then they can cause the benchmark's run-time to
+However, if they do so, then they can cause the benchmark's run time to
 increase by an order of magnitude.
 The simplest solution is to make those values global variables instead
-of function local variables.
+of function-local variables.
 % Do we know if editing a global inside nested function is a problem?
 Tests that had to be modified to avoid this problem have been marked
 …
                          \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
 \hline
-Empty Traversal (1M)   & 3.4   & 2.8   & 18.3  & 23.4      & 3.7   & 3.2   & 15.5  & 14.8  \\
-D'tor Traversal (1M)   & 48.4  & 23.6  & N/A   & N/A       & 64.2  & 29.0  & N/A   & N/A   \\
-Finally Traversal (1M) & 3.4*  & N/A   & 17.9  & 29.0      & 4.1*  & N/A   & 15.6  & 19.0  \\
-Other Traversal (1M)   & 3.6*  & 23.2  & 18.2  & 32.7      & 4.0*  & 24.5  & 15.5  & 21.4  \\
-Cross Handler (100M)   & 6.0   & 0.9   & N/C   & 37.4      & 10.0  & 0.8   & N/C   & 32.2  \\
-Cross Finally (100M)   & 0.9   & N/A   & N/C   & 44.1      & 0.8   & N/A   & N/C   & 37.3  \\
-Match All (10M)        & 32.9  & 20.7  & 13.4  & 4.9       & 36.2  & 24.5  & 12.0  & 3.1   \\
-Match None (10M)       & 32.7  & 50.3  & 11.0  & 5.1       & 36.3  & 71.9  & 12.3  & 4.2   \\
+Empty Traversal (1M)   & 23.0  & 9.6   & 17.6  & 23.4      & 30.6  & 13.6  & 15.5  & 14.7  \\
+D'tor Traversal (1M)   & 48.1  & 23.5  & N/A   & N/A       & 64.2  & 29.2  & N/A   & N/A   \\
+Finally Traversal (1M) & 3.2*  & N/A   & 17.6  & 29.2      & 3.9*  & N/A   & 15.5  & 19.0  \\
+Other Traversal (1M)   & 3.3*  & 23.9  & 17.7  & 32.8      & 3.9*  & 24.5  & 15.5  & 21.6  \\
+Cross Handler (1B)     & 6.5   & 0.9   & N/C   & 38.0      & 9.6   & 0.8   & N/C   & 32.1  \\
+Cross Finally (1B)     & 0.8   & N/A   & N/C   & 44.6      & 0.6   & N/A   & N/C   & 37.3  \\
+Match All (10M)        & 30.5  & 20.6  & 11.2  & 3.9       & 36.9  & 24.6  & 10.7  & 3.1   \\
+Match None (10M)       & 30.6  & 50.9  & 11.2  & 5.0       & 36.9  & 71.9  & 10.7  & 4.1   \\
 \hline
 \end{tabular}
 …
                         & AMD     & ARM  \\
 \hline
-Empty Traversal (10M)   & 0.2     & 0.3  \\
+Empty Traversal (10M)   & 1.4     & 1.2  \\
 D'tor Traversal (10M)   & 1.8     & 1.0  \\
-Finally Traversal (10M) & 1.7     & 1.0  \\
-Other Traversal (10M)   & 22.6    & 25.9 \\
-Cross Handler (100M)    & 8.4     & 11.9 \\
+Finally Traversal (10M) & 1.8     & 1.0  \\
+Other Traversal (10M)   & 22.6    & 25.8 \\
+Cross Handler (1B)      & 9.0     & 11.9 \\
 Match All (100M)        & 2.3     & 3.2  \\
-Match None (100M)       & 2.9     & 3.9  \\
+Match None (100M)       & 3.0     & 3.8  \\
 \hline
 \end{tabular}
 …
               \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
 \hline
-Resume Empty (10M)  & 1.5 & 1.5 & 14.7 & 2.3 & 176.1  & 1.0 & 1.4 & 8.9 & 1.2 & 119.9 \\
+Resume Empty (10M)  & 1.4 & 1.4 & 15.4 & 2.3 & 178.0  & 1.2 & 1.2 & 8.9 & 1.2 & 118.4 \\
 \hline
 \end{tabular}
 …
 \CFA, \Cpp and Java.
 % To be exact, the Match All and Match None cases.
-The most likely explanation is that, since exceptions
-are rarely considered to be the common case, the more optimized languages
-make that case expensive to improve other cases.
+The most likely explanation is that
+the generally faster languages have made ``common cases fast" at the expense
+of the rarer cases. Since exceptions are considered rare, they are made
+expensive to help speed up common actions, such as entering and leaving try
+statements.
+Python, on the other hand, while generally slower than the other languages,
+uses exceptions more and has not sacrificed their performance.
 In addition, languages with high-level representations have a much
 easier time scanning the stack as there is less to decode.
 …
 Performance is similar to Empty Traversal in all languages that support finally
 clauses. Only Python seems to have a larger than random noise change in
-its run-time and it is still not large.
+its run time and it is still not large.
 Despite the similarity between finally clauses and destructors,
-finally clauses seem to avoid the spike that run-time destructors have.
+finally clauses seem to avoid the spike that run time destructors have.
 Possibly some optimization removes the cost of changing contexts.
 …
 This results in a significant jump.
-Other languages experience a small increase in run-time.
+Other languages experience a small increase in run time.
 The small increase likely comes from running the checks,
 but they could avoid the spike by not having the same kind of overhead for
 …
 \item[Cross Handler]
-Here \CFA falls behind \Cpp by a much more significant margin.
-This is likely due to the fact \CFA has to insert two extra function
-calls, while \Cpp does not have to do execute any other instructions.
+Here, \CFA falls behind \Cpp by a much more significant margin.
+This is likely due to the fact that \CFA has to insert two extra function
+calls, while \Cpp does not have to execute any other instructions.
 Python is much further behind.
 …
 \item[Conditional Match]
 Both of the conditional matching tests can be considered on their own.
-However for evaluating the value of conditional matching itself, the
+However, for evaluating the value of conditional matching itself, the
 comparison of the two sets of results is useful.
-Consider the massive jump in run-time for \Cpp going from match all to match
+Consider the massive jump in run time for \Cpp going from match all to match
 none, which none of the other languages have.
-Some strange interaction is causing run-time to more than double for doing
+Some strange interaction is causing run time to more than double for doing
 twice as many raises.
-Java and Python avoid this problem and have similar run-time for both tests,
+Java and Python avoid this problem and have similar run time for both tests,
 possibly through resource reuse or their program representation.
-However \CFA is built like \Cpp and avoids the problem as well, this matches
+However, \CFA is built like \Cpp, and avoids the problem as well.
+This matches
 the pattern of the conditional match, which makes the two execution paths
 very similar.
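
One plausible reading of ``twice as many raises", assumed here rather than stated in this excerpt, is that \Cpp has no conditional catch, so a handler that decides not to match must rethrow. The sketch below counts raises under that assumption; the names CheckedError and raise_and_match are hypothetical.

```cpp
#include <cassert>

// Hypothetical exception type carrying the value the handler tests.
struct CheckedError { int code; };

// Raises once, then lets the inner handler decide whether it "matches".
// Returns 1 if the inner handler kept the exception, 2 if the outer one did.
// 'raises' counts throw operations, including the rethrow.
int raise_and_match(int raised_code, int wanted_code, int& raises) {
    int handled_by = 0;
    try {
        try {
            ++raises;
            throw CheckedError{raised_code};
        } catch (const CheckedError& e) {
            if (e.code != wanted_code) {
                ++raises;   // simulated "no match": a second raise
                throw;      // rethrow to the next handler
            }
            handled_by = 1; // simulated "match": handled here
        }
    } catch (const CheckedError&) {
        handled_by = 2;
    }
    return handled_by;
}
```

Under this simulation, a match costs one raise and a non-match costs two, whereas a language with built-in conditional matching can test the condition during a single search.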
 …
 \subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}}
-Moving on to resumption, there is one general note,
+Moving on to resumption, there is one general note:
 resumption is \textit{fast}. The only test where it fell
 behind termination is Cross Handler.
 In every other case, the number of iterations had to be increased by a
-factor of 10 to get the run-time in an appropriate range
+factor of 10 to get the run time in an appropriate range
 and in some cases resumption still took less time.
 …
 \item[D'tor Traversal]
-Resumption does have the same spike in run-time that termination has.
-The run-time is actually very similar to Finally Traversal.
+Resumption does have the same spike in run time that termination has.
+The run time is actually very similar to Finally Traversal.
 As resumption does not unwind the stack, both destructors and finally
 clauses are run while walking down the stack during the recursive returns.
 …
 \item[Finally Traversal]
 Same as D'tor Traversal,
-except termination did not have a spike in run-time on this test case.
+except termination did not have a spike in run time on this test case.
 \item[Other Traversal]
 …
 The only test case where resumption could not keep up with termination,
 although the difference is not as significant as many other cases.
-It is simply a matter of where the costs come from,
-both termination and resumption have some work to set-up or tear-down a
+It is simply a matter of where the costs come from:
+both termination and resumption have some work to set up or tear down a
 handler. It just so happens that resumption's work is slightly slower.
 …
 Resumption shows a slight slowdown if the exception is not matched
 by the first handler, which follows from the fact the second handler now has
-to be checked. However the difference is not large.
+to be checked. However, the difference is not large.
 \end{description}
 …
 More experiments could try to tease out the exact trade-offs,
 but the prototype's only performance goal is to be reasonable.
-It has already in that range, and \CFA's fixup routine simulation is
+It is already in that range, and \CFA's fixup routine simulation is
 one of the faster simulations as well.
-Plus exceptions add features and remove syntactic overhead,
-so even at similar performance resumptions have advantages
+Plus, exceptions add features and remove syntactic overhead,
+so even at similar performance, resumptions have advantages
 over fixup routines.


Changeset 949339b for doc/theses/andrew_beach_MMath/performance.tex
