Changeset 0477127


Ignore:
Timestamp:
Aug 30, 2021, 6:24:40 PM (3 years ago)
Author:
Andrew Beach <ajbeach@…>
Branches:
ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
Children:
f93d7fc
Parents:
7737c29
Message:

Andrew MMath: More updates to the performance chapter.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/andrew_beach_MMath/performance.tex

    r7737c29 r0477127  
    1111Tests were run in \CFA, C++, Java and Python.
    1212In addition there are two sets of tests for \CFA,
    13 one for termination and once with resumption.
     13one with termination and one with resumption.
    1414
    1515C++ is the most comparable language because both it and \CFA use the same
     
    2121but otherwise \Cpp should have a significant advantage.
    2222
    23 Java a popular language with similar termination semantics, but
    24 it is implemented in a very different environment, a virtual machine with
     23Java, a popular language with similar termination semantics,
     24is implemented in a very different environment, a virtual machine with
    2525garbage collection.
    2626It also implements the finally clause on try blocks allowing for a direct
     
    3838seem to be notable only for having resumption.
    3939Instead, resumption is compared to its simulation in other programming
    40 languages: fixup functions that are explicity passed into a function.
     40languages: fixup functions that are explicitly passed into a function.
    4141
    4242All tests are run inside a main loop that repeatedly performs a test.
    4343This approach avoids start-up or tear-down time from
    4444affecting the timing results.
    45 The number of times the loop is run is configurable from the command line,
     45The number of times the loop is run is configurable from the command line;
    4646the number used in the timing runs is given with the results per test.
    47 Tests ran their main loop a million times.
    48 The Java tests runs the main loop 1000 times before
     47The Java tests run the main loop 1000 times before
    4948beginning the actual test to ``warm-up" the JVM.
    5049% All other languages are precompiled or interpreted.
     
    7271% \code{C++}{catch(...)}).
    7372
    74 When collecting data each test is run eleven times. The top three and bottom
     73When collecting data, each test is run eleven times. The top three and bottom
    7574three results are discarded and the remaining five values are averaged.
    76 The test are run with the latest (still pre-release) \CFA compiler was used,
     75The test are run with the latest (still pre-release) \CFA compiler,
    7776using gcc-10 as a backend.
    7877g++-10 is used for \Cpp.
     
    113112}
    114113\end{cfa}
    115 Other test cases have additional code around the recursive call add
     114Other test cases have additional code around the recursive call adding
    116115something besides simple stack frames to the stack.
    117 Note that both termination and resumption will have to traverse over
     116Note that both termination and resumption have to traverse over
    118117the stack but only termination has to unwind it.
    119118\begin{itemize}[nosep]
     
    124123\item Empty:
    125124The repeating function is empty except for the necessary control code.
    126 As other traversal tests add to this, so it is the baseline for the group
     125As other traversal tests add to this, it is the baseline for the group
    127126as the cost comes from traversing over and unwinding a stack frame
    128127that has no other interactions with the exception system.
     
    130129The repeating function creates an object with a destructor before calling
    131130itself.
    132 Comparing this to the empty test gives the time to traverse over and/or
     131Comparing this to the empty test gives the time to traverse over and
    133132unwind a destructor.
    134133\item Finally:
    135134The repeating function calls itself inside a try block with a finally clause
    136135attached.
    137 Comparing this to the empty test gives the time to traverse over and/or
     136Comparing this to the empty test gives the time to traverse over and
    138137unwind a finally clause.
    139138\item Other Handler:
    140139The repeating function calls itself inside a try block with a handler that
    141 will not match the raised exception, but is of the same kind of handler.
    142 This means that the EHM will have to check each handler, but will continue
    143 over all of the until it reaches the base of the stack.
    144 Comparing this to the empty test gives the time to traverse over and/or
     140does not match the raised exception, but is of the same kind of handler.
     141This means that the EHM has to check each handler, and continue
     142over all of them until it reaches the base of the stack.
     143Comparing this to the empty test gives the time to traverse over and
    145144unwind a handler.
    146145\end{itemize}
    147146
    148147\paragraph{Cross Try Statement}
    149 This group of tests measures the cost setting up exception handling if it is
     148This group of tests measures the cost for setting up exception handling,
     149if it is
    150150not used (because the exceptional case did not occur).
    151 Tests repeatedly cross (enter and leave, execute) a try statement but never
    152 preform a raise.
     151Tests repeatedly cross (enter, execute and leave) a try statement but never
     152perform a raise.
    153153\begin{itemize}[nosep]
    154154\item Handler:
     
    165165to handle that exception.
    166166
    167 There is the pattern shown in \CFA and \Cpp. Java and Python use the same
     167Here is the pattern shown in \CFA and \Cpp. Java and Python use the same
    168168pattern as \Cpp, but with their own syntax.
    169169
     
    229229going higher is better than going low) N, the number of iterations of the
    230230main loop in each test, is varied between tests. It is also given in the
    231 results and usually have a value in the millions.
     231results and has a value in the millions.
    232232
    233233An anomaly in some results came from \CFA's use of gcc nested functions.
    234234These nested functions are used to create closures that can access stack
    235235variables in their lexical scope.
    236 However, if they do so then they can cause the benchmark's run-time to
     236However, if they do so, then they can cause the benchmark's run-time to
    237237increase by an order of magnitude.
    238238The simplest solution is to make those values global variables instead
     
    301301              \multicolumn{1}{c}{Raise} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
    302302\hline
    303 Resume Empty (10M)  & 3.8  & 3.5  & 14.7  & 2.3   & 176.1 & 0.3  & 0.1  & 8.9   & 1.2   & 119.9 \\
    304 %Resume Other (10M)  & 4.0* & 0.1* & 21.9  & 6.2   & 381.0 & 0.3* & 0.1* & 13.2  & 5.0   & 290.7 \\
     303Resume Empty (10M)  & 1.5 & 1.5 & 14.7 & 2.3 & 176.1  & 1.0 & 1.4 & 8.9 & 1.2 & 119.9 \\
    305304\hline
    306305\end{tabular}
     
    309308% Now discuss the results in the tables.
    310309One result not directly related to \CFA but important to keep in mind is that,
    311 for exceptions the standard intuition about which languages should go
     310for exceptions, the standard intuition about which languages should go
    312311faster often does not hold.
    313312For example, there are a few cases where Python out-performs
    314313\CFA, \Cpp and Java.
     314\todo{Make sure there are still cases where Python wins.}
    315315The most likely explanation is that, since exceptions
    316316are rarely considered to be the common case, the more optimized languages
     
    324324The only performance requirement is to insure the \CFA EHM has reasonable
    325325performance for prototyping.
    326 Although that may be hard to exactly quantify, we believe it has succeeded
     326Although that may be hard to exactly quantify, I believe it has succeeded
    327327in that regard.
    328328Details on the different test cases follow.
     329
     330\subsection{Termination \texorpdfstring{(\autoref{t:PerformanceTermination})}{}}
    329331
    330332\begin{description}
     
    332334\CFA is slower than \Cpp, but is still faster than the other languages
    333335and closer to \Cpp than other languages.
    334 This is to be expected as \CFA is closer to \Cpp than the other languages.
     336This result is to be expected,
     337as \CFA is closer to \Cpp than the other languages.
    335338
    336339\item[D'tor Traversal]
    337 Running destructors causes huge slowdown in every language that supports
     340Running destructors causes a huge slowdown in the two languages that support
    338341them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's.
    339 Considering the amount of work done in destructors is so low the cost
    340 likely comes from the change of context required to do that work.
     342Considering the amount of work done in destructors is effectively zero
     343(an assembly comment), the cost
     344must come from the change of context required to run the destructor.
    341345
    342346\item[Finally Traversal]
    343 Speed is similar to Empty Traversal in all languages that support finally
     347Performance is similar to Empty Traversal in all languages that support finally
    344348clauses. Only Python seems to have a larger than random noise change in
    345349its run-time and it is still not large.
    346350Despite the similarity between finally clauses and destructors,
    347 finally clauses seem to avoid the spike in run-time destructors have.
     351finally clauses seem to avoid the spike that run-time destructors have.
    348352Possibly some optimization removes the cost of changing contexts.
    349353\todo{OK, I think the finally clause may have been optimized out.}
     
    354358This results in a significant jump.
    355359
    356 Other languages experiance a small increase in run-time.
     360Other languages experience a small increase in run-time.
    357361The small increase likely comes from running the checks,
    358362but they could avoid the spike by not having the same kind of overhead for
    359363switching to the check's context.
    360 
    361 \todo{Could revist Other Traversal, after Finally Traversal.}
     364\todo{Could revisit Other Traversal, after Finally Traversal.}
    362365
    363366\item[Cross Handler]
    364367Here \CFA falls behind \Cpp by a much more significant margin.
    365368This is likely due to the fact \CFA has to insert two extra function
    366 calls while \Cpp doesn't have to do execute any other instructions.
     369calls, while \Cpp does not have to do execute any other instructions.
    367370Python is much further behind.
    368371
     
    370373\CFA's performance now matches \Cpp's from Cross Handler.
    371374If the code from the finally clause is being inlined,
    372 which is just a asm comment, than there are no additional instructions
     375which is just an asm comment, than there are no additional instructions
    373376to execute again when exiting the try statement normally.
    374377
    375378\item[Conditional Match]
    376 Both of the conditional matching tests can be considered on their own,
    377 however for evaluating the value of conditional matching itself the
     379Both of the conditional matching tests can be considered on their own.
     380However for evaluating the value of conditional matching itself, the
    378381comparison of the two sets of results is useful.
    379382Consider the massive jump in run-time for \Cpp going from match all to match
     
    384387possibly through resource reuse or their program representation.
    385388However \CFA is built like \Cpp and avoids the problem as well, this matches
    386 the pattern of the conditional match which makes the two execution paths
    387 much more similar.
     389the pattern of the conditional match, which makes the two execution paths
     390very similar.
    388391
    389392\end{description}
    390393
    391 Moving on to resumption there is one general note,
    392 resumption is \textit{fast}, the only test where it fell
     394\subsection{Resumption \texorpdfstring{(\autoref{t:PerformanceResumption})}{}}
     395
     396Moving on to resumption, there is one general note,
     397resumption is \textit{fast}. The only test where it fell
    393398behind termination is Cross Handler.
    394399In every other case, the number of iterations had to be increased by a
    395 factor of 10 to get the run-time in an approprate range
     400factor of 10 to get the run-time in an appropriate range
    396401and in some cases resumption still took less time.
    397402
     
    401406\item[Empty Traversal]
    402407See above for the general speed-up notes.
    403 This result is not surprising as resumption's link list approach
     408This result is not surprising as resumption's linked-list approach
    404409means that traversing over stack frames without a resumption handler is
    405410$O(1)$.
     
    408413Resumption does have the same spike in run-time that termination has.
    409414The run-time is actually very similar to Finally Traversal.
    410 As resumption does not unwind the stack both destructors and finally
    411 clauses are run while walking down the stack normally.
     415As resumption does not unwind the stack, both destructors and finally
     416clauses are run while walking down the stack during the recursive returns.
    412417So it follows their performance is similar.
    413418
    414419\item[Finally Traversal]
    415 The increase in run-time fromm Empty Traversal (once adjusted for
    416 the number of iterations) roughly the same as for termination.
    417 This suggests that the
     420Same as D'tor Traversal,
     421except termination did not have a spike in run-time on this test case.
    418422
    419423\item[Other Traversal]
     
    426430The only test case where resumption could not keep up with termination,
    427431although the difference is not as significant as many other cases.
    428 It is simply a matter of where the costs come from. Even if \CFA termination
    429 is not ``zero-cost" passing through an empty function still seems to be
    430 cheaper than updating global values.
     432It is simply a matter of where the costs come from,
     433both termination and resumption have some work to set-up or tear-down a
     434handler. It just so happens that resumption's work is slightly slower.
    431435
    432436\item[Conditional Match]
     
    437441\end{description}
    438442
     443\subsection{Resumption/Fixup \texorpdfstring{(\autoref{t:PerformanceFixupRoutines})}{}}
     444
    439445Finally are the results of the resumption/fixup routine comparison.
    440 These results are surprisingly varied, it is possible that creating a closure
     446These results are surprisingly varied. It is possible that creating a closure
    441447has more to do with performance than passing the argument through layers of
    442448calls.
    443 Even with 100 stack frames though, resumption is only about as fast as
    444 manually passing a fixup routine.
    445 So there is a cost for the additional power and flexibility exceptions
    446 provide.
     449At 100 stack frames, resumption and manual fixup routines have similar
     450performance in \CFA.
     451More experiments could try to tease out the exact trade-offs,
     452but the prototype's only performance goal is to be reasonable.
     453It has already in that range, and \CFA's fixup routine simulation is
     454one of the faster simulations as well.
     455Plus exceptions add features and remove syntactic overhead,
     456so even at similar performance resumptions have advantages
     457over fixup routines.
Note: See TracChangeset for help on using the changeset viewer.