Changeset 7737c29


Ignore:
Timestamp:
Aug 30, 2021, 11:51:36 AM (5 months ago)
Author:
Andrew Beach <ajbeach@…>
Branches:
jacob/cs343-translation, master
Children:
0477127
Parents:
9cb6514
Message:

Revert "proofread chapter performance.tex", updates have been saved.

This reverts commit 9cb6514c72783e65308b91bda966d51674a21e8a.

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/andrew_beach_MMath/performance.tex

    r9cb6514 r7737c29  
    1111Tests were run in \CFA, C++, Java and Python.
    1212In addition there are two sets of tests for \CFA,
    13 one with termination and one with resumption.
     13one for termination and once with resumption.
    1414
    1515C++ is the most comparable language because both it and \CFA use the same
     
    2121but otherwise \Cpp should have a significant advantage.
    2222
    23 Java, a popular language with similar termination semantics, but
     23Java a popular language with similar termination semantics, but
    2424it is implemented in a very different environment, a virtual machine with
    2525garbage collection.
     
    3838seem to be notable only for having resumption.
    3939Instead, resumption is compared to its simulation in other programming
    40 languages: fixup functions that are explicitly passed into a function.
     40languages: fixup functions that are explicity passed into a function.
    4141
    4242All tests are run inside a main loop that repeatedly performs a test.
    4343This approach avoids start-up or tear-down time from
    4444affecting the timing results.
    45 The number of times the loop is run is configurable from the command line;
     45The number of times the loop is run is configurable from the command line,
    4646the number used in the timing runs is given with the results per test.
    47 % Tests ran their main loop a million times.
    48 The Java tests run the main loop 1000 times before
     47Tests ran their main loop a million times.
     48The Java tests runs the main loop 1000 times before
    4949beginning the actual test to ``warm-up" the JVM.
    5050% All other languages are precompiled or interpreted.
     
    7272% \code{C++}{catch(...)}).
    7373
    74 When collecting data, each test is run eleven times. The top three and bottom
     74When collecting data each test is run eleven times. The top three and bottom
    7575three results are discarded and the remaining five values are averaged.
    76 The test are run with the latest (still pre-release) \CFA compiler,
     76The test are run with the latest (still pre-release) \CFA compiler was used,
    7777using gcc-10 as a backend.
    7878g++-10 is used for \Cpp.
     
    113113}
    114114\end{cfa}
    115 Other test cases have additional code around the recursive call adding
     115Other test cases have additional code around the recursive call add
    116116something besides simple stack frames to the stack.
    117 Note that both termination and resumption have to traverse over
     117Note that both termination and resumption will have to traverse over
    118118the stack but only termination has to unwind it.
    119119\begin{itemize}[nosep]
     
    124124\item Empty:
    125125The repeating function is empty except for the necessary control code.
    126 As other traversal tests add to this, it is the baseline for the group
     126As other traversal tests add to this, so it is the baseline for the group
    127127as the cost comes from traversing over and unwinding a stack frame
    128128that has no other interactions with the exception system.
     
    130130The repeating function creates an object with a destructor before calling
    131131itself.
    132 Comparing this to the empty test gives the time to traverse over and
     132Comparing this to the empty test gives the time to traverse over and/or
    133133unwind a destructor.
    134134\item Finally:
    135135The repeating function calls itself inside a try block with a finally clause
    136136attached.
    137 Comparing this to the empty test gives the time to traverse over and
     137Comparing this to the empty test gives the time to traverse over and/or
    138138unwind a finally clause.
    139139\item Other Handler:
    140140The repeating function calls itself inside a try block with a handler that
    141 does not match the raised exception, but is of the same kind of handler.
    142 This means that the EHM has to check each handler, but continue
    143 over all of them until it reaches the base of the stack.
    144 Comparing this to the empty test gives the time to traverse over and
     141will not match the raised exception, but is of the same kind of handler.
     142This means that the EHM will have to check each handler, but will continue
     143over all of the until it reaches the base of the stack.
     144Comparing this to the empty test gives the time to traverse over and/or
    145145unwind a handler.
    146146\end{itemize}
    147147
    148148\paragraph{Cross Try Statement}
    149 This group of tests measures the cost for setting up exception handling, if it is
     149This group of tests measures the cost setting up exception handling if it is
    150150not used (because the exceptional case did not occur).
    151 Tests repeatedly cross (enter, execute, and leave) a try statement but never
    152 perform a raise.
     151Tests repeatedly cross (enter and leave, execute) a try statement but never
     152preform a raise.
    153153\begin{itemize}[nosep]
    154154\item Handler:
     
    165165to handle that exception.
    166166
    167 Here is the pattern shown in \CFA and \Cpp. Java and Python use the same
     167There is the pattern shown in \CFA and \Cpp. Java and Python use the same
    168168pattern as \Cpp, but with their own syntax.
    169169
     
    229229going higher is better than going low) N, the number of iterations of the
    230230main loop in each test, is varied between tests. It is also given in the
    231 results and has a value in the millions.
     231results and usually have a value in the millions.
    232232
    233233An anomaly in some results came from \CFA's use of gcc nested functions.
    234234These nested functions are used to create closures that can access stack
    235235variables in their lexical scope.
    236 However, if they do so, then they can cause the benchmark's run-time to
     236However, if they do so then they can cause the benchmark's run-time to
    237237increase by an order of magnitude.
    238238The simplest solution is to make those values global variables instead
     
    309309% Now discuss the results in the tables.
    310310One result not directly related to \CFA but important to keep in mind is that,
    311 for exceptions, the standard intuition about which languages should go
     311for exceptions the standard intuition about which languages should go
    312312faster often does not hold.
    313313For example, there are a few cases where Python out-performs
     
    324324The only performance requirement is to insure the \CFA EHM has reasonable
    325325performance for prototyping.
    326 Although that may be hard to exactly quantify, I believe it has succeeded
     326Although that may be hard to exactly quantify, we believe it has succeeded
    327327in that regard.
    328328Details on the different test cases follow.
    329 
    330 \subsection{Termination, \autoref{t:PerformanceTermination}}
    331329
    332330\begin{description}
     
    334332\CFA is slower than \Cpp, but is still faster than the other languages
    335333and closer to \Cpp than other languages.
    336 This result is to be expected as \CFA is closer to \Cpp than the other languages.
     334This is to be expected as \CFA is closer to \Cpp than the other languages.
    337335
    338336\item[D'tor Traversal]
    339 Running destructors causes a huge slowdown in the two languages that support
     337Running destructors causes huge slowdown in every language that supports
    340338them. \CFA has a higher proportionate slowdown but it is similar to \Cpp's.
    341 Considering the amount of work done in destructors is virtually zero (asm instruction), the cost
    342 must come from the change of context required to trigger the destructor.
     339Considering the amount of work done in destructors is so low the cost
     340likely comes from the change of context required to do that work.
    343341
    344342\item[Finally Traversal]
    345 Performance is similar to Empty Traversal in all languages that support finally
     343Speed is similar to Empty Traversal in all languages that support finally
    346344clauses. Only Python seems to have a larger than random noise change in
    347345its run-time and it is still not large.
    348346Despite the similarity between finally clauses and destructors,
    349 finally clauses seem to avoid the spike that run-time destructors have.
     347finally clauses seem to avoid the spike in run-time destructors have.
    350348Possibly some optimization removes the cost of changing contexts.
    351349\todo{OK, I think the finally clause may have been optimized out.}
     
    356354This results in a significant jump.
    357355
    358 Other languages experience a small increase in run-time.
     356Other languages experiance a small increase in run-time.
    359357The small increase likely comes from running the checks,
    360358but they could avoid the spike by not having the same kind of overhead for
    361359switching to the check's context.
    362360
    363 \todo{Could revisit Other Traversal, after Finally Traversal.}
     361\todo{Could revist Other Traversal, after Finally Traversal.}
    364362
    365363\item[Cross Handler]
    366364Here \CFA falls behind \Cpp by a much more significant margin.
    367365This is likely due to the fact \CFA has to insert two extra function
    368 calls, while \Cpp does not have to execute any other instructions.
     366calls while \Cpp doesn't have to do execute any other instructions.
    369367Python is much further behind.
    370368
     
    372370\CFA's performance now matches \Cpp's from Cross Handler.
    373371If the code from the finally clause is being inlined,
    374 which is just an asm comment, than there are no additional instructions
     372which is just a asm comment, than there are no additional instructions
    375373to execute again when exiting the try statement normally.
    376374
    377375\item[Conditional Match]
    378 Both of the conditional matching tests can be considered on their own.
    379 However for evaluating the value of conditional matching itself, the
     376Both of the conditional matching tests can be considered on their own,
     377however for evaluating the value of conditional matching itself the
    380378comparison of the two sets of results is useful.
    381379Consider the massive jump in run-time for \Cpp going from match all to match
     
    386384possibly through resource reuse or their program representation.
    387385However \CFA is built like \Cpp and avoids the problem as well, this matches
    388 the pattern of the conditional match, which makes the two execution paths
    389 very similar.
     386the pattern of the conditional match which makes the two execution paths
     387much more similar.
    390388
    391389\end{description}
    392390
    393 \subsection{Resumption, \autoref{t:PerformanceResumption}}
    394 
    395 Moving on to resumption, there is one general note,
    396 resumption is \textit{fast}. The only test where it fell
     391Moving on to resumption there is one general note,
     392resumption is \textit{fast}, the only test where it fell
    397393behind termination is Cross Handler.
    398394In every other case, the number of iterations had to be increased by a
    399 factor of 10 to get the run-time in an appropriate range
     395factor of 10 to get the run-time in an approprate range
    400396and in some cases resumption still took less time.
    401397
     
    405401\item[Empty Traversal]
    406402See above for the general speed-up notes.
    407 This result is not surprising as resumption's linked-list approach
     403This result is not surprising as resumption's link list approach
    408404means that traversing over stack frames without a resumption handler is
    409405$O(1)$.
     
    412408Resumption does have the same spike in run-time that termination has.
    413409The run-time is actually very similar to Finally Traversal.
    414 As resumption does not unwind the stack, both destructors and finally
    415 clauses are run while walking down the stack during the recursion returns.
     410As resumption does not unwind the stack both destructors and finally
     411clauses are run while walking down the stack normally.
    416412So it follows their performance is similar.
    417413
    418414\item[Finally Traversal]
    419 % The increase in run-time from Empty Traversal (once adjusted for
    420 % the number of iterations) is roughly the same as for termination.
    421 % This suggests that the
    422 See D'tor Traversal discussion.
     415The increase in run-time fromm Empty Traversal (once adjusted for
     416the number of iterations) roughly the same as for termination.
     417This suggests that the
    423418
    424419\item[Other Traversal]
     
    431426The only test case where resumption could not keep up with termination,
    432427although the difference is not as significant as many other cases.
    433 It is simply a matter of where the costs come from. \PAB{What does this mean?
    434 Even if \CFA termination
    435 is not ``zero-cost", passing through an empty function still seems to be
    436 cheaper than updating global values.}
     428It is simply a matter of where the costs come from. Even if \CFA termination
     429is not ``zero-cost" passing through an empty function still seems to be
     430cheaper than updating global values.
    437431
    438432\item[Conditional Match]
     
    443437\end{description}
    444438
    445 \subsection{Resumption/Fixup, \autoref{t:PerformanceFixupRoutines}}
    446 
    447439Finally are the results of the resumption/fixup routine comparison.
    448 These results are surprisingly varied. It is possible that creating a closure
     440These results are surprisingly varied, it is possible that creating a closure
    449441has more to do with performance than passing the argument through layers of
    450442calls.
    451443Even with 100 stack frames though, resumption is only about as fast as
    452444manually passing a fixup routine.
    453 However, as the number of fixup routines is increased, the cost of passing them
    454 should make the resumption dynamic-search cheaper.
    455445So there is a cost for the additional power and flexibility exceptions
    456446provide.
Note: See TracChangeset for help on using the changeset viewer.