- Timestamp:
- Aug 21, 2021, 5:49:45 PM (3 years ago)
- Branches:
- ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum
- Children:
- c2a9d88
- Parents:
- d8f8d08
- Location:
- doc/theses/andrew_beach_MMath
- Files:
-
- 4 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/andrew_beach_MMath/conclusion.tex
rd8f8d08 r7372065 1 1 \chapter{Conclusion} 2 \label{c:conclusion}3 2 % Just a little knot to tie the paper together. 4 3 5 In the previous chapters this thesis presents the design and implementation of 6 \CFA's EHM. Both the design and implementation are based off of tools and 7 techniques developed for other programming languages but they were adapted to 8 better fit \CFA's feature set and add a few features that do not exist in other 9 EHMs, like conditional catch, default handlers, implicitly changing resumption 10 into termination in the resumption default handler, and cancellation through 11 coroutines and threads back to program main. 4 In the previous chapters this thesis presents the design and implementation 5 of \CFA's exception handling mechanism (EHM). 6 Both the design and implementation are based off of tools and techniques 7 developed for other programming languages but they were adapted to better fit 8 \CFA's feature set. 12 9 13 10 The resulting features cover all of the major use cases of the most popular 14 11 termination EHMs of today, along with reintroducing resumption exceptions and 15 creating some new features that fit with \CFA's larger programming patterns, 16 such as virtuals independent of traditional objects. 12 creating some new features that fix with \CFA's larger programming patterns. 17 13 18 The implementation has been tested through a set of small but interesting micro-benchmarks 19 and compared to other implementations. 14 The implementation has been tested and compared to other implementations. 20 15 The results, while not cutting edge, are good enough for prototyping, which 21 is \CFA's currentstage of development.16 is \CFA's stage of development. 22 17 23 This i nitial EHM is a valuable new feature for \CFA in its own right but also serves24 as a tool and motivationfor other developments in the language.18 This is a valuable new feature for \CFA in its own right but also serves 19 as a tool (and motivation) for other developments in the language. -
doc/theses/andrew_beach_MMath/future.tex
rd8f8d08 r7372065 2 2 \label{c:future} 3 3 4 The following discussion covers both missing language features that affected my5 work and research based improvements.6 7 4 \section{Language Improvements} 8 5 \todo{Future/Language Improvements seems to have gotten mixed up. It is 6 presented as ``waiting on language improvements" but really its more 7 non-research based impovements.} 9 8 \CFA is a developing programming language. As such, there are partially or 10 unimplemented features (including several broken components)11 that I had to workaround while building an EHMlargely in9 unimplemented features of the language (including several broken components) 10 that I had to workaround while building an exception handling system largely in 12 11 the \CFA language (some C components). The following are a few of these 13 12 issues, and once implemented/fixed, how they would affect the exception system. … … 15 14 \item 16 15 The implementation of termination is not portable because it includes 17 hand-crafted assembly statements for each architecture, where the 18 ARM processor was just added. 19 % The existing compilers cannot translate that for other platforms and those 20 % sections must be ported by hand to 21 Supporting more hardware architectures in a general way is important. 16 hand-crafted assembly statements. 17 The existing compilers cannot translate that for other platforms and those 18 sections must be ported by hand to 19 support more hardware architectures, such as the ARM processor. 22 20 \item 23 21 Due to a type-system problem, the catch clause cannot bind the exception to a … … 29 27 @return@, \etc. The reason is that current code generation hoists a handler 30 28 into a nested function for convenience (versus assemble-code generation at the 31 @try@ statement). Hence, when the handler runs, its can access local variable 32 in the lexical scope of the @try@ statement, but the closure does not capture 33 local control-flow points so it cannot perform non-local transfers in the 34 hoisted function. 29 @try@ statement). Hence, when the handler runs, its code is not in the lexical 30 scope of the @try@ statement, where the local control-flow transfers are 31 meaningful. 35 32 \item 36 33 There is no detection of colliding unwinds. It is possible for clean-up code 37 34 run during an unwind to trigger another unwind that escapes the clean-up code 38 35 itself; such as a termination exception caught further down the stack or a 39 cancellation. There do exist ways to handle this case, but currently there is no40 detection and the first unwind is simplyforgotten, often leaving36 cancellation. There do exist ways to handle this but currently they are not 37 even detected and the first unwind will simply be forgotten, often leaving 41 38 it in a bad state. 42 39 \item 43 Finally, the exception system has not have a lot programmer testing.44 More time with encouraged usage will reveal new45 quality of life upgrades that can be made .40 Also the exception system did not have a lot of time to be tried and tested. 41 So just letting people use the exception system more will reveal new 42 quality of life upgrades that can be made with time. 46 43 \end{itemize} 47 44 … … 50 47 project, but was thrust upon it to do exception inheritance; hence, only 51 48 minimal work is done. A draft for a complete virtual system is available but 52 not finalized. A future \CFA project is to complete that work and then49 it is not finalized. A future \CFA project is to complete that work and then 53 50 update the exception system that uses the current version. 54 51 … … 56 53 exception traits. The most important one is an assertion to check one virtual 57 54 type is a child of another. This check precisely captures many of the 58 c urrent ad-hoc correctness requirements.55 correctness requirements. 59 56 60 57 The full virtual system might also include other improvement like associated 61 58 types to allow traits to refer to types not listed in their header. This 62 59 feature allows exception traits to not refer to the virtual-table type 63 explicitly . %, removing the need for the current interface macros.60 explicitly, removing the need for the current interface macros. 64 61 65 62 \section{Additional Raises} … … 96 93 Checked exceptions make exceptions part of a function's type by adding an 97 94 exception signature. An exception signature must declare all checked 98 exceptions that could propagate from the function ,either because they were99 raised inside the function or a call to a sub-function. This improves safety95 exceptions that could propagate from the function (either because they were 96 raised inside the function or came from a sub-function). This improves safety 100 97 by making sure every checked exception is either handled or consciously 101 98 passed on. 102 99 103 100 However checked exceptions were never seriously considered for this project 104 because they have significant trade-offs in usab ility and code reuse in101 because they have significant trade-offs in usablity and code reuse in 105 102 exchange for the increased safety. 106 103 These trade-offs are most problematic when trying to pass exceptions through … … 132 129 not support a successful-exiting stack-search without doing an unwind. 133 130 Workarounds are possible but awkward. Ideally an extension to libunwind could 134 be made, but that would either require separate maintenance or gain ingenough135 support to have it folded into the code base.131 be made, but that would either require separate maintenance or gain enough 132 support to have it folded into the standard. 136 133 137 134 Also new techniques to skip previously searched parts of the stack need to be … … 161 158 to leave the handler. 162 159 Currently, mimicking this behaviour in \CFA is possible by throwing a 163 termination exceptioninside a resumption handler.160 termination inside a resumption handler. 164 161 165 162 % Maybe talk about the escape; and escape CONTROL_STMT; statements or how -
doc/theses/andrew_beach_MMath/implement.tex
rd8f8d08 r7372065 20 20 21 21 \subsection{Virtual Type} 22 A virtual type~(see \autoref{s:Virtuals}) has a pointer to a virtual table, 23 called the \emph{virtual-table pointer}, which binds an instance of a virtual 24 type to a virtual table. Internally, the field is called \snake{virtual_table} 25 and is fixed after construction. This pointer is also the table's id and how 26 the system accesses the virtual table and the virtual members there. It is 27 always the first field in the structure so that its location is always known. 22 Virtual types only have one change to their structure: the addition of a 23 pointer to the virtual table, which is called the \emph{virtual-table pointer}. 24 Internally, the field is called \snake{virtual_table}. 25 The field is fixed after construction. It is always the first field in the 26 structure so that its location is always known. 28 27 \todo{Talk about constructors for virtual types (after they are working).} 29 28 29 The virtual table pointer binds an instance of a virtual type 30 to a virtual table. 31 The pointer is also the table's id and how the system accesses the 32 virtual table and the virtual members there. 33 30 34 \subsection{Type Id} 31 Every virtual type needs a unique id, so that type ids can be compared for 32 equality, which checks if the types representation are the same, or used to 33 access the type's type information. Here, uniqueness means within a program 34 composed of multiple translation units (TU), not uniqueness across all 35 programs. 36 37 One approach for program uniqueness is declaring a static declaration for each 38 type id, where the runtime storage address of that variable is guaranteed to be 39 unique during program execution. The type id storage can also be used for other 40 purposes. 41 42 The problem is that a type id may appear in multiple TUs that compose a 43 program, see \autoref{ss:VirtualTable}; hence in each TU, it must be declared 44 as external to prevent multiple definitions. However, the type id must actually 45 be declared in one of the TUs so the linker creates the storage. Hence, the 46 problem becomes designating one TU to insert an actual type-id declaration. But 47 the \CFA compiler does not know the set of the translation units that compose a 48 program, because TUs can be compile separately, followed by a separate link 49 step. 50 51 The solution is to mimic a \CFA feature in \Cpp{17}, @inline@ variables and 52 function: 53 \begin{quote} 54 There may be more than one definition of an inline function or variable (since 55 \Cpp{17} in the program as long as each definition appears in a different 56 translation unit and (for non-static inline functions and variables (since 57 \Cpp{17})) all definitions are identical. For example, an inline function or an 58 inline variable (since \Cpp{17}) may be defined in a header file that is 59 @#include@'d in multiple source files.~\cite{C++17} 60 \end{quote} 61 The underlying mechanism to provide this capability is attribute 62 \begin{cfa} 63 section(".gnu.linkonce.NAME") 64 \end{cfa} 65 where @NAME@ is the variable/function name duplicated in each TU. The linker than 66 provides the service of generating a single declaration (instance) across all 67 TUs, even if a program is linked incrementally. 68 69 C does not support this feature for @inline@, and hence, neither does \CFA. 70 Again, rather than implement a new @inline@ extension for \CFA, a temporary 71 solution for the exception handling is to add the following in \CFA. 72 \begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}] 73 @__attribute__((cfa_linkonce))@ void f() {} 74 \end{lstlisting} 75 which becomes 76 \begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}] 77 __attribute__((section(".gnu.linkonce._X1fFv___1"))) void @_X1fFv___1@(){} 78 \end{lstlisting} 79 where @NAME@ from above is the \CFA mangled variable/function name. Note, 80 adding this feature is necessary because, when using macros, the mangled name 81 is unavailable. This attribute is useful for purposes other than exception 82 handling, and should eventually be rolled into @inline@ processing in \CFA. 83 84 Finally, a type id's data implements a pointers to the type's type information 85 instance. Dereferencing the pointer gets the type information. 86 87 \subsection{Implementation} 88 35 Every virtual type has a unique id. 36 Type ids can be compared for equality, 37 which checks if the types reperented are the same, 38 or used to access the type's type information. 89 39 The type information currently is only the parent's type id or, if the 90 40 type has no parent, the null pointer. 41 42 The id's are implemented as pointers to the type's type information instance. 43 Dereferencing the pointer gets the type information. 91 44 The ancestors of a virtual type are found by traversing type ids through 92 45 the type information. 93 An example using helper macros looks like: 46 The information pushes the issue of creating a unique value (for 47 the type id) to the problem of creating a unique instance (for type 48 information), which the linker can solve. 49 50 The advanced linker support is used here to avoid having to create 51 a new declaration to attach this data to. 52 With C/\CFA's header/implementation file divide for something to appear 53 exactly once it must come from a declaration that appears in exactly one 54 implementation file; the declarations in header files may exist only once 55 they can be included in many different translation units. 56 Therefore, structure's declaration will not work. 57 Neither will attaching the type information to the virtual table -- although 58 a vtable declarations are in implemention files they are not unique, see 59 \autoref{ss:VirtualTable}. 60 Instead the same type information is generated multiple times and then 61 the new attribute \snake{cfa_linkone} is used to removed duplicates. 62 63 Type information is constructed as follows: 64 \begin{enumerate} 65 \item 66 Use the type's name to generate a name for the type information structure. 67 This is saved so it may be reused. 68 \item 69 Generate a new structure definition to store the type 70 information. The layout is the same in each case, just the parent's type id, 71 but the types used change from instance to instance. 72 The generated name is used for both this structure and, if relivant, the 73 parent pointer. 74 If the virtual type is polymorphic then the type information structure is 75 polymorphic as well, with the same polymorphic arguments. 76 \item 77 A seperate name for instances is generated from the type's name. 78 \item 79 The definition is generated and initialised. 80 The parent id is set to the null pointer or to the address of the parent's 81 type information instance. Name resolution handles the rest. 82 \item 83 \CFA's name mangler does its regular name mangling encoding the type of 84 the declaration into the instance name. This gives a completely unique name 85 including different instances of the same polymorphic type. 86 \end{enumerate} 87 \todo{The list is making me realise, some of this isn't ordered.} 88 89 Writing that code manually, with helper macros for the early name mangling, 90 would look like this: 94 91 \begin{cfa} 95 92 struct INFO_TYPE(TYPE) { … … 103 100 \end{cfa} 104 101 105 The type information is constructed as follows:106 \begin{enumerate}107 \item108 Use the type's name to generate a name for the type information structure,109 which is saved so it can be reused.110 \item111 Generate a new structure definition to store the type112 information. The layout is the same in each case, just the parent's type id,113 but the types used change from instance to instance.114 The generated name is used for both this structure and, if relevant, the115 parent pointer.116 If the virtual type is polymorphic then the type information structure is117 polymorphic as well, with the same polymorphic arguments.118 \item119 A separate name for instances is generated from the type's name.120 \item121 The definition is generated and initialized.122 The parent id is set to the null pointer or to the address of the parent's123 type information instance. Name resolution handles the rest.124 \item125 \CFA's name mangler does its regular name mangling encoding the type of126 the declaration into the instance name. This process gives a program unique name127 including different instances of the same polymorphic type.128 \end{enumerate}129 \todo{The list is making me realize, some of this isn't ordered.}130 131 132 \begin{comment}133 102 \subsubsection{\lstinline{cfa\_linkonce} Attribute} 134 % I just reali zed: This is an extension of the inline keyword.103 % I just realised: This is an extension of the inline keyword. 135 104 % An extension of C's at least, it is very similar to C++'s. 136 105 Another feature added to \CFA is a new attribute: \texttt{cfa\_linkonce}. … … 157 126 everything that comes after the special prefix, then only one is used 158 127 and the other is discarded. 159 \end{comment}160 161 128 162 129 \subsection{Virtual Table} … … 191 158 The first and second sections together mean that every virtual table has a 192 159 prefix that has the same layout and types as its parent virtual table. 193 This, combined with the fixed offset to the virtual -table pointer, means that160 This, combined with the fixed offset to the virtual table pointer, means that 194 161 for any virtual type, it is always safe to access its virtual table and, 195 162 from there, it is safe to check the type id to identify the exact type of the … … 209 176 type's alignment, is set using an @alignof@ expression. 210 177 211 \subs ection{Concurrency Integration}178 \subsubsection{Concurrency Integration} 212 179 Coroutines and threads need instances of @CoroutineCancelled@ and 213 180 @ThreadCancelled@ respectively to use all of their functionality. When a new … … 216 183 at the definition of the main function. 217 184 218 Th ese transformations are shownthrough code re-writing in219 \autoref{f:Co routineTypeTransformation} and220 \autoref{f:Co routineMainTransformation} for a coroutine and a thread is similar.221 In both cases , the original declaration is not modified, only new ones are222 added.185 This is showned through code re-writing in 186 \autoref{f:ConcurrencyTypeTransformation} and 187 \autoref{f:ConcurrencyMainTransformation}. 188 In both cases the original declaration is not modified, 189 only new ones are added. 223 190 224 191 \begin{figure} … … 240 207 extern CoroutineCancelled_vtable & _default_vtable; 241 208 \end{cfa} 242 \caption{Coroutine Type Transformation} 243 \label{f:CoroutineTypeTransformation} 244 %\end{figure} 245 246 \bigskip 247 248 %\begin{figure} 209 \caption{Concurrency Type Transformation} 210 \label{f:ConcurrencyTypeTransformation} 211 \end{figure} 212 213 \begin{figure} 249 214 \begin{cfa} 250 215 void main(Example & this) { … … 264 229 &_default_vtable_object_declaration; 265 230 \end{cfa} 266 \caption{Co routineMain Transformation}267 \label{f:Co routineMainTransformation}231 \caption{Concurrency Main Transformation} 232 \label{f:ConcurrencyMainTransformation} 268 233 \end{figure} 269 234 … … 280 245 struct __cfavir_type_id const * child ); 281 246 \end{cfa} 282 The type id for thetarget type of the virtual cast is passed in as @parent@ and247 The type id of target type of the virtual cast is passed in as @parent@ and 283 248 the cast target is passed in as @child@. 284 The generated C code wraps both arguments and the result with type casts. 249 250 For generated C code wraps both arguments and the result with type casts. 285 251 There is also an internal check inside the compiler to make sure that the 286 252 target type is a virtual type. … … 294 260 295 261 \section{Exceptions} 296 \todo{Anything about exception construction.} 262 % Anything about exception construction. 297 263 298 264 \section{Unwinding} … … 308 274 stack. On function entry and return, unwinding is handled directly by the 309 275 call/return code embedded in the function. 310 \PAB{Meaning:In many cases, the position of the instruction pointer (relative to parameter276 In many cases, the position of the instruction pointer (relative to parameter 311 277 and local declarations) is enough to know the current size of the stack 312 frame. }278 frame. 313 279 314 280 Usually, the stack-frame size is known statically based on parameter and 315 local variable declarations. Even for adynamic stack-size, the information281 local variable declarations. Even with dynamic stack-size, the information 316 282 to determine how much of the stack has to be removed is still contained 317 283 within the function. … … 319 285 bumping the hardware stack-pointer up or down as needed. 320 286 Constructing/destructing values within a stack frame has 321 a similar complexity but larger constants, which takeslonger.287 a similar complexity but can add additional work and take longer. 322 288 323 289 Unwinding across multiple stack frames is more complex because that 324 290 information is no longer contained within the current function. 325 With separate compilation a function does not know its callers nor their frame size. 326 In general, the caller's frame size is embedded only at the functions entry (push 327 stack) and exit (pop stack). 291 With seperate compilation a function has no way of knowing what its callers 292 are so it can't know how large those frames are. 328 293 Without altering the main code path it is also hard to pass that work off 329 294 to the caller. … … 337 302 This approach is fragile and requires extra work in the surrounding code. 338 303 339 With respect to the extra work in the sur rounding code,304 With respect to the extra work in the surounding code, 340 305 many languages define clean-up actions that must be taken when certain 341 306 sections of the stack are removed. Such as when the storage for a variable 342 is removed from the stack (destructor call)or when a try statement with a finally clause is307 is removed from the stack or when a try statement with a finally clause is 343 308 (conceptually) popped from the stack. 344 None of these casesshould be handled by the user --- that would contradict the309 None of these should be handled by the user --- that would contradict the 345 310 intention of these features --- so they need to be handled automatically. 346 311 … … 390 355 in this case @clean_up@, run when the variable goes out of scope. 391 356 This feature is enough to mimic destructors, 392 but not try statements that affect357 but not try statements which can effect 393 358 the unwinding. 394 359 395 360 To get full unwinding support, all of these features must be handled directly 396 in assembly and assembler directives; parti cularly the cfi directives361 in assembly and assembler directives; partiularly the cfi directives 397 362 \snake{.cfi_lsda} and \snake{.cfi_personality}. 398 363 … … 434 399 @_UA_FORCE_UNWIND@ specifies a forced unwind call. Forced unwind only performs 435 400 the cleanup phase and uses a different means to decide when to stop 436 (see \ autoref{s:ForcedUnwind}).401 (see \vref{s:ForcedUnwind}). 437 402 \end{enumerate} 438 403 439 404 The @exception_class@ argument is a copy of the 440 405 \code{C}{exception}'s @exception_class@ field, 441 which is a number that identifies the EHM406 which is a number that identifies the exception handling mechanism 442 407 that created the exception. 443 408 … … 529 494 needs its own exception context. 530 495 531 An exception context isretrieved by calling the function496 The exception context should be retrieved by calling the function 532 497 \snake{this_exception_context}. 533 498 For sequential execution, this function is defined as … … 554 519 The first step of a termination raise is to copy the exception into memory 555 520 managed by the exception system. Currently, the system uses @malloc@, rather 556 than reserved memory or the stack top. The EHMmanages521 than reserved memory or the stack top. The exception handling mechanism manages 557 522 memory for the exception as well as memory for libunwind and the system's own 558 523 per-exception storage. … … 653 618 \subsection{Try Statements and Catch Clauses} 654 619 The try statement with termination handlers is complex because it must 655 compensate for the C code-generation versus proper620 compensate for the C code-generation versus 656 621 assembly-code generated from \CFA. Libunwind 657 622 requires an LSDA and personality function for control to unwind across a … … 659 624 660 625 The workaround is a function called @__cfaehm_try_terminate@ in the standard 661 \CFAlibrary. The contents of a try block and the termination handlers are converted662 into nestedfunctions. These are then passed to the try terminate function and it663 calls them , appropriately.626 library. The contents of a try block and the termination handlers are converted 627 into functions. These are then passed to the try terminate function and it 628 calls them. 664 629 Because this function is known and fixed (and not an arbitrary function that 665 happens to contain a try statement), itsLSDA can be generated ahead630 happens to contain a try statement), the LSDA can be generated ahead 666 631 of time. 667 632 668 Both the LSDA and the personality function for @__cfaehm_try_terminate@are set ahead of time using633 Both the LSDA and the personality function are set ahead of time using 669 634 embedded assembly. This assembly code is handcrafted using C @asm@ statements 670 635 and contains 671 enough information for a single try statement the function rep resents.636 enough information for a single try statement the function repersents. 672 637 673 638 The three functions passed to try terminate are: … … 681 646 decides if a catch clause matches the termination exception. It is constructed 682 647 from the conditional part of each handler and runs each check, top to bottom, 683 in turn, first checking to see if the exception type matches. 684 The match is performed in two steps, first a virtual cast is used to see 685 if the raised exception is an instance of the declared exception or one of 686 its descendant type, and then is the condition true, if present. 687 It takes a pointer to the exception and returns 0 if the 648 in turn, first checking to see if the exception type matches and then if the 649 condition is true. It takes a pointer to the exception and returns 0 if the 688 650 exception is not handled here. Otherwise the return value is the id of the 689 651 handler that matches the exception. … … 698 660 All three functions are created with GCC nested functions. GCC nested functions 699 661 can be used to create closures, 700 in other words , functions that can refer to their lexical scope inother701 functions on the stack when called. This approach allows the functions to refer to all the662 in other words functions that can refer to the state of other 663 functions on the stack. This approach allows the functions to refer to all the 702 664 variables in scope for the function containing the @try@ statement. These 703 665 nested functions and all other functions besides @__cfaehm_try_terminate@ in … … 707 669 708 670 \autoref{f:TerminationTransformation} shows the pattern used to transform 709 a \CFA try statement with catch clauses into the appropr iate C functions.671 a \CFA try statement with catch clauses into the approprate C functions. 710 672 \todo{Explain the Termination Transformation figure.} 711 673 … … 776 738 Instead of storing the data in a special area using assembly, 777 739 there is just a linked list of possible handlers for each stack, 778 with each node on the list rep resenting a try statement on the stack.740 with each node on the list reperenting a try statement on the stack. 779 741 780 742 The head of the list is stored in the exception context. … … 782 744 to the head of the list. 783 745 Instead of traversing the stack, resumption handling traverses the list. 784 At each node, the EHM checks to see if the try statement the node rep resents746 At each node, the EHM checks to see if the try statement the node repersents 785 747 can handle the exception. If it can, then the exception is handled and 786 748 the operation finishes, otherwise the search continues to the next node. 787 749 If the search reaches the end of the list without finding a try statement 788 750 that can handle the exception, the default handler is executed and the 789 operation finishes , unless it throws an exception.751 operation finishes. 790 752 791 753 Each node has a handler function that does most of the work. 792 754 The handler function is passed the raised exception and returns true 793 755 if the exception is handled and false otherwise. 756 794 757 The handler function checks each of its internal handlers in order, 795 758 top-to-bottom, until it funds a match. If a match is found that handler is … … 797 760 If no match is found the function returns false. 798 761 The match is performed in two steps, first a virtual cast is used to see 799 if the raised exception is an instance of the declared exception or one of 800 its descendant type, and then is the condition true, if present. 801 \PAB{I don't understand this sentence. 802 This ordering gives the type guarantee used in the predicate.} 762 if the thrown exception is an instance of the declared exception or one of 763 its descendant type, then check to see if passes the custom predicate if one 764 is defined. This ordering gives the type guarantee used in the predicate. 803 765 804 766 \autoref{f:ResumptionTransformation} shows the pattern used to transform 805 a \CFA try statement with catch clauses into the appropr iate C functions.767 a \CFA try statement with catch clauses into the approprate C functions. 806 768 \todo{Explain the Resumption Transformation figure.} 807 769 … … 852 814 (see \vpageref{s:ResumptionMarking}), which ignores parts of 853 815 the stack 854 already examined, andis accomplished by updating the front of the list as the855 search continues. Before the handler is called at a matching node, the head of the list816 already examined, is accomplished by updating the front of the list as the 817 search continues. Before the handler at a node is called, the head of the list 856 818 is updated to the next node of the current node. After the search is complete, 857 819 successful or not, the head of the list is reset. … … 860 822 been checked are not on the list while a handler is run. If a resumption is 861 823 thrown during the handling of another resumption, the active handlers and all 862 the other handler schecked up to this point are not checked again.824 the other handler checked up to this point are not checked again. 863 825 % No paragraph? 864 826 This structure also supports new handlers added while the resumption is being … … 868 830 869 831 \begin{figure} 870 \centering871 832 \input{resumption-marking} 872 833 \caption{Resumption Marking} … … 890 851 \section{Finally} 891 852 % Uses destructors and GCC nested functions. 892 \autoref{f:FinallyTransformation} shows the pattern used to transform a \CFA 893 try statement with finally clause into the appropriate C functions. 894 The finally clause is placed into a GCC nested-function 895 with a unique name, and no arguments or return values. This nested function is 896 then set as the cleanup function of an empty object that is declared at the 897 beginning of a block placed around the context of the associated @try@ 898 statement. 899 900 \begin{figure} 901 \begin{cfa} 902 try { 903 // TRY BLOCK 904 } finally { 905 // FINALLY BLOCK 906 } 907 \end{cfa} 908 909 \transformline 910 911 \begin{cfa} 912 { 913 void finally(void *__hook){ 914 // FINALLY BLOCK 915 } 916 __attribute__ ((cleanup(finally))) 917 struct __cfaehm_cleanup_hook __finally_hook; 918 { 919 // TRY BLOCK 920 } 921 922 } 923 \end{cfa} 924 925 \caption{Finally Transformation} 926 \label{f:FinallyTransformation} 927 \end{figure} 853 A finally clause is placed into a GCC nested-function with a unique name, 854 and no arguments or return values. 855 This nested function is then set as the cleanup 856 function of an empty object that is declared at the beginning of a block placed 857 around the context of the associated @try@ statement. 928 858 929 859 The rest is handled by GCC. The try block and all handlers are inside this … … 939 869 940 870 The first step of cancellation is to find the cancelled stack and its type: 941 coroutine, thread ,or main thread.871 coroutine, thread or main thread. 942 872 In \CFA, a thread (the construct the user works with) is a user-level thread 943 873 (point of execution) paired with a coroutine, the thread's main coroutine. 944 874 The thread library also stores pointers to the main thread and the current 945 coroutine.875 thread. 946 876 If the current thread's main and current coroutines are the same then the 947 877 current stack is a thread stack, otherwise it is a coroutine stack. … … 957 887 passed to the forced-unwind function. The general pattern of all three stop 958 888 functions is the same: continue unwinding until the end of stack and 959 then p erform the appropriate transfer.889 then preform the appropriate transfer. 960 890 961 891 For main stack cancellation, the transfer is just a program abort. -
doc/theses/andrew_beach_MMath/performance.tex
rd8f8d08 r7372065 2 2 \label{c:performance} 3 3 4 Performance is of secondary importance for most of this project. 5 Instead, the focus was to get the features working. The only performance 6 requirement is to ensure the tests for correctness run in a reasonable 7 amount of time. Hence, a few basic performance tests were performed to 8 check this requirement. 4 Performance has been of secondary importance for most of this project. 5 Instead, the focus has been to get the features working. The only performance 6 requirements is to ensure the tests for correctness run in a reasonable 7 amount of time. 9 8 10 9 \section{Test Set-Up} 11 Tests w ere run in \CFA, C++, Java and Python.10 Tests will be run in \CFA, C++, Java and Python. 12 11 In addition there are two sets of tests for \CFA, 13 one for termination and one forresumption exceptions.12 one for termination exceptions and once with resumption exceptions. 14 13 15 14 C++ is the most comparable language because both it and \CFA use the same 16 15 framework, libunwind. 17 In fact, the comparison is almost entirely a quality of implementation .18 Specifically,\CFA's EHM has had significantly less time to be optimized and16 In fact, the comparison is almost entirely a quality of implementation 17 comparison. \CFA's EHM has had significantly less time to be optimized and 19 18 does not generate its own assembly. It does have a slight advantage in that 20 there are some features it handles directly instead ofthrough utility functions,21 but otherwise \Cpp should havea significant advantage.22 23 Java is a popular language with similar termination semantics, but24 it is implemented in a very different environment, a virtual machine with19 there are some features it does not handle, through utility functions, 20 but otherwise \Cpp has a significant advantage. 21 22 Java is another very popular language with similar termination semantics. 23 It is implemented in a very different environment, a virtual machine with 25 24 garbage collection. 26 It also implements the @finally@ clause on @try@blocks allowing for a direct25 It also implements the finally clause on try blocks allowing for a direct 27 26 feature-to-feature comparison. 28 As with \Cpp, Java's implementation is m ature, optimized29 and hasextra features.30 31 Python is used as an alternativecomparison because of the \CFA EHM's32 current performance goals, which is not tobe prohibitively slow while the27 As with \Cpp, Java's implementation is more mature, has more optimizations 28 and more extra features. 29 30 Python was used as a point of comparison because of the \CFA EHM's 31 current performance goals, which is not be prohibitively slow while the 33 32 features are designed and examined. Python has similar performance goals for 34 33 creating quick scripts and its wide use suggests it has achieved those goals. 35 34 36 Unfortunately, there are no notable modern programming languages with 37 resumption exceptions. Even the older programming languages with resumption 38 seem to be notable only for having resumption. 39 So instead, resumption is compared to its simulation in other programming 40 languages using fixup functions that are explicitly passed for correction or 41 logging purposes. 42 % So instead, resumption is compared to a less similar but much more familiar 43 %feature, termination exceptions. 44 45 All tests are run inside a main loop that repeatedly performs a test. 46 This approach avoids start-up or tear-down time from 35 Unfortunately there are no notable modern programming languages with 36 resumption exceptions. Even the older programming languages with resumptions 37 seem to be notable only for having resumptions. 38 So instead resumptions are compared to a less similar but much more familiar 39 feature, termination exceptions. 40 41 All tests are run inside a main loop which will perform the test 42 repeatedly. This is to avoids start-up or tear-down time from 47 43 affecting the timing results. 48 Each test is run a N times (configurable from the command line).49 The Java tests runs the main loop1000 times before50 beginning t he actual testto ``warm-up" the JVM.44 Tests ran their main loop a million times. 45 The Java versions of the test also run this loop an extra 1000 times before 46 beginning to time the results to ``warm-up" the JVM. 51 47 52 48 Timing is done internally, with time measured immediately before and 53 after the test loop. The difference is calculated and printed. 49 immediately after the test loop. The difference is calculated and printed. 50 54 51 The loop structure and internal timing means it is impossible to test 55 52 unhandled exceptions in \Cpp and Java as that would cause the process to … … 58 55 critical. 59 56 60 The exceptions used in these tests are alwaysbased off of61 abase exception. This requirement minimizes performance differences based62 on the object model used to rep resent the exception.63 64 All tests are designed to be as minimal as possible,while still preventing65 ex cessive optimizations.57 The exceptions used in these tests will always be a exception based off of 58 the base exception. This requirement minimizes performance differences based 59 on the object model used to repersent the exception. 60 61 All tests were designed to be as minimal as possible while still preventing 62 exessive optimizations. 66 63 For example, empty inline assembly blocks are used in \CFA and \Cpp to 67 64 prevent excessive optimizations while adding no actual work. 68 Each test was run eleven times. The top three and bottom three results were69 discarded and the remaining five values are averaged.70 71 The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is72 compiled with version 11.0.11. Python with version 3.8. The tests were run on:73 \begin{itemize}[nosep]74 \item75 ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-2576 \item77 AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-2578 \end{itemize}79 Two kinds of hardware architecture allows discriminating any implementation and80 architectural effects.81 82 65 83 66 % We don't use catch-alls but if we did: … … 88 71 The following tests were selected to test the performance of different 89 72 components of the exception system. 90 The y should provide a guide as to where the EHM's costs are found.73 The should provide a guide as to where the EHM's costs can be found. 91 74 92 75 \paragraph{Raise and Handle} 93 This group measures the cost of a try statement when exceptions are raised and 94 the stack is unwound (termination) or not unwound (resumption). Each test has 95 has a repeating function like the following 96 \begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}] 97 void unwind_empty(unsigned int frames) { 98 if (frames) { 99 @unwind_empty(frames - 1);@ // AUGMENTED IN OTHER EXPERIMENTS 100 } else throw (empty_exception){&empty_vt}; 101 } 102 \end{lstlisting} 103 which is called N times, where each call recurses to a depth of R (configurable from the command line), an 104 exception is raised, the stack is a unwound, and the exception caught. 76 The first group of tests involve setting up 77 So there is three layers to the test. The first is set up and a loop, which 78 configures the test and then runs it repeatedly to reduce the impact of 79 start-up and shutdown on the results. 80 Each iteration of the main loop 105 81 \begin{itemize}[nosep] 106 \item Empty: 107 For termination, this test measures the cost of raising (stack walking) an 108 exception through empty stack frames from the bottom of the recursion to an 109 empty handler, and unwinding the stack. (see above code) 110 111 \medskip 112 For resumption, this test measures the same raising cost but does not unwind 113 the stack. For languages without resumption, a fixup function is to the bottom 114 of the recursion and called to simulate a fixup operation at that point. 115 \begin{cfa} 116 void nounwind_fixup(unsigned int frames, void (*raised_rtn)(int &)) { 117 if (frames) { 118 nounwind_fixup(frames - 1, raised_rtn); 119 } else { 120 int fixup = 17; 121 raised_rtn(fixup); 122 } 123 } 124 \end{cfa} 125 where the passed fixup function is: 126 \begin{cfa} 127 void raised(int & fixup) { 128 fixup = 42; 129 } 130 \end{cfa} 131 For comparison, a \CFA version passing a function is also included. 82 \item Empty Function: 83 The repeating function is empty except for the necessary control code. 132 84 \item Destructor: 133 This test measures the cost of raising an exception through non-empty frames, 134 where each frame has an object requiring destruction, from the bottom of the 135 recursion to an empty handler. Hence, there are N destructor calls during 136 unwinding. 137 138 \medskip 139 This test is not meaningful for resumption because the stack is only unwound as 140 the recursion returns. 141 \begin{cfa} 142 WithDestructor object; 143 unwind_destructor(frames - 1); 144 \end{cfa} 85 The repeating function creates an object with a destructor before calling 86 itself. 145 87 \item Finally: 146 This test measures the cost of establishing a try block with an empty finally 147 clause on the front side of the recursion and running the empty finally clauses 148 during stack unwinding from the bottom of the recursion to an empty handler. 149 \begin{cfa} 150 try { 151 unwind_finally(frames - 1); 152 } finally {} 153 \end{cfa} 154 155 \medskip 156 This test is not meaningful for resumption because the stack is only unwound as 157 the recursion returns. 88 The repeating function calls itself inside a try block with a finally clause 89 attached. 158 90 \item Other Handler: 159 For termination, this test is like the finally test but the try block has a 160 catch clause for an exception that is not raised, so catch matching is executed 161 during stack unwinding but the match never successes until the catch at the 162 bottom of the recursion. 163 \begin{cfa} 164 try { 165 unwind_other(frames - 1); 166 } catch (not_raised_exception *) {} 167 \end{cfa} 168 169 \medskip 170 For resumption, this test measures the same raising cost but does not unwind 171 the stack. For languages without resumption, the same fixup function is passed 172 and called. 91 The repeating function calls itself inside a try block with a handler that 92 will not match the raised exception. (But is of the same kind of handler.) 173 93 \end{itemize} 174 94 175 \paragraph{Try/Handle/Finally Statement} 176 This group measures just the cost of executing a try statement so 177 \emph{there is no stack unwinding}. Hence, the program main loops N times 178 around: 179 \begin{cfa} 180 try { 181 } catch (not_raised_exception *) {} 182 \end{cfa} 95 \paragraph{Cross Try Statement} 96 The next group measures the cost of a try statement when no exceptions are 97 raised. The test is set-up, then there is a loop to reduce the impact of 98 start-up and shutdown on the results. 99 In each iteration, a try statement is executed. Entering and leaving a loop 100 is all the test wants to do. 183 101 \begin{itemize}[nosep] 184 102 \item Handler: 185 The try statement has a handler ( catch/resume).103 The try statement has a handler (of the matching kind). 186 104 \item Finally: 187 105 The try statement has a finally clause. … … 189 107 190 108 \paragraph{Conditional Matching} 191 This group measures the cost of conditional matching.109 This group of tests checks the cost of conditional matching. 192 110 Only \CFA implements the language level conditional match, 193 the other languages m imic with an ``unconditional" match (it still194 checks the exception's type) and conditional re-raise if it is not suppose111 the other languages must mimic with an ``unconditional" match (it still 112 checks the exception's type) and conditional re-raise if it was not supposed 195 113 to handle that exception. 196 \begin{center}197 \begin{tabular}{ll}198 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\199 \begin{cfa}200 try {201 throw_exception();202 } catch (empty_exception * exc;203 should_catch) {204 }205 \end{cfa}206 &207 \begin{cfa}208 try {209 throw_exception();210 } catch (EmptyException & exc) {211 if (!should_catch) throw;212 }213 \end{cfa}214 \end{tabular}215 \end{center}216 114 \begin{itemize}[nosep] 217 115 \item Match All: … … 221 119 \end{itemize} 222 120 223 \medskip224 \noindent225 All omitted test code for other languages is functionally identical to the \CFA226 tests or simulated, and available online~\cite{CforallExceptionBenchmarks}.227 228 121 %\section{Cost in Size} 229 122 %Using exceptions also has a cost in the size of the executable. … … 237 130 238 131 \section{Results} 239 One result not directly related to \CFA but important to keep in 240 mind is that, for exceptions, the standard intuition about which languages 241 should go faster often does not hold. For example, there are a few cases where Python out-performs 242 \CFA, \Cpp and Java. The most likely explanation is that, since exceptions are 243 rarely considered to be the common case, the more optimized languages 244 make that case expense. In addition, languages with high-level 245 representations have a much easier time scanning the stack as there is less 246 to decode. 247 248 Tables~\ref{t:PerformanceTermination} and~\ref{t:PerformanceResumption} show 249 the test results for termination and resumption, respectively. In cases where 250 a feature is not supported by a language, the test is skipped for that language 251 (marked N/A). For some Java experiments it was impossible to measure certain 252 effects because the JIT corrupted the test (marked N/C). No workaround was 253 possible~\cite{Dice21}. To get experiments in the range of 1--100 seconds, the 254 number of times an experiment is run (N) is varied (N is marked beside each 255 experiment, e.g., 1M $\Rightarrow$ 1 million test iterations). 256 257 An anomaly exists with gcc nested functions used as thunks for implementing 258 much of the \CFA EHM. If a nested-function closure captures local variables in 259 its lexical scope, performance dropped by a factor of 10. Specifically, in try 260 statements of the form: 261 \begin{cfa} 262 try { 263 unwind_other(frames - 1); 264 } catch (not_raised_exception *) {} 265 \end{cfa} 266 the try block is hoisted into a nested function and the variable @frames@ is 267 the local parameter to the recursive function, which triggers the anomaly. The 268 workaround is to remove the recursion parameter and make it a global variable 269 that is explicitly decremented outside of the try block (marked with a ``*''): 270 \begin{cfa} 271 frames -= 1; 272 try { 273 unwind_other(); 274 } catch (not_raised_exception *) {} 275 \end{cfa} 276 To make comparisons fair, a dummy parameter is added and the dummy value passed 277 in the recursion. Note, nested functions in gcc are rarely used (if not 278 completely unknown) and must follow the C calling convention, unlike \Cpp 279 lambdas, so it is not surprising if there are performance issues efficiently 280 capturing closures. 281 282 % Similarly, if a test does not change between resumption 283 % and termination in \CFA, then only one test is written and the result 284 % was put into the termination column. 132 Each test was run eleven times. The top three and bottom three results were 133 discarded and the remaining five values are averaged. 134 135 In cases where a feature is not supported by a language the test is skipped 136 for that language. Similarly, if a test is does not change between resumption 137 and termination in \CFA, then only one test is written and the result 138 was put into the termination column. 285 139 286 140 % Raw Data: … … 318 172 % Match None & 0.0 & 0.0 & 9476060146 & 0.0 & 0.0 \\ 319 173 174 \begin{tabular}{|l|c c c c c|} 175 \hline 176 & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ 177 \hline 178 Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 179 Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ 180 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 181 Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 182 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 183 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 184 Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 185 Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 186 \hline 187 \end{tabular} 188 320 189 % run-plg7a-a.sat 321 190 % --------------- … … 352 221 % Match None & 0.0 & 0.0 & 7829059869 & 0.0 & 0.0 \\ 353 222 354 \begin{table} 355 \centering 356 \caption{Performance Results Termination (sec)} 357 \label{t:PerformanceTermination} 358 \begin{tabular}{|r|*{2}{|r r r r|}} 359 \hline 360 & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\ 361 \cline{2-9} 362 N\hspace{8pt} & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & 363 \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 364 \hline 365 Throw Empty (1M) & 3.4 & 2.8 & 18.3 & 23.4 & 3.7 & 3.2 & 15.5 & 14.8 \\ 366 Throw D'tor (1M) & 48.4 & 23.6 & N/A & N/A & 64.2 & 29.0 & N/A & N/A \\ 367 Throw Finally (1M) & 3.4* & N/A & 17.9 & 29.0 & 4.1* & N/A & 15.6 & 19.0 \\ 368 Throw Other (1M) & 3.6* & 23.2 & 18.2 & 32.7 & 4.0* & 24.5 & 15.5 & 21.4 \\ 369 Try/Catch (100M) & 6.0 & 0.9 & N/C & 37.4 & 10.0 & 0.8 & N/C & 32.2 \\ 370 Try/Finally (100M) & 0.9 & N/A & N/C & 44.1 & 0.8 & N/A & N/C & 37.3 \\ 371 Match All (10M) & 32.9 & 20.7 & 13.4 & 4.9 & 36.2 & 24.5 & 12.0 & 3.1 \\ 372 Match None (10M) & 32.7 & 50.3 & 11.0 & 5.1 & 36.3 & 71.9 & 12.3 & 4.2 \\ 223 % PLG7A (in seconds) 224 \begin{tabular}{|l|c c c c c|} 225 \hline 226 & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\ 227 \hline 228 Raise Empty & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 229 Raise D'tor & 0.0 & 0.0 & 0.0 & N/A & N/A \\ 230 Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\ 231 Raise Other & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 232 Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 233 Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\ 234 Match All & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 235 Match None & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 373 236 \hline 374 237 \end{tabular} 375 \end{table} 376 377 \begin{table} 378 \centering 379 \small 380 \caption{Performance Results Resumption (sec)} 381 \label{t:PerformanceResumption} 382 \setlength{\tabcolsep}{5pt} 383 \begin{tabular}{|r|*{2}{|r r r r|}} 384 \hline 385 & \multicolumn{4}{c||}{AMD} & \multicolumn{4}{c|}{ARM} \\ 386 \cline{2-9} 387 N\hspace{8pt} & \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} & 388 \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\ 389 \hline 390 Resume Empty (10M) & 3.8/3.5 & 14.7 & 2.3 & 176.1 & 0.3/0.1 & 8.9 & 1.2 & 119.9 \\ 391 Resume Other (10M) & 4.0*/0.1* & 21.9 & 6.2 & 381.0 & 0.3*/0.1* & 13.2 & 5.0 & 290.7 \\ 392 Try/Resume (100M) & 8.8 & N/A & N/A & N/A & 12.3 & N/A & N/A & N/A \\ 393 Match All (10M) & 0.3 & N/A & N/A & N/A & 0.3 & N/A & N/A & N/A \\ 394 Match None (10M) & 0.3 & N/A & N/A & N/A & 0.4 & N/A & N/A & N/A \\ 395 \hline 396 \end{tabular} 397 \end{table} 398 399 As stated, the performance tests are not attempting to compare exception 400 handling across languages. The only performance requirement is to ensure the 401 \CFA EHM implementation runs in a reasonable amount of time, given its 402 constraints. In general, the \CFA implement did very well. Each of the tests is 403 analysed. 404 \begin{description} 405 \item[Throw/Resume Empty] 406 For termination, \CFA is close to \Cpp, where other languages have a higher cost. 407 408 For resumption, \CFA is better than the fixup simulations in the other languages, except Java. 409 The \CFA results on the ARM computer for both resumption and function simulation are particularly low; 410 I have no explanation for this anomaly, except the optimizer has managed to remove part of the experiment. 411 Python has a high cost for passing the lambda during the recursion. 412 413 \item[Throw D'tor] 414 For termination, \CFA is twice the cost of \Cpp. 415 The higher cost for \CFA must be related to how destructors are handled. 416 417 \item[Throw Finally] 418 \CFA is better than the other languages with a @finally@ clause, which is the 419 same for termination and resumption. 420 421 \item[Throw/Resume Other] 422 For termination, \CFA is better than the other languages. 423 424 For resumption, \CFA is equal to or better the other languages. 425 Again, the \CFA results on the ARM computer for both resumption and function simulation are particularly low. 426 Python has a high cost for passing the lambda during the recursion. 427 428 \item[Try/Catch/Resume] 429 For termination, installing a try statement is more expressive than \Cpp 430 because the try components are hoisted into local functions. At runtime, these 431 functions are than passed to libunwind functions to set up the try statement. 432 \Cpp zero-cost try-entry accounts for its performance advantage. 433 434 For resumption, there are similar costs to termination to set up the try 435 statement but libunwind is not used. 436 437 \item[Try/Finally] 438 Setting up a try finally is less expensive in \CFA than setting up handlers, 439 and is significantly less than other languages. 440 441 \item[Throw/Resume Match All] 442 For termination, \CFA is close to the other language simulations. 443 444 For resumption, the stack unwinding is much faster because it does not use 445 libunwind. Instead resumption is just traversing a linked list with each node 446 being the next stack frame with the try block. 447 448 \item[Throw/Resume Match None] 449 The same results as for Match All. 450 \end{description} 451 452 \begin{comment} 453 This observation means that while \CFA does not actually keep up with Python in 454 every case, it is usually no worse than roughly half the speed of \Cpp. This 455 performance is good enough for the prototyping purposes of the project. 238 239 One result that is not directly related to \CFA but is important to keep in 240 mind is that in exceptions the standard intuitions about which languages 241 should go faster often do not hold. There are cases where Python out-preforms 242 \Cpp and Java. The most likely explination is that, since exceptions are 243 rarely considered to be the common case, the more optimized langages have 244 optimized at their expence. In addition languages with high level 245 repersentations have a much easier time scanning the stack as there is less 246 to decode. 247 248 This means that while \CFA does not actually keep up with Python in every 249 case it is usually no worse than roughly half the speed of \Cpp. This is good 250 enough for the prototyping purposes of the project. 456 251 457 252 The test case where \CFA falls short is Raise Other, the case where the 458 253 stack is unwound including a bunch of non-matching handlers. 459 This slowdown seems to come from missing optimizations. 254 This slowdown seems to come from missing optimizations, 255 the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp) 256 but the results change if they are run in gcc/g++ 9 instead. 257 Importantly, there is a huge slowdown in \Cpp's results bringing that brings 258 \CFA's performace back in that roughly half speed area. However many other 259 \CFA benchmarks increase their run-time by a similar amount falling far 260 behind their \Cpp counter-parts. 460 261 461 262 This suggests that the performance issue in Raise Other is just an … … 468 269 Resumption exception handling is also incredibly fast. Often an order of 469 270 magnitude or two better than the best termination speed. 470 There is a simple expl anation for this; traversing a linked list is much271 There is a simple explination for this; traversing a linked list is much 471 272 faster than examining and unwinding the stack. When resumption does not do as 472 well its when more try statements are used per raise. Updating the inter nal473 linked list is not very expen sive but it does add up.273 well its when more try statements are used per raise. Updating the interal 274 linked list is not very expencive but it does add up. 474 275 475 276 The relative speed of the Match All and Match None tests (within each … … 479 280 \item 480 281 Java and Python get similar values in both tests. 481 Between the interp reted code, a higher level representation of the call282 Between the interperated code, a higher level repersentation of the call 482 283 stack and exception reuse it it is possible the cost for a second 483 284 throw can be folded into the first. 484 285 % Is this due to optimization? 485 286 \item 486 Both types of \CFA are sligh tly slower if there is not a match.287 Both types of \CFA are slighly slower if there is not a match. 487 288 For termination this likely comes from unwinding a bit more stack through 488 289 libunwind instead of executing the code normally. … … 501 302 The difference in relative performance does show that there are savings to 502 303 be made by performing the check without catching the exception. 503 \end{comment}504 505 506 \begin{comment}507 From: Dave Dice <dave.dice@oracle.com>508 To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>509 Subject: Re: [External] : JIT510 Date: Mon, 16 Aug 2021 01:21:56 +0000511 512 > On 2021-8-15, at 7:14 PM, Peter A. Buhr <pabuhr@uwaterloo.ca> wrote:513 >514 > My student is trying to measure the cost of installing a try block with a515 > finally clause in Java.516 >517 > We tried the random trick (see below). But if the try block is comment out, the518 > results are the same. So the program measures the calls to the random number519 > generator and there is no cost for installing the try block.520 >521 > Maybe there is no cost for a try block with an empty finally, i.e., the try is522 > optimized away from the get-go.523 524 There's quite a bit of optimization magic behind the HotSpot curtains for525 try-finally. (I sound like the proverbial broken record (:>)).526 527 In many cases we can determine that the try block can't throw any exceptions,528 so we can elide all try-finally plumbing. In other cases, we can convert the529 try-finally to normal if-then control flow, in the case where the exception is530 thrown into the same method. This makes exceptions _almost cost-free. If we531 actually need to "physically" rip down stacks, then things get expensive,532 impacting both the throw cost, and inhibiting other useful optimizations at the533 catch point. Such "true" throws are not just expensive, they're _very534 expensive. The extremely aggressive inlining used by the JIT helps, because we535 can convert cases where a heavy rip-down would normally needed back into simple536 control flow.537 538 Other quirks involve the thrown exception object. If it's never accessed then539 we're apply a nice set of optimizations to avoid its construction. If it's540 accessed but never escapes the catch frame (common) then we can also cheat.541 And if we find we're hitting lots of heavy rip-down cases, the JIT will542 consider recompilation - better inlining -- to see if we can merge the throw543 and catch into the same physical frame, and shift to simple branches.544 545 In your example below, System.out.print() can throw, I believe. (I could be546 wrong, but most IO can throw). Native calls that throw will "unwind" normally547 in C++ code until they hit the boundary where they reenter java emitted code,548 at which point the JIT-ed code checks for a potential pending exception. So in549 a sense the throw point is implicitly after the call to the native method, so550 we can usually make those cases efficient.551 552 Also, when we're running in the interpreter and warming up, we'll notice that553 the == 42 case never occurs, and so when we start to JIT the code, we elide the554 call to System.out.print(), replacing it (and anything else which appears in555 that if x == 42 block) with a bit of code we call an "uncommon trap". I'm556 presuming we encounter 42 rarely. So if we ever hit the x == 42 case, control557 hits the trap, which triggers synchronous recompilation of the method, this558 time with the call to System.out.print() and, because of that, we now to adapt559 the new code to handle any traps thrown by print(). This is tricky stuff, as560 we may need to rebuild stack frames to reflect the newly emitted method. And561 we have to construct a weird bit of "thunk" code that allows us to fall back562 directly into the newly emitted "if" block. So there's a large one-time cost563 when we bump into the uncommon trap and recompile, and subsequent execution564 might get slightly slower as the exception could actually be generated, whereas565 before we hit the trap, we knew the exception could never be raised.566 567 Oh, and things also get expensive if we need to actually fill in the stack568 trace associated with the exception object. Walking stacks is hellish.569 570 Quite a bit of effort was put into all this as some of the specjvm benchmarks571 showed significant benefit.572 573 It's hard to get sensible measurements as the JIT is working against you at574 every turn. What's good for the normal user is awful for anybody trying to575 benchmark. Also, all the magic results in fairly noisy and less reproducible576 results.577 578 Regards579 Dave580 581 p.s., I think I've mentioned this before, but throwing in C++ is grim as582 unrelated throws in different threads take common locks, so nothing scales as583 you might expect.584 \end{comment}
Note: See TracChangeset
for help on using the changeset viewer.