Context Navigation

← Previous Changeset
Next Changeset →

Changeset 7372065

Timestamp:

Aug 21, 2021, 5:49:45 PM (4 years ago)

Author:

Andrew Beach <ajbeach@…>

Branches:

ADT, ast-experimental, enum, forall-pointer-decay, jacob/cs343-translation, master, pthread-emulation, qualifiedEnum

Children:

Parents:

Message:

Saved and reverted another set of Peter's changes.

Location:

doc/theses/andrew_beach_MMath

Files:

: 4 edited

conclusion.tex (modified) (1 diff)
future.tex (modified) (8 diffs)
implement.tex (modified) (31 diffs)
performance.tex (modified) (11 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/theses/andrew_beach_MMath/conclusion.tex

-              rd8f8d08
+              r7372065
 \chapter{Conclusion}
-\label{c:conclusion}
 % Just a little knot to tie the paper together.
+In the previous chapters this thesis presents the design and implementation of
+\CFA's EHM.  Both the design and implementation are based off of tools and
+techniques developed for other programming languages but they were adapted to
+better fit \CFA's feature set and add a few features that do not exist in other
+EHMs, like conditional catch, default handlers, implicitly changing resumption
+into termination in the resumption default handler, and cancellation through
+coroutines and threads back to program main.
+In the previous chapters this thesis presents the design and implementation
+of \CFA's exception handling mechanism (EHM).
+Both the design and implementation are based off of tools and techniques
+developed for other programming languages but they were adapted to better fit
+\CFA's feature set.
 The resulting features cover all of the major use cases of the most popular
 termination EHMs of today, along with reintroducing resumption exceptions and
+creating some new features that fit with \CFA's larger programming patterns,
+such as virtuals independent of traditional objects.
+creating some new features that fix with \CFA's larger programming patterns.
+The implementation has been tested through a set of small but interesting micro-benchmarks
+and compared to other implementations.
+The implementation has been tested and compared to other implementations.
 The results, while not cutting edge, are good enough for prototyping, which
 is \CFA's current stage of development.
+is \CFA's stage of development.
 This initial EHM is a valuable new feature for \CFA in its own right but also serves
 as a tool and motivation for other developments in the language.
+This is a valuable new feature for \CFA in its own right but also serves
+as a tool (and motivation) for other developments in the language.

doc/theses/andrew_beach_MMath/future.tex

-              rd8f8d08
+              r7372065
 \label{c:future}
-The following discussion covers both missing language features that affected my
-work and research based improvements.
 \section{Language Improvements}
+\todo{Future/Language Improvements seems to have gotten mixed up. It is
+presented as ``waiting on language improvements" but really its more
+non-research based impovements.}
 \CFA is a developing programming language. As such, there are partially or
 unimplemented features (including several broken components)
 that I had to workaround while building an EHM largely in
+unimplemented features of the language (including several broken components)
+that I had to workaround while building an exception handling system largely in
 the \CFA language (some C components).  The following are a few of these
 issues, and once implemented/fixed, how they would affect the exception system.
 …
 \item
 The implementation of termination is not portable because it includes
+hand-crafted assembly statements for each architecture, where the
+ARM processor was just added.
+% The existing compilers cannot translate that for other platforms and those
+% sections must be ported by hand to
+Supporting more hardware architectures in a general way is important.
+hand-crafted assembly statements.
+The existing compilers cannot translate that for other platforms and those
+sections must be ported by hand to
+support more hardware architectures, such as the ARM processor.
 \item
 Due to a type-system problem, the catch clause cannot bind the exception to a
 …
 @return@, \etc. The reason is that current code generation hoists a handler
 into a nested function for convenience (versus assemble-code generation at the
+@try@ statement). Hence, when the handler runs, its can access local variable
+in the lexical scope of the @try@ statement, but the closure does not capture
+local control-flow points so it cannot perform non-local transfers in the
+hoisted function.
+@try@ statement). Hence, when the handler runs, its code is not in the lexical
+scope of the @try@ statement, where the local control-flow transfers are
+meaningful.
 \item
 There is no detection of colliding unwinds. It is possible for clean-up code
 run during an unwind to trigger another unwind that escapes the clean-up code
 itself; such as a termination exception caught further down the stack or a
 cancellation. There do exist ways to handle this case, but currently there is no
 detection and the first unwind is simply forgotten, often leaving
+cancellation. There do exist ways to handle this but currently they are not
+even detected and the first unwind will simply be forgotten, often leaving
 it in a bad state.
 \item
 Finally, the exception system has not have a lot programmer testing.
 More time with encouraged usage will reveal new
 quality of life upgrades that can be made.
+Also the exception system did not have a lot of time to be tried and tested.
+So just letting people use the exception system more will reveal new
+quality of life upgrades that can be made with time.
 \end{itemize}
 …
 project, but was thrust upon it to do exception inheritance; hence, only
 minimal work is done. A draft for a complete virtual system is available but
 not finalized.  A future \CFA project is to complete that work and then
+it is not finalized.  A future \CFA project is to complete that work and then
 update the exception system that uses the current version.
 …
 exception traits. The most important one is an assertion to check one virtual
 type is a child of another. This check precisely captures many of the
 current ad-hoc correctness requirements.
+correctness requirements.
 The full virtual system might also include other improvement like associated
 types to allow traits to refer to types not listed in their header. This
 feature allows exception traits to not refer to the virtual-table type
 explicitly. %, removing the need for the current interface macros.
+explicitly, removing the need for the current interface macros.
 \section{Additional Raises}
 …
 Checked exceptions make exceptions part of a function's type by adding an
 exception signature. An exception signature must declare all checked
 exceptions that could propagate from the function, either because they were
 raised inside the function or a call to a sub-function. This improves safety
+exceptions that could propagate from the function (either because they were
+raised inside the function or came from a sub-function). This improves safety
 by making sure every checked exception is either handled or consciously
 passed on.
 However checked exceptions were never seriously considered for this project
 because they have significant trade-offs in usability and code reuse in
+because they have significant trade-offs in usablity and code reuse in
 exchange for the increased safety.
 These trade-offs are most problematic when trying to pass exceptions through
 …
 not support a successful-exiting stack-search without doing an unwind.
 Workarounds are possible but awkward. Ideally an extension to libunwind could
 be made, but that would either require separate maintenance or gaining enough
 support to have it folded into the code base.
+be made, but that would either require separate maintenance or gain enough
+support to have it folded into the standard.
 Also new techniques to skip previously searched parts of the stack need to be
 …
 to leave the handler.
 Currently, mimicking this behaviour in \CFA is possible by throwing a
 termination exception inside a resumption handler.
+termination inside a resumption handler.
 % Maybe talk about the escape; and escape CONTROL_STMT; statements or how

doc/theses/andrew_beach_MMath/implement.tex

-              rd8f8d08
+              r7372065
 \subsection{Virtual Type}
+A virtual type~(see \autoref{s:Virtuals}) has a pointer to a virtual table,
+called the \emph{virtual-table pointer}, which binds an instance of a virtual
+type to a virtual table.  Internally, the field is called \snake{virtual_table}
+and is fixed after construction.  This pointer is also the table's id and how
+the system accesses the virtual table and the virtual members there. It is
+always the first field in the structure so that its location is always known.
+Virtual types only have one change to their structure: the addition of a
+pointer to the virtual table, which is called the \emph{virtual-table pointer}.
+Internally, the field is called \snake{virtual_table}.
+The field is fixed after construction. It is always the first field in the
+structure so that its location is always known.
 \todo{Talk about constructors for virtual types (after they are working).}
+The virtual table pointer binds an instance of a virtual type
+to a virtual table.
+The pointer is also the table's id and how the system accesses the
+virtual table and the virtual members there.
 \subsection{Type Id}
+Every virtual type needs a unique id, so that type ids can be compared for
+equality, which checks if the types representation are the same, or used to
+access the type's type information.  Here, uniqueness means within a program
+composed of multiple translation units (TU), not uniqueness across all
+programs.
+One approach for program uniqueness is declaring a static declaration for each
+type id, where the runtime storage address of that variable is guaranteed to be
+unique during program execution. The type id storage can also be used for other
+purposes.
+The problem is that a type id may appear in multiple TUs that compose a
+program, see \autoref{ss:VirtualTable}; hence in each TU, it must be declared
+as external to prevent multiple definitions. However, the type id must actually
+be declared in one of the TUs so the linker creates the storage.  Hence, the
+problem becomes designating one TU to insert an actual type-id declaration. But
+the \CFA compiler does not know the set of the translation units that compose a
+program, because TUs can be compile separately, followed by a separate link
+step.
+The solution is to mimic a \CFA feature in \Cpp{17}, @inline@ variables and
+function:
+\begin{quote}
+There may be more than one definition of an inline function or variable (since
+\Cpp{17} in the program as long as each definition appears in a different
+translation unit and (for non-static inline functions and variables (since
+\Cpp{17})) all definitions are identical. For example, an inline function or an
+inline variable (since \Cpp{17}) may be defined in a header file that is
+@#include@'d in multiple source files.~\cite{C++17}
+\end{quote}
+The underlying mechanism to provide this capability is attribute
+\begin{cfa}
+section(".gnu.linkonce.NAME")
+\end{cfa}
+where @NAME@ is the variable/function name duplicated in each TU.  The linker than
+provides the service of generating a single declaration (instance) across all
+TUs, even if a program is linked incrementally.
+C does not support this feature for @inline@, and hence, neither does \CFA.
+Again, rather than implement a new @inline@ extension for \CFA, a temporary
+solution for the exception handling is to add the following in \CFA.
+\begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}]
+@__attribute__((cfa_linkonce))@ void f() {}
+\end{lstlisting}
+which becomes
+\begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}]
+__attribute__((section(".gnu.linkonce._X1fFv___1"))) void @_X1fFv___1@(){}
+\end{lstlisting}
+where @NAME@ from above is the \CFA mangled variable/function name.  Note,
+adding this feature is necessary because, when using macros, the mangled name
+is unavailable.  This attribute is useful for purposes other than exception
+handling, and should eventually be rolled into @inline@ processing in \CFA.
+Finally, a type id's data implements a pointers to the type's type information
+instance.  Dereferencing the pointer gets the type information.
+\subsection{Implementation}
+Every virtual type has a unique id.
+Type ids can be compared for equality,
+which checks if the types reperented are the same,
+or used to access the type's type information.
 The type information currently is only the parent's type id or, if the
 type has no parent, the null pointer.
+The id's are implemented as pointers to the type's type information instance.
+Dereferencing the pointer gets the type information.
 The ancestors of a virtual type are found by traversing type ids through
 the type information.
+An example using helper macros looks like:
+The information pushes the issue of creating a unique value (for
+the type id) to the problem of creating a unique instance (for type
+information), which the linker can solve.
+The advanced linker support is used here to avoid having to create
+a new declaration to attach this data to.
+With C/\CFA's header/implementation file divide for something to appear
+exactly once it must come from a declaration that appears in exactly one
+implementation file; the declarations in header files may exist only once
+they can be included in many different translation units.
+Therefore, structure's declaration will not work.
+Neither will attaching the type information to the virtual table -- although
+a vtable declarations are in implemention files they are not unique, see
+\autoref{ss:VirtualTable}.
+Instead the same type information is generated multiple times and then
+the new attribute \snake{cfa_linkone} is used to removed duplicates.
+Type information is constructed as follows:
+\begin{enumerate}
+\item
+Use the type's name to generate a name for the type information structure.
+This is saved so it may be reused.
+\item
+Generate a new structure definition to store the type
+information. The layout is the same in each case, just the parent's type id,
+but the types used change from instance to instance.
+The generated name is used for both this structure and, if relivant, the
+parent pointer.
+If the virtual type is polymorphic then the type information structure is
+polymorphic as well, with the same polymorphic arguments.
+\item
+A seperate name for instances is generated from the type's name.
+\item
+The definition is generated and initialised.
+The parent id is set to the null pointer or to the address of the parent's
+type information instance. Name resolution handles the rest.
+\item
+\CFA's name mangler does its regular name mangling encoding the type of
+the declaration into the instance name. This gives a completely unique name
+including different instances of the same polymorphic type.
+\end{enumerate}
+\todo{The list is making me realise, some of this isn't ordered.}
+Writing that code manually, with helper macros for the early name mangling,
+would look like this:
 \begin{cfa}
 struct INFO_TYPE(TYPE) {
 …
 \end{cfa}
-The type information is constructed as follows:
-\begin{enumerate}
-\item
-Use the type's name to generate a name for the type information structure,
-which is saved so it can be reused.
-\item
-Generate a new structure definition to store the type
-information. The layout is the same in each case, just the parent's type id,
-but the types used change from instance to instance.
-The generated name is used for both this structure and, if relevant, the
-parent pointer.
-If the virtual type is polymorphic then the type information structure is
-polymorphic as well, with the same polymorphic arguments.
-\item
-A separate name for instances is generated from the type's name.
-\item
-The definition is generated and initialized.
-The parent id is set to the null pointer or to the address of the parent's
-type information instance. Name resolution handles the rest.
-\item
-\CFA's name mangler does its regular name mangling encoding the type of
-the declaration into the instance name. This process gives a program unique name
-including different instances of the same polymorphic type.
-\end{enumerate}
-\todo{The list is making me realize, some of this isn't ordered.}
-\begin{comment}
 \subsubsection{\lstinline{cfa\_linkonce} Attribute}
 % I just realized: This is an extension of the inline keyword.
+% I just realised: This is an extension of the inline keyword.
 % An extension of C's at least, it is very similar to C++'s.
 Another feature added to \CFA is a new attribute: \texttt{cfa\_linkonce}.
 …
 everything that comes after the special prefix, then only one is used
 and the other is discarded.
-\end{comment}
 \subsection{Virtual Table}
 …
 The first and second sections together mean that every virtual table has a
 prefix that has the same layout and types as its parent virtual table.
 This, combined with the fixed offset to the virtual-table pointer, means that
+This, combined with the fixed offset to the virtual table pointer, means that
 for any virtual type, it is always safe to access its virtual table and,
 from there, it is safe to check the type id to identify the exact type of the
 …
 type's alignment, is set using an @alignof@ expression.
 \subsection{Concurrency Integration}
+\subsubsection{Concurrency Integration}
 Coroutines and threads need instances of @CoroutineCancelled@ and
 @ThreadCancelled@ respectively to use all of their functionality. When a new
 …
 at the definition of the main function.
 These transformations are shown through code re-writing in
 \autoref{f:CoroutineTypeTransformation} and
 \autoref{f:CoroutineMainTransformation} for a coroutine and a thread is similar.
 In both cases, the original declaration is not modified, only new ones are
 added.
+This is showned through code re-writing in
+\autoref{f:ConcurrencyTypeTransformation} and
+\autoref{f:ConcurrencyMainTransformation}.
+In both cases the original declaration is not modified,
+only new ones are added.
 \begin{figure}
 …
 extern CoroutineCancelled_vtable & _default_vtable;
 \end{cfa}
+\caption{Coroutine Type Transformation}
+\label{f:CoroutineTypeTransformation}
+%\end{figure}
+\bigskip
+%\begin{figure}
+\caption{Concurrency Type Transformation}
+\label{f:ConcurrencyTypeTransformation}
+\end{figure}
+\begin{figure}
 \begin{cfa}
 void main(Example & this) {
 …
         &_default_vtable_object_declaration;
 \end{cfa}
 \caption{Coroutine Main Transformation}
 \label{f:CoroutineMainTransformation}
+\caption{Concurrency Main Transformation}
+\label{f:ConcurrencyMainTransformation}
 \end{figure}
 …
         struct __cfavir_type_id const * child );
 \end{cfa}
 The type id for the target type of the virtual cast is passed in as @parent@ and
+The type id of target type of the virtual cast is passed in as @parent@ and
 the cast target is passed in as @child@.
+The generated C code wraps both arguments and the result with type casts.
+For generated C code wraps both arguments and the result with type casts.
 There is also an internal check inside the compiler to make sure that the
 target type is a virtual type.
 …
 \section{Exceptions}
+\todo{Anything about exception construction.}
+% Anything about exception construction.
 \section{Unwinding}
 …
 stack. On function entry and return, unwinding is handled directly by the
 call/return code embedded in the function.
 \PAB{Meaning: In many cases, the position of the instruction pointer (relative to parameter
+In many cases, the position of the instruction pointer (relative to parameter
 and local declarations) is enough to know the current size of the stack
 frame.}
+frame.
 Usually, the stack-frame size is known statically based on parameter and
 local variable declarations. Even for a dynamic stack-size, the information
+local variable declarations. Even with dynamic stack-size, the information
 to determine how much of the stack has to be removed is still contained
 within the function.
 …
 bumping the hardware stack-pointer up or down as needed.
 Constructing/destructing values within a stack frame has
 a similar complexity but larger constants, which takes longer.
+a similar complexity but can add additional work and take longer.
 Unwinding across multiple stack frames is more complex because that
 information is no longer contained within the current function.
+With separate compilation a function does not know its callers nor their frame size.
+In general, the caller's frame size is embedded only at the functions entry (push
+stack) and exit (pop stack).
+With seperate compilation a function has no way of knowing what its callers
+are so it can't know how large those frames are.
 Without altering the main code path it is also hard to pass that work off
 to the caller.
 …
 This approach is fragile and requires extra work in the surrounding code.
 With respect to the extra work in the surrounding code,
+With respect to the extra work in the surounding code,
 many languages define clean-up actions that must be taken when certain
 sections of the stack are removed. Such as when the storage for a variable
 is removed from the stack (destructor call) or when a try statement with a finally clause is
+is removed from the stack or when a try statement with a finally clause is
 (conceptually) popped from the stack.
 None of these cases should be handled by the user --- that would contradict the
+None of these should be handled by the user --- that would contradict the
 intention of these features --- so they need to be handled automatically.
 …
 in this case @clean_up@, run when the variable goes out of scope.
 This feature is enough to mimic destructors,
 but not try statements that affect
+but not try statements which can effect
 the unwinding.
 To get full unwinding support, all of these features must be handled directly
 in assembly and assembler directives; particularly the cfi directives
+in assembly and assembler directives; partiularly the cfi directives
 \snake{.cfi_lsda} and \snake{.cfi_personality}.
 …
 @_UA_FORCE_UNWIND@ specifies a forced unwind call. Forced unwind only performs
 the cleanup phase and uses a different means to decide when to stop
 (see \autoref{s:ForcedUnwind}).
+(see \vref{s:ForcedUnwind}).
 \end{enumerate}
 The @exception_class@ argument is a copy of the
 \code{C}{exception}'s @exception_class@ field,
 which is a number that identifies the EHM
+which is a number that identifies the exception handling mechanism
 that created the exception.
 …
 needs its own exception context.
 An exception context is retrieved by calling the function
+The exception context should be retrieved by calling the function
 \snake{this_exception_context}.
 For sequential execution, this function is defined as
 …
 The first step of a termination raise is to copy the exception into memory
 managed by the exception system. Currently, the system uses @malloc@, rather
 than reserved memory or the stack top. The EHM manages
+than reserved memory or the stack top. The exception handling mechanism manages
 memory for the exception as well as memory for libunwind and the system's own
 per-exception storage.
 …
 \subsection{Try Statements and Catch Clauses}
 The try statement with termination handlers is complex because it must
 compensate for the C code-generation versus proper
+compensate for the C code-generation versus
 assembly-code generated from \CFA. Libunwind
 requires an LSDA and personality function for control to unwind across a
 …
 The workaround is a function called @__cfaehm_try_terminate@ in the standard
 \CFA library. The contents of a try block and the termination handlers are converted
 into nested functions. These are then passed to the try terminate function and it
 calls them, appropriately.
+library. The contents of a try block and the termination handlers are converted
+into functions. These are then passed to the try terminate function and it
+calls them.
 Because this function is known and fixed (and not an arbitrary function that
 happens to contain a try statement), its LSDA can be generated ahead
+happens to contain a try statement), the LSDA can be generated ahead
 of time.
 Both the LSDA and the personality function for @__cfaehm_try_terminate@ are set ahead of time using
+Both the LSDA and the personality function are set ahead of time using
 embedded assembly. This assembly code is handcrafted using C @asm@ statements
 and contains
 enough information for a single try statement the function represents.
+enough information for a single try statement the function repersents.
 The three functions passed to try terminate are:
 …
 decides if a catch clause matches the termination exception. It is constructed
 from the conditional part of each handler and runs each check, top to bottom,
+in turn, first checking to see if the exception type matches.
+The match is performed in two steps, first a virtual cast is used to see
+if the raised exception is an instance of the declared exception or one of
+its descendant type, and then is the condition true, if present.
+It takes a pointer to the exception and returns 0 if the
+in turn, first checking to see if the exception type matches and then if the
+condition is true. It takes a pointer to the exception and returns 0 if the
 exception is not handled here. Otherwise the return value is the id of the
 handler that matches the exception.
 …
 All three functions are created with GCC nested functions. GCC nested functions
 can be used to create closures,
 in other words, functions that can refer to their lexical scope in other
 functions on the stack when called. This approach allows the functions to refer to all the
+in other words functions that can refer to the state of other
+functions on the stack. This approach allows the functions to refer to all the
 variables in scope for the function containing the @try@ statement. These
 nested functions and all other functions besides @__cfaehm_try_terminate@ in
 …
 \autoref{f:TerminationTransformation} shows the pattern used to transform
 a \CFA try statement with catch clauses into the appropriate C functions.
+a \CFA try statement with catch clauses into the approprate C functions.
 \todo{Explain the Termination Transformation figure.}
 …
 Instead of storing the data in a special area using assembly,
 there is just a linked list of possible handlers for each stack,
 with each node on the list representing a try statement on the stack.
+with each node on the list reperenting a try statement on the stack.
 The head of the list is stored in the exception context.
 …
 to the head of the list.
 Instead of traversing the stack, resumption handling traverses the list.
 At each node, the EHM checks to see if the try statement the node represents
+At each node, the EHM checks to see if the try statement the node repersents
 can handle the exception. If it can, then the exception is handled and
 the operation finishes, otherwise the search continues to the next node.
 If the search reaches the end of the list without finding a try statement
 that can handle the exception, the default handler is executed and the
 operation finishes, unless it throws an exception.
+operation finishes.
 Each node has a handler function that does most of the work.
 The handler function is passed the raised exception and returns true
 if the exception is handled and false otherwise.
 The handler function checks each of its internal handlers in order,
 top-to-bottom, until it funds a match. If a match is found that handler is
 …
 If no match is found the function returns false.
 The match is performed in two steps, first a virtual cast is used to see
+if the raised exception is an instance of the declared exception or one of
+its descendant type, and then is the condition true, if present.
+\PAB{I don't understand this sentence.
+This ordering gives the type guarantee used in the predicate.}
+if the thrown exception is an instance of the declared exception or one of
+its descendant type, then check to see if passes the custom predicate if one
+is defined. This ordering gives the type guarantee used in the predicate.
 \autoref{f:ResumptionTransformation} shows the pattern used to transform
 a \CFA try statement with catch clauses into the appropriate C functions.
+a \CFA try statement with catch clauses into the approprate C functions.
 \todo{Explain the Resumption Transformation figure.}
 …
 (see \vpageref{s:ResumptionMarking}), which ignores parts of
 the stack
 already examined, and is accomplished by updating the front of the list as the
 search continues. Before the handler is called at a matching node, the head of the list
+already examined, is accomplished by updating the front of the list as the
+search continues. Before the handler at a node is called, the head of the list
 is updated to the next node of the current node. After the search is complete,
 successful or not, the head of the list is reset.
 …
 been checked are not on the list while a handler is run. If a resumption is
 thrown during the handling of another resumption, the active handlers and all
 the other handlers checked up to this point are not checked again.
+the other handler checked up to this point are not checked again.
 % No paragraph?
 This structure also supports new handlers added while the resumption is being
 …
 \begin{figure}
-\centering
 \input{resumption-marking}
 \caption{Resumption Marking}
 …
 \section{Finally}
 % Uses destructors and GCC nested functions.
+\autoref{f:FinallyTransformation} shows the pattern used to transform a \CFA
+try statement with finally clause into the appropriate C functions.
+The finally clause is placed into a GCC nested-function
+with a unique name, and no arguments or return values.  This nested function is
+then set as the cleanup function of an empty object that is declared at the
+beginning of a block placed around the context of the associated @try@
+statement.
+\begin{figure}
+\begin{cfa}
+try {
+        // TRY BLOCK
+} finally {
+        // FINALLY BLOCK
+}
+\end{cfa}
+\transformline
+\begin{cfa}
+{
+        void finally(void *__hook){
+                // FINALLY BLOCK
+        }
+        __attribute__ ((cleanup(finally)))
+        struct __cfaehm_cleanup_hook __finally_hook;
+        {
+                // TRY BLOCK
+        }
+}
+\end{cfa}
+\caption{Finally Transformation}
+\label{f:FinallyTransformation}
+\end{figure}
+A finally clause is placed into a GCC nested-function with a unique name,
+and no arguments or return values.
+This nested function is then set as the cleanup
+function of an empty object that is declared at the beginning of a block placed
+around the context of the associated @try@ statement.
 The rest is handled by GCC. The try block and all handlers are inside this
 …
 The first step of cancellation is to find the cancelled stack and its type:
 coroutine, thread, or main thread.
+coroutine, thread or main thread.
 In \CFA, a thread (the construct the user works with) is a user-level thread
 (point of execution) paired with a coroutine, the thread's main coroutine.
 The thread library also stores pointers to the main thread and the current
 coroutine.
+thread.
 If the current thread's main and current coroutines are the same then the
 current stack is a thread stack, otherwise it is a coroutine stack.
 …
 passed to the forced-unwind function. The general pattern of all three stop
 functions is the same: continue unwinding until the end of stack and
 then perform the appropriate transfer.
+then preform the appropriate transfer.
 For main stack cancellation, the transfer is just a program abort.

doc/theses/andrew_beach_MMath/performance.tex

-              rd8f8d08
+              r7372065
 \label{c:performance}
+Performance is of secondary importance for most of this project.
+Instead, the focus was to get the features working. The only performance
+requirement is to ensure the tests for correctness run in a reasonable
+amount of time. Hence, a few basic performance tests were performed to
+check this requirement.
+Performance has been of secondary importance for most of this project.
+Instead, the focus has been to get the features working. The only performance
+requirements is to ensure the tests for correctness run in a reasonable
+amount of time.
 \section{Test Set-Up}
 Tests were run in \CFA, C++, Java and Python.
+Tests will be run in \CFA, C++, Java and Python.
 In addition there are two sets of tests for \CFA,
 one for termination and one for resumption exceptions.
+one for termination exceptions and once with resumption exceptions.
 C++ is the most comparable language because both it and \CFA use the same
 framework, libunwind.
 In fact, the comparison is almost entirely a quality of implementation.
 Specifically, \CFA's EHM has had significantly less time to be optimized and
+In fact, the comparison is almost entirely a quality of implementation
+comparison. \CFA's EHM has had significantly less time to be optimized and
 does not generate its own assembly. It does have a slight advantage in that
 there are some features it handles directly instead of through utility functions,
 but otherwise \Cpp should have a significant advantage.
 Java is a popular language with similar termination semantics, but
 it is implemented in a very different environment, a virtual machine with
+there are some features it does not handle, through utility functions,
+but otherwise \Cpp has a significant advantage.
+Java is another very popular language with similar termination semantics.
+It is implemented in a very different environment, a virtual machine with
 garbage collection.
 It also implements the @finally@ clause on @try@ blocks allowing for a direct
+It also implements the finally clause on try blocks allowing for a direct
 feature-to-feature comparison.
 As with \Cpp, Java's implementation is mature, optimized
 and has extra features.
 Python is used as an alternative comparison because of the \CFA EHM's
 current performance goals, which is not to be prohibitively slow while the
+As with \Cpp, Java's implementation is more mature, has more optimizations
+and more extra features.
+Python was used as a point of comparison because of the \CFA EHM's
+current performance goals, which is not be prohibitively slow while the
 features are designed and examined. Python has similar performance goals for
 creating quick scripts and its wide use suggests it has achieved those goals.
+Unfortunately, there are no notable modern programming languages with
+resumption exceptions. Even the older programming languages with resumption
+seem to be notable only for having resumption.
+So instead, resumption is compared to its simulation in other programming
+languages using fixup functions that are explicitly passed for correction or
+logging purposes.
+% So instead, resumption is compared to a less similar but much more familiar
+%feature, termination exceptions.
+All tests are run inside a main loop that repeatedly performs a test.
+This approach avoids start-up or tear-down time from
+Unfortunately there are no notable modern programming languages with
+resumption exceptions. Even the older programming languages with resumptions
+seem to be notable only for having resumptions.
+So instead resumptions are compared to a less similar but much more familiar
+feature, termination exceptions.
+All tests are run inside a main loop which will perform the test
+repeatedly. This is to avoids start-up or tear-down time from
 affecting the timing results.
 Each test is run a N times (configurable from the command line).
 The Java tests runs the main loop 1000 times before
 beginning the actual test to ``warm-up" the JVM.
+Tests ran their main loop a million times.
+The Java versions of the test also run this loop an extra 1000 times before
+beginning to time the results to ``warm-up" the JVM.
 Timing is done internally, with time measured immediately before and
+after the test loop. The difference is calculated and printed.
+immediately after the test loop. The difference is calculated and printed.
 The loop structure and internal timing means it is impossible to test
 unhandled exceptions in \Cpp and Java as that would cause the process to
 …
 critical.
 The exceptions used in these tests are always based off of
 a base exception. This requirement minimizes performance differences based
 on the object model used to represent the exception.
 All tests are designed to be as minimal as possible, while still preventing
 excessive optimizations.
+The exceptions used in these tests will always be a exception based off of
+the base exception. This requirement minimizes performance differences based
+on the object model used to repersent the exception.
+All tests were designed to be as minimal as possible while still preventing
+exessive optimizations.
 For example, empty inline assembly blocks are used in \CFA and \Cpp to
 prevent excessive optimizations while adding no actual work.
-Each test was run eleven times. The top three and bottom three results were
-discarded and the remaining five values are averaged.
-The tests are compiled with gcc-10 for \CFA and g++-10 for \Cpp. Java is
-compiled with version 11.0.11. Python with version 3.8. The tests were run on:
-\begin{itemize}[nosep]
-\item
-ARM 2280 Kunpeng 920 48-core 2$\times$socket \lstinline{@} 2.6 GHz running Linux v5.11.0-25
-\item
-AMD 6380 Abu Dhabi 16-core 4$\times$socket \lstinline{@} 2.5 GHz running Linux v5.11.0-25
-\end{itemize}
-Two kinds of hardware architecture allows discriminating any implementation and
-architectural effects.
 % We don't use catch-alls but if we did:
 …
 The following tests were selected to test the performance of different
 components of the exception system.
 They should provide a guide as to where the EHM's costs are found.
+The should provide a guide as to where the EHM's costs can be found.
 \paragraph{Raise and Handle}
+This group measures the cost of a try statement when exceptions are raised and
+the stack is unwound (termination) or not unwound (resumption).  Each test has
+has a repeating function like the following
+\begin{lstlisting}[language=CFA,{moredelim=**[is][\color{red}]{@}{@}}]
+void unwind_empty(unsigned int frames) {
+        if (frames) {
+                @unwind_empty(frames - 1);@ // AUGMENTED IN OTHER EXPERIMENTS
+        } else throw (empty_exception){&empty_vt};
+}
+\end{lstlisting}
+which is called N times, where each call recurses to a depth of R (configurable from the command line), an
+exception is raised, the stack is a unwound, and the exception caught.
+The first group of tests involve setting up
+So there is three layers to the test. The first is set up and a loop, which
+configures the test and then runs it repeatedly to reduce the impact of
+start-up and shutdown on the results.
+Each iteration of the main loop
 \begin{itemize}[nosep]
+\item Empty:
+For termination, this test measures the cost of raising (stack walking) an
+exception through empty stack frames from the bottom of the recursion to an
+empty handler, and unwinding the stack. (see above code)
+\medskip
+For resumption, this test measures the same raising cost but does not unwind
+the stack. For languages without resumption, a fixup function is to the bottom
+of the recursion and called to simulate a fixup operation at that point.
+\begin{cfa}
+void nounwind_fixup(unsigned int frames, void (*raised_rtn)(int &)) {
+        if (frames) {
+                nounwind_fixup(frames - 1, raised_rtn);
+        } else {
+                int fixup = 17;
+                raised_rtn(fixup);
+        }
+}
+\end{cfa}
+where the passed fixup function is:
+\begin{cfa}
+void raised(int & fixup) {
+        fixup = 42;
+}
+\end{cfa}
+For comparison, a \CFA version passing a function is also included.
+\item Empty Function:
+The repeating function is empty except for the necessary control code.
 \item Destructor:
+This test measures the cost of raising an exception through non-empty frames,
+where each frame has an object requiring destruction, from the bottom of the
+recursion to an empty handler. Hence, there are N destructor calls during
+unwinding.
+\medskip
+This test is not meaningful for resumption because the stack is only unwound as
+the recursion returns.
+\begin{cfa}
+        WithDestructor object;
+        unwind_destructor(frames - 1);
+\end{cfa}
+The repeating function creates an object with a destructor before calling
+itself.
 \item Finally:
+This test measures the cost of establishing a try block with an empty finally
+clause on the front side of the recursion and running the empty finally clauses
+during stack unwinding from the bottom of the recursion to an empty handler.
+\begin{cfa}
+        try {
+                unwind_finally(frames - 1);
+        } finally {}
+\end{cfa}
+\medskip
+This test is not meaningful for resumption because the stack is only unwound as
+the recursion returns.
+The repeating function calls itself inside a try block with a finally clause
+attached.
 \item Other Handler:
+For termination, this test is like the finally test but the try block has a
+catch clause for an exception that is not raised, so catch matching is executed
+during stack unwinding but the match never successes until the catch at the
+bottom of the recursion.
+\begin{cfa}
+        try {
+                unwind_other(frames - 1);
+        } catch (not_raised_exception *) {}
+\end{cfa}
+\medskip
+For resumption, this test measures the same raising cost but does not unwind
+the stack. For languages without resumption, the same fixup function is passed
+and called.
+The repeating function calls itself inside a try block with a handler that
+will not match the raised exception. (But is of the same kind of handler.)
 \end{itemize}
+\paragraph{Try/Handle/Finally Statement}
+This group measures just the cost of executing a try statement so
+\emph{there is no stack unwinding}.  Hence, the program main loops N times
+around:
+\begin{cfa}
+try {
+} catch (not_raised_exception *) {}
+\end{cfa}
+\paragraph{Cross Try Statement}
+The next group measures the cost of a try statement when no exceptions are
+raised. The test is set-up, then there is a loop to reduce the impact of
+start-up and shutdown on the results.
+In each iteration, a try statement is executed. Entering and leaving a loop
+is all the test wants to do.
 \begin{itemize}[nosep]
 \item Handler:
 The try statement has a handler (catch/resume).
+The try statement has a handler (of the matching kind).
 \item Finally:
 The try statement has a finally clause.
 …
 \paragraph{Conditional Matching}
 This group measures the cost of conditional matching.
+This group of tests checks the cost of conditional matching.
 Only \CFA implements the language level conditional match,
 the other languages mimic with an ``unconditional" match (it still
 checks the exception's type) and conditional re-raise if it is not suppose
+the other languages must mimic with an ``unconditional" match (it still
+checks the exception's type) and conditional re-raise if it was not supposed
 to handle that exception.
-\begin{center}
-\begin{tabular}{ll}
-\multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp, Java, Python} \\
-\begin{cfa}
-try {
-        throw_exception();
-} catch (empty_exception * exc;
-                 should_catch) {
+}
-\end{cfa}
+&
-\begin{cfa}
-try {
-        throw_exception();
-} catch (EmptyException & exc) {
-        if (!should_catch) throw;
+}
-\end{cfa}
-\end{tabular}
-\end{center}
 \begin{itemize}[nosep]
 \item Match All:
 …
 \end{itemize}
-\medskip
-\noindent
-All omitted test code for other languages is functionally identical to the \CFA
-tests or simulated, and available online~\cite{CforallExceptionBenchmarks}.
 %\section{Cost in Size}
 %Using exceptions also has a cost in the size of the executable.
 …
 \section{Results}
+One result not directly related to \CFA but important to keep in
+mind is that, for exceptions, the standard intuition about which languages
+should go faster often does not hold. For example, there are a few cases where Python out-performs
+\CFA, \Cpp and Java. The most likely explanation is that, since exceptions are
+rarely considered to be the common case, the more optimized languages
+make that case expense. In addition, languages with high-level
+representations have a much easier time scanning the stack as there is less
+to decode.
+Tables~\ref{t:PerformanceTermination} and~\ref{t:PerformanceResumption} show
+the test results for termination and resumption, respectively.  In cases where
+a feature is not supported by a language, the test is skipped for that language
+(marked N/A).  For some Java experiments it was impossible to measure certain
+effects because the JIT corrupted the test (marked N/C). No workaround was
+possible~\cite{Dice21}.  To get experiments in the range of 1--100 seconds, the
+number of times an experiment is run (N) is varied (N is marked beside each
+experiment, e.g., 1M $\Rightarrow$ 1 million test iterations).
+An anomaly exists with gcc nested functions used as thunks for implementing
+much of the \CFA EHM. If a nested-function closure captures local variables in
+its lexical scope, performance dropped by a factor of 10.  Specifically, in try
+statements of the form:
+\begin{cfa}
+        try {
+                unwind_other(frames - 1);
+        } catch (not_raised_exception *) {}
+\end{cfa}
+the try block is hoisted into a nested function and the variable @frames@ is
+the local parameter to the recursive function, which triggers the anomaly. The
+workaround is to remove the recursion parameter and make it a global variable
+that is explicitly decremented outside of the try block (marked with a ``*''):
+\begin{cfa}
+        frames -= 1;
+        try {
+                unwind_other();
+        } catch (not_raised_exception *) {}
+\end{cfa}
+To make comparisons fair, a dummy parameter is added and the dummy value passed
+in the recursion. Note, nested functions in gcc are rarely used (if not
+completely unknown) and must follow the C calling convention, unlike \Cpp
+lambdas, so it is not surprising if there are performance issues efficiently
+capturing closures.
+% Similarly, if a test does not change between resumption
+% and termination in \CFA, then only one test is written and the result
+% was put into the termination column.
+Each test was run eleven times. The top three and bottom three results were
+discarded and the remaining five values are averaged.
+In cases where a feature is not supported by a language the test is skipped
+for that language. Similarly, if a test is does not change between resumption
+and termination in \CFA, then only one test is written and the result
+was put into the termination column.
 % Raw Data:
 …
 % Match None    & 0.0 & 0.0 &  9476060146 & 0.0 & 0.0 \\
+\begin{tabular}{|l|c c c c c|}
+\hline
+              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
+\hline
+Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
+Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
+Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
+Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+\hline
+\end{tabular}
 % run-plg7a-a.sat
 % ---------------
 …
 % Match None    & 0.0 & 0.0 &  7829059869 & 0.0 & 0.0 \\
+\begin{table}
+\centering
+\caption{Performance Results Termination (sec)}
+\label{t:PerformanceTermination}
+\begin{tabular}{|r|*{2}{|r r r r|}}
+\hline
+                        & \multicolumn{4}{c||}{AMD}             & \multicolumn{4}{c|}{ARM}      \\
+\cline{2-9}
+N\hspace{8pt}           & \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
+                          \multicolumn{1}{c}{\CFA} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
+\hline
+Throw Empty (1M)        & 3.4   & 2.8   & 18.3  & 23.4          & 3.7   & 3.2   & 15.5  & 14.8  \\
+Throw D'tor (1M)        & 48.4  & 23.6  & N/A   & N/A           & 64.2  & 29.0  & N/A   & N/A   \\
+Throw Finally (1M)      & 3.4*  & N/A   & 17.9  & 29.0          & 4.1*  & N/A   & 15.6  & 19.0  \\
+Throw Other (1M)        & 3.6*  & 23.2  & 18.2  & 32.7          & 4.0*  & 24.5  & 15.5  & 21.4  \\
+Try/Catch (100M)        & 6.0   & 0.9   & N/C   & 37.4          & 10.0  & 0.8   & N/C   & 32.2  \\
+Try/Finally (100M)      & 0.9   & N/A   & N/C   & 44.1          & 0.8   & N/A   & N/C   & 37.3  \\
+Match All (10M)         & 32.9  & 20.7  & 13.4  & 4.9           & 36.2  & 24.5  & 12.0  & 3.1   \\
+Match None (10M)        & 32.7  & 50.3  & 11.0  & 5.1           & 36.3  & 71.9  & 12.3  & 4.2   \\
+% PLG7A (in seconds)
+\begin{tabular}{|l|c c c c c|}
+\hline
+              & \CFA (Terminate) & \CFA (Resume) & \Cpp & Java & Python \\
+\hline
+Raise Empty   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Raise D'tor   & 0.0 & 0.0 & 0.0 & N/A & N/A \\
+Raise Finally & 0.0 & 0.0 & N/A & 0.0 & 0.0 \\
+Raise Other   & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Cross Handler & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Cross Finally & 0.0 & N/A & N/A & 0.0 & 0.0 \\
+Match All     & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
+Match None    & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
 \hline
 \end{tabular}
+\end{table}
+\begin{table}
+\centering
+\small
+\caption{Performance Results Resumption (sec)}
+\label{t:PerformanceResumption}
+\setlength{\tabcolsep}{5pt}
+\begin{tabular}{|r|*{2}{|r r r r|}}
+\hline
+                        & \multicolumn{4}{c||}{AMD}             & \multicolumn{4}{c|}{ARM}      \\
+\cline{2-9}
+N\hspace{8pt}           & \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c||}{Python} &
+                          \multicolumn{1}{c}{\CFA (R/F)} & \multicolumn{1}{c}{\Cpp} & \multicolumn{1}{c}{Java} & \multicolumn{1}{c|}{Python} \\
+\hline
+Resume Empty (10M)      & 3.8/3.5       & 14.7  & 2.3   & 176.1 & 0.3/0.1       & 8.9   & 1.2   & 119.9 \\
+Resume Other (10M)      & 4.0*/0.1*     & 21.9  & 6.2   & 381.0 & 0.3*/0.1*     & 13.2  & 5.0   & 290.7 \\
+Try/Resume (100M)       & 8.8           & N/A   & N/A   & N/A   & 12.3          & N/A   & N/A   & N/A   \\
+Match All (10M)         & 0.3           & N/A   & N/A   & N/A   & 0.3           & N/A   & N/A   & N/A   \\
+Match None (10M)        & 0.3           & N/A   & N/A   & N/A   & 0.4           & N/A   & N/A   & N/A   \\
+\hline
+\end{tabular}
+\end{table}
+As stated, the performance tests are not attempting to compare exception
+handling across languages.  The only performance requirement is to ensure the
+\CFA EHM implementation runs in a reasonable amount of time, given its
+constraints. In general, the \CFA implement did very well. Each of the tests is
+analysed.
+\begin{description}
+\item[Throw/Resume Empty]
+For termination, \CFA is close to \Cpp, where other languages have a higher cost.
+For resumption, \CFA is better than the fixup simulations in the other languages, except Java.
+The \CFA results on the ARM computer for both resumption and function simulation are particularly low;
+I have no explanation for this anomaly, except the optimizer has managed to remove part of the experiment.
+Python has a high cost for passing the lambda during the recursion.
+\item[Throw D'tor]
+For termination, \CFA is twice the cost of \Cpp.
+The higher cost for \CFA must be related to how destructors are handled.
+\item[Throw Finally]
+\CFA is better than the other languages with a @finally@ clause, which is the
+same for termination and resumption.
+\item[Throw/Resume Other]
+For termination, \CFA is better than the other languages.
+For resumption, \CFA is equal to or better the other languages.
+Again, the \CFA results on the ARM computer for both resumption and function simulation are particularly low.
+Python has a high cost for passing the lambda during the recursion.
+\item[Try/Catch/Resume]
+For termination, installing a try statement is more expressive than \Cpp
+because the try components are hoisted into local functions.  At runtime, these
+functions are than passed to libunwind functions to set up the try statement.
+\Cpp zero-cost try-entry accounts for its performance advantage.
+For resumption, there are similar costs to termination to set up the try
+statement but libunwind is not used.
+\item[Try/Finally]
+Setting up a try finally is less expensive in \CFA than setting up handlers,
+and is significantly less than other languages.
+\item[Throw/Resume Match All]
+For termination, \CFA is close to the other language simulations.
+For resumption, the stack unwinding is much faster because it does not use
+libunwind.  Instead resumption is just traversing a linked list with each node
+being the next stack frame with the try block.
+\item[Throw/Resume Match None]
+The same results as for Match All.
+\end{description}
+\begin{comment}
+This observation means that while \CFA does not actually keep up with Python in
+every case, it is usually no worse than roughly half the speed of \Cpp. This
+performance is good enough for the prototyping purposes of the project.
+One result that is not directly related to \CFA but is important to keep in
+mind is that in exceptions the standard intuitions about which languages
+should go faster often do not hold. There are cases where Python out-preforms
+\Cpp and Java. The most likely explination is that, since exceptions are
+rarely considered to be the common case, the more optimized langages have
+optimized at their expence. In addition languages with high level
+repersentations have a much easier time scanning the stack as there is less
+to decode.
+This means that while \CFA does not actually keep up with Python in every
+case it is usually no worse than roughly half the speed of \Cpp. This is good
+enough for the prototyping purposes of the project.
 The test case where \CFA falls short is Raise Other, the case where the
 stack is unwound including a bunch of non-matching handlers.
+This slowdown seems to come from missing optimizations.
+This slowdown seems to come from missing optimizations,
+the results above came from gcc/g++ 10 (gcc as \CFA backend or g++ for \Cpp)
+but the results change if they are run in gcc/g++ 9 instead.
+Importantly, there is a huge slowdown in \Cpp's results bringing that brings
+\CFA's performace back in that roughly half speed area. However many other
+\CFA benchmarks increase their run-time by a similar amount falling far
+behind their \Cpp counter-parts.
 This suggests that the performance issue in Raise Other is just an
 …
 Resumption exception handling is also incredibly fast. Often an order of
 magnitude or two better than the best termination speed.
 There is a simple explanation for this; traversing a linked list is much
+There is a simple explination for this; traversing a linked list is much
 faster than examining and unwinding the stack. When resumption does not do as
 well its when more try statements are used per raise. Updating the internal
 linked list is not very expensive but it does add up.
+well its when more try statements are used per raise. Updating the interal
+linked list is not very expencive but it does add up.
 The relative speed of the Match All and Match None tests (within each
 …
 \item
 Java and Python get similar values in both tests.
 Between the interpreted code, a higher level representation of the call
+Between the interperated code, a higher level repersentation of the call
 stack and exception reuse it it is possible the cost for a second
 throw can be folded into the first.
 % Is this due to optimization?
 \item
 Both types of \CFA are slightly slower if there is not a match.
+Both types of \CFA are slighly slower if there is not a match.
 For termination this likely comes from unwinding a bit more stack through
 libunwind instead of executing the code normally.
 …
 The difference in relative performance does show that there are savings to
 be made by performing the check without catching the exception.
-\end{comment}
-\begin{comment}
-From: Dave Dice <dave.dice@oracle.com>
-To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
-Subject: Re: [External] : JIT
-Date: Mon, 16 Aug 2021 01:21:56 +0000
-> On 2021-8-15, at 7:14 PM, Peter A. Buhr <pabuhr@uwaterloo.ca> wrote:
+>
-> My student is trying to measure the cost of installing a try block with a
-> finally clause in Java.
+>
-> We tried the random trick (see below). But if the try block is comment out, the
-> results are the same. So the program measures the calls to the random number
-> generator and there is no cost for installing the try block.
+>
-> Maybe there is no cost for a try block with an empty finally, i.e., the try is
-> optimized away from the get-go.
-There's quite a bit of optimization magic behind the HotSpot curtains for
-try-finally.  (I sound like the proverbial broken record (:>)).
-In many cases we can determine that the try block can't throw any exceptions,
-so we can elide all try-finally plumbing.  In other cases, we can convert the
-try-finally to normal if-then control flow, in the case where the exception is
-thrown into the same method.  This makes exceptions _almost cost-free.  If we
-actually need to "physically" rip down stacks, then things get expensive,
-impacting both the throw cost, and inhibiting other useful optimizations at the
-catch point.  Such "true" throws are not just expensive, they're _very
-expensive.  The extremely aggressive inlining used by the JIT helps, because we
-can convert cases where a heavy rip-down would normally needed back into simple
-control flow.
-Other quirks involve the thrown exception object.  If it's never accessed then
-we're apply a nice set of optimizations to avoid its construction.  If it's
-accessed but never escapes the catch frame (common) then we can also cheat.
-And if we find we're hitting lots of heavy rip-down cases, the JIT will
-consider recompilation - better inlining -- to see if we can merge the throw
-and catch into the same physical frame, and shift to simple branches.
-In your example below, System.out.print() can throw, I believe.  (I could be
-wrong, but most IO can throw).  Native calls that throw will "unwind" normally
-in C++ code until they hit the boundary where they reenter java emitted code,
-at which point the JIT-ed code checks for a potential pending exception.  So in
-a sense the throw point is implicitly after the call to the native method, so
-we can usually make those cases efficient.
-Also, when we're running in the interpreter and warming up, we'll notice that
-the == 42 case never occurs, and so when we start to JIT the code, we elide the
-call to System.out.print(), replacing it (and anything else which appears in
-that if x == 42 block) with a bit of code we call an "uncommon trap".  I'm
-presuming we encounter 42 rarely.  So if we ever hit the x == 42 case, control
-hits the trap, which triggers synchronous recompilation of the method, this
-time with the call to System.out.print() and, because of that, we now to adapt
-the new code to handle any traps thrown by print().  This is tricky stuff, as
-we may need to rebuild stack frames to reflect the newly emitted method.  And
-we have to construct a weird bit of "thunk" code that allows us to fall back
-directly into the newly emitted "if" block.  So there's a large one-time cost
-when we bump into the uncommon trap and recompile, and subsequent execution
-might get slightly slower as the exception could actually be generated, whereas
-before we hit the trap, we knew the exception could never be raised.
-Oh, and things also get expensive if we need to actually fill in the stack
-trace associated with the exception object.  Walking stacks is hellish.
-Quite a bit of effort was put into all this as some of the specjvm benchmarks
-showed significant benefit.
-It's hard to get sensible measurements as the JIT is working against you at
-every turn.  What's good for the normal user is awful for anybody trying to
-benchmark.  Also, all the magic results in fairly noisy and less reproducible
-results.
-Regards
-Dave
-p.s., I think I've mentioned this before, but throwing in C++ is grim as
-unrelated throws in different threads take common locks, so nothing scales as
-you might expect.
-\end{comment}

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: