\chapter{Exception Features}

This chapter covers the design and user interface of the \CFA
exception-handling mechanism (EHM). % or exception system.

We will begin with an overview of EHMs in general. It is not a strict
definition of all EHMs nor an exhaustive list of all possible features.
However, it does cover the most common structure and features found in them.

% We should cover what is an exception handling mechanism and what is an
% exception before this. Probably in the introduction. Some of this could
% move there.
\paragraph{Raise / Handle}
An exception operation has two main parts: raise and handle.
These terms are sometimes known as throw and catch, but this work uses
throw/catch for a particular kind of raise/handle.
These are the two parts that users write themselves, and they may
be the only two pieces of the EHM that have any syntax in the language.

\subparagraph{Raise}
The raise is the starting point for exception handling. It marks the beginning
of exception handling by raising an exception, which passes it to
the EHM.

Some well-known examples include the @throw@ statements of \Cpp and Java and
the \codePy{raise} statement from Python. In real systems a raise may perform
some other work (such as memory management) but for the purposes of this
overview that can be ignored.

\subparagraph{Handle}
The purpose of most exception operations is to run some user code to handle
that exception. This code is given, with some other information, in a handler.

A handler has three common features: the previously mentioned user code, a
region of code it covers and an exception label/condition that matches
certain exceptions.
A handler can only handle exceptions that are raised inside its covered
region and that match its label.
Different EHMs have different rules, such as ``best match'' or
``first found'', to pick a handler if multiple handlers could be used.

The @try@ statements of \Cpp, Java and Python are common examples.
All three also show another common feature of handlers: they are grouped by
the covered region.

\paragraph{Propagation}
After an exception is raised comes what is usually the biggest step for the
EHM: finding and setting up the handler. The propagation from raise to
handler can be broken up into three different tasks: searching for a handler,
matching against the handler and installing the handler.

\subparagraph{Searching}
The EHM begins by searching for handlers that might be used to handle
the exception. Searching is usually independent of the exception that was
thrown, as it looks for handlers that have the raise site in their covered
region.
This includes handlers in the current function, as well as any in callers
on the stack that have the function call in their covered region.

\subparagraph{Matching}
Each handler found has to be matched with the raised exception. The exception
label defines a condition that is checked against the exception to decide
whether there is a match.

In languages where the first match is used, this step is intertwined with
searching: a match check is performed immediately after the search finds
a possible handler.

\subparagraph{Installing}
After a handler is chosen it must be made ready to run.
The implementation can vary widely to fit with the rest of the
design of the EHM. The installation step might be trivial or it could be
the most expensive step in handling an exception. The latter tends to be the
case when stack unwinding is involved.

If a matching handler is not guaranteed to be found, the EHM needs a
different course of action for the case where no handler matches.
This is only required with unchecked exceptions, as checked exceptions
(such as in Java) can make that guarantee.
This different action can also be installing a handler, but it is usually an
implicit and much more general one.

\subparagraph{Hierarchy}
A common way to organize exceptions is in a hierarchical structure.
This is especially true in object-orientated languages where the exception
hierarchy is a natural extension of the object hierarchy.

Consider the following hierarchy of exceptions:
\begin{center}
\input{exception-hierarchy}
\end{center}

A handler labelled with any given exception can handle exceptions of that
type or any descendant type of that exception.
The root of the exception hierarchy (here \codeC{exception}) acts as a
catch-all, leaf types catch single types and the exceptions in the middle can
be used to catch different groups of related exceptions.

This system has some notable advantages, such as multiple levels of grouping,
the ability for libraries to add new exception types and the isolation
between different sub-hierarchies.
This design is used in \CFA even though it is not an object-orientated
language; \CFA uses different tools to create the hierarchy.

% Could I cite the rationale for the Python IO exception rework?

\paragraph{Completion}
After the handler has finished, the entire exception operation has to complete
and continue executing somewhere else. This step is usually simple,
both logically and in its implementation, as the installation of the handler
is usually set up to do most of the work.

The EHM can return control to many different places;
the most common are after the handler definition and after the raise.

\paragraph{Communication}
For effective exception handling, additional information is usually passed
from the raise to the handler.
So far only communication of the exception's identity has been covered.
A common method is putting fields into the exception instance and giving the
handler access to them.

\section{Virtuals}
Virtual types and casts are not part of \CFA's EHM nor are they required for
any EHM.
However, \CFA uses a hierarchical system of exceptions and the virtual system
is used to create it.
% Maybe talk about why the virtual system is so minimal.
% Created for but not a part of the exception system.
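To make the hierarchical matching rule concrete, the following is a minimal \Cpp sketch (not \CFA code, and not \CFA's actual run-time layout) that assumes each virtual table stores a pointer to its parent's table. A handler labelled with some type matches a raised exception if walking up the parent links from the exception's type reaches the label; the same walk gives the null-on-failure behaviour of a checked downcast. The names @VTable@, @is_same_or_descendant@, @virtual_cast@ and the three example types are hypothetical.

```cpp
#include <cassert>

// Hypothetical model, not the CFA run-time layout: a virtual table that
// records only its parent table. Each table stands for one virtual type.
struct VTable {
    const VTable* parent;  // nullptr for the root of a tree
    const char* name;      // for debugging only
};

// Hypothetical three-level tree: exception <- io_error <- file_not_found.
const VTable exception_vt{nullptr, "exception"};
const VTable io_error_vt{&exception_vt, "io_error"};
const VTable file_not_found_vt{&io_error_vt, "file_not_found"};

// Does a handler labelled `label` match an exception whose type is `type`?
// True when `type` is `label` itself or one of its descendants.
bool is_same_or_descendant(const VTable* type, const VTable* label) {
    for (const VTable* t = type; t != nullptr; t = t->parent) {
        if (t == label) return true;
    }
    return false;
}

// An object of a virtual type carries a hidden pointer to its table.
struct Object {
    const VTable* vtable;
};

// Analogue of a checked downcast: the object pointer on success, else nullptr.
Object* virtual_cast(const VTable* target, Object* obj) {
    if (obj != nullptr && is_same_or_descendant(obj->vtable, target)) return obj;
    return nullptr;
}
```

Here a handler labelled @io_error_vt@ would accept a @file_not_found_vt@ exception but not a plain @exception_vt@ one, while the root table acts as the catch-all.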
The virtual system supports multiple ``trees'' of types. Each tree is
a simple hierarchy with a single root type. Each type in a tree has exactly
one parent (except for the root type, which has zero parents) and any
number of children.
Any type that belongs to any of these trees is called a virtual type.

% A type's ancestors are its parent and its parent's ancestors.
% The root type has no ancestors.
% A type's descendants are its children and its children's descendants.

Every virtual type also has a list of virtual members. Children inherit
their parent's list of virtual members but may add new members to it.
It is important to note that these are virtual members, not virtual methods
of object-orientated programming, and can be of any type.
However, since \CFA has function pointers and they are allowed as virtual
members, virtual members can be used to mimic virtual methods.

Each virtual type has a unique id.
This unique id and all the virtual members are combined
into a virtual table type. Each virtual type has a pointer to a virtual table
as a hidden field.

Up until this point the virtual system is similar to ones found in
object-orientated languages, but this is where \CFA diverges. In those
languages, each type encapsulates a single set of behaviours,
universally across the entire program, and indeed all programs that use that
type definition. In this sense the types are ``closed'' and cannot be
altered.

In \CFA types do not encapsulate any behaviour. Traits are local and
types can begin to satisfy a trait, stop satisfying a trait or satisfy the same
trait in a different way at any lexical location in the program.
In this sense they are ``open'' as they can change at any time. This means it
is impossible to pick a single set of functions that represent the type's
implementation across the program.

\CFA side-steps this issue by not having a single virtual table for each
type. A user can define virtual tables, which are filled in at their
declaration and given a name.
The table can be used anywhere its name is visible, even if it was
defined locally inside a function (although then it does not have a
static lifetime).
Specifically, an instance of a virtual type is ``bound'' to a virtual table,
which sets the virtual members for that object. The virtual members can be
accessed through the object.

While much of the virtual infrastructure is created, it is currently only
used internally for exception handling. The only user-level feature is the
virtual cast, which is the same as the \Cpp \lstinline[language=C++]|dynamic_cast|.
\label{p:VirtualCast}
\begin{cfa}
(virtual TYPE)EXPRESSION
\end{cfa}
Note that the syntax and semantics match a C cast, rather than the
function-like \Cpp syntax for special casts. Both the type of @EXPRESSION@ and
@TYPE@ must be pointers to virtual types.
The cast dynamically checks if the @EXPRESSION@ type is the same as or a
sub-type of @TYPE@, and if true, returns a pointer to the
@EXPRESSION@ object, otherwise it returns @0p@ (null pointer).

\section{Exception}
% Leaving until later, hopefully it can talk about actual syntax instead
% of my many strange macros. Syntax aside I will also have to talk about the
% features all exceptions support.

Exceptions are defined by the trait system; there is a series of traits, and
if a type satisfies them, then it can be used as an exception. The following
is the base trait all exceptions must satisfy.
\begin{cfa}
trait is_exception(exceptT &, virtualT &) {
	virtualT const & get_exception_vtable(exceptT *);
};
\end{cfa}
The trait is defined over two types, the exception type and the virtual table
type. This should be one-to-one: each exception type has only one virtual
table type and vice versa. The only assertion in the trait is
@get_exception_vtable@, which takes a pointer to the exception type and
returns a reference to the instance of the virtual table type.

% TODO: This section, and all references to get_exception_vtable, are
% out-of-date.
% Perhaps wait until the update is finished before rewriting it.
The function @get_exception_vtable@ is actually a constant function.
Regardless of the value passed in (including the null pointer) it should
return a reference to the virtual table instance for that type.
The reason it is a function instead of a constant is that it makes type
annotations easier to write, as you can use the exception type instead of the
virtual table type, which usually has a mangled name.
% Also \CFA's trait system handles functions better than constants and doing
% it this way reduce the amount of boiler plate we need.

% I did have a note about how it is the programmer's responsibility to make
% sure the function is implemented correctly. But this is true of every
% similar system I know of (except Agda's I guess) so I took it out.

There are two more traits for exceptions defined as follows:
\begin{cfa}
trait is_termination_exception(
		exceptT &, virtualT & | is_exception(exceptT, virtualT)) {
	void defaultTerminationHandler(exceptT &);
};

trait is_resumption_exception(
		exceptT &, virtualT & | is_exception(exceptT, virtualT)) {
	void defaultResumptionHandler(exceptT &);
};
\end{cfa}
Both traits ensure a pair of types are an exception type and its virtual table
type, and each defines one of the two default handlers. The default handlers
are used as fallbacks and are discussed in detail in \vref{s:ExceptionHandling}.

However, all three of these traits can be tricky to use directly.
While there is a bit of repetition required,
the largest issue is that the name of the virtual table type is mangled and
not meant to be written by users.
So three macros are provided to wrap these traits and simplify referring to
the names:
@IS_EXCEPTION@, @IS_TERMINATION_EXCEPTION@ and @IS_RESUMPTION_EXCEPTION@.

All three take one or two arguments. The first argument is the name of the
exception type. The macro passes its unmangled and mangled form to the trait.
The second (optional) argument is a parenthesized list of polymorphic
arguments. This argument is only used with polymorphic exceptions, and the
list is passed to both types.
In the current set-up, the two types always have the same polymorphic
arguments, so these macros can be used without losing flexibility.

For example, consider a function that is polymorphic over types that have a
defined arithmetic exception:
\begin{cfa}
forall(Num | IS_EXCEPTION(Arithmetic, (Num)))
void some_math_function(Num & left, Num & right);
\end{cfa}

\section{Exception Handling}
\label{s:ExceptionHandling}
\CFA provides two kinds of exception handling: termination and resumption.
These twin operations are the core of \CFA's exception handling mechanism.
This section covers the general patterns shared by the two operations and
then goes on to cover the details of each individual operation.

Both operations follow the same set of steps.
Both start with the user performing a raise on an exception.
Then the exception propagates up the stack.
If a handler is found, the exception is caught and the handler is run.
After that, control returns to normal execution.
If the search fails, a default handler is run and then control
returns to normal execution after the raise.

This general description covers what the two kinds have in common.
Differences include how propagation is performed, where execution continues
after an exception is caught and handled, and which default handler is run.

\subsection{Termination}
\label{s:Termination}
Termination handling is the familiar kind and used in most programming
languages with exception handling.
It is a dynamic, non-local goto. If the raised exception is matched and
handled, the stack is unwound and control (usually) continues in the function
on the call stack that defined the handler.
Termination is commonly used when an error has occurred and recovery is
impossible locally.
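Because \Cpp's @throw@ and @catch@ are themselves a termination mechanism, the dynamic, non-local goto can be illustrated directly in \Cpp. In this minimal sketch the function names and the @trace@ vector are hypothetical; the vector records which statements actually run.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Records which statements actually ran, in order.
std::vector<std::string> trace;

void inner() {
    trace.push_back("inner: before throw");
    throw std::runtime_error("cannot recover here");
}

void middle() {
    inner();
    trace.push_back("middle: after call");  // skipped: this frame is unwound
}

void outer() {
    try {
        middle();
        trace.push_back("outer: after call");  // skipped as well
    } catch (const std::runtime_error&) {
        trace.push_back("outer: handler");
    }
    trace.push_back("outer: after try");  // control continues here
}
```

After calling @outer()@, @trace@ holds exactly three entries: one from @inner@ before the throw, one from the handler and one from after the try statement. The statements following the calls in @middle@ and in the guarded block never run because those frames were unwound.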
% (usually) Control can continue in the current function but then a different
% control flow construct should be used.

A termination raise is started with the @throw@ statement:
\begin{cfa}
throw EXPRESSION;
\end{cfa}
The expression must return a reference to a termination exception, where the
termination exception is any type that satisfies the trait
@is_termination_exception@ at the call site.
Through \CFA's trait system, the trait functions are implicitly passed into the
throw code and the EHM.
A new @defaultTerminationHandler@ can be defined in any scope to
change the throw's behaviour (see below).

The throw copies the provided exception into managed memory to ensure
the exception is not destroyed if the stack is unwound.
It is the user's responsibility to ensure the original exception is cleaned
up whether the stack is unwound or not. Allocating it on the stack is
usually sufficient.

Then propagation starts with the search. \CFA uses a ``first match'' rule so
matching is performed with the copied exception as the search continues.
It starts from the throwing function and proceeds to the base of the stack,
from callee to caller.
At each stack frame, a check is made for termination handlers defined by the
@catch@ clauses of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} catch (EXCEPTION_TYPE$\(_1\)$ * [NAME$\(_1\)$]) {
	HANDLER_BLOCK$\(_1\)$
} catch (EXCEPTION_TYPE$\(_2\)$ * [NAME$\(_2\)$]) {
	HANDLER_BLOCK$\(_2\)$
}
\end{cfa}
When viewed on its own, a try statement simply executes the statements
in @GUARDED_BLOCK@ and when those are finished the try statement finishes.

However, while the guarded statements are being executed, including any
invoked functions, all the handlers in the statement are now on the search
path. If a termination exception is thrown and not handled further up the
stack, these handlers are matched against the exception.

Exception matching checks each catch clause in the order they appear,
top to bottom.
If the representation of the thrown exception type is the same as or a
descendant of @EXCEPTION_TYPE@$_i$, then @NAME@$_i$ (if provided) is bound to
a pointer to the exception and the statements in @HANDLER_BLOCK@$_i$ are
executed. If control reaches the end of the handler, the exception is freed
and control continues after the try statement.

If no termination handler is found during the search, then the default handler
(@defaultTerminationHandler@) is run.
Through \CFA's trait system, the best match at the throw site is used.
This function is passed the copied exception.
After the default handler is run, control continues after the throw statement.

There is a global @defaultTerminationHandler@ that is polymorphic over all
exception types. Since it is so general, a more specific handler can be
defined and is used for those types, effectively overriding the handler
for a particular exception type.
The global default termination handler performs a cancellation
(see \vref{s:Cancellation}) on the current stack with the copied exception.

\subsection{Resumption}
\label{s:Resumption}

Resumption exception handling is less common than termination but is
just as old~\cite{Goodenough75} and is simpler in many ways.
It is a dynamic, non-local function call. If the raised exception is
matched, a closure is taken from up the stack and executed,
after which the raising function continues executing.
Resumption is most often used when an error has occurred and, if the error
is repaired, the function can continue.

A resumption raise is started with the @throwResume@ statement:
\begin{cfa}
throwResume EXPRESSION;
\end{cfa}
It works much the same way as the termination throw.
The expression must return a reference to a resumption exception,
where the resumption exception is any type that satisfies the trait
@is_resumption_exception@ at the call site.
The assertions from this trait are available to
the exception system while handling the exception.
At run-time, no exception copy is made.
As the stack is not unwound, the exception and
any values on the stack remain in scope while the resumption is handled.

The EHM then begins propagation. The search starts from the raise in the
resuming function and proceeds to the base of the stack, from callee to caller.
At each stack frame, a check is made for resumption handlers defined by the
@catchResume@ clauses of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} catchResume (EXCEPTION_TYPE$\(_1\)$ * [NAME$\(_1\)$]) {
	HANDLER_BLOCK$\(_1\)$
} catchResume (EXCEPTION_TYPE$\(_2\)$ * [NAME$\(_2\)$]) {
	HANDLER_BLOCK$\(_2\)$
}
\end{cfa}
% I wonder if there would be some good central place for this.
Note that termination handlers and resumption handlers may be used together
in a single try statement, intermixing @catch@ and @catchResume@ freely.
Each type of handler only interacts with exceptions from the matching
kind of raise.
When a try statement is executed, it simply executes the statements in the
@GUARDED_BLOCK@ and then finishes.

However, while the guarded statements are being executed, including any
invoked functions, all the handlers in the statement are now on the search
path. If a resumption exception is raised and not handled further up the
stack, these handlers are matched against the exception.

Exception matching checks each @catchResume@ clause in the order they appear,
top to bottom. If the representation of the thrown exception type is the same
as or a descendant of @EXCEPTION_TYPE@$_i$, then @NAME@$_i$ (if provided) is
bound to a pointer to the exception and the statements in @HANDLER_BLOCK@$_i$
are executed.
If control reaches the end of the handler, execution continues after the
raise statement that raised the handled exception.

Like termination, if no resumption handler is found, the default handler
visible at the throw statement is called. It uses the best match at the
call site according to \CFA's overloading rules.
The default handler is passed the exception given to the throw. When the
default handler finishes, execution continues after the raise statement.

There is a global @defaultResumptionHandler@ that is polymorphic over all
termination exceptions and performs a termination throw on the exception.
The @defaultTerminationHandler@ for that raise is matched at the original
raise statement (the resumption @throwResume@) and it can be customized by
introducing a new or better match as well.

\subsubsection{Resumption Marking}
\label{s:ResumptionMarking}
A key difference between resumption and termination is that resumption does
not unwind the stack. A side effect is that, when a handler is matched
and run, its try block (the guarded statements) and every try statement
searched before it are still on the stack. This can lead to the recursive
resumption problem.

The recursive resumption problem is any situation where a resumption handler
ends up being called while it is running.
Consider a trivial case:
\begin{cfa}
try {
	throwResume (E &){};
} catchResume(E *) {
	throwResume (E &){};
}
\end{cfa}
When this code is executed, the guarded @throwResume@ throws, starts a
search and matches the handler in the @catchResume@ clause. The handler is
called and placed on the stack on top of the try block. The second throw then
searches the same try block and calls another instance of the same handler,
leading to infinite recursion.

This situation is trivial and easy to avoid, but much more complex cycles can
form with multiple handlers and different exception types.

To prevent all of these cases, we mark try statements on the stack.
A try statement is marked when a match check is performed with it and an
exception. The statement is unmarked when the handling of that exception
is completed or the search completes without finding a handler.
While a try statement is marked, its handlers are never matched, effectively
skipping over it to the next try statement.
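The marking rules can be exercised with a small simulation. This is a minimal \Cpp sketch under stated assumptions, not the \CFA implementation: try statements are kept on an explicit stack, an exception is reduced to a type name matched exactly (ignoring the hierarchy), and the names @TryStatement@, @throw_resume@, @default_resumption_handler@ and @recursive_resumption_demo@ are hypothetical. The demo function reproduces the trivial recursive case above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of resumption with marking; not the CFA implementation.
// An exception is just a type name, and handlers match names exactly.
struct Exception {
    std::string type;
};

struct Handler {
    std::string type;
    std::function<void(Exception&)> body;
};

struct TryStatement {
    std::vector<Handler> handlers;
    bool marked = false;
};

// Try statements currently on the stack, innermost last.
std::vector<TryStatement*> try_stack;

int default_calls = 0;
void default_resumption_handler(Exception&) { ++default_calls; }

// A resumption raise: search innermost-first, skipping marked try statements,
// run the first matching handler on top of the stack, then return to the raiser.
void throw_resume(Exception& ex) {
    std::vector<TryStatement*> marked_here;
    for (auto it = try_stack.rbegin(); it != try_stack.rend(); ++it) {
        TryStatement* stmt = *it;
        if (stmt->marked) continue;  // marked: its handlers are never matched
        stmt->marked = true;         // marked by this match check
        marked_here.push_back(stmt);
        for (Handler& h : stmt->handlers) {
            if (h.type == ex.type) {
                h.body(ex);  // nothing is unwound; `it` is not used afterwards
                for (TryStatement* s : marked_here) s->marked = false;  // handled: unmark
                return;      // control returns to the raise site
            }
        }
    }
    for (TryStatement* s : marked_here) s->marked = false;  // search failed: unmark
    default_resumption_handler(ex);
}

// The trivial recursive case from the text: the handler raises E again.
int handler_calls = 0;

void recursive_resumption_demo() {
    TryStatement stmt;
    stmt.handlers.push_back({"E", [](Exception&) {
        ++handler_calls;
        Exception again{"E"};
        throw_resume(again);  // skips the marked try statement
    }});
    try_stack.push_back(&stmt);  // enter the guarded block
    Exception first{"E"};
    throw_resume(first);
    try_stack.pop_back();        // leave the try statement
}
```

Running @recursive_resumption_demo()@ calls the handler once: the inner raise skips the marked try statement, finds no other handler and runs the default handler, after which both raises return normally and the try statement is unmarked again.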
\begin{center}
\input{stack-marking}
\end{center}

These rules mirror what happens with termination.
When a termination throw happens in a handler, the search does not look at
any handlers from the original throw to the original catch because that
part of the stack has been unwound.
A resumption raise in the same situation wants to search the entire stack,
but it does not try to match the exception with try statements in the section
that would have been unwound, as they are marked.

The symmetry between resumption and termination is why this pattern was picked.
Other patterns, such as marking just the handlers that caught, also work, but
this symmetry means there are fewer rules to remember.

\section{Conditional Catch}
Both termination and resumption handler clauses can be given an additional
condition to further control which exceptions they handle:
\begin{cfa}
catch (EXCEPTION_TYPE * [NAME] ; CONDITION)
\end{cfa}
First, the same semantics is used to match the exception type. Second, if the
exception matches, @CONDITION@ is executed. The condition expression may
reference all names in scope at the beginning of the try block and @NAME@
introduced in the handler clause. If the condition is true, then the handler
matches. Otherwise, the exception search continues as if the exception type
did not match.

Conditional matching allows finer-grained matching by letting the match check
more kinds of information than just the exception type.
\begin{cfa}
try {
	handle1 = open( f1, ... );
	handle2 = open( f2, ... );
	handle3 = open( f3, ... );
	...
} catch( IOFailure * f ; fd( f ) == f1 ) {
	// Only handle IO failure for f1.
} catch( IOFailure * f ; fd( f ) == f3 ) {
	// Only handle IO failure for f3.
}
// Can't handle a failure relating to f2 here.
\end{cfa}
In this example, the file that experienced the IO error is used to decide
which handler should be run, if any at all.
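The order in which the type and the condition are checked can be modelled in a few lines. In this minimal \Cpp sketch, which only models the search and is not \CFA's implementation, an exception is a type name plus a hypothetical integer descriptor, and @Clause@ and @find_handler@ are invented names. A clause evaluates its condition only after the type matches, and a false condition is treated exactly like a type mismatch, so the search moves on to later clauses and outer try statements; because this happens during the search, nothing has been unwound yet.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical exception: a type name plus the descriptor that failed.
struct Exception {
    std::string type;
    int fd;
};

// One handler clause: a type label, an optional condition and a name
// identifying the handler that would run.
struct Clause {
    std::string type;
    std::function<bool(const Exception&)> condition;  // empty: no condition
    std::string name;
};

using TryStatement = std::vector<Clause>;

// First-match search, innermost try statement first (last in `stack`).
// The condition is only evaluated once the type matches; a false condition
// is treated like a type mismatch, so the search simply moves on.
// Returns the name of the matching clause, or "" if none matches.
std::string find_handler(const std::vector<TryStatement>& stack, const Exception& ex) {
    for (auto stmt = stack.rbegin(); stmt != stack.rend(); ++stmt) {
        for (const Clause& clause : *stmt) {
            if (clause.type != ex.type) continue;
            if (!clause.condition || clause.condition(ex)) return clause.name;
        }
    }
    return "";
}
```

For example, with an inner try statement holding conditional clauses for two descriptors (say 3 and 5) and an outer unconditional clause, a failure on descriptor 4 passes over both inner clauses and is found by the outer one, mirroring the @f2@ case above.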
\begin{comment}
% I know I actually haven't got rid of them yet, but I'm going to try
% to write it as if I had and see if that makes sense:
\section{Reraising}
\label{s:Reraising}
Within the handler block or functions called from the handler block, it is
possible to reraise the most recently caught exception with @throw@ or
@throwResume@, respectively.
\begin{cfa}
try {
	...
} catch( ... ) {
	... throw;
} catchResume( ... ) {
	... throwResume;
}
\end{cfa}
The only difference between a raise and a reraise is that reraise does not
create a new exception; instead it continues using the current exception, \ie
no allocation and copy. However the default handler is still set to the one
visible at the raise point, and hence, for termination could refer to data that
is part of an unwound stack frame. To prevent this problem, a new default
handler is generated that does a program-level abort.
\end{comment}

\subsection{Comparison with Reraising}
A more popular way to allow handlers to match in more detail is to reraise
the exception after it has been caught, if it could not be handled here.
On the surface these two features seem interchangeable.

If @throw;@ were used to start a termination reraise, then these two
statements would have the same behaviour:
\begin{cfa}
try {
	do_work_may_throw();
} catch(exception_t * exc ; can_handle(exc)) {
	handle(exc);
}
\end{cfa}

\begin{cfa}
try {
	do_work_may_throw();
} catch(exception_t * exc) {
	if (can_handle(exc)) {
		handle(exc);
	} else {
		throw;
	}
}
\end{cfa}
If there are further handlers after this handler, only the first version
checks them. If multiple handlers on a single try block could handle the
same exception, the translations get more complex, but the two features are
equally powerful.

That is, until stack unwinding comes into the picture.
In termination handling, a conditional catch happens before the stack is
unwound, but a reraise happens afterwards.
Normally this might only cause you to lose some debug information you could
get from a stack trace (and that can be side-stepped entirely by collecting
information during the unwind). But for \CFA there is another issue:
if the exception is not handled, the default handler should be run at the
site of the original raise.

There are two problems with this: the site of the original raise does not
exist anymore and the default handler might not exist anymore. The site is
always removed as part of the unwinding, often with the entirety of the
function it was in. The default handler could be a stack-allocated nested
function removed during the unwind.

This means that pretending the catch did not happen, continuing the original
raise instead of starting a new one, is infeasible.
That is the expected behaviour for most languages and we cannot replicate
that behaviour.

\section{Finally Clauses}
\label{s:FinallyClauses}
Finally clauses are used to perform unconditional clean-up when leaving a
scope and are placed at the end of a try statement after any handler clauses:
\begin{cfa}
try {
	GUARDED_BLOCK
} ... // any number or kind of handler clauses
... finally {
	FINALLY_BLOCK
}
\end{cfa}
The @FINALLY_BLOCK@ is executed when the try statement is removed from the
stack, including when the @GUARDED_BLOCK@ finishes, any termination handler
finishes or during an unwind.
The only time the block is not executed is if the program is exited before
the stack is unwound.

Execution of the finally block should always finish, meaning control runs off
the end of the block. This requirement ensures control always continues as if
the finally clause is not present, \ie finally is for cleanup, not for changing
control flow.
Because of this requirement, local control flow out of the finally block
is forbidden. The compiler precludes any @break@, @continue@, @fallthru@ or
@return@ that causes control to leave the finally block.
Other ways to leave the finally block, such as a long jump or termination,
are much harder to check, and at best require additional run-time overhead,
so they are only discouraged.

Not all languages with unwinding have finally clauses. Notably \Cpp does
without them, as destructors serve a similar role. Although destructors and
finally clauses can be used in many of the same areas, they have their own
use cases, like top-level functions and lambda functions with closures.
Destructors take a bit more work to set up but are much easier to reuse, while
finally clauses are good for one-off uses and can easily include local
information.

\section{Cancellation}
\label{s:Cancellation}
Cancellation is a stack-level abort, which can be thought of as an
uncatchable termination. It unwinds the entire current stack, and if
possible forwards the cancellation exception to a different stack.

Cancellation is not an exception operation like termination or resumption.
There is no special statement for starting a cancellation; instead the
standard library function @cancel_stack@ is called, passing an exception.
Unlike a raise, this exception is not used in matching, only to pass
information about the cause of the cancellation.
(This also means matching cannot fail, so there is no default handler.)

After @cancel_stack@ is called, the exception is copied into the EHM's memory
and the current stack is unwound.
What happens next depends on which stack is being cancelled.
\begin{description}
\item[Main Stack:]
The main stack is the one used by the program main at the start of execution,
and is the only stack in a sequential program.
After the main stack is unwound there is a program-level abort.

There are two reasons for this. The first is that it must do this in a
sequential program, as there is nothing else to notify, and keeping the same
behaviour in sequential and concurrent programs is simpler.
Also, even in concurrent programs there is no stack that the main stack has
an innate connection to, so one would have to be explicitly managed.

\item[Thread Stack:]
A thread stack is created for a \CFA @thread@ object or object that satisfies
the @is_thread@ trait.
After a thread stack is unwound, the exception is stored until another
thread attempts to join with it. Then the exception @ThreadCancelled@,
which stores a reference to the thread and to the exception passed to the
cancellation, is reported from the join.
There is one difference between an explicit join (with the @join@ function)
and an implicit join (from a destructor call). The explicit join takes the
default handler (@defaultResumptionHandler@) from its calling context, while
the implicit join provides its own, which does a program abort if the
@ThreadCancelled@ exception cannot be handled.

Communication is done at join because a thread only has two points of
communication with other threads: start and join.
Since a thread must be running to perform a cancellation (and cannot be
cancelled from another stack), the cancellation must be after start and
before the join, so join is the one used.

% TODO: Find somewhere to discuss unwind collisions.
The difference between the explicit and implicit join is for safety and
debugging. It helps prevent unwinding collisions by avoiding throwing from
a destructor and prevents cascading the error across multiple threads if
the user is not equipped to deal with it.
Also, you can always add an explicit join if that is the desired behaviour.

\item[Coroutine Stack:]
A coroutine stack is created for a @coroutine@ object or object that
satisfies the @is_coroutine@ trait.
After a coroutine stack is unwound, control returns to the resume function
that most recently resumed it. The resume function reports a
@CoroutineCancelled@ exception, which contains references to the cancelled
coroutine and the exception used to cancel it.
The resume function also takes the @defaultResumptionHandler@ from the
caller's context and passes it to the internal report.

A coroutine knows of two other coroutines, its starter and its last resumer.
The starter has a much more distant connection, while the last resumer just
(in terms of coroutine state) called resume on this coroutine, so the message
is passed to the latter.
\end{description}
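The thread-stack rule, where the cancellation is stored and only reported when another thread joins, resembles how \Cpp can carry an exception between threads with @std::exception_ptr@. The following minimal sketch is a hypothetical analogue, not \CFA's implementation: @Worker@ and this @ThreadCancelled@ are invented names, an exception escaping the thread body stands in for @cancel_stack@, and the destructor's implicit join aborts on an unreported cancellation rather than throwing.

```cpp
#include <cassert>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>

// Hypothetical analogue of ThreadCancelled: carries the cancellation's cause.
struct ThreadCancelled : std::runtime_error {
    std::exception_ptr cause;
    explicit ThreadCancelled(std::exception_ptr c)
        : std::runtime_error("thread cancelled"), cause(std::move(c)) {}
};

// A thread whose cancellation is stored and reported at join.
class Worker {
public:
    template <class F>
    explicit Worker(F body)
        : thread_([this, body]() mutable {
              try {
                  body();
              } catch (...) {
                  stored_ = std::current_exception();  // keep it until join
              }
          }) {}

    // Explicit join: report a stored cancellation to the joining thread.
    void join() {
        thread_.join();
        if (stored_) {
            std::exception_ptr cause = std::move(stored_);
            stored_ = nullptr;
            throw ThreadCancelled(cause);
        }
    }

    // Implicit join: never throw from a destructor; an unreported
    // cancellation ends the program instead.
    ~Worker() {
        if (thread_.joinable()) thread_.join();
        if (stored_) std::abort();
    }

private:
    std::exception_ptr stored_;  // declared first: null before the thread starts
    std::thread thread_;
};
```

Catching @ThreadCancelled@ from @join()@ and rethrowing its @cause@ recovers the original exception, while destroying a cancelled @Worker@ without an explicit join ends the program, similar to the implicit join described above.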