\chapter{Implementation}
% Goes over how all the features are implemented.

The implementation work for this thesis covers two components: the virtual
system and exceptions. Each component is discussed in detail.

\section{Virtual System}
\label{s:VirtualSystem}
% Virtual table rules. Virtual tables, the pointer to them and the cast.
While the \CFA virtual system currently has only one public feature, virtual
cast (see the virtual cast feature \vpageref{p:VirtualCast}),
substantial structure is required to support it,
and provide features for exception handling and the standard library.

\subsection{Virtual Type}
Virtual types only have one change to their structure: the addition of a
pointer to the virtual table, called the \emph{virtual-table pointer}.
Internally, the field is called
@virtual_table@.
This constant pointer is always the first field of the table so when
casting to a supertype, the field's location is always known.
The field is initialized as part of all generated constructors.
\todo{They only come as part exceptions and don't work.}
%After the object is created the field is constant.
Dereferencing it gives the virtual table and access to the
type's virtual members.

\subsection{Virtual Table}
% PAB: These 2 paragraphs are repeated below, and maybe some of the paragraph above, too.
\begin{comment}
Every time a virtual type is defined, a new virtual table-type is
instantiated.
The uniqueness of the virtual-table
instance is important because its address
is used as the identifier for the virtual type. Hence, a pointer to the
virtual table and the ID for the virtual type are interchangeable.
\todo{Unique instances might be going so we will have to talk about the new
system instead.}

The first step is creating the virtual-table type.
The virtual-table type is a structure and is described in terms of
its fields. The first field contains the parent-type ID (or a pointer to
the parent virtual-table) or 0 (null pointer).
Next are repeated fields from on the parent virtual-table.
Finally, the fields used to store any new virtual members of the new
the virtual type.
\end{comment}

%The virtual system is accessed through a private constant field inserted at the
%beginning of every virtual type. This field
The virtual-table pointer
points at a type's virtual table (see Figure~\vref{f:VirtualTableLayout}).
%and is assigned during the object's
%construction.
The address of a virtual table acts as the unique identifier for
the virtual type, and the first field of a virtual table is a pointer to the
parent virtual-table or @0p@ (null pointer). The remaining fields are duplicated from the
parent tables in this type's inheritance chain, followed by any fields this type
introduces. Parent fields are duplicated so they can be changed, \ie all virtual
members are overridable, while the parent pointer allows access to the original values.
Hence, references to the dispatched type
are replaced with the current virtual type.
% These are always taken by pointer or reference.

\begin{figure}
% Simple ascii diragram:
\begin{cfa}
parent_pointer  // \C{parent pointer to access its fields}
parent_field0   // \C{same layout as parent to allow replacement}
...
parent_fieldN
child_field0    // \C{new types for this virtual table}
...
child_fieldN
size
alignment
\end{cfa}
%\todo{Refine the diagram}
\caption{Virtual Table Layout}
\label{f:VirtualTableLayout}
\end{figure}

% For each virtual type, a virtual table is constructed. This is both a new type
% and an instance of that type. Other instances of the type could be created
% but the system doesn't use them. So this section will go over the creation of
% the type and the instance.

\begin{comment}
PAB: seems to be said already.
A virtual table is created when a virtual type is created. The name of the
type is created by mangling the name of the base type. The name of the instance
is also generated by name mangling. The fields are initialized automatically.
The parent field is initialized by getting the type of the parent field and
using that to calculate the mangled name of the parent's virtual-table type.
\end{comment}
There are two special fields that are included like normal fields but have
special initialization rules: the @size@ field is the type's size and is
initialized with a @sizeof@ expression, the @align@ field is the type's
alignment and uses an @alignof@ expression. The remaining fields are resolved
to a name matching the field's name and type using the normal visibility and
overload resolution rules of the type system.

These operations are split up into several groups depending on where they take
place, which varies for monomorphic and polymorphic types. The first devision is
between the declarations and the definitions. Declarations, such as a function
signature or an aggregate's name, must always be visible but may be repeated in
the form of forward declarations in headers. Definitions, such as function
bodies and a aggregate's layout, can be separately compiled but must occur
exactly once in a source file.

The declarations include the virtual-type definition and forward declarations
of the virtual-table instance, constructor, message function and
@get_exception_vtable@. The definition includes the storage and initialization
of the virtual table instance and the bodies of the three functions.

Monomorphic instances put all of these two groups in one place.
Polymorphic instances split out the core declarations and definitions from
the per-instance information. The virtual-table type and most of the functions
are polymorphic so they are all part of the core. The virtual-table instance
and the @get_exception_vtable@ function \PAB{ are ...}.

Coroutines and threads need instances of @CoroutineCancelled@ and
@ThreadCancelled@ respectively to use all of their functionality. When a new
data type is declared with @coroutine@ or @thread@, the forward declaration for
the instance is created as well. The definition of the virtual table is created
at the definition of the main function.

\PAB{You need an example here to show what happens for this case.}


\subsection{Virtual Cast}
Virtual casts are implemented as a function call that does the subtype check
and a C coercion-cast to do the type conversion.
% The C-cast is just to make sure the generated code is correct so the rest of
% the section is about that function.
The function is
\begin{cfa}
void * __cfa__virtual_cast( struct __cfa__parent_vtable const * parent,
	struct __cfa__parent_vtable const * const * child );
\end{cfa}
and it is implemented in the standard library. The structure represents the
head of a virtual table, which is the pointer to the parent virtual table. The
@parent@ points directly at the parent-type virtual-table, while the @child@
points at the object of the (possible) child type.

\PAB{Need a figure to show this relationship.}

In terms of the virtual-cast expression, @parent@ comes from looking up the
type being cast to and @child@ is the result of the expression being cast.
Because the complier outputs C code, some C-type casts are also used.
The last bit of glue is a map that saves every virtual type the compiler
sees. This table is used to check the type used in a virtual cast is a virtual
type and to get its virtual table.
(It also checks for conflicting definitions.)

\PAB{Can this be rolled into the figure above?}

Inside the function is a simple conditional. If the type represented by
@parent@ is an ancestor of the type represented by @*child@ (it
requires one more level of dereference to pass through the object) then @child@
is returned, otherwise the null pointer is returned.

The check is a simple linear search (like \Cpp RTTI). If the child
virtual table or any of its ancestors (which are retrieved through the first
field of every virtual table) are the same as the parent virtual-table then
the cast succeeds.

\section{Exceptions}
% Anything about exception construction.

\section{Unwinding}
% Adapt the unwind chapter, just describe the sections of libunwind used.
% Mention that termination and cancellation use it. Maybe go into why
% resumption doesn't as well.

% Many modern languages work with an internal stack that function push and pop
% their local data to. Stack unwinding removes large sections of the stack,
% often across functions.

Stack unwinding is the process of removing stack frames (activations) from the
stack. On function entry and return, unwinding is handled directly by the call/return code
embedded in a function. Usually, the stack-frame size is known statically
based on parameter and local variable declarations. For dynamically-sized
local variables.
(Often called a variable-length array or VLA, even when the variable type is an aggregate.)
For VLAs, a runtime computation is necessary to know the frame
size. Finally, a function's frame-size may change during execution as local
variables (static or dynamic sized) go in and out of scope, which is a form of VLA.
Allocating/deallocating stack space is usually an $O(1)$ operation achieved by
bumping the hardware stack-pointer up or down as needed.

Unwinding across multiple stack frames is more complex because individual stack-management
code associated with each frame can be bypassed. That is, the location
of a function's frame-management code is largely unknown and dispersed
throughout the function, hence the current frame size managed by that code is
also unknown. Hence, code unwinding across frames does not have direct
knowledge about what is on the stack, and hence, how much of the stack needs to
be removed.

% At a very basic level this can be done with @setjmp@ \& @longjmp@ which simply
% move the top of the stack, discarding everything on the stack above a certain
% point. However this ignores all the cleanup code that should be run when
% certain sections of the stack are removed (for \CFA these are from destructors
% and finally clauses) and also requires that the point to which the stack is
% being unwound is known ahead of time. libunwind is used to address both of
% these problems.

The traditional unwinding mechanism for C is implemented by saving a snap-shot
of a function's state with @setjmp@ and restoring that snap-shot with
@longjmp@. This approach bypasses the need to know stack details by simply
reseting to a snap-shot of an arbitrary but existing function frame on the
stack. It is up to the programmer to ensure the snap-shot is valid when it is
reset and that unwound frames do not have side-effects.
Hence, this unwinding approach is fragile with potential errors that are
difficult to debug because the stack becomes corrupted.

With respect to stack side-effects, many languages define cleanup actions that must be taken when objects
are deallocated from the stack, when the function of blocks within the function end, such as running a variable's
destructor or a @try@ statement's @finally@ clause.
The purpose of these side-effects is to reestablish the global state of the program, such as dynamic memory-allocation or file access.
Handling these side-effect mechanisms
requires walking the stack and checking each stack frame for these potential
actions, where a frame can be any block with declarations.

In languages like \Cpp and Java, it must be possible to walk the stack frames in search of @try@
statements to match and execute a handler. For termination exceptions, it must
also be possible to unwind all stack frames from the throw to the matching
catch (including the @try@ block), and each of these frames must be checked for cleanup actions. Stack
walking is where most of the complexity and expense of exception handling
appears.

One of the most popular tools for stack management is libunwind, a low-level
library that provides tools for stack walking, handler execution, and
unwinding. What follows is an overview of all the relevant features of
libunwind needed for this work, and how \CFA uses them to implement exception
handling.

\subsection{libunwind Usage}
Libunwind, accessed through @unwind.h@ on most platforms, is a C library that
provides \Cpp-style stack-unwinding. Its operation is divided into two phases:
search and cleanup. The dynamic target search -- phase 1 -- is used to scan the
stack and decide where unwinding should stop (but no unwinding occurs). The
cleanup -- phase 2 -- does the unwinding and also runs any cleanup code.

To use libunwind, each function must have a personality function and a Language
Specific Data Area (LSDA). The LSDA has the unique information for each
function to tell the personality function where a function is executing, its
current stack frame, and what handlers should be checked. Theoretically, the
LSDA can contain any information but conventionally it is a table with entries
representing regions of the function and what has to be done there during
unwinding. These regions are bracketed by instruction addresses. If the
instruction pointer is within a region's start/end, then execution is currently
executing in that region. Regions are used to mark out the scopes of objects
with destructors and try blocks.

% Libunwind actually does very little, it simply moves down the stack from
% function to function. Most of the actions are implemented by the personality
% function which libunwind calls on every function. Since this is shared across
% many functions or even every function in a language it will need a bit more
% information.

The GCC compilation flag @-fexceptions@ causes the generation of an LSDA and
attaches its personality function.
It attaches a series of opaque directives (@.cfi_personality@ directive)
used internally and not part of this work.
However, this
flag only handles the cleanup attribute:
\begin{cfa}
void clean_up( int * var ) { ... }
int avar __attribute__(( cleanup(clean_up) ));
\end{cfa}
that is used on a variable and specifies a function, in this case @clean_up@,
run when the variable goes out of scope, which is used to mimic destructors.
However, this feature cannot be used to mimic @try@ statements as it cannot
control the unwinding.

\subsection{Personality Functions}
Personality functions have a complex interface specified by libunwind. This
section covers some of the important parts of the interface.

A personality function can perform different actions depending on how it is
called.
\begin{lstlisting}[language=C,{moredelim=**[is][\color{red}]{@}{@}}]
typedef _Unwind_Reason_Code (*@_Unwind_Personality_Fn@) (
	_Unwind_Action @action@,
	_Unwind_Exception_Class @exception_class@,
	_Unwind_Exception * @exception@,
	struct _Unwind_Context * @context@
);
\end{lstlisting}
The @action@ argument is a bitmask of possible actions:
\begin{enumerate}[topsep=5pt]
\item
@_UA_SEARCH_PHASE@ specifies a search phase and tells the personality function
to check for handlers. If there is a handler in a stack frame, as defined by
the language, the personality function returns @_URC_HANDLER_FOUND@; otherwise
it return @_URC_CONTINUE_UNWIND@.

\item
@_UA_CLEANUP_PHASE@ specifies a cleanup phase, where the entire frame is
unwound and all cleanup code is run. The personality function does whatever
cleanup the language defines (such as running destructors/finalizers) and then
generally returns @_URC_CONTINUE_UNWIND@.

\item
\begin{sloppypar}
@_UA_HANDLER_FRAME@ specifies a cleanup phase on a function frame that found a
handler. The personality function must prepare to return to normal code
execution and return @_URC_INSTALL_CONTEXT@.
\end{sloppypar}

\item
@_UA_FORCE_UNWIND@ specifies a forced unwind call. Forced unwind only performs
the cleanup phase and uses a different means to decide when to stop
(see Section~\vref{s:ForcedUnwind}).
\end{enumerate}

The @exception_class@ argument is a copy of the
\lstinline[language=C]|exception|'s @exception_class@ field.
\PAB{Say more.}

The \lstinline[language=C]|exception| argument is a pointer to the user
provided storage object. It has two public fields, the exception class, which
is just a number, identifying the exception handling mechanism that
created it, and the cleanup function. The cleanup function is called if
required by the exception.

The @context@ argument is a pointer to an opaque type passed to helper
functions called inside the personality function.

The return value, @_Unwind_Reason_Code@, is an enumeration of possible messages
that can be passed several places in libunwind. It includes a number of
messages for special cases (some of which should never be used by the
personality function) and error codes. However, unless otherwise noted, the
personality function should always return @_URC_CONTINUE_UNWIND@.

\subsection{Raise Exception}
Raising an exception is the central function of libunwind and it performs a
two-staged unwinding.
\begin{cfa}
_Unwind_Reason_Code _Unwind_RaiseException(_Unwind_Exception *);
\end{cfa}
First, the function begins the search phase, calling the personality function
of the most recent stack frame. It continues to call personality functions
traversing the stack from newest to oldest until a function finds a handler or
the end of the stack is reached. In the latter case, raise exception returns
@_URC_END_OF_STACK@.

Second, when a handler is matched, raise exception walks the stack again performing the cleanup
phase.
Once again, it calls the personality functions of each stack frame from newest
to oldest. This pass stops at the stack frame containing the matching handler.
If that personality function has not install a handler, it is an error.

If an error is encountered, raise exception returns either
@_URC_FATAL_PHASE1_ERROR@ or @_URC_FATAL_PHASE2_ERROR@ depending on when the
error occurred.

\subsection{Forced Unwind}
\label{s:ForcedUnwind}
Forced Unwind is the other central function in libunwind.
\begin{cfa}
_Unwind_Reason_Code _Unwind_ForcedUnwind(_Unwind_Exception *,
	_Unwind_Stop_Fn, void *);
\end{cfa}
It also unwinds the stack but it does not use the search phase. Instead another
function, the stop function, is used to stop searching. The exception is the
same as the one passed to raise exception. The extra arguments are the stop
function and the stop parameter. The stop function has a similar interface as a
personality function, except it is also passed the stop parameter.
\begin{lstlisting}[language=C,{moredelim=**[is][\color{red}]{@}{@}}]
typedef _Unwind_Reason_Code (*@_Unwind_Stop_Fn@)(
	_Unwind_Action @action@,
	_Unwind_Exception_Class @exception_class@,
	_Unwind_Exception * @exception@,
	struct _Unwind_Context * @context@,
	void * @stop_parameter@);
\end{lstlisting}

The stop function is called at every stack frame before the personality
function is called and then once more after all frames of the stack are
unwound.

Each time it is called, the stop function should return @_URC_NO_REASON@ or
transfer control directly to other code outside of libunwind. The framework
does not provide any assistance here.

\begin{sloppypar}
Its arguments are the same as the paired personality function. The actions
@_UA_CLEANUP_PHASE@ and @_UA_FORCE_UNWIND@ are always set when it is
called. Beyond the libunwind standard, both GCC and Clang add an extra action
on the last call at the end of the stack: @_UA_END_OF_STACK@.
\end{sloppypar}

\section{Exception Context}
% Should I have another independent section?
% There are only two things in it, top_resume and current_exception. How it is
% stored changes depending on whether or not the thread-library is linked.

The exception context is global storage used to maintain data across different
exception operations and to communicate among different components.

Each stack must have its own exception context. In a sequential \CFA program,
there is only one stack with a single global exception-context. However, when
the library @libcfathread@ is linked, there are multiple stacks, where each
needs its own exception context.

The function @this_exception_context@ provides general access to the exception context.
For sequential execution, this function is defined as
a weak symbol in the \CFA system-library, @libcfa@. When a \CFA program is
concurrent, it links with @libcfathread@, where this function is defined with a
strong symbol replacing the sequential version.

The sequential @this_exception_context@ returns a hard-coded pointer to the
global exception context.
The concurrent version adds the exception context to the data stored at the
base of each stack. When @this_exception_context@ is called, it retrieves the
active stack and returns the address of the context saved there.

\section{Termination}
% Memory management & extra information, the custom function used to implement
% catches. Talk about GCC nested functions.

Termination exceptions use libunwind heavily because \CFA termination exceptions match
\Cpp exceptions closely. The main complication for \CFA is that the
compiler generates C code, making it very difficult to generate the assembly to
form the LSDA for try blocks or destructors.

\subsection{Memory Management}
The first step of a termination raise is to copy the exception into memory
managed by the exception system. Currently, the system uses @malloc@, rather
than reserved memory or the stack top. The exception-handling mechanism manages
memory for the exception as well as memory for libunwind and the system's own
per-exception storage.

\begin{figure}
\begin{verbatim}
Fixed Header  | _Unwind_Exception   <- pointer target
              |
              | Cforall storage
              |
Variable Body | the exception       <- fixed offset
              V ...
\end{verbatim}
\caption{Exception Layout}
\label{f:ExceptionLayout}
\end{figure}

Exceptions are stored in variable-sized blocks (see Figure~\vref{f:ExceptionLayout}).
The first component is a fixed-sized data-structure that contains the
information for libunwind and the exception system. The second component is an
area of memory big enough to store the exception. Macros with pointer arthritic
and type cast are used to move between the components or go from the embedded
@_Unwind_Exception@ to the entire node.

Multiple exceptions can exist because handlers can call functions that raise
exceptions.  Figure~\vref{f:MultipleExceptions} shows a \Cpp program where
exceptions are handled, and then a function is called from the handler that
raises a new exception. The previous exception must persist because it is
unhandled, and hence, control can return to the handler and that exception is
reraised.

\begin{figure}
\centering
\newsavebox{\myboxA}
\newsavebox{\myboxB}
\begin{lrbox}{\myboxA}
\begin{lstlisting}[language=C++,{moredelim=**[is][\color{red}]{@}{@}}]
struct E {};
int cnt = 3;
void f( int i ) {
	if ( i == 0 ) @throw E();@
	try {
		@f( i - 1 );@
	} catch( E ) { // handler h
		cnt -= 1;
		if ( cnt > 0 ) @f( 2 );@
	}
}
int main() { @f( 2 );@ }
\end{lstlisting}
\end{lrbox}

\begin{lrbox}{\myboxB}
\begin{lstlisting}
h  $\makebox[0pt][l]{\textbackslash}f$
   f
   f
h  $\makebox[0pt][l]{\textbackslash}f$  throw E$\(_2\)$
   f
   f
h  $\makebox[0pt][l]{\textbackslash}f$  throw E$\(_1\)$
   f
   f
\end{lstlisting}
\end{lrbox}

{\usebox\myboxA}
\hspace{25pt}
{\usebox\myboxB}

\caption{Multiple Exceptions}
\label{f:MultipleExceptions}
\end{figure}

In this case, the exception nodes are linked together in a list, one list per stack, with the
list head stored in the exception context. Within each linked list, the most
recently thrown exception is at the head followed by older thrown
exceptions. This format allows exceptions to be thrown, while a different
exception is being handled. The exception at the head of the list is currently
being handled, while other exceptions wait for the exceptions before them to be
removed.

The virtual members in the exception's virtual table provide the size of the
exception, the copy function, and the free function, so they are specific to an
exception type. The size and copy function are used immediately to copy an
exception into managed memory. After the exception is handled, the free function
is used to clean up the exception and then the entire node is passed to free
so the memory can be given back to the heap.

\subsection{Try Statements and Catch Clauses}
The try statement with termination handlers is complex because it must
compensate for the lack of assembly code generated from \CFA. Libunwind
requires an LSDA and personality function for control to unwind across a
function. The LSDA in particular is hard to mimic in generated C code.

The workaround is a function called @__cfaehm_try_terminate@ in the standard
library. The contents of a try block and the termination handlers are converted
into functions. These are then passed to the try terminate function and it
calls them.
Because this function is known and fixed (and not an arbitrary function that
happens to contain a try statement), this means the LSDA can be generated ahead
of time.

Both the LSDA and the personality function are set ahead of time using
embedded assembly. This assembly code is handcrafted using C @asm@ statements and contains
enough information for the single try statement the function represents.

The three functions passed to try terminate are:
\begin{description}
\item[try function:] This function is the try block, all the code inside the
try block is placed inside the try function. It takes no parameters and has no
return value. This function is called during regular execution to run the try
block.

\item[match function:] This function is called during the search phase and
decides if a catch clause matches the termination exception. It is constructed
from the conditional part of each handler and runs each check, top to bottom,
in turn, first checking to see if the exception type matches and then if the
condition is true. It takes a pointer to the exception and returns 0 if the
exception is not handled here. Otherwise the return value is the id of the
handler that matches the exception.

\item[handler function:] This function handles the exception. It takes a
pointer to the exception and the handler's id and returns nothing. It is called
after the cleanup phase. It is constructed by stitching together the bodies of
each handler and dispatches to the selected handler.
\end{description}
All three functions are created with GCC nested functions. GCC nested functions
can be used to create closures, functions that can refer to the state of other
functions on the stack. This approach allows the functions to refer to all the
variables in scope for the function containing the @try@ statement. These
nested functions and all other functions besides @__cfaehm_try_terminate@ in
\CFA use the GCC personality function and the @-fexceptions@ flag to generate
the LSDA. Through this mechanism, \CFA destructors are implemented via the cleanup attribute.

\PAB{Try to put together an example try statement illustrating these components.}

\section{Resumption}
% The stack-local data, the linked list of nodes.

Resumption is simpler to implement than termination because there is no stack
unwinding.  \PAB{You need to explain how the \lstinline{catchResume} clauses are
handled. Do you use the personality mechanism in libunwind or do you roll your
own mechanism?}

The
resumption raise uses a list of nodes for its stack traversal. The head of the
list is stored in the exception context. The nodes in the list have a pointer
to the next node and a pointer to the handler function.
A resumption raise traverses this list. At each node the handler function is
called, passing the exception by pointer. It returns true if the exception is
handled and false otherwise.

The handler function does both the matching and handling. It computes the
condition of each @catchResume@ in top-to-bottom order, until it finds a
handler that matches. If no handler matches then the function returns
false. Otherwise the matching handler is run; if it completes successfully, the
function returns true. Rethrowing, through the @throwResume;@ statement,
causes the function to return true.

% Recursive Resumption Stuff:
Search skipping (see \vpageref{s:ResumptionMarking}), which ignores parts of
the stack
already examined, is accomplished by updating the front of the list as the
search continues. Before the handler at a node is called, the head of the list
is updated to the next node of the current node. After the search is complete,
successful or not, the head of the list is reset.

This mechanism means the current handler and every handler that has already
been checked are not on the list while a handler is run. If a resumption is
thrown during the handling of another resumption the active handlers and all
the other handler checked up to this point are not checked again.

This structure also supports new handlers added while the resumption is being
handled. These are added to the front of the list, pointing back along the
stack -- the first one points over all the checked handlers -- and the ordering
is maintained.

\PAB{Again, a figure to show how this works would be helpful.}

\label{p:zero-cost}
Note, the resumption implementation has a cost for entering/exiting a @try@
statement with @catchResume@ clauses, whereas a @try@ statement with @catch@
clauses has zero-cost entry/exit. While resumption does not need the stack
unwinding and cleanup provided by libunwind, it could use the search phase to
providing zero-cost enter/exit using the LSDA. Unfortunately, there is no way
to return from a libunwind search without installing a handler or raising an
error. Although workarounds might be possible, they are beyond the scope of
this thesis. The current resumption implementation has simplicity in its
favour.
% Seriously, just compare the size of the two chapters and then consider
% that unwind is required knowledge for that chapter.

\PAB{This paragraph needs to be moved to the start of this Section, where I have have my other comment.}

\section{Finally}
% Uses destructors and GCC nested functions.
A finally clause is placed into a GCC nested-function with a unique mangled name, and no
arguments or return values. This nested function is then set as the cleanup
function of an empty object that is declared at the beginning of a block placed
around the context of an associated @try@ statement.

The rest is handled by GCC. The try block and all handlers are inside this
block. At completion, control exits the block and the empty object is cleaned
up, which runs the function that contains the finally code.

\section{Cancellation}
% Stack selections, the three internal unwind functions.

Cancellation also uses libunwind to do its stack traversal and unwinding,
however it uses a different primary function, @_Unwind_ForcedUnwind@. Details
of its interface can be found in Section~\vref{s:ForcedUnwind}.

The first step of cancellation is to find the cancelled stack and its type:
coroutine or thread. Fortunately, the thread library stores the program-main thread
pointer and the current-thread pointer, and every thread stores a pointer to
the current coroutine it is executing.

\PAB{I don't know if my corrections in the previous paragraph are correct.}

When the active thread and coroutine are the same, the current stack is the thread stack, otherwise it is a coroutine
stack.
% PAB: repeated?
% If it is a thread stack, then an equality check with the stored main
% thread pointer and current thread pointer is enough to tell if the current
% thread is the main thread or not.
However, if the threading library is not linked, the sequential execution is on
the main stack. Hence, the entire check is skipped because the weak-symbol
function is loaded. Therefore, a main thread cancellation is unconditionally
performed.

Regardless of how the stack is chosen, the stop function and parameter are
passed to the forced-unwind function. The general pattern of all three stop
functions is the same: continue unwinding until the end of stack.
%when they
%do there primary work.
For main stack cancellation, the transfer is just a program abort.

For coroutine cancellation, the exception is stored in the coroutine's stack,
and the coroutine context switches to its last resumer. The rest is handled on
the backside of the resume, which check if the resumed coroutine is
cancelled. If cancelled, the exception is retrieved from the resumed coroutine,
and a @CoroutineCancelled@ exception is constructed and loaded with the
cancelled exception. It is then resumed as a regular exception with the default
handler coming from the context of the resumption call.

For thread cancellation, the exception is stored on the thread's main stack and
then context switched to the scheduler. The rest is handled by the thread
joiner. When the join is complete, the joiner checks if the joined thread is
cancelled. If cancelled, the exception is retrieved and the joined thread, and
a @ThreadCancelled@ exception is constructed and loaded with the cancelled
exception. The default handler is passed in as a function pointer. If it is
null (as it is for the auto-generated joins on destructor call), the default is
used, which is a program abort.
%; which gives the required handling on implicate join.