Index: doc/theses/andrew_beach_MMath/existing.tex
===================================================================
--- doc/theses/andrew_beach_MMath/existing.tex	(revision 91571e51fbbd54c3310e740cc2fb3b8da3e53dfa)
+++ doc/theses/andrew_beach_MMath/existing.tex	(revision 1dfe6a6359e3613db1b1b204a4c885cc98c942fe)
@@ -83,6 +83,6 @@
 the the call site.
 
-As an example, even if no function named \codeCFA{do\_once} is not defined
-near the definition of \codeCFA{do\_twice} the following code will work.
+As an example, the following code works even if no function named
+\codeCFA{do_once} is defined near the definition of \codeCFA{do_twice}.
 \begin{lstlisting}
 int quadruple(int x) {
@@ -95,8 +95,8 @@
 \end{lstlisting}
 This is not the recommended way to implement a quadruple function but it
-does work. The complier will deduce that \codeCFA{do\_twice}'s T is an
+does work. The compiler will deduce that \codeCFA{do_twice}'s T is an
 integer from the argument. It will then look for a definition matching the
-assertion which is the \codeCFA{do\_once} defined within the function. That
-function will be passed in as a function pointer to \codeCFA{do\_twice} and
+assertion which is the \codeCFA{do_once} defined within the function. That
+function will be passed in as a function pointer to \codeCFA{do_twice} and
 called within it.
 
@@ -156,5 +156,5 @@
 In \CFA coroutines are created using the \codeCFA{coroutine} keyword which
 works just like \codeCFA{struct} except that the created structure will be
-modified by the compiler to satify the \codeCFA{is\_coroutine} trait.
+modified by the compiler to satisfy the \codeCFA{is_coroutine} trait.
 
 These structures act as the interface between callers and the coroutine,
Index: doc/theses/andrew_beach_MMath/implement.tex
===================================================================
--- doc/theses/andrew_beach_MMath/implement.tex	(revision 1dfe6a6359e3613db1b1b204a4c885cc98c942fe)
+++ doc/theses/andrew_beach_MMath/implement.tex	(revision 1dfe6a6359e3613db1b1b204a4c885cc98c942fe)
@@ -0,0 +1,480 @@
+\chapter{Implementation}
+% Goes over how all the features are implemented.
+
+\section{Virtual System}
+% Virtual table rules. Virtual tables, the pointer to them and the cast.
+The \CFA virtual system only has one public-facing feature: virtual casts.
+However, there is a lot of structure to support it and to provide some other
+features for the standard library.
+
+All of this is accessed through a field inserted at the beginning of every
+virtual type. Currently it is called \codeC{virtual_table}, but it is not
+meant to be accessed by the user. This field is a pointer to the type's
+virtual table instance. It is assigned once during the object's construction
+and left alone after that.
+
+\subsection{Virtual Table Construction}
+For each virtual type a virtual table is constructed. This is both a new type
+and an instance of that type. Other instances of the type could be created,
+but the system does not use them, so this section covers the creation of the
+type and of its single instance.
+
+Creating the single instance is very important: the address of the table
+acts as the unique identifier for the virtual type. Accordingly, the first
+field in every virtual table is the parent's id, a pointer to the parent's
+virtual table instance.
+
+The remaining fields contain the type's virtual members. First come the ones
+present on the parent type, in the same order as in the parent, and then any
+that this type introduces. Members inherited from the parent may have
+slightly modified types, in that references to the dispatched type are
+replaced with the current virtual type. The dispatched type is always taken
+by pointer or reference.
+
+The structure itself is created where the virtual type is created. The name
+of the type is created by mangling the name of the base type. The name of the
+instance is also generated by name mangling.
+
+The fields are initialized automatically.
+The parent field is initialized by getting the type of the parent field and
+using that to calculate the mangled name of the parent's virtual table type.
+There are two special fields that are included like normal fields but have
+special initialization rules: the \codeC{size} field holds the type's size and
+is initialized with a \codeC{sizeof} expression, and the \codeC{align} field
+holds the type's alignment and is initialized with an \codeC{alignof}
+expression. The remaining fields are resolved to a name matching the field's
+name and type using the normal visibility and overload resolution rules of
+the type system.
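+
+As an illustration, the C code below approximates what the generated
+declarations could look like for a hypothetical virtual type \codeC{animal}
+and a child type \codeC{dog}. All of the names are invented for this sketch
+and are not the compiler's mangled names; only the field order (parent id
+first, inherited members next, new members last), the table pointer at the
+start of every object and the \codeC{sizeof} and \codeC{alignof} initializers
+follow the description above.
+
+\begin{lstlisting}
+#include <stdalign.h>
+#include <stddef.h>
+
+struct animal;
+struct dog;
+
+/* Root virtual table type: parent id first, then the virtual members. */
+struct animal_vtable {
+    const struct animal_vtable * parent;    /* NULL for a root type */
+    size_t size;
+    size_t align;
+    const char * (*speak)(const struct animal *);
+};
+
+/* Child table: inherited members in the parent's order, with the
+   dispatched type replaced by the child type, then any new members. */
+struct dog_vtable {
+    const struct animal_vtable * parent;
+    size_t size;
+    size_t align;
+    const char * (*speak)(const struct dog *);
+    int (*fetch)(struct dog *);
+};
+
+/* Every virtual object starts with a pointer to its table. */
+struct animal { const struct animal_vtable * virtual_table; };
+struct dog { const struct dog_vtable * virtual_table; int fetched; };
+
+static const char * animal_speak(const struct animal * a) { (void)a; return "..."; }
+static const char * dog_speak(const struct dog * d) { (void)d; return "woof"; }
+static int dog_fetch(struct dog * d) { return ++d->fetched; }
+
+/* The single instance of each table; its address identifies the type. */
+const struct animal_vtable animal_vtable_instance = {
+    .parent = NULL,
+    .size = sizeof(struct animal),
+    .align = alignof(struct animal),
+    .speak = animal_speak,
+};
+
+const struct dog_vtable dog_vtable_instance = {
+    .parent = &animal_vtable_instance,
+    .size = sizeof(struct dog),
+    .align = alignof(struct dog),
+    .speak = dog_speak,
+    .fetch = dog_fetch,
+};
+\end{lstlisting}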
+
+These operations are split up into several groups depending on where they
+take place, which can vary for monomorphic and polymorphic types. The first
+division is between the declarations and the definitions. Declarations, such
+as a function signature or a structure's name, must always be visible but may
+be repeated, so they go in headers. Definitions, such as function bodies and a
+structure's layout, do not have to be visible on use but must occur exactly
+once, so they go into source files.
+
+The declarations include the virtual type definition and forward declarations
+of the virtual table instance, constructor, message function and
+\codeCFA{get_exception_vtable}. The definitions include the storage and
+initialization of the virtual table instance and the bodies of the three
+functions.
+
+Monomorphic instances put each of these two groups in one place.
+
+Polymorphic instances also split out the core declarations and definitions
+from the per-instance information. The virtual table type and most of the
+functions are polymorphic, so they are all part of the core. The virtual table
+instance and the \codeCFA{get_exception_vtable} function are part of the
+per-instance information.
+
+Coroutines and threads need instances of \codeCFA{CoroutineCancelled} and
+\codeCFA{ThreadCancelled} respectively to use all of their functionality.
+When a new data type is declared with \codeCFA{coroutine} or \codeCFA{thread}
+the forward declaration for the instance is created as well. The definition
+of the virtual table is created at the definition of the main function.
+
+\subsection{Virtual Cast}
+Virtual casts are implemented as a function call that does the check and an
+old C-style cast to do the type conversion. The C-style cast only makes sure
+the generated code is correct, so the rest of this section is about the
+function.
+
+The function is \codeC{__cfa__virtual_cast} and it is implemented in the
+standard library. It takes a pointer to the target type's virtual table and
+the object pointer being cast. The function is very simple: it gets the
+object's virtual table pointer and then checks whether it, or any of its
+ancestors reached through the parent pointers, is the same as the target
+type's virtual table pointer. It does this in a simple loop.
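+
+A minimal C sketch of such a loop is shown below. The names
+\codeC{vtable_header} and \codeC{virtual_cast_sketch} are hypothetical, only
+the parent field of a virtual table is modelled, and returning a null pointer
+when the cast fails is an assumption of the sketch rather than something
+stated above.
+
+\begin{lstlisting}
+#include <stddef.h>
+
+/* Minimal stand-in: every virtual table begins with its parent's id. */
+struct vtable_header {
+    const struct vtable_header * parent;
+};
+
+/* Returns the object if its type is the target type or one of its
+   descendants, and NULL otherwise (the failure value is assumed). */
+void * virtual_cast_sketch(const struct vtable_header * target,
+                           const struct vtable_header * const * object) {
+    if (object == NULL) return NULL;
+    for (const struct vtable_header * vt = *object; vt != NULL; vt = vt->parent) {
+        if (vt == target) return (void *)object;
+    }
+    return NULL;
+}
+
+/* A three-level hierarchy: root <- middle <- leaf. */
+const struct vtable_header root_vt = { NULL };
+const struct vtable_header middle_vt = { &root_vt };
+const struct vtable_header leaf_vt = { &middle_vt };
+
+/* Objects store their table pointer as the first field. */
+struct leaf_object { const struct vtable_header * virtual_table; int data; };
+struct middle_object { const struct vtable_header * virtual_table; };
+\end{lstlisting}
+
+In generated code the target argument would be the address of the target
+type's table instance, and the outer C-style cast then only fixes the static
+type of the result.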
+
+The generated code for a virtual cast works as follows.
+There is a forward declaration of \codeC{__cfa__virtual_cast} in every \CFA
+file, so it can always be called. The object argument is the expression being
+cast, so it is simply placed in the argument list.
+
+To build the target type parameter the compiler creates a mapping from
+concrete type name (so for polymorphic types the parameters are filled in)
+to virtual table address. Every virtual table declaration is added to this
+table; repeats are ignored unless they have conflicting definitions.
+This does mean the declarations have to be in scope, but they should usually
+be introduced as part of the type definition.
+
+\section{Exceptions}
+% Anything about exception construction.
+
+\section{Unwinding}
+% Adapt the unwind chapter, just describe the sections of libunwind used.
+% Mention that termination and cancellation use it. Maybe go into why
+% resumption doesn't as well.
+
+Many modern languages work with an internal stack onto which functions push
+their local data and from which they pop it. Stack unwinding removes large
+sections of the stack, often across functions.
+
+At a very basic level this can be done with \codeC{setjmp} \& \codeC{longjmp}
+which simply move the top of the stack, discarding everything on the stack
+above a certain point. However this ignores all the clean-up code that should
+be run when certain sections of the stack are removed (for \CFA these are from
+destructors and finally clauses) and also requires that the point to which the
+stack is being unwound is known ahead of time. libunwind is used to address
+both of these problems.
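+
+The sketch below illustrates both problems in plain C. The clean-up calls
+placed after \codeC{inner} and \codeC{middle} are skipped because
+\codeC{longjmp} discards those frames outright, and the jump target has to be
+recorded with \codeC{setjmp} before anything is raised. The function names are
+hypothetical and \codeC{cleanup} only stands in for a destructor or finally
+clause.
+
+\begin{lstlisting}
+#include <setjmp.h>
+
+static jmp_buf handler;
+static int cleanups_run = 0;
+
+/* Stands in for clean-up code such as a destructor or finally clause. */
+static void cleanup(void) { ++cleanups_run; }
+
+static void inner(void) {
+    longjmp(handler, 1);   /* jump straight back to the setjmp point */
+    cleanup();             /* never reached */
+}
+
+static void middle(void) {
+    inner();
+    cleanup();             /* also skipped: this frame is discarded */
+}
+
+/* Returns how many clean-ups ran on the way back to the setjmp point. */
+int unwind_with_longjmp(void) {
+    cleanups_run = 0;
+    if (setjmp(handler) == 0) {
+        middle();          /* the target was recorded before this call */
+    }
+    return cleanups_run;
+}
+\end{lstlisting}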
+
+Libunwind, provided in \texttt{unwind.h} on most platforms, is a C library
+that provides \CPP-style stack unwinding. Its operation is divided into two
+phases. The search phase -- phase 1 -- is used to scan the stack and decide
+where the unwinding will stop; this allows for a dynamic target. The clean-up
+phase -- phase 2 -- does the actual unwinding and also runs any clean-up code
+as it goes.
+
+To use libunwind, each function must have a personality function and an
+LSDA (Language Specific Data Area). Libunwind itself does very little: it
+simply moves down the stack from function to function. Most of the actions are
+implemented by the personality function, which libunwind calls on every
+function. Since the personality function is shared across many functions, or
+even every function in a language, it needs more information. This is
+provided by the LSDA, which holds the information unique to each function.
+
+Theoretically the LSDA can contain anything, but conventionally it is a table
+with entries representing areas of the function and what has to be done there
+during unwinding. These areas are described in terms of where the instruction
+pointer is. If the current value of the instruction pointer is between the two
+values representing the beginning and end of a region, then execution is
+currently inside that region. These regions are used to mark out try blocks
+and the scopes of objects with destructors to run.
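+
+A minimal sketch of the lookup a personality function performs on such a
+table is given below, assuming the entries have already been decoded into a
+plain array of offsets from the start of the function. The \codeC{call_site}
+structure and \codeC{find_call_site} are hypothetical: the LSDA that GCC emits
+stores these entries in a compact encoded form and also records an action for
+each region, neither of which is modelled here.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+
+/* Simplified, already-decoded region entry. */
+struct call_site {
+    uintptr_t start;        /* offset of the region from the function start */
+    uintptr_t length;       /* size of the region in bytes */
+    uintptr_t landing_pad;  /* where clean-up or handling code starts */
+};
+
+/* Finds the region containing ip (an offset from the function start).
+   Returns NULL if ip lies outside every region: nothing to do here. */
+const struct call_site * find_call_site(const struct call_site * table,
+                                        size_t count, uintptr_t ip) {
+    for (size_t i = 0; i < count; ++i) {
+        if (table[i].start <= ip && ip < table[i].start + table[i].length)
+            return &table[i];
+    }
+    return NULL;
+}
+\end{lstlisting}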
+
+GCC will generate an LSDA and attach its personality function with the
+\texttt{-fexceptions} flag. However, this only handles the cleanup attribute.
+This attribute is used on a variable and specifies a function that should be
+run when the variable goes out of scope. The function is passed a pointer to
+the object as well, so it can be used to mimic destructors. It cannot,
+however, be used to mimic try statements.
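+
+The following sketch shows the attribute standing in for a destructor. It
+assumes GCC (Clang also accepts the attribute) and uses hypothetical names.
+The clean-up function runs on every exit from the scope, including the early
+\codeC{return}; with \texttt{-fexceptions}, GCC also runs it when an exception
+unwinds through the frame.
+
+\begin{lstlisting}
+/* Counts how many "destructor" calls have happened. */
+static int destroyed = 0;
+
+struct resource { int id; };
+
+/* Called by the compiler with a pointer to the variable on scope exit. */
+static void resource_dtor(struct resource * r) {
+    (void)r;
+    ++destroyed;
+}
+
+/* The return value is computed before the clean-up function runs. */
+int use_resource(int early) {
+    struct resource r __attribute__((cleanup(resource_dtor))) = { 7 };
+    if (early) return r.id;
+    return r.id + 1;
+}
+\end{lstlisting}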
+
+\subsection{Implementing Personality Functions}
+Personality functions have a complex interface specified by libunwind.
+This section will cover some of the important parts of that interface.
+
+\begin{lstlisting}
+typedef _Unwind_Reason_Code (*_Unwind_Personality_Fn)(
+    int version,
+    _Unwind_Action action,
+    _Unwind_Exception_Class exception_class,
+    _Unwind_Exception * exception,
+    struct _Unwind_Context * context);
+\end{lstlisting}
+
+The return value, the reason code, is an enumeration of possible messages
+that can be passed to several places in libunwind. It includes a number of
+messages for special cases (some of which should never be used by the
+personality function) and error codes, but unless otherwise noted the
+personality function should always return \codeC{_URC_CONTINUE_UNWIND}.
+
+The \codeC{version} argument is the version of the implementation that is
+calling the personality function. At this point it appears to always be 1 and
+it will likely stay that way until the API is revised.
+
+The \codeC{action} argument is a set of flags that tell the personality
+function when it is being called and what it must do on this invocation.
+The flags are as follows:
+\begin{itemize}
+\item\codeC{_UA_SEARCH_PHASE}: This flag is set whenever the personality
+function is called during the search phase. The personality function should
+decide if unwinding will stop in this function or not. If it does then the
+personality function should return \codeC{_URC_HANDLER_FOUND}.
+\item\codeC{_UA_CLEANUP_PHASE}: This flag is set whenever the personality
+function is called during the cleanup phase. If no other flags are set this
+means the entire frame will be unwound and all cleanup code should be run.
+\item\codeC{_UA_HANDLER_FRAME}: This flag is set during the cleanup phase
+on the function frame that found the handler. The personality function must
+prepare to return to normal code execution and return
+\codeC{_URC_INSTALL_CONTEXT}.
+\item\codeC{_UA_FORCE_UNWIND}: This flag is set if the personality function
+is called through a forced unwind call. Forced unwind only performs the
+cleanup phase and uses a different means to decide when to stop. See its
+section below.
+\end{itemize}
+
+The \codeC{exception_class} argument is a copy of the \codeC{exception}'s
+\codeC{exception_class} field.
+
+The \codeC{exception} argument is a pointer to the user provided storage
+object. It has two public fields: the exception class, which is just a
+number that identifies the exception handling mechanism that created it, and
+the clean-up function. The clean-up function is called if the exception needs
+to be destroyed, for example through \codeC{_Unwind_DeleteException}.
+
+The \codeC{context} argument is a pointer to an opaque type. This is passed
+to the many helper functions that can be called inside the personality
+function.
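+
+A skeleton of how a personality function might dispatch on these arguments
+is sketched below, assuming GCC's or LLVM's \texttt{unwind.h}. The names
+\codeC{sketch_personality} and \codeC{frame_has_handler} are hypothetical,
+rejecting any version other than 1 is a choice made for the sketch, and the
+LSDA parsing and register set-up a real personality function performs are only
+indicated in comments.
+
+\begin{lstlisting}
+#include <unwind.h>
+
+/* Hypothetical check for a matching handler in this frame. A real
+   personality function would read the LSDA, for example through
+   _Unwind_GetLanguageSpecificData(context). */
+static int frame_has_handler(struct _Unwind_Context * context) {
+    (void)context;
+    return 0;
+}
+
+_Unwind_Reason_Code sketch_personality(
+        int version,
+        _Unwind_Action action,
+        _Unwind_Exception_Class exception_class,
+        struct _Unwind_Exception * exception,
+        struct _Unwind_Context * context) {
+    (void)exception_class;
+    (void)exception;
+    if (version != 1) return _URC_FATAL_PHASE1_ERROR;
+
+    if (action & _UA_SEARCH_PHASE) {
+        /* Phase 1: only decide whether unwinding stops in this frame. */
+        return frame_has_handler(context) ? _URC_HANDLER_FOUND
+                                          : _URC_CONTINUE_UNWIND;
+    }
+    if (action & _UA_CLEANUP_PHASE) {
+        if (action & _UA_HANDLER_FRAME) {
+            /* Would set registers with _Unwind_SetGR and _Unwind_SetIP
+               here before returning to normal code. */
+            return _URC_INSTALL_CONTEXT;
+        }
+        /* Would run this frame's clean-up code here. */
+        return _URC_CONTINUE_UNWIND;
+    }
+    return _URC_FATAL_PHASE1_ERROR;
+}
+\end{lstlisting}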
+
+\subsection{Raise Exception}
+This could be considered the central function of libunwind. It performs the
+two-phase unwinding the library is built around, and most of the rest of the
+interface of libunwind is there to support it. Its signature is as follows:
+
+\begin{lstlisting}
+_Unwind_Reason_Code _Unwind_RaiseException(_Unwind_Exception *);
+\end{lstlisting}
+
+When called the function begins the search phase, calling the personality
+function of the most recent stack frame. It will continue to call personality
+functions traversing the stack new-to-old until a function finds a handler or
+the end of the stack is reached. In the latter case raise exception will
+return with \codeC{_URC_END_OF_STACK}.
+
+Once a handler has been found, raise exception continues on to the cleanup
+phase. Once again it will call the personality functions of each stack frame
+from newest to oldest. This pass will stop at the stack frame that found the
+handler in the search phase; if that personality function does not install
+the handler, it is an error.
+
+If an error is encountered, raise exception will return either
+\codeC{_URC_FATAL_PHASE1_ERROR} or \codeC{_URC_FATAL_PHASE2_ERROR} depending
+on when the error occurred.
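+
+The sketch below raises an exception that no frame claims, so the search
+phase runs off the end of the stack and raise exception returns. It assumes a
+platform where C code gets unwind tables, as is typical for GCC on x86-64
+Linux; the exception class tag and the function names are hypothetical.
+
+\begin{lstlisting}
+#include <string.h>
+#include <unwind.h>
+
+/* Called if the exception object has to be destroyed, for example
+   through _Unwind_DeleteException. Nothing to free in this sketch. */
+static void sketch_exception_cleanup(_Unwind_Reason_Code reason,
+                                     struct _Unwind_Exception * exception) {
+    (void)reason;
+    (void)exception;
+}
+
+/* Raises an exception that no personality function claims, so phase 1
+   reaches the end of the stack and the call returns to its caller. */
+_Unwind_Reason_Code raise_unhandled(void) {
+    struct _Unwind_Exception exception;
+    memset(&exception, 0, sizeof(exception));
+    /* Arbitrary 8-byte tag naming this (hypothetical) mechanism. */
+    memcpy(&exception.exception_class, "SKETCH00", 8);
+    exception.exception_cleanup = sketch_exception_cleanup;
+    return _Unwind_RaiseException(&exception);
+}
+\end{lstlisting}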
+
+\subsection{Forced Unwind}
+This is the second big function in libunwind. It also unwinds a stack but it
+does not use the search phase. Instead another function, the stop function,
+is used to decide when to stop.
+
+\begin{lstlisting}
+_Unwind_Reason_Code _Unwind_ForcedUnwind(
+    _Unwind_Exception *, _Unwind_Stop_Fn, void *);
+\end{lstlisting}
+
+The exception is the same as the one passed to raise exception. The extra
+arguments are the stop function and the stop parameter. The stop function has
+a similar interface to a personality function, except it is also passed the
+stop parameter.
+
+\begin{lstlisting}
+typedef _Unwind_Reason_Code (*_Unwind_Stop_Fn)(
+    int version,
+    _Unwind_Action action,
+    _Unwind_Exception_Class exception_class,
+    _Unwind_Exception * exception,
+    struct _Unwind_Context * context,
+    void * stop_parameter);
+\end{lstlisting}
+
+The stop function is called at every stack frame before the personality
+function is called, and then once more after all frames of the stack have
+been unwound.
+
+Each time it is called the stop function should return \codeC{_URC_NO_REASON}
+or transfer control directly to other code outside of libunwind. The
+framework does not provide any assistance here.
+
+Its arguments are the same as those of the paired personality function.
+The actions \codeC{_UA_CLEANUP_PHASE} and \codeC{_UA_FORCE_UNWIND} are always
+set when it is called. By the official standard that is all, but both GCC and
+Clang add an extra action on the last call at the end of the stack:
+\codeC{_UA_END_OF_STACK}.
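+
+A minimal sketch of a stop function following this protocol is shown below.
+It lets every frame unwind and, when called with \codeC{_UA_END_OF_STACK},
+leaves libunwind with \codeC{longjmp} to a buffer passed as the stop
+parameter. The names are hypothetical and the sketch assumes GCC's or LLVM's
+\texttt{unwind.h}.
+
+\begin{lstlisting}
+#include <setjmp.h>
+#include <string.h>
+#include <unwind.h>
+
+static int frames_visited;
+
+/* Let every frame unwind; at the end of the stack, jump out of
+   libunwind to the jmp_buf passed as the stop parameter. */
+static _Unwind_Reason_Code sketch_stop(
+        int version, _Unwind_Action action,
+        _Unwind_Exception_Class exception_class,
+        struct _Unwind_Exception * exception,
+        struct _Unwind_Context * context, void * stop_parameter) {
+    (void)version; (void)exception_class; (void)exception; (void)context;
+    if (action & _UA_END_OF_STACK) {
+        longjmp(*(jmp_buf *)stop_parameter, 1);
+    }
+    ++frames_visited;
+    return _URC_NO_REASON;
+}
+
+static void no_cleanup(_Unwind_Reason_Code reason,
+                       struct _Unwind_Exception * exception) {
+    (void)reason; (void)exception;
+}
+
+/* Returns how many frames the stop function saw before the end. */
+int forced_unwind_demo(void) {
+    static jmp_buf done;
+    static struct _Unwind_Exception exception;
+    memset(&exception, 0, sizeof(exception));
+    exception.exception_cleanup = no_cleanup;
+    frames_visited = 0;
+    if (setjmp(done) == 0) {
+        _Unwind_ForcedUnwind(&exception, sketch_stop, &done);
+        return -1;   /* only reached if the forced unwind failed */
+    }
+    return frames_visited;
+}
+\end{lstlisting}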
+
+\section{Exception Context}
+% Should I have another independent section?
+% There are only two things in it, top_resume and current_exception. How it is
+% stored changes depending on wheither or not the thread-library is linked.
+
+The exception context is a piece of global storage used to maintain data
+across different exception operations and to communicate between different
+components.
+
+Each stack has its own exception context. In a purely sequential program, using
+only core Cforall, there is only one stack and the context is global. However,
+if the library \texttt{libcfathread} is linked then there can be multiple
+stacks, so they will each need their own.
+
+To handle this, code always gets the exception context from the function
+\codeC{this_exception_context}. The main exception handling code is in
+\texttt{libcfa} and that library also defines the function as a weak symbol
+so it acts as a default. Meanwhile in \texttt{libcfathread} the function is
+defined as a strong symbol that replaces it when the libraries are linked
+together.
+
+The version of the function defined in \texttt{libcfa} is very simple. It
+returns a pointer to a global static variable. With only one stack this
+global instance is associated with the only stack.
+
+The version of the function defined in \texttt{libcfathread} has to handle
+more as there are multiple stacks. The exception context is included as
+part of the per-stack data stored as part of coroutines. In the cold data
+section, stored at the base of each stack, is the exception context for that
+stack. The \codeC{this_exception_context} function uses the concurrency library to get
+the current coroutine and through it the cold data section and the exception
+context.
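+
+The weak default in \texttt{libcfa} could be sketched in C as follows. The
+function name comes from the text; the structure's fields and their types are
+assumptions made for illustration.
+
+\begin{lstlisting}
+#include <stddef.h>
+
+/* Hypothetical layout of the per-stack exception context. */
+struct exception_context {
+    void * top_resume;          /* head of the resumption handler list */
+    void * current_exception;   /* exception currently being handled */
+};
+
+/* Default definition, marked weak so that a strong definition elsewhere
+   replaces it when the libraries are linked together. */
+__attribute__((weak))
+struct exception_context * this_exception_context(void) {
+    static struct exception_context shared_context = { NULL, NULL };
+    return &shared_context;
+}
+\end{lstlisting}
+
+A second definition of \codeC{this_exception_context} without the weak
+attribute, in another library linked into the program, would take its place
+at link time.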
+
+\section{Termination}
+% Memory management & extra information, the custom function used to implement
+% catches. Talk about GCC nested functions.
+
+Termination exceptions use libunwind quite heavily because it matches the
+intended use of \CPP exceptions very closely. The main complication is that
+since the \CFA compiler works by translating to C code it cannot generate the
+assembly to form the LSDA for try blocks or destructors.
+
+\subsection{Memory Management}
+The first step of termination is to copy the exception into memory managed by
+the exception system. Currently the system just uses malloc, without reserved
+memory or any ``small allocation'' optimizations. The exception handling
+mechanism manages memory for the exception as well as memory for libunwind
+and the system's own per-exception storage.
+
+Exceptions are stored in variable-sized blocks. The first component is a
+fixed-size data structure that contains the information for libunwind and the
+exception system. The second component is a blob of memory that is big enough
+to store the exception. Macros with pointer arithmetic and type casts are
+used to move between the components or go from the embedded
+\codeC{_Unwind_Exception} to the entire node.
+
+All of these nodes are strung together in a linked list. There is one linked
+list per stack, with the head stored in the exception context. Within each
+linked list the most recently thrown exception is at the head and the older
+exceptions are further down the list. This list format allows exceptions to be
+thrown while a different exception is being handled. Only the exception at the
+head of the list is currently being handled; the others wait for the
+exceptions before them to be removed.
+
+Several virtual members in the exception's virtual table are used here. The
+size of the exception, the copy function and the free function are all in the
+virtual table, so they are decided per exception type. The size and copy
+function are used right away when the exception is copied into managed
+memory. After the exception is handled the free function is used to clean up
+the exception and then the entire node is passed to free.
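+
+The node layout and the pointer arithmetic between its parts can be sketched
+in C as below. The names are hypothetical, \codeC{memcpy} stands in for the
+per-type copy function, and the sketch assumes, as the current system does,
+that plain \codeC{malloc} is used and gives suitable alignment.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unwind.h>
+
+/* Fixed-size header placed in front of every managed exception. */
+struct exception_node {
+    struct _Unwind_Exception unwind;   /* storage handed to libunwind */
+    struct exception_node * next;      /* older exception on this stack */
+};
+
+/* Moving between the parts of a node. */
+#define NODE_TO_EXCEPTION(node) ((void *)((node) + 1))
+#define UNWIND_TO_NODE(ptr) \
+    ((struct exception_node *)((char *)(ptr) - offsetof(struct exception_node, unwind)))
+
+/* Copies an exception of the given size into a new node and pushes it
+   onto the list whose head is *head. Returns NULL if malloc fails. */
+struct exception_node * push_exception(struct exception_node ** head,
+                                       const void * exception, size_t size) {
+    struct exception_node * node = malloc(sizeof(*node) + size);
+    if (node == NULL) return NULL;
+    memset(&node->unwind, 0, sizeof(node->unwind));
+    memcpy(NODE_TO_EXCEPTION(node), exception, size);
+    node->next = *head;
+    *head = node;
+    return node;
+}
+
+/* Removes the most recent exception once it has been handled. */
+void pop_exception(struct exception_node ** head) {
+    struct exception_node * node = *head;
+    *head = node->next;
+    free(node);
+}
+\end{lstlisting}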
+
+\subsection{Try Statements \& Catch Clauses}
+The try statements with termination handlers have a fairly complex conversion
+to compensate for the lack of assembly generation. Libunwind requires an LSDA
+(Language Specific Data Area) and personality function for a function to
+unwind across it. The LSDA in particular is hard to generate at the level of
+C, which is what the \CFA compiler outputs, so a work-around is used.
+
+This work-around is a function called \codeC{__cfaehm_try_terminate} in the
+standard library. The contents of a try block and the termination handlers
+are converted into functions. These are then passed to the try terminate
+function and it calls them. This puts the try statements in their own
+functions so that no function has to deal with both termination handlers and
+destructors.
+
+This function has some custom embedded assembly that defines its personality
+function and LSDA. This is hand-coded in C, which is why there is only one
+version of it; the compiler has no capability to generate it. The personality
+function is structured so that it may be expanded, but really it only handles
+this one function. Notably it does not handle any destructors, so the function
+is constructed so that it does not need to run any.
+
+The three functions passed to try terminate are:
+\begin{itemize}
+\item The try function: This function is the try block, all the code inside
+the try block is placed inside the try function. It takes no parameters and
+has no return value. This function is called during regular execution to run
+the try block.
+\item The match function: This function decides if this try statement should
+handle any given termination exception. It takes a pointer to the exception
+and returns 0 if the exception is not handled here. Otherwise the return value
+is the id of the handler that should handle the exception. It is called
+during the search phase.
+It is constructed from the conditional part of each handler. It runs each
+check in turn, first checking to see if the object is of the handler's
+exception type.
+\item The catch function: This function handles the exception. It takes a
+pointer to the exception and the handler's id and returns nothing. It is
+called after the clean-up phase.
+It is constructed by stitching together the bodies of each handler, with the
+handler's id selecting which body runs.
+\end{itemize}
+All three are created with GCC nested functions. GCC nested functions can be
+used to create closures, functions that can refer to the state of other
+functions on the stack. This allows the functions to refer to the enclosing
+function and all the variables in scope.
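+
+The shape of this translation can be sketched with GCC nested functions as
+below; it requires GCC. \codeC{try_terminate_sketch} is a hypothetical
+stand-in for \codeC{__cfaehm_try_terminate} that does no unwinding at all: it
+runs the try function and then treats its \codeC{raised} argument, if not
+null, as an exception that escaped it, calling the match function in place of
+the search phase and the catch function afterwards. The handler conditions and
+all other names are invented for the example.
+
+\begin{lstlisting}
+#include <stddef.h>
+
+struct demo_exception { int code; };
+
+/* Signatures of the three generated functions described above. */
+typedef void (*try_fn)(void);
+typedef int (*match_fn)(struct demo_exception *);
+typedef void (*catch_fn)(struct demo_exception *, int);
+
+/* No unwinding here: a non-NULL raised is treated as an exception that
+   escaped the try function. A match result of 0 means "not handled
+   here"; the real system would keep unwinding, this sketch returns. */
+static void try_terminate_sketch(try_fn body, match_fn match,
+                                 catch_fn handle,
+                                 struct demo_exception * raised) {
+    body();
+    if (raised == NULL) return;
+    int id = match(raised);
+    if (id != 0) handle(raised, id);
+}
+
+/* Roughly how a try statement with two handlers might be lowered.
+   The nested functions read and write result in the enclosing frame. */
+int lowered_try(struct demo_exception * raised) {
+    int result = 0;
+    void try_block(void) { result = 1; }
+    int match(struct demo_exception * e) {
+        if (e->code == 42) return 1;   /* first handler's check */
+        if (e->code > 100) return 2;   /* second handler's check */
+        return 0;                      /* no handler here */
+    }
+    void catch_block(struct demo_exception * e, int id) {
+        switch (id) {
+        case 1: result = 100; break;      /* first handler's body */
+        case 2: result = e->code; break;  /* second handler's body */
+        }
+    }
+    try_terminate_sketch(try_block, match, catch_block, raised);
+    return result;
+}
+\end{lstlisting}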
+
+These nested functions and all other functions besides
+\codeC{__cfaehm_try_terminate} in \CFA use the GCC personality function and
+the \texttt{-fexceptions} flag to generate the LSDA. This allows destructors
+to be implemented with the cleanup attribute.
+
+\section{Resumption}
+% The stack-local data, the linked list of nodes.
+
+Resumption uses a list of nodes for its stack traversal. The head of the list
+is stored in the exception context. The nodes in the list just have a pointer
+to the next node and a pointer to the handler function.
+
+On a resumption throw, this list is traversed. At each node the
+handler function is called and is passed the exception by pointer. It returns
+true if the exception was handled and false otherwise.
+
+The handler function does both the matching and catching. It tries the
+condition of each \codeCFA{catchResume} clause in order, top to bottom, until
+it finds a handler that matches. If no handler matches then the function
+returns false. Otherwise the matching handler is run; if it completes
+successfully the function returns true. Rethrows, through the
+\codeCFA{throwResume;} statement, cause the function to return false.
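+
+The list and its traversal can be sketched in C as follows. The node layout
+follows the description above, while the names, the \codeC{demo_exception}
+type and the example handler function (for a single \codeCFA{catchResume}
+clause with the condition \codeC{code == 1}) are hypothetical. What happens
+when no handler accepts the exception is outside this sketch.
+
+\begin{lstlisting}
+#include <stdbool.h>
+#include <stddef.h>
+
+struct demo_exception { int code; };
+
+/* One node per active try statement with resumption handlers. */
+struct resume_node {
+    struct resume_node * next;                  /* older try statement */
+    bool (*handler)(struct demo_exception *);   /* matches and handles */
+};
+
+/* Head of the list; the text keeps it in the exception context. */
+static struct resume_node * top_resume = NULL;
+
+/* Offers the exception to each handler function, newest first, and
+   stops at the first one that reports it was handled. */
+bool throw_resume_sketch(struct demo_exception * e) {
+    for (struct resume_node * node = top_resume; node != NULL; node = node->next) {
+        if (node->handler(e)) return true;
+    }
+    return false;
+}
+
+/* Handler function for a try statement with a single catchResume
+   clause whose condition is code == 1. */
+static bool handles_code_1(struct demo_exception * e) {
+    if (e->code != 1) return false;   /* condition failed: not handled */
+    e->code = 0;                      /* stands in for the handler body */
+    return true;
+}
+\end{lstlisting}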
+
+\subsection{Libunwind Compatibility}
+Resumption does not use libunwind for two simple reasons. The first is that
+it does not have to unwind anything, so it would never need to use the
+clean-up phase. Still, the search phase could be used to make entering and
+exiting a try statement with resumption handlers free, as it is for
+termination handlers, with the same trade-off in the cost of the throw. This
+is where the second reason comes in: there is no way to return from a search
+without installing a handler or raising an error.
+
+Although work-arounds could be created, none seemed to be worth it for the
+prototype. This implementation has no difference in behaviour and is much
+simpler.
+% Seriously, just compare the size of the two chapters and then consider
+% that unwind is required knowledge for that chapter.
+
+\section{Finally}
+% Uses destructors and GCC nested functions.
+Finally clauses are a simple decomposition into some of the existing features.
+The code in the block is placed into a GCC nested function with a unique name,
+no arguments or return values. This nested function is then set as the
+clean-up function of an empty object that is declared at the beginning of a
+block placed around the contents of the try statement.
+
+The rest is handled by GCC. The try block and all handlers are inside the
+block. When they are complete control exits the block and the empty object
+is cleaned up, which runs the function that contains the finally code.
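+
+Assuming GCC, the decomposition can be sketched as follows; the names are
+hypothetical. GCC requires a clean-up function to take a pointer to the
+variable, so the nested function here has one unused parameter, and a
+\codeC{char} stands in for the empty object because empty structures are only
+a GNU extension in C.
+
+\begin{lstlisting}
+/* Counts how many times the finally code has run. */
+static int finally_runs = 0;
+
+/* Roughly how try { ... } finally { ++finally_runs; } could be lowered.
+   Returns 1 on the early path and 2 otherwise. */
+int with_finally(int early) {
+    {   /* block placed around the whole try statement */
+        void finally_block(char * unused) {   /* the finally code */
+            (void)unused;
+            ++finally_runs;
+        }
+        char guard __attribute__((cleanup(finally_block))) = 0;
+        /* The try block and any handlers go here. */
+        if (early) return 1;
+    }
+    return 2;
+}
+\end{lstlisting}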
+
+\section{Cancellation}
+% Stack selections, the three internal unwind functions.
+
+Cancellation also uses libunwind to do its stack traversal and unwinding,
+however it uses a different primary function \codeC{_Unwind_ForcedUnwind}.
+Details of its interface can be found in the unwind section.
+
+The first step of cancellation is to find which stack was cancelled and which
+type of stack it is. Luckily the threads library stores the main thread
+pointer and the current thread pointer, and every thread stores a pointer to
+its main coroutine and the coroutine it is currently executing.
+
+So if the current thread's main and current coroutine do not match, it is
+a coroutine cancellation. Otherwise, if the main and current thread do not
+match, it is a thread cancellation. Otherwise it is a main thread
+cancellation.
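+
+These rules can be written as a small C function. The structures are
+hypothetical, minimal views of the bookkeeping described above, not the
+threading library's actual types.
+
+\begin{lstlisting}
+/* Hypothetical, minimal views of the threading library's bookkeeping. */
+struct coroutine { int unused; };
+struct thread {
+    struct coroutine * main_coroutine;     /* the thread's own stack */
+    struct coroutine * current_coroutine;  /* stack currently running */
+};
+
+enum stack_kind { MAIN_STACK, THREAD_STACK, COROUTINE_STACK };
+
+/* Classifies the cancelled stack using the rules described above. */
+enum stack_kind classify_cancellation(const struct thread * main_thread,
+                                      const struct thread * current_thread) {
+    if (current_thread->main_coroutine != current_thread->current_coroutine)
+        return COROUTINE_STACK;
+    if (main_thread != current_thread)
+        return THREAD_STACK;
+    return MAIN_STACK;
+}
+\end{lstlisting}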
+
+However, if the threading library is not linked then execution must be on the
+main stack, as that is the only one that exists. So the entire check is skipped
+using the linker and weak symbols. Instead the main thread cancellation is
+unconditionally performed.
+
+Regardless of how they are chosen, afterwards the stop function and the stop
+parameter are passed to the forced unwind function. The general pattern of all
+three stop functions is the same: they continue unwinding until the end of
+the stack, where they do their primary work.
+
+Main stack cancellation is very simple. The ``transfer'' is just an abort;
+the program stops executing.
+
+The coroutine cancellation stores the exception on the coroutine and then
+does a coroutine context switch. The rest is handled inside resume. Every time
+control returns from a resumed coroutine there is a check to see if it was
+cancelled. If it was, the exception is retrieved and the CoroutineCancelled
+exception is constructed and loaded. It is then thrown as a regular exception
+with the default handler coming from the context of the resumption call.
+
+The thread cancellation stores the exception on the thread's main stack and
+then returns to the scheduler. The rest is handled by the joiner. The wait
+for the joined thread to finish works the same, but after that it checks
+to see if there was a cancellation. If there was, the exception is retrieved
+and the ThreadCancelled exception is constructed. The default handler is
+passed in as a function pointer. If it is null (as it is for the
+auto-generated joins on destructor call), a default is used that simply
+calls abort, which gives the required handling on implicit join.
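+
+The fallback to the default handler is sketched below with hypothetical
+types. For brevity the sketch calls the handler with the stored exception
+directly, where the real code constructs a ThreadCancelled exception and
+raises it with that handler as its default.
+
+\begin{lstlisting}
+#include <stdlib.h>
+
+struct demo_exception { int code; };
+
+/* Hypothetical view of what a finished thread leaves for its joiner. */
+struct joined_thread {
+    struct demo_exception * cancellation;   /* NULL if it exited normally */
+};
+
+typedef void (*cancel_handler)(struct demo_exception *);
+
+/* Used when no handler is given, as on an implicit join. */
+static void default_cancel_handler(struct demo_exception * e) {
+    (void)e;
+    abort();
+}
+
+/* Called after the (omitted) wait for the thread to finish. */
+void join_sketch(struct joined_thread * t, cancel_handler handler) {
+    if (t->cancellation == NULL) return;
+    if (handler == NULL) handler = default_cancel_handler;
+    handler(t->cancellation);
+}
+
+/* Example handler that records the code of the cancelled exception. */
+static int last_cancel_code = 0;
+static void record_cancel(struct demo_exception * e) {
+    last_cancel_code = e->code;
+}
+\end{lstlisting}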