# Changeset d95969a for doc

Timestamp:
Jan 25, 2021, 3:45:42 PM (6 months ago)
Branches:
arm-eh, jacob/cs343-translation, master
Children:
c292244
Parents:
b6a8b31 (diff), 7158202 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc
Files:
11 edited


• ## doc/theses/andrew_beach_MMath/existing.tex

\chapter{\texorpdfstring{\CFA Existing Features}{Cforall Existing Features}}

\CFA (C-for-all)~\cite{Cforall} is an open-source project extending ISO C with modern safety and productivity features, while still ensuring backwards compatibility with C and its programmers. \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm (non-object-oriented), and these features can be added incrementally to an existing C code-base, allowing programmers to learn \CFA on an as-needed basis. Only those \CFA features pertinent to this thesis are discussed. Many of the \CFA syntactic and semantic features used in the thesis should be fairly obvious to the reader.

\section{\texorpdfstring{Overloading and \lstinline|extern|}{Overloading and extern}}
\CFA has extensive overloading, allowing multiple definitions of the same name to be defined~\cite{Moss18}.
\begin{cfa}
char i; int i; double i;              $\C[3.75in]{// variable overload}$
int f(); double f();                  $\C{// return overload}$
void g( int ); void g( double );      $\C{// parameter overload}\CRT$
\end{cfa}
This feature requires name mangling so the assembly symbols are unique for different overloads. For compatibility with names in C, there is also a syntax to disable name mangling. These unmangled names cannot be overloaded but act as the interface between C and \CFA code. The syntax for disabling/enabling mangling is:
\begin{cfa}
// name mangling
int i; // _X1ii_1
@extern "C"@ {  // no name mangling
	int j; // j
	@extern "Cforall"@ {  // name mangling
		int k; // _X1ki_1
	} // no name mangling
} // name mangling
\end{cfa}
Both forms of @extern@ affect all the declarations within their nested lexical scope and transition back to the previous mangling state when the lexical scope ends.

\section{Reference Type}
\CFA adds a rebindable reference type to C, but more expressive than the \CC reference. Multi-level references are allowed and act like auto-dereferenced pointers using the ampersand (@&@) instead of the pointer asterisk (@*@). \CFA references may also be mutable or non-mutable.
If mutable, a reference variable may be assigned to using the address-of operator (@&@), which converts the reference to a pointer.
\begin{cfa}
int i, j;
int @&@ ri = i, @&&@ rri = ri;
rri = 3;      // auto-dereference assign to i
@&@ri = @&@j; // rebindable
ri = 5;       // assign to j
\end{cfa}

\section{Constructors and Destructors}
Both constructors and destructors are operators, which means they are functions with special operator names rather than type names as in \CC. The special operator names may be used to call the functions explicitly (not allowed in \CC for constructors). In general, operator names in \CFA are constructed by bracketing an operator token with @?@, which indicates where the arguments go. For example, infixed multiplication is @?*?@ while prefix dereference is @*?@. This syntax makes it easy to tell the difference between prefix operations (such as @++?@) and postfix operations (@?++@). The special name for a constructor is @?{}@, which comes from the initialization syntax in C. The special name for a destructor is @^?{}@, where the @^@ has no special meaning.
% I don't like the \^{} symbol but $^\wedge$ isn't better.
\begin{cfa}
struct T { ... };
void @?{}@( T & this, ... ) { ... }   // constructor
void @^?{}@( T & this ) { ... }       // destructor
{
	T s = @{@ ... @}@;  // same constructor/initialization braces
} // destructor call automatically generated
\end{cfa}
The first parameter is a reference parameter to the type for the constructor/destructor. Constructors may have multiple parameters; destructors have only the one. The compiler implicitly matches an overloaded constructor @void ?{}(T &, ...);@ to an object declaration with associated initialization, and generates a construction call after the object is allocated. When an object goes out of scope, the matching overloaded destructor @void ^?{}(T &);@ is called. Without explicit definition, \CFA creates a default and copy constructor, destructor and assignment (like \CC). It is possible to define constructors/destructors for basic and existing types.
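For example, because the operator names are just function names, a constructor or destructor can be invoked explicitly. The following is a sketch; the type @S@ and its constructor signature are assumptions for illustration, not code from the thesis.
\begin{cfa}
struct S { int i; };
void ?{}( S & this, int i ) { this.i = i; }  // constructor
void ^?{}( S & this ) {}                     // destructor (empty for the sketch)
{
	S s = { 42 };  // implicit constructor call
	^?{}( s );     // explicit destructor call
	?{}( s, 27 );  // explicit constructor call, reinitializing s
} // implicit destructor call at scope end
\end{cfa}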
\section{Polymorphism}
\CFA uses parametric polymorphism to create functions and types that are defined over multiple types. \CFA polymorphic declarations serve the same role as \CC templates or Java generics. ``Parametric'' means the polymorphism is accomplished by passing argument operations to associate \emph{parameters} at the call site, and these parameters are used in the function to differentiate among the types the function operates on.

Polymorphic declarations start with a universal @forall@ clause that goes before the standard (monomorphic) declaration. These declarations have the same syntax except they may use the universal type names introduced by the @forall@ clause. For example, the following is a polymorphic identity function that works on any type @T@:
\begin{cfa}
@forall( T )@ @T@ identity( @T@ val ) { return val; }
int forty_two = identity( 42 ); // T bound to int, forty_two == 42
\end{cfa}
To allow a polymorphic function to be separately compiled, the type @T@ must be constrained by the operations used on @T@ in the function body. The @forall@ clause is augmented with a list of polymorphic variables (local type names) and assertions (constraints), which represent the required operations on those types used in a function, \eg:
\begin{cfa}
forall( T @| { void do_once(T &); }@) // assertion
void do_twice(T & value) { do_once(value); do_once(value); }
void do_once(int & i) { ... }  // provide assertion
int i;
do_twice(i); // implicitly pass assertion do_once to do_twice
\end{cfa}
Any object with a type fulfilling the assertion may be passed as an argument to a @do_twice@ call. A polymorphic function can be used in the same way as a normal function. The polymorphic variables are filled in with concrete types and the assertions are checked. An assertion is checked by verifying each assertion operation (with all the variables replaced with the concrete types from the arguments) is defined at a call site. Note, a function named @do_once@ is not required in the scope of @do_twice@ to compile it, unlike \CC template expansion. Furthermore, call-site inferencing allows local replacement of the most specific parametric function needed for a call.
\begin{cfa}
void do_once(double & y) { ... } // global
int quadruple(int x) {
	void do_once(int & y) { y = y * 2; } // local
	do_twice(x); // using local "do_once"
	return x;
}
\end{cfa}
Specifically, the compiler deduces that @do_twice@'s @T@ is an integer from the argument @x@. It then looks for the most specific definition matching the assertion, which is the nested integral @do_once@ defined within the function. The matched assertion function is then passed as a function pointer to @do_twice@ and called within it.

To avoid typing long lists of assertions, constraints can be collected into convenient packages called a @trait@, which can then be used in an assertion instead of the individual constraints.
\begin{cfa}
trait done_once(T) {
	void do_once(T &);
}
\end{cfa}
and the @forall@ list in the previous example is replaced with the trait.
\begin{cfa}
forall(dtype T | @done_once(T)@)
\end{cfa}
In general, a trait can contain an arbitrary number of assertions, both functions and variables, and is usually used to create a shorthand for, and give descriptive names to, common groupings of assertions describing a certain functionality, like @sumable@, @listable@, \etc.

Polymorphic structures and unions are defined by qualifying the aggregate type with @forall@. The type variables work the same except they are used in field declarations instead of parameters, returns, and local variable declarations.
\begin{cfa}
forall(dtype T) struct node {
	node(T) * next;  // generic linked node
	T * data;
};
\end{cfa}
The generic type @node(T)@ is an example of a polymorphic-type usage. Like \CC template usage, a polymorphic-type usage must specify a type parameter. There are many other polymorphism features in \CFA but these are the ones used by the exception system.
\section{Concurrency}
\CFA has a number of concurrency features: @thread@, @monitor@, @mutex@ parameters, @coroutine@ and @generator@. The two features that interact with the exception system are @thread@ and @coroutine@; they and their supporting constructs are described here.

\subsection{Coroutine}
A coroutine is a type with associated functions, where the functions are not required to finish execution when control is handed back to the caller. Instead, they may suspend execution at any time and be resumed later at the point of last suspension. (Generators are stackless and coroutines are stackful.) These types are not concurrent but share some similarities along with common underpinnings, so they are combined with the \CFA threading library. Further discussion in this section only refers to the coroutine because generators are similar.

In \CFA, a coroutine is created using the @coroutine@ keyword, which is an aggregate type like @struct@, except the structure is implicitly modified by the compiler to satisfy the @is_coroutine@ trait; hence, a coroutine is restricted by the type system to types that provide this special trait. The coroutine structure acts as the interface between callers and the coroutine, and its fields are used to pass information in and out of coroutine interface functions.

Here is a simple example where a single field is used to pass (communicate) the next number in a sequence.
\begin{cfa}
coroutine CountUp {
	unsigned int next; // communication variable
}
CountUp countup;
\end{cfa}
Each coroutine has a @main@ function, which takes a reference to a coroutine object and returns @void@.
\begin{cfa}[numbers=left]
void main(@CountUp & this@) { // argument matches trait is_coroutine
	unsigned int up = 0;  // retained between calls
	while (true) {
		next = up; // make "up" available outside function
		@suspend;@$\label{suspend}$
		up += 1;
	}
}
\end{cfa}
In this function, or functions called by this function (helper functions), the @suspend@ statement is used to return execution to the coroutine's caller without terminating the coroutine.

A coroutine is resumed by calling the @resume@ function, \eg @resume(countup)@. The first resume calls the @main@ function at the top. Thereafter, resume calls continue a coroutine in the last suspended function after the @suspend@ statement, in this case @main@ line~\ref{suspend}. The @resume@ function takes a reference to the coroutine structure and returns the same reference. The return value allows easy access to communication variables defined in the coroutine object. For example, the @next@ value for coroutine object @countup@ is both generated and collected in the single expression: @resume(countup).next@.

\subsection{Monitors and Mutex}
Concurrency does not guarantee ordering; without ordering, results are non-deterministic. To claw back ordering, \CFA uses monitors and @mutex@ (mutual exclusion) parameters. A monitor is another kind of aggregate, where the compiler implicitly inserts a lock and instances are compatible with @mutex@ parameters.

A function that requires deterministic (ordered) execution acquires mutual exclusion on a monitor object by qualifying an object reference parameter with @mutex@.
\begin{cfa}
void example(MonitorA & @mutex@ argA, MonitorB & @mutex@ argB);
\end{cfa}
When the function is called, it implicitly acquires the monitor lock for all of the mutex parameters without deadlock. This semantics means all functions with the same mutex type(s) are part of a critical section for objects of that type and only one runs at a time.
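The following sketch puts these pieces together; the @Counter@ type and @inc@ function are illustrative names, not code from the thesis.
\begin{cfa}
monitor Counter {       // compiler implicitly inserts a lock
	int count;
};
void ?{}( Counter & this ) { this.count = 0; }  // constructor
void inc( Counter & @mutex@ this ) {  // call acquires this's monitor lock
	this.count += 1;  // critical section: one call at a time per object
}
\end{cfa}
Any number of threads may call @inc@ on a shared @Counter@; the mutex parameter serializes the increments.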
\subsection{Threads}
Functions, generators, and coroutines are sequential, so there is only a single (but potentially sophisticated) execution path in a program. Threads introduce multiple execution paths that continue independently.

For threads to work safely with objects requires mutual exclusion using monitors and mutex parameters. For threads to work safely with other threads also requires mutual exclusion in the form of a communication rendezvous, which also supports internal synchronization as for mutex objects. For exceptions, only the two basic operations are important: thread fork and join.

Threads are created like coroutines with an associated @main@ function:
\begin{cfa}
thread StringWorker {
	const char * input;
	int result;
};
void main(StringWorker & this) {
	const char * localCopy = this.input;
	// ... do some work, perhaps hashing the string ...
	this.result = result;
}
{
	StringWorker stringworker; // fork thread running in "main"
} // implicitly join with thread $\Rightarrow$ wait for completion
\end{cfa}
The thread main is where a new thread starts execution after a fork operation; the thread then continues executing until it is finished. If another thread joins with an executing thread, it waits until the executing main completes execution. In other words, everything a thread does is between a fork and join.

From the outside, this behaviour is accomplished through creation and destruction of a thread object. Implicitly, fork happens after a thread object's constructor is run and join happens before the destructor runs. Join can also be specified explicitly using the @join@ function to wait for a thread's completion independently from its deallocation (\ie destructor call). If @join@ is called explicitly, the destructor does not implicitly join.
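For example, reusing the @StringWorker@ type above, an explicit join might be sketched as follows (the placement of the calls is illustrative):
\begin{cfa}
{
	StringWorker stringworker;     // fork: thread starts in its main
	// ... caller works concurrently with the thread ...
	join( stringworker );          // explicit join: wait for completion
	int r = stringworker.result;   // safe to read communication variables
} // destructor runs without joining again
\end{cfa}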
• ## doc/theses/andrew_beach_MMath/features.tex

\chapter{Exception Features}
This chapter covers the design and user interface of the \CFA exception-handling mechanism.

\section{Virtuals}
Virtual types and casts are not required for a basic exception-system but are useful for advanced exception features. However, \CFA is not object-oriented so there is no obvious concept of virtuals. Hence, to create advanced exception features for this work, I needed to design and implement a virtual-like system for \CFA.

Object-oriented languages often organize exceptions into a simple hierarchy, \eg Java.
\begin{center}
\setlength{\unitlength}{4000sp}%
\begin{picture}(1605,612)(2011,-1951)
\put(2100,-1411){\vector(1, 0){225}}
\put(3450,-1411){\vector(1, 0){225}}
\put(3550,-1411){\line(0,-1){225}}
\put(3550,-1636){\vector(1, 0){150}}
\put(3550,-1636){\line(0,-1){225}}
\put(3550,-1861){\vector(1, 0){150}}
\put(2025,-1490){\makebox(0,0)[rb]{\LstBasicStyle{exception}}}
\put(2400,-1460){\makebox(0,0)[lb]{\LstBasicStyle{arithmetic}}}
\put(3750,-1460){\makebox(0,0)[lb]{\LstBasicStyle{underflow}}}
\put(3750,-1690){\makebox(0,0)[lb]{\LstBasicStyle{overflow}}}
\put(3750,-1920){\makebox(0,0)[lb]{\LstBasicStyle{zerodivide}}}
\end{picture}%
\end{center}
The hierarchy provides the ability to handle an exception at different degrees of specificity (left to right). Hence, it is possible to catch a more general exception-type in higher-level code where the implementation details are unknown, which reduces tight coupling to the lower-level implementation. Otherwise, low-level code changes require higher-level code changes, \eg, changing from raising @underflow@ to @overflow@ at the low level means changing the matching catch at the high level versus catching the general @arithmetic@ exception.

In detail, each virtual type may have a parent and can have any number of children. A type's descendants are its children and its children's descendants. A type may not be its own descendant.

The exception hierarchy allows a handler (@catch@ clause) to match multiple exceptions, \eg a base-type handler catches both base and derived exception-types.
\begin{cfa}
try {
	...
} catch(arithmetic &) {
	... // handle arithmetic, underflow, overflow, zerodivide
}
\end{cfa}
Most exception mechanisms perform a linear search of the handlers and select the first matching handler, so the order of handlers is important because matching is many to one.

Each virtual type needs an associated virtual table.
A virtual table is a structure with fields for all the virtual members of a type. A virtual type has all the virtual members of its parent and can add more. It may also update the values of the virtual members and often does.

While much of the virtual infrastructure is created, it is currently only used internally for exception handling. The only user-level feature is the virtual cast, which is the same as the \CC \lstinline[language=C++]|dynamic_cast|.
\begin{cfa}
(virtual TYPE)EXPRESSION
\end{cfa}
Note, the syntax and semantics matches a C-cast, rather than the unusual \CC syntax for special casts. Both the type of @EXPRESSION@ and @TYPE@ must be a pointer to a virtual type. The cast dynamically checks if the @EXPRESSION@ type is the same or a subtype of @TYPE@, and if true, returns a pointer to the @EXPRESSION@ object, otherwise it returns @0p@ (null pointer).
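For example, using the exception hierarchy above, a checked downcast might be sketched as follows (@get_exception@ is a hypothetical helper returning a base-type pointer, not part of the thesis):
\begin{cfa}
exception * e = get_exception();          // static type: base of the hierarchy
arithmetic * a = (virtual arithmetic *)e; // run-time checked downcast
if ( a ) {
	// e points at an arithmetic object or one of its descendants
} // otherwise a == 0p
\end{cfa}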
\section{Exception}
% Leaving until later, hopefully it can talk about actual syntax instead
% of my many strange macros. Syntax aside I will also have to talk about the
% features all exceptions support.
Exceptions are defined by the trait system; there are a series of traits, and if a type satisfies them, then it can be used as an exception. The following is the base trait all exceptions need to match.
\begin{cfa}
trait is_exception(exceptT &, virtualT &) {
	virtualT const & @get_exception_vtable@(exceptT *);
};
\end{cfa}
The function takes any pointer, including the null pointer, and returns a reference to the virtual-table object. Defining this function also establishes the virtual type and a virtual-table pair to the \CFA type-resolver and promises @exceptT@ is a virtual type and a child of the base exception-type.

{\color{blue} PAB: I do not understand this paragraph.} One odd thing about @get_exception_vtable@ is that it should always be a constant function, returning the same value regardless of its argument. A pointer or reference to the virtual-table instance could be used instead; however, using a function has some ease-of-implementation advantages and allows for easier disambiguation, because the virtual type name (or the address of an instance that is in scope) can be used instead of the mangled virtual-table name. Also note the use of the word ``promise'' in the trait description. Currently, \CFA cannot check to see if either @exceptT@ or @virtualT@ match the layout requirements. This is considered part of @get_exception_vtable@'s correct implementation.

\section{Raise}
\CFA provides two kinds of exception raise: termination (see \VRef{s:Termination}) and resumption (see \VRef{s:Resumption}), which are specified with the following traits.
\begin{cfa}
trait is_termination_exception(
		exceptT &, virtualT & | is_exception(exceptT, virtualT)) {
	void @defaultTerminationHandler@(exceptT &);
};
\end{cfa}
The function is required to allow a termination raise, but is only called if a termination raise does not find an appropriate handler. Allowing a resumption raise is similar.
\begin{cfa}
trait is_resumption_exception(
		exceptT &, virtualT & | is_exception(exceptT, virtualT)) {
	void @defaultResumptionHandler@(exceptT &);
};
\end{cfa}
The function is required to allow a resumption raise, but is only called if a resumption raise does not find an appropriate handler.

Finally, there are three convenience macros for referring to these traits: @IS_EXCEPTION@, @IS_TERMINATION_EXCEPTION@ and @IS_RESUMPTION_EXCEPTION@. Each takes the virtual type's name, and for polymorphic types only, the parenthesized list of polymorphic arguments. These macros do the name mangling to get the virtual-table name and provide the arguments to both sides {\color{blue}(PAB: What's a ``side''?)}.

\subsection{Termination}
\label{s:Termination}
Termination raise, called ``throw'', is familiar and used in most programming languages with exception handling. The semantics of termination is: search the stack for a matching handler, unwind the stack frames to the matching handler, execute the handler, and continue execution after the handler. Termination is used when execution \emph{cannot} return to the throw. To continue execution, the program must \emph{recover} in the handler from the failed (unwound) execution at the raise to safely proceed after the handler.

A termination raise is started with the @throw@ statement:
\begin{cfa}
throw EXPRESSION;
\end{cfa}
The expression must return a termination-exception reference, where the termination exception has a type with a @void defaultTerminationHandler(T &)@ (default handler) defined. The handler is found at the call site using \CFA's trait system and passed into the exception system along with the exception itself.

At runtime, a representation of the exception type and an instance of the exception type is copied into managed memory (heap) to ensure it remains in scope during unwinding. It is the user's responsibility to ensure the original exception object at the throw is freed when it goes out of scope. Being allocated on the stack is sufficient for this.

Then the exception system searches the stack starting from the throw and proceeding towards the base of the stack, from callee to caller. At each stack frame, a check is made for termination handlers defined by the @catch@ clauses of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} @catch (EXCEPTION_TYPE$_1$ * NAME)@ { // termination handler 1
	HANDLER_BLOCK$_1$
} @catch (EXCEPTION_TYPE$_2$ * NAME)@ { // termination handler 2
	HANDLER_BLOCK$_2$
}
\end{cfa}
The statements in the @GUARDED_BLOCK@ are executed. If those statements, or any functions invoked from those statements, throw an exception, and the exception is not handled by a try statement further up the stack, the termination handlers are searched for a matching exception type from top to bottom. Exception matching checks the representation of the thrown exception-type is the same or a descendant type of the exception types in the handler clauses.
If there is a match, a pointer to the exception object created at the throw is bound to @NAME@ and the statements in the associated @HANDLER_BLOCK@ are executed. If control reaches the end of the handler, the exception is freed, and control continues after the try statement.

The default handler visible at the throw statement is used if no matching termination handler is found after the entire stack is searched. At that point, the default handler is called with a reference to the exception object generated at the throw. If the default handler returns, the system default action is executed, which often terminates the program. This feature allows each exception type to define its own action, such as printing an informative error message, when an exception is not handled in the program.
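Putting these pieces together, the following sketch shows a termination raise and a matching handler; the type @some_exception@ is a placeholder assumed to satisfy @is_termination_exception@, not a type from the thesis.
\begin{cfa}
void may_fail() {
	some_exception exc;
	throw exc;  // copy exc to managed memory, search the stack for a handler
}
void caller() {
	try {
		may_fail();
	} catch (some_exception * e) {
		// recover from the failed (unwound) execution
	} // execution continues after the try statement
}
\end{cfa}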
\subsection{Resumption}
\label{s:Resumption}
Resumption raise, called ``resume'', is as old as termination raise~\cite{Goodenough75} but is less popular. In many ways, resumption is simpler and easier to understand, as it is simply a dynamic call (as in Lisp). The semantics of resumption is: search the stack for a matching handler, execute the handler, and continue execution after the resume. Notice, the stack cannot be unwound because execution returns to the raise point. Resumption is used when execution \emph{can} return to the resume. To continue execution, the program must \emph{correct} in the handler for the failed execution at the raise so execution can safely continue after the resume.

A resumption raise is started with the @throwResume@ statement:
\begin{cfa}
throwResume EXPRESSION;
\end{cfa}
The semantics of the @throwResume@ statement is like the @throw@, but the expression has a type with a @void defaultResumptionHandler(T &)@ (default handler) defined, where the handler is found at the call site by the type system. At runtime, a representation of the exception type and an instance of the exception type is \emph{not} copied because the stack is maintained during the handler search.

Then the exception system searches the stack starting from the resume and proceeding towards the base of the stack, from callee to caller. At each stack frame, a check is made for resumption handlers defined by the @catchResume@ clauses of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} @catchResume (EXCEPTION_TYPE$_1$ * NAME)@ { // resumption handler 1
	HANDLER_BLOCK$_1$
} @catchResume (EXCEPTION_TYPE$_2$ * NAME)@ { // resumption handler 2
	HANDLER_BLOCK$_2$
}
\end{cfa}
The statements in the @GUARDED_BLOCK@ are executed. If those statements, or any functions invoked from those statements, resume an exception, and the exception is not handled by a try statement further up the stack, the resumption handlers are searched for a matching exception type from top to bottom. (Note, termination and resumption handlers may be intermixed in a @try@ statement, but the kind of raise (throw/resume) only matches with the corresponding kind of handler clause.)

The exception search and matching for resumption is the same as for termination, including exception inheritance. The difference is when control reaches the end of the handler: the resumption handler returns after the resume rather than after the try statement. The resume point assumes the handler has corrected the problem so execution can safely continue.

Like termination, if no resumption handler is found, the default handler visible at the resume statement is called, and the system default action is executed.

For resumption, the exception system uses stack marking to partition the resumption search. If another resumption exception is raised in a resumption handler, the second exception search does not start at the point of the original raise. (Remember the stack is not unwound and the current handler is at the top of the stack.) The search for the second resumption starts at the current point on the stack because new try statements may have been pushed by the handler or functions called from the handler. If there is no match back to the point of the current handler, the search skips the stack frames already searched by the first resume and continues after the try statement. The default handler always continues from the default handler associated with the point where the exception is created.
% This might need a diagram. But it is an important part of the justification
% of the design of the traversal order.
\begin{verbatim}
throwResume2 ----------.
|                      |
generated from handler |
|                      |
handler                |
|                      |
throwResume1 -----.    :
|                 |    :
try               |    : search skip
|                 |    :
catchResume  <----'    :
|                      |
\end{verbatim}
This resumption search-pattern reflects the one for termination, which matches programmer expectations. However, it avoids the \emph{recursive resumption} problem. If parts of the stack are searched multiple times, loops can easily form, resulting in infinite recursion. Consider the trivial case:
\begin{cfa}
try {
	throwResume$_1$ (E &){};
} catchResume( E * ) {
	throwResume; // reresume
}
\end{cfa}
Based on termination semantics, programmer expectation is for the re-resume to continue searching the stack frames after the try statement. However, the current try statement is still on the stack below the handler issuing the reresume (see \VRef{s:Reraise}). Hence, the try statement catches the re-raise again and does another re-raise \emph{ad infinitum}, which is confusing and difficult to debug. The \CFA resumption search-pattern skips the try statement so the reresume search continues after the try, matching programmer expectation.

\section{Conditional Catch}
Both termination and resumption handler-clauses may perform conditional matching:
\begin{cfa}
catch (EXCEPTION_TYPE * NAME ; @CONDITION@)
\end{cfa}
First, the same semantics is used to match the exception type. Second, if the exception matches, @CONDITION@ is executed. The condition expression may reference all names in scope at the beginning of the try block and @NAME@ introduced in the handler clause. If the condition is true, then the handler matches. Otherwise, the exception search continues at the next appropriate kind of handler clause in the try block.
\begin{cfa}
try {
	f1 = open( ... );
	f2 = open( ... );
	...
} catch( IOFailure * f ; fd( f ) == f1 ) {
	// only handle IO failure for f1
}
\end{cfa}
Note, catching @IOFailure@, checking for @f1@ in the handler, and reraising the exception if not @f1@ is different, because the reraise does not examine any of the remaining handlers in the current try statement.

\section{Reraise}
\label{s:Reraise}
Within the handler block or functions called from the handler block, it is possible to reraise the most recently caught exception with @throw@ or @throwResume@, respectively.
\begin{cfa}
catch( ... ) {
	... throw; // rethrow
} catchResume( ... ) {
	... throwResume; // reresume
}
\end{cfa}
The only difference between a raise and a reraise is that the reraise does not create a new exception; instead it continues using the current exception, \ie no allocation and copy. However, the default handler is still set to the one visible at the raise point, and hence, for termination could refer to data that is part of an unwound stack frame. To prevent this problem, a new default handler is generated that does a program-level abort.

\section{Finally Clauses}
A @finally@ clause may be placed at the end of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} ... // any number or kind of handler clauses
finally {
	FINALLY_BLOCK
}
\end{cfa}
The @FINALLY_BLOCK@ is executed when the try statement is unwound from the stack, \ie when the @GUARDED_BLOCK@ or any handler clause finishes. Hence, the finally block is always executed.

Execution of the finally block should always finish, meaning control runs off the end of the block. This requirement ensures execution always continues as if the finally clause is not present, \ie finally is for cleanup, not changing control flow. Because of this requirement, local control flow out of the finally block is forbidden. The compiler precludes any @break@, @continue@, @fallthru@ or @return@ that causes control to leave the finally block. Other ways to leave the finally block, such as a long jump or termination, are much harder to check, at best requiring additional run-time overhead, and so are discouraged.
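For example, a finally clause can guarantee cleanup on both the normal and exceptional paths; the following is a sketch, where the file name and the work inside the guarded block are illustrative.
\begin{cfa}
void process() {
	FILE * file = fopen( "data.txt", "r" );
	try {
		// ... operations that may raise an exception ...
	} finally {
		fclose( file );  // runs whether the guarded block finishes or unwinds
	}
}
\end{cfa}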
\section{Cancellation}
Cancellation is a stack-level abort, which can be thought of as an uncatchable termination. It unwinds the entirety of the current stack, and if possible forwards the cancellation exception to a different stack.

There is no special statement for starting a cancellation; instead the standard library function @cancel_stack@ is called, passing an exception. Unlike a raise, this exception is not used in matching, only to pass information about the cause of the cancellation.

Handling of a cancellation depends on which stack is being cancelled.
\begin{description}
\item[Main Stack:]
The main stack is the one used by the program main at the start of execution, and is the only stack in a sequential program. Hence, when cancellation is forwarded to the main stack, there is no other forwarding stack, so after the stack is unwound, there is a program-level abort.

\item[Thread Stack:]
A thread stack is created for a @thread@ object or object that satisfies the @is_thread@ trait. A thread only has two points of communication that must happen: start and join. As the thread must be running to perform a cancellation, it must occur after start and before join, so join is a cancellation point. After the stack is unwound, the thread halts and waits for another thread to join with it.
The joining thread checks for a cancellation, and if present, resumes the exception @ThreadCancelled@.

There is a subtle difference between the explicit join (@join@ function) and implicit join (from a destructor call). The explicit join takes the default handler (@defaultResumptionHandler@) from its calling context, which is used if the exception is not caught. The implicit join does a program abort instead.

This semantics is for safety. One difficult problem for any exception system is defining semantics when an exception is raised during an exception search: which exception has priority, the original or the new exception? No matter which exception is selected, it is possible for the selected one to disrupt or destroy the context required for the other. {\color{blue} PAB: I do not understand the following sentences.} This loss of information can happen with join, because the thread destructor is always run when the stack is being unwound and one termination/cancellation is already active. Also, since implicit joins are implicit, they are easier to forget about.

\item[Coroutine Stack:]
A coroutine stack is created for a @coroutine@ object or object that satisfies the @is_coroutine@ trait. A coroutine only knows of two other coroutines, its starter and its last resumer. The last resumer has the tightest coupling to the coroutine it activated. Hence, cancellation of the active coroutine is forwarded to the last resumer after the stack is unwound, as the last resumer has the most precise knowledge about the current execution. When the resumer restarts, it resumes the exception @CoroutineCancelled@, which is polymorphic over the coroutine type and has a pointer to the cancelled coroutine. The resume function also has an assertion that the @defaultResumptionHandler@ for the exception is defined, so it uses the default handler like a regular throw.
\end{description}
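Returning to the raise itself, the following is a minimal sketch of starting a cancellation; @internal_error@ is a placeholder type assumed to satisfy the exception traits.
\begin{cfa}
void fatal() {
	internal_error err;   // describes why the cancellation happened
	cancel_stack( err );  // unwind the current stack; does not return
}
\end{cfa}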

• ## doc/theses/andrew_beach_MMath/unwinding.tex

\chapter{\texorpdfstring{Unwinding in \CFA}{Unwinding in Cforall}}

Stack unwinding is the process of removing things from the stack. Unwinding is fairly simple if nothing needs to happen when the stack unwinds. Traditional C can unwind the stack by saving and restoring state (with @setjmp@ \& @longjmp@). However, many languages define actions that have to be taken when something is removed from the stack, such as running a variable's destructor or a @try@ statement's @finally@ clause. Handling this requires walking the stack, going through each stack frame.

\CFA uses two primary functions in libunwind to create most of its exceptional control-flow: @_Unwind_RaiseException@ and @_Unwind_ForcedUnwind@. Their operation is divided into two phases: search and clean-up. The search phase -- phase 1 -- is used to scan the stack without unwinding it; the clean-up phase -- phase 2 -- then unwinds the stack, performing the required clean-up actions.

A personality function performs three tasks, although not all have to be present. The tasks performed are decided by the actions provided. @_Unwind_Action@ is a bitmask of possible actions and an argument of this type is passed into the personality function.
\begin{itemize}
\item @_UA_SEARCH_PHASE@ is passed in during the search phase and tells the personality function to check for handlers. If there is a handler in this stack frame, as defined by the language, the personality function should return @_URC_HANDLER_FOUND@. Otherwise it should return @_URC_CONTINUE_UNWIND@.
\item @_UA_CLEANUP_PHASE@ is passed in during the clean-up phase and means part or all of the stack frame is removed. The personality function should do whatever clean-up the language defines (such as running destructors/finalizers) and then generally returns @_URC_CONTINUE_UNWIND@.
\item @_UA_HANDLER_FRAME@ means the personality function must install a handler. It is also passed in during the clean-up phase and is in addition to the clean-up action. libunwind provides several helpers for the personality function here. Once it is done, the personality function must return @_URC_INSTALL_CONTEXT@.
\end{itemize}
The personality function is given a number of other arguments. Some are for compatibility, and there is the @struct _Unwind_Context@ pointer, which is passed to many helpers to get information about the current stack frame.
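As an illustrative sketch (not the actual @cfa-cc@ code), a minimal personality routine dispatching on these actions might look like the following; the handler lookup itself is language-specific and elided.
\begin{cfa}
#include <unwind.h>
_Unwind_Reason_Code example_personality(
		int version, _Unwind_Action actions, _Unwind_Exception_Class exc_class,
		struct _Unwind_Exception * except, struct _Unwind_Context * context ) {
	if ( actions & _UA_SEARCH_PHASE ) {
		// check this frame for a handler (language-specific lookup elided)
		bool has_handler = false;
		return has_handler ? _URC_HANDLER_FOUND : _URC_CONTINUE_UNWIND;
	}
	if ( actions & _UA_CLEANUP_PHASE ) {
		// run destructors/finalizers for this frame here
		if ( actions & _UA_HANDLER_FRAME ) {
			// install the handler, \eg with _Unwind_SetIP( context, ... )
			return _URC_INSTALL_CONTEXT;
		}
		return _URC_CONTINUE_UNWIND;
	}
	return _URC_FATAL_PHASE1_ERROR;
}
\end{cfa}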
However both GCC and Clang add an extra action for this case \codeC{_UA_END_OF_STACK}. GCC and Clang add an extra action for this case @_UA_END_OF_STACK@. Each time the stop function is called, it can do one of two things. When it is not the end of the stack it can return \codeC{_URC_NO_REASON} to When it is not the end of the stack it can return @_URC_NO_REASON@ to continue unwinding. % Is there a reason that NO_REASON is used instead of CONTINUE_UNWIND? are provided to do it. \section{\CFA Implementation} \section{\texorpdfstring{\CFA Implementation}{Cforall Implementation}} To use libunwind, \CFA provides several wrappers, its own storage, The stop function is very simple. It checks the end-of-stack flag to see if it is finished unwinding. If so, it calls \codeC{exit} to end the process, it is finished unwinding. If so, it calls @exit@ to end the process, otherwise it returns with no-reason to continue unwinding. % Yeah, this is going to have to change. location of the instruction pointer and stack layout, which varies with compiler and optimization levels. So for frames where there are only destructors, GCC's attribute cleanup with the \texttt{-fexception} flag is destructors, GCC's attribute cleanup with the @-fexception@ flag is sufficient to handle unwinding. The only functions that require more than that are those that contain \codeCFA{try} statements. A \codeCFA{try} statement has a \codeCFA{try} clause, some number of \codeCFA{catch} clauses and \codeCFA{catchResume} clauses and may have a \codeCFA{finally} clause. Of these, only \codeCFA{try} statements with \codeCFA{catch} clauses need to be transformed and only they and the \codeCFA{try} clause are involved. @try@ statements. A @try@ statement has a @try@ clause, some number of @catch@ clauses and @catchResume@ clauses and may have a @finally@ clause. Of these, only @try@ statements with @catch@ clauses need to be transformed and only they and the @try@ clause are involved. The \codeCFA{try} statement is converted into a series of closures which can The @try@ statement is converted into a series of closures which can access other parts of the function according to scoping rules but can be passed around. The \codeCFA{try} clause is converted into the try functions, almost entirely unchanged. The \codeCFA{catch} clauses are converted into two passed around. The @try@ clause is converted into the try functions, almost entirely unchanged. The @catch@ clauses are converted into two functions; the match function and the catch function. runs the handler's body. These three functions are passed to \codeC{try_terminate}. This is an These three functions are passed to @try_terminate@. This is an % Maybe I shouldn't quote that, it isn't its actual name. internal hand-written function that has its own personality function and handler was found in this frame. If it was, then the personality function installs the handler, which sets the instruction pointer in \codeC{try_terminate} to an otherwise unused section that calls the catch @try_terminate@ to an otherwise unused section that calls the catch function, passing it the current exception and handler index. \codeC{try_terminate} returns as soon as the catch function returns. @try_terminate@ returns as soon as the catch function returns. At this point, control has returned to normal control flow.
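As a summary of the personality-function protocol described above, the following schematic shows the overall shape of a personality function; this is a sketch, not the actual @cfa-cc@ code, and @frame_has_handler@ and @handler_ip@ are hypothetical placeholders for the language-specific handler test and handler address:
\begin{cfa}
#include <unwind.h>     // libunwind interface
#include <stdbool.h>

extern bool frame_has_handler( struct _Unwind_Context * );   // hypothetical
extern _Unwind_Ptr handler_ip( struct _Unwind_Context * );   // hypothetical

_Unwind_Reason_Code example_personality( int version, _Unwind_Action actions,
		_Unwind_Exception_Class exc_class, struct _Unwind_Exception * exc,
		struct _Unwind_Context * context ) {
	if ( actions & _UA_SEARCH_PHASE ) {   // phase 1: look, but do not unwind
		return frame_has_handler( context ) ? _URC_HANDLER_FOUND : _URC_CONTINUE_UNWIND;
	}
	if ( actions & _UA_HANDLER_FRAME ) {  // phase 2: install the found handler
		_Unwind_SetIP( context, handler_ip( context ) );
		return _URC_INSTALL_CONTEXT;
	}
	if ( actions & _UA_CLEANUP_PHASE ) {  // phase 2: run destructors/finally clauses
		// language-defined clean-up for this frame goes here
		return _URC_CONTINUE_UNWIND;
	}
	return _URC_CONTINUE_UNWIND;
}
\end{cfa}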
• ## doc/theses/fangren_yu_COOP_F20/Report.tex

 rb6a8b31 \usepackage[usenames]{color} \input{common}                                          % common CFA document macros \usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref} \usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref} \usepackage{breakurl} \urlstyle{sf} \renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}} \pagenumbering{roman} \linenumbers                                            % comment out to turn off line numbering %\linenumbers                                            % comment out to turn off line numbering \maketitle \pdfbookmark[1]{Contents}{section} \tableofcontents \clearpage \thispagestyle{plain} \pagenumbering{arabic} \begin{abstract} \CFA is an evolutionary, non-object-oriented extension of the C programming language, featuring a parametric type-system, and is currently under active development. The reference compiler for the \CFA language, @cfa-cc@, has some of its major components dated back to the early 2000s, which are based on inefficient data structures and algorithms. This report introduces improvements targeting the expression resolution algorithm, suggested by a recent prototype experiment on a simplified model, which are implemented in @cfa-cc@ to support the full \CFA language. These optimizations speed up the compiler by a factor of 20 across the existing \CFA codebase, bringing the compilation time of a mid-sized \CFA source file down to the 10-second level. A few problem cases derived from realistic code examples are analyzed in detail, with proposed solutions. This work is a critical step in the \CFA project development to achieve its eventual goal of being used alongside C for large software systems. \end{abstract} \clearpage \section*{Acknowledgements} \begin{sloppypar} I would like to thank everyone in the \CFA team for their contribution towards this project. Programming language design and development is a tough subject and requires a lot of teamwork. Without the collaborative efforts from the team, this project could not have been a success. Specifically, I would like to thank Andrew Beach for introducing me to the \CFA codebase, Thierry Delisle for maintaining the test and build automation framework, Michael Brooks for providing example programs of various experimental language and type system features, and most importantly, Professor Martin Karsten for recommending me to the \CFA team, and my supervisor, Professor Peter Buhr for encouraging me to explore deeply into intricate compiler algorithms. Finally, I gratefully acknowledge the help from Aaron Moss, former graduate from the team and the author of the precedent thesis work, to participate in the \CFA team's virtual conferences and email correspondence, and provide many critical arguments and suggestions. 2020 had been an unusually challenging year for everyone and we managed to keep a steady pace. \end{sloppypar} \clearpage \tableofcontents \clearpage \section{Introduction} \section{Completed work} \CFA language, developed by the Programming Language Group at the University of Waterloo, has a long history, with the initial language design in 1992 by Glen Ditchfield~\cite{Ditchfield92} and the first proof-of-concept compiler built in 2003 by Richard Bilson~\cite{Bilson03}. 
Many new features have been added to the language over time, but the core of \CFA's type-system --- parametric functions introduced by the @forall@ clause (hence the name of the language) providing parametric overloading --- remains mostly unchanged. The current \CFA reference compiler, @cfa-cc@, is designed using the visitor pattern~\cite{vistorpattern} over an abstract syntax tree (AST), where multiple passes over the AST modify it for subsequent passes. @cfa-cc@ still includes many parts taken directly from the original Bilson implementation, which served as the starting point for this enhancement work to the type system. Unfortunately, the prior implementation did not provide the efficiency required for the language to be practical: a \CFA source file of approximately 1000 lines of code can take multiple minutes to compile. The cause of the problem is that the old compiler used inefficient data structures and algorithms for expression resolution, which involved significant copying and redundant work. This report presents a series of optimizations to the performance-critical parts of the resolver, with a major rework of the compiler data-structures using a functional-programming approach to reduce memory complexity. The improvements were suggested by running the compiler builds with a performance profiler against the \CFA standard-library source-code and a test suite to find the most underperforming components in the compiler algorithm. The \CFA team endorses a pragmatic philosophy that focuses on practical implications of language design and implementation rather than theoretical limits. In particular, the compiler is designed to be expressive with respect to code reuse while maintaining type safety, but compromises theoretical soundness in extreme corner cases. However, when these corner cases do appear in actual usage, they need to be thoroughly investigated. A case-by-case analysis is presented for several of these corner cases, some of which point to certain weaknesses in the language design with solutions proposed based on experimental results. \section{AST restructuring} \subsection{Memory model with sharing} A major rework of the abstract syntax tree (AST) data structure in the compiler is completed as the first step of the project. The majority of the work was documented in the reference manual of the compiler~\cite{cfa-cc}. To summarize: \begin{itemize} \item AST nodes (and therefore subtrees) can be shared without copying when reused. \item Modifications apply the functional programming principle, making copies for local changes without affecting the original data shared by other owners. In-place mutations are permitted as a special case when sharing does not happen. The logic is implemented by reference counting. \item Memory allocation and freeing are performed automatically using smart pointers. \end{itemize} The resolver algorithm designed for overload resolution naturally introduces a significant amount of reused intermediate representations, especially in the following two places: \begin{itemize} \item Function overload candidates are computed by combining the argument candidates bottom-up, with many of them being a common term. For example, if $n$ overloads of a function @f@ all take an integer for the first parameter but different types for the second (@f( int, int )@, @f( int, double )@, etc.), the first term is reused $n$ times for each of the generated candidate expressions. This effect is particularly bad for deep expression trees.
\item In the unification algorithm and candidate elimination step, actual types are obtained by substituting the type parameters by their bindings. Let $n$ be the complexity (\ie number of nodes in representation) of the original type, $m$ be the complexity of bound type for parameters, and $k$ be the number of occurrences of type parameters in the original type. If everything needs to be deep-copied, the substitution step takes $O(n+mk)$ time and memory, while using shared nodes it is reduced to $O(n)$ time and $O(k)$ memory. \end{itemize} One of the worst examples for the old compiler is a long chain of I/O operations \begin{cfa} sout | 1 | 2 | 3 | 4 | ... \end{cfa} The pipe operator is overloaded by \CFA I/O library for every primitive type in C language, as well as I/O manipulators defined by the library. In total there are around 50 overloads for the output stream operation. On resolving the $n$-th pipe operator in the sequence, the first term, which is the result of sub-expression containing $n-1$ pipe operators, is reused to resolve every overload. Therefore at least $O(n^2)$ copies of expression nodes are made during resolution, not even counting type unification cost; combined with two large factors from the number of overloads of pipe operators, and that the ``output stream type'' in \CFA is a trait with 27 assertions (which adds to complexity of the pipe operator's type) this makes compiling a long output sequence extremely slow. In new AST representation only $O(n)$ copies are required and type of pipe operator is not copied at all. Reduction in space complexity is especially important, as preliminary profiling result on the old compiler build shows that over half of time spent in expression resolution are on memory allocations. A major rework of the AST data-structure in the compiler was completed as the first step of the project. The majority of this work is documented in my prior report documenting the compiler reference-manual~\cite{cfa-cc}. To summarize: \begin{itemize} \item AST nodes (and therefore subtrees) can be shared without copying. \item Modifications are performed using functional-programming principles, making copies for local changes without affecting the original data shared by other owners. In-place mutations are permitted as a special case when there is no sharing. The logic is implemented by reference counting. \item Memory allocation and freeing are performed automatically using smart pointers~\cite{smartpointers}. \end{itemize} The resolver algorithm, designed for overload resolution, involves a significant amount of reuse, and hence copying, of intermediate representations, especially in the following two places: \begin{itemize} \item Function overload candidates are computed by combining the argument candidates bottom-up, with many being a common term. For example, if $n$ overloads of a function @f@ all take an integer for the first parameter but different types for the second, \eg @f( int, int )@, @f( int, double )@, etc., the first term is copied $n$ times for each of the generated candidate expressions. This copying is particularly bad for deep expression trees. \item In the unification algorithm and candidate elimination step, actual types are obtained by substituting the type parameters by their bindings. Let $n$ be the complexity (\ie number of nodes in representation) of the original type, $m$ be the complexity of the bound type for parameters, and $k$ be the number of occurrences of type parameters in the original type.
If every substitution needs to be deep-copied, this copy step takes $O(n+mk)$ time and memory, while using shared nodes it is reduced to $O(n)$ time and $O(k)$ memory. \end{itemize} One of the worst examples for the old compiler is a long chain of I/O operations: \begin{cfa} sout | 1 | 2 | 3 | 4 | ...;   // print integer constants \end{cfa} The pipe operator is overloaded by the \CFA I/O library for every primitive type in the C language, as well as I/O manipulators defined by the library. In total, there are around 50 overloads for the output stream operation. On resolving the $n$-th pipe operator in the sequence, the first term, which is the result of sub-expression containing $n-1$ pipe operators, is reused to resolve every overload. Therefore at least $O(n^2)$ copies of expression nodes are made during resolution, not even counting type unification cost; combined with the two large factors from the number of overloads of pipe operators, and that the ``output stream type'' in \CFA is a trait with 27 assertions (which adds to complexity of the pipe operator's type) this makes compiling a long output sequence extremely slow. In the new AST representation, only $O(n)$ copies are required and the type of the pipe operator is not copied at all. Reduction in space complexity is especially important, as preliminary profiling results on the old compiler build showed over half of the time spent in expression resolution is on memory allocations. Since the compiler codebase is large and the new memory model mostly benefits expression resolution, some of the old data structures are still kept, and a conversion pass happens before and after the general resolve phase. Rewriting every compiler module will take longer, and whether the new model is correct was unknown when this project started; therefore, only the resolver is currently implemented with the new data structure. \subsection{Merged resolver calls} The pre-resolve phase of compilation, inadequately called ``validate'' in the compiler source code, does more than just simple syntax validation, as it also normalizes the input program. Some of them, however, require type information on expressions and therefore need to call the resolver before the general resolve phase. There are three notable places where the resolver is invoked: \begin{itemize} \item Attempt to generate default constructor, copy constructor and destructor for user-defined @struct@ types \item Resolve @with@ statements (the same as in Python, which introduces fields of a structure directly in scope) The pre-resolve phase of compilation, inappropriately called ``validate'' in the compiler source code, has a number of passes that do more than simple syntax and semantic validation; some passes also normalize the input program. A few of these passes require type information for expressions, and therefore, need to call the resolver before the general resolve phase. There are three notable places where the resolver is invoked: \begin{itemize} \item Generate default constructor, copy constructor and destructor for user-defined @struct@ types. \item Resolve @with@ statements (the same as in Pascal~\cite{pascal}), which introduce fields of a structure directly into a scope. \item Resolve @typeof@ expressions (cf. @decltype@ in \CC); note that this step may depend on symbols introduced by @with@ statements.
\end{itemize} Since the compiler codebase is large and the new memory model mostly only benefits expression resolution, the old data structure is still kept, and a conversion pass happens before and after resolve phase. Rewriting every compiler module will take a long time, and whether the new model is correct is still unknown when started, therefore only the resolver is implemented with the new data structure. Since the constructor calls were one of the most expensive to resolve (reason will be shown in the next section), pre-resolve phase were taking more time after resolver moves to the more efficient new implementation. To better facilitate the new resolver, every step that requires type information are reintegrated as part of resolver. A by-product of this work is that the reversed dependence of @with@ statement and @typeof@ can now be handled. Previously, the compiler is unable to handle cases such as Since the constructor calls are one of the most expensive to resolve (reason given in~\VRef{s:SpecialFunctionLookup}), this pre-resolve phase was taking a large amount of time even after the resolver was changed to the more efficient new implementation. The problem is that multiple resolutions repeat a significant amount of work. Therefore, to better facilitate the new resolver, every step that requires type information should be integrated as part of the general resolver phase. A by-product of this work is that reversed dependence between @with@ statement and @typeof@ can now be handled. Previously, the compiler was unable to handle cases such as: \begin{cfa} struct S { int x; }; S foo(); typeof( foo() ) s; // type is S with (s) { with (s) { x; // refers to s.x } \end{cfa} since type of @s@ is still unresolved when handling @with@ expressions. Instead, the new (and correct) approach is to evaluate @typeof@ expressions when the declaration is first seen, and it suffices because of the declaration-before-use rule. since the type of @s@ is unresolved when handling @with@ expressions because the @with@ pass follows the @typeof@ pass (interchanging passes only interchanges the problem). Instead, the new (and correct) approach is to evaluate @typeof@ expressions when the declaration is first seen during resolution, and it suffices because of the declaration-before-use rule. \subsection{Special function lookup} Reducing the number of functions looked up for overload resolution is an effective way to gain performance when there are many overloads but most of them are trivially wrong. In practice, most functions have few (if any) overloads but there are notable exceptions. Most importantly, constructor @?{}@, destructor @^?{}@, and assignment @?=?@ are generated for every user-defined type, and in a large source file there can be hundreds of them. Furthermore, many calls to them are generated for initializing variables and passing arguments. This fact makes them the most overloaded and most called functions. In an object-oriented programming language, object has methods declared with their types, so a call such as @obj.f()@ only needs to perform lookup in the method table corresponding to type of @obj@. \CFA on the other hand, does not have methods, and all types are open (\ie new operations can be defined on them), so a similar approach will not work in general. However, the big 3'' operators have a unique property enforced by the language rules, such that the first parameter must have a reference type. Since \CFA does not have class inheritance, reference type must always match exactly. 
Therefore, argument-dependent lookup can be implemented for these operators, by using a dedicated symbol table. The lookup key used for the special functions is the mangled type name of the first parameter, which acts as the @this@ parameter in an object-oriented language. To handle generic types, the type parameters are stripped off, and only the base type is matched. Note that a constructor (destructor, assignment operator) taking an arbitrary @this@ argument, for example @forall( dtype T ) void ?{}( T & );@ is not allowed, and it guarantees that if the @this@ type is known, all possible overloads can be found by searching with the given type. In case that the @this@ argument itself is overloaded, it is resolved first and all possible result types are used for lookup. Note that for the generated expressions, the particular variable for @this@ argument is fully known, without overloads, so the majority of constructor call resolutions only need to check for one given object type. Explicit constructor calls and assignment statements sometimes may require lookup for multiple types. In the extremely rare case that type of @this@ argument is yet unbound, everything will have to be checked, just like without the argument-dependent lookup algorithm; fortunately, this case almost never happens in practice. An example is found in the library function @new@: \label{s:SpecialFunctionLookup} Reducing the number of functions looked up for overload resolution is an effective way to gain performance when there are many overloads but most of them are trivially wrong. In practice, most functions have few (if any) overloads but there are notable exceptions. Most importantly, constructor @?{}@, destructor @^?{}@, and assignment @?=?@ are generated for every user-defined type (@struct@ and @union@ in C), and in a large source file there can be hundreds of them. Furthermore, many calls are generated for initializing variables, passing arguments and copying values. This fact makes them the most overloaded and most called functions. In an object-oriented programming language, the object-method types are scoped within a class, so a call such as @obj.f()@ only needs to perform lookup in the method table corresponding to the type of @obj@. \CFA, on the other hand, does not have methods, and all types are open, \ie new operations can be defined on them without inheritance; at best a \CFA type can be constrained by a translation unit. However, the ``big 3'' operators have a unique property enforced by the language rules: the first parameter must be a reference to its associated type, which acts as the @this@ parameter in an object-oriented language. Since \CFA does not have class inheritance, the reference type must always match exactly. Therefore, argument-dependent lookup can be implemented for these operators by using a dedicated, fast symbol-table. The lookup key for the special functions is the mangled type name of the first parameter. To handle generic types, the type parameters are stripped off, and only the base type is matched. Note a constructor (destructor, assignment operator) may not take an arbitrary @this@ argument, \eg @forall( dtype T ) void ?{}( T & )@, thus guaranteeing that if the @this@ type is known, all possible overloads can be found by searching with this given type. In the case where the @this@ argument itself is overloaded, it is resolved first and all possible result types are used for lookup.
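To illustrate the lookup key, the following sketch (the type @Pair@ and the exact signatures are illustrative, mirroring the assertion forms shown later for @otype@) declares the ``big 3'' operations, all keyed by the mangled name of @Pair@, the type referenced by the first parameter:
\begin{cfa}
struct Pair { int x, y; };
void  ?{}( Pair & this );               // default constructor
void  ?{}( Pair & this, Pair that );    // copy constructor
void ^?{}( Pair & this );               // destructor
Pair & ?=?( Pair & this, Pair & that ); // assignment

Pair p;  // generated constructor call: only candidates keyed by Pair's mangled name are considered
\end{cfa}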
Note that for a generated expression, the particular variable for the @this@ argument is fully known, without overloads, so the majority of constructor-call resolutions only need to check for one given object type. Explicit constructor calls and assignment statements sometimes require lookup for multiple types. In the extremely rare case that the @this@-argument type is unbound, all necessary types are guaranteed to be checked, as for the previous lookup without the argument-dependent lookup; fortunately, this complex case almost never happens in practice. An example is found in the library function @new@: \begin{cfa} forall( dtype T | sized( T ), ttype TT | { void ?{}( T &, TT ); } ) T * new( TT p ) { return &(*malloc()){ p }; } \end{cfa} as @malloc@ may return a pointer to any type, depending on context. Interestingly, this particular line of code actually caused another complicated issue, where the unusually massive work of checking every constructor present makes the case even worse. Section~\ref{s:TtypeResolutionInfiniteRecursion} presents a detailed analysis for the problem. The ``callable'' operator @?()@ (cf. @operator()@ in \CC) could also be included in the special operator list, as it is usually only on user-defined types, and the restriction that first argument must be a reference seems reasonable in this case. as @malloc@ may return a pointer to any type, depending on context. Interestingly, this particular declaration actually causes another complicated issue, making the complex checking of every constructor even worse. \VRef[Section]{s:TtypeResolutionInfiniteRecursion} presents a detailed analysis of this problem. The ``callable'' operator @?()@ (cf. @operator()@ in \CC) can also be included in this special operator list, as it is usually only on user-defined types, and the restriction that the first argument must be a reference seems reasonable in this case. \subsection{Improvement of function type representation} Since substituting type parameters with their bound types is one fundamental operation in many parts of the resolver algorithm (particularly unification and environment binding), making as few copies of type nodes as possible helps reduce memory complexity. Even with the new memory management model, allocation is still a significant factor of resolver performance. Conceptually, operations on type nodes of AST should be performed in functional programming style, treating the data structure as immutable and only copying when necessary. The in-place mutation is a mere optimization that does not change logic of operations. The model was broken on function types by an inappropriate design. Function types require some special treatment due to the existence of assertions. In particular, it must be able to distinguish two different kinds of type parameter usage: Since substituting type parameters with their bound types is one fundamental operation in many parts of the resolver algorithm (particularly unification and environment binding), making as few copies of type nodes as possible helps reduce memory complexity. Even with the new memory management model, allocation is still a significant factor of resolver performance. Conceptually, operations on type nodes of the AST should be performed in functional-programming style, treating the data structure as immutable and only copying when necessary. The in-place mutation is a mere optimization that does not change the logic for operations. However, the model was broken for function types by an inappropriate design.
Function types require special treatment due to the existence of assertions that constrain the types it supports. Specifically, it must be possible to distinguish two different kinds of type parameter usage: \begin{cfa} forall( dtype T ) void foo( T * t ) { forall( dtype U ) void bar( T * t, U * u ) { ... } } \end{cfa} Here, only @U@ is a free parameter in declaration of @bar@, as it appears in the function's own forall clause; while @T@ is not free. Moreover, the resolution algorithm also has to distinguish type bindings of multiple calls to the same function, for example with forall( dtype U ) void bar( @T@ * t, @U@ * u ) { ... } } \end{cfa} Here, only @U@ is a free parameter in the nested declaration of function @bar@, as @T@ must be bound at the call site when resolving @bar@. Moreover, the resolution algorithm also has to distinguish type bindings of multiple calls to the same function, \eg: \begin{cfa} forall( dtype T ) int foo( T x ); foo( foo( 1.0 ) ); \end{cfa} The inner call has binding (T: double) while the outer call has binding (T: int). Therefore a unique representation of free parameters in each expression is required. This was previously done by creating a copy of the parameter declarations inside function type, and fixing references afterwards. However, fixing references is an inherently deep operation that does not work well with functional programming model, as it must be evaluated eagerly on the entire syntax tree representing the function type. The revised approach generates a unique ID value for each function call expression instance and represents an occurrence of free parameter type with a pair of generated ID and the original parameter declaration, so that references do not need to be fixed, and a shallow copy of function type is possible. Note that after the change, all declaration nodes in syntax tree representation maps one-to-one with the actual declarations in the program, and therefore are guaranteed to be unique. Such property can potentially enable more optimizations, and some related ideas are presented after Section~\ref{s:SharedSub-ExpressionCaseUniqueExpressions}. int i = foo( foo( 1.0 ) ); \end{cfa} The inner call has binding (T: double) while the outer call has binding (T: int). Therefore a unique representation for the free parameters is required in each expression. This type binding was previously done by creating a copy of the parameter declarations inside the function type and fixing references afterwards. However, fixing references is an inherently deep operation that does not work well with the functional-programming style, as it forces eager evaluation on the entire syntax tree representing the function type. The revised approach generates a unique ID value for each function call expression instance and represents an occurrence of a free-parameter type with a pair of generated ID and original parameter declaration, so references are unique and a shallow copy of the function type is possible. Note that after the change, all declaration nodes in the syntax-tree representation now map one-to-one with the actual declarations in the program, and therefore are guaranteed to be unique. This property can potentially enable more optimizations, and some related ideas are presented at the end of \VRef{s:SharedSub-ExpressionCaseUniqueExpressions}. \subsection{Improvement of pruning steps} A minor improvement for candidate elimination is to skip the step on the function overloads themselves and only perform on results of function application. 
As function calls are usually by name, the name resolution rule dictates that every function candidate necessarily has a different type; indirect function calls are rare, and when they do appear, they usually will not have many possible interpretations, and those rarely match exactly in argument type. Since function types have a much more complex representation than data types (with multiple parameters and assertions), checking equality on them also takes longer. A brief test of this approach shows that the number of function overloads considered in expression resolution increases by a negligible amount of less than 1 percent, while type comparisons in candidate elimination are cut by more than half. Improvement is consistent over all \CFA source files in the test suite. A minor improvement for candidate elimination is to skip the step on the function overloads and only check the results of function application. As function calls are usually by name (versus pointers to functions), the name resolution rule dictates that every function candidate necessarily has a different type; indirect function calls are rare, and when they do appear, there are even fewer cases with multiple interpretations, and these rarely match exactly in argument type. Since function types have a much more complex representation (with multiple parameters and assertions) than data types, checking equality on them also takes longer. A brief test of this approach shows that the number of function overloads considered in expression resolution increases by less than 1 percent, while type comparisons in candidate elimination are reduced by more than half. This improvement is consistent over all \CFA source files in the test suite. \label{s:SharedSub-ExpressionCaseUniqueExpressions} A unique expression denotes an expression that must be evaluated only once, to prevent unwanted side effects. It is currently only a compiler artifact, generated on a tuple member expression of the form A unique expression denotes an expression evaluated only once to prevent unwanted side effects. It is currently only a compiler artifact, generated for a tuple-member expression of the form: \begin{cfa} struct S { int a; int b; }; S s; s.[a, b]; // tuple member expression, type is [int, int] \end{cfa} If the aggregate expression contains function calls, it cannot be evaluated multiple times: If the aggregate expression is a function call, it cannot be evaluated multiple times: \begin{cfa} S makeS(); makeS().[a, b]; // this should only make one S makeS().[a, b]; // this should only generate a unique S \end{cfa} Before code generation, the above expression is internally represented as \end{cfa} at code generation, where @_unique_var@ and @_unique_var_evaluated@ are generated variables whose scope covers all appearances of the same expression. Note that although the unique expression is only used for tuple expansion now, it is a generally useful construction, and can be seen in other languages, such as Scala's @lazy val@~\cite{Scala}; therefore it could be worthwhile to introduce the unique expression to a broader context in \CFA and even make it directly available to programmers. In the compiler's visitor pattern, however, this creates a problem where multiple paths to a logically unique expression exist, so it may be modified more than once and become ill-formed; some specific intervention is required to ensure that unique expressions are only visited once.
Furthermore, a unique expression appearing in more than one place will be copied on mutation so its representation is no longer unique. Some hacks are required to keep it in sync, and the methods are different when mutating the unique expression instance itself or its underlying expression. Example when mutating the underlying expression (visit-once guard) The conditional check ensures a single call to @makeS()@ even though there are logically multiple calls because of the tuple field expansion. Note that although the unique expression is only used for tuple expansion now, it is a generally useful construction, and is seen in other programming languages, such as Scala's @lazy val@~\cite{Scala}; therefore it may be worthwhile to introduce the unique expression to a broader context in \CFA and even make it directly available to programmers. In the compiler's visitor pattern, however, this creates a problem where multiple paths to a logically unique expression exist, so it may be modified more than once and become ill-formed; some specific intervention is required to ensure unique expressions are only visited once. Furthermore, a unique expression appearing in more than one place is copied on mutation so its representation is no longer unique. Currently, special cases are required to keep everything synchronized, and the methods are different when mutating the unique expression instance itself or its underlying expression: \begin{itemize} \item When mutating the underlying expression (visit-once guard) \begin{cfa} void InsertImplicitCalls::previsit( const ast::UniqueExpr * unqExpr ) { if ( visitedIds.count( unqExpr->id ) ) visit_children = false; @if ( visitedIds.count( unqExpr->id ) ) visit_children = false;@ else visitedIds.insert( unqExpr->id ); } \end{cfa} Example when mutating the unique instance itself, which actually creates copies \item When mutating the unique instance itself, which actually creates copies \begin{cfa} auto mutExpr = mutate( unqExpr ); // internally calls copy when shared if ( ! unqMap.count( unqExpr->id ) ) { @if ( ! unqMap.count( unqExpr->id ) ) {@ ... } else { } \end{cfa} Such a workaround seems difficult to fit into a common visitor template. This suggests the memory model may need different kinds of nodes to accurately represent the syntax tree. Together with the fact that declaration nodes are always unique, it is possible that AST nodes can be classified by three different types: \begin{itemize} \item \textbf{Strictly unique} with only one owner (declarations); \item \textbf{Logically unique} with (possibly) many owners but should not be copied (unique expression example presented here); \item \textbf{Shared} by functional programming model, which assumes immutable data structures that are copied on mutation. \end{itemize} Such workarounds are difficult to fit into the common visitor pattern, which suggests the memory model may need different kinds of nodes to accurately represent this feature in the AST. Given that declaration nodes are unique, it is possible for AST nodes to be divided into three different types: \begin{itemize} \item \textbf{Singleton} with only one owner (declarations); \item \textbf{No-copy} with multiple owners but cannot be copied (unique expression example presented here); \item \textbf{Copy} by functional-programming style, which assumes immutable data structures that are copied on mutation. \end{itemize} The boilerplate code can potentially handle these three cases differently.
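Returning to the tuple-expansion example above, the internal representation elided in this diff can be sketched as follows; the exact generated code differs, but @_unique_var@ and @_unique_var_evaluated@ are the names used in the surrounding text:
\begin{cfa}
// Sketch only: one guarded evaluation shared by all expanded fields.
S _unique_var;
bool _unique_var_evaluated = false;
// each field access from makeS().[a, b] expands to a guarded form such as:
( _unique_var_evaluated ? _unique_var
	: ( _unique_var_evaluated = true, _unique_var = makeS(), _unique_var ) ).a;
\end{cfa}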
\section{Analysis of resolver algorithm complexity} The focus of this chapter is to identify and analyze some realistic cases that cause the resolver algorithm to have an exponential run time. As previous work has shown [3], the overload resolution problem in \CFA has worst-case exponential complexity; however, only a few specific patterns can trigger the exponential complexity in practice. Implementing heuristic-based optimization for those selected cases is helpful to alleviate the problem. The focus of this section is to identify and analyze some realistic cases that cause the resolver algorithm to have an exponential runtime. As previous work has shown~\cite[\S~4.2.1]{Moss19}, the overload resolution problem in \CFA has worst-case exponential complexity; however, only a few specific patterns can trigger the exponential complexity in practice. Implementing heuristic-based optimization for those selected cases is helpful to alleviate the problem. \label{s:UnboundReturnType} The interaction of return type overloading and polymorphic functions creates this problem of function calls with unbound return type, and is further complicated by the presence of assertions. The interaction of return-type overloading and polymorphic functions creates function calls with unbound return types, and is further complicated by the presence of assertions. The prime example of a function with unbound return type is the type-safe version of C @malloc@: \begin{cfa} // size deduced from type, so no need to provide the size argument forall( dtype T | sized( T ) ) T * malloc( void ); \end{cfa} Unbound return type can be problematic in resolver algorithm complexity because a single match of function call with unbound return type may create multiple candidates. In the worst case, consider a function declared to return any @otype@: forall( dtype T | sized( T ) ) T * malloc( void ) { return (T *)malloc( sizeof(T) ); } // call C malloc int * i = malloc();  // type deduced from left-hand side $\Rightarrow$ no size argument or return cast \end{cfa} An unbound return-type is problematic in resolver complexity because a single match of a function call with an unbound return type may create multiple candidates. In the worst case, consider a function declared that returns any @otype@ (defined \VPageref{otype}): \begin{cfa} forall( otype T ) T anyObj( void ); \end{cfa} As the resolver attempts to satisfy the otype constraint on @T@, a single call to @anyObj()@ without the result type known creates at least as many candidates as the number of complete types currently in scope; with generic types it becomes even worse, for example, assuming a declaration of generic pair is available at that point: As the resolver attempts to satisfy the otype constraint on @T@, a call to @anyObj()@ in an expression, without the result type known, creates at least as many candidates as the number of complete types currently in scope; with generic types it becomes even worse, \eg assuming a declaration of a generic @pair@ is available at that point: \begin{cfa} forall( otype T, otype U ) struct pair { T first; U second; }; \end{cfa} Then an @anyObj()@ call can result in arbitrarily complex types, such as @pair( pair( int,int ), pair( int,int ) )@, and the depth can grow indefinitely until the specified parameter depth limit, thus creating exponentially many candidates.
However, the expected types allowed by parent expressions are practically very few, so most of those interpretations are invalid; if the result type is never bound up to top level, by the semantic rules it is ambiguous if there are more than one valid bindings, and resolution can fail fast. It is therefore reasonable to delay resolving assertions on an unbound parameter in return type; however, with the current cost model, such behavior may further cause irregularities in candidate selection, such that the presence of assertions can change the preferred candidate, even when the order of expression costs is supposed to stay the same. Detailed analysis of this issue will be presented later, in the correctness part. Then an @anyObj()@ call can result in arbitrarily complex types, such as @pair( pair( int, int ), pair( int, int ) )@, and the depth can grow indefinitely until a specified parameter-depth limit, thus creating exponentially many candidates. However, the expected types allowed by parent expressions are practically very few, so most of those interpretations are invalid; if the result type is never bound up to the top level, by the semantic rules it is ambiguous if there is more than one valid binding and resolution fails quickly. It is therefore reasonable to delay resolving assertions on an unbound parameter in a return type; however, with the current cost model, such behavior may further cause irregularities in candidate selection, such that the presence of assertions can change the preferred candidate, even when the order of expression costs is supposed to stay the same. A detailed analysis of this issue is presented in \VRef{s:AnalysisTypeSystemCorrectness}. \label{s:TtypeResolutionInfiniteRecursion} @ttype@ (``tuple type'') is a relatively new addition to the language that attempts to provide type-safe variadic argument semantics. Unlike regular @dtype@ parameters, @ttype@ is only valid in function parameter list, and may only appear once as the type of last parameter. At the call site, a @ttype@ parameter is bound to the tuple type of all remaining function call arguments. @ttype@ (``tuple type'') is a relatively new addition to the language that attempts to provide type-safe variadic argument semantics. Unlike regular @dtype@ parameters, @ttype@ is only valid in a function parameter-list, and may only appear once as the last parameter type. At the call site, a @ttype@ parameter is bound to the tuple type of all remaining function-call arguments.
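As a minimal illustration of this binding rule (the function @f@ is a made-up example, not library code), the trailing arguments are collected into a tuple type:
\begin{cfa}
forall( ttype Params ) void f( int first, Params rest );

f( 1, 2.5 );         // Params bound to [double]
f( 1, 2.5, 'c' );    // Params bound to [double, char]
\end{cfa}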
There are two kinds of idiomatic @ttype@ usage: one is to provide flexible argument forwarding, similar to the variadic template in \CC (\lstinline[language=C++]|template|), as shown below in the implementation of @unique_ptr@ T * data; }; forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); }) void ?{}( unique_ptr( T ) & this, Args args ) { this.data = new( args ); } \end{cfa} the other is to implement structural recursion in the first-rest manner: \begin{cfa} forall( otype T, ttype Params | { void process( T ); void func( Params ); }) forall( dtype T | sized( T ), @ttype Args@ | { void ?{}( T &, Args ); }) void ?{}( unique_ptr( T ) & this, Args @args@ ) { this.data = new( @args@ );  // forward constructor arguments to dynamic allocator } \end{cfa} The other usage is to implement structural recursion in the first-rest pattern: \begin{cfa} forall( otype T, @ttype Params@ | { void process( T ); void func( Params ); }) void func( T arg1, Params p ) { process( arg1 ); func( p ); } \end{cfa} For the second use case, it is important that the number of parameters in the recursive call go down, since the call site must deduce all assertion candidates, and that is only possible if by just looking at argument types (and not their values), the recursion is known to be completed in a finite number of steps. In recent experiments, however, some flaw in the type binding rules can lead to the first kind of @ttype@ use case produce an invalid candidate that the resolver enters an infinite loop. This bug was discovered in an attempt to raise assertion recursive depth limit and one of the library program takes exponentially longer time to compile. The cause of the problem is identified to be the following set of functions. File @memory.cfa@ contains \begin{cfa} #include "memory.hfa" #include "stdlib.hfa" \end{cfa} where file @memory.hfa@ contains the @unique_ptr@ declaration above, and two other similar functions with @ttype@ parameter: \begin{cfa} forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); }) { func( @p@ );  // recursive call until base case of one argument } \end{cfa} For the second use case, it is imperative that the number of parameters in the recursive call goes down, since the call site must deduce all assertion candidates, and that is only possible if by observation of the argument types (and not their values), the recursion is known to be completed in a finite number of steps. In recent experiments, however, a flaw in the type-binding rules can lead to the first kind of @ttype@ use case producing an invalid candidate, causing the resolver to enter an infinite loop. This bug was discovered in an attempt to raise the assertion recursive-depth limit and one of the library programs took exponentially longer to compile. The cause of the problem is the following set of functions: \begin{cfa} // unique_ptr  declaration from above forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); } ) { // distribute forall clause void ?{}( counter_data( T ) & this, Args args ); void ?{}( counter_ptr( T ) & this, Args args ); void ?{}( unique_ptr( T ) & this, Args args ); } \end{cfa} File @stdlib.hfa@ contains \begin{cfa} forall( dtype T | sized( T ), ttype TT | { void ?{}( T &, TT ); } ) T * new( TT p ) { return &(*malloc()){ p }; } \end{cfa} In the expression @(*malloc()){p}@, the type of object being constructed is yet unknown, since the return type information is not immediately provided.
That caused every constructor to be searched, and while normally a bound @ttype@ cannot be unified with any free parameter, it is possible with another free @ttype@. Therefore, in addition to the correct option provided by assertion, 3 wrong options are examined, each of which again requires the same assertion, for an unknown base type T and @ttype@ arguments, and that becomes an infinite loop, until the specified recursion limit is reached and resolution is forced to fail. Moreover, during the recursion steps, the number of candidates grows exponentially, since there are always 3 options at each step. Unfortunately, @ttype@ to @ttype@ binding is necessary, to allow calling the function provided by assertion indirectly. \begin{cfa} forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); }) void ?{}( unique_ptr( T ) & this, Args args ) { this.data = (T * )new( args ); } \end{cfa} Here the constructor assertion is used for the @new( args )@ call. T * new( TT p ) { return @&(*malloc()){ p };@ } \end{cfa} In the expression @(*malloc()){p}@, the type of the object being constructed is unknown, since the return-type information is not immediately available. That causes every constructor to be searched, and while normally a bound @ttype@ cannot be unified with any free parameter, it is possible with another free @ttype@. Therefore, in addition to the correct option provided by the assertion, 3 wrong options are examined, each of which again requires the same assertion, for an unknown base-type @T@ and @ttype@ argument, which becomes an infinite loop until the specified recursion limit is reached and resolution fails. Moreover, during the recursion steps, the number of candidates grows exponentially, since there are always 3 options at each step. Unfortunately, @ttype@ to @ttype@ binding is necessary, to allow indirectly calling a function provided in an assertion. \begin{cfa} forall( dtype T | sized( T ), ttype Args | { @void ?{}( T &, Args );@ }) void ?{}( unique_ptr( T ) & this, Args args ) { this.data = (T *)@new( args )@; } // constructor call \end{cfa} Here the constructor assertion is used by the @new( args )@ call to indirectly call the constructor on the allocated storage. Therefore, it is hard, perhaps impossible, to solve this problem by tweaking the type binding rules. An assertion caching algorithm can help improve this case by detecting cycles in recursion. Meanwhile, without the caching algorithm implemented, some changes in the \CFA source code are enough to eliminate this problem, at least in the current codebase. Note that the issue only happens with an overloaded variadic function, which rarely appears in practice, since the idiomatic use cases are for argument forwarding and self-recursion. The only overloaded @ttype@ function so far discovered in all of \CFA standard library code is the constructor, and by utilizing the argument-dependent lookup process described in Section~\ref{s:UnboundReturnType}, adding a cast before constructor call gets rid of the issue. \begin{cfa} T * new( TT p ) { return &(*(T * )malloc()){ p }; } Meanwhile, without a caching algorithm implemented, some changes in the \CFA source code are enough to eliminate this problem, at least in the current codebase. Note that the issue only happens with an overloaded variadic function, which rarely appears in practice, since the idiomatic use cases are for argument forwarding and self-recursion.
The only overloaded @ttype@ function so far discovered in all of \CFA standard library is the constructor, and by utilizing the argument-dependent lookup process described in \VRef{s:UnboundReturnType}, adding a cast before the constructor call removes the issue. \begin{cfa} T * new( TT p ) { return &(*@(T * )@malloc()){ p }; } \end{cfa} \subsection{Reused assertions in nested generic type} The following test of deeply nested dynamic generic type reveals that locally caching reused assertions is necessary, rather than just a resolver optimization, because recomputing assertions can result in bloated generated code size: The following test of deeply nested, dynamic generic type reveals that locally caching reused assertions is necessary, rather than just a resolver optimization, because recomputing assertions can result in bloated generated code size: \begin{cfa} struct nil {}; int main() { #if   N==0 nil x; nil @x@; #elif N==1 cons( size_t, nil ) x; cons( size_t, nil ) @x@; #elif N==2 cons( size_t, cons( size_t, nil ) ) x; cons( size_t, cons( size_t, nil ) ) @x@; #elif N==3 cons( size_t, cons( size_t, cons( size_t, nil ) ) ) x; cons( size_t, cons( size_t, cons( size_t, nil ) ) ) @x@; // similarly for N=4,5,6 #endif } \end{cfa} At the declaration of @x@, it is implicitly initialized by generated constructor call, whose signature is given by At the declaration of @x@, it is implicitly initialized by generated constructor call, with signature: \begin{cfa} forall( otype L, otype R ) void ?{}( cons( L, R ) & ); \end{cfa} Note that the @otype@ constraint contains 4 assertions: where the @otype@ constraint contains the 4 assertions:\label{otype} \begin{cfa} void ?{}( L & ); // default constructor L & ?=?( L &, L & ); // assignment \end{cfa} Now since the right hand side of outermost cons is again a cons, recursive assertions are required. When the compiler cannot cache and reuse already resolved assertions, it becomes a problem, as each of those 4 pending assertions again asks for 4 more assertions one level below. Without any caching, number of resolved assertions grows exponentially, while that is obviously unnecessary since there are only $n+1$ different types involved. Even worse, this causes exponentially many wrapper functions generated later at the codegen step, and results in huge compiled binary. \begin{table}[h] \begin{table}[htb] \centering \caption{Compilation results of nested cons test} \label{t:NestedConsTest} \begin{tabular}{|r|r|r|} \hline \end{table} As the local functions are implemented by emitting executable code on the stack~\cite{gcc-nested-func}, it eventually means that compiled code also has exponential run time. This problem has evident practical implications, as nested collection types are frequently used in real production code. Now since the right hand side of outermost cons is again a cons, recursive assertions are required. \VRef[Table]{t:NestedConsTest} shows when the compiler does not cache and reuse already resolved assertions, it becomes a problem, as each of these 4 pending assertions again asks for 4 more assertions one level below. Without caching, the number of resolved assertions grows exponentially, which is unnecessary since there are only $n+1$ different types involved. Even worse, this problem causes exponentially many wrapper functions to be generated at the backend, resulting in a huge binary. 
As the local functions are implemented by emitting executable code on the stack~\cite{gcc-nested-func}, it means that compiled code also has exponential run time. This problem has practical implications, as nested collection types are frequently used in real production code. \section{Analysis of type system correctness} \label{s:AnalysisTypeSystemCorrectness} In Moss' thesis~\cite[\S~4.1.2,~p.~45]{Moss19}, the author presents the following example: From the set of candidates whose parameter and argument types have been unified and whose assertions have been satisfied, those whose sub-expression interpretations have the smallest total cost of conversion are selected ... The total cost of conversion for each of these candidates is then calculated based on the implicit conversions and polymorphism involved in adapting the types of the sub-expression interpretations to the formal parameter types. \end{quote} With this model, the algorithm picks @g1@ in resolving the @f( g( 42 ) )@ call, which seems to be undesirable. There are further evidence that shows the Bilson model is fundamentally incorrect, following the discussion of unbound return type in Section~\ref{s:UnboundReturnType}. By the conversion cost specification, a binding from a polymorphic type parameter to a concrete type incurs a polymorphic cost of 1. It remains unspecified \emph{when} the type parameters should become bound. When the parameterized types appear in the function parameters, they can be deduced from the argument type, and there is no ambiguity. In the unbound return case, however, the binding may happen at any stage in expression resolution, therefore it is impossible to define a unique local conversion cost. Note that type binding happens exactly once per parameter in resolving the entire expression, so the global binding cost is unambiguously 1. As per the current compiler implementation, it does have a notable inconsistency in handling such case. For any unbound parameter that does \emph{not} come with an associated assertion, it remains unbound to the parent expression; for those that does however, they are immediately bound in the assertion resolution step, and concrete result types are used in the parent expressions. With this model, the algorithm picks @g1@ in resolving the @f( g( 42 ) )@ call, which is undesirable. There is further evidence that shows the Bilson model is fundamentally incorrect, following the discussion of unbound return type in \VRef{s:UnboundReturnType}. By the conversion-cost specification, a binding from a polymorphic type-parameter to a concrete type incurs a polymorphic cost of 1. It remains unspecified \emph{when} the type parameters should become bound. When the parameterized types appear in function parameters, they can be deduced from the argument type, and there is no ambiguity. In the unbound return case, however, the binding may happen at any stage in expression resolution, therefore it is impossible to define a unique local conversion cost. Note that type binding happens exactly once per parameter in resolving the entire expression, so the global binding cost is unambiguously 1. In the current compiler implementation, there is a notable inconsistency in handling this case. For any unbound parameter that does \emph{not} come with an associated assertion, it remains unbound to the parent expression; for those that do, however, they are immediately bound in the assertion resolution step, and concrete result types are used in the parent expressions. 
There is further evidence that shows the Bilson model is fundamentally incorrect, following the discussion of unbound return type in \VRef{s:UnboundReturnType}.
By the conversion-cost specification, a binding from a polymorphic type-parameter to a concrete type incurs a polymorphic cost of 1.
It remains unspecified \emph{when} the type parameters should become bound.
When the parameterized types appear in function parameters, they can be deduced from the argument type, and there is no ambiguity.
In the unbound return case, however, the binding may happen at any stage in expression resolution, therefore it is impossible to define a unique local conversion cost.
Note that type binding happens exactly once per parameter in resolving the entire expression, so the global binding cost is unambiguously 1.
In the current compiler implementation, there is a notable inconsistency in handling this case.
For any unbound parameter that does \emph{not} come with an associated assertion, it remains unbound to the parent expression; for those that do, however, they are immediately bound in the assertion resolution step, and concrete result types are used in the parent expressions.

Consider the following example:
\begin{cfa}
forall( dtype T | @{ void g( T * ); }@ ) T * f( void );
void g( int * );
void h( int * );
\end{cfa}
The expression @h( f() )@ eventually has a total cost of 1 from binding (T: int), but in the eager-resolution model, the cost of 1 may occur either at the call to @f@ or at the call to @h@; with the assertion resolution triggering a binding, the local cost of @f()@ is (0 poly, 0 spec) with no assertions, but (1 poly, -1 spec) with an assertion, and that contradicts the principle that adding assertions should make expression cost lower.
Furthermore, the time at which type binding and assertion resolution happen is an implementation detail of the compiler, not part of the language definition.
That means two compliant \CFA compilers, one performing immediate assertion resolution at each step, and one delaying assertion resolution on unbound types, can produce different expression costs and therefore different candidate selections, making the language rule itself partially undefined, and therefore, unsound.
By the above reasoning, the updated cost model using a global sum of costs should be accepted as the standard.
It also allows the compiler to freely choose when to resolve assertions, as the sum of total costs is independent of that choice; more optimizations regarding assertion resolution can also be implemented.
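One way to phrase the global model as a formula (a sketch of the idea, not the formal specification) is
\[ \mathit{cost}( e ) = \sum_{s \in \mathit{subexprs}( e )} \mathit{conv}( s ) \; + \sum_{T \in \mathit{bindings}( e )} \mathit{poly}( T ) \]
where each type binding contributes its polymorphic cost exactly once over the whole expression, so the per-call attribution that causes the inconsistency above disappears, regardless of the stage at which the resolver performs the binding.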
\section{Timing results}

For the timing results presented here, the \CFA compiler is built with gcc 9.3.0, and tested on a server machine running Ubuntu 20.04, with 64GB RAM and a 32-core 2.2 GHz CPU.
Timing is reported by the @time@ command and an experiment is run using 8 cores, where each core is at 100\% CPU utilization.
On the most recent build, the \CFA standard library ($\approx$1.3 MB of source code) compiles in 4 minutes 47 seconds total processor time (single-thread equivalent), with the slowest file taking 13 seconds.
The test suite (178 test cases, $\approx$2.2 MB of source code) completes within 25 minutes total processor time, of which approximately 21 minutes is compilation and the remainder is a few runtime tests; the slowest file takes 23 seconds to compile.
In contrast, the library build with the old compiler takes 85 minutes total, 5 minutes for the slowest file.
The full test suite takes too long with the old compiler build and is therefore not run, but the slowest test cases take approximately 5 minutes.
Overall, the most recent build compared to an old build is consistently faster by a factor of 20.
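For concreteness, this factor follows from the numbers above: the library build improves from 85 minutes to 4 minutes 47 seconds, a speedup of $85 / 4.78 \approx 18$, and its slowest file improves from 5 minutes to 13 seconds, a speedup of $300 / 13 \approx 23$, bracketing the stated factor of 20.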
Additionally, 6 selected \CFA source files with distinct features from the library and test suite are used to illustrate the compiler performance change after each of the implemented optimizations.
Test files are from the most recent build and are run through the C preprocessor to expand header files and perform macro expansion, but without line-number information (@gcc -E -P@).
\VRef[Table]{t:SelectedFileByCompilerBuild} shows the selected tests:
\begin{itemize}
\item
@lib/fstream@ (112 KB): implementation of the I/O library
\item
@lib/mutex@ (166 KB): implementation of concurrency primitives
\item
@lib/stdlib@ (64 KB): type-safe wrapper to @void *@-based C standard-library functions
\item
@test/io2@ (55 KB): application of the I/O library
\item
@test/thread@ (188 KB): application of the threading library
\end{itemize}
versus \CFA compiler builds picked from the git commit history (each passing the test suite) that implement the optimizations incrementally:
\begin{itemize}
\item
the old resolver, with no memory sharing and none of the following optimizations
\item
\#0 is the first working build of the new AST data structure
\item
\#1 implements the special symbol table and argument-dependent lookup
\item
\#2 implements late assertion-satisfaction
\item
\#3 implements the revised function-type representation
\item
\#4 skips pruning on expressions for function types (most recent build)
\end{itemize}
Reading left to right for a test shows the benefit of each optimization on the cost of compilation.
\begin{table}[htb]
\centering
\caption{Compile time of selected files by compiler build, in seconds}
\label{t:SelectedFileByCompilerBuild}
\begin{tabular}{|l|r|r|r|r|r|r|}
\hline
\end{table}
\section{Conclusion}

Over the course of 8 months of active research and development of the \CFA type system and compiler algorithms, performance of the reference \CFA compiler, cfa-cc, has been greatly improved.
Now, mid-sized \CFA programs are compiled reasonably fast.
Currently, there are ongoing efforts by the \CFA team to augment the standard library and evaluate its runtime performance, and to incorporate \CFA with existing software written in C; therefore, this project is especially meaningful for these practical purposes.

Accomplishing this work was difficult.
Analysis conducted in the project is based significantly on heuristics and practical evidence, as the theoretical bounds and average cases for the expression resolution problem differ.
As well, the slowness of the initial compiler made attempts to understand why and where problems exist extremely difficult, because both debugging and validation tools (\eg @gdb@, @valgrind@, @perf@) further slowed down compilation time.
However, by the end of the project, I had found and fixed several significant problems, and new optimizations are now easier to introduce and test.
The reduction in the development cycle benefits the \CFA team as a whole.

Some potential issues of the language, which happen frequently in practice, have been identified.
Due to the time constraints and complex nature of these problems, a handful of them remain unsolved, but some constructive proposals are made.
Notably, introducing a local assertion cache in the resolver is a reasonable solution for a few remaining problems, so that should be the focus of future work.

The \CFA team is planning a public alpha release of the language, as the compiler performance, given my recent improvements, is now usable.
Other parts of the system, such as the standard library, have made significant gains due to the speed-up in the development cycle.
Ideally, the remaining problems should be resolved before release, and the solutions will also be integral to drafting a formal specification.

\addcontentsline{toc}{section}{\refname}