Changeset d95969a for doc/theses


Timestamp:
Jan 25, 2021, 3:45:42 PM
Author:
Thierry Delisle <tdelisle@…>
Branches:
arm-eh, jacob/cs343-translation, master, new-ast-unique-expr
Children:
c292244
Parents:
b6a8b31 (diff), 7158202 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc/theses
Files:
3 added
9 edited

  • doc/theses/andrew_beach_MMath/.gitignore

    rb6a8b31 rd95969a  
    33
    44# Final Files:
    5 thesis.pdf
     5*.pdf
    66
    77# The Makefile here is not generated.
  • doc/theses/andrew_beach_MMath/Makefile

    rb6a8b31 rd95969a  
    11### Makefile for Andrew Beach's Masters Thesis
    22
    3 DOC=thesis.pdf
     3DOC=uw-ethesis.pdf
    44BUILD=out
    55TEXSRC=$(wildcard *.tex)
     
    77STYSRC=$(wildcard *.sty)
    88CLSSRC=$(wildcard *.cls)
    9 TEXLIB= .:${BUILD}:
     9TEXLIB= .:../../LaTeXmacros:${BUILD}:
    1010BIBLIB= .:../../bibliography
    1111
     
    2929        ${LATEX} ${BASE}
    3030        ${BIBTEX} ${BUILD}/${BASE}
     31        ${LATEX} ${BASE}
    3132        ${GLOSSARY} ${BUILD}/${BASE}
    3233        ${LATEX} ${BASE}
  • doc/theses/andrew_beach_MMath/existing.tex

    rb6a8b31 rd95969a  
    1 \chapter{\CFA{} Existing Features}
    2 
    3 \section{Overloading and extern}
    4 Cforall has overloading, allowing multiple definitions of the same name to
    5 be defined.
    6 
    7 This also adds name mangling so that the assembly symbols are unique for
    8 different overloads. For compatability with names in C there is also a
    9 syntax to diable the name mangling. These unmangled names cannot be overloaded
    10 but act as the interface between C and \CFA code.
    11 
    12 The syntax for disabling mangling is:
    13 \begin{lstlisting}
    14 extern "C" {
    15     ...
    16 }
    17 \end{lstlisting}
    18 
    19 To re-enable mangling once it is disabled the syntax is:
    20 \begin{lstlisting}
    21 extern "Cforall" {
    22     ...
    23 }
    24 \end{lstlisting}
    25 
    26 Both should occur at the declaration level and effect all the declarations
    27 in \texttt{...}. Neither care about the state of mangling when they begin
    28 and will return to that state after the group is finished. So re-enabling
    29 is only used to nest areas of mangled and unmangled declarations.
    30 
    31 \section{References}
    32 \CFA adds references to C. These are auto-dereferencing pointers and use the
    33 same syntax as pointers except they use ampersand (\codeCFA{\&}) instead of
    34 the asterisk (\codeCFA{*}). They can also be constaint or mutable, if they
    35 are mutable they may be assigned to by using the address-of operator
    36 (\codeCFA\&) which converts them into a pointer.
     1\chapter{\texorpdfstring{\CFA Existing Features}{Cforall Existing Features}}
     2
     3\CFA (C-for-all)~\cite{Cforall} is an open-source project extending ISO C with
     4modern safety and productivity features, while still ensuring backwards
     5compatibility with C and its programmers.  \CFA is designed to have an
     6orthogonal feature-set based closely on the C programming paradigm
     7(non-object-oriented) and these features can be added incrementally to an
     8existing C code-base allowing programmers to learn \CFA on an as-needed basis.
     9
     10Only those \CFA features pertinent to this thesis are discussed.  Many of the
     11\CFA syntactic and semantic features used in the thesis should be fairly
     12obvious to the reader.
     13
     14\section{\texorpdfstring{Overloading and \lstinline|extern|}{Overloading and extern}}
     15\CFA has extensive overloading, allowing multiple definitions of the same name
     16to be defined.~\cite{Moss18}
     17\begin{cfa}
     18char i; int i; double i;                        $\C[3.75in]{// variable overload}$
     19int f(); double f();                            $\C{// return overload}$
     20void g( int ); void g( double );        $\C{// parameter overload}\CRT$
     21\end{cfa}
     22This feature requires name mangling so the assembly symbols are unique for
     23different overloads. For compatibility with names in C, there is also a syntax
     24to disable name mangling. These unmangled names cannot be overloaded but act as
     25the interface between C and \CFA code.  The syntax for disabling/enabling
     26mangling is:
     27\begin{cfa}
     28// name mangling
     29int i; // _X1ii_1
     30@extern "C"@ {  // no name mangling
     31        int j; // j
     32        @extern "Cforall"@ {  // name mangling
     33                int k; // _X1ki_1
     34        }
     35        // no name mangling
     36}
     37// name mangling
     38\end{cfa}
     39Both forms of @extern@ affect all the declarations within their nested lexical
     40scope and transition back to the previous mangling state when the lexical scope
     41ends.
     42
     43\section{Reference Type}
     44\CFA adds a rebindable reference type to C, which is more expressive than the \CC
     45reference.  Multi-level references are allowed and act like auto-dereferenced
     46pointers using the ampersand (@&@) instead of the pointer asterisk (@*@). \CFA
     47references may also be mutable or non-mutable. If mutable, a reference variable
     48may be assigned to using the address-of operator (@&@), which converts the
     49reference to a pointer.
     50\begin{cfa}
     51int i, j;
     52int @&@ ri = i, @&&@ rri = ri;
     53rri = 3;  // auto-dereference assign to i
     54@&@ri = @&@j; // rebindable
     55ri = 5;   // assign to j
     56\end{cfa}
    3757
    3858\section{Constructors and Destructors}
    3959
    4060Both constructors and destructors are operators, which means they are just
    41 functions with special names. The special names are used to define them and
    42 may be used to call the functions expicately. The \CFA special names are
    43 constructed by taking the tokens in the operators and putting \texttt{?} where
    44 the arguments would go. So multiplication is \texttt{?*?} while dereference
    45 is \texttt{*?}. This also make it easy to tell the difference between
    46 pre-fix operations (such as \texttt{++?}) and post-fix operations
    47 (\texttt{?++}).
    48 
    49 The special name for contructors is \texttt{?\{\}}, which comes from the
    50 initialization syntax in C. The special name for destructors is
    51 \texttt{\^{}?\{\}}. % I don't like the \^{} symbol but $^\wedge$ isn't better.
    52 
    53 Any time a type T goes out of scope the destructor matching
    54 \codeCFA{void ^?\{\}(T \&);} is called. In theory this is also true of
    55 primitive types such as \codeCFA{int}, but in practice those are no-ops and
    56 are usually omitted for optimization.
     61functions with special operator names rather than type names in \CC. The
     62special operator names may be used to call the functions explicitly (not
     63allowed in \CC for constructors).
     64
     65In general, operator names in \CFA are constructed by bracketing an operator
     66token with @?@, which indicates where the arguments go. For example, infix
     67multiplication is @?*?@ while prefix dereference is @*?@. This syntax makes it
     68easy to tell the difference between prefix operations (such as @++?@) and
     69post-fix operations (@?++@).
     70
     71The special name for a constructor is @?{}@, which comes from the
     72initialization syntax in C. The special name for a destructor is @^?{}@, where
     73the @^@ has no special meaning.
     74% I don't like the \^{} symbol but $^\wedge$ isn't better.
     75\begin{cfa}
     76struct T { ... };
     77void ?@{}@(@T &@ this, ...) { ... }  // constructor
     78void @^?{}@(@T &@ this, ...) { ... } // destructor
     79{
     80        T s = @{@ ... @}@;  // same constructor/initialization braces
     81} // destructor call automatically generated
     82\end{cfa}
     83The first parameter is a reference parameter to the type for the
     84constructor/destructor. Constructors may have multiple parameters.  The compiler
     85implicitly matches an overloaded constructor @void ?{}(T &, ...);@ to an
     86object declaration with associated initialization, and generates a construction
     87call after the object is allocated. When an object goes out of scope, the
     88matching overloaded destructor @void ^?{}(T &);@ is called.  Without explicit
     89definition, \CFA creates a default and copy constructor, destructor and
     90assignment (like \CC). It is possible to define constructors/destructors for
     91basic and existing types.
    5792
    5893\section{Polymorphism}
    59 \CFA uses polymorphism to create functions and types that are defined over
    60 different types. \CFA polymorphic declarations serve the same role as \CPP
    61 templates or Java generics.
    62 
    63 Polymorphic declaractions start with a forall clause that goes before the
    64 standard (monomorphic) declaration. These declarations have the same syntax
    65 except that you may use the names introduced by the forall clause in them.
    66 
    67 Forall clauses are written \codeCFA{forall( ... )} where \codeCFA{...} becomes
    68 the list of polymorphic variables (local type names) and assertions, which
    69 repersent required operations on those types.
    70 
    71 \begin{lstlisting}
    72 forall(dtype T | { void do_once(T &); })
    73 void do_twice(T & value) {
    74     do_once(value);
    75     do_once(value);
    76 }
    77 \end{lstlisting}
    78 
    79 A polymorphic function can be used in the same way normal functions are.
    80 The polymorphics variables are filled in with concrete types and the
    81 assertions are checked. An assertion checked by seeing if that name of that
    82 type (with all the variables replaced with the concrete types) is defined at
    83 the the call site.
    84 
    85 As an example, even if no function named \codeCFA{do_once} is not defined
    86 near the definition of \codeCFA{do_twice} the following code will work.
    87 \begin{lstlisting}
     94\CFA uses parametric polymorphism to create functions and types that are
     95defined over multiple types. \CFA polymorphic declarations serve the same role
     96as \CC templates or Java generics. The ``parametric'' means the polymorphism is
     97accomplished by passing argument operations to associated \emph{parameters} at
     98the call site, and these parameters are used in the function to differentiate
     99among the types the function operates on.
     100
     101Polymorphic declarations start with a universal @forall@ clause that goes
     102before the standard (monomorphic) declaration. These declarations have the same
     103syntax except they may use the universal type names introduced by the @forall@
     104clause.  For example, the following is a polymorphic identity function that
     105works on any type @T@:
     106\begin{cfa}
     107@forall( T )@ @T@ identity( @T@ val ) { return val; }
     108int forty_two = identity( 42 ); // T bound to int, forty_two == 42
     109\end{cfa}
     110
     111To allow a polymorphic function to be separately compiled, the type @T@ must be
     112constrained by the operations used on @T@ in the function body. The @forall@
     113clause is augmented with a list of polymorphic variables (local type names)
     114and assertions (constraints), which represent the required operations on those
     115types used in a function, \eg:
     116\begin{cfa}
     117forall( T @| { void do_once(T); }@) // assertion
     118void do_twice(T value) {
     119        do_once(value);
     120        do_once(value);
     121}
     122void do_once(int i) { ... }  // provide assertion
     123int i;
     124do_twice(i); // implicitly pass assertion do_once to do_twice
     125\end{cfa}
     126Any object with a type fulfilling the assertion may be passed as an argument to
     127a @do_twice@ call.
     128
     129A polymorphic function can be used in the same way as a normal function.  The
     130polymorphic variables are filled in with concrete types and the assertions are
     131checked. An assertion is checked by verifying each assertion operation (with
     132all the variables replaced with the concrete types from the arguments) is
     133defined at a call site.
     134
     135Note, a function named @do_once@ is not required in the scope of @do_twice@ to
     136compile it, unlike \CC template expansion. Furthermore, call-site inferencing
     137allows local replacement of the most specific parametric function needed for a
     138call.
     139\begin{cfa}
     140void do_once(double y) { ... } // global
    88141int quadruple(int x) {
    89     void do_once(int & y) {
    90         y = y * 2;
    91     }
    92     do_twice(x);
    93     return x;
    94 }
    95 \end{lstlisting}
    96 This is not the recommended way to implement a quadruple function but it
    97 does work. The complier will deduce that \codeCFA{do_twice}'s T is an
    98 integer from the argument. It will then look for a definition matching the
    99 assertion which is the \codeCFA{do_once} defined within the function. That
    100 function will be passed in as a function pointer to \codeCFA{do_twice} and
    101 called within it.
    102 
    103 To avoid typing out long lists of assertions again and again there are also
    104 traits which collect assertions into convenent packages that can then be used
    105 in assertion lists instead of all of their components.
    106 \begin{lstlisting}
    107 trait done_once(dtype T) {
    108     void do_once(T &);
    109 }
    110 \end{lstlisting}
    111 
    112 After this the forall list in the previous example could instead be written
    113 with the trait instead of the assertion itself.
    114 \begin{lstlisting}
    115 forall(dtype T | done_once(T))
    116 \end{lstlisting}
    117 
    118 Traits can have arbitrary number of assertions in them and are usually used to
    119 create short hands for, and give descriptive names to, commond groupings of
    120 assertions.
    121 
    122 Polymorphic structures and unions may also be defined by putting a forall
    123 clause before the declaration. The type variables work the same way except
    124 are now used in field declaractions instead of parameters and local variables.
    125 
    126 \begin{lstlisting}
     142        void do_once(int y) { y = y * 2; } // local
     143        do_twice(x); // using local "do_once"
     144        return x;
     145}
     146\end{cfa}
     147Specifically, the compiler deduces that @do_twice@'s T is an integer from the
     148argument @x@. It then looks for the most specific definition matching the
     149assertion, which is the nested integral @do_once@ defined within the
     150function. The matched assertion function is then passed as a function pointer
     151to @do_twice@ and called within it.
     152
     153To avoid typing long lists of assertions, constraints can be collected into
     154a convenient package called a @trait@, which can then be used in an assertion
     155instead of the individual constraints.
     156\begin{cfa}
     157trait done_once(T) {
     158        void do_once(T);
     159}
     160\end{cfa}
     161and the @forall@ list in the previous example is replaced with the trait.
     162\begin{cfa}
     163forall(dtype T | @done_once(T)@)
     164\end{cfa}
     165In general, a trait can contain an arbitrary number of assertions, both
     166functions and variables, and is usually used to create a shorthand for, and
     167give descriptive names to, common groupings of assertions describing a certain
     168functionality, like @sumable@, @listable@, \etc.
     169
     170Polymorphic structures and unions are defined by qualifying the aggregate type
     171with @forall@. The type variables work the same except they are used in field
     172declarations instead of parameters, returns, and local variable declarations.
     173\begin{cfa}
    127174forall(dtype T)
    128175struct node {
    129     node(T) * next;
    130     T * data;
    131 }
    132 \end{lstlisting}
    133 
    134 The \codeCFA{node(T)} is a use of a polymorphic structure. Polymorphic types
    135 must be provided their polymorphic parameters.
    136 
    137 There are many other features of polymorphism that have not given here but
    138 these are the ones used by the exception system.
     176        node(T) * next;  // generic linked node
     177        T * data;
     178}
     179\end{cfa}
     180The generic type @node(T)@ is an example of a polymorphic-type usage.  Like \CC
     181template usage, a polymorphic-type usage must specify a type parameter.
     182
     183There are many other polymorphism features in \CFA but these are the ones used
     184by the exception system.
    139185
    140186\section{Concurrency}
    141 
    142 \CFA has a number of concurrency features, \codeCFA{thread}s,
    143 \codeCFA{monitor}s and \codeCFA{mutex} parameters, \codeCFA{coroutine}s and
    144 \codeCFA{generator}s. The two features that interact with the exception system
    145 are \codeCFA{thread}s and \codeCFA{coroutine}s; they and their supporting
    146 constructs will be described here.
    147 
    148 \subsection{Coroutines}
    149 Coroutines are routines that do not have to finish execution to hand control
    150 back to their caller, instead they may suspend their execution at any time
    151 and resume it later.
    152 Coroutines are not true concurrency but share some similarities and many of
    153 the same underpinnings and so are included as part of the \CFA threading
    154 library.
    155 
    156 In \CFA coroutines are created using the \codeCFA{coroutine} keyword which
    157 works just like \codeCFA{struct} except that the created structure will be
    158 modified by the compiler to satify the \codeCFA{is_coroutine} trait.
    159 
    160 These structures act as the interface between callers and the coroutine,
    161 the fields are used to pass information in and out. Here is a simple example
    162 where the single field is used to pass the next number in a sequence out.
    163 \begin{lstlisting}
     187\CFA has a number of concurrency features: @thread@, @monitor@, @mutex@
     188parameters, @coroutine@ and @generator@. The two features that interact with
     189the exception system are @thread@ and @coroutine@; they and their supporting
     190constructs are described here.
     191
     192\subsection{Coroutine}
     193A coroutine is a type with associated functions, where the functions are not
     194required to finish execution when control is handed back to the caller. Instead
     195they may suspend execution at any time and be resumed later at the point of
     196last suspension. (Generators are stackless and coroutines are stackful.) These
     197types are not concurrent but share some similarities along with common
     198underpinnings, so they are combined with the \CFA threading library. Further
     199discussion in this section only refers to the coroutine because generators are
     200similar.
     201
     202In \CFA, a coroutine is created using the @coroutine@ keyword, which is an
     203aggregate type like @struct@, except the structure is implicitly modified by
     204the compiler to satisfy the @is_coroutine@ trait; hence, a coroutine is
     205restricted by the type system to types that provide this special trait.  The
     206coroutine structure acts as the interface between callers and the coroutine,
     207and its fields are used to pass information in and out of coroutine interface
     208functions.
     209
     210Here is a simple example where a single field is used to pass (communicate) the
     211next number in a sequence.
     212\begin{cfa}
    164213coroutine CountUp {
    165     unsigned int next;
    166 }
    167 \end{lstlisting}
    168 
    169 The routine part of the coroutine is a main function for the coroutine. It
    170 takes a reference to a coroutine object and returns nothing. In this function,
    171 and any functions called by this function, the suspend statement may be used
    172 to return execution to the coroutine's caller. When control returns to the
    173 function it continue from that same suspend statement instead of at the top
    174 of the function.
    175 \begin{lstlisting}
    176 void main(CountUp & this) {
    177     unsigned int next = 0;
    178     while (true) {
    179         this.next = next;
    180         suspend;
    181         next = next + 1;
    182     }
    183 }
    184 \end{lstlisting}
    185 
    186 Control is passed to the coroutine with the resume function. This includes the
    187 first time when the coroutine is starting up. The resume function takes a
    188 reference to the coroutine structure and returns the same reference. The
    189 return value is for easy access to communication variables. For example the
    190 next value from a count-up can be generated and collected in a single
    191 expression: \codeCFA{resume(count).next}.
     214        unsigned int next; // communication variable
     215}
     216CountUp countup;
     217\end{cfa}
     218Each coroutine has a @main@ function, which takes a reference to a coroutine
     219object and returns @void@.
     220\begin{cfa}[numbers=left]
     221void main(@CountUp & this@) { // argument matches trait is_coroutine
     222        unsigned int up = 0;  // retained between calls
     223        while (true) {
     224                this.next = up; // make "up" available outside function
     225                @suspend;@$\label{suspend}$
     226                up += 1;
     227        }
     228}
     229\end{cfa}
     230In this function, or functions called by this function (helper functions), the
     231@suspend@ statement is used to return execution to the coroutine's caller
     232without terminating the coroutine.
     233
     234A coroutine is resumed by calling the @resume@ function, \eg @resume(countup)@.
     235The first resume calls the @main@ function at the top. Thereafter, resume calls
     236continue a coroutine in the last suspended function after the @suspend@
     237statement, in this case @main@ line~\ref{suspend}.  The @resume@ function takes
     238a reference to the coroutine structure and returns the same reference. The
     239return value allows easy access to communication variables defined in the
     240coroutine object. For example, the @next@ value for coroutine object @countup@
     241is both generated and collected in the single expression:
     242@resume(countup).next@.
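As a usage sketch (this driver loop is illustrative and not part of the changeset), a caller can enumerate the sequence by repeatedly resuming the coroutine and reading the communication variable:
\begin{cfa}
CountUp countup;
for ( unsigned int i = 0; i < 5; i += 1 ) {
	unsigned int n = resume( countup ).next; // resume, then read communication variable
	// n takes the values 0, 1, 2, 3, 4
}
\end{cfa}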
    192243
    193244\subsection{Monitors and Mutex}
    194 
    195 True concurrency does not garrenty ordering. To get some of that ordering back
    196 \CFA uses monitors and mutex (mutual exclution) parameters. A monitor is
    197 another special declaration that contains a lock and is compatable with mutex
    198 parameters.
    199 
    200 Function parameters can have the \codeCFA{mutex} qualifiers on reference
    201 arguments, for example \codeCFA{void example(a_monitor & mutex arg);}. When the
    202 function is called it will acquire the lock on all of the mutex parameters.
    203 
    204 This means that all functions that mutex on a type are part of a critical
    205 section and only one will ever run at a time.
     245Concurrency does not guarantee ordering; without ordering results are
     246non-deterministic. To claw back ordering, \CFA uses monitors and @mutex@
     247(mutual exclusion) parameters. A monitor is another kind of aggregate, where
     248the compiler implicitly inserts a lock and instances are compatible with
     249@mutex@ parameters.
     250
     251A function that requires deterministic (ordered) execution acquires mutual
     252exclusion on a monitor object by qualifying an object reference parameter with
     253@mutex@.
     254\begin{cfa}
     255void example(MonitorA & @mutex@ argA, MonitorB & @mutex@ argB);
     256\end{cfa}
     257When the function is called, it implicitly acquires the monitor lock for all of
     258the mutex parameters without deadlock.  This semantics means all functions with
     259the same mutex type(s) are part of a critical section for objects of that type
     260and only one runs at a time.
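For example, the following sketch (the @Account@ monitor and function names are hypothetical) shows two @mutex@ functions forming a critical section over the same monitor type:
\begin{cfa}
monitor Account { int balance; };
void deposit( Account & @mutex@ acc, int amount ) {
	acc.balance += amount;  // lock implicitly held for the duration of the call
}
void withdraw( Account & @mutex@ acc, int amount ) {
	acc.balance -= amount;  // cannot interleave with deposit on the same object
}
\end{cfa}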
    206261
    207262\subsection{Threads}
    208 While coroutines allow new things to be done with a single execution path
    209 threads actually introduce new paths of execution that continue independently.
    210 Now for threads to work together their must be some communication between them
    211 and that means the timing of certain operations does have to be known. There
    212 or various means of syncronization and mutual exclution provided by \CFA but
    213 for exceptions only the basic two -- fork and join -- are needed.
    214 
    215 Threads are created like coroutines except the keyword is changed:
    216 \begin{lstlisting}
     263Functions, generators, and coroutines are sequential so there is only a single
     264(but potentially sophisticated) execution path in a program. Threads introduce
     265multiple execution paths that continue independently.
     266
     267For threads to work safely with objects requires mutual exclusion using
     268monitors and mutex parameters. For threads to work safely with other threads,
     269also requires mutual exclusion in the form of a communication rendezvous, which
     270also supports internal synchronization as for mutex objects. For exceptions
     271only two basic operations are important: thread fork and join.
     272
     273Threads are created like coroutines with an associated @main@ function:
     274\begin{cfa}
    217275thread StringWorker {
    218     const char * input;
    219     int result;
     276        const char * input;
     277        int result;
    220278};
    221 
    222279void main(StringWorker & this) {
    223     const char * localCopy = this.input;
    224     // ... do some work, perhaps hashing the string ...
    225     this.result = result;
    226 }
    227 \end{lstlisting}
    228 The main function will start executing after the fork operation and continue
    229 executing until it is finished. If another thread joins with this one it will
    230 wait until main has completed execution. In other words everything the thread
    231 does is between fork and join.
    232 
    233 From the outside this is the creation and destruction of the thread object.
    234 Fork happens after the constructor is run and join happens before the
    235 destructor runs. Join also happens during the \codeCFA{join} function which
    236 can be used to join a thread earlier. If it is used the destructor does not
    237 join as that has already been completed.
     280        const char * localCopy = this.input;
     281        // ... do some work, perhaps hashing the string ...
     282        this.result = result;
     283}
     284{
     285        StringWorker stringworker; // fork thread running in "main"
     286} // implicitly join with thread $\(\Rightarrow\)$ wait for completion
     287\end{cfa}
     288The thread main is where a new thread starts execution after a fork operation
     289and then the thread continues executing until it is finished. If another thread
     290joins with an executing thread, it waits until the executing main completes
     291execution. In other words, everything a thread does is between a fork and join.
     292
     293From the outside, this behaviour is accomplished through creation and
     294destruction of a thread object.  Implicitly, fork happens after a thread
     295object's constructor is run and join happens before the destructor runs. Join
     296can also be specified explicitly using the @join@ function to wait for a
     297thread's completion independently from its deallocation (\ie destructor
     298call). If @join@ is called explicitly, the destructor does not implicitly join.
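For example (a hypothetical usage sketch), an explicit @join@ allows a thread's result to be read before the thread object is deallocated:
\begin{cfa}
{
	StringWorker stringworker;  // fork thread running in "main"
	// ... other work overlaps with the thread ...
	join( stringworker );       // wait for completion
	int r = stringworker.result; // safe to read after join
} // destructor does not join again
\end{cfa}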
  • doc/theses/andrew_beach_MMath/features.tex

    rb6a8b31 rd95969a  
    1 \chapter{Features}
    2 
    3 This chapter covers the design and user interface of the \CFA exception
    4 handling mechanism.
    5 
    6 \section{Virtual Casts}
    7 
    8 Virtual casts and virtual types are not truly part of the exception system but
    9 they did not exist in \CFA and are useful in exceptions. So a minimal version
    10 of they virtual system was designed and implemented.
    11 
    12 Virtual types are organizied in simple hierarchies. Each virtual type may have
    13 a parent and can have any number of children. A type's descendants are its
    14 children and its children's descendants. A type may not be its own descendant.
    15 
    16 Each virtual type has an associated virtual table type. A virtual table is a
    17 structure that has fields for all the virtual members of a type. A virtual
    18 type has all the virtual members of its parent and can add more. It may also
    19 update the values of the virtual members.
    20 
    21 Except for virtual casts, this is only used internally in the exception
    22 system. There is no general purpose interface for the other features. A
    23 a virtual cast has the following syntax:
    24 
    25 \begin{lstlisting}
     1\chapter{Exception Features}
     2
     3This chapter covers the design and user interface of the \CFA
     4exception-handling mechanism.
     5
     6\section{Virtuals}
     7Virtual types and casts are not required for a basic exception-system but are
     8useful for advanced exception features. However, \CFA is not object-oriented so
     9there is no obvious concept of virtuals.  Hence, to create advanced exception
     10features for this work, I needed to design and implement a virtual-like
     11system for \CFA.
     12
     13Object-oriented languages often organize exceptions into a simple hierarchy,
     14\eg Java.
     15\begin{center}
     16\setlength{\unitlength}{4000sp}%
     17\begin{picture}(1605,612)(2011,-1951)
     18\put(2100,-1411){\vector(1, 0){225}}
     19\put(3450,-1411){\vector(1, 0){225}}
     20\put(3550,-1411){\line(0,-1){225}}
     21\put(3550,-1636){\vector(1, 0){150}}
     22\put(3550,-1636){\line(0,-1){225}}
     23\put(3550,-1861){\vector(1, 0){150}}
     24\put(2025,-1490){\makebox(0,0)[rb]{\LstBasicStyle{exception}}}
     25\put(2400,-1460){\makebox(0,0)[lb]{\LstBasicStyle{arithmetic}}}
     26\put(3750,-1460){\makebox(0,0)[lb]{\LstBasicStyle{underflow}}}
     27\put(3750,-1690){\makebox(0,0)[lb]{\LstBasicStyle{overflow}}}
     28\put(3750,-1920){\makebox(0,0)[lb]{\LstBasicStyle{zerodivide}}}
     29\end{picture}%
     30\end{center}
     31The hierarchy provides the ability to handle an exception at different degrees
     32of specificity (left to right).  Hence, it is possible to catch a more general
     33exception-type in higher-level code where the implementation details are
     34unknown, which reduces tight coupling to the lower-level implementation.
     35Otherwise, low-level code changes require higher-level code changes, \eg,
     36changing from raising @underflow@ to @overflow@ at the low level means changing
     37the matching catch at the high level versus catching the general @arithmetic@
     38exception. In detail, each virtual type may have a parent and can have any
     39number of children. A type's descendants are its children and its children's
     40descendants. A type may not be its own descendant.

The exception hierarchy allows a handler (@catch@ clause) to match multiple
exceptions, \eg a base-type handler catches both base and derived
exception-types.
\begin{cfa}
try {
	...
} catch(arithmetic &) {
	... // handle arithmetic, underflow, overflow, zerodivide
}
\end{cfa}
Most exception mechanisms perform a linear search of the handlers and select
the first matching handler, so the order of handlers is important because
matching is many to one.

Each virtual type needs an associated virtual table. A virtual table is a
structure with fields for all the virtual members of a type. A virtual type has
all the virtual members of its parent and can add more. It may also update the
values of the virtual members, and often does.

While much of the virtual infrastructure has been created, it is currently only
used internally for exception handling. The only user-level feature is the
virtual cast, which is the same as the \CC \lstinline[language=C++]|dynamic_cast|.
\begin{cfa}
(virtual TYPE)EXPRESSION
\end{cfa}
Note, the syntax and semantics match a C-cast, rather than the unusual \CC
syntax for special casts. Both the type of @EXPRESSION@ and @TYPE@ must be
pointers to virtual types. The cast dynamically checks if the @EXPRESSION@ type
is the same or a subtype of @TYPE@, and if true, returns a pointer to the
@EXPRESSION@ object, otherwise it returns @0p@ (null pointer).

\section{Exception}
% Leaving until later, hopefully it can talk about actual syntax instead
% of my many strange macros. Syntax aside I will also have to talk about the
% features all exceptions support.
Exceptions are defined by the trait system; there are a series of traits, and
if a type satisfies them, then it can be used as an exception.  The following
is the base trait all exceptions need to match.
\begin{cfa}
trait is_exception(exceptT &, virtualT &) {
	virtualT const & @get_exception_vtable@(exceptT *);
};
\end{cfa}
The function takes any pointer, including the null pointer, and returns a
reference to the virtual-table object. Defining this function also introduces
the virtual type and its virtual-table pair to the \CFA type-resolver, and
promises @exceptT@ is a virtual type and a child of the base exception-type.

{\color{blue} PAB: I do not understand this paragraph.}
One odd thing about @get_exception_vtable@ is that it should always be a
constant function, returning the same value regardless of its argument.  A
pointer or reference to the virtual-table instance could be used instead;
however, using a function has some ease-of-implementation advantages and allows
for easier disambiguation, because the virtual-type name (or the address of an
instance that is in scope) can be used instead of the mangled virtual-table
name.  Also note the use of the word ``promise'' in the trait
description. Currently, \CFA cannot check to see if either @exceptT@ or
@virtualT@ match the layout requirements. This is considered part of
@get_exception_vtable@'s correct implementation.

\section{Raise}
\CFA provides two kinds of exception raise: termination (see
\VRef{s:Termination}) and resumption (see \VRef{s:Resumption}), which are
specified with the following traits.
\begin{cfa}
trait is_termination_exception(
		exceptT &, virtualT & | is_exception(exceptT, virtualT)) {
	void @defaultTerminationHandler@(exceptT &);
};
\end{cfa}
The function is required to allow a termination raise, but is only called if a
termination raise does not find an appropriate handler.

Allowing a resumption raise is similar.
\begin{cfa}
trait is_resumption_exception(
		exceptT &, virtualT & | is_exception(exceptT, virtualT)) {
	void @defaultResumptionHandler@(exceptT &);
};
\end{cfa}
The function is required to allow a resumption raise, but is only called if a
resumption raise does not find an appropriate handler.

Finally, there are three convenience macros for referring to these traits:
@IS_EXCEPTION@, @IS_TERMINATION_EXCEPTION@ and @IS_RESUMPTION_EXCEPTION@.  Each
takes the virtual type's name, and for polymorphic types only, the
parenthesized list of polymorphic arguments. These macros do the name mangling
to get the virtual-table name and provide the arguments to both sides.
{\color{blue}(PAB: What's a ``side''?)}

\subsection{Termination}
\label{s:Termination}

Termination raise, called ``throw'', is familiar and used in most programming
languages with exception handling. The semantics of termination is: search the
stack for a matching handler, unwind the stack frames to the matching handler,
execute the handler, and continue execution after the handler. Termination is
used when execution \emph{cannot} return to the throw. To continue execution,
the program must \emph{recover} in the handler from the failed (unwound)
execution at the raise to safely proceed after the handler.

A termination raise is started with the @throw@ statement:
\begin{cfa}
throw EXPRESSION;
\end{cfa}
The expression must return a termination-exception reference, where the
termination exception has a type with a @void defaultTerminationHandler(T &)@
(default handler) defined. The handler is found at the call site using \CFA's
trait system and passed into the exception system along with the exception
itself.

At runtime, a representation of the exception type and an instance of the
exception type are copied into managed memory (heap) to ensure they remain in
scope during unwinding. It is the user's responsibility to ensure the original
exception object at the throw is freed when it goes out of scope. Being
allocated on the stack is sufficient for this.

Then the exception system searches the stack starting from the throw and
proceeding towards the base of the stack, from callee to caller. At each stack
frame, a check is made for termination handlers defined by the @catch@ clauses
of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} @catch (EXCEPTION_TYPE$\(_1\)$ * NAME)@ { // termination handler 1
	HANDLER_BLOCK$\(_1\)$
} @catch (EXCEPTION_TYPE$\(_2\)$ * NAME)@ { // termination handler 2
	HANDLER_BLOCK$\(_2\)$
}
\end{cfa}
The statements in the @GUARDED_BLOCK@ are executed. If those statements, or any
functions invoked from those statements, throw an exception, and the exception
is not handled by a try statement further up the stack, the termination
handlers are searched for a matching exception type from top to bottom.
     177
     178Exception matching checks the representation of the thrown exception-type is
     179the same or a descendant type of the exception types in the handler clauses. If
     180there is a match, a pointer to the exception object created at the throw is
     181bound to @NAME@ and the statements in the associated @HANDLER_BLOCK@ are
     182executed. If control reaches the end of the handler, the exception is freed,
     183and control continues after the try statement.
     184
     185The default handler visible at the throw statement is used if no matching
     186termination handler is found after the entire stack is searched. At that point,
     187the default handler is called with a reference to the exception object
     188generated at the throw. If the default handler returns, the system default
     189action is executed, which often terminates the program. This feature allows
     190each exception type to define its own action, such as printing an informative
     191error message, when an exception is not handled in the program.

\subsection{Resumption}
\label{s:Resumption}

Resumption raise, called ``resume'', is as old as termination
raise~\cite{Goodenough75} but is less popular. In many ways, resumption is
simpler and easier to understand, as it is simply a dynamic call (as in
Lisp). The semantics of resumption is: search the stack for a matching handler,
execute the handler, and continue execution after the resume. Notice, the stack
cannot be unwound because execution returns to the raise point. Resumption is
used when execution \emph{can} return to the resume. To continue execution,
the program must \emph{correct}, in the handler, the failed execution at the
raise so execution can safely continue after the resume.

A resumption raise is started with the @throwResume@ statement:
\begin{cfa}
throwResume EXPRESSION;
\end{cfa}
The semantics of the @throwResume@ statement are like the @throw@, but the
expression has a type with a @void defaultResumptionHandler(T &)@ (default
handler) defined, where the handler is found at the call site by the type
system.  At runtime, a representation of the exception type and an instance of
the exception type are \emph{not} copied because the stack is maintained during
the handler search.

Then the exception system searches the stack starting from the resume and
proceeding towards the base of the stack, from callee to caller. At each stack
frame, a check is made for resumption handlers defined by the @catchResume@
clauses of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} @catchResume (EXCEPTION_TYPE$\(_1\)$ * NAME)@ { // resumption handler 1
	HANDLER_BLOCK$\(_1\)$
} @catchResume (EXCEPTION_TYPE$\(_2\)$ * NAME)@ { // resumption handler 2
	HANDLER_BLOCK$\(_2\)$
}
\end{cfa}
The statements in the @GUARDED_BLOCK@ are executed. If those statements, or any
functions invoked from those statements, resume an exception, and the
exception is not handled by a try statement further up the stack, the
resumption handlers are searched for a matching exception type from top to
bottom. (Note, termination and resumption handlers may be intermixed in a @try@
statement, but the kind of raise (throw/resume) only matches with the
corresponding kind of handler clause.)

The exception search and matching for resumption is the same as for
termination, including exception inheritance. The difference is when control
reaches the end of the handler: the resumption handler returns after the resume
rather than after the try statement. The resume point assumes the handler has
corrected the problem so execution can safely continue.

Like termination, if no resumption handler is found, the default handler
visible at the resume statement is called, and the system default action is
executed.

For resumption, the exception system uses stack marking to partition the
resumption search. If another resumption exception is raised in a resumption
handler, the second exception search does not start at the point of the
original raise. (Remember the stack is not unwound and the current handler is
at the top of the stack.) The search for the second resumption starts at the
current point on the stack because new try statements may have been pushed by
the handler or functions called from the handler. If there is no match back to
the point of the current handler, the search skips the stack frames already
searched by the first resume and continues after the try statement. The default
handler, however, always continues from the raise point where the exception is
created.

% This might need a diagram. But it is an important part of the justification
% of the design of the traversal order.
\begin{verbatim}
       throwResume2 ----------.
            |                 |
 generated from handler       |
            |                 |
         handler              |
            |                 |
        throwResume1 -----.   :
            |             |   :
           try            |   : search skip
            |             |   :
        catchResume  <----'   :
            |                 |
\end{verbatim}

This resumption search-pattern reflects the one for termination, which matches
programmer expectations. However, it avoids the \emph{recursive
resumption} problem. If parts of the stack are searched multiple times, loops
can easily form, resulting in infinite recursion.

Consider the trivial case:
\begin{cfa}
try {
	throwResume$\(_1\)$ (E &){};
} catchResume( E * ) {
	throwResume;
}
\end{cfa}
Based on termination semantics, programmer expectation is for the reresume to
continue searching the stack frames after the try statement. However, the
current try statement is still on the stack below the handler issuing the
reresume (see \VRef{s:Reraise}). Hence, the try statement catches the reraise
again and does another reraise \emph{ad infinitum}, which is confusing and
difficult to debug. The \CFA resumption search-pattern skips the try statement
so the reresume search continues after the try, matching programmer
expectation.

\section{Conditional Catch}
Both termination and resumption handler-clauses may perform conditional matching:
\begin{cfa}
catch (EXCEPTION_TYPE * NAME ; @CONDITION@)
\end{cfa}
First, the same semantics is used to match the exception type. Second, if the
exception matches, @CONDITION@ is executed. The condition expression may
reference all names in scope at the beginning of the try block and @NAME@
introduced in the handler clause.  If the condition is true, then the handler
matches. Otherwise, the exception search continues at the next appropriate kind
of handler clause in the try block.
\begin{cfa}
try {
	f1 = open( ... );
	f2 = open( ... );
	...
} catch( IOFailure * f ; fd( f ) == f1 ) {
	// only handle IO failure for f1
}
\end{cfa}
Note, catching @IOFailure@, checking for @f1@ in the handler, and reraising the
exception if not @f1@ is different, because the reraise does not examine any of
the remaining handlers in the current try statement.

\section{Reraise}
\label{s:Reraise}
Within the handler block or functions called from the handler block, it is
possible to reraise the most recently caught exception with @throw@ or
@throwResume@, respectively.
\begin{cfa}
catch( ... ) {
	... throw; // rethrow
} catchResume( ... ) {
	... throwResume; // reresume
}
\end{cfa}
The only difference between a raise and a reraise is that reraise does not
create a new exception; instead it continues using the current exception, \ie
no allocation and copy. However, the default handler is still set to the one
visible at the raise point, and hence, for termination could refer to data that
is part of an unwound stack frame. To prevent this problem, a new default
handler is generated that does a program-level abort.

\section{Finally Clauses}
A @finally@ clause may be placed at the end of a @try@ statement.
\begin{cfa}
try {
	GUARDED_BLOCK
} ...   // any number or kind of handler clauses
} finally {
	FINALLY_BLOCK
}
\end{cfa}
The @FINALLY_BLOCK@ is executed when the try statement is unwound from the
stack, \ie when the @GUARDED_BLOCK@ or any handler clause finishes. Hence, the
finally block is always executed.

Execution of the finally block should always finish, meaning control runs off
the end of the block. This requirement ensures execution always continues as
if the finally clause is not present, \ie finally is for cleanup, not changing
control flow.  Because of this requirement, local control flow out of the
finally block is forbidden.  The compiler precludes any @break@, @continue@,
@fallthru@ or @return@ that causes control to leave the finally block. Other
ways to leave the finally block, such as a long jump or termination, are much
harder to check, at best requiring additional run-time overhead, and so are
discouraged.

\section{Cancellation}
      367Cancellation is a stack-level abort, which can be thought of as an
     368uncatchable termination. It unwinds the entirety of the current stack, and if
     369possible forwards the cancellation exception to a different stack.
     370
     371There is no special statement for starting a cancellation; instead the standard
      371library function @cancel_stack@ is called passing an exception.  Unlike a
      372raise, this exception is not used for matching; it only carries information
      373about the cause of the cancellation.
     375
     376Handling of a cancellation depends on which stack is being cancelled.
     377\begin{description}
     378\item[Main Stack:]
     379
     380The main stack is the one used by the program main at the start of execution,
     381and is the only stack in a sequential program.  Hence, when cancellation is
     382forwarded to the main stack, there is no other forwarding stack, so after the
     383stack is unwound, there is a program-level abort.
     384
     385\item[Thread Stack:]
     386A thread stack is created for a @thread@ object or object that satisfies the
     387@is_thread@ trait.  A thread only has two points of communication that must
     388happen: start and join. As the thread must be running to perform a
     389cancellation, it must occur after start and before join, so join is a
     390cancellation point.  After the stack is unwound, the thread halts and waits for
      391another thread to join with it. The joining thread checks for a cancellation,
     392and if present, resumes exception @ThreadCancelled@.
     393
     394There is a subtle difference between the explicit join (@join@ function) and
     395implicit join (from a destructor call). The explicit join takes the default
     396handler (@defaultResumptionHandler@) from its calling context, which is used if
     397the exception is not caught. The implicit join does a program abort instead.
     398
     399This semantics is for safety. One difficult problem for any exception system is
     400defining semantics when an exception is raised during an exception search:
     401which exception has priority, the original or new exception? No matter which
     402exception is selected, it is possible for the selected one to disrupt or
      403destroy the context required for the other. This conflict can arise at a
      404join: the thread destructor is always run while the stack is being unwound,
      405that is, while one termination/cancellation is already active. Also, since
      406implicit joins are easy to overlook, aborting is the safer default.
     408
     409\item[Coroutine Stack:] A coroutine stack is created for a @coroutine@ object
     410or object that satisfies the @is_coroutine@ trait.  A coroutine only knows of
     411two other coroutines, its starter and its last resumer.  The last resumer has
     412the tightest coupling to the coroutine it activated.  Hence, cancellation of
     413the active coroutine is forwarded to the last resumer after the stack is
     414unwound, as the last resumer has the most precise knowledge about the current
     415execution. When the resumer restarts, it resumes exception
     416@CoroutineCancelled@, which is polymorphic over the coroutine type and has a
     417pointer to the cancelled coroutine.
     418
      419The resume function also has an assertion that the @defaultResumptionHandler@
      420for the exception is defined, so it uses the default handler like a regular throw.
     421\end{description}
  • doc/theses/andrew_beach_MMath/future.tex

    rb6a8b31 rd95969a  
    88parts of the exception system that use the current version.
    99
    10 For instance a full virtual system would probably allow for several
    11 improvements to the exception traits. Although they do currently work they
    12 could be made easier to use by making the virtual table type implitate in the
    13 trait (which would remove the need for those wrapper marcos) or allowing
    14 for assertions that give the layout of a virtual table for safety.
      10Several planned improvements to the virtual system would also benefit
     11the exception traits. The biggest one is an assertion that checks that one
     12virtual type is a child of another virtual type. This would capture many of
     13the requirements much more precisely.
     14
      15The full virtual system might also include other improvements, such as
      16associated types. This is a proposed feature that would allow traits to refer
      17to types not listed in their header. It would let the exception traits avoid
      18referring to the virtual table type explicitly, which would remove the need
      19for the interface macros.
    1520
    1621\section{Additional Throws}
    17 Several other kinds of throws, beyond the termination throw (\codeCFA{throw}),
    18 the resumption throw (\codeCFA{throwResume}) and the re-throws, were considered.
     22Several other kinds of throws, beyond the termination throw (@throw@),
     23the resumption throw (@throwResume@) and the re-throws, were considered.
    1924None were as useful as the core throws but they would likely be worth
     2025revisiting.
     
    6570
    6671Also new techniques to skip previously searched parts of the stack will have
    67 to be developed.
      72to be developed. The recursive resume problem still remains, and ideally the
      73same pattern of ignoring already-searched sections of the stack can be reused.
    6874
    69 \section{Support for More Platforms}
    70 Termination is not portable because it is implemented with inline assembly.
    71 Those sections will have to be rewritten to support different architectures
     75\section{Signal Exceptions}
      76\textit{Exception Handling: Issues and a Proposed Notation} suggests there are
      77three types of exceptions: escape, notify and signal.
      78Escape exceptions are our termination exceptions, notify exceptions are our
      79resumption exceptions, and that leaves signal exceptions unimplemented.
    7280
    73 \section{Quality-of-Life Improvements}
    74 Finally come various improvements to the usability of \CFA. Most of these
    75 would just require time. Time that would not lead to interesting research so
    76 it has been left aside for now. A few examples are included here but there
    77 are more:
      81Signal exceptions allow either behaviour: after the exception is handled,
      82control can either return to the throw or continue from where the handler is
      83defined.
     84
      85The design should be re-examined and updated for \CFA. A very direct
      86translation would perhaps add a new throw and catch pair, with a statement
      87(or statements) to decide whether the handler returns to the throw
      88or continues where it is, but there are other options.
     89
      90For instance, resumption could be extended to cover this use by allowing
      91local control flow out of the handler. This would require an unwind as part
      92of the transition, as there are stack frames that have to be removed.
      93There would then be no notify-like throw; however, because \CFA does not have
      94exception signatures, a termination can already be thrown from any resumption
      95handler, so there are existing ways to approximate this in \CFA.
     96
     97% Maybe talk about the escape; and escape CONTROL_STMT; statements or how
     98% if we could choose if _Unwind_Resume proceeded to the clean-up stage this
     99% would be much easier to implement.
     100
     101\section{Language Improvements}
      102There is also a lot of work that is not a research follow-up to this work,
      103some of it with no interesting research to be done at all, but that would
      104improve \CFA as a programming language. The full list would
      105naturally be quite extensive, but here are a few examples that involve
      106exceptions:
    78107
    79108\begin{itemize}
     109\item The implementation of termination is not portable because it includes
      110some assembly statements. These sections will have to be re-written so
      111\CFA has full support on more machines.
     80112\item Allowing an exception handler to bind the exception to a reference instead
    81113of a pointer. This should actually result in no change in behaviour so there
    82114is no reason not to allow it. It is however a small improvement; giving a bit
    83115of flexibility to the user in what style they want to use.
    84 \item Enabling local control flow (by \codeCFA{break}, \codeCFA{return} and
     116\item Enabling local control flow (by @break@, @return@ and
    85117similar statements) out of a termination handler. The current set-up makes
    86118this very difficult but the catch function that runs the handler after it has
     
    88120much easier. (To do the same for try blocks would probably wait for zero-cost
    89121exceptions, which would allow the try block to be inlined as well.)
    90 \item Enabling local control flow out of a resumption handler. This would be
    91 a weighty operation, causing a stack unwind like a termination, so there might
    92 be a different statement or a statement modifier to make sure the user does
    93 this purposefully.
    94 
    95 However this would require the more complex system as they cannot be inlined
    96 into the original function as they can be run at a different place on the
    97 stack. So instead the unwinding will have to carry with it information on
    98 which one of these points to continue at and possibly also the return value
    99 for the function if a \codeCFA{return} statement was used.
    100122\end{itemize}
  • doc/theses/andrew_beach_MMath/implement.tex

    rb6a8b31 rd95969a  
    99
    1010All of this is accessed through a field inserted at the beginning of every
    11 virtual type. Currently it is called \codeC{virtual_table} but it is not
     11virtual type. Currently it is called @virtual_table@ but it is not
     1212meant to be accessed by the user. This field is a pointer to the type's
    1313virtual table instance. It is assigned once during the object's construction
     
    4040using that to calculate the mangled name of the parent's virtual table type.
    4141There are two special fields that are included like normal fields but have
    42 special initialization rules: the \codeC{size} field is the type's size and is
    43 initialized with a sizeof expression, the \codeC{align} field is the type's
     42special initialization rules: the @size@ field is the type's size and is
     43initialized with a sizeof expression, the @align@ field is the type's
    4444alignment and uses an alignof expression. The remaining fields are resolved
    4545to a name matching the field's name and type using the normal visibility
     
    5656The declarations include the virtual type definition and forward declarations
    5757of the virtual table instance, constructor, message function and
    58 \codeCFA{get_exception_vtable}. The definition includes the storage and
     58@get_exception_vtable@. The definition includes the storage and
    5959initialization of the virtual table instance and the bodies of the three
    6060functions.
     
    6565from the per-instance information. The virtual table type and most of the
    6666functions are polymorphic so they are all part of the core. The virtual table
    67 instance and the \codeCFA{get_exception_vtable} function.
    68 
    69 Coroutines and threads need instances of \codeCFA{CoroutineCancelled} and
    70 \codeCFA{ThreadCancelled} respectively to use all of their functionality.
    71 When a new data type is declared with \codeCFA{coroutine} or \codeCFA{thread}
      67instance and the @get_exception_vtable@ function are per-instance.
     68
     69Coroutines and threads need instances of @CoroutineCancelled@ and
     70@ThreadCancelled@ respectively to use all of their functionality.
     71When a new data type is declared with @coroutine@ or @thread@
    7272the forward declaration for the instance is created as well. The definition
    7373of the virtual table is created at the definition of the main function.
     
    7979function.
    8080
    81 The function is \codeC{__cfa__virtual_cast} and it is implemented in the
     81The function is @__cfa__virtual_cast@ and it is implemented in the
    8282standard library. It takes a pointer to the target type's virtual table and
    8383the object pointer being cast. The function is very simple, getting the
     
    8787
     8888For the generated code, a forward declaration of the virtual cast works as follows.
    89 There is a forward declaration of \codeC{__cfa__virtual_cast} in every cfa
     89There is a forward declaration of @__cfa__virtual_cast@ in every cfa
    9090file so it can just be used. The object argument is the expression being cast
    9191so that is just placed in the argument list.
     
    110110often across functions.
    111111
    112 At a very basic level this can be done with \codeC{setjmp} \& \codeC{longjmp}
     112At a very basic level this can be done with @setjmp@ \& @longjmp@
    113113which simply move the top of the stack, discarding everything on the stack
    114114above a certain point. However this ignores all the clean-up code that should
     
    118118both of these problems.
    119119
    120 Libunwind, provided in \texttt{unwind.h} on most platorms, is a C library
      120Libunwind, provided in @unwind.h@ on most platforms, is a C library
    121121that provides \CPP style stack unwinding. Its operation is divided into two
    122122phases. The search phase -- phase 1 -- is used to scan the stack and decide
     
    142142
    143143GCC will generate an LSDA and attach its personality function with the
    144 \texttt{-fexceptions} flag. However this only handles the cleanup attribute.
     144@-fexceptions@ flag. However this only handles the cleanup attribute.
    145145This attribute is used on a variable and specifies a function that should be
    146146run when the variable goes out of scope. The function is passed a pointer to
     
    165165messages for special cases (some of which should never be used by the
    166166personality function) and error codes but unless otherwise noted the
    167 personality function should always return \codeC{_URC_CONTINUE_UNWIND}.
    168 
    169 The \codeC{version} argument is the verson of the implementation that is
     167personality function should always return @_URC_CONTINUE_UNWIND@.
     168
      169The @version@ argument is the version of the implementation that is
    170170calling the personality function. At this point it appears to always be 1 and
     171171it will likely stay that way until a new version of the API is released.
    172172
    173 The \codeC{action} argument is set of flags that tell the personality
      173The @action@ argument is a set of flags that tell the personality
    174174function when it is being called and what it must do on this invocation.
    175175The flags are as follows:
    176176\begin{itemize}
    177 \item\codeC{_UA_SEARCH_PHASE}: This flag is set whenever the personality
     177\item@_UA_SEARCH_PHASE@: This flag is set whenever the personality
    178178function is called during the search phase. The personality function should
    179179decide if unwinding will stop in this function or not. If it does then the
    180 personality function should return \codeC{_URC_HANDLER_FOUND}.
    181 \item\codeC{_UA_CLEANUP_PHASE}: This flag is set whenever the personality
     180personality function should return @_URC_HANDLER_FOUND@.
     181\item@_UA_CLEANUP_PHASE@: This flag is set whenever the personality
    182182function is called during the cleanup phase. If no other flags are set this
    183183means the entire frame will be unwound and all cleanup code should be run.
    184 \item\codeC{_UA_HANDLER_FRAME}: This flag is set during the cleanup phase
     184\item@_UA_HANDLER_FRAME@: This flag is set during the cleanup phase
    185185on the function frame that found the handler. The personality function must
    186186prepare to return to normal code execution and return
    187 \codeC{_URC_INSTALL_CONTEXT}.
    188 \item\codeC{_UA_FORCE_UNWIND}: This flag is set if the personality function
     187@_URC_INSTALL_CONTEXT@.
     188\item@_UA_FORCE_UNWIND@: This flag is set if the personality function
    189189is called through a forced unwind call. Forced unwind only performs the
    190190cleanup phase and uses a different means to decide when to stop. See its
     
    192192\end{itemize}
    193193
    194 The \codeC{exception_class} argument is a copy of the \codeC{exception}'s
    195 \codeC{exception_class} field.
    196 
    197 The \codeC{exception} argument is a pointer to the user provided storage
     194The @exception_class@ argument is a copy of the @exception@'s
     195@exception_class@ field.
     196
     197The @exception@ argument is a pointer to the user provided storage
    198198object. It has two public fields, the exception class which is actually just
    199199a number that identifies the exception handling mechanism that created it and
     
    201201exception needs to
    202202
    203 The \codeC{context} argument is a pointer to an opaque type. This is passed
     203The @context@ argument is a pointer to an opaque type. This is passed
    204204to the many helper functions that can be called inside the personality
    205205function.
     
    218218functions traversing the stack new-to-old until a function finds a handler or
    219219the end of the stack is reached. In the latter case raise exception will
    220 return with \codeC{_URC_END_OF_STACK}.
     220return with @_URC_END_OF_STACK@.
    221221
     222222Once a handler has been found, raise exception continues onto the cleanup
     
    227227
    228228If an error is encountered raise exception will return either
    229 \codeC{_URC_FATAL_PHASE1_ERROR} or \codeC{_URC_FATAL_PHASE2_ERROR} depending
     229@_URC_FATAL_PHASE1_ERROR@ or @_URC_FATAL_PHASE2_ERROR@ depending
    230230on when the error occured.
    231231
     
    259259been unwound.
    260260
    261 Each time it is called the stop function should return \codeC{_URC_NO_REASON}
     261Each time it is called the stop function should return @_URC_NO_REASON@
    262262or transfer control directly to other code outside of libunwind. The
    263263framework does not provide any assistance here.
    264264
    265265Its arguments are the same as the paired personality function.
    266 The actions \codeC{_UA_CLEANUP_PHASE} and \codeC{_UA_FORCE_UNWIND} are always
     266The actions @_UA_CLEANUP_PHASE@ and @_UA_FORCE_UNWIND@ are always
     267267set when it is called. By the official standard that is all, but both GCC and
    268268Clang add an extra action on the last call at the end of the stack:
    269 \codeC{_UA_END_OF_STACK}.
     269@_UA_END_OF_STACK@.
    270270
    271271\section{Exception Context}
     
     280280Each stack has its own exception context. In a purely sequential program, using
    281281only core Cforall, there is only one stack and the context is global. However
    282 if the library \texttt{libcfathread} is linked then there can be multiple
     282if the library @libcfathread@ is linked then there can be multiple
    283283stacks so they will each need their own.
    284284
    285285To handle this code always gets the exception context from the function
    286 \codeC{this_exception_context}. The main exception handling code is in
    287 \texttt{libcfa} and that library also defines the function as a weak symbol
    288 so it acts as a default. Meanwhile in \texttt{libcfathread} the function is
     286@this_exception_context@. The main exception handling code is in
     287@libcfa@ and that library also defines the function as a weak symbol
     288so it acts as a default. Meanwhile in @libcfathread@ the function is
    289289defined as a strong symbol that replaces it when the libraries are linked
    290290together.
    291291
    292 The version of the function defined in \texttt{libcfa} is very simple. It
     292The version of the function defined in @libcfa@ is very simple. It
    293293returns a pointer to a global static variable. With only one stack this
    294294global instance is associated with the only stack.
    295295
    296 The version of the function defined in \texttt{libcfathread} has to handle
     296The version of the function defined in @libcfathread@ has to handle
    297297more as there are multiple stacks. The exception context is included as
    298298part of the per-stack data stored as part of coroutines. In the cold data
    299299section, stored at the base of each stack, is the exception context for that
    300 stack. The \codeC{this_exception_context} uses the concurrency library to get
     300stack. The @this_exception_context@ uses the concurrency library to get
    301301the current coroutine and through it the cold data section and the exception
    302302context.
     
     323323to store the exception. Macros with pointer arithmetic and type casts are
    324324used to move between the components or go from the embedded
    325 \codeC{_Unwind_Exception} to the entire node.
     325@_Unwind_Exception@ to the entire node.
    326326
    327327All of these nodes are strung together in a linked list. One linked list per
     
     347347C, which is what the \CFA compiler outputs, so a work-around is used.
    348348
    349 This work around is a function called \codeC{__cfaehm_try_terminate} in the
      349This work-around is a function called @__cfaehm_try_terminate@ in the
    350350standard library. The contents of a try block and the termination handlers
    351351are converted into functions. These are then passed to the try terminate
     
    385385
    386386These nested functions and all other functions besides
    387 \codeC{__cfaehm_try_terminate} in \CFA use the GCC personality function and
    388 the \texttt{-fexceptions} flag to generate the LSDA. This allows destructors
     387@__cfaehm_try_terminate@ in \CFA use the GCC personality function and
     388the @-fexceptions@ flag to generate the LSDA. This allows destructors
    389389to be implemented with the cleanup attribute.
    390390
     
    401401
    402402The handler function does both the matching and catching. It tries each
    403 the condition of \codeCFA{catchResume} in order, top-to-bottom and until it
      403@catchResume@ condition in order, top-to-bottom, until it
    404404finds a handler that matches. If no handler matches then the function returns
    405405false. Otherwise the matching handler is run, if it completes successfully
    406 the function returns true. Rethrows, through the \codeCFA{throwResume;}
     406the function returns true. Rethrows, through the @throwResume;@
    407407statement, cause the function to return true.
     408
     409% Recursive Resumption Stuff:
     410Blocking out part of the stack is accomplished by updating the front of the
     411list as the search continues. Before the handler at a node is called the head
     412of the list is updated to the next node of the current node. After the search
     413is complete, successful or not, the head of the list is reset.
     414
     415This means the current handler and every handler that has already been
     416checked are not on the list while a handler is run. If a resumption is thrown
     417during the handling of another resumption the active handlers and all the
      418other handlers checked up to this point will not be checked again.
     419
      420This structure also supports new handlers added while the resumption is being
     421handled. These are added to the front of the list, pointing back along the
     422stack -- the first one will point over all the checked handlers -- and the
     423ordering is maintained.
    408424
    409425\subsection{Libunwind Compatibility}
     
    438454
    439455Cancellation also uses libunwind to do its stack traversal and unwinding,
    440 however it uses a different primary function \codeC{_Unwind_ForcedUnwind}.
     456however it uses a different primary function @_Unwind_ForcedUnwind@.
    441457Details of its interface can be found in the unwind section.
    442458
  • doc/theses/andrew_beach_MMath/unwinding.tex

    rb6a8b31 rd95969a  
    1 \chapter{Unwinding in \CFA}
     1\chapter{\texorpdfstring{Unwinding in \CFA}{Unwinding in Cforall}}
    22
    33Stack unwinding is the process of removing things from the stack. Within
     
    1010Even this is fairly simple if nothing needs to happen when the stack unwinds.
    1111Traditional C can unwind the stack by saving and restoring state (with
    12 \codeC{setjmp} \& \codeC{longjmp}). However many languages define actions that
     12@setjmp@ \& @longjmp@). However many languages define actions that
    1313have to be taken when something is removed from the stack, such as running
    14 a variable's destructor or a \codeCFA{try} statement's \codeCFA{finally}
     14a variable's destructor or a @try@ statement's @finally@
    1515clause. Handling this requires walking the stack going through each stack
    1616frame.
     
    2929
    3030\CFA uses two primary functions in libunwind to create most of its
    31 exceptional control-flow: \codeC{_Unwind_RaiseException} and
    32 \codeC{_Unwind_ForcedUnwind}.
     31exceptional control-flow: @_Unwind_RaiseException@ and
     32@_Unwind_ForcedUnwind@.
    3333Their operation is divided into two phases: search and clean-up. The search
    3434phase -- phase 1 -- is used to scan the stack but not unwinding it. The
     
    4444A personality function performs three tasks, although not all have to be
    4545present. The tasks performed are decided by the actions provided.
    46 \codeC{_Unwind_Action} is a bitmask of possible actions and an argument of
     46@_Unwind_Action@ is a bitmask of possible actions and an argument of
    4747this type is passed into the personality function.
    4848\begin{itemize}
    49 \item\codeC{_UA_SEARCH_PHASE} is passed in search phase and tells the
     49\item@_UA_SEARCH_PHASE@ is passed in search phase and tells the
    5050personality function to check for handlers. If there is a handler in this
    5151stack frame, as defined by the language, the personality function should
    52 return \codeC{_URC_HANDLER_FOUND}. Otherwise it should return
    53 \codeC{_URC_CONTINUE_UNWIND}.
    54 \item\codeC{_UA_CLEANUP_PHASE} is passed in during the clean-up phase and
     52return @_URC_HANDLER_FOUND@. Otherwise it should return
     53@_URC_CONTINUE_UNWIND@.
     54\item@_UA_CLEANUP_PHASE@ is passed in during the clean-up phase and
    5555means part or all of the stack frame is removed. The personality function
    5656should do whatever clean-up the language defines
    5757(such as running destructors/finalizers) and then generally returns
    58 \codeC{_URC_CONTINUE_UNWIND}.
    59 \item\codeC{_UA_HANDLER_FRAME} means the personality function must install
     58@_URC_CONTINUE_UNWIND@.
     59\item@_UA_HANDLER_FRAME@ means the personality function must install
    6060a handler. It is also passed in during the clean-up phase and is in addition
    6161to the clean-up action. libunwind provides several helpers for the personality
    6262function here. Once it is done, the personality function must return
    63 \codeC{_URC_INSTALL_CONTEXT}.
     63@_URC_INSTALL_CONTEXT@.
    6464\end{itemize}
    6565The personality function is given a number of other arguments. Some are for
    66 compatability and there is the \codeC{struct _Unwind_Context} pointer which
      66compatibility and there is the @struct _Unwind_Context@ pointer which is
    6767passed to many helpers to get information about the current stack frame.
    6868
     
    7272raise-exception but with some extras.
    7373The first it passes in an extra action to the personality function on each
    74 stack frame, \codeC{_UA_FORCE_UNWIND}, which means a handler cannot be
     74stack frame, @_UA_FORCE_UNWIND@, which means a handler cannot be
    7575installed.
    7676
     
    8383stack frames have been removed. By the standard API this is marked by setting
    8484the stack pointer inside the context passed to the stop function. However both
    85 GCC and Clang add an extra action for this case \codeC{_UA_END_OF_STACK}.
     85GCC and Clang add an extra action for this case @_UA_END_OF_STACK@.
    8686
     8787Each time the stop function is called it can do one of two things.
    88 When it is not the end of the stack it can return \codeC{_URC_NO_REASON} to
     88When it is not the end of the stack it can return @_URC_NO_REASON@ to
    8989continue unwinding.
    9090% Is there a reason that NO_REASON is used instead of CONTINUE_UNWIND?
     
    9393are provided to do it.
    9494
    95 \section{\CFA Implementation}
     95\section{\texorpdfstring{\CFA Implementation}{Cforall Implementation}}
    9696
    9797To use libunwind, \CFA provides several wrappers, its own storage,
     
    113113
    114114The stop function is very simple. It checks the end of stack flag to see if
    115 it is finished unwinding. If so, it calls \codeC{exit} to end the process,
     115it is finished unwinding. If so, it calls @exit@ to end the process,
    116116otherwise it returns with no-reason to continue unwinding.
    117117% Yeah, this is going to have to change.
     
    128128location of the instruction pointer and stack layout, which varies with
    129129compiler and optimization levels. So for frames where there are only
    130 destructors, GCC's attribute cleanup with the \texttt{-fexception} flag is
      130destructors, GCC's attribute cleanup with the @-fexceptions@ flag is
    131131sufficient to handle unwinding.
    132132
    133133The only functions that require more than that are those that contain
    134 \codeCFA{try} statements. A \codeCFA{try} statement has a \codeCFA{try}
    135 clause, some number of \codeCFA{catch} clauses and \codeCFA{catchResume}
    136 clauses and may have a \codeCFA{finally} clause. Of these only \codeCFA{try}
    137 statements with \codeCFA{catch} clauses need to be transformed and only they
    138 and the \codeCFA{try} clause are involved.
     134@try@ statements. A @try@ statement has a @try@
     135clause, some number of @catch@ clauses and @catchResume@
     136clauses and may have a @finally@ clause. Of these only @try@
     137statements with @catch@ clauses need to be transformed and only they
     138and the @try@ clause are involved.
    139139
    140 The \codeCFA{try} statement is converted into a series of closures which can
     140The @try@ statement is converted into a series of closures which can
    141141access other parts of the function according to scoping rules but can be
    142 passed around. The \codeCFA{try} clause is converted into the try functions,
    143 almost entirely unchanged. The \codeCFA{catch} clauses are converted into two
     142passed around. The @try@ clause is converted into the try functions,
     143almost entirely unchanged. The @catch@ clauses are converted into two
    144144functions; the match function and the catch function.
    145145
     
    153153runs the handler's body.
    154154
    155 These three functions are passed to \codeC{try_terminate}. This is an
     155These three functions are passed to @try_terminate@. This is an
    156156% Maybe I shouldn't quote that, it isn't its actual name.
    157157internal hand-written function that has its own personality function and
     
    167167handler was found in this frame. If it was then the personality function
    168168installs the handler, which is setting the instruction pointer in
    169 \codeC{try_terminate} to an otherwise unused section that calls the catch
     169@try_terminate@ to an otherwise unused section that calls the catch
    170170function, passing it the current exception and handler index.
    171 \codeC{try_terminate} returns as soon as the catch function returns.
     171@try_terminate@ returns as soon as the catch function returns.
    172172
    173173At this point control has returned to normal control flow.
  • doc/theses/fangren_yu_COOP_F20/Report.tex

    rb6a8b31 rd95969a  
    1717\usepackage[usenames]{color}
    1818\input{common}                                          % common CFA document macros
    19 \usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
     19\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
    2020\usepackage{breakurl}
    2121\urlstyle{sf}
     
    7676\renewcommand{\subsectionmark}[1]{\markboth{\thesubsection\quad #1}{\thesubsection\quad #1}}
    7777\pagenumbering{roman}
    78 \linenumbers                                            % comment out to turn off line numbering
     78%\linenumbers                                            % comment out to turn off line numbering
    7979
    8080\maketitle
    8181\pdfbookmark[1]{Contents}{section}
    82 \tableofcontents
    83 
    84 \clearpage
     82
    8583\thispagestyle{plain}
    8684\pagenumbering{arabic}
    8785
    8886\begin{abstract}
      87\CFA is an evolutionary, non-object-oriented extension of the C programming language, featuring a parametric type-system, and is currently under active development. The reference compiler for the \CFA language, @cfa-cc@, has some major components dating back to the early 2000s, which are based on inefficient data structures and algorithms. This report introduces improvements targeting the expression resolution algorithm, suggested by a recent prototype experiment on a simplified model, which are implemented in @cfa-cc@ to support the full \CFA language. These optimizations speed up the compiler by a factor of 20 across the existing \CFA codebase, bringing the compilation time of a mid-sized \CFA source file down to the 10-second level. A few problem cases derived from realistic code examples are analyzed in detail, with proposed solutions. This work is a critical step in the \CFA project development to achieve its eventual goal of being used alongside C for large software systems.
    8988\end{abstract}
    9089
     90\clearpage
     91\section*{Acknowledgements}
     92\begin{sloppypar}
      93I would like to thank everyone in the \CFA team for their contribution towards this project. Programming language design and development is a tough subject and requires a lot of teamwork. Without the collaborative efforts of the team, this project could not have been a success. Specifically, I would like to thank Andrew Beach for introducing me to the \CFA codebase, Thierry Delisle for maintaining the test and build automation framework, Michael Brooks for providing example programs of various experimental language and type-system features, and most importantly, Professor Martin Karsten for recommending me to the \CFA team, and my supervisor, Professor Peter Buhr, for encouraging me to explore deeply into intricate compiler algorithms. Finally, I gratefully acknowledge the help of Aaron Moss, a former graduate student on the team and the author of the preceding thesis work, for participating in the \CFA team's virtual conferences and email correspondence, and for providing many critical arguments and suggestions. 2020 was an unusually challenging year for everyone, and we managed to keep a steady pace.
     94\end{sloppypar}
     95
     96\clearpage
     97\tableofcontents
     98
     99\clearpage
    91100\section{Introduction}
    92101
    93 \section{Completed work}
      102The \CFA language, developed by the Programming Language Group at the University of Waterloo, has a long history, with the initial language design in 1992 by Glen Ditchfield~\cite{Ditchfield92} and the first proof-of-concept compiler built in 2003 by Richard Bilson~\cite{Bilson03}. Many new features have been added to the language over time, but the core of \CFA's type-system --- parametric functions introduced by the @forall@ clause (hence the name of the language) providing parametric overloading --- remains mostly unchanged.
     103
      104The current \CFA reference compiler, @cfa-cc@, is designed using the visitor pattern~\cite{vistorpattern} over an abstract syntax tree (AST), where multiple passes over the AST modify it for subsequent passes. @cfa-cc@ still includes many parts taken directly from the original Bilson implementation, which served as the starting point for this enhancement work to the type system. Unfortunately, the prior implementation did not provide the efficiency required for the language to be practical: a \CFA source file of approximately 1000 lines of code can take multiple minutes to compile. The cause of the problem is that the old compiler used inefficient data structures and algorithms for expression resolution, which involved significant copying and redundant work.
     105
     106This report presents a series of optimizations to the performance-critical parts of the resolver, with a major rework of the compiler data-structures using a functional-programming approach to reduce memory complexity. The improvements were suggested by running the compiler builds with a performance profiler against the \CFA standard-library source-code and a test suite to find the most underperforming components in the compiler algorithm.
     107
      108The \CFA team endorses a pragmatic philosophy that focuses on the practical implications of language design and implementation rather than theoretical limits. In particular, the compiler is designed to be expressive with respect to code reuse while maintaining type safety, but to compromise theoretical soundness in extreme corner cases. However, when these corner cases do appear in actual usage, they need to be thoroughly investigated. A case-by-case analysis is presented for several of these corner cases, some of which point to certain weaknesses in the language design, with solutions proposed based on experimental results.
     109
     110\section{AST restructuring}
    94111
    95112\subsection{Memory model with sharing}
    96113
    97 A major rework of the abstract syntax tree (AST) data structure in the compiler is completed as the first step of the project. The majority of work were documented in the reference manual of the compiler~\cite{cfa-cc}. To summarize:
    98 \begin{itemize}
    99 \item
    100 AST nodes (and therefore subtrees) can be shared without copying when reused.
    101 \item
    102 Modifications apply the functional programming principle, making copies for local changes without affecting the original data shared by other owners. In-place mutations are permitted as a special case when sharing does not happen. The logic is implemented by reference counting.
    103 \item
    104 Memory allocation and freeing are performed automatically using smart pointers.
    105 \end{itemize}
    106 The resolver algorithm designed for overload resolution naturally introduces a significant amount of reused intermediate representations, especially in the following two places:
    107 \begin{itemize}
    108 \item
    109 Function overload candidates are computed by combining the argument candidates bottom-up, with many of them being a common term. For example, if $n$ overloads of a function @f@ all take an integer for the first parameter but different types for the second (@f( int, int )@, @f( int, double )@, etc.) the first term is reused $n$ times for each of the generated candidate expressions. This effect is particularly bad for deep expression trees.
    110 \item
    111 In the unification algorithm and candidate elimination step, actual types are obtained by substituting the type parameters by their bindings. Let $n$ be the complexity (\ie number of nodes in representation) of the original type, $m$ be the complexity of bound type for parameters, and $k$ be the number of occurrences of type parameters in the original type. If everything needs to be deep-copied, the substitution step takes $O(n+mk)$ time and memory, while using shared nodes it is reduced to $O(n)$ time and $O(k)$ memory.
    112 \end{itemize}
    113 One of the worst examples for the old compiler is a long chain of I/O operations
    114 \begin{cfa}
    115 sout | 1 | 2 | 3 | 4 | ...
    116 \end{cfa}
    117 The pipe operator is overloaded by \CFA I/O library for every primitive type in C language, as well as I/O manipulators defined by the library. In total there are around 50 overloads for the output stream operation. On resolving the $n$-th pipe operator in the sequence, the first term, which is the result of sub-expression containing $n-1$ pipe operators, is reused to resolve every overload. Therefore at least $O(n^2)$ copies of expression nodes are made during resolution, not even counting type unification cost; combined with two large factors from number of overloads of pipe operators, and that the ``output stream type'' in \CFA is a trait with 27 assertions (which adds to complexity of the pipe operator's type) this makes compiling a long output sequence extremely slow. In new AST representation only $O(n)$ copies are required and type of pipe operator is not copied at all.
    118 
    119 Reduction in space complexity is especially important, as preliminary profiling result on the old compiler build shows that over half of time spent in expression resolution are on memory allocations.
    120  
      114A major rework of the AST data-structure in the compiler was completed as the first step of the project. The majority of this work is documented in my prior report on the compiler reference-manual~\cite{cfa-cc}. To summarize:
     115\begin{itemize}
     116\item
     117AST nodes (and therefore subtrees) can be shared without copying.
     118\item
     119Modifications are performed using functional-programming principles, making copies for local changes without affecting the original data shared by other owners. In-place mutations are permitted as a special case when there is no sharing. The logic is implemented by reference counting.
     120\item
     121Memory allocation and freeing are performed automatically using smart pointers~\cite{smartpointers}.
     122\end{itemize}
     123
      124The resolver algorithm, designed for overload resolution, reuses a significant number of intermediate representations, and hence incurs substantial copying, especially in the following two places:
     125\begin{itemize}
     126\item
      127Function overload candidates are computed by combining the argument candidates bottom-up, with many sharing common terms. For example, if $n$ overloads of a function @f@ all take an integer for the first parameter but different types for the second, \eg @f( int, int )@, @f( int, double )@, etc., the first term is copied $n$ times, once for each of the generated candidate expressions. This copying is particularly bad for deep expression trees.
     128\item
      129In the unification algorithm and candidate elimination step, actual types are obtained by substituting the type parameters by their bindings. Let $n$ be the complexity (\ie number of nodes in representation) of the original type, $m$ be the complexity of the bound type for parameters, and $k$ be the number of occurrences of type parameters in the original type. If every substitution needs to be deep-copied, this copy step takes $O(n+mk)$ time and memory, while using shared nodes it is reduced to $O(n)$ time and $O(k)$ memory.
     130\end{itemize}
     131One of the worst examples for the old compiler is a long chain of I/O operations:
     132\begin{cfa}
     133sout | 1 | 2 | 3 | 4 | ...;   // print integer constants
     134\end{cfa}
      135The pipe operator is overloaded by the \CFA I/O library for every primitive type in the C language, as well as for the I/O manipulators defined by the library. In total, there are around 50 overloads for the output stream operation. On resolving the $n$-th pipe operator in the sequence, the first term, which is the result of the sub-expression containing $n-1$ pipe operators, is reused to resolve every overload. Therefore at least $O(n^2)$ copies of expression nodes are made during resolution, not even counting type unification cost; combined with two further factors, the number of overloads of the pipe operator and the fact that the ``output stream type'' in \CFA is a trait with 27 assertions (which adds to the complexity of the pipe operator's type), this makes compiling a long output sequence extremely slow. In the new AST representation, only $O(n)$ copies are required and the type of the pipe operator is not copied at all.
      136Reduction in space complexity is especially important, as preliminary profiling results on the old compiler build showed that over half of the time in expression resolution is spent on memory allocation.
     137
      138Since the compiler codebase is large and the new memory model mostly benefits expression resolution, some of the old data structures are still kept, and a conversion pass happens before and after the general resolve phase. Rewriting every compiler module would take much longer, and whether the new model is correct was unknown when this project started; therefore, only the resolver is currently implemented with the new data structure.
     139
    121140
    122141\subsection{Merged resolver calls}
    123142
    124 The pre-resolve phase of compilation, inadequately called ``validate'' in the compiler source code, does more than just simple syntax validation, as it also normalizes input program. Some of them, however, requires type information on expressions and therefore needs to call the resolver before the general resolve phase. There are three notable places where the resolver is invoked:
    125 \begin{itemize}
    126 \item
    127 Attempt to generate default constructor, copy constructor and destructor for user-defined @struct@ types
    128 \item
    129 Resolve @with@ statements (the same as in Python, which introduces fields of a structure directly in scope)
      143The pre-resolve phase of compilation, inappropriately called ``validate'' in the compiler source code, has a number of passes that do more than simple syntax and semantic validation; some passes also normalize the input program. A few of these passes require type information for expressions, and therefore need to call the resolver before the general resolve phase. There are three notable places where the resolver is invoked:
     144\begin{itemize}
     145\item
     146Generate default constructor, copy constructor and destructor for user-defined @struct@ types.
     147\item
     148Resolve @with@ statements (the same as in Pascal~\cite{pascal}), which introduces fields of a structure directly into a scope.
    130149\item
    131150Resolve @typeof@ expressions (cf. @decltype@ in \CC); note that this step may depend on symbols introduced by @with@ statements.
    132151\end{itemize}
    133 Since the compiler codebase is large and the new memory model mostly only benefits expression resolution, the old data structure is still kept, and a conversion pass happens before and after resolve phase. Rewriting every compiler module will take a long time, and whether the new model is correct is still unknown when started, therefore only the resolver is implemented with the new data structure.
    134 
    135 Since the constructor calls were one of the most expensive to resolve (reason will be shown in the next section), pre-resolve phase were taking more time after resolver moves to the more efficient new implementation. To better facilitate the new resolver, every step that requires type information are reintegrated as part of resolver.
    136 
    137 A by-product of this work is that the reversed dependence of @with@ statement and @typeof@ can now be handled. Previously, the compiler is unable to handle cases such as
     152
      153Since constructor calls are among the most expensive expressions to resolve (the reason is given in~\VRef{s:SpecialFunctionLookup}), this pre-resolve phase was taking a large amount of time even after the resolver was changed to the more efficient new implementation. The problem is that multiple resolutions repeat a significant amount of work. Therefore, to better facilitate the new resolver, every step that requires type information should be integrated as part of the general resolver phase.
     154
      155A by-product of this work is that the reversed dependence between @with@ statements and @typeof@ can now be handled. Previously, the compiler was unable to handle cases such as:
    138156\begin{cfa}
    139157struct S { int x; };
    140158S foo();
    141159typeof( foo() ) s; // type is S
    142 with (s) { 
     160with (s) {
    143161        x; // refers to s.x
    144162}
    145163\end{cfa}
    146 since type of @s@ is still unresolved when handling @with@ expressions. Instead, the new (and correct) approach is to evaluate @typeof@ expressions when the declaration is first seen, and it suffices because of the declaration-before-use rule.
     164since the type of @s@ is unresolved when handling @with@ expressions because the @with@ pass follows the @typeof@ pass (interchanging passes only interchanges the problem). Instead, the new (and correct) approach is to evaluate @typeof@ expressions when the declaration is first seen during resolution, and it suffices because of the declaration-before-use rule.
    147165
    148166
    149167\subsection{Special function lookup}
    150 
    151 Reducing the number of functions looked up for overload resolution is an effective way to gain performance when there are many overloads but most of them are trivially wrong. In practice, most functions have few (if any) overloads but there are notable exceptions. Most importantly, constructor @?{}@, destructor @^?{}@, and assignment @?=?@ are generated for every user-defined type, and in a large source file there can be hundreds of them. Furthermore, many calls to them are generated for initializing variables and passing arguments. This fact makes them the most overloaded and most called functions.
    152 
    153 In an object-oriented programming language, object has methods declared with their types, so a call such as @obj.f()@ only needs to perform lookup in the method table corresponding to type of @obj@. \CFA on the other hand, does not have methods, and all types are open (\ie new operations can be defined on them), so a similar approach will not work in general. However, the ``big 3'' operators have a unique property enforced by the language rules, such that the first parameter must have a reference type. Since \CFA does not have class inheritance, reference type must always match exactly. Therefore, argument-dependent lookup can be implemented for these operators, by using a dedicated symbol table.
    154 
    155 The lookup key used for the special functions is the mangled type name of the first parameter, which acts as the @this@ parameter in an object-oriented language. To handle generic types, the type parameters are stripped off, and only the base type is matched. Note that a constructor (destructor, assignment operator) taking arbitrary @this@ argument, for example @forall( dtype T ) void ?{}( T & );@ is not allowed, and it guarantees that if the @this@ type is known, all possible overloads can be found by searching with the given type. In case that the @this@ argument itself is overloaded, it is resolved first and all possible result types are used for lookup.
    156 
    157 Note that for the generated expressions, the particular variable for @this@ argument is fully known, without overloads, so the majority of constructor call resolutions only need to check for one given object type. Explicit constructor calls and assignment statements sometimes may require lookup for multiple types. In the extremely rare case that type of @this@ argument is yet unbound, everything will have to be checked, just like without the argument-dependent lookup algorithm; fortunately, this case almost never happens in practice. An example is found in the library function @new@:
     168\label{s:SpecialFunctionLookup}
     169
      170Reducing the number of functions looked up for overload resolution is an effective way to gain performance when there are many overloads but most of them are trivially wrong. In practice, most functions have few (if any) overloads but there are notable exceptions. Most importantly, constructor @?{}@, destructor @^?{}@, and assignment @?=?@ are generated for every user-defined type (@struct@ and @union@ in C), and in a large source file there can be hundreds of them. Furthermore, many calls are generated for initializing variables, passing arguments and copying values. This fact makes them the most overloaded and most called functions.
     171
     172In an object-oriented programming language, the object-method types are scoped within a class, so a call such as @obj.f()@ only needs to perform lookup in the method table corresponding to the type of @obj@. \CFA on the other hand, does not have methods, and all types are open, \ie new operations can be defined on them without inheritance; at best a \CFA type can be constrained by a translation unit. However, the ``big 3'' operators have a unique property enforced by the language rules: the first parameter must be a reference to its associated type, which acts as the @this@ parameter in an object-oriented language. Since \CFA does not have class inheritance, the reference type must always match exactly. Therefore, argument-dependent lookup can be implemented for these operators by using a dedicated, fast symbol-table.
     173
      174The lookup key for the special functions is the mangled type name of the first parameter. To handle generic types, the type parameters are stripped off, and only the base type is matched. Note that a constructor (destructor, assignment operator) may not take an arbitrary @this@ argument, \eg @forall( dtype T ) void ?{}( T & )@, thus guaranteeing that if the @this@ type is known, all possible overloads can be found by searching with this given type. In the case where the @this@ argument itself is overloaded, it is resolved first and all possible result types are used for lookup.
     175
      176Note that for a generated expression, the particular variable for the @this@ argument is fully known, without overloads, so the majority of constructor-call resolutions only need to check for one given object type. Explicit constructor calls and assignment statements sometimes require lookup for multiple types. In the extremely rare case that the @this@-argument type is unbound, all necessary types are guaranteed to be checked, just as in the original scheme without argument-dependent lookup; fortunately, this complex case almost never happens in practice. An example is found in the library function @new@:
    158177\begin{cfa}
    159178forall( dtype T | sized( T ), ttype TT | { void ?{}( T &, TT ); } )
    160179T * new( TT p ) { return &(*malloc()){ p }; }
    161180\end{cfa}
    162 as @malloc@ may return a pointer to any type, depending on context. 
    163 
    164 Interestingly, this particular line of code actually caused another complicated issue, where the unusually massive work of checking every constructor in presence makes the case even worse. Section~\ref{s:TtypeResolutionInfiniteRecursion} presents a detailed analysis for the problem.
    165 
    166 The ``callable'' operator @?()@ (cf. @operator()@ in \CC) could also be included in the special operator list, as it is usually only on user-defined types, and the restriction that first argument must be a reference seems reasonable in this case.
     181as @malloc@ may return a pointer to any type, depending on context.
     182
     183Interestingly, this particular declaration actually causes another complicated issue, making the complex checking of every constructor even worse. \VRef[Section]{s:TtypeResolutionInfiniteRecursion} presents a detailed analysis of this problem.
     184
     185The ``callable'' operator @?()@ (cf. @operator()@ in \CC) can also be included in this special operator list, as it is usually only on user-defined types, and the restriction that the first argument must be a reference seems reasonable in this case.
    167186
    168187
    169188\subsection{Improvement of function type representation}
    170189
    171 Since substituting type parameters with their bound types is one fundamental operation in many parts of resolver algorithm (particularly unification and environment binding), making as few copies of type nodes as possible helps reducing memory complexity. Even with the new memory management model, allocation is still a significant factor of resolver performance. Conceptually, operations on type nodes of AST should be performed in functional programming style, treating the data structure as immutable and only copy when necessary. The in-place mutation is a mere optimization that does not change logic of operations.
    172 The model was broken on function types by an inappropriate design. Function types require some special treatment due to the existence of assertions. In particular, it must be able to distinguish two different kinds of type parameter usage:
      190Since substituting type parameters with their bound types is one fundamental operation in many parts of the resolver algorithm (particularly unification and environment binding), making as few copies of type nodes as possible helps reduce memory complexity. Even with the new memory management model, allocation is still a significant factor of resolver performance. Conceptually, operations on type nodes of the AST should be performed in functional-programming style, treating the data structure as immutable and only copying when necessary. The in-place mutation is a mere optimization that does not change the logic for operations.
     191
      192However, the model was broken for function types by an inappropriate design. Function types require special treatment due to the existence of assertions that constrain the types they support. Specifically, it must be possible to distinguish two different kinds of type parameter usage:
    173193\begin{cfa}
    174194forall( dtype T ) void foo( T * t ) {
    175         forall( dtype U ) void bar( T * t, U * u ) { ... }
    176 }
    177 \end{cfa}
    178 Here, only @U@ is a free parameter in declaration of @bar@, as it appears in the function's own forall clause; while @T@ is not free.
    179 
    180 Moreover, the resolution algorithm also has to distinguish type bindings of multiple calls to the same function, for example with
     195        forall( dtype U ) void bar( @T@ * t, @U@ * u ) { ... }
     196}
     197\end{cfa}
     198Here, only @U@ is a free parameter in the nested declaration of function @bar@, as @T@ must be bound at the call site when resolving @bar@.
     199
     200Moreover, the resolution algorithm also has to distinguish type bindings of multiple calls to the same function, \eg:
    181201\begin{cfa}
    182202forall( dtype T ) int foo( T x );
    183 foo( foo( 1.0 ) );
    184 \end{cfa}
    185 The inner call has binding (T: double) while the outer call has binding (T: int). Therefore a unique representation of free parameters in each expression is required. This was previously done by creating a copy of the parameter declarations inside function type, and fixing references afterwards. However, fixing references is an inherently deep operation that does not work well with functional programming model, as it must be evaluated eagerly on the entire syntax tree representing the function type.
    186 
    187 The revised approach generates a unique ID value for each function call expression instance and represents an occurrence of free parameter type with a pair of generated ID and the original parameter declaration, so that references do not need to be fixed, and a shallow copy of function type is possible.
    188 
    189 Note that after the change, all declaration nodes in syntax tree representation maps one-to-one with the actual declarations in the program, and therefore are guaranteed to be unique. Such property can potentially enable more optimizations, and some related ideas are presented after Section~\ref{s:SharedSub-ExpressionCaseUniqueExpressions}.
     203int i = foo( foo( 1.0 ) );
     204\end{cfa}
     205The inner call has binding (T: double) while the outer call has binding (T: int). Therefore a unique representation for the free parameters is required in each expression. This type binding was previously done by creating a copy of the parameter declarations inside the function type and fixing references afterwards. However, fixing references is an inherently deep operation that does not work well with the functional-programming style, as it forces eager evaluation on the entire syntax tree representing the function type.
     206
     207The revised approach generates a unique ID value for each function call expression instance and represents an occurrence of a free-parameter type with a pair of generated ID and original parameter declaration, so references are unique and a shallow copy of the function type is possible.
     208
     209Note that after the change, all declaration nodes in the syntax-tree representation now map one-to-one with the actual declarations in the program, and therefore are guaranteed to be unique. This property can potentially enable more optimizations, and some related ideas are presented at the end of \VRef{s:SharedSub-ExpressionCaseUniqueExpressions}.
    190210
    191211
    192212\subsection{Improvement of pruning steps}
    193213
    194 A minor improvement for candidate elimination is to skip the step on the function overloads themselves and only perform on results of function application. As function calls are usually by name, the name resolution rule dictates that every function candidate necessarily has a different type; indirect function calls are rare, and when they do appear, they usually will not have many possible interpretations, and those rarely matches exactly in argument type. Since function types have a much more complex representation than data types (with multiple parameters and assertions), checking equality on them also takes longer.
    195 
    196 A brief test of this approach shows that the number of function overloads considered in expression resolution increases by a negligible amount of less than 1 percent, while type comparisons in candidate elimination are cut by more than half. Improvement is consistent over all \CFA source files in the test suite.
      214A minor improvement for candidate elimination is to skip this step for the function overloads themselves and only check the results of function application. As function calls are usually by name (versus through pointers to functions), the name-resolution rule dictates that every function candidate necessarily has a different type; indirect function calls are rare, and when they do appear, there are even fewer cases with multiple interpretations, and these rarely match exactly in argument type. Since function types have a much more complex representation (with multiple parameters and assertions) than data types, checking equality on them also takes longer.
     215
     216A brief test of this approach shows that the number of function overloads considered in expression resolution increases by less than 1 percent, while type comparisons in candidate elimination are reduced by more than half. This improvement is consistent over all \CFA source files in the test suite.
    197217
    198218
     
    200220\label{s:SharedSub-ExpressionCaseUniqueExpressions}
    201221
    202 Unique expression denotes an expression that must be evaluated only once, to prevent unwanted side effects. It is currently only a compiler artifact, generated on tuple member expression of the form
      222Unique expression denotes an expression evaluated only once to prevent unwanted side effects. It is currently only a compiler artifact, generated for a tuple-member expression of the form:
    203223\begin{cfa}
    204224struct S { int a; int b; };
     
    206226s.[a, b]; // tuple member expression, type is [int, int]
    207227\end{cfa}
    208 If the aggregate expression contains function calls, it cannot be evaluated multiple times:
      228If the aggregate expression is a function call, it cannot be evaluated multiple times:
    209229\begin{cfa}
    210230S makeS();
    211 makeS().[a, b]; // this should only make one S
     231makeS().[a, b]; // this should only generate a unique S
    212232\end{cfa}
    213233Before code generation, the above expression is internally represented as
     
    226246\end{cfa}
    227247at code generation, where @_unique_var@ and @_unique_var_evaluated@ are generated variables whose scope covers all appearances of the same expression.
    228 
    229 Note that although the unique expression is only used for tuple expansion now, it is a generally useful construction, and can be seen in other languages, such as Scala's @lazy val@~\cite{Scala}; therefore it could be worthwhile to introduce the unique expression to a broader context in \CFA and even make it directly available to programmers.
    230 
    231 In the compiler's visitor pattern, however, this creates a problem where multiple paths to a logically unique expression exist, so it may be modified more than once and become ill-formed; some specific intervention is required to ensure that unique expressions are only visited once. Furthermore, a unique expression appearing in more than one places will be copied on mutation so its representation is no longer unique. Some hacks are required to keep it in sync, and the methods are different when mutating the unique expression instance itself or its underlying expression.
    232 
    233 Example when mutating the underlying expression (visit-once guard)
     248The conditional check ensures a single call to @makeS()@ even though there are logically multiple calls because of the tuple field expansion.
     249
     250Note that although the unique expression is only used for tuple expansion now, it is a generally useful construction, and is seen in other programming languages, such as Scala's @lazy val@~\cite{Scala}; therefore it may be worthwhile to introduce the unique expression to a broader context in \CFA and even make it directly available to programmers.
     251
      252In the compiler's visitor pattern, however, this creates a problem where multiple paths to a logically unique expression exist, so it may be modified more than once and become ill-formed; some specific intervention is required to ensure unique expressions are only visited once. Furthermore, a unique expression appearing in more than one place is copied on mutation so its representation is no longer unique.
     253
     254Currently, special cases are required to keep everything synchronized, and the methods are different when mutating the unique expression instance itself or its underlying expression:
     255\begin{itemize}
     256\item
     257When mutating the underlying expression (visit-once guard)
    234258\begin{cfa}
    235259void InsertImplicitCalls::previsit( const ast::UniqueExpr * unqExpr ) {
    236         if ( visitedIds.count( unqExpr->id ) ) visit_children = false;
     260        @if ( visitedIds.count( unqExpr->id ) ) visit_children = false;@
    237261        else visitedIds.insert( unqExpr->id );
    238262}
    239263\end{cfa}
    240 Example when mutating the unique instance itself, which actually creates copies
     264\item
     265When mutating the unique instance itself, which actually creates copies
    241266\begin{cfa}
    242267auto mutExpr = mutate( unqExpr ); // internally calls copy when shared
    243 if ( ! unqMap.count( unqExpr->id ) ) {
     268@if ( ! unqMap.count( unqExpr->id ) ) {@
    244269        ...
    245270} else {
     
    248273}
    249274\end{cfa}
    250 Such workaround seems difficult to be fit into a common visitor template. This suggests the memory model may need different kinds of nodes to accurately represent the syntax tree.
    251 
    252 Together with the fact that declaration nodes are always unique, it is possible that AST nodes can be classified by three different types:
    253 \begin{itemize}
    254 \item
    255 \textbf{Strictly unique} with only one owner (declarations);
    256 \item
    257 \textbf{Logically unique} with (possibly) many owners but should not be copied (unique expression example presented here);
    258 \item
    259 \textbf{Shared} by functional programming model, which assume immutable data structure and are copied on mutation.
     275\end{itemize}
     276Such workarounds are difficult to fit into the common visitor pattern, which suggests the memory model may need different kinds of nodes to accurately represent this feature in the AST.
     277
     278Given that declaration nodes are unique, it is possible for AST nodes to be divided into three different types:
     279\begin{itemize}
     280\item
     281\textbf{Singleton} with only one owner (declarations);
     282\item
     283\textbf{No-copy} with multiple owners but cannot be copied (unique expression example presented here);
     284\item
     285\textbf{Copy} by functional-programming style, which assumes immutable data structures that are copied on mutation.
    260286\end{itemize}
    261287The boilerplate code can potentially handle these three cases differently.
     
    264290\section{Analysis of resolver algorithm complexity}
    265291
    266 The focus of this chapter is to identify and analyze some realistic cases that cause resolver algorithm to have an exponential run time. As previous work has shown [3], the overload resolution problem in \CFA has worst-case exponential complexity; however, only few specific patterns can trigger the exponential complexity in practice. Implementing heuristic-based optimization for those selected cases is helpful to alleviate the problem.
     292The focus of this section is to identify and analyze some realistic cases that cause the resolver algorithm to have an exponential runtime. As previous work has shown~\cite[\S~4.2.1]{Moss19}, the overload resolution problem in \CFA has worst-case exponential complexity; however, only a few specific patterns can trigger the exponential complexity in practice. Implementing heuristic-based optimization for those selected cases is helpful to alleviate the problem.
    267293
    268294
     
    270296\label{s:UnboundReturnType}
    271297
    272 The interaction of return type overloading and polymorphic functions creates this problem of function calls with unbound return type, and is further complicated by the presence of assertions.
      298The interaction of return-type overloading and polymorphic functions creates function calls with an unbound return type, and is further complicated by the presence of assertions.
    273299The prime example of a function with unbound return type is the type-safe version of C @malloc@:
    274300\begin{cfa}
    275 // size deduced from type, so no need to provide the size argument
    276 forall( dtype T | sized( T ) ) T * malloc( void );
    277 \end{cfa}
    278 Unbound return type can be problematic in resolver algorithm complexity because a single match of function call with unbound return type may create multiple candidates. In the worst case, consider a function declared to return any @otype@:
     301forall( dtype T | sized( T ) )
     302T * malloc( void ) { return (T *)malloc( sizeof(T) ); } // call C malloc
      303int * i = malloc();  // type deduced from left-hand side $\Rightarrow$ no size argument or return cast
     304\end{cfa}
      305An unbound return-type is problematic in resolver complexity because a single match of a function call with an unbound return type may create multiple candidates. In the worst case, consider a function declared to return any @otype@ (defined \VPageref{otype}):
    279306\begin{cfa}
    280307forall( otype T ) T anyObj( void );
    281308\end{cfa}
    282 As the resolver attempts to satisfy the otype constraint on @T@, a single call to @anyObj()@ without the result type known creates at least as many candidates as the number of complete types currently in scope; with generic types it becomes even worse, for example, assuming a declaration of generic pair is available at that point:
     309As the resolver attempts to satisfy the otype constraint on @T@, a call to @anyObj()@ in an expression, without the result type known, creates at least as many candidates as the number of complete types currently in scope; with generic types it becomes even worse, \eg assuming a declaration of a generic @pair@ is available at that point:
    283310\begin{cfa}
    284311forall( otype T, otype U ) struct pair { T first; U second; };
    285312\end{cfa}
    286 Then an @anyObj()@ call can result in arbitrarily complex types, such as @pair( pair( int,int ), pair( int,int ) )@, and the depth can grow indefinitely until the specified parameter depth limit, thus creating exponentially many candidates. However, the expected types allowed by parent expressions are practically very few, so most of those interpretations are invalid; if the result type is never bound up to top level, by the semantic rules it is ambiguous if there are more than one valid bindings, and resolution can fail fast. It is therefore reasonable to delay resolving assertions on an unbound parameter in return type; however, with the current cost model, such behavior may further cause irregularities in candidate selection, such that the presence of assertions can change the preferred candidate, even when order of expression costs are supposed to stay the same. Detailed analysis of this issue will be presented later, in the correctness part.
      313Then an @anyObj()@ call can result in arbitrarily complex types, such as @pair( pair( int, int ), pair( int, int ) )@, and the depth can grow indefinitely until a specified parameter-depth limit, thus creating exponentially many candidates. However, the expected types allowed by parent expressions are practically very few, so most of those interpretations are invalid; if the result type is never bound up to the top level, by the semantic rules it is ambiguous if there is more than one valid binding, and resolution fails quickly. It is therefore reasonable to delay resolving assertions on an unbound parameter in a return type; however, with the current cost model, such behavior may further cause irregularities in candidate selection, such that the presence of assertions can change the preferred candidate, even when order of expression costs are supposed to stay the same. A detailed analysis of this issue is presented in \VRef{s:AnalysisTypeSystemCorrectness}.
    287314
    288315
     
    290317\label{s:TtypeResolutionInfiniteRecursion}
    291318
    292 @ttype@ (``tuple type'') is a relatively new addition to the language that attempts to provide type-safe variadic argument semantics. Unlike regular @dtype@ parameters, @ttype@ is only valid in function parameter list, and may only appear once as the type of last parameter. At the call site, a @ttype@ parameter is bound to the tuple type of all remaining function call arguments.
     319@ttype@ (``tuple type'') is a relatively new addition to the language that attempts to provide type-safe variadic argument semantics. Unlike regular @dtype@ parameters, @ttype@ is only valid in a function parameter-list, and may only appear once as the last parameter type. At the call site, a @ttype@ parameter is bound to the tuple type of all remaining function-call arguments.
    293320
    294321There are two kinds of idiomatic @ttype@ usage: one is to provide flexible argument forwarding, similar to the variadic template in \CC (\lstinline[language=C++]|template<typename... args>|), as shown below in the implementation of @unique_ptr@
     
    298325        T * data;
    299326};
    300 forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); })
    301 void ?{}( unique_ptr( T ) & this, Args args ) {
    302         this.data = new( args );
    303 }
    304 \end{cfa}
    305 the other is to implement structural recursion in the first-rest manner:
    306 \begin{cfa}
    307 forall( otype T, ttype Params | { void process( T ); void func( Params ); })
     327forall( dtype T | sized( T ), @ttype Args@ | { void ?{}( T &, Args ); })
     328void ?{}( unique_ptr( T ) & this, Args @args@ ) {
     329        this.data = new( @args@ );  // forward constructor arguments to dynamic allocator
     330}
     331\end{cfa}
     332The other usage is to implement structural recursion in the first-rest pattern:
     333\begin{cfa}
     334forall( otype T, @ttype Params@ | { void process( T ); void func( Params ); })
    308335void func( T arg1, Params p ) {
    309336        process( arg1 );
    310         func( p );
    311 }
    312 \end{cfa}
    313 For the second use case, it is important that the number of parameters in the recursive call go down, since the call site must deduce all assertion candidates, and that is only possible if by just looking at argument types (and not their values), the recursion is known to be completed in a finite number of steps.
    314 
    315 In recent experiments, however, some flaw in the type binding rules can lead to the first kind of @ttype@ use case produce an invalid candidate that the resolver enters an infinite loop.
    316 
    317 This bug was discovered in an attempt to raise assertion recursive depth limit and one of the library program takes exponentially longer time to compile. The cause of the problem is identified to be the following set of functions.
    318 File @memory.cfa@ contains
    319 \begin{cfa}
    320 #include "memory.hfa"
    321 #include "stdlib.hfa"
    322 \end{cfa}
    323 where file @memory.hfa@ contains the @unique_ptr@ declaration above, and two other similar functions with @ttype@ parameter:
    324 \begin{cfa}
    325 forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); }) {
     337        func( @p@ );  // recursive call until base case of one argument
     338}
     339\end{cfa}
     340For the second use case, it is imperative the number of parameters in the recursive call goes down, since the call site must deduce all assertion candidates, and that is only possible if by observation of the argument types (and not their values), the recursion is known to be completed in a finite number of steps.
     341
      342In recent experiments, however, a flaw in the type-binding rules can lead to the first kind of @ttype@ use case producing an invalid candidate, causing the resolver to enter an infinite loop.
     343This bug was discovered in an attempt to raise the assertion recursive-depth limit and one of the library programs took exponentially longer to compile. The cause of the problem is the following set of functions:
     344\begin{cfa}
     345// unique_ptr  declaration from above
     346
     347forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); } ) { // distribute forall clause
    326348        void ?{}( counter_data( T ) & this, Args args );
    327349        void ?{}( counter_ptr( T ) & this, Args args );
    328350        void ?{}( unique_ptr( T ) & this, Args args );
    329351}
    330 \end{cfa}
    331 File @stdlib.hfa@ contains
    332 \begin{cfa}
     352
    333353forall( dtype T | sized( T ), ttype TT | { void ?{}( T &, TT ); } )
    334 T * new( TT p ) { return &(*malloc()){ p }; }
    335 \end{cfa}
    336 
    337 In the expression @(*malloc()){p}@, the type of object being constructed is yet unknown, since the return type information is not immediately provided. That caused every constructor to be searched, and while normally a bound @ttype@ cannot be unified with any free parameter, it is possible with another free @ttype@. Therefore in addition to the correct option provided by assertion, 3 wrong options are examined, each of which again requires the same assertion, for an unknown base type T and @ttype@ arguments, and that becomes an infinite loop, until the specified recursion limit and resolution is forced to fail. Moreover, during the recursion steps, number of candidates grows exponentially, since there are always 3 options at each step.
    338 
    339 Unfortunately, @ttype@ to @ttype@ binding is necessary, to allow calling the function provided by assertion indirectly.
    340 \begin{cfa}
    341 forall( dtype T | sized( T ), ttype Args | { void ?{}( T &, Args ); })
    342 void ?{}( unique_ptr( T ) & this, Args args ) { this.data = (T * )new( args ); }
    343 \end{cfa}
    344 Here the constructor assertion is used for the @new( args )@ call.
     354T * new( TT p ) { return @&(*malloc()){ p };@ }
     355\end{cfa}
      356In the expression @(*malloc()){p}@, the type of the object being constructed is unknown, since the return-type information is not immediately available. That causes every constructor to be searched, and while normally a bound @ttype@ cannot be unified with any free parameter, it is possible with another free @ttype@. Therefore, in addition to the correct option provided by the assertion, 3 wrong options are examined, each of which again requires the same assertion, for an unknown base-type @T@ and @ttype@ argument, which becomes an infinite loop until the specified recursion limit is reached and resolution fails. Moreover, during the recursion steps, the number of candidates grows exponentially, since there are always 3 options at each step.
     357
     358Unfortunately, @ttype@ to @ttype@ binding is necessary, to allow indirectly calling a function provided in an assertion.
     359\begin{cfa}
     360forall( dtype T | sized( T ), ttype Args | { @void ?{}( T &, Args );@ })
     361void ?{}( unique_ptr( T ) & this, Args args ) { this.data = (T *)@new( args )@; } // constructor call
     362\end{cfa}
     363Here the constructor assertion is used by the @new( args )@ call to indirectly call the constructor on the allocated storage.
    345364Therefore, it is hard, perhaps impossible, to solve this problem by tweaking the type binding rules. An assertion caching algorithm can help improve this case by detecting cycles in recursion.
    346365
    347 Meanwhile, without the caching algorithm implemented, some changes in the \CFA source code are enough to eliminate this problem, at least in the current codebase. Note that the issue only happens with an overloaded variadic function, which rarely appears in practice, since the idiomatic use cases are for argument forwarding and self-recursion. The only overloaded @ttype@ function so far discovered in all of \CFA standard library code is the constructor, and by utilizing the argument-dependent lookup process described in Section~\ref{s:UnboundReturnType}, adding a cast before constructor call gets rid of the issue.
    348 \begin{cfa}
    349 T * new( TT p ) { return &(*(T * )malloc()){ p }; }
      366Meanwhile, without a caching algorithm implemented, some changes in the \CFA source code are enough to eliminate this problem, at least in the current codebase. Note that the issue only happens with an overloaded variadic function, which rarely appears in practice, since the idiomatic use cases are for argument forwarding and self-recursion. The only overloaded @ttype@ function so far discovered in the \CFA standard library is the constructor, and by utilizing the argument-dependent lookup process described in \VRef{s:UnboundReturnType}, adding a cast before the constructor call removes the issue.
     367\begin{cfa}
     368T * new( TT p ) { return &(*@(T * )@malloc()){ p }; }
    350369\end{cfa}
    351370
     
    353372\subsection{Reused assertions in nested generic type}
    354373
    355 The following test of deeply nested dynamic generic type reveals that locally caching reused assertions is necessary, rather than just a resolver optimization, because recomputing assertions can result in bloated generated code size:
      374The following test of a deeply nested, dynamic generic type reveals that locally caching reused assertions is necessary, rather than just a resolver optimization, because recomputing assertions can result in bloated generated code size:
    356375\begin{cfa}
    357376struct nil {};
     
    361380int main() {
    362381        #if   N==0
    363         nil x;   
     382        nil @x@;
    364383        #elif N==1
    365         cons( size_t, nil ) x;
     384        cons( size_t, nil ) @x@;
    366385        #elif N==2
    367         cons( size_t, cons( size_t, nil ) ) x;
     386        cons( size_t, cons( size_t, nil ) ) @x@;
    368387        #elif N==3
    369         cons( size_t, cons( size_t, cons( size_t, nil ) ) ) x;
     388        cons( size_t, cons( size_t, cons( size_t, nil ) ) ) @x@;
    370389        // similarly for N=4,5,6
    371390        #endif
    372391}
    373392\end{cfa}
    374 At the declaration of @x@, it is implicitly initialized by generated constructor call, whose signature is given by
      393At the declaration of @x@, it is implicitly initialized by a generated constructor call, with signature:
    375394\begin{cfa}
    376395forall( otype L, otype R ) void ?{}( cons( L, R ) & );
    377396\end{cfa}
    378 Note that the @otype@ constraint contains 4 assertions:
     397where the @otype@ constraint contains the 4 assertions:\label{otype}
    379398\begin{cfa}
    380399void ?{}( L & ); // default constructor
     
    383402L & ?=?( L &, L & ); // assignment
    384403\end{cfa}
    385 Now since the right hand side of outermost cons is again a cons, recursive assertions are required. When the compiler cannot cache and reuse already resolved assertions, it becomes a problem, as each of those 4 pending assertions again asks for 4 more assertions one level below. Without any caching, number of resolved assertions grows exponentially, while that is obviously unnecessary since there are only $n+1$ different types involved. Even worse, this causes exponentially many wrapper functions generated later at the codegen step, and results in huge compiled binary.
    386 
    387 \begin{table}[h]
     404
     405\begin{table}[htb]
     406\centering
    388407\caption{Compilation results of nested cons test}
     408\label{t:NestedConsTest}
    389409\begin{tabular}{|r|r|r|}
    390410\hline
     
    402422\end{table}
    403423
    404 As the local functions are implemented by emitting executable code on the stack~\cite{gcc-nested-func}, it eventually means that compiled code also has exponential run time. This problem has evident practical implications, as nested collection types are frequently used in real production code.
    405 
     424Now since the right-hand side of the outermost @cons@ is again a @cons@, recursive assertions are required. \VRef[Table]{t:NestedConsTest} shows when the compiler does not cache and reuse already resolved assertions, it becomes a problem, as each of these 4 pending assertions again asks for 4 more assertions one level below. Without caching, the number of resolved assertions grows exponentially, which is unnecessary since there are only $n+1$ different types involved. Even worse, this problem causes exponentially many wrapper functions to be generated at the backend, resulting in a huge binary. As the local functions are implemented by emitting executable code on the stack~\cite{gcc-nested-func}, it means that compiled code also has exponential run time. This problem has practical implications, as nested collection types are frequently used in real production code.
    406425
    407426\section{Analysis of type system correctness}
     427\label{s:AnalysisTypeSystemCorrectness}
    408428
    409429In Moss' thesis~\cite[\S~4.1.2,~p.~45]{Moss19}, the author presents the following example:
     
    422442From the set of candidates whose parameter and argument types have been unified and whose assertions have been satisfied, those whose sub-expression interpretations have the smallest total cost of conversion are selected ... The total cost of conversion for each of these candidates is then calculated based on the implicit conversions and polymorphism involved in adapting the types of the sub-expression interpretations to the formal parameter types.
    423443\end{quote}
    424 With this model, the algorithm picks @g1@ in resolving the @f( g( 42 ) )@ call, which seems to be undesirable.
    425 
    426 There are further evidence that shows the Bilson model is fundamentally incorrect, following the discussion of unbound return type in Section~\ref{s:UnboundReturnType}. By the conversion cost specification, a binding from a polymorphic type parameter to a concrete type incurs a polymorphic cost of 1. It remains unspecified \emph{when} the type parameters should become bound. When the parameterized types appear in the function parameters, they can be deduced from the argument type, and there is no ambiguity. In the unbound return case, however, the binding may happen at any stage in expression resolution, therefore it is impossible to define a unique local conversion cost. Note that type binding happens exactly once per parameter in resolving the entire expression, so the global binding cost is unambiguously 1.
    427 
    428 As per the current compiler implementation, it does have a notable inconsistency in handling such case. For any unbound parameter that does \emph{not} come with an associated assertion, it remains unbound to the parent expression; for those that does however, they are immediately bound in the assertion resolution step, and concrete result types are used in the parent expressions.
    429 
     444With this model, the algorithm picks @g1@ in resolving the @f( g( 42 ) )@ call, which is undesirable.
     445
      446There is further evidence that the Bilson model is fundamentally incorrect, following the discussion of unbound return type in \VRef{s:UnboundReturnType}. By the conversion-cost specification, a binding from a polymorphic type-parameter to a concrete type incurs a polymorphic cost of 1. It remains unspecified \emph{when} the type parameters should become bound. When the parameterized types appear in function parameters, they can be deduced from the argument type, and there is no ambiguity. In the unbound return case, however, the binding may happen at any stage in expression resolution, therefore it is impossible to define a unique local conversion cost. Note that type binding happens exactly once per parameter in resolving the entire expression, so the global binding cost is unambiguously 1.
     447
     448In the current compiler implementation, there is a notable inconsistency in handling this case. For any unbound parameter that does \emph{not} come with an associated assertion, it remains unbound to the parent expression; for those that do, however, they are immediately bound in the assertion resolution step, and concrete result types are used in the parent expressions.
    430449Consider the following example:
    431450\begin{cfa}
     
    433452void h( int * );
    434453\end{cfa}
    435 The expression @h( f() )@ eventually has a total cost of 1 from binding (T: int), but in the eager resolution model, the cost of 1 may occur either at call to @f@ or at call to @h@, and with the assertion resolution triggering a binding, the local cost of @f()@ is (0 poly, 0 spec) with no assertions, but (1 poly, -1 spec) with an assertion:
    436 \begin{cfa}
    437 forall( dtype T | { void g( T * ); } ) T * f( void );
     454The expression @h( f() )@ eventually has a total cost of 1 from binding (T: int), but in the eager-resolution model, the cost of 1 may occur either at the call to @f@ or at call to @h@, and with the assertion resolution triggering a binding, the local cost of @f()@ is (0 poly, 0 spec) with no assertions, but (1 poly, -1 spec) with an assertion:
     455\begin{cfa}
     456forall( dtype T | @{ void g( T * ); }@ ) T * f( void );
    438457void g( int * );
    439458void h( int * );
    440459\end{cfa}
    441 and that contradicts the principle that adding assertions should make expression cost lower. Furthermore, the time at which type binding and assertion resolution happens is an implementation detail of the compiler, but not a part of language definition. That means two compliant \CFA compilers, one performing immediate assertion resolution at each step, and one delaying assertion resolution on unbound types, can produce different expression costs and therefore different candidate selection, making the language rule itself partially undefined and therefore unsound. By the above reasoning, the updated cost model using global sum of costs should be accepted as the standard. It also allows the compiler to freely choose when to resolve assertions, as the sum of total costs is independent of that choice; more optimizations regarding assertion resolution can also be implemented.
     460and that contradicts the principle that adding assertions should make expression cost lower. Furthermore, the time at which type binding and assertion resolution happens is an implementation detail of the compiler, not part of the language definition. That means two compliant \CFA compilers, one performing immediate assertion resolution at each step, and one delaying assertion resolution on unbound types, can produce different expression costs and therefore different candidate selection, making the language rule itself partially undefined, and therefore, unsound. By the above reasoning, the updated cost model using global sum of costs should be accepted as the standard. It also allows the compiler to freely choose when to resolve assertions, as the sum of total costs is independent of that choice; more optimizations regarding assertion resolution can also be implemented.
    442461
    443462
    444463\section{Timing results}
    445464
    446 For the timing results presented here, the \CFA compiler is built with gcc 9.3.0, and tested on a server machine running Ubuntu 20.04, 64GB RAM and 32-core 2.2 GHz CPU, results reported by the time command, and using only 8 cores in parallel such that the time is close to the case with 100% CPU utilization on a single thread.
    447 
    448 On the most recent build, the \CFA standard library (~1.3 MB of source code) compiles in 4 minutes 47 seconds total processor time (single thread equivalent), with the slowest file taking 13 seconds. The test suite (178 test cases, ~2.2MB of source code) completes within 25 minutes total processor time,\footnote{Including a few runtime tests; total time spent in compilation is approximately 21 minutes.} with the slowest file taking 23 seconds. In contrast, the library build on old compiler takes 85 minutes total, 5 minutes for the slowest file. Full test suite takes too long with old compiler build and is therefore not run, but the slowest test cases take approximately 5 minutes. Overall, the most recent build compared to old build in April 2020, before the project started, is consistently faster by a factor of 20.
    449 
    450 Additionally, 6 selected \CFA source files with distinct features from library and test suite are used to test compiler performance after each of the optimizations are implemented. Test files are from the most recent build and run through C preprocessor to eliminate the factor of header file changes. The selected tests are:
    451 \begin{itemize}
    452 \item
    453 @lib/fstream@ (112 KB)\footnote{File sizes are after preprocessing, with no line information (\lstinline|gcc -E -P|).}: implementation of I/O library
465For the timing results presented here, the \CFA compiler is built with gcc 9.3.0, and tested on a server machine running Ubuntu 20.04, with 64 GB RAM and a 32-core 2.2 GHz CPU.
     466Timing is reported by the @time@ command and an experiment is run using 8 cores, where each core is at 100\% CPU utilization.
     467
468On the most recent build, the \CFA standard library ($\approx$1.3 MB of source code) compiles in 4 minutes 47 seconds total processor time (single-thread equivalent), with the slowest file taking 13 seconds. The test suite (178 test cases, $\approx$2.2 MB of source code) completes within 25 minutes total processor time,
     469% PAB: I do not understand this footnote.
     470%\footnote{Including a few runtime tests; total time spent in compilation is approximately 21 minutes.}
471with the slowest file taking 23 seconds. In contrast, the library build with the old compiler takes 85 minutes total, 5 minutes for the slowest file. The full test suite takes too long with the old compiler build and is therefore not run, but the slowest test cases take approximately 5 minutes. Overall, the most recent build compared to an old build is consistently faster by a factor of 20.
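As a quick sanity check of the metric (numbers taken from the text, arithmetic only): total processor time sums CPU time over all cores, so at full utilization on 8 cores the wall-clock time is roughly the total divided by 8.

```python
# "Total processor time" (single-thread equivalent) sums CPU time across
# cores; at 100% utilization on 8 cores, wall-clock time is about total / cores.
library_build_cpu_s = 4 * 60 + 47   # 4 min 47 s, library build from the text
cores = 8
wall_clock_estimate = library_build_cpu_s / cores
print(f"~{wall_clock_estimate:.1f} s wall-clock on {cores} cores")
```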
     472
473Additionally, 6 selected \CFA source files with distinct features from the library and test suite are used to illustrate the compiler performance change after each of the implemented optimizations. Test files are from the most recent build and run through the C preprocessor to expand header files and perform macro expansion, but emit no line-number information (@gcc -E -P@).
     474\VRef[Table]{t:SelectedFileByCompilerBuild} shows the selected tests:
     475\begin{itemize}
     476\item
     477@lib/fstream@ (112 KB)
    454478\item
    455479@lib/mutex@ (166 KB): implementation of concurrency primitive
     
    459483@lib/stdlib@ (64 KB): type-safe wrapper to @void *@-based C standard library functions
    460484\item
    461 @test/ISO2@ (55 KB): application of I/O library
     485@test/io2@ (55 KB): application of I/O library
    462486\item
    463487@test/thread@ (188 KB): application of threading library
    464488\end{itemize}
    465 
    466 The \CFA compiler builds are picked from git commit history that passed the test suite, and implement the optimizations incrementally:
    467 \begin{itemize}
    468 \item
    469 \#0 is the first working build of new AST data structure
     489versus \CFA compiler builds picked from the git commit history that implement the optimizations incrementally:
     490\begin{itemize}
     491\item
     492old resolver
     493\item
     494\#0 is the first working build of the new AST data structure
    470495\item
    471496\#1 implements special symbol table and argument-dependent lookup
    472497\item
    473 \#2 implements late assertion satisfaction
    474 \item
    475 \#3 implements revised function type representation
    476 \item
    477 \#4 skips pruning on expressions with function type (most recent build)
    478 \end{itemize}
    479 The old resolver with no memory sharing and none of the optimizations above is also tested.
    480 \begin{table}
     498\#2 implements late assertion-satisfaction
     499\item
     500\#3 implements revised function-type representation
     501\item
     502\#4 skips pruning on expressions for function types (most recent build)
     503\end{itemize}
     504Reading left to right for a test shows the benefit of each optimization on the cost of compilation.
     505
     506\begin{table}[htb]
     507\centering
    481508\caption{Compile time of selected files by compiler build, in seconds}
     509\label{t:SelectedFileByCompilerBuild}
    482510\begin{tabular}{|l|r|r|r|r|r|r|}
    483511\hline
     
    502530\end{table}
    503531
    504 
    505532\section{Conclusion}
    506533
    507 Over the course of 8 months of active research and development in \CFA type system and compiler algorithm, performance of the reference \CFA compiler, cfa-cc, has been greatly improved, allowing mid-sized \CFA programs to be compiled and built reasonably fast. As there are also ongoing efforts in the team on building a standard library, evaluating the runtime performance, and attempting to incorporate \CFA with existing software written in C, this project is especially meaningful for practical purposes.
    508 
    509 Analysis conducted in the project were based significantly on heuristics and practical evidence, as the theoretical bounds and average cases for the expression resolution problem differ. This approach was difficult at start to follow, with an unacceptably slow compiler, since running the program through debugger and validation tools (\eg @gdb@, @valgrind@) adds another order of magnitude to run time, which was already in minutes. However, near the end of the project, many significant improvements have already been made and new optimizations can be tested immediately. The positive feedback in development cycle benefits the \CFA team as a whole, more than just for the compiler optimizations.
    510 
    511 Some potential issues of the language that may happen frequently in practice have been identified. Due to the time constraint and complex nature of these problems, a handful of them remain unsolved, but some constructive proposals are made. Notably, introducing a local assertion cache in the resolver is a common solution for a few remaining problems, so that should be the focus of work soon.
    512 
    513 The \CFA team are planning on a public alpha release of the language as the compiler performance becomes promising, and other parts of the system, such as a standard library, are also being enhanced. Ideally, the remaining problems should be resolved before release, and the solutions will also be integral to drafting a formal specification.
534Over the course of 8 months of active research and development of the \CFA type system and compiler algorithms, performance of the reference \CFA compiler, cfa-cc, has been greatly improved. Now, mid-sized \CFA programs compile reasonably fast. Currently, there are ongoing efforts by the \CFA team to augment the standard library, evaluate its runtime performance, and incorporate \CFA with existing software written in C; therefore, this project is especially meaningful for these practical purposes.
     535
536Accomplishing this work was difficult. Analysis conducted in the project is based significantly on heuristics and practical evidence, as the theoretical bounds and average cases for the expression resolution problem differ. As well, the slowness of the initial compiler made attempts to understand why and where problems exist extremely difficult, because both debugging and validation tools (\eg @gdb@, @valgrind@, @perf@) further slowed down compilation time. However, by the end of the project, I had found and fixed several significant problems, and new optimizations are easier to introduce and test. The reduction in the development cycle benefits the \CFA team as a whole.
     537
     538Some potential issues of the language, which happen frequently in practice, have been identified. Due to the time constraint and complex nature of these problems, a handful of them remain unsolved, but some constructive proposals are made. Notably, introducing a local assertion cache in the resolver is a reasonable solution for a few remaining problems, so that should be the focus of future work.
     539
540The \CFA team is planning a public alpha release of the language, as the compiler performance, given my recent improvements, is now usable. Other parts of the system, such as the standard library, have made significant gains due to the speed-up in the development cycle. Ideally, the remaining problems should be resolved before release, and the solutions will also be integral to drafting a formal specification.
    514541
    515542\addcontentsline{toc}{section}{\refname}
  • doc/theses/fangren_yu_COOP_S20/Report.tex

    rb6a8b31 rd95969a  
    1717\usepackage[usenames]{color}
    1818\input{common}                                          % common CFA document macros
    19 \usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,pagebackref=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
     19\usepackage[dvips,plainpages=false,pdfpagelabels,pdfpagemode=UseNone,colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue,pagebackref=true,breaklinks=true]{hyperref}
    2020\usepackage{breakurl}
    2121\urlstyle{sf}