Context Navigation

← Previous Change
Next Change →

Changeset 8b590a4 for doc

Timestamp:

Mar 13, 2019, 11:35:33 AM (7 years ago)

Author:

Peter A. Buhr <pabuhr@…>

Branches:

ADT, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum

Children:

d078afd

Parents:

1c35c78 (diff), 30e32b2 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:

doc

Files:

: 48 added
: 12 edited

bibliography/pl.bib (modified) (5 diffs)
theses/aaron_moss_PhD/phd/Makefile (modified) (5 diffs)
theses/aaron_moss_PhD/phd/background.tex (modified) (13 diffs)
theses/aaron_moss_PhD/phd/conclusion.tex (modified) (1 diff)
theses/aaron_moss_PhD/phd/evaluation/algo-summary.dat (added)
theses/aaron_moss_PhD/phd/evaluation/algo-summary.gp (added)
theses/aaron_moss_PhD/phd/evaluation/bu-summary.dat (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-cc/cfa-bu.csv (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-cc/cfa-co.csv (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-cc/cfa-dca.csv (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-cc/cfa-def.csv (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-cc/cfa-imm.csv (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-mem-by-time.tsv (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-mem.tsv (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-plots.gp (added)
theses/aaron_moss_PhD/phd/evaluation/cfa-time.tsv (added)
theses/aaron_moss_PhD/phd/evaluation/data.xlsx (added)
theses/aaron_moss_PhD/phd/evaluation/generic-timing.dat (modified) (1 diff)
theses/aaron_moss_PhD/phd/evaluation/mem-by-max-assns.tsv (added)
theses/aaron_moss_PhD/phd/evaluation/mem-by-max-depth.tsv (added)
theses/aaron_moss_PhD/phd/evaluation/metric-plots.gp (added)
theses/aaron_moss_PhD/phd/evaluation/per-prob-scatter.gp (added)
theses/aaron_moss_PhD/phd/evaluation/per-prob.gp (added)
theses/aaron_moss_PhD/phd/evaluation/per-prob.tsv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/imgui-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/io1-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/io2-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/kernel-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/math1-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/math2-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/math3-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/math4-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/minmax-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/preemption-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/rational-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/searchsort-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/per_prob/swap-per-prob.csv (added)
theses/aaron_moss_PhD/phd/evaluation/time-by-max-assns.tsv (added)
theses/aaron_moss_PhD/phd/evaluation/time-by-max-depth.tsv (added)
theses/aaron_moss_PhD/phd/experiments.tex (added)
theses/aaron_moss_PhD/phd/figures/bilson-conv-graph.eps (added)
theses/aaron_moss_PhD/phd/figures/bilson-conv-graph.odg (added)
theses/aaron_moss_PhD/phd/figures/extended-conv-graph.eps (added)
theses/aaron_moss_PhD/phd/figures/extended-conv-graph.odg (added)
theses/aaron_moss_PhD/phd/figures/persistent-union-find.eps (added)
theses/aaron_moss_PhD/phd/figures/persistent-union-find.odg (added)
theses/aaron_moss_PhD/phd/figures/resolution-dag.eps (added)
theses/aaron_moss_PhD/phd/figures/resolution-dag.odg (added)
theses/aaron_moss_PhD/phd/figures/safe-conv-graph.eps (added)
theses/aaron_moss_PhD/phd/figures/safe-conv-graph.odg (added)
theses/aaron_moss_PhD/phd/figures/union-find-with-classes.eps (added)
theses/aaron_moss_PhD/phd/figures/union-find-with-classes.odg (added)
theses/aaron_moss_PhD/phd/frontpgs.tex (modified) (6 diffs)
theses/aaron_moss_PhD/phd/generic-bench.tex (added)
theses/aaron_moss_PhD/phd/generic-types.tex (modified) (22 diffs)
theses/aaron_moss_PhD/phd/introduction.tex (modified) (2 diffs)
theses/aaron_moss_PhD/phd/macros.tex (modified) (1 diff)
theses/aaron_moss_PhD/phd/resolution-heuristics.tex (modified) (1 diff)
theses/aaron_moss_PhD/phd/thesis.tex (modified) (3 diffs)
theses/aaron_moss_PhD/phd/type-environment.tex (modified) (9 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/bibliography/pl.bib

-              r1c35c78
+              r8b590a4
     author      = {Tarditi, David and Elliott, Archibald Samuel and Ruef, Andrew and Hicks, Michael},
     title       = {Checked C: Making C Safe by Extension},
+    booktitle   = {2018 IEEE Cybersecurity Development (SecDev)},
+    publisher   = {IEEE},
+    year        = {2018},
+    month       = seep,
+    pages       = {53-60},
+    url         = {https://www.microsoft.com/en-us/research/publication/checkedc-making-c-safe-by-extension/},
+    booktitle   = {2018 IEEE Cybersecurity Development (SecDev)}
+    year = {2018},
+    month = {September},
+    pages = {53-60},
+    publisher = {IEEE},
+    url = {https://www.microsoft.com/en-us/research/publication/checkedc-making-c-safe-by-extension/},
+}
+@misc{Clang,
+    keywords = {clang},
+    contributer = {a3moss@uwaterloo.ca},
+    title = {Clang: a {C} language family frontend for {LLVM}},
+    howpublished = {\href{https://clang.llvm.org/}{https://\-clang.llvm.org/}},
+    note = {Accessed 2019-02-22}
+}
 …
+}
+@misc{Dotty-github,
+    keywords = {dotty,scala},
+    contributer = {a3moss@uwaterloo.ca},
+    author = {Martin Odersky},
+    title = {Dotty},
+    howpublished = {\href{https://github.com/lampepfl/dotty}{https://\-github.com/\-lampepfl/\-dotty}},
+    note = {Acessed: 2019-02-22}
+}
 @unpublished{Duff83,
     keywords    = {C, switch statement, control flow},
 …
     pages       = {325-361},
+}
+@article{Tarjan75,
+ keywords = {union-find},
+ contributer = {a3moss@uwaterloo.ca},
+ author = {Tarjan, Robert Endre},
+ title = {Efficiency of a Good But Not Linear Set Union Algorithm},
+ journal = {J. ACM},
+ issue_date = {April 1975},
+ volume = {22},
+ number = {2},
+ month = apr,
+ year = {1975},
+ issn = {0004-5411},
+ pages = {215--225},
+ numpages = {11},
+ url = {http://doi.acm.org/10.1145/321879.321884},
+ doi = {10.1145/321879.321884},
+ acmid = {321884},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+}
 @book{Eiffel,
 …
+}
+@misc{Matsakis17,
+    keywords    = {Rust, Chalk, PROLOG},
+    contributer = {a3moss@uwaterloo.ca},
+    author      = {Nicholas Matsakis},
+    title       = {Lowering {Rust} traits to logic},
+    month       = jan,
+    year        = 2017,
+    howpublished= {\href{http://smallcultfollowing.com/babysteps/blog/2017/01/26/lowering-rust-traits-to-logic/}
+                  {http://smallcultfollowing.com/\-babysteps/\-blog/\-2017/\-01/\-26/\-lowering-rust-traits-to-logic/}},
+    optnote     = {Accessed: 2019-01},
+}
 @article{Cormack89,
     keywords    = {parsing, LR, error recovery},
 …
     publisher   = {Motorola},
     year        = 1992,
+}
+@misc{Haberman16,
+    keywords    = {C++ template expansion},
+    contributer = {a3moss@uwaterloo.ca},
+    author      = {Josh Haberman},
+    title       = {Making arbitrarily-large binaries from fixed-size {C}{\kern-.1em\hbox{\large\texttt{+\kern-.25em+}}} code},
+    year        = 2016,
+    howpublished= {\href{http://blog.reverberate.org/2016/01/making-arbitrarily-large-binaries-from.html}
+                  {
+          {http://blog.reverberate.org/\-2016/\-01/\-making-arbitrarily-large-binaries-from.html}
+          }},
+    optnote     = {Accessed: 2016-09},
+}

doc/theses/aaron_moss_PhD/phd/Makefile

-              r1c35c78
+              r8b590a4
 BIBDIR = ../../../bibliography
 EVALDIR = evaluation
+FIGDIR = figures
 TEXLIB = .:${BUILD}:${BIBDIR}:
 …
 BIBTEX = BIBINPUTS=${TEXLIB} && export BIBINPUTS && bibtex
 VPATH = ${EVALDIR}
+VPATH = ${EVALDIR} ${FIGDIR}
 BASE = thesis
 …
 background \
 generic-types \
+resolution-heuristics \
 type-environment \
 resolution-heuristics \
+experiments \
 conclusion \
+generic-bench \
+}
+FIGURES = ${addsuffix .eps, \
+safe-conv-graph \
+resolution-dag \
+union-find-with-classes \
+persistent-union-find \
+}
 GRAPHS = ${addsuffix .tex, \
 generic-timing \
+tests-completed \
+per-prob-histo \
+per-prob-depth \
+cfa-time \
+}
 …
         dvips ${BUILD}/$< -o ${BUILD}/$@
 ${BASE}.dvi : Makefile ${SOURCES} ${GRAPHS} ${BIBFILE} ${BUILD}
+${BASE}.dvi : Makefile ${SOURCES} ${GRAPHS} ${FIGURES} ${BIBFILE} ${BUILD}
         ${LATEX} ${BASE}
         ${BIBTEX} ${BUILD}/${BASE}
 …
         ${LATEX} ${BASE}
 ${GRAPHS} : generic-timing.gp generic-timing.dat ${BUILD}
+generic-timing.tex : generic-timing.gp generic-timing.dat ${BUILD}
         gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/generic-timing.gp
+tests-completed.tex : algo-summary.gp algo-summary.dat bu-summary.dat ${BUILD}
+        gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/algo-summary.gp
+per-prob-histo.tex : per-prob.gp per-prob.tsv ${BUILD}
+        gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/per-prob.gp
+per-prob-depth.tex : per-prob-scatter.gp ${BUILD}
+        gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/per-prob-scatter.gp
+cfa-time.tex : cfa-plots.gp cfa-time.tsv cfa-mem.tsv ${BUILD}
+        gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/cfa-plots.gp
 ${BUILD}:

doc/theses/aaron_moss_PhD/phd/background.tex

-              r1c35c78
+              r8b590a4
 \chapter{\CFA{}}
+\label{cfa-chap}
 \CFA{} adds a number of features to C, some of them providing significant increases to the expressive power of the language, but all designed to maintain the existing procedural programming paradigm of C and to be as orthogonal as possible to each other.
 To provide background for the contributions in subsequent chapters, this chapter provides a summary of the features of \CFA{} at the time this work was conducted.
+The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism}\cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
+Building on Ditchfield's design for contextual polymorphism as well as KW-C\cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson\cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
+This early \CFACC{} provided basic functionality, but incorporated a number of poor algorithmic choices due to a rushed implementation time frame, and as such lacked the runtime performance required for practical use; this thesis is substantially concerned with rectifying those deficits.
+The \CFA{} project was revived in 2015 with the intention of building a production-ready language and compiler; at the time of this writing, both \CFA{} and \CFACC{} have been under active development continuously since.
+As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal\cite{Moss18} provides a reasonable summary of the current design.
+Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors\cite{Schluntz17}, improved support for tuples\cite{Schluntz17}, reference types\cite{Moss18}, first-class concurrent and parallel programming support\cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library\cite{Moss18}.
+\section{\CFA{} Features}
+The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
+Building on Ditchfield's design for contextual polymorphism as well as KW-C \cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson \cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
+This early \CFACC{} provided basic functionality, but incorporated a number of algorithmic choices that have failed to scale as \CFA{} has developed, lacking the runtime performance for practical use; this thesis is substantially concerned with rectifying those deficits.
+The \CFA{} project was revived in 2015 with the intention of building a production-ready language and compiler; at the time of this writing, both \CFA{} and \CFACC{} remain under active development.
+As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal~\cite{Moss18} provides a reasonable summary of the current design.
+Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors \cite{Schluntz17}, improved support for tuples \cite{Schluntz17}, reference types \cite{Moss18}, first-class concurrent and parallel programming support \cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library \cite{Moss18}.
 The selection of features presented in this chapter are chosen to elucidate the design constraints of the work presented in this thesis.
 In some cases the interactions of multiple features make this design a significantly more complex problem than any individual feature would; in other cases a feature that does not by itself add any complexity to expression resolution triggers previously rare edge cases more frequently.
 \subsection{Procedural Paradigm}
+In some cases the interactions of multiple features make this design a significantly more complex problem than any individual feature; in other cases a feature that does not by itself add any complexity to expression resolution triggers previously rare edge cases more frequently.
+\section{Procedural Paradigm}
 It is important to note that \CFA{} is not an object-oriented language.
+This is a deliberate choice intended to maintain the applicability of the mental model and language idioms already possessed by C programmers.
+This choice is in marked contrast to \CC{}, which, though it has backward-compatibility with C on the source code level, is a much larger and more complex language, and requires extensive developer re-training to write idiomatic, efficient code in \CC{}'s object-oriented paradigm.
+\CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions''; while these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
+This is a deliberate choice intended to maintain the applicability of the programming model and language idioms already possessed by C programmers.
+This choice is in marked contrast to \CC{}, which is a much larger and more complex language, and requires extensive developer re-training to write idiomatic, efficient code in \CC{}'s object-oriented paradigm.
 Particularly, \CFA{} has no concept of \emph{subclass}, and thus no need to integrate an inheritance-based form of polymorphism with its parametric and overloading-based polymorphism.
+The graph structure of the \CFA{} type conversions is also markedly different than an inheritance hierarchy; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance hierarchies.
+While \CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} and these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
+The graph structure of the \CFA{} type conversions (discussed in Section~\ref{conv-cost-sec}) is also markedly different than an inheritance hierarchy; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance hierarchies.
 Another aspect of \CFA{}'s procedural paradigm is that it retains C's translation-unit-based encapsulation model, rather than class-based encapsulation such as in \CC{}.
 This choice implies that that separate compilation must be maintained to allow headers to act as an encapsulation boundary, rather than the header-only libraries used by \CC{} templates.
 \subsection{Name Overloading} \label{overloading-sec}
 In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding \lstinline{struct}, \lstinline{union}, and \lstinline{enum} tags, one holding labels, one holding \lstinline{typedef} names, variable, function, and enumerator identifiers, and one for each \lstinline{struct} and \lstinline{union} type holding the field names\cit{}.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
 This restriction makes finding the proper declaration to match to a variable expression or function application a simple matter of symbol-table lookup, which can be easily and efficiently implemented.
+This design choice implies that separate compilation must be maintained to allow headers to act as an encapsulation boundary, rather than the header-only libraries used by \CC{} templates.
+\section{Name Overloading} \label{overloading-sec}
+In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding \lstinline{struct}, \lstinline{union}, and \lstinline{enum} tags, one holding labels, one holding \lstinline{typedef} names, variable, function, and enumerator identifiers, and one for each \lstinline{struct} and \lstinline{union} type holding the field names \cite[\S{}6.2.3]{C11}.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
+This restriction makes finding the proper declaration to match to a variable expression or function application a simple matter of lexically-scoped name lookup, which can be easily and efficiently implemented.
 \CFA{}, on the other hand, allows overloading of variable and function names so long as the overloaded declarations do not have the same type, avoiding the multiplication of variable and function names for different types common in the C standard library, as in the following example:
 …
 \end{cfa}
+While the wisdom of giving both the maximum value of a type and the function to take the maximum of two values the same name is debatable, \eg{} some developers may prefer !MAX! for the former, the guiding philosophy of \CFA{} is ``describe, don't prescribe'' --- we prefer to trust programmers with powerful tools, and there is no technical reason to restrict overloading between variables and functions.
+However, the expressivity of \CFA{}'s name overloading has the consequence that simple table lookup is insufficient to match identifiers to declarations, and a type-matching algorithm must be part of expression resolution.
+\subsubsection{Operator Overloading}
+The final expression in the preceding example includes a feature of \CFA{} name overloading not shared by \CC{}, the ability to disambiguate expressions based on their return type. This provides greater flexibility and power than the parameter-based overload selection of \CC{}, though at the cost of greater complexity in the resolution algorithm.
+While the wisdom of giving both the maximum value of a type and the function to take the maximum of two values the same name in the example above is debatable, \eg{} some developers may prefer !MAX! for the former, the guiding philosophy of \CFA{} is ``describe, don't prescribe'' --- we prefer to trust programmers with powerful tools, and there is no technical reason to restrict overloading between variables and functions.
+However, the expressivity of \CFA{}'s name overloading does have the consequence that simple table lookup is insufficient to match identifiers to declarations, and a type-matching algorithm must be part of expression resolution.
+\subsection{Operator Overloading}
 C does allow name overloading in one context: operator overloading.
 From the perspective of the type system, there is nothing special about operators as opposed to other functions, nor is it desirable to restrict the clear and readable syntax of operators to only the built-in types.
+For these reasons, \CFA{} also allows overloading of operators by writing specially-named functions where !?! stands in for the operands\footnote{This example uses \CFA{}'s reference types, described in Section~\ref{type-features-sec}}:
+For these reasons, \CFA{}, like \CC{} and many other programming languages, allows overloading of operators by writing specially-named functions where !?! stands in for the operands.
+This syntax is more natural than the operator overloading syntax of \CC{}, which requires ``dummy'' parameters to disambiguate overloads of similarly-named pre- and postfix operators\footnote{This example uses \CFA{}'s reference types, described in Section~\ref{type-features-sec}}:
 \begin{cfa}
 …
 \end{cfa}
 Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions''\cit{}, so as to present user programmers with only a single set of overloading rules.
 \subsubsection{Special Literal Types}
+Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11}, so as to present user programmers with only a single set of overloading rules.
+\subsection{Special Literal Types}
 Literal !0! is also used polymorphically in C; it may be either integer zero or the null value of any pointer type.
+\CFA{} provides a special type for the !0! literal, !zero_t!, so that users can define a zero value for their own types without being forced to create a conversion from an integer or pointer type (though \CFA{} also includes implicit conversions from !zero_t! to the integer and pointer types for backward compatibility).
+According to the C standard\cit{}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
+By this rule, boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
+\CFACC{} applies this rewriting in all boolean contexts, so any type !T! can be made ``truthy'' (that is, given a boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}; unlike \CC{} prior to the addition of explicit casts in \CCeleven{}, this design does not add comparability or convertablity to arbitrary integer types.
+\CFA{} provides a special type for the !0! literal, !zero_t!, so that users can define a zero value for their own types without being forced to create a conversion from an integer or pointer type; \CFA{} also includes implicit conversions from !zero_t! to the !int! and pointer type constructors\footnote{See Section~\ref{type-features-sec}} from !zero_t! for backward compatibility.
+According to the C standard \cite[\S{}6.8.4.1]{C11}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
+By this rule, Boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
+\CFACC{} applies this rewriting in all Boolean contexts, so any type !T! can be made ``truthy'' (that is, given a Boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}.
+\CC{} takes a different approach to user-defined truthy types, allowing definition of an implicit conversion operator to !bool!; prior to the introduction of the !explicit! keyword for conversion operators in \CCeleven{} this approach also allowed undesired implicit conversions to all other arithmetic types, a shortcoming not shared by the \CFA{} design.
 \CFA{} also includes a special type for !1!, !one_t!; like !zero_t!, !one_t! has built-in implicit conversions to the various integral types so that !1! maintains its expected semantics in legacy code.
 …
 \CFA{} previously allowed !0! and !1! to be the names of polymorphic variables, with separate overloads for !int 0!, !int 1!, and !forall(dtype T) T* 0!.
 As revealed in my own work on generic types (Chapter~\ref{generic-chap}), the parameteric polymorphic zero variable was not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
+While designing \CFA{} generic types (see Chapter~\ref{generic-chap}), it was discovered that the parametric polymorphic zero variable is not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
 As such, variables that could represent !0! and !1! were phased out in favour of functions that could generate those values for a given type as appropriate.
 \subsection{Polymorphic Functions} \label{poly-func-sec}
 The most significant feature \CFA{} adds is parametric-polymorphic functions.
+\section{Polymorphic Functions} \label{poly-func-sec}
+The most significant type-system feature \CFA{} adds is parametric-polymorphic functions.
 Such functions are written using a !forall! clause (which gives the language its name):
 …
 The type variable !T! is transformed into a set of additional implicit parameters to !identity!, which encode sufficient information about !T! to create and return a variable of that type.
 \CFA{} passes the size and alignment of the type represented by an !otype! parameter, as well as a default constructor, copy constructor, assignment operator, and destructor.
+Types which do not have one or more of these operators visible cannot be bound to !otype! parameters.
+In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; experiments have shown this overhead to be similar to \CC{} virtual function calls. \TODO{rerun experiments, possibly look at vtable variant}
+Types which do not have one or more of these operators visible cannot be bound to !otype! parameters, but may be bound to un-constrained !dtype! (``data type'') type variables.
+In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; the experiments in Chapter~\ref{generic-chap} indicate that this overhead is comparable to that of \CC{} virtual function calls.
+% \TODO{rerun experiments, possibly look at vtable variant}
 One benefit of this design is that it allows polymorphic functions to be separately compiled.
 …
 The fact that there is only one implementation of each polymorphic function also reduces compile times relative to the template-expansion approach taken by \CC{}, as well as reducing binary sizes and runtime pressure on instruction cache by re-using a single version of each function.
 \subsubsection{Type Assertions}
 Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type:
+\subsection{Type Assertions}
+Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{Subscript not included in source code.\label{sub-foot}}:
 \begin{cfa}
 forall(otype T `| { T twice(T); }`)
 T four_times(T x) { return twice( twice(x) ); }
 double twice(double d) { return d * 2.0; }  $\C[2.75in]{// (1)}$
+double twice$\(_1\)$(double d) { return d * 2.0; }
 double ans = four_times( 10.5 );  $\C[2.75in]{// T bound to double, ans == 42.0}$
 …
 Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions.
 For instance, !twice! could have been defined like this:
+For instance, !twice! could have been defined like this\footref{sub-foot}:
 \begin{cfa}
 forall(otype S | { S ?+?(S, S); })
+S twice(S x) { return x + x; }  $\C[2.75in]{// (2)}
+\end{cfa}
+This version of !twice! works for any type !S! that has an addition operator defined for it, and it could be used to satisfy the type assertion on !four_times!.
+\CFACC{} accomplishes this by creating a wrapper function calling !twice//(2)! with !S! bound to !double!, then providing this wrapper function to !four_times!\footnote{\lstinline{twice // (2)} could also have had a type parameter named \lstinline{T}; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline{T} of \lstinline{four_times}}.
+S twice$\(_2\)$(S x) { return x + x; }
+\end{cfa}
+Specializing this polymorphic function with !S = double! produces a monomorphic function which can  be used to satisfy the type assertion on !four_times!.
+\CFACC{} accomplishes this by creating a wrapper function calling !twice!$_2$ with !S! bound to !double!, then providing this wrapper function to !four_times!\footnote{\lstinline{twice}$_2$ could also have had a type parameter named \lstinline{T}; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline{T} of \lstinline{four_times}}.
+However, !twice!$_2$ also works for any type !S! that has an addition operator defined for it.
 Finding appropriate functions to satisfy type assertions is essentially a recursive case of expression resolution, as it takes a name (that of the type assertion) and attempts to match it to a suitable declaration in the current scope.
 If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that that function is examined as a candidate for its own assertion unboundedly repeatedly.
+To avoid such infinite loops, \CFACC{} imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC{} compilers for template expansion; this restriction means that there are some semantically well-typed expressions that cannot be resolved by \CFACC{}.
+\TODO{Update this with final state} One contribution made in the course of this thesis was modifying \CFACC{} to use the more flexible expression resolution algorithm for assertion matching, rather than the simpler but limited previous approach of unification on the types of the functions.
+\subsubsection{Deleted Declarations}
+Particular type combinations can be exempted from matching a given polymorphic function through use of a \emph{deleted function declaration}:
+\begin{cfa}
+int somefn(char) = void;
+\end{cfa}
+This feature is based on a \CCeleven{} feature typically used to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC{}, a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC{}-specific rules about private members and implicitly-generated functions; the deleted function form is both clearer and less verbose.} or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC{} versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers. A similar effect can be produced on an ad-hoc basis at the appropriate call sites through use of casts to determine the function type. In both cases, the deleted-function form is clearer and more concise.}.
+Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
+If this deleted declaration is selected as the unique minimal-cost interpretation of an expression than an error is produced. \TODO{Check this is implemented at print.}
+\subsubsection{Traits}
+\CFA{} provides \emph{traits} as a means to name a group of type assertions, as in the example below\footnote{This example uses \CFA{}'s reference types, constructors, and zero type, described in Section~\ref{type-features-sec}.}:
+To avoid such infinite loops, \CFACC{} imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC{} compilers for template expansion; this restriction means that there are some otherwise semantically well-typed expressions that cannot be resolved by \CFACC{}.
+% \subsection{Deleted Declarations}
+% Particular type combinations can be exempted from matching a given polymorphic function through use of a \emph{deleted function declaration}:
+% \begin{cfa}
+% int somefn(char) = void;
+% \end{cfa}
+% This feature is based on a \CCeleven{} feature typically used to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC{}, a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC{}-specific rules about private members and implicitly-generated functions; the deleted function form is both clearer and less verbose.} or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC{} versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers. A similar effect can be produced on an ad-hoc basis at the appropriate call sites through use of casts to determine the function type. In both cases, the deleted-function form is clearer and more concise.}.
+% Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
+% If this deleted declaration is selected as the unique minimal-cost interpretation of an expression than an error is produced.
+\subsection{Traits}
+\CFA{} provides \emph{traits} as a means to name a group of type assertions, as in the example below\footnote{This example uses \CFA{}'s reference types and constructors, described in Section~\ref{type-features-sec}.}:
 \begin{cfa}
 …
 Semantically, traits are simply a named list of type assertions, but they may be used for many of the same purposes that interfaces in Java or abstract base classes in \CC{} are used for.
 Unlike Java interfaces or \CC{} base classes, \CFA{} types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form a structural inheritance, similar to interface implementation in Go, as opposed to the nominal inheritance model of Java and \CC{}.
+Unlike Java interfaces or \CC{} base classes, \CFA{} types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form of structural inheritance, similar to interface implementation in Go, as opposed to the nominal inheritance model of Java and \CC{}.
 Nominal inheritance can be simulated in \CFA{} using marker variables in traits:
 …
 Traits, however, are significantly more powerful than nominal-inheritance interfaces; firstly, due to the scoping rules of the declarations that satisfy a trait's type assertions, a type may not satisfy a trait everywhere that that type is declared, as with !char! and the !nominal! trait above.
 Secondly, because \CFA{} is not object-oriented and types do not have a closed set of methods, existing C library types can be extended to implement a trait simply by writing the requisite functions.
+Secondly, because \CFA{} is not object-oriented and types do not have a closed set of methods, existing C library types can be extended to implement a trait simply by writing the requisite functions\footnote{\CC{} only allows partial extension of C types, because constructors, destructors, conversions, and the assignment, indexing, and function-call operators may only be defined in a class; \CFA{} does not have any of these restrictions.}.
 Finally, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems\footnote{This example uses \CFA{}'s reference types, described in Section~\ref{type-features-sec}.}:
 …
 \end{cfa}
 In the example above, !(list_iterator, int)! satisfies !pointer_like! by the user-defined dereference function, and !(list_iterator, list)! also satisfies !pointer_like! by the built-in dereference operator for pointers.
+In this example above, !(list_iterator, int)! satisfies !pointer_like! by the user-defined dereference function, and !(list_iterator, list)! also satisfies !pointer_like! by the built-in dereference operator for pointers.
 Given a declaration !list_iterator it!, !*it! can be either an !int! or a !list!, with the meaning disambiguated by context (\eg{} !int x = *it;! interprets !*it! as !int!, while !(*it).value = 42;! interprets !*it! as !list!).
 While a nominal-inheritance system with associated types could model one of those two relationships by making !El! an associated type of !Ptr! in the !pointer_like! implementation, few such systems could model both relationships simultaneously.
 The flexibility of \CFA{}'s implicit trait-satisfaction mechanism provides programmers with a great deal of power, but also blocks some optimization approaches for expression resolution.
 The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgements difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
 \subsection{Implicit Conversions} \label{implicit-conv-sec}
 In addition to the multiple interpretations of an expression produced by name overloading and polymorphic functions, for backward compatibility \CFA{} must support all of the implicit conversions present in C, producing further candidate interpretations for expressions.
 As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'\cit{} define which of the built-in types are implicitly convertible to which other types, and the relative cost of any pair of such conversions from a single source type.
+The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgments difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
+\section{Implicit Conversions} \label{implicit-conv-sec}
+In addition to the multiple interpretations of an expression produced by name overloading and polymorphic functions, \CFA{} must support all of the implicit conversions present in C for backward compatibility, producing further candidate interpretations for expressions.
+As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} define which of the built-in types are implicitly convertible to which other types, as well as which implicit conversions to apply for mixed arguments to binary operators.
 \CFA{} adds rules to the usual arithmetic conversions defining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg{} !int! to !char!, but more expensive than any \emph{safe} (widening) conversion, \eg{} !int! to !double!.
 One contribution of this thesis, discussed in Section \TODO{add to resolution chapter}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
 The expression resolution problem which is the focus of Chapter~\ref{resolution-chap} is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
 While semantically valid \CFA{} code always has such a unique minimal-cost interpretation, \CFACC{} must also be able to detect and report as errors expressions which have either no interpretation or multiple ambiguous minimal-cost interpretations.
+One contribution of this thesis, discussed in Section~\ref{conv-cost-sec}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
+The expression resolution problem, which is the focus of Chapter~\ref{resolution-chap}, is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
+While semantically valid \CFA{} code always has such a unique minimal-cost interpretation, \CFACC{} must also be able to detect and report as errors expressions that have either no interpretation or multiple ambiguous minimal-cost interpretations.
 Note that which subexpression interpretation is minimal-cost may require contextual information to disambiguate.
 For instance, in the example in Section~\ref{overloading-sec}, !max(max, -max)! cannot be unambiguously resolved, but !int m = max(max, -max)! has a single minimal-cost resolution.
 While the interpretation !int m = (int)max((double)max, -(double)max)! is also a valid interpretation, it is not minimal-cost due to the unsafe cast from the !double! result of !max! to !int!\footnote{The two \lstinline{double} casts function as type ascriptions selecting \lstinline{double max} rather than casts from \lstinline{int max} to \lstinline{double}, and as such are zero-cost.}.
+While the interpretation !int m = (int)max((double)max, -(double)max)! is also a valid interpretation, it is not minimal-cost due to the unsafe cast from the !double! result of !max! to !int!\footnote{The two \lstinline{double} casts function as type ascriptions selecting \lstinline{double max} rather than casts from \lstinline{int max} to \lstinline{double}, and as such are zero-cost. The \lstinline{int} to \lstinline{double} conversion could be forced if desired with two casts: \lstinline{(double)(int)max}}.
 These contextual effects make the expression resolution problem for \CFA{} both theoretically and practically difficult, but the observation driving the work in Chapter~\ref{resolution-chap} is that of the many top-level expressions in a given program, most are straightforward and idiomatic so that programmers writing and maintaining the code can easily understand them; it follows that effective heuristics for common cases can bring down compiler runtime enough that a small proportion of harder-to-resolve expressions does not inordinately increase overall compiler runtime or memory usage.
 \subsection{Type Features} \label{type-features-sec}
 The name overloading and polymorphism features of \CFA{} have the greatest effect on language design and compiler runtime, but there are a number of other features in the type system which have a smaller effect but are useful for code examples.
+\section{Type Features} \label{type-features-sec}
+The name overloading and polymorphism features of \CFA{} have the greatest effect on language design and compiler runtime, but there are a number of other features in the type system that have a smaller effect but are useful for code examples.
 These features are described here.
 \subsubsection{Reference Types}
 One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz\cite{Schluntz17}.
+\subsection{Reference Types}
+One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz \cite{Schluntz17}.
 Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer.
 These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy easy aliasing of nested values with a similarly convenient syntax.
 A particular improvement is removing syntactic special cases for operators which take or return mutable values; for example, the use !a += b! of a compound assignment operator now matches its signature, !int& ?+=?(int&, int)!, as opposed to the previous syntactic special cases to automatically take the address of the first argument to !+=! and to mark its return value as mutable.
 The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named.
 In \CFA{}, the distinction between lvalue and rvalue can be reframed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
+These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy aliasing of nested values with a similarly convenient syntax.
+A particular improvement is removing syntactic special cases for operators that take or return mutable values; for example, the use !a += b! of a compound assignment operator now matches its signature, !int& ?+=?(int&, int)!, as opposed to the syntactic special cases in earlier versions of \CFA{} to automatically take the address of the first argument to !+=! and to mark its return value as mutable.
+The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named in the standard.
+In \CFA{}, the distinction between lvalue and rvalue can be re-framed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
 \CFA{} references preserve the existing qualifier-dropping implicit lvalue-to-rvalue conversion from C (\eg{} a !const volatile int&! can be implicitly copied to a bare !int!)
 To make reference types more easily usable in legacy pass-by-value code, \CFA{} also adds an implicit rvalue-to-lvalue conversion, implemented by storing the value in a fresh compiler-generated temporary variable and passing a reference to that temporary.
 To mitigate the ``!const! hell'' problem present in \CC{}, there is also a qualifier-dropping lvalue-to-lvalue conversion, also implemented by copying into a temporary:
+To make reference types more easily usable in legacy pass-by-value code, \CFA{} also adds an implicit rvalue-to-lvalue conversion, implemented by storing the value in a compiler-generated temporary variable and passing a reference to that temporary.
+To mitigate the ``!const! hell'' problem present in \CC{}, there is also a qualifier-dropping lvalue-to-lvalue conversion implemented by copying into a temporary:
 \begin{cfa}
 …
 The primary issue with \CC{} references is that it is impossible to extract the address of the reference variable rather than the address of the referred-to variable.
 This breaks a number of the usual compositional properties of the \CC{} type system, \eg{} a reference cannot be re-bound to another variable, nor is it possible to take a pointer to, array of, or reference to a reference.
 \CFA{} supports all of these use cases \TODO{test array} without further added syntax.
+\CFA{} supports all of these use cases without further added syntax.
 The key to this syntax-free feature support is an observation made by the author that the address of a reference is a lvalue.
 In C, the address-of operator !&x! can only be applied to lvalue expressions, and always produces an immutable rvalue; \CFA{} supports reference re-binding by assignment to the address of a reference, and pointers to references by repeating the address-of operator:
 …
 For better compatibility with C, the \CFA{} team has chosen not to differentiate function overloads based on top-level reference types, and as such their contribution to the difficulty of \CFA{} expression resolution is largely restricted to the implementation details of normalization conversions and adapters.
 \subsubsection{Resource Management}
 \CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz\cite{Schluntz17}.
+\subsection{Resource Management}
+\CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz \cite{Schluntz17}.
 This idiom allows a safer and more principled approach to resource management by tying acquisition of a resource to object initialization, with the corresponding resource release executed automatically at object finalization.
 A wide variety of conceptual resources may be conveniently managed by this scheme, including heap memory, file handles, and software locks.
 …
 \end{cfa}
 \subsubsection{Tuple Types}
+\subsection{Tuple Types}
 \CFA{} adds \emph{tuple types} to C, a syntactic facility for referring to lists of values anonymously or with a single identifier.
 An identifier may name a tuple, a function may return one, and a tuple may be implicitly \emph{destructured} into its component values.
 The implementation of tuples in \CFACC{}'s code generation is based on the generic types introduced in Chapter~\ref{generic-chap}, with one compiler-generated generic type for each tuple arity.
 This allows tuples to take advantage of the same runtime optimizations available to generic types, while reducing code bloat.
 An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example shows the basics:
 \begin{cfa}
 [char, char] x = ['!', '?']; $\C{// (1); tuple type and expression syntax}$
 int x = 2; $\C{// (2)}$
+This reuse allows tuples to take advantage of the same runtime optimizations available to generic types, while reducing code bloat.
+An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example demonstrates the basic features:
+\begin{cfa}
+[char, char] x$\(_1\)$ = ['!', '?']; $\C{// tuple type and expression syntax}$
+int x$\(_2\)$ = 2;
 forall(otype T)
 [T, T] swap( T a, T b ) { $\C{// (3)}$
+[T, T] swap$\(_1\)$( T a, T b ) {
         return [b, a]; $\C{// one-line swap syntax}$
+}
 x = swap( x ); $\C{// destructure [char, char] x into two elements}$
 $\C{// cannot use int x, not enough arguments}$
 void swap( int, char, char ); $\C{// (4)}$
 swap( x, x ); $\C{// (4) on (2), (1)}$
 $\C{// not (3) on (2), (2) due to polymorphism cost}$
+x = swap( x ); $\C{// destructure x\(_1\) into two elements}$
+$\C{// cannot use x\(_2\), not enough arguments}$
+void swap$\(_2\)$( int, char, char );
+swap( x, x ); $\C{// swap\(_2\)( x\(_2\), x\(_1\) )}$
+$\C{// not swap\(_1\)( x\(_2\), x\(_2\) ) due to polymorphism cost}$
 \end{cfa}
 Tuple destructuring breaks the one-to-one relationship between identifiers and values.
+This precludes some argument-parameter matching strategies for expression resolution, as well as cheap interpretation filters based on comparing number of parameters and arguments.
+As an example, in the call to !swap( x, x )! above, the second !x! can be resolved starting at the second or third parameter of !swap!, depending which interpretation of !x! was chosen for the first argument.
+Hence, some argument-parameter matching strategies for expression resolution are precluded, as well as cheap interpretation filters based on comparing number of parameters and arguments.
+As an example, in the call to !swap( x, x )! above, the second !x! can be resolved starting at the second or third parameter of !swap!$_2$, depending which interpretation of !x! is chosen for the first argument.
+\section{Conclusion}
+\CFA{} adds a significant number of features to standard C, increasing the expressivity and re-usability of \CFA{} code while maintaining backwards compatibility for both code and larger language paradigms.
+This flexibility does incur significant added compilation costs, however, the mitigation of which are the primary concern of this thesis.

doc/theses/aaron_moss_PhD/phd/conclusion.tex

-              r1c35c78
+              r8b590a4
 \chapter{Conclusion}
+Wrap it up --- Done, done done.
+Decades after its first standardization, the C language remains a widely-used tool and a vital part of the software development landscape.
+The \CFA{} language under development at the University of Waterloo represents an evolutionary modernization of C with expressive modern language features paired with strong C backwards-compatibility.
+This thesis has contributed to these project goals in a variety of ways, including the addition of a generic-types language feature (Chapter~\ref{generic-chap}) and refinement of the \CFA{} overload selection rules to produce a more expressive and intuitive model (Section~\ref{conv-cost-sec}).
+Based on the technical contribution of the resolver prototype system (Section~\ref{rp-features-sec}), I have also made significant improvements to \CFA{} compilation performance, including un-combined bottom-up expression traversal (Section~\ref{arg-parm-matching-sec}), deferred-cached assertion satisfaction (Section~\ref{assn-sat-sec}), and a novel persistent union-find type environment data structure (Section~\ref{env-persistent-union-find}).
+The combination of these practical improvements and added features significantly improve the viability of \CFA{} as a practical programming language.
+Further improvements to the \CFA{} type-system are still possible, however.
+One area suggested by this work is development of a scheme for user-defined conversions; to integrate properly with the \CFA{} conversion model, there would need to be a distinction between safe and unsafe conversions, and possibly a way to denote conversions as explicit-only or non-chainable.
+Another place for ongoing effort is improvement of compilation performance; I believe the most promising direction for that is rebuilding the \CFA{} compiler on a different framework than Bilson's \CFACC{}.
+The resolver prototype presented in this work has good performance and already has the basics of \CFA{} semantics implemented, as well as many of the necessary core data structures, and would be a viable candidate for a new compiler architecture.
+An alternate approach would be to fork an existing C compiler such as Clang~\cite{Clang}, which would need to be modified to use one of the resolution algorithms discussed here, as well as various other features introduced by Bilson~\cite{Bilson03}.

doc/theses/aaron_moss_PhD/phd/evaluation/generic-timing.dat

r1c35c78	r8b590a4
8	8	"clear\npair" 2840 773 748 3511
9	9	"pop\npair" 3025 5414 813 23867
10

doc/theses/aaron_moss_PhD/phd/frontpgs.tex

-              r1c35c78
+              r8b590a4
         \bigskip
+%       \noindent
+%   \begin{tabbing}
+%   Internal-External Member: \=  \kill % using longest text to define tab length
+%   External Examiner: \>  Bruce Bruce \\
+%   \> Professor, Dept. of Philosophy of Zoology, University of Wallamaloo \\
+%   \end{tabbing}
+%       \bigskip
+        \noindent
+  \begin{tabbing}
+  Internal-External Member: \=  \kill % using longest text to define tab length
+  External Examiner: \>  Doug Lea \\
+  \> Professor, Computer Science Department, \\
+  \> State Univesity of New York at Oswego \\
+  \end{tabbing}
+        \bigskip
         \noindent
 …
   Internal-External Member: \=  \kill % using longest text to define tab length
   Supervisor: \> Peter Buhr \\
+  \> Professor, School of Computer Science, University of Waterloo \\
+  \> Associate Professor, School of Computer Science, \\
+  \> University of Waterloo \\
   \end{tabbing}
         \bigskip
 …
   \begin{tabbing}
   Internal-External Member: \=  \kill % using longest text to define tab length
+  Internal Members: \> Gregor Richards \\
+  \> Professor, School of Computer Science, University of Waterloo \\
+  \> Ond\v{r}ej Lhot\a'ak \\
+  \> Professor, School of Computer Science, University of Waterloo \\
+  Internal Members: \> Ond\v{r}ej Lhot\a'ak \\
+  \> Associate Professor, School of Computer Science, \\
+  \>University of Waterloo \\
+  \\
+  \> Gregor Richards \\
+  \> Assistant Professor, School of Computer Science, \\
+  \> University of Waterloo \\
   \end{tabbing}
+%       \bigskip
+%       \noindent
+%   \begin{tabbing}
+%   Internal-External Member: \=  \kill % using longest text to define tab length
+%   Internal-External Member: \> Deepa Thotta \\
+%   \> Professor, Dept. of Philosophy, University of Waterloo \\
+%   \end{tabbing}
+%       \bigskip
+        \bigskip
+        \noindent
+  \begin{tabbing}
+  Internal-External Member: \=  \kill % using longest text to define tab length
+  Internal-External Member: \> Werner Dietl \\
+  \> Assistant Professor, Electrical and Computer Engineering, \\
+  \> University of Waterloo \\
+  \end{tabbing}
+%       \bigskip
 %       \noindent
 …
 \begin{center}\textbf{Abstract}\end{center}
+This is the abstract.
+        The C programming language has been an important software development tool for decades.
+        \CFA{} is a new programming language designed with strong backwards-compatibility to take advantage of widely distributed C programming expertise and the large deployed base of C code, paired with modern language features to improve developer productivity.
+        This thesis presents a number of improvements to \CFA{}.
+        The author has developed one major new language feature, generic types, in a way that integrates naturally with both the existing polymorphism features of \CFA{} and the translation-unit-based encapsulation model of C.
+        This thesis also presents a number of smaller refinements to the \CFA{} overload resolution rules, each of which improves the expressivity or intuitive nature of the language.
+        This thesis also includes a number of practical improvements to \CFA{} compilation performance, focused on the expression resolution pass, which is the main bottleneck.
+        These include better algorithms for argument-parameter matching and type assertion satisfaction, as well as a new type-environment data-structure based on a novel variant of union-find.
+        The compilation performance improvements have all been experimentally validated with a new prototype system that encapsulates the key aspects of the \CFA{} language; this prototype is a promising basis for future research and a technical contribution of this work.
 \cleardoublepage
 …
 % -------------------------------
+% \begin{center}\textbf{Acknowledgements}\end{center}
+% I would like to thank all the little people who made this thesis possible.
+% \cleardoublepage
+\begin{center}\textbf{Acknowledgements}\end{center}
+Though a doctoral thesis is an individual project, I could not have completed it without the help and support of many members of my community.
+This thesis would not exist in the form it does without the mentorship of my advisor, Peter Buhr, who has ably led the \CFA{} team while giving me both the advantage of his decades of experience and the freedom to follow my own interests.
+My work on \CFA{} does not exist in a vaccuum, and it has been a pleasure and a privilege to collaborate with the members of the \CFA{} team: Andrew Beach, Richard Bilson, Michael Brooks, Bryan Chan, Thierry Delisle, Glen Ditchfield, Brice Dobry, Rob Schluntz, and others.
+I gratefully acknowledge the financial support of the National Science and Engineering Council of Canada and Huawei Ltd.\ for this project.
+I would also like to thank of my thesis committee, Werner Dietl, Doug Lea, Ond\v{r}ej Lhot\a'ak, and Gregor Richards, for the time and effort they have invested in providing constructive feedback to refine this work.
+I am indebted to Peter van Beek and Ian Munro for their algorithmic expertise and willingness to share their time with me.
+I have far too many colleagues in the Programming Languages Group and School of Computer Science to name, but I deeply appreciate their camaradarie; specifically with regard to the production of this thesis, I would like to thank Nathan Fish for recommending my writing soundtrack, and Sharon Choy for her unfailing supply of encouraging rabbit animations.
+Finally, to all my friends and family who have supported me and made Kitchener-Waterloo home these past seven years, thank you, I could not have done it without you; most especially, Christina Moss, you are the best of wives and best of women, your support has kept me going through the ups and downs of research, and your partnership is key to all my successes.
+\cleardoublepage
 % D E D I C A T I O N
 …
 % \begin{center}\textbf{Dedication}\end{center}
+% This is dedicated to the one I love.
+% To Christina, who has spent too many hours politely listening to me work out the technical minutiae of this thesis, I love you, and I won't make you read it.
 % \cleardoublepage

doc/theses/aaron_moss_PhD/phd/generic-types.tex

-              r1c35c78
+              r8b590a4
 While this approach is flexible and supports integration with the C type checker and tooling, it is also tedious and error prone, especially for more complex data structures.
 A second approach is to use !void*!-based polymorphism, \eg{} the C standard library functions !bsearch! and !qsort!, which allow for the reuse of common functionality.
 However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise not needed.
+However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary.
 A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type checked, but errors in such code may be difficult to locate and debug.
 Furthermore, writing and using preprocessor macros is unnatural and inflexible.
 Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*!- and !#define!-based polymorphism, respectively.
+Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*! and !#define!-based polymorphism, respectively.
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.h> $\C{// for malloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 struct int_list { int value; struct int_list* next; };
 …
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.h> $\C{// for malloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 // single code implementation
 …
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.h> $\C{// for malloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 $\C[\textwidth]{// code is nested in macros}$
 …
 \CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation.
 A generic type can be declared in \CFA{} by placing a !forall! specifier on a !struct! or !union! declaration, and instantiated using a parenthesized list of types after the generic name.
 An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig} \TODO{test this code}.
+An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig}.
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.hfa> $\C{// for alloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 forall(otype T) struct list { T value; list(T)* next; };
 …
 \section{Design}
 Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism presented some unique design constraints for this project.
 The guiding principle was to maintain an unsurprising language model for C programmers without compromising runtime efficiency.
 A key insight for this design was that C already possesses a handful of built-in generic types (\emph{compound types} in the language of the standard\cit{}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
+Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism present some unique design constraints for \CFA{} generics.
+The guiding principle is to maintain an unsurprising language model for C programmers without compromising runtime efficiency.
+A key insight for this design is that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
 \subsection{Related Work}
 One approach to the design of generic types is that taken by \CC{} templates\cite{C++}.
+One approach to the design of generic types is that taken by \CC{} templates \cite{C++}.
 The template approach is closely related to the macro-expansion approach to C polymorphism demonstrated in Figure~\ref{macro-generic-fig}, but where the macro-expansion syntax has been given first-class language support.
 Template expansion has the benefit of generating code with near-optimal runtime efficiency, as distinct optimizations can be applied for each instantiation of the template.
 On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case\cit{}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.
+On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case \cite{Haberman16}, and the costs of increased compilation time and instruction cache pressure cannot be ignored.
 The most significant restriction of the \CC{} template model is that it breaks separate compilation and C's translation-unit-based encapsulation mechanisms.
 Because a \CC{} template is not actually code, but rather a sort of ``recipe'' to generate code, template code must be visible at its call site to be used.
 Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts\cite{C++Concepts} are standardized in \CCtwenty{}.
 C code, by contrast, only needs a !struct! or function declaration to call that function or use (by-pointer) values of that type, a desirable property to maintain for \CFA{}.
 Java\cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
+Because a \CC{} template is not actually code, but rather a ``recipe'' to generate code, template code must be visible at its call site to be used.
+Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts \cite{C++Concepts} are standardized in \CCtwenty{}.
+C code, by contrast, only needs a function declaration to call that function or a !struct! declaration to use (by-pointer) values of that type, desirable properties to maintain in \CFA{}.
+Java \cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
 The Java approach has much more in common with the !void*!-polymorphism shown in Figure~\ref{void-generic-fig}; since in Java nearly all data is stored by reference, the Java approach to polymorphic data is to store pointers to arbitrary data and insert type-checked implicit casts at compile-time.
 This process of \emph{type erasure} has the benefit of allowing a single instantiation of polymorphic code, but relies heavily on Java's object model and garbage collector.
 To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward.
 Cyclone\cite{Grossman06} is another language extending C, and also provides capabilities for polymorphic functions and existential types, similar to \CFA{}'s !forall! functions and generic types.
+Cyclone \cite{Grossman06} extends C and also provides capabilities for polymorphic functions and existential types which are similar to \CFA{}'s !forall! functions and generic types.
 Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is tedious and error-prone compared to \CFA{}'s implicit assertion satisfaction.
 Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as !void*!, \ie{} only pointer types and !int!.
 …
 Many other languages include some form of generic types.
 As a brief survey, ML\cite{ML} was the first language to support parameteric polymorphism, but unlike \CFA{} does not support the use of assertions and traits to constrain type arguments.
 Haskell\cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}.
 Objective-C\cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently\cite{xcode7}, and it's garbage-collected, message-passing object-oriented model is a radical departure from C.
 Go\cite{Go}, and Rust\cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits, \emph{interfaces} in Go and \emph{traits} in Rust.
 Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall paramters.
+As a brief survey, ML \cite{ML} was the first language to support parametric polymorphism, but unlike \CFA{} does not support the use of assertions and traits to constrain type arguments.
+Haskell \cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}.
+Objective-C \cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently \cite{xcode7}, and its garbage-collected, message-passing object-oriented model is a radical departure from C.
+Go \cite{Go}, and Rust \cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits: \emph{interfaces} in Go and \emph{traits} in Rust.
+Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall parameters.
 Go does not, however, allow user code to define generic types, restricting Go programmers to the small set of generic types defined by the compiler.
 Rust has powerful abstractions for generic programming, including explicit implemenation of traits and options for both separately-compiled virtual dispatch and template-instantiated static dispatch in functions.
+Rust has powerful abstractions for generic programming, including explicit implementation of traits and options for both separately-compiled virtual dispatch and template-instantiated static dispatch in functions.
 On the other hand, the safety guarantees of Rust's \emph{lifetime} abstraction and borrow checker impose a distinctly idiosyncratic programming style and steep learning curve; \CFA{}, with its more modest safety features, allows direct ports of C code while maintaining the idiomatic style of the original source.
 \subsection{\CFA{} Generics}
 The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing the better aspects of each.
 Like \CC{} template types, generic !struct!s and !union!s in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.
+The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing useful aspects of each.
+Like \CC{} template types, generic !struct! and !union! types in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.
 The fact that the storage layout of any instantiation of a \CFA{} generic type is identical to that of the monomorphic type produced by simple macro replacement of the generic type parameters is important to provide consistent and predictable runtime performance, and to not impose any undue abstraction penalty on generic code.
+As an example, consider the following generic type and function \TODO{test this}:
+As an example, consider the following generic type and function:
+% TODO whatever the bug is with initializer-expressions not working, it affects this
 \begin{cfa}
 forall( otype R, otype S ) struct pair { R first; S second; };
 pair(const char*, int) with_len( const char* s ) {
         return (pair(const char*), int){ s, strlen(s) };
+        return (pair(const char*, int)){ s, strlen(s) };
+}
 \end{cfa}
 In this example, !with_len! is defined at the same scope as !pair!, but it could be called from any context that can see the definition of !pair! and a declaration of !with_len!.
 If its return type was !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types.
 !with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate redeclaration and demangling flags.
 However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation\cite{Moss18}) and can be re-used for a wide variety of !pair! instantiations.
 Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can eliminate any runtime overhead of this polymorphic call.
+If its return type were !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types.
+!with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given appropriate re-declarations and demangling flags.
+However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation \cite{Schluntz17}) and can be re-used for a wide variety of !pair! instantiations.
+Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can in principle eliminate any runtime overhead of this polymorphic call.
 \CFA{} deliberately does not support \CC{}-style partial specializations of generic types.
 A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bitstring rather than the array representation of the other !vector! instantiations.
+A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bit-string rather than the array representation of the other !vector! instantiations.
 Complications from this inconsistency (chiefly the fact that a single bit is not addressable, unlike an array element) make the \CC{} !vector! unpleasant to use in generic contexts due to the break in its public interface.
 Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more principled position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code.
 Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows one to be defined, and objects of types implementing that !trait! to be operated on in the same polymorphic functions.
+Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more consistent position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code.
+Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows such an interface to be defined, and objects of types implementing that !trait! to be operated on using the same polymorphic functions.
 Since \CFA{} polymorphic functions can operate over polymorphic generic types, functions over such types can be partially or completely specialized using the usual overload selection rules.
 As an example, the !with_len! function above could be an optimization of the following more general function:
+As an example, the following generalization of !with_len! is a semantically-equivalent function which works for any type that has a !len! function declared, making use of both the ad-hoc (overloading) and parametric (!forall!) polymorphism features of \CFA{}:
 \begin{cfa}
 …
 \end{cfa}
 \CFA{} generic types also support the type constraints from !forall! functions.
+\CFA{} generic types also support type constraints, as in !forall! functions.
 For example, the following declaration of a sorted set type ensures that the set key implements equality and relational comparison:
 …
 \end{cfa}
 These constraints are implemented by applying equivalent constraints to the compiler-generated constructors for this type.
+These constraints are enforced by applying equivalent constraints to the compiler-generated constructors for this type.
 \section{Implementation} \label{generic-impl-sec}
+The ability to use generic types in polymorphic contexts means that the \CFA{} implementation in \CFACC{} must support a mechanism for accessing fields of generic types dynamically at runtime.
+While \CFACC{} could in principle use this same mechanism for accessing fields of all generic types, such an approach would throw away compiler knowledge of static types and impose an unnecessary runtime cost, limiting the utility of the generic type design.
+Instead, my design for generic type support in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters.
+The ability to use generic types in polymorphic contexts means that the \CFA{} implementation must support a mechanism for accessing fields of generic types dynamically.
+While \CFACC{} could in principle use this same mechanism for accessing fields of generic types in monomorphic contexts as well, such an approach would throw away compiler knowledge of static types and impose an unnecessary runtime cost.
+Instead, my design for generic types in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters.
 A \emph{dtype-static} type has polymorphic parameters but is still concrete.
 Polymorphic pointers are an example of dtype-static types; given some type variable !T!, T is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
+Polymorphic pointers are an example of dtype-static types; given some type variable !T!, !T! is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
 In particular, generic types where all parameters are un-!sized! (\ie{} they do not conform to the built-in !sized! trait because the compiler does not know their size and alignment) are always concrete, as there is no possibility for their layout to vary based on type parameters of unknown size and alignment.
 More precisely, a type is concrete if and only if all of its !sized! type parameters are concrete, and a concrete type is dtype-static if any of its type parameters are (possibly recursively) polymorphic.
+To illustrate, the following code using the !pair! type from above \TODO{test this} has each use of !pair! commented with its class:
+To illustrate, the following code using the !pair! type from above has each use of !pair! commented with its class:
+% TODO constructor bugs here too
 \begin{cfa}
 //dynamic, layout varies based on T
 forall(otype T) T value( pair(const char*, T) p ) { return p.second; }
+forall(otype T) T value$\(_1\)$( pair(const char*, T) p ) { return p.second; }
 // dtype-static, F* and T* are concrete but recursively polymorphic
 forall(dtype F, otype T) T value( pair(F*, T*) ) { return *p.second; }
+forall(dtype F, otype T) T value$\(_2\)$( pair(F*, T*) ) { return *p.second; }
 pair(const char*, int) p = {"magic", 42}; $\C[2.5in]{// concrete}$
 int i = value(p);
 pair(void*, int*) q = {0, &p.second}; $\C[2.5in]{// concrete}$
+pair(void*, int*) q = {0, &i}; $\C[2.5in]{// concrete}$
 i = value(q);
 double d = 1.0;
 …
 \subsection{Concrete Generic Types}
 The \CFACC{} translator template expands concrete generic types into new structure types, affording maximal inlining.
+The \CFACC{} translator template-expands concrete generic types into new structure types, affording maximal inlining.
 To enable interoperation among equivalent instantiations of a generic type, \CFACC{} saves the set of instantiations currently in scope and reuses the generated structure declarations where appropriate.
 In particular, tuple types are implemented as a single compiler-generated generic type definition per tuple arity, and can be instantiated and reused according to the usual rules for generic types.
 A function declaration that accepts or returns a concrete generic type produces a declaration for the instantiated structure in the same scope, which all callers may reuse.
 As an example, the concrete instantiation for !pair(const char*, int)! is\footnote{This omits the field name mangling performed by \CFACC{} for overloading purposes.\label{mangle-foot}}
+As an example, the concrete instantiation for !pair(const char*, int)! is\footnote{Field name mangling for overloading purposes is omitted.\label{mangle-foot}}:
 \begin{cfa}
 …
 A concrete generic type with dtype-static parameters is also expanded to a structure type, but this type is used for all matching instantiations.
 In the example above, the !pair(F*, T*)! parameter to !value! is such a type; its expansion is below\footref{mangle-foot}, and it is used as the type of the variables !q! and !r! as well, with casts for member access where appropriate.
+In the example above, the !pair(F*, T*)! parameter to !value! is such a type; its expansion is below\footref{mangle-foot}, and it is used as the type of the variables !q! and !r! as well, with casts for member access where appropriate:
 \begin{cfa}
 …
 The design for generic types presented here adds an \emph{offset array} containing structure-member offsets for dynamic generic !struct! types.
 A dynamic generic !union! needs no such offset array, as all members are at offset 0, but size and alignment are still necessary.
 Access to members of a dynamic structure is provided at runtime via base displacement addressing the structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime.
 the offset arrays are statically generated where possible.
+Access to members of a dynamic structure is provided at runtime via base-displacement addressing of the structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime.
+The offset arrays are statically generated where possible.
 If a dynamic generic type is passed or returned by value from a polymorphic function, \CFACC{} can safely assume that the generic type is complete (\ie{} has a known layout) at any call site, and the offset array is passed from the caller; if the generic type is concrete at the call site, the elements of this offset array can even be statically generated using the C !offsetof! macro.
 As an example, the body of the second !value! function above is implemented as
+As an example, the body of !value!$_2$ above is implemented as:
 \begin{cfa}
 …
 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
 !_offsetof_pair! is the offset array passed into !value!; this array is generated at the call site as
+!_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as:
 \begin{cfa}
 …
 For instance, modularity is generally provided in C by including an opaque forward declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately-compiled \texttt{.c} file.
 \CFA{} supports this pattern for generic types, implying that the caller of a polymorphic function may not know the actual layout or size of a dynamic generic type and only holds it by pointer.
 \CFACC{} automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that functions's caller.
+\CFACC{} automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
 These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all !sized! parameters to the generic structure.
 Un!sized! parameters not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types.
 Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller will not know how large to allocate the array of member offsets.
 The C standard does not specify a memory layout for structs, but the POSIX ABI for x86\cit{} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
+Un-!sized! parameters are not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types.
+Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller does not know how large to allocate the array of member offsets.
+The C standard does not specify a memory layout for structs, but the POSIX ABI for x86 \cite{POSIX08} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
 This algorithm, sketched below in pseudo-\CFA{}, is a straightforward mapping of consecutive fields into the first properly-aligned offset in the !struct! layout; layout functions for !union! types omit the offset array and simply calculate the maximum size and alignment over all union variants.
 Since \CFACC{} generates a distinct layout function for each type, constant-folding and loop unrolling are applied.
 …
 forall(dtype T1, dtype T2, ... | sized(T1) | sized(T2) | ...)
 void layout(size_t* size, size_t* align, size_t* offsets) {
-        // initialize values
         *size = 0; *align = 1;
         // set up members
 …
 \end{cfa}
 Results of layout function calls are cached so that they are only computed once per type per function.
 Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implemenation-hiding constraint of the design.
+Results of layout-function calls are cached so that they are only computed once per type per function.
+Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implementation-hiding constraint of the design.
 For instance, a function that strips duplicate values from an unsorted !list(T)! likely has a reference to the list as its only explicit parameter, but uses some sort of !set(T)! internally to test for duplicate values.
 This function could acquire the layout for !set(T)! by calling its layout function, providing as an argument the layout of !T! implicitly passed into that function.
 Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type paramters.
 This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of $Box(T)$ can be separately compiled, and callers from other translation units know the proper calling conventions to use.
 In an alternate design where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout.
 However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and we judged preserving separate compilation (and the associated C compatibility) in the implemented design to be an acceptable trade-off.
+Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type parameters.
+This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of !Box(T)! can be separately compiled, and callers from other translation units know the proper calling conventions.
+In an alternate design, where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout.
+However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and preserving separate compilation (and the associated C compatibility) is a more important design metric.
 \subsection{Applications of Dtype-static Types} \label{dtype-static-sec}
 …
 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \emph{tag structures}.
 Sometimes, information is only used for type checking and can be omitted at runtime.
 In the example below, !scalar! is a dtype-static type; hence, all uses have a single structure definition containing !unsigned long! and can share the same implementations of common functions like !?+?!.
+In the example below, !scalar! is a dtype-static type; hence, all uses have a single structure definition containing !unsigned long! and can share the same implementations of common functions, like !?+?!.
 These implementations may even be separately compiled, unlike \CC{} template functions.
 However, the \CFA{} type checker ensures matching types are used by all calls to !?+?!, preventing nonsensical computations like adding a length to a volume.
 …
 \section{Performance Experiments} \label{generic-performance-sec}
 To validate the practicality of this generic type design I have conducted microbenchmark-based tests against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}.
 Since all these languages are compiled with the same compiler backend and share a subset essentially comprising standard C, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
+To validate the practicality of this generic type design, microbenchmark-based tests were conducted against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}.
+Since all these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
 A more illustrative comparison measures the costs of idiomatic usage of each language's features.
 The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other other languages.
+The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other other languages, code for which is included in Appendix~\ref{generic-bench-app}.
 The experiment uses element types !int! and !pair(short, char)! and pushes $N = 40M$ elements on a generic stack, copies the stack, clears one of the stacks, and finds the maximum value in the other stack.
 \begin{cfa}
+#define N 4000000
 int main() {
         int max = 0, val = 42;
 …
 \end{cfa}
 The four versions of the benchmark implemented are C with !void*!-based polymorphism, \CFA{} with parameteric polymorphism, \CC{} with templates, and \CC{} using only class inheritance for polymorphism, denoted \CCV{}.
+The four versions of the benchmark implemented are C with !void*!-based polymorphism, \CFA{} with parametric polymorphism, \CC{} with templates, and \CC{} using only class inheritance for polymorphism, denoted \CCV{}.
 The \CCV{} variant illustrates an alternative object-oriented idiom where all objects inherit from a base !object! class, mimicking a Java-like interface; in particular, runtime checks are necessary to safely downcast objects.
 The most notable difference among the implementations is the memory layout of generic types: \CFA{} and \CC{} inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV{} lack such capability and, instead, must store generic objects via pointers to separately allocated objects.
 …
 \centering
 \input{generic-timing}
 \caption{Benchmark timing results (smaller is better)} \label{generic-eval-fig}
+\caption[Benchmark timing results]{Benchmark timing results (smaller is better)} \label{generic-eval-fig}
 \end{figure}
 …
 Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA{} and \CC{} code to 39 and 42 lines, respectively.
 The difference between the \CFA{} and \CC{} line counts is primarily declaration duplication to implement separate compilation; a header-only \CFA{} library is similar in length to the \CC{} version.
 On the other hand, due to the language shortcomings mentioned at the beginning of the chapter, C does not have a generic collections library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers.
+On the other hand, due to the language shortcomings mentioned at the beginning of the chapter, C does not have a generic collections library in its standard distribution, resulting in frequent re-implementation of such collection types by C programmers.
 \CCV{} does not use the \CC{} standard template library by construction, and, in fact, includes the definition of !object! and wrapper classes for !char!, !short!, and !int! in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library.
 I justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose.
 …
 Line count is a fairly rough measure of code complexity; another important factor is how much type information the programmer must specify manually, especially where that information is not type-checked.
 Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs and is much less common in \CFA{} than C, with its manually specified function pointer arguments and format codes, or \CCV{}, with its extensive use of un-type-checked downcasts, \eg{} !object! to !integer! when popping a stack.
 To quantify this manual typing, the ``redundant type annotations'' line in Table~\ref{generic-eval-table} counts the number of lines on which the known type of a variable is respecified, either as a format specifier, explicit downcast, type-specific function, or by name in a !sizeof!, !struct! literal, or !new! expression.
+To quantify this manual typing, the ``redundant type annotations'' line in Table~\ref{generic-eval-table} counts the number of lines on which the known type of a variable is re-specified, either as a format specifier, explicit downcast, type-specific function, or by name in a !sizeof!, !struct! literal, or !new! expression.
 The \CC{} benchmark uses two redundant type annotations to create new stack nodes, whereas the C and \CCV{} benchmarks have several such annotations spread throughout their code.
 The \CFA{} benchmark is able to eliminate \emph{all} redundant type annotations through use of the return-type polymorphic !alloc! function in the \CFA{} standard library.
 …
 \section{Future Work}
 The generic types design presented here is already sufficiently expressive to implement a variety of useful library types.
+The generic types presented here are already sufficiently expressive to implement a variety of useful library types.
 However, some other features based on this design could further improve \CFA{}.
 The most pressing addition is the ability to have non-type generic parameters.
+C already supports fixed-length array types, \eg{} !int[10]!; these types are essentially generic types with unsigned integer parameters, and allowing \CFA{} users the capability to build similar types is a requested feature.
+More exotically, the ability to have these non-type parameters depend on dynamic runtime values rather than static compile-time constants opens up interesting opportunities for type-checking problematic code patterns.
+For example, if a collection iterator was parameterized over the pointer to the collection it was drawn from, then a sufficiently powerful static analysis pass could ensure that that iterator was only used for that collection, eliminating one source of hard-to-find bugs.
+The implementation mechanisms behind this generic types design can also be used to add new features to \CFA{}.
+One such potential feature would be to add \emph{field assertions} to the existing function and variable assertions on polymorphic type variables.
+C already supports fixed-length array types, \eg{} !int[10]!; these types are essentially generic types with unsigned integer parameters (\ie{} array dimension), and allowing \CFA{} users the capability to build similar types is a requested feature.
+% More exotically, the ability to have these non-type parameters depend on dynamic runtime values rather than static compile-time constants opens up interesting opportunities for type-checking problematic code patterns.
+% For example, if a collection iterator was parameterized over the pointer to the collection it was drawn from, then a sufficiently powerful static analysis pass could ensure that that iterator was only used for that collection, eliminating one source of hard-to-find bugs.
+The implementation mechanisms behind generic types can also be used to add new features to \CFA{}.
+One such potential feature is \emph{field assertions}, an addition to the existing function and variable assertions on polymorphic type variables.
+These assertions could be specified using this proposed syntax:
+\begin{cfa}
+trait hasXY(dtype T) {
+        int T.x;  $\C{// T has a field x of type int}$
+        int T.y;  $\C{// T has a field y of type int}$
+};
+\end{cfa}
 Implementation of these field assertions would be based on the same code that supports member access by dynamic offset calculation for dynamic generic types.
 Simulating field access can already be done more flexibly in \CFA{} by declaring a trait containing an accessor function to be called from polymorphic code, but these accessor functions impose some overhead both to write and call, and directly providing field access via an implicit offset parameter would be both more concise and more efficient.
+Of course, there are language design trade-offs to such an approach, notably that providing the two similar features of field and function assertions would impose a burden of choice on programmers writing traits, with field assertions more efficient, but function assertions more general; given this open design question we have deferred a decision on field assertions until we have more experience using \CFA{}.
+If field assertions are included in the language, a natural extension would be to provide a structural inheritance mechanism for every !struct! type that simply turns the list of !struct! fields into a list of field assertions, allowing monomorphic functions over that type to be generalized to polymorphic functions over other similar types with added or reordered fields.
+\CFA{} could also support a packed or otherwise size-optimized representation for generic types based on a similar mechanism --- the layout function would need to be re-written, but nothing in the use of the offset arrays implies that the field offsets need be monotonically increasing.
+Of course, there are language design trade-offs to such an approach, notably that providing the two similar features of field and function assertions would impose a burden of choice on programmers writing traits, with field assertions more efficient, but function assertions more general; given this open design question a decision on field assertions is deferred until \CFA{} is more mature.
+If field assertions are included in the language, a natural extension would be to provide a structural inheritance mechanism for every !struct! type that simply turns the list of !struct! fields into a list of field assertions, allowing monomorphic functions over that type to be generalized to polymorphic functions over other similar types with added or reordered fields, for example:
+\begin{cfa}
+struct point { int x, y; };  $\C{// traitof(point) is equivalent to hasXY above}$
+struct coloured_point { int x, y; enum { RED, BLACK } colour };
+// works for both point and coloured_point
+forall(dtype T | traitof(point)(T) )
+double hypot( T& p ) { return sqrt( p.x*p.x + p.y*p.y ); }
+\end{cfa}
+\CFA{} could also support a packed or otherwise size-optimized representation for generic types based on a similar mechanism --- nothing in the use of the offset arrays implies that the field offsets need to be monotonically increasing.
 With respect to the broader \CFA{} polymorphism design, the experimental results in Section~\ref{generic-performance-sec} demonstrate that though the runtime impact of \CFA{}'s dynamic virtual dispatch is low, it is not as low as the static dispatch of \CC{} template inlining.
+However, rather than subject all \CFA{} users to the compile-time costs of ubiquitous template expansion, we are considering more targeted mechanisms for performance-sensitive code.
+Two promising approaches are are an !inline! annotation at polymorphic function call sites to create a template specialization of the function (provided the code is visible) or placing a different !inline! annotation on polymorphic function definitions to instantiate a specialized version of the function for some set of types.
+These approaches are not mutually exclusive and allow performance optimizations to be applied only when necessary, without suffering global code bloat.
+In general, the \CFA{} team believes that separate compilation works well with loaded hardware caches by producing smaller code, which may offset the benefit of larger inlined code.
+However, rather than subject all \CFA{} users to the compile-time costs of ubiquitous template expansion, it is better to target performance-sensitive code more precisely.
+Two promising approaches are an !inline! annotation at polymorphic function call sites to create a template specialization of the function (provided the code is visible) or placing a different !inline! annotation on polymorphic function definitions to instantiate a specialized version of the function for some set of types.
+These approaches are complementary and allow performance optimizations to be applied only when necessary, without suffering global code bloat.

doc/theses/aaron_moss_PhD/phd/introduction.tex

-              r1c35c78
+              r8b590a4
 The C programming language has had a wide-ranging impact on the design of software and programming languages.
 In the 30 years since its first standardization, it has consistently been one of the most popular programming languages, with millions of lines of C code still in active use, and tens of thousands of trained programmers producing it. The TIOBE index\cite{TIOBE} tracks popularity of programming languages over time, and C has never dropped below second place:
+In the 30 years since its first standardization, it has consistently been one of the most popular programming languages, with billions of lines of C code still in active use, and tens of thousands of trained programmers producing it. The TIOBE index\cite{TIOBE} tracks popularity of programming languages over time, and C has never dropped below second place:
 \begin{table}[h]
 …
 \end{table}
 The impact of C on programming language design is also obvious from Table~\ref{tiobe-table}; with the exception of Python, all of the top five languages use C-like syntax and procedural control structures.
 \CC{} is even a largely backwards-compatible extension of C, with development dating back nearly as far as C itself.
 Though its lasting popularity and wide impact on programming language design point to the continued relevance of C, they also highlight the widespread desire of programmers for languages with more expressive power and programmer-friendly features; accommodating both low-impact maintenance of legacy C code and low-effort development of the software of the future is a difficult task for a single programming language.
+The impact of C on programming language design is also obvious from Table~\ref{tiobe-table}; with the exception of Python, all of the top five languages use C-like syntax and control structures.
+\CC{} is even a largely backwards-compatible extension of C.
+Though its lasting popularity and wide impact on programming language design point to the continued relevance of C, there is also widespread desire of programmers for languages with more expressive power and programmer-friendly features; accommodating both maintenance of legacy C code and development of the software of the future is a difficult task for a single programming language.
 \CFA{}\footnote{Pronounced ``C-for-all'', and written \CFA{} or \CFL{}.} is an evolutionary modernization of the C programming language which aims to fulfill both these ends well.
+\CFA{}\footnote{Pronounced ``C-for-all'', and written \CFA{} or \CFL{}.} is an evolutionary modernization of the C programming language that aims to fulfill both these ends well.
 \CFA{} both fixes existing design problems and adds multiple new features to C, including name overloading, user-defined operators, parametric-polymorphic routines, and type constructors and destructors, among others.
 The new features make \CFA{} more powerful and expressive than C, while maintaining strong backward-compatibility with both C code and the procedural paradigm expected by C programmers.
+Unlike other popular C extensions like \CC{} and Objective-C, \CFA{} adds modern features to C without imposing an object-oriented paradigm to use them.
 However, these new features do impose a compile-time cost, particularly in the expression resolver, which must evaluate the typing rules of a significantly more complex type-system.
 This thesis is focused on making \CFA{} a more powerful and expressive language, both by adding new features to the \CFA{} type system and ensuring that both added and existing features can be efficiently implemented in \CFACC{}, the \CFA{} reference compiler.
+Particular contributions of this work include design and implementation of
+parametric-polymorphic (``generic'') types in a manner compatible with the existing polymorphism design of \CFA{} (Chapter~\ref{generic-chap}), a type environment data structure based on a novel variant of the union-find algorithm (Chapter~\ref{env-chap}), and a new expression resolution algorithm designed to quickly locate the optimal declarations for a \CFA{} expression (Chapter~\ref{resolution-chap}).
+This expression resolution algorithm was designed with the aid of a purpose-built prototype system which encapsulates the essential aspects of the \CFA{} type system without incurring the technical debt of the existing system or the friction-inducing necessity of maintaining a working compiler; the goal of this prototype system was to discover effective heuristics to avoid performing unnecessary work in the process of locating the optimal \CFA{} expression resolution.
+Particular contributions of this work include:
+\begin{itemize}
+\item design and implementation of parametric-polymorphic (``generic'') types in a manner compatible with the existing polymorphism design of \CFA{} (Chapter~\ref{generic-chap}),
+\item a new expression resolution algorithm designed to quickly locate the optimal declarations for a \CFA{} expression (Chapter~\ref{resolution-chap}),
+\item a type environment data structure based on a novel variant of the union-find algorithm (Chapter~\ref{env-chap}),
+\item and as a technical contribution, a prototype system for compiler algorithm development which encapsulates the essential aspects of the \CFA{} type system without incurring the technical debt of the existing system or the friction-inducing necessity of maintaining a working compiler (Chapter~\ref{expr-chap}).
+\end{itemize}
 Though the direction and validation of this work was fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
 In particular, with the addition of \emph{concepts} in \CCtwenty{}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used in the expression resolution algorithm presented here may prove useful.
 Type environments are also widely modelled in compiler implementations, particularly of functional languages, though also increasingly commonly in other languages (such as Rust) which perform type inference; the type environment presented here may be useful to those language implementers.
+Though the direction and validation of this work is fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
+In particular, with the addition of \emph{concepts} in \CCtwenty{} \cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
+Type environments are also widely modelled in compiler implementations, particularly for functional languages, though also increasingly commonly for other languages (such as Rust) which perform type inference; the type environment presented here may be useful to those language implementers.

doc/theses/aaron_moss_PhD/phd/macros.tex

-              r1c35c78
+              r8b590a4
 \newcommand{\Csharp}{C\raisebox{-0.7ex}{\Large$^\sharp$}} % C# symbolic name
 \newcommand{\ie}{\textit{i.e.}}
 \newcommand{\eg}{\textit{e.g.}}
 \newcommand{\etc}{\textit{etc.}}
 \newcommand{\etal}{\textit{et~al.}}
+\newcommand{\ie}{\textit{i.e.}\@}
+\newcommand{\eg}{\textit{e.g.}\@}
+\newcommand{\etc}{\textit{etc.}\@}
+\newcommand{\etal}{\textit{et~al.}\@}
 \newcommand{\myset}[1]{\left\{#1\right\}}

doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

-              r1c35c78
+              r8b590a4
 \chapter{Resolution Heuristics}
+\chapter{Resolution Algorithms}
 \label{resolution-chap}
+% consider using "satisfaction" throughout when talking about assertions
+% "valid" instead of "feasible" interpretations
 The main task of the \CFACC{} type-checker is \emph{expression resolution}, determining which declarations the identifiers in each expression correspond to.
+Resolution is a straightforward task in C, as each declaration has a unique identifier, but in \CFA{} the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
+I refer to a given matching between identifiers and declarations in an expression as an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the proportion of feasible candidate interpretations.
+To choose between feasible interpretations, \CFA{} defines a \emph{conversion cost} to rank interpretations; the expression resolution problem is thus to find the unique minimal-cost interpretation for an expression, reporting an error if no such interpretation exists.
+\section{Conversion Cost}
+% Discuss changes to cost model, as promised in Ch. 2
+% Mention relevance of work to C++20 concepts
+Resolution is a straightforward task in C, as no declarations share identifiers, but in \CFA{} the name overloading features discussed in Section~\ref{overloading-sec} generate multiple candidate declarations for each identifier.
+A given matching between identifiers and declarations in an expression is an \emph{interpretation}; an interpretation also includes information about polymorphic type bindings and implicit casts to support the \CFA{} features discussed in Sections~\ref{poly-func-sec} and~\ref{implicit-conv-sec}, each of which increase the number of valid candidate interpretations.
+To choose among valid interpretations, a \emph{conversion cost} is used to rank interpretations.
+Hence, the expression resolution problem is to find the unique minimal-cost interpretation for an expression, reporting an error if no such interpretation exists.
+\section{Expression Resolution}
+\subsection{Type Unification}
+The polymorphism features of \CFA{} require binding of concrete types to polymorphic type variables.
+Briefly, \CFACC{} keeps a mapping from type variables to the concrete types they are bound to as an auxiliary data structure during expression resolution; Chapter~\ref{env-chap} describes this \emph{environment} data structure in more detail.
+A \emph{unification} algorithm is used to simultaneously check two types for equivalence with respect to the substitutions in an environment and update that environment.
+Essentially, unification recursively traverses the structure of both types, checking them for equivalence, and when it encounters a type variable, it replaces it with the concrete type it is bound to; if the type variable has not yet been bound, the unification algorithm assigns the equivalent type as the bound type of the variable, after performing various consistency checks.
+Ditchfield~\cite{Ditchfield92} and Bilson~\cite{Bilson03} describe the semantics of \CFA{} unification in more detail.
+\subsection{Conversion Cost} \label{conv-cost-sec}
+C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly.
+The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
+Since for backward-compatibility purposes the conversion costs of \CFA{} must produce an equivalent result to these common type rules, it is appropriate to summarize \cite[\S{}6.3.1.8]{C11} here:
+\begin{itemize}
+\item If either operand is a floating-point type, the common type is the size of the largest floating-point type. If either operand is !_Complex!, the common type is also !_Complex!.
+\item If both operands are of integral type, the common type has the same size\footnote{Technically, the C standard defines a notion of \emph{rank} in \cite[\S{}6.3.1.1]{C11}, a distinct value for each \lstinline{signed} and \lstinline{unsigned} pair; integral types of the same size thus may have distinct ranks. For instance, though \lstinline{int} and \lstinline{long} may have the same size, \lstinline{long} always has greater rank. The standard-defined types are declared to have greater rank than any types of the same size added as compiler extensions.} as the larger type.
+\item If the operands have opposite signedness, the common type is !signed! if the !signed! operand is strictly larger, or !unsigned! otherwise. If the operands have the same signedness, the common type shares it.
+\end{itemize}
+Beginning with the work of Bilson~\cite{Bilson03}, \CFA{} defines a \emph{conversion cost} for each function call in a way that generalizes C's conversion rules.
+Loosely defined, the conversion cost counts the implicit conversions utilized by an interpretation.
+With more specificity, the cost is a lexicographically-ordered tuple, where each element corresponds to a particular kind of conversion.
+In Bilson's design, conversion cost is a 3-tuple, $(unsafe, poly, safe)$, where $unsafe$ is the count of unsafe (narrowing) conversions, $poly$ is the count of polymorphic type bindings, and $safe$ is the sum of the degree of safe (widening) conversions.
+Degree of safe conversion is calculated as path weight in a directed graph of safe conversions between types; Bilson's version and the current version of this graph are in Figures~\ref{bilson-conv-fig} and~\ref{extended-conv-fig}, respectively.
+The safe conversion graph is designed such that the common type $c$ of two types $u$ and $v$ is compatible with the C standard definitions from \cite[\S{}6.3.1.8]{C11} and can be calculated as the unique type minimizing the sum of the path weights of $\overrightarrow{uc}$ and $\overrightarrow{vc}$.
+The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters:
+\begin{cfa}
+void f$\(_1\)$(char, long); $\C{// (1,0,1)}$
+void f$\(_2\)$(short, long); $\C{// (1,0,1)}$
+forall(otype T) void f$\(_3\)$(T, long); $\C{// (0,1,1)}$
+void f$\(_4\)$(long, long); $\C{// (0,0,2)}$
+void f$\(_5\)$(int, unsigned long); $\C{// (0,0,2)}$
+void f$\(_6\)$(int, long); $\C{// (0,0,1)}$
+\end{cfa}
+Note that safe and unsafe conversions are handled differently; \CFA{} counts distance of safe conversions (\eg{} !int! to !long! is cheaper than !int! to !unsigned long!), while only counting the number of unsafe conversions (\eg{} !int! to !char! and !int! to !short! both have unsafe cost 1, as in !f!$_1$ and !f!$_2$ above).
+These costs are summed over the parameters in a call; in the example above, the cost of the two !int! to !long! conversions for !f!$_4$ sum equal to the one !int! to !unsigned long! conversion for !f!$_5$.
+\begin{figure}
+        \centering
+        \begin{subfigure}[h]{3in}
+                \includegraphics{figures/bilson-conv-graph}
+                \caption{Bilson} \label{bilson-conv-fig}
+        \end{subfigure}~\begin{subfigure}[h]{3in}
+                \includegraphics{figures/extended-conv-graph}
+                \caption{Extended} \label{extended-conv-fig}
+        \end{subfigure}
+        % \includegraphics{figures/safe-conv-graph}
+        \caption[Safe conversion graphs.]{Safe conversion graphs. In both graphs, plain arcs have cost $safe = 1, sign = 0$ while dashed sign-conversion arcs have cost $safe = 1, sign = 1$. As per \cite[\S{}6.3.1.8]{C11}, types promote to types of the same signedness with greater rank, from \lstinline{signed} to \lstinline{unsigned} with the same rank, and from \lstinline{unsigned} to \lstinline{signed} with greater size. The arc from \lstinline{unsigned long} to \lstinline{long long} (highlighted in red in \ref{bilson-conv-fig}) is deliberately omitted in \ref{extended-conv-fig}, as on the presented system \lstinline{sizeof(long) == sizeof(long long)}.}
+        \label{safe-conv-fig}
+\end{figure}
+As part of adding reference types to \CFA{} (see Section~\ref{type-features-sec}), Schluntz added a new $reference$ element to the cost tuple, which counts the number of implicit reference-to-rvalue conversions performed so that candidate interpretations can be distinguished by how closely they match the nesting of reference types; since references are meant to act almost indistinguishably from lvalues, this $reference$ element is the least significant in the lexicographic comparison of cost tuples.
+I also refined the \CFA{} cost model as part of this thesis work.
+Bilson's \CFA{} cost model includes the cost of polymorphic type bindings from a function's type assertions in the $poly$ element of the cost tuple; this has the effect of making more-constrained functions more expensive than less-constrained functions, as in the following example, based on differing iterator types:
+\begin{cfa}
+forall(dtype T | { T& ++?(T&); }) T& advance$\(1\)$(T& i, int n);
+forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int)}) T& advance$\(2\)$(T& i, int n);
+\end{cfa}
+In resolving a call to !advance!, the binding to the !T&! parameter in the assertions is added to the $poly$ cost in Bilson's model.
+However, type assertions actually make a function \emph{less} polymorphic, and as such functions with more type assertions should be preferred in type resolution.
+In the example above, if the meaning of !advance! is ``increment !i! !n! times'', !advance!$_1$ requires an !n!-iteration loop, while !advance!$_2$ can be implemented more efficiently with the !?+=?! operator; as such, !advance!$_2$ should be chosen over !advance!$_1$ whenever its added constraint can be satisfied.
+Accordingly, a $specialization$ element is now included in the \CFA{} cost tuple, the values of which are always negative.
+Each type assertion subtracts 1 from $specialization$, so that more-constrained functions cost less, and thus are chosen over less-constrained functions, all else being equal.
+A more sophisticated design would define a partial order over sets of type assertions by set inclusion (\ie{} one function would only cost less than another if it had a strict superset of assertions,  rather than just more total assertions), but I did not judge the added complexity of computing and testing this order to be worth the gain in specificity.
+I also incorporated an unimplemented aspect of Ditchfield's earlier cost model.
+In the example below, adapted from \cite[p.89]{Ditchfield92}, Bilson's cost model only distinguished between the first two cases by accounting extra cost for the extra set of !otype! parameters, which, as discussed above, is not a desirable solution:
+\begin{cfa}
+forall(otype T, otype U) void f$\(_1\)$(T, U);  $\C[3.25in]{// polymorphic}$
+forall(otype T) void f$\(_2\)$(T, T);  $\C[3.25in]{// less polymorphic}$
+forall(otype T) void f$\(_3\)$(T, int);  $\C[3.25in]{// even less polymorphic}$
+forall(otype T) void f$\(_4\)$(T*, int); $\C[3.25in]{// least polymorphic}$
+\end{cfa}
+The new cost model accounts for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
+In the example above, !f!$_1$ has $var = 2$, while the others have $var = 1$.
+The new cost model also accounts for a nuance unhandled by Ditchfield or Bilson, in that it makes the more specific !f!$_4$ cheaper than the more generic !f!$_3$; !f!$_4$ is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications.
+In the modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element, which is shared with the count of assertions due to their common nature as constraints on polymorphic type bindings.
+Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, the model chooses the more specific !T*! binding with $specialization = -1$.
+This process is recursive, such that !T**! has $specialization = -2$.
+This calculation works similarly for generic types, \eg{} !box(T)! also has specialization cost $-1$.
+For multi-argument generic types, the least-specialized polymorphic parameter sets the specialization cost, \eg{} the specialization cost of !pair(T, S*)! is $-1$ (from !T!) rather than $-2$ (from !S!).
+Specialization cost is not counted on the return type list; since $specialization$ is a property of the function declaration, a lower specialization cost prioritizes one declaration over another.
+User programmers can choose between functions with varying parameter lists by adjusting the arguments, but the same is not true in general of varying return types\footnote{In particular, as described in Section~\ref{expr-cost-sec}, cast expressions take the cheapest valid and convertable interpretation of the argument expression, and expressions are resolved as a cast to \lstinline{void}. As a result of this, including return types in the $specialization$ cost means that a function with return type \lstinline{T*} for some polymorphic type \lstinline{T} would \emph{always} be chosen over a function with the same parameter types returning \lstinline{void}, even for \lstinline{void} contexts, an unacceptably counter-intuitive result.}, so the return types are omitted from the $specialization$ element.
+Since both $vars$ and $specialization$ are properties of the declaration rather than any particular interpretation, they are prioritized less than the interpretation-specific conversion costs from Bilson's original 3-tuple.
+A final refinement I have made to the \CFA{} cost model is with regard to choosing among arithmetic conversions.
+The C standard \cite[\S{}6.3.1.8]{C11} states that the common type of !int! and !unsigned int! is !unsigned int! and that the common type of !int! and !long! is !long!, but does not provide guidance for making a choice among conversions.
+Bilson's \CFACC{} uses conversion costs based off Figure~\ref{bilson-conv-fig}.
+However, Bilson's design results in inconsistent and somewhat surprising costs, with conversion to the next-larger same-sign type generally (but not always) double the cost of conversion to the !unsigned! type of the same size.
+In the redesign, for consistency with the approach of the usual arithmetic conversions, which select a common type primarily based on size, but secondarily on sign, arcs in the new graph are annotated with whether they represent a sign change, and such sign changes are summed in a new $sign$ cost element that lexicographically succeeds $safe$.
+This means that sign conversions are approximately the same cost as widening conversions, but slightly more expensive (as opposed to less expensive in Bilson's graph), so maintaining the same signedness is consistently favoured.
+With these modifications, the current \CFA{} cost tuple is as follows:
+\begin{equation*}
+        (unsafe, poly, safe, sign, vars, specialization, reference)
+\end{equation*}
+\subsection{Expression Cost} \label{expr-cost-sec}
+The mapping from \CFA{} expressions to cost tuples is described by Bilson in \cite{Bilson03}, and remains effectively unchanged modulo the refinements to the cost tuple described above.
+Nonetheless, some salient details are repeated here for the sake of completeness.
+On a theoretical level, the resolver treats most expressions as if they were function calls.
+Operators in \CFA{} (both those existing in C and added features like constructors) are all modelled as function calls.
+In terms of the core argument-parameter matching algorithm, overloaded variables are handled the same as zero-argument function calls, aside from a different pool of candidate declarations and setup for different code generation.
+Similarly, an aggregate member expression !a.m! can be modelled as a unary function !m! that takes one argument of the aggregate type.
+Literals do not require sophisticated resolution, as in C the syntactic form of each implies their result types (!42! is !int!, !"hello"! is !char*!, \etc{}), though struct literals (\eg{} !(S){ 1, 2, 3 }! for some struct !S!) require resolution of the implied constructor call.
+Since most expressions can be treated as function calls, nested function calls are the primary component of complexity in expression resolution.
+Each function call has an \emph{identifier} that must match the name of the corresponding declaration, and a possibly-empty list of \emph{arguments}.
+These arguments may be function call expressions themselves, producing a tree of function-call expressions to resolve, where the leaf expressions are generally nullary functions, variable expressions, or literals.
+A single instance of expression resolution consists of matching declarations to all the identifiers in the expression tree of a top-level expression, along with inserting any conversions and satisfying all assertions necessary for that matching.
+The cost of a function-call expression is the sum of the conversion costs of each argument type to the corresponding parameter and the total cost of each subexpression, recursively calculated.
+\CFA{} expression resolution must produce either the unique lowest-cost interpretation of the top-level expression, or an appropriate error message if none exists.
+The cost model of \CFA{} precludes a greedy bottom-up resolution pass, as constraints and costs introduced by calls higher in the expression tree can change the interpretation of those lower in the tree, as in the following example:
+\begin{cfa}
+void f(int);
+double g$\(_1\)$(int);
+int g$\(_2\)$(long);
+f( g(42) );
+\end{cfa}
+Considered independently, !g!$_1$!(42)! is the cheapest interpretation of !g(42)!, with cost $(0,0,0,0,0,0)$ since the argument type is an exact match.
+However, in context, an unsafe conversion is required to downcast the return type of !g!$_1$ to an !int! suitable for !f!, for a total cost of $(1,0,0,0,0,0)$ for !f( g!$_1$!(42) )!.
+If !g!$_2$ is chosen, on the other hand, there is a safe upcast from the !int! type of !42! to !long!, but no cast on the return of !g!$_2$, for a total cost of $(0,0,1,0,0,0)$ for !f( g!$_2$!(42) )!; as this is cheaper, !g!$_2$ is chosen.
+Due to this design, all valid interpretations of subexpressions must in general be propagated to the top of the expression tree before any can be eliminated, a lazy form of expression resolution, as opposed to the eager expression resolution allowed by C, where each expression can be resolved given only the resolution of its immediate subexpressions.
+If there are no valid interpretations of the top-level expression, expression resolution fails and must produce an appropriate error message.
+If any subexpression has no valid interpretations, the process can be short-circuited and the error produced at that time.
+If there are multiple valid interpretations of a top-level expression, ties are broken based on the conversion cost, calculated as above.
+If there are multiple minimal-cost valid interpretations of a top-level expression, that expression is said to be \emph{ambiguous}, and an error must be produced.
+Multiple minimal-cost interpretations of a subexpression do not necessarily imply an ambiguous top-level expression, however, as the subexpression interpretations may be disambiguated based on their return type or by selecting a more-expensive interpretation of that subexpression to reduce the overall expression cost, as in the example above.
+The \CFA{} resolver uses type assertions to filter out otherwise-valid subexpression interpretations.
+An interpretation can only be selected if all the type assertions in the !forall! clause on the corresponding declaration can be satisfied with a unique minimal-cost set of satisfying declarations.
+Type assertion satisfaction is tested by performing type unification on the type of the assertion and the type of the declaration satisfying the assertion.
+That is, a declaration that satisfies a type assertion must have the same name and type as the assertion after applying the substitutions in the type environment.
+Assertion-satisfying declarations may be polymorphic functions with assertions of their own that must be satisfied recursively.
+This recursive assertion satisfaction has the potential to introduce infinite loops into the type resolution algorithm, a situation which \CFACC{} avoids by imposing a hard limit on the depth of recursive assertion satisfaction (currently 4); this approach is also taken by \CC{} to prevent infinite recursion in template expansion, and has proven to be effective and not unduly restrictive of the expressive power of \CFA{}.
+Cast expressions must be treated somewhat differently than functions for backwards compatibility purposes with C.
+In C, cast expressions can serve two purposes, \emph{conversion} (\eg{} !(int)3.14!), which semantically converts a value to another value in a different type with a different bit representation, or \emph{coercion} (\eg{} !void* p; (int*)p;!), which assigns a different type to the same bit value.
+C provides a set of built-in conversions and coercions, and user programmers are able to force a coercion over a conversion if desired by casting pointers.
+The overloading features in \CFA{} introduce a third cast semantic, \emph{ascription} (\eg{} !int x; double x; (int)x;!), which selects the overload that most-closely matches the cast type.
+However, since ascription does not exist in C due to the lack of overloadable identifiers, if a cast argument has an unambiguous interpretation as a conversion argument then it must be interpreted as such, even if the ascription interpretation would have a lower overall cost.
+This is demonstrated in the following example, adapted from the C standard library:
+\begin{cfa}
+unsigned long long x;
+(unsigned)(x >> 32);
+\end{cfa}
+In C semantics, this example is unambiguously upcasting !32! to !unsigned long long!, performing the shift, then downcasting the result to !unsigned!, at total cost $(1,0,3,1,0,0,0)$.
+If ascription were allowed to be a first-class interpretation of a cast expression, it would be cheaper to select the !unsigned! interpretation of !?>>?! by downcasting !x! to !unsigned! and upcasting !32! to !unsigned!, at a total cost of $(1,0,1,1,0,0,0)$.
+However, this break from C semantics is not backwards compatibile, so to maintain C compatibility, the \CFA{} resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to !unsigned long long! in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.
+For example, in !int x; double x; (int)x;!, both declarations have zero-cost interpretations as !x!, but the !int x! interpretation is cheaper to cast to !int!, and is thus selected.
+Thus, in contrast to the lazy resolution of nested function-call expressions discussed above, where final interpretations for each subexpression are not chosen until the top-level expression is reached, cast expressions introduce eager resolution of their argument subexpressions, as if that argument was itself a top-level expression.
+\section{Resolution Algorithms}
+\CFA{} expression resolution is not, in general, polynomial in the size of the input expression, as shown in Section~\ref{resn-analysis-sec}.
+While this theoretical result is daunting, its implications can be mitigated in practice.
+\CFACC{} does not solve one instance of expression resolution in the course of compiling a program, but rather thousands; therefore, if the worst case of expression resolution is sufficiently rare, worst-case instances can be amortized by more-common easy instances for an acceptable overall runtime, as shown in Section~\ref{instance-expr-sec}.
+Secondly, while a programmer \emph{can} deliberately generate a program designed for inefficient compilation\footnote{See for instance \cite{Haberman16}, which generates arbitrarily large \CC{} template expansions from a fixed-size source file.}, source code tends to follow common patterns.
+Programmers generally do not want to run the full compiler algorithm in their heads, and as such keep mental shortcuts in the form of language idioms.
+If the compiler can be tuned to handle idiomatic code more efficiently, then the reduction in runtime for idiomatic (but otherwise difficult) resolution instances can make a significant difference in total compiler runtime.
+\subsection{Worst-case Analysis} \label{resn-analysis-sec}
+Expression resolution has a number of components that contribute to its runtime, including argument-parameter type unification, recursive traversal of the expression tree, and satisfaction of type assertions.
+If the bound type for a type variable can be looked up or mutated in constant time (as asserted in Table~\ref{env-bounds-table}), then the runtime of the unification algorithm to match an argument to a parameter is usually proportional to the complexity of the types being unified.
+In C, complexity of type representation is bounded by the most-complex type explicitly written in a declaration, effectively a small constant; in \CFA{}, however, polymorphism can generate more-complex types:
+\begin{cfa}
+        forall(otype T) pair(T) wrap(T x, T y);
+        wrap(wrap(wrap(1, 2), wrap(3, 4)), wrap(wrap(5, 6), wrap(7, 8)));
+\end{cfa}
+To resolve the outermost !wrap!, the resolver must check that !pair(pair(int))! unifies with itself, but at three levels of nesting, !pair(pair(int))! is more complex than either !pair(T)! or !T!, the types in the declaration of !wrap!.
+Accordingly, the cost of a single argument-parameter unification is $O(d)$, where $d$ is the depth of the expression tree, and the cost of argument-parameter unification for a single candidate for a given function call expression is $O(pd)$, where $p$ is the number of parameters.
+This bound does not, however, account for the higher costs of unifying two polymorphic type variables, which may in the worst case result in a recursive unification of all type variables in the expression (as discussed in Chapter~\ref{env-chap}).
+Since this recursive unification reduces the number of type variables, it may happen at most once, for an added $O(p^d)$ cost for a top-level expression with $O(p^d)$ type variables.
+Implicit conversions are also checked in argument-parameter matching, but the cost of checking for the existence of an implicit conversion is again proportional to the complexity of the type, $O(d)$.
+Polymorphism also introduces a potential expense here; for a monomorphic function there is only one potential implicit conversion from argument type to parameter type, while if the parameter type is an unbound polymorphic type-variable then any implicit conversion from the argument type could potentially be considered a valid binding for that type variable.
+\CFA{}, however, requires exact matches for the bound type of polymorphic parameters, removing this problem.
+An interesting question for future work is whether loosening this requirement incurs a significant compiler runtime cost in practice; preliminary results from the prototype system described in Chapter~\ref{expr-chap} suggest it does not.
+Considering the recursive traversal of the expression tree, polymorphism again greatly expands the worst-case runtime.
+Let $i$ be the number of candidate declarations for each function call; if all of these candidates are monomorphic, then there are no more than $i$ unambiguous interpretations of the subexpression rooted at that function call.
+Ambiguous minimal-cost subexpression interpretations may also be collapsed into a single \emph{ambiguous interpretation}, as the presence of such a subexpression interpretation in the final solution is an error condition.
+One safe pruning operation during expression resolution is to discard all subexpression interpretations with greater-than-minimal cost for their return type, as such interpretations cannot beat the minimal-cost interpretation with their return type for the overall optimal solution.
+As such, with no polymorphism, each declaration can generate no more than one minimal-cost interpretation with its return type, so the number of possible subexpression interpretations is $O(i)$ (note that in C, which lacks overloading, $i \leq 1$).
+With polymorphism, however, a single declaration (like !wrap! above) can have many concrete return types after type variable substitution, and could in principle have a different concrete return type for each combination of argument interpretations.
+Calculated recursively, the bound on the total number of candidate interpretations is $O(i^{p^d})$, each with a distinct type.
+Given these calculations of number of subexpression interpretations and matching costs, the upper bound on runtime for generating candidates for a single subexpression $d$ levels up from the leaves is $O( i^{p^d} \cdot pd )$.
+Since there are $O(p^d)$ subexpressions in a single top-level expression, the total worst-case cost of argument-parameter matching with the overloading and polymorphism features of \CFA{} is $O( i^{p^d} \cdot pd \cdot p^d )$.
+Since the size of the expression is $O(p^d)$, letting $n = p^d$ this simplifies to $O(i^n \cdot n^2)$
+This bound does not yet account for the cost of assertion satisfaction, however.
+\CFA{} uses type unification on the assertion type and the candidate declaration type to test assertion satisfaction; this unification calculation has cost proportional to the complexity of the declaration type after substitution of bound type variables; as discussed above, this cost is $O(d)$.
+If there are $O(a)$ type assertions on each declaration, there are $O(i)$ candidates to satisfy each assertion, for a total of $O(ai)$ candidates to check for each declaration.
+However, each assertion candidate may generate another $O(a)$ assertions, recursively until the assertion recursion limit $r$ is reached, for a total cost of $O((ai)^r \cdot d)$.
+Now, $a$ and $i$ are properties of the set of declarations in scope, while $r$ is defined by the language spec, so $(ai)^r$ is essentially a constant for purposes of expression resolution, albeit a very large one.
+It is not uncommon in \CFA{} to have functions with dozens of assertions, and common function names (\eg{} !?{}!, the constructor) can have hundreds of overloads.
+It is clear that assertion satisfaction costs can be very large, and in fact a method for heuristically reducing these costs is one of the key contributions of this thesis, but it should be noted that the worst-case analysis is a particularly poor match for actual code in the case of assertions.
+It is reasonable to assume that most code compiles without errors, as an actively-developed project is compiled many times, generally with relatively few new errors introduced between compiles.
+However, the worst-case bound for assertion satisfaction is based on recursive assertion satisfaction calls exceeding the limit, which is an error case.
+In practice, then, the depth of recursive assertion satisfaction should be bounded by a small constant for error-free code, which accounts for the vast majority of problem instances.
+Similarly, uses of polymorphism like those that generate the $O(d)$ bound on unification or the $O(i^{p^d})$ bound on number of candidates are rare, but not completely absent.
+This analysis points to type unification, argument-parameter matching, and assertion satisfaction as potentially costly elements of expression resolution, and thus profitable targets for algorithmic investigation.
+Type unification is discussed in Chapter~\ref{env-chap}, while the other aspects are covered below.
+\subsection{Argument-Parameter Matching} \label{arg-parm-matching-sec}
+Pruning possible interpretations as early as possible is one way to reduce the real-world cost of expression resolution, provided that a sufficient proportion of interpretations are pruned to pay for the cost of the pruning algorithm.
+One opportunity for interpretation pruning is by the argument-parameter type matching, but the literature provides no clear answers on whether candidate functions should be chosen according to their available arguments, or whether argument resolution should be driven by the available function candidates.
+For programming languages without implicit conversions, argument-parameter matching is essentially the entirety of the expression resolution problem, and is generally referred to as ``overload resolution'' in the literature.
+All expression-resolution algorithms form a DAG of interpretations, some explicitly, some implicitly; in this DAG, arcs point from function-call interpretations to argument interpretations, as in Figure~\ref{res-dag-fig}
+\begin{figure}[h]
+        \centering
+        \begin{subfigure}[h]{3in}
+        \begin{cfa}
+        char *p$\(_1\)$;
+        int *p$\(_2\)$;
+        char *f$\(_1\)$(char*, int*);
+        double *f$\(_2\)$(int*, int*);
+        f$\(_A\)$( f$\(_B\)$( p$\(_A\)$, p$\(_B\)$ ), p$\(_C\)$ );
+        \end{cfa}
+        \end{subfigure}~\begin{subfigure}[h]{2.5in}
+        \includegraphics{figures/resolution-dag}
+        \end{subfigure}
+        \caption[Resolution DAG for a simple expression.]{Resolution DAG for a simple expression, annotated with explanatory subscripts. Functions that do not have a valid argument matching are covered with an \textsf{X}.} \label{res-dag-fig}
+\end{figure}
+Note that some interpretations may be part of more than one super-interpretation, as with the !p!$_2$ interpretation of !p!$_B$, while some valid subexpression interpretations, like the !f!$_2$ interpretation of !f!$_B$, are not used in any interpretation of their superexpression.
+Overload resolution was first seriously considered in the development of compilers for the Ada programming language, with different algorithms making various numbers of passes over the expression DAG, these passes being either top-down or bottom-up.
+Baker's algorithm~\cite{Baker82} takes a single pass from the leaves of the expression tree up, pre-computing argument candidates at each step.
+For each candidate function, Baker attempts to match argument types to parameter types in sequence, failing if any parameter cannot be matched.
+Bilson~\cite{Bilson03} similarly pre-computes argument-candidates in a single bottom-up pass in the original \CFACC{}, but then explicitly enumerates all possible argument combinations for a multi-parameter function.
+These argument combinations are matched to the parameter types of the candidate function as a unit rather than individual arguments.
+Bilson's approach is less efficient than Baker's, as the same argument may be compared to the same parameter many times, but does allow a more straightforward handling of polymorphic type-binding and tuple-typed expressions.
+Unlike Baker and Bilson, Cormack's algorithm~\cite{Cormack81} requests argument candidates that match the type of each parameter of each candidate function, in a single pass from the top-level expression down; memoization of these requests is presented as an optimization.
+As presented, this algorithm requires the parameter to have a known type, which is a poor fit for polymorphic type parameters in \CFA{}.
+Cormack's algorithm can be modified to request argument interpretations of \emph{any} type when provided an unbound parameter type, but this eliminates any pruning gains that could be provided by the algorithm.
+Ganzinger and Ripken~\cite{Ganzinger80} propose an approach (later refined by Pennello~\etal{}~\cite{Pennello80}) that uses a top-down filtering pass followed by a bottom-up filtering pass to reduce the number of candidate interpretations; they prove that a small number of such iterations is sufficient to converge to a solution for the overload resolution problem in the Ada programming language.
+Persch~\etal{}~\cite{PW:overload} developed a similar two-pass approach where the bottom-up pass is followed by the top-down pass.
+These approaches differ from Baker, Bilson, or Cormack in that they take multiple passes over the expression tree to yield a solution by applying filtering heuristics to all expression nodes.
+This approach of filtering out invalid types is unsuited to \CFA{} expression resolution, however, due to the presence of polymorphic functions and implicit conversions.
+Baker~\cite{Baker82} left empirical comparison of different overload resolution algorithms to future work; Bilson~\cite{Bilson03} described an extension of Baker's algorithm to handle implicit conversions and polymorphism, but did not further explore the space of algorithmic approaches to handle both overloaded names and implicit conversions.
+This thesis closes that gap in the literature by performing performance comparisons of both top-down and bottom-up expression resolution algorithms, with results reported in Chapter~\ref{expr-chap}.
+\subsection{Assertion Satisfaction} \label{assn-sat-sec}
+The assertion satisfaction algorithm designed by Bilson~\cite{Bilson03} for the original \CFACC{} is the most-relevant prior work to this project.
+Before accepting any subexpression candidate, Bilson first checks that that candidate's assertions can all be resolved; this is necessary due to Bilson's addition of assertion satisfaction costs to candidate costs (discussed in Section~\ref{conv-cost-sec}).
+If this subexpression interpretation ends up not being used in the final resolution, than the (sometimes substantial) work of checking its assertions ends up wasted.
+Bilson's assertion checking function recurses on two lists, !need! and !newNeed!, the current declaration's assertion set and those implied by the assertion-satisfying declarations, respectively, as detailed in the pseudo-code below (ancillary aspects of the algorithm are omitted for clarity):
+\begin{cfa}
+List(List(Declaration)) checkAssertions(
+                List(Assertion) need, List(Assertion) newNeed, List(Declaration) have,
+                Environment env ) {
+        if ( is_empty(need) ) {
+                if ( is_empty( newNeed ) ) return { have };
+                else return checkAssertions( newNeed, {}, have, env );
+        }
+        Assertion a = head(need);
+        Type adjType = substitute( a.type, env );
+        List(Declaration) candidates = decls_matching( a.name );
+        List(List(Declaration)) alternatives = {}
+        for ( Declaration c : candidates ) {
+                Environment newEnv = env;
+                if ( unify( adjType, c.type, newEnv ) ) {
+                        append( alternatives,
+                                checkAssertions(
+                                        tail(need), append(newNeed, c.need), append(have, c), newEnv ) );
+                }
+        }
+        return alternatives;
+}
+\end{cfa}
+One shortcoming of this approach is that if an earlier assertion has multiple valid candidates, later assertions may be checked many times due to the structure of the recursion.
+Satisfying declarations for assertions are not completely independent of each other, since the unification process may produce new type bindings in the environment, and these bindings may not be compatible between independently-checked assertions.
+Nonetheless, with the environment data-structures discussed in Chapter~\ref{env-chap}, I have found it more efficient to produce a list of possibly-satisfying declarations for each assertion once, then check their respective environments for mutual compatibility when combining lists of assertions together.
+Another improvement I have made to the assertion resolution scheme in \CFACC{} is to consider all assertion-satisfying combinations at one level of recursion before attempting to recursively satisfy any !newNeed! assertions.
+Monomorphic functions are cheaper than polymorphic functions for assertion satisfaction because they are an exact match for the environment-adjusted assertion type, whereas polymorphic functions require an extra type binding.
+Thus, if there is any mutually-compatible set of assertion-satisfying declarations that does not include any polymorphic functions (and associated recursive assertions), then the optimal set of assertions does not require any recursive !newNeed! satisfaction.
+More generally, due to the \CFA{} cost-model changes I introduced in Section~\ref{conv-cost-sec}, the conversion cost of an assertion-satisfying declaration is no longer dependent on the conversion cost of its own assertions.
+As such, all sets of mutually-compatible assertion-satisfying declarations can be sorted by their summed conversion costs, and the recursive !newNeed! satisfaction pass is required only to check the feasibility of the minimal-cost sets.
+This optimization significantly reduces wasted work relative to Bilson's approach, as well as avoiding generation of deeply-recursive assertion sets, for a significant performance improvement relative to Bilson's \CFACC{}.
+Making the conversion cost of an interpretation independent of the cost of satisfying its assertions has further benefits.
+Bilson's algorithm checks assertions for all subexpression interpretations immediately, including those that are not ultimately used; I have developed a \emph{deferred} variant of assertion checking that waits until a top-level interpretation has been generated to check any assertions.
+If the assertions of the minimal-cost top-level interpretation cannot be satisfied then the next-most-minimal-cost interpretation's assertions are checked, and so forth until a minimal-cost satisfiable interpretation (or ambiguous set thereof) is found, or no top-level interpretations are found to have satisfiable assertions.
+In the common case where the code actually does compile, this saves the work of checking assertions for ultimately-rejected interpretations, though it does rule out some pruning opportunities for subinterpretations with unsatisfiable assertions or which are more expensive than a minimal-cost polymorphic function with the same return type.
+The experimental results in Chapter~\ref{expr-chap} indicate that this is a worthwhile trade-off.
+Optimizing assertion satisfaction for common idioms has also proved effective in \CFA{}; the code below is an unexceptional print statement derived from the \CFA{} test suite that nonetheless is a very difficult instance of expression resolution:
+\begin{cfa}
+sout | "one" | 1 | "two" | 2 | "three" | 3 | "four" | 4 | "five" | 5 | "six" | 6
+        | "seven" | 7 | "eight" | 8 | "nine" | 9 | "ten" | 10 | "end" | nl | nl;
+\end{cfa}
+The first thing that makes this expression so difficult is that it is 23 levels deep; Section~\ref{resn-analysis-sec} indicates that the worst-case bounds on expression resolution are exponential in expression depth.
+Secondly, the !?|?! operator is significantly overloaded in \CFA{} --- there are 74 such operators in the \CFA{} standard library, and while 9 are arithmetic operators inherited from C, the rest are polymorphic I/O operators that look similar to:
+\begin{cfa}
+forall( dtype ostype | ostream( ostype ) )
+ostype& ?|? ( ostype&, int );
+\end{cfa}
+Note that !ostream! is a trait with 25 type assertions, and that the output operators for the other arithmetic types are also valid for the !int!-type parameters due to implicit conversions.
+On this instance, deferred assertion satisfaction saves wasted work checking assertions on the wrong output operators, but does nothing about the 23 repeated checks of the 25 assertions to determine that !ofstream! (the type of !sout!) satisfies !ostream!.
+To solve this problem, I have developed a \emph{cached} variant of assertion checking.
+During the course of checking the assertions of a single top-level expression, the results are cached for each assertion checked.
+The search key for this cache is the assertion declaration with its type variables substituted according to the type environment to distinguish satisfaction of the same assertion for different types.
+This adjusted assertion declaration is then run through the \CFA{} name-mangling algorithm to produce an equivalent string-type key.
+The assertion satisfaction aspect of \CFA{} expression resolution bears some similarity to satisfiability problems from logic, and as such other languages with similar trait and assertion mechanisms make use of logic-program solvers in their compilers.
+For instance, Matsakis~\cite{Matsakis17} and the Rust team have developed a PROLOG-based engine to check satisfaction of Rust traits.
+The combination of the assertion satisfaction elements of the problem with the conversion-cost model of \CFA{} makes this logic-solver approach difficult to apply in \CFACC{}, however.
+Expressing assertion resolution as a satisfiability problem ignores the cost optimization aspect, which is necessary to decide among what are often many possible satisfying assignments of declarations to assertions.
+On the other hand, the deeply-recursive nature of the satisfiability problem makes it difficult to adapt to optimizing solver approaches such as linear programming.
+To maintain a well-defined programming language, any optimization algorithm used must provide an exact (rather than approximate) solution; this constraint also rules out a whole class of approximately-optimal generalized solvers.
+As such, I opted to continue Bilson's approach of designing a bespoke solver for \CFA{} assertion satisfaction, rather than attempting to re-express the problem in some more general formalism.
+\section{Conclusion \& Future Work} \label{resn-conclusion-sec}
+As the results in Chapter~\ref{expr-chap} show, the algorithmic approaches I have developed for \CFA{} expression resolution are sufficient to build a practically-performant \CFA{} compiler.
+This work may also be of use to other compiler construction projects, notably to members of the \CC{} community as they implement the new Concepts \cite{C++Concepts} standard, which includes type assertions similar to those used in \CFA{}, as well as the C-derived implicit conversion system already present in \CC{}.
+I have experimented with using expression resolution rather than type unification to check assertion satisfaction; this variant of the expression resolution problem should be investigated further in future work.
+This approach is more flexible than type unification, allowing for conversions to be applied to functions to satisfy assertions.
+Anecdotally, this flexibility matches user-programmer expectations better, as small type differences (\eg{} the presence or absence of a reference type, or the usual conversion from !int! to !long!) no longer break assertion satisfaction.
+Practically, the resolver prototype discussed in Chapter~\ref{expr-chap} uses this model of assertion satisfaction, with no apparent deficit in performance; the generated expressions that are resolved to satisfy the assertions are easier than the general case because they never have nested subexpressions, which eliminates much of the theoretical differences between unification and resolution.
+The main challenge to implement this approach in \CFACC{} is applying the implicit conversions generated by the resolution process in the code-generation for the thunk functions that \CFACC{} uses to pass type assertions to their requesting functions with the proper signatures.
+Though performance of the existing algorithms is promising, some further optimizations do present themselves.
+The refined cost model discussed in Section~\ref{conv-cost-sec} is more expressive, but requires more than twice as many fields; it may be fruitful to investigate more tightly-packed in-memory representations of the cost-tuple, as well as comparison operations that require fewer instructions than a full lexicographic comparison.
+Integer or vector operations on a more-packed representation may prove effective, though dealing with the negative-valued $specialization$ field may require some effort.
+Parallelization of various phases of expression resolution may also be useful.
+The algorithmic variants I have introduced for both argument-parameter matching and assertion satisfaction are essentially divide-and-conquer algorithms, which solve subproblem instances for each argument or assertion, respectively, then check mutual compatibility of the solutions.
+While the checks for mutual compatibility are naturally more serial, there may be some benefit to parallel resolution of the subproblem instances.
+The resolver prototype built for this project and described in Chapter~\ref{expr-chap} would be a suitable vehicle for many of these further experiments, and thus a technical contribution of continuing utility.

doc/theses/aaron_moss_PhD/phd/thesis.tex

-              r1c35c78
+              r8b590a4
 \usepackage{footmisc} % for double refs to the same footnote
+\usepackage{caption} % for subfigure
+\usepackage{subcaption}
 % Hyperlinks make it very easy to navigate an electronic document.
 …
 \input{resolution-heuristics}
 \input{type-environment}
+\input{experiments}
 \input{conclusion}
 …
 % \nocite{*}
+% APPENDICIES
+% -----------
+\appendix
+\input{generic-bench}
 \end{document}

doc/theses/aaron_moss_PhD/phd/type-environment.tex

-              r1c35c78
+              r8b590a4
 One key data structure for expression resolution is the \emph{type environment}.
 As discussed in Chapter~\ref{resolution-chap}, being able to efficiently determine which type variables are bound to which concrete types or whether two type environments are compatible is a core requirement of the resolution algorithm.
+Furthermore, expression resolution involves a search through many related possible solutions, so being able to re-use shared subsets of type environment data and to switch between environments quickly is desirable for performance.
+In this chapter I discuss and empirically compare a number of type environment data structure variants, including some novel variations on the union-find\cite{Galler64} data structure introduced in this thesis.
+Furthermore, expression resolution involves a search through many related possible solutions, so the ability to re-use shared subsets of type-environment data and to switch between environments quickly is desirable for performance.
+In this chapter, I discuss a number of type-environment data-structure variants, including some novel variations on the union-find \cite{Galler64} data structure introduced in this thesis.
+Chapter~\ref{expr-chap} contains empirical comparisons of the performance of these data structures when integrated into the resolution algorithm.
 \section{Definitions} \label{env-defn-sec}
 For purposes of this chapter, a \emph{type environment} $T$ is a set of \emph{type classes} $\myset{T_1, T_2, \cdots, T_{|T|}}$.
+Each type class $T_i$ contains a set of \emph{type variables} $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$; note that the sets of variables contained in two distinct classes in the same environment must be disjoint.
+Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} which the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples).
+Each type class $T_i$ contains a set of \emph{type variables} $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$.
+Since the type classes represent an equivalence relation over the type variables the sets of variables contained in two distinct classes in the same environment must be disjoint.
+Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} that the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples).
 \begin{table}
 …
 \centering
 \begin{tabular}{r@{\hskip 0.25em}ll}
+        $find(T, v_{i,j})$ & $\rightarrow T_i | \bot$           & Locate class for variable             \\
+        \hline
+        $find(T, v_{i,j})$ & $\rightarrow T_i~|~\mathsf{fail}$          & Locate class for variable             \\
         $report(T_i)$ & $\rightarrow \{ v_{i,j} \cdots \}$      & List variables for class              \\
         $bound(T_i)$ & $\rightarrow b_i | \bot$                         & Get bound for class                   \\
         $insert(v_{i,1})$ &                                                                     & New single-variable class             \\
+        $bound(T_i)$ & $\rightarrow b_i~|~\mathsf{fail}$                                & Get bound for class                   \\
+        $insert(T, v_{i,1})$ &                                                          & New single-variable class             \\
         $add(T_i, v_{i,j})$ &                                                           & Add variable to class                 \\
         $bind(T_i, b_i)$ &                                                                      & Set or update class bound             \\
+        $unify(T, T_i, T_j)$ & $\rightarrow \top | \bot$        & Combine two type classes              \\
+        \hline
+        $unify(T, T_i, T_j)$ & $\rightarrow \mathsf{pass}~|~\mathsf{fail}$      & Combine two type classes              \\
         $split(T, T_i)$ & $\rightarrow T'$                                      & Revert the last $unify$ operation on $T_i$            \\
         $combine(T, T')$ & $\rightarrow \top | \bot$            & Merge two environments                \\
+        $combine(T, T')$ & $\rightarrow \mathsf{pass}~|~\mathsf{fail}$          & Merge two environments                \\
         $save(T)$ & $\rightarrow H$                                                     & Get handle for current state  \\
+        $backtrack(T, H)$ &                                                                     & Return to handle state
+        $backtrack(T, H)$ &                                                                     & Return to handle state                \\
+        \hline
 \end{tabular}
 \end{table}
 Given this basic structure, type environments in \CFACC{} need to support eleven basic operations, summarized in Table~\ref{env-op-table}.
 The first seven operations are straightforward queries and updates on these data structures:
 The lookup operation $find(T, v_{i,j})$ produces $T_i$, the type class in $T$ which contains variable $v_{i,j}$, or an invalid sentinel value for no such class.
+The first six operations are straightforward queries and updates on these data structures:
+The lookup operation $find(T, v_{i,j})$ produces $T_i$, the type class in $T$ that contains variable $v_{i,j}$, or an invalid sentinel value for no such class.
 The other two query operations act on type classes, where $report(T_i)$ produces the set $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$ of all type variables in a class $T_i$ and $bound(T_i)$ produces the bound $b_i$ of that class, or a sentinel indicating no bound is set.
 The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property $v_{i,1}$ cannot belong to any other type class in $T$.
+The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property, $v_{i,1}$ cannot belong to any other type class in $T$.
 The $add(T_i, v_{i,j})$ operation adds a new type variable $v_{i,j}$ to class $T_i$; again, $v_{i,j}$ cannot exist elsewhere in $T$.
 $bind(T_i, b_i)$ mutates the bound for a type class, setting or updating the current bound.
 The $unify$ operation is the fundamental non-trivial operation a type environment data structure must support.
+The $unify$ operation is the fundamental non-trivial operation a type-environment data-structure must support.
 $unify(T, T_i, T_j)$ merges a type class $T_j$ into another $T_i$, producing a failure result and leaving $T$ in an invalid state if this merge fails.
 It is always possible to unify the type variables of both classes by simply taking the union of both sets; given the disjointness property, no checks for set containment are required, and the variable sets can simply be concatenated if supported by the underlying data structure.
 $unify$ depends on an internal $unifyBound$ operation which may fail.
 In \CFACC{}, $unifyBound(b_i, b_j) \rightarrow b'_i|\bot$ checks that the type classes contain the same sort of variable, takes the tighter of the two conversion permissions, and checks if the bound types can be unified.
+$unify$ depends on an internal $unifyBound$ operation, which may fail.
+In \CFACC{}, $unifyBound(b_i, b_j) \rightarrow b'_i~|~\mathsf{fail}$ checks that the type classes contain the same sort of variable, takes the tighter of the two conversion permissions, and checks if the bound types can be unified.
 If the bound types cannot be unified (\eg{} !struct A! with !int*!), then $unifyBound$ fails, while other combinations of bound types may result in recursive calls.
 For instance, unifying !R*! with !S*! for type variables !R! and !S! will result in a call to $unify(T, find($!R!$), find($!S!$))$, while unifying !R*! with !int*! will result in a call to $unifyBound$ on !int! and the bound type of the class containing !R!.
+For instance, unifying !R*! with !S*! for type variables !R! and !S! results in a call to $unify(T, find($!R!$), find($!S!$))$, while unifying !R*! with !int*! results in a call to $unifyBound$ on !int! and the bound type of the class containing !R!.
 As such, a call to $unify(T, T_i, T_j)$ may touch every type class in $T$, not just $T_i$ and $T_j$, collapsing the entirety of $T$ into a single type class in extreme cases.
 For more information on \CFA{} unification, see \cite{Bilson03}.
 The inverse of $unify$ is $split(T, T_i)$, which produces a new environment $T'$ which is the same as $T$ except that $T_i$ has been replaced by two classes corresponding to the arguments to the previous call to $unify$ on $T_i$.
 If there has been no call to $unify$ on $T_i$ (\ie{} $T_i$ is a single-element class) $T_i$ is absent in $T'$.
 Given the nature of the expression resolution problem as backtracking search, caching and concurrency are both useful tools to decrease runtime.
+The inverse of $unify$ is $split(T, T_i)$, which produces a new environment $T'$ that is the same as $T$ except that $T_i$ has been replaced by two classes corresponding to the arguments to the previous call to $unify$ on $T_i$.
+If there is no prior call to $unify$ on $T_i$ (\ie{} $T_i$ is a single-element class) $T_i$ is absent in $T'$.
+Given the nature of the expression resolution problem as a backtracking search, caching and concurrency are both useful tools to decrease runtime.
 However, both of these approaches may produce multiple distinct descendants of the same initial type environment, which have possibly been mutated in incompatible ways.
 As such, to effectively employ either concurrency or caching, the type environment data structure must support an efficient method to check if two type environments are compatible and merge them if so.
 $combine(T,T')$ attempts to merge an environment $T'$ into another environment $T$, producing $\top$ if successful or leaving $T$ in an invalid state and producing $\bot$ otherwise.
 The invalid state of $T$ on failure is not important, given that a combination failure will result in the resolution algorithm backtracking to a different environment.
+As such, to effectively employ either caching or concurrency, the type environment data structure must support an efficient method to check if two type environments are compatible and merge them if so.
+$combine(T,T')$ attempts to merge an environment $T'$ into another environment $T$, producing $\mathsf{pass}$ if successful or leaving $T$ in an invalid state and producing $\mathsf{fail}$ otherwise.
+The invalid state of $T$ on failure is not important, given that a combination failure results in the resolution algorithm backtracking to a different environment.
 $combine$ proceeds by calls to $insert$, $add$, and $unify$ as needed, and can be roughly thought of as calling $unify$ on every pair of classes in $T$ that have variables $v'_{i,j}$ and $v'_{i,k}$ in the same class $T'_i$ in $T'$.
 Like $unify$, $combine$ can always find a mutually-consistent partition of type variables into classes (in the extreme case, all type variables from $T$ and $T'$ in a single type class), but may fail due to inconsistent bounds on merged type classes.
 …
 The set of mutations to a type environment across the execution of the resolution algorithm produce an implicit tree of related environments, and the backtracking search typically focuses only on one leaf of the tree at once, or at most a small number of closely-related nodes as arguments to $combine$.
 As such, the ability to save and restore particular type environment states is useful, and supported by the $save(T) \rightarrow H$ and $backtrack(T, H)$ operations, which produce a handle for the current environment state and mutate an environment back to a previous state, respectively.
 These operations can be naively implemented by a deep copy of $T$ into $H$ and vice versa, but have more efficient implementations in persistency-aware data structures.
 \section{Approaches}
 \subsection{Na\"{\i}ve}
 The type environment data structure used in Bilson's\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables.
+These operations can be naively implemented by a deep copy of $T$ into $H$ and vice versa, but have more efficient implementations in persistency-aware data structures such as the persistent union-find introduced in Section~\ref{env-persistent-union-find}.
+\section{Approaches} \label{env-approaches-sec}
+\subsection{Na\"{\i}ve} \label{naive-env-sec}
+The type environment data structure used in Bilson's~\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables.
 This approach has the benefit of being easy to understand and not imposing life-cycle or inheritance constraints on its use, but, as can be seen in Table~\ref{env-bounds-table}, does not support many of the desired operations with any particular efficiency.
 Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set would reduce search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment would remove the need to check each type class individually to maintain the disjointness property.
+Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set reduces search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment removes the need to check each type class individually to maintain the disjointness property.
 These improvements do not change the fundamental issues with this data structure, however.
 \subsection{Incremental Inheritance}
 One more invasive modification to this data structure which I investigated is to support swifter combinations of closely-related environments in the backtracking tree by storing a reference to a \emph{parent} environment within each environment, and having that environment only store type classes which have been modified with respect to the parent.
+\subsection{Incremental Inheritance} \label{inc-env-sec}
+One more invasive modification to this data structure that I investigated is to support swifter combinations of closely-related environments in the backtracking tree by storing a reference to a \emph{parent} environment within each environment, and having that environment only store type classes that have been modified with respect to the parent.
 This approach provides constant-time copying of environments, as a new environment simply consists of an empty list of type classes and a reference to its (logically identical) parent; since many type environments are no different than their parent, this speeds backtracking in this common case.
 Since all mutations made to a child environment are by definition compatible with the parent environment, two descendants of a common ancestor environment can be combined by iteratively combining the changes made in one environment then that environment's parent until the common ancestor is reached, again re-using storage and reducing computation in many cases.
 For this environment I also employed a lazily-generated index of type variables to their containing class, which could be in either the current environment or an ancestor.
 Any mutation of a type class in an ancestor environment would cause that class to be copied into the current environment before mutation, as well as added to the index, ensuring that all local changes to the type environment are listed in its index.
+Since all mutations made to a child environment are by definition compatible with the parent environment, two descendants of a common ancestor environment can be combined by iteratively combining the changes made in one environment, then that environment's parent, until the common ancestor is reached, again re-using storage and reducing computation in many cases.
+For this environment, I also employed a lazily-generated index of type variables to their containing class, which could be in either the current environment or an ancestor.
+Any mutation of a type class in an ancestor environment causes that class to be copied into the current environment before mutation, as well as added to the index, ensuring all local changes to the type environment are listed in its index.
 However, not adding type variables to the index until lookup or mutation preserves the constant-time environment copy operation in the common case in which the copy is not mutated from its parent during its life-cycle.
 This approach imposes some performance penalty on $combine$ if related environments are not properly linked together, as the entire environment needs to be combined rather than just the diff, but is correct as long as the ``null parent'' base case is properly handled.
+This approach imposes some performance penalty on $combine$ if related environments are not properly linked together, as the entire environment needs to be combined rather than just the difference, but is correct as long as the ``null parent'' base-case is properly handled.
 The life-cycle issues are somewhat more complex, as many environments may descend from a common parent, and all of these need their parent to stay alive for purposes of lookup.
 These issues can be solved by ``flattening'' parent nodes into their children before the parents leave scope, but given the tree structure of the inheritance graph it is more straightforward to store the parent nodes in reference-counted or otherwise automatically garbage-collected heap storage.
+These issues can be solved by ``flattening'' parent nodes into their children before the parent's scope ends, but given the tree structure of the inheritance graph it is more straightforward to store the parent nodes in reference-counted or otherwise automatically garbage-collected heap storage.
 \subsection{Union-Find} \label{env-union-find-approach}
 Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint set data structure\cite{Galler64}.
+Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint-set data-structure~\cite{Galler64}.
 Union-find efficiently implements two operations over a partition of a collection of elements into disjoint sets; $find(x)$ locates the \emph{representative} of $x$, the element which canonically names its set, while $union(r, s)$ merges two sets represented by $r$ and $s$, respectively.
 The union-find data structure is based on providing each element with a reference to its parent element, such that the root of a tree of elements is the representative of the set of elements contained in the tree.
 $find$ is then implemented by a search up to the parent, generally combined with a \emph{path compression} step that links nodes more directly to their ancestors to speed up subsequent searches.
 $union$ involves making the representative of one set a child of the representative of the other, generally employing a rank- or size-based heuristic to ensure that the tree remains somewhat balanced.
 If both path compression and a balancing heuristic are employed, both $union$ and $find$ run in amortized $O(\alpha(n))$ worst-case time; this bound by the inverse Ackermann function is a small constant for all practical values of $n$.
+If both path compression and a balancing heuristic are employed, both $union$ and $find$ run in amortized $O(\alpha(n))$ worst-case time; this inverse Ackermann bound is a small constant for all practical values of $n$ \cite{Tarjan75}.
 The union-find $find$ and $union$ operations have obvious applicability to the $find$ and $unify$ type environment operations in Table~\ref{env-op-table}, but the union-find data structure must be augmented to fully implement the type environment operations.
 In particular, the type class bound cannot be easily included in the union-find data structure, as the requirement to make it the class representative breaks the balancing properties of $union$, and requires too-close integration of the type environment $unifyBound$ internal operation.
 This issue can be solved by including a side map from class representatives to the type class bound.
+In particular, the type-class bound cannot be easily included in the union-find data structure, as the requirement to make it the class representative breaks the balancing properties of $union$, and requires too-close integration of the type environment $unifyBound$ internal operation.
+This issue can be solved by including a side map from class representatives to the type-class bound.
 If placeholder values are inserted in this map for type classes without bounds than this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements.
 …
 Another type environment operation not supported directly by the union-find data structure is $report$, which lists the type variables in a given class, and similarly $split$, which reverts a $unify$ operation.
 Since the union-find data structure stores only links from children to parents and not vice-versa, there is no way to reconstruct a class from one of its elements without a linear search over the entire data structure, with $find$ called on each element to check its membership in the class.
+The situation is even worse for the $split$ operation, which would require extra information to maintain the order that each child was added to its parent node.
+Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel.
+\TODO{port figure from slideshow}
+The situation is even worse for the $split$ operation, which requires extra information to maintain the order that each child is added to its parent node.
+Unfortunately, the literature \cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so, which, despite its simplicity, seems to be novel.
 The core idea of this ``union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list.
 …
 In my version, the list data structure does not affect the layout of the union-find tree, maintaining the same asymptotic bounds as union-find.
 In more detail, each element is given a !next! pointer to another element in the same class; this !next! pointer initially points to the element itself.
 When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularly-linked lists together.
+When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularly-linked lists together as illustrated in Figure~\ref{union-find-classes-fig}.
 Importantly, though this approach requires an extra pointer per element, it does maintain the linear space bound of union-find, and because it only requires updating the two root nodes in $union$ it does not asymptotically increase runtime either.
 The basic approach is compatible with all path-compression techniques, and allows the members of any class to be retrieved in time linear in the size of the class simply by following the !next! pointers from any element.
+\begin{figure}
+        \centering
+        \includegraphics{figures/union-find-with-classes}
+        \caption[Union operation for union-find with classes.]{Union operation for union-find with classes. Solid lines indicate parent pointers, dashed lines are \lstinline{next} pointers.}
+        \label{union-find-classes-fig}
+\end{figure}
 If the path-compression optimization is abandoned, union-find with classes also encodes a reversible history of all the $union$ operations applied to a given class.
 …
 \begin{theorem} \label{env-reverse-thm}
 The !next! pointer of a class representative in the union-find with classes algorithm without path compression points to a leaf from the most-recently-added subtree.
+The !next! pointer of a class representative in the union-find with classes algorithm, without path compression, points to a leaf from the most-recently-added subtree.
 \end{theorem}
 …
         By induction on the height of the tree. \\
         \emph{Base case:} A height 1 tree by definition includes only a single item. In such a case, the representative's !next! pointer points to itself by construction, and the representative is the most-recently-added (and only) leaf in the tree. \\
         \emph{Inductive case:} By construction, a tree $T$ of height greater than 1 has children of the root (representative) node that were representative nodes of classes merged by $union$. By definition, the most-recently-added subtree $T'$ has a smaller height than $T$, thus by the inductive hypothesis before the most-recent $union$ operation the !next! pointer of the root of $T'$ pointed to one of the leaf nodes of $T'$; by construction the !next! pointer of the root of $T$ points to this leaf after the $union$ operation.
+        \emph{Inductive case:} By construction, a tree $T$ of height greater than 1 has children of the root (representative) node that were representative nodes of classes merged by $union$. By definition, the most-recently-added subtree $T'$ has a smaller height than $T$, thus by the inductive hypothesis before the most-recent $union$ operation, the !next! pointer of the root of $T'$ pointed to one of the leaf nodes of $T'$; by construction the !next! pointer of the root of $T$ points to this leaf after the $union$ operation.
 \end{proof}
 …
 \subsection{Persistent Union-Find}
+Given the backtracking nature of the resolution algorithm discussed in Section~\ref{env-defn-sec}, the abilities to quickly switch between related versions of a type environment and to de-duplicate shared data between environments are both assets to performance.
+\label{env-persistent-union-find}
+Given the backtracking nature of the resolution algorithm discussed in Section~\ref{env-defn-sec}, the abilities to quickly switch between related versions of a type environment and to de-duplicate shared data among environments are both assets to performance.
 Conchon and Filli\^{a}tre~\cite{Conchon07} present a persistent union-find data structure based on the persistent array of Baker~\cite{Baker78,Baker91}.
+\TODO{port figure from slideshow}
+In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array reference pointing either to the array or a different edit node.
+In this manner, a tree of edits is formed, rooted at the actual array.
+Read from the actual array at the root can be performed in constant time, as with a non-persistent array.
+In Baker's persistent array, an \emph{array reference} contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array reference pointing either to the array or a different edit node.
+By construction, these array references always point to a node more like the actual array, forming a tree of edits rooted at the actual array.
+Reads from the actual array at the root can be performed in constant time, as with a non-persistent array.
 The persistent array can be mutated in constant time by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array. If the current array reference is not the root, mutation consists simply of constructing a new edit node encoding the change and referring to the current array reference.
+The mutation algorithm at the root is in some sense a special case of the key operation on persistent arrays, $reroot$.
+The mutation algorithm at the root is a special case of the key operation on persistent arrays, $reroot$.
 A rerooting operation takes any array reference and makes it the root node of the array.
 This is accomplished by tracing the path from some edit node to the root node of the array (always the underlying array), recursively applying the edits to the underlying array and replacing each edit node's successor with the inverse edit.
+This operation is accomplished by tracing the path from some edit node to actual array at the root node, recursively applying the edits to the underlying array and replacing each edit node's successor with the inverse edit.
 In this way, any previous state of the persistent array can be restored in time proportional to the number of edits to the current state of the array.
 While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs means that it is not thread-safe, and must be used behind a lock in a concurrent context.
+While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs break thread-safety, and thus it must be used behind a lock in a concurrent context.
 Also, the root node with the actual array may in principle be anywhere in the tree, and does not provide information to report its leaf nodes, so some form of automatic garbage collection is generally required for the data structure.
 Since the graph of edit nodes is tree-structured, reference counting approaches suffice for garbage collection; Conchon and Filli\^{a}tre~\cite{Conchon07} also observe that if the only $reroot$ operations are for backtracking then the tail of inverse edit nodes may be elided, suggesting the possibility of stack-based memory management.
 While Conchon and Filli\^{a}tre~\cite{Conchon07} implement their persistent union-find data structure over a universe of integer elements in the fixed range $[1,N]$, the type environment problem needs more flexibility.
 In particular, an arbitrary number of type variables must be added to the environment.
 As such, a persistent hash table is a more suitable structure than a persistent array, providing the same expected asymptotic time bounds while allowing a dynamic number of elements.
 Besides replacing the underlying array with a hash table, the other major change in this approach is to replace the two types of array references, !Array! and !Edit!, with four node types, !Table!,  !Edit!, !Add!, and !Remove!, where !Add! adds a new key-value pair, !Remove! removes a key, and !Edit! mutates an existing key-value pair.
 In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds.
+In particular, an arbitrary number of type variables may be added to the environment.
+As such, a persistent hash table is a more suitable structure than a persistent array, providing the same expected asymptotic time bounds, while allowing a dynamic number of elements.
+Besides replacing the underlying array with a hash table, the other major change in this approach is to replace the two types of array references, !Array! and !Edit!, with four node types, !Table!,  !Edit!, !Add!, and !Remove!, where !Add! adds a new key-value pair, !Remove! removes a key-value pair, and !Edit! mutates an existing key-value pair.
+In this variant of \CFACC{}, this persistent hash-table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds.
 The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list.
+Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providing semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
+The single-node approach, does, however, break under most path-compression algorithms; !RemoveFrom! can be applied to the underlying data structure using the ``leaf of last union'' approach discussed in in Section~\ref{env-union-find-classes-approach}; this was judged an acceptable trade-off for the added semantic information and shortened paths.
+Maintaining explicit information on $union$ operations in the persistent union-find edit tree in the form of !AddTo! and !RemoveFrom! nodes exposes a new option for combining type environments.
+Figure~\ref{persistent-union-find-fig} demonstrates the structure of a simple example.
+Making !AddTo! and !RemoveFrom! single nodes provides semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
+!RemoveFrom! is implemented using the ``leaf of last union'' approach discussed in Section~\ref{env-union-find-classes-approach}; this does, however, preclude the use of path-compression algorithms in this persistent union-find data structure.
+\begin{figure}
+        \centering
+        \includegraphics{figures/persistent-union-find}
+        \caption[Persistent union-find data structure.]{Persistent union-find data structure. Shows the edit nodes to reverse back to an empty structure.}
+        \label{persistent-union-find-fig}
+\end{figure}
+This added semantic information on $union$ operations in the persistent union-find edit tree exposes a new option for combining type environments.
 If the type environments are part of the same edit tree, one environment $T'$ can be combined with another $T$ by only testing the edits on the path from $T'$ to $T$ in both the persistent union-find data structure describing the classes and the persistent hash table containing the class bounds.
+This is generally more efficient than testing the compatibility of all type classes in $T'$, as only those that are actually different than those in $T$ must be considered.
+This approach is generally more efficient than testing the compatibility of all type classes in $T'$, as only those that are actually different than those in $T$ must be considered.
+However, the improved performance comes at the cost of some flexibility, as the edit-tree link must be maintained between any two environments to be combined under this algorithm.
 The procedure for $combine(T, T')$ based on edit paths is as follows:
 The shared edit trees for classes and bindings are rerooted at $T$, and the path from $T'$ to $T$ is followed to create a list of actual edits.
 By tracking the state of each element, redundant changes such as an !Edit! followed by an !Edit! can be reduced to their form in $T'$ by dropping the later (more like $T$) !Edit! for the same key; !Add! and !Remove! cancel similarly.
+This procedure is repeated for both the class edit tree and the binding edit tree.
+When the list of net changes to the environment has been produced, the additive changes are applied to $T$.
+For example, if a type class exists in $T'$ but not $T$, the corresponding !Add! edit will be applied to $T$, but in the reverse situation the !Remove! edit will not be applied to $T$, as the intention is to produce a new environment representing the union of the two sets of type classes; similarly, !AddTo! edits are applied to unify type-classes in $T$ that are united in $T'$, but !RemoveFrom! edits that split type classes are not.
+The new environment, $T''$ can always be constructed with a consistent partitioning of type variables; in the extreme case, all variables from both $T$ and $T'$ will be united in a single type class in $T''$.
+Where $combine$ can fail is in unifying the bound types; if any class in $T'$ has a class bound which does not unify with the merged class in $T''$ than $combine$ fails.
+\section{Analysis}
+In this section I present asymptotic analyses of the various approaches to a type environment data structure discussed in the previous section.
+This procedure is repeated for both the class edit-tree and the binding edit-tree.
+When the list of net changes to the environment is produced, the additive changes are applied to $T$.
+For example, if a type class exists in $T'$ but not $T$, the corresponding !Add! edit is applied to $T$, but in the reverse situation the !Remove! edit is not applied to $T$, as the intention is to produce a new environment representing the union of the two sets of type classes; similarly, !AddTo! edits are applied to unify type-classes in $T$ that are united in $T'$, but !RemoveFrom! edits that split type classes are not.
+A new environment, $T''$, can always be constructed with a consistent partitioning of type variables; in the extreme case, all variables from both $T$ and $T'$ are united in a single type class in $T''$.
+$combine$ can fail to unify the bound types; if any class in $T'$ has a class bound that does not unify with the merged class in $T''$, then $combine$ fails.
+\section{Analysis} \label{env-analysis-sec}
+In this section, I present asymptotic analyses of the various approaches to the type environment data structure discussed in the previous section.
+My results are summarized in Table~\ref{env-bounds-table}; in all cases, $n$ is the number of type classes, $m$ is the maximum size of a type class, and $p$ the number of edits between two environments or one environment and the empty environment.
+$u(n)$ captures the recursive cost of class unification, which is kept separate so that the $O(n)$ number of recursive class unifications can be distinguished from the direct cost of each recursive step.
 \begin{table}
 …
 \begin{tabular}{rllll}
         \hline
+                                & \textbf{Na\"{\i}ve}   & \textbf{Incremental}  & \textbf{Union-Find}           & \textbf{U-F with Classes}             \\
+        \hline
+        $find$          & $O(n)$                                & $O(p)$                                & $O(\alpha(m))$                        & $O(\log m)$                                   \\
+        $report$        & $O(m)$                                & $O(m)$                                & $O(n \log m)$                         & $O(m)$                                                \\
+        $bound$         & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
+        $insert$        & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
+        $add$           & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
+        $bind$          & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
+        $unify$         & $O(m + u(n))$                 & $O(m + u(n))$                 & $O(\log m + u(n))$            & $O(\log m + u(n))$                    \\
+        $split$         & ---                                   & ---                                   & ---                                           & $O(\log m)$                                   \\
+        $combine$       & $O(nm \cdot u(n))$    & $O(pm \cdot u(n))$    & $O(n \log m \cdot u(n))$      & $O(p \log m \cdot u(n))$              \\
+        $save$          & $O(nm)$                               & $O(1)$                                & $O(nm)$                                       & $O(1)$                                                \\
+        $backtrack$     & $O(nm)$                               & $O(pm)$                               & $O(nm)$                                       & $O(p)$                                                \\
+                                & \textbf{Na\"{\i}ve} & \textbf{Incremental} & \textbf{Union-Find} & \textbf{Persistent U-F} \\
+        \hline
+        $find$          & $O(n)$                        & $O(p)$                        & $O(\alpha(m))$        & $O(\log m)$           \\
+        $report$        & $O(m)$                        & $O(m)$                        & $O(nm\alpha(m))$      & $O(m)$                        \\
+        $bound$         & $O(1)$                        & $O(1)$                        & $O(1)$                        & $O(1)$                        \\
+        $insert$        & $O(1)$                        & $O(1)$                        & $O(1)$                        & $O(1)$                        \\
+        $add$           & $O(1)$                        & $O(m)$                        & $O(1)$                        & $O(1)$                        \\
+        $bind$          & $O(1)$                        & $O(1)$                        & $O(1)$                        & $O(1)$                        \\
+        $unify$         & $O(m + u(n))$         & $O(m + u(n))$         & $O(1 + u(n))$         & $O(1 + u(n))$         \\
+        $split$         & ---                           & ---                           & ---                           & $O(\log m)$           \\
+        $combine$       & $O(n^2m $                     & $O(p^2m $                     & $O(nm\alpha(m) $      & $O(p \log m $         \\
+                                & $~~~+ nu(n))$         & $~~~+ pu(n))$         & $~~~+ nu(n))$         & $~~~+ pu(n))$         \\
+        $save$          & $O(nm)$                       & $O(1)$                        & $O(nm)$                       & $O(1)$                        \\
+        $backtrack$     & $O(nm)$                       & $O(pm)$                       & $O(nm)$                       & $O(p)$                        \\
         \hline
 \end{tabular}
 \end{table}
+% Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node
+\subsection{Na\"{\i}ve and Incremental}
+\label{naive-incremental-analysis}
+The na\"{\i}ve type environment data structure does not have an environment-wide index for type variables, but does have an index for each type class, assumed to be hash-based here.
+As a result, every class's index must be consulted for a $find$ operation, at an overall cost of $O(n)$.
+The incremental variant builds an overall hash-based index as it is queried, but may need to recursively check its parent environments if its local index does not contain a type variable, and may have as many parents as times it has been modified, for cost $O(p)$.
+It should be noted that subsequent queries for the same variable execute in constant time.
+Since both na\"{\i}ve and incremental variants store complete type classes, the cost of a $report$ operation is simply the time needed to output the contained variables, $O(m)$.
+Since the type classes store their bounds, $bound$ and $bind$ are both $O(1)$ given a type class.
+Once a $find$ operation has already been performed to verify that a type variable does not exist in the environment, the data structures for both these variants support adding new type classes (the $insert$ operation) in $O(1)$.
+Adding a variable to a type class (the $add$ operation) can be done in $O(1)$ for the na\"{\i}ve implementation, but the incremental implementation may need to copy the edited type class from a parent at cost $O(m)$.
+The linear storage of type classes in both variants also leads to $O(m)$ time for the variable-merging step in $unify$, plus the usual $u(n)$ recursion term for $unifyBound$.
+The na\"{\i}ve $combine$ operation must traverse each of the classes of one environment, merging in any class of the other environment that shares a type variable.
+Since there are at most $n$ classes to unify, the unification cost is $O(nm + nu(n))$, while traversal and $find$ costs to locate classes to merge total $O(n^2m)$, for an overall cost of $O(n^2m + nu(n))$.
+The incremental $combine$ operation works similarly, but only needs to consider classes modified in either environment with respect to the common ancestor of both environments, allowing the $n$ cost terms to be substituted for $p$, for an overall cost of $O(p^m + pu(n))$.
+Neither variant supports the $split$ operation to undo a $unify$.
+The na\"{\i}ve environment does nothing to support $save$ and $backtrack$, so these operations must be implemented by making a deep copy of the environment on $save$, then a destructive overwrite on $backtrack$, each at a cost of $O(nm)$.
+The incremental environment supports $O(1)$ $save$ by simply setting aside a reference to the current environment, then proceeding with a new, empty environment with the former environment as a parent.
+$backtrack$ to a parent environment may involve destroying all the intermediate environments if this backtrack removes the last reference to the backtracked-from environment; this cost is $O(pm)$.
+\subsection{Union-Find}
+The union-find data structure is designed to support $find$ efficiently, and thus for any variant, the cost is simply the distance up the tree to the representative element.
+For basic union-find, this is amortized to the inverse Ackermann function, $O(\alpha(m))$, essentially a small constant, though the loss of path compression in persistent union-find raises this cost to $O(\log m)$.
+Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type variable, at cost $O(nm\alpha(m))$.
+Persistent union-find, on the other hand, uses the ``with classes'' extension to union-find described in Section~\ref{env-union-find-classes-approach} to run $report$ in $O(m)$ time.
+All union-find environment variants described here use a secondary hash table to map from class representatives to bindings, and as such can perform $bound$ and $bind$ in $O(1)$, given the representative.
+$insert$ is also a $O(1)$ operation for both basic and persistent union-find.
+Since $add$ simply involves attaching a new child to the class root, it is also a $O(1)$ operation for all union-find environment variants.
+Union-find is also designed to support $unify$ in constant time, and as such, given class representatives, the variable-merging cost of $unify$ for both variants is $O(1)$ to make one representative the child of the other, plus the $O(u(n))$ term for $unifyBound$.
+Basic union-find does not support $split$, but persistent union-find can accomplish it using the mechanism described in Section~\ref{env-union-find-classes-approach} in $O(\log m)$, the cost of traversing up to the root of a class from a leaf without path compression.
+$combine$ on the basic union-find data structure works similarly to the data structures discussed above in Section~\ref{naive-incremental-analysis}, with a $O(nu(n))$ term for the $O(n)$ underlying $unify$ operations, and a $O(n\alpha(m))$ term to find the classes which need unification by checking the class representatives of each corresponding type variable in both environments for equality.
+Persistent union-find uses a different approach for $combine$, discussed in Section~\ref{env-persistent-union-find}.
+Discounting recursive $unify$ operations included in the $u(n)$ $unifyBound$ term, there may be at most $O(p)$ $unify$ operations performed, at cost $O(pu(n))$.
+Each of the $O(p)$ steps on the edit path can be processed in the $O(\log m)$ time it takes to find the current representative of the modified type class, for a total runtime of $O(p \log m + pu(n))$.
+In terms of backtracking operations, the basic union-find data structure only supports deep copies, for $O(nm)$ cost for both $save$ and $backtrack$.
+Persistent union-find, as the name suggests, is more optimized, with $O(1)$ cost to $save$ a backtrack-capable reference to the current environment state, and $O(p)$ cost to revert to that state (possibly destroying no-longer-used edit nodes along the path).
+\section{Conclusion \& Future Work}
+This chapter presents the type environment abstract data type, some type-environment data-structures optimized for workloads encountered in the expression resolution problem, and asymptotic analysis of each data structure.
+Chapter~\ref{expr-chap} provides experimental performance results for a representative set of these approaches.
+One contribution of this thesis is the union-find with classes data structure for efficient retrieval of union-find class members, along with a related algorithm for reversing the history of $union$ operations in this data structure.
+This reversible history contributes to the second novel contribution of this chapter, a type environment data structure based off the persistent union-find data structure of Conchon and Filli\^{a}tre~\cite{Conchon07}.
+This persistent union-find environment uses the $split$ operation introduced in union-find with classes and the edit history of the persistent data structure to support an environment-combining algorithm that only considers the edits between the environments to be merged.
+This persistent union-find data structure is efficient, but not thread-safe; as suggested in Section~\ref{resn-conclusion-sec}, it may be valuable to parallelize the \CFA{} expression resolver.
+However, allowing multiple threads concurrent access to the persistent data structure is likely to result in ``reroot thrashing'', as different threads reroot the data structure to their own versions of interest.
+This contention could be mitigated by partitioning the data structure into separate subtrees for each thread, with each subtree having its own root node, and the boundaries among them implemented with a lock-equipped !ThreadBoundary! edit node.

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 8b590a4 for doc

Legend:

Download in other formats: