Index: doc/theses/aaron_moss_PhD/phd/Makefile
===================================================================
 doc/theses/aaron_moss_PhD/phd/Makefile (revision c58bb11d551d74c2ac53af26eaad95ade660e9d5)
+++ doc/theses/aaron_moss_PhD/phd/Makefile (revision b4fd9815429ed283d0fa5c64a3f8523e5844caa4)
@@ 30,4 +30,6 @@
FIGURES = ${addsuffix .eps, \
resolutiondag \
+unionfindwithclasses \
+persistentunionfind \
}
Index: doc/theses/aaron_moss_PhD/phd/typeenvironment.tex
===================================================================
 doc/theses/aaron_moss_PhD/phd/typeenvironment.tex (revision c58bb11d551d74c2ac53af26eaad95ade660e9d5)
+++ doc/theses/aaron_moss_PhD/phd/typeenvironment.tex (revision b4fd9815429ed283d0fa5c64a3f8523e5844caa4)
@@ 70,5 +70,5 @@
\subsection{Na\"{\i}ve}
The type environment data structure used in Bilson's\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{envdefnsec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a treebased sorted set of type variables.
+The type environment data structure used in Bilson's~\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{envdefnsec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a treebased sorted set of type variables.
This approach has the benefit of being easy to understand and not imposing lifecycle or inheritance constraints on its use, but, as can be seen in Table~\ref{envboundstable}, does not support many of the desired operations with any particular efficiency.
Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hashbased set would reduce search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment would remove the need to check each type class individually to maintain the disjointness property.
@@ 91,5 +91,5 @@
\subsection{UnionFind} \label{envunionfindapproach}
Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the unionfind disjoint set data structure\cite{Galler64}.
+Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the unionfind disjoint set data structure~\cite{Galler64}.
Unionfind efficiently implements two operations over a partition of a collection of elements into disjoint sets; $find(x)$ locates the \emph{representative} of $x$, the element which canonically names its set, while $union(r, s)$ merges two sets represented by $r$ and $s$, respectively.
The unionfind data structure is based on providing each element with a reference to its parent element, such that the root of a tree of elements is the representative of the set of elements contained in the tree.
@@ 110,6 +110,4 @@
Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on unionfind does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel.
\TODO{port figure from slideshow}

The core idea of this ``unionfind with classes'' data structure and algorithm is to keep the members of each class stored in a circularlylinked list.
Aho, Hopcroft, and Ullman also include a circularlylinked list in their 1974 textbook~\cite{Aho74}.
@@ 117,7 +115,14 @@
In my version, the list data structure does not affect the layout of the unionfind tree, maintaining the same asymptotic bounds as unionfind.
In more detail, each element is given a !next! pointer to another element in the same class; this !next! pointer initially points to the element itself.
When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularlylinked lists together.
+When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularlylinked lists together as illustrated in Figure~\ref{unionfindclassesfig}.
Importantly, though this approach requires an extra pointer per element, it does maintain the linear space bound of unionfind, and because it only requires updating the two root nodes in $union$ it does not asymptotically increase runtime either.
The basic approach is compatible with all pathcompression techniques, and allows the members of any class to be retrieved in time linear in the size of the class simply by following the !next! pointers from any element.
+
+\begin{figure}
+ \centering
+ \includegraphics{figures/unionfindwithclasses}
+ \caption[Union operation for unionfind with classes.]{Union operation for unionfind with classes. Solid lines indicate parent pointers, dashed lines are \lstinline{next} pointers.}
+ \label{unionfindclassesfig}
+\end{figure}
If the pathcompression optimization is abandoned, unionfind with classes also encodes a reversible history of all the $union$ operations applied to a given class.
@@ 143,9 +148,7 @@
Conchon and Filli\^{a}tre~\cite{Conchon07} present a persistent unionfind data structure based on the persistent array of Baker~\cite{Baker78,Baker91}.
\TODO{port figure from slideshow}

In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array reference pointing either to the array or a different edit node.
In this manner, a tree of edits is formed, rooted at the actual array.
Read from the actual array at the root can be performed in constant time, as with a nonpersistent array.
+Reads from the actual array at the root can be performed in constant time, as with a nonpersistent array.
The persistent array can be mutated in constant time by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array. If the current array reference is not the root, mutation consists simply of constructing a new edit node encoding the change and referring to the current array reference.
The mutation algorithm at the root is in some sense a special case of the key operation on persistent arrays, $reroot$.
@@ 163,7 +166,14 @@
Besides replacing the underlying array with a hash table, the other major change in this approach is to replace the two types of array references, !Array! and !Edit!, with four node types, !Table!, !Edit!, !Add!, and !Remove!, where !Add! adds a new keyvalue pair, !Remove! removes a key, and !Edit! mutates an existing keyvalue pair.
In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{envunionfindapproach} for class bounds.
The actual unionfind data structure is slightly modified from this approach, with a !Base! node containing the root unionfind data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list.
+The actual unionfind data structure is slightly modified from this approach, with a !Base! node containing the root unionfind data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list; Figure~\ref{persistentunionfindfig} demonstrates the structure of a simple example.
Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providing semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
The singlenode approach, does, however, break under most pathcompression algorithms; !RemoveFrom! can be applied to the underlying data structure using the ``leaf of last union'' approach discussed in in Section~\ref{envunionfindclassesapproach}; this was judged an acceptable tradeoff for the added semantic information and shortened paths.
+
+\begin{figure}
+ \centering
+ \includegraphics{figures/persistentunionfind}
+ \caption[Persistent unionfind data structure.]{Persistent unionfind data structure. Shows the edit nodes to reverse back to an empty structure.}
+ \label{persistentunionfindfig}
+\end{figure}
Maintaining explicit information on $union$ operations in the persistent unionfind edit tree in the form of !AddTo! and !RemoveFrom! nodes exposes a new option for combining type environments.