Changeset b4fd981 for doc/theses/aaron_moss_PhD/phd/type-environment.tex

Ignore:
Timestamp:
Jan 25, 2019, 5:19:24 PM (4 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
21cf101
Parents:
c58bb11
Message:

thesis: port figures from presentation into union-find chapter

File:
1 edited

Legend:

Unmodified
 rc58bb11 \subsection{Na\"{\i}ve} The type environment data structure used in Bilson's\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables. The type environment data structure used in Bilson's~\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables. This approach has the benefit of being easy to understand and not imposing life-cycle or inheritance constraints on its use, but, as can be seen in Table~\ref{env-bounds-table}, does not support many of the desired operations with any particular efficiency. Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set would reduce search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment would remove the need to check each type class individually to maintain the disjointness property. \subsection{Union-Find} \label{env-union-find-approach} Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint set data structure\cite{Galler64}. Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint set data structure~\cite{Galler64}. Union-find efficiently implements two operations over a partition of a collection of elements into disjoint sets; $find(x)$ locates the \emph{representative} of $x$, the element which canonically names its set, while $union(r, s)$ merges two sets represented by $r$ and $s$, respectively. The union-find data structure is based on providing each element with a reference to its parent element, such that the root of a tree of elements is the representative of the set of elements contained in the tree. Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel. \TODO{port figure from slideshow} The core idea of this union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list. Aho, Hopcroft, and Ullman also include a circularly-linked list in their 1974 textbook~\cite{Aho74}. In my version, the list data structure does not affect the layout of the union-find tree, maintaining the same asymptotic bounds as union-find. In more detail, each element is given a !next! pointer to another element in the same class; this !next! pointer initially points to the element itself. When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularly-linked lists together. When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularly-linked lists together as illustrated in Figure~\ref{union-find-classes-fig}. Importantly, though this approach requires an extra pointer per element, it does maintain the linear space bound of union-find, and because it only requires updating the two root nodes in $union$ it does not asymptotically increase runtime either. The basic approach is compatible with all path-compression techniques, and allows the members of any class to be retrieved in time linear in the size of the class simply by following the !next! pointers from any element. \begin{figure} \centering \includegraphics{figures/union-find-with-classes} \caption[Union operation for union-find with classes.]{Union operation for union-find with classes. Solid lines indicate parent pointers, dashed lines are \lstinline{next} pointers.} \label{union-find-classes-fig} \end{figure} If the path-compression optimization is abandoned, union-find with classes also encodes a reversible history of all the $union$ operations applied to a given class. Conchon and Filli\^{a}tre~\cite{Conchon07} present a persistent union-find data structure based on the persistent array of Baker~\cite{Baker78,Baker91}. \TODO{port figure from slideshow} In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array reference pointing either to the array or a different edit node. In this manner, a tree of edits is formed, rooted at the actual array. Read from the actual array at the root can be performed in constant time, as with a non-persistent array. Reads from the actual array at the root can be performed in constant time, as with a non-persistent array. The persistent array can be mutated in constant time by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array. If the current array reference is not the root, mutation consists simply of constructing a new edit node encoding the change and referring to the current array reference. The mutation algorithm at the root is in some sense a special case of the key operation on persistent arrays, $reroot$. Besides replacing the underlying array with a hash table, the other major change in this approach is to replace the two types of array references, !Array! and !Edit!, with four node types, !Table!,  !Edit!, !Add!, and !Remove!, where !Add! adds a new key-value pair, !Remove! removes a key, and !Edit! mutates an existing key-value pair. In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds. The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list. The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list; Figure~\ref{persistent-union-find-fig} demonstrates the structure of a simple example. Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providing semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure. The single-node approach, does, however, break under most path-compression algorithms; !RemoveFrom! can be applied to the underlying data structure using the leaf of last union'' approach discussed in in Section~\ref{env-union-find-classes-approach}; this was judged an acceptable trade-off for the added semantic information and shortened paths. \begin{figure} \centering \includegraphics{figures/persistent-union-find} \caption[Persistent union-find data structure.]{Persistent union-find data structure. Shows the edit nodes to reverse back to an empty structure.} \label{persistent-union-find-fig} \end{figure} Maintaining explicit information on $union$ operations in the persistent union-find edit tree in the form of !AddTo! and !RemoveFrom! nodes exposes a new option for combining type environments.