Ignore:
Timestamp:
Jan 25, 2019, 5:19:24 PM (5 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
21cf101
Parents:
c58bb11
Message:

thesis: port figures from presentation into union-find chapter

Location:
doc/theses/aaron_moss_PhD/phd
Files:
4 added
2 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/Makefile

    rc58bb11 rb4fd981  
    3030FIGURES = ${addsuffix .eps, \
    3131resolution-dag \
     32union-find-with-classes \
     33persistent-union-find \
    3234}
    3335
  • doc/theses/aaron_moss_PhD/phd/type-environment.tex

    rc58bb11 rb4fd981  
    7070\subsection{Na\"{\i}ve}
    7171
    72 The type environment data structure used in Bilson's\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables.
     72The type environment data structure used in Bilson's~\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables.
    7373This approach has the benefit of being easy to understand and not imposing life-cycle or inheritance constraints on its use, but, as can be seen in Table~\ref{env-bounds-table}, does not support many of the desired operations with any particular efficiency.
    7474Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set would reduce search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment would remove the need to check each type class individually to maintain the disjointness property.
     
    9191\subsection{Union-Find} \label{env-union-find-approach}
    9292
    93 Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint set data structure\cite{Galler64}.
     93Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint set data structure~\cite{Galler64}.
    9494Union-find efficiently implements two operations over a partition of a collection of elements into disjoint sets; $find(x)$ locates the \emph{representative} of $x$, the element which canonically names its set, while $union(r, s)$ merges two sets represented by $r$ and $s$, respectively.
    9595The union-find data structure is based on providing each element with a reference to its parent element, such that the root of a tree of elements is the representative of the set of elements contained in the tree.
     
    110110Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel.
    111111
    112 \TODO{port figure from slideshow}
    113 
    114112The core idea of this ``union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list.
    115113Aho, Hopcroft, and Ullman also include a circularly-linked list in their 1974 textbook~\cite{Aho74}.
     
    117115In my version, the list data structure does not affect the layout of the union-find tree, maintaining the same asymptotic bounds as union-find.
    118116In more detail, each element is given a !next! pointer to another element in the same class; this !next! pointer initially points to the element itself.
    119 When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularly-linked lists together.
     117When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularly-linked lists together as illustrated in Figure~\ref{union-find-classes-fig}.
    120118Importantly, though this approach requires an extra pointer per element, it does maintain the linear space bound of union-find, and because it only requires updating the two root nodes in $union$ it does not asymptotically increase runtime either.
    121119The basic approach is compatible with all path-compression techniques, and allows the members of any class to be retrieved in time linear in the size of the class simply by following the !next! pointers from any element.
     120
     121\begin{figure}
     122        \centering
     123        \includegraphics{figures/union-find-with-classes}
     124        \caption[Union operation for union-find with classes.]{Union operation for union-find with classes. Solid lines indicate parent pointers, dashed lines are \lstinline{next} pointers.}
     125        \label{union-find-classes-fig}
     126\end{figure}
    122127
    123128If the path-compression optimization is abandoned, union-find with classes also encodes a reversible history of all the $union$ operations applied to a given class.
     
    143148Conchon and Filli\^{a}tre~\cite{Conchon07} present a persistent union-find data structure based on the persistent array of Baker~\cite{Baker78,Baker91}.
    144149
    145 \TODO{port figure from slideshow}
    146 
    147150In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array reference pointing either to the array or a different edit node.
    148151In this manner, a tree of edits is formed, rooted at the actual array.
    149 Read from the actual array at the root can be performed in constant time, as with a non-persistent array.
     152Reads from the actual array at the root can be performed in constant time, as with a non-persistent array.
    150153The persistent array can be mutated in constant time by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array. If the current array reference is not the root, mutation consists simply of constructing a new edit node encoding the change and referring to the current array reference. 
    151154The mutation algorithm at the root is in some sense a special case of the key operation on persistent arrays, $reroot$.
     
    163166Besides replacing the underlying array with a hash table, the other major change in this approach is to replace the two types of array references, !Array! and !Edit!, with four node types, !Table!,  !Edit!, !Add!, and !Remove!, where !Add! adds a new key-value pair, !Remove! removes a key, and !Edit! mutates an existing key-value pair.
    164167In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds.
    165 The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list.
     168The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list; Figure~\ref{persistent-union-find-fig} demonstrates the structure of a simple example.
    166169Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providing semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
    167170The single-node approach, does, however, break under most path-compression algorithms; !RemoveFrom! can be applied to the underlying data structure using the ``leaf of last union'' approach discussed in in Section~\ref{env-union-find-classes-approach}; this was judged an acceptable trade-off for the added semantic information and shortened paths.
     171
     172\begin{figure}
     173        \centering
     174        \includegraphics{figures/persistent-union-find}
     175        \caption[Persistent union-find data structure.]{Persistent union-find data structure. Shows the edit nodes to reverse back to an empty structure.}
     176        \label{persistent-union-find-fig}
     177\end{figure}
    168178
    169179Maintaining explicit information on $union$ operations in the persistent union-find edit tree in the form of !AddTo! and !RemoveFrom! nodes exposes a new option for combining type environments.
Note: See TracChangeset for help on using the changeset viewer.