Context Navigation

-                      rb4fd981
+                      r21cf101
 \subsection{Persistent Union-Find}
+\label{env-persistent-union-find}
 Given the backtracking nature of the resolution algorithm discussed in Section~\ref{env-defn-sec}, the abilities to quickly switch between related versions of a type environment and to de-duplicate shared data between environments are both assets to performance.
 …
 In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds.
 The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list; Figure~\ref{persistent-union-find-fig} demonstrates the structure of a simple example.
 Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providing semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
+The single-node approach, does, however, break under most path-compression algorithms; !RemoveFrom! can be applied to the underlying data structure using the ``leaf of last union'' approach discussed in in Section~\ref{env-union-find-classes-approach}; this was judged an acceptable trade-off for the added semantic information and shortened paths.
+Making !AddTo! and !RemoveFrom! single nodes provides semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
+!RemoveFrom! is implemented using the ``leaf of last union'' approach discussed in Section~\ref{env-union-find-classes-approach}; this does, however, preclude the use of path-compression algorithms in this persistent union-find data structure.
 \begin{figure}
 …
 \end{figure}
 Maintaining explicit information on $union$ operations in the persistent union-find edit tree in the form of !AddTo! and !RemoveFrom! nodes exposes a new option for combining type environments.
+This added semantic information on $union$ operations in the persistent union-find edit tree exposes a new option for combining type environments.
 If the type environments are part of the same edit tree, one environment $T'$ can be combined with another $T$ by only testing the edits on the path from $T'$ to $T$ in both the persistent union-find data structure describing the classes and the persistent hash table containing the class bounds.
 This is generally more efficient than testing the compatibility of all type classes in $T'$, as only those that are actually different than those in $T$ must be considered.
+This improved performance comes at the cost of some flexibility, however, as the edit-tree link must be maintained between any two environments to be combined under this algorithm.
 The procedure for $combine(T, T')$ based on edit paths is as follows:
 …
 In this section I present asymptotic analyses of the various approaches to a type environment data structure discussed in the previous section.
+My results are summarized in Table~\ref{env-bounds-table}; in all cases, $n$ is the number of type classes, $m$ is the maximum size of a type class, and $p$ the number of edits between two environments or one environment and the empty environment.
+$u(n)$ captures the recursive cost of class unification; I have kept this separate so that the $O(n)$ number of recursive class unifications can be separated from the direct cost of each recursive step.
 \begin{table}
 …
 \begin{tabular}{rllll}
         \hline
                                 & \textbf{Na\"{\i}ve}   & \textbf{Incremental}  & \textbf{Union-Find}           & \textbf{U-F with Classes}             \\
+                                & \textbf{Na\"{\i}ve}   & \textbf{Incremental}  & \textbf{Union-Find}           & \textbf{Persistent U-F}               \\
         \hline
         $find$          & $O(n)$                                & $O(p)$                                & $O(\alpha(m))$                        & $O(\log m)$                                   \\
         $report$        & $O(m)$                                & $O(m)$                                & $O(n \log m)$                         & $O(m)$                                                \\
+        $report$        & $O(m)$                                & $O(m)$                                & $O(n\alpha(m))$                               & $O(m)$                                                \\
         $bound$         & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
         $insert$        & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
         $add$           & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
+        $add$           & $O(1)$                                & $O(m)$                                & $O(1)$                                        & $O(1)$                                                \\
         $bind$          & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
         $unify$         & $O(m + u(n))$                 & $O(m + u(n))$                 & $O(\log m + u(n))$            & $O(\log m + u(n))$                    \\
+        $unify$         & $O(m + u(n))$                 & $O(m + u(n))$                 & $O(1 + u(n))$         & $O(1 + u(n))$                 \\
         $split$         & ---                                   & ---                                   & ---                                           & $O(\log m)$                                   \\
         $combine$       & $O(nm \cdot u(n))$    & $O(pm \cdot u(n))$    & $O(n \log m \cdot u(n))$      & $O(p \log m \cdot u(n))$              \\
+        $combine$       & $O(n^2m + nu(n))$     & $O(p^2m + pu(n))$     & $O(n\alpha(m) + nu(n))$       & $O(p \log m + pu(n))$         \\
         $save$          & $O(nm)$                               & $O(1)$                                & $O(nm)$                                       & $O(1)$                                                \\
         $backtrack$     & $O(nm)$                               & $O(pm)$                               & $O(nm)$                                       & $O(p)$                                                \\
 …
 \end{table}
+\subsection{Na\"{\i}ve and Incremental}
+\label{naive-incremental-analysis}
+The na\"{\i}ve type environment data structure does not have an overall index for type variables, but does have an index for each type class, assumed to be hash-based here.
+As a result, every class's index must be consulted for a $find$ operation, at an overall cost of $O(n)$.
+The incremental variant builds an overall hash-based index as it is queried, but may need to recursively check its parent environments if its local index does not contain a type variable, and may have as many parents as times it has been modified, for cost $O(p)$.
+It should be noted that subsequent queries for the same variable execute in constant time.
+Since both na\"{\i}ve and incremental variants store complete type classes, the cost of a $report$ operation is simply the time needed to output the contained variables, $O(m)$.
+Since the type classes store their bounds, $bound$ and $bind$ are both $O(1)$ given a type class.
+Once a $find$ operation has already been performed to verify that a type variable does not exist in the environment, the data structures for both these variants support adding new type classes (the $insert$ operation) in $O(1)$.
+Adding a variable to a type class (the $add$ operation) can be done in $O(1)$ for the na\"{\i}ve implementation, but the incremental implementation may need to copy the edited type class from a parent at cost $O(m)$.
+The linear storage of type classes in both variants also leads to $O(m)$ time for the variable-merging step in $unify$, plus the usual $u(n)$ recursion term for $unifyBound$.
+The na\"{\i}ve $combine$ operation must traverse each of the classes of one environment, merging in any class of the other environment that shares a type variable.
+Since there are at most $n$ classes to unify, the unification cost is $O(nm + nu(n))$, while traversal and $find$ costs to locate classes to merge total $O(n^2m)$, for an overall cost of $O(n^2m + nu(n))$.
+The incremental $combine$ operation works similarly, but only needs to consider classes modified in either environment since the common ancestor of both environments, allowing the $n$ cost terms to be substituted for $p$, for an overall cost of $O(p^m + pu(n))$.
+Neither variant supports the $split$ operation to undo a $unify$.
+The na\"{\i}ve environment does nothing to support $save$ and $backtrack$, so these operations must be implemented by making a deep copy of the environment on $save$, then a destructive overwrite on $backtrack$, each at a cost of $O(nm)$.
+The incremental environment supports $O(1)$ $save$ by simply setting aside a reference to the current environment, then proceeding with a new, empty environment with the former environment as a parent.
+$backtrack$ to a parent environment may involve destroying all the intermediate environments if this backtrack removes the last reference to the backtracked-from environment; this cost is $O(pm)$.
+\subsection{Union-Find}
+The union-find data structure is designed to support $find$ efficiently, and thus for any variant, the cost is simply the distance up the tree to the representative element.
+For basic union-find, this is amortized to the inverse Ackermann function, $O(\alpha(m))$, essentially a small constant, though the loss of path compression in persistent union-find raises this cost to the still cheap $O(\log m)$.
+Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type varaible, at cost $O(n\alpha(m))$.
+Persistent union-find, on the other hand, uses the ``with classes'' extension to union-find described in Section~\ref{env-union-find-classes-approach} to run $report$ in $O(m)$ time.
+All union-find environment variants described here use a secondary hash table to map from class representatives to bindings, and as such can perform $bound$ and $bind$ in $O(1)$, given the representative.
+$insert$ is also a $O(1)$ operation for both basic and persistent union-find.
+Since $add$ simply involves attaching a new child to the class root, it is also a $O(1)$ operation for all union-find environment variants.
+Union-find is also designed to support $unify$ in constant time, and as such, given class representatives, the variable-merging most of $unify$ for both variants is $O(1)$ to make one representative the child of the other, plus the $O(u(n))$ term for $unifyBound$.
+Basic union-find does not support $split$, but persistent union-find can accomplish it using the mechanism described in Section~\ref{env-union-find-classes-approach} in $O(\log m)$, the cost of traversing up to the root of a class from a leaf without path compression.
+$combine$ on the basic union-find data structure works similarly to the data structures discussed above in Section~\ref{naive-incremental-analysis}, with a $O(nu(n))$ term for the $O(n)$ underlying $unify$ operations, and a $O(n\alpha(m))$ term to find the classes which need unification by checking the class representatives of each corresponding type variable in both environments for equality.
+Persistent union-find uses a different approach for $combine$, discussed in Section~\ref{env-persistent-union-find}.
+Discounting recursive $unify$ operations included in the $u(n)$ $unifyBound$ term, there may be at most $O(p)$ $unify$ operations performed, at cost $O(pu(n))$.
+Each of the $O(p)$ steps on the edit path can be processed in the $O(\log m)$ time it takes to find the current representative of the modified type class, for a total runtime of $O(p \log m + pu(n))$.
+In terms of backtracking operations, the basic union-find data structure only supports deep copies, for $O(nm)$ cost for both $save$ and $backtrack$.
+Persistent union-find, as the name suggests, is more optimized, with $O(1)$ cost to $save$ a backtrack-capable reference to the current environment state, and $O(p)$ cost to revert to that state (possibly destroying no-longer-used edit nodes along the path).
+\section{Experiments}
 % Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 21cf101 for doc/theses/aaron_moss_PhD/phd/type-environment.tex

Legend:

doc/theses/aaron_moss_PhD/phd/type-environment.tex

Download in other formats: