Ignore:
Timestamp:
Jan 28, 2019, 10:46:46 PM (5 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
0e6a0be
Parents:
b4fd981
Message:

thesis: finish first draft of type environment analysis section

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/type-environment.tex

    rb4fd981 r21cf101  
    144144
    145145\subsection{Persistent Union-Find}
     146\label{env-persistent-union-find}
    146147
    147148Given the backtracking nature of the resolution algorithm discussed in Section~\ref{env-defn-sec}, the abilities to quickly switch between related versions of a type environment and to de-duplicate shared data between environments are both assets to performance.
     
    167168In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds.
    168169The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list; Figure~\ref{persistent-union-find-fig} demonstrates the structure of a simple example.
    169 Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providing semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
    170 The single-node approach, does, however, break under most path-compression algorithms; !RemoveFrom! can be applied to the underlying data structure using the ``leaf of last union'' approach discussed in in Section~\ref{env-union-find-classes-approach}; this was judged an acceptable trade-off for the added semantic information and shortened paths.
     170Making !AddTo! and !RemoveFrom! single nodes provides semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
     171!RemoveFrom! is implemented using the ``leaf of last union'' approach discussed in Section~\ref{env-union-find-classes-approach}; this does, however, preclude the use of path-compression algorithms in this persistent union-find data structure.
    171172
    172173\begin{figure}
     
    177178\end{figure}
    178179
    179 Maintaining explicit information on $union$ operations in the persistent union-find edit tree in the form of !AddTo! and !RemoveFrom! nodes exposes a new option for combining type environments.
     180This added semantic information on $union$ operations in the persistent union-find edit tree exposes a new option for combining type environments.
    180181If the type environments are part of the same edit tree, one environment $T'$ can be combined with another $T$ by only testing the edits on the path from $T'$ to $T$ in both the persistent union-find data structure describing the classes and the persistent hash table containing the class bounds.
    181182This is generally more efficient than testing the compatibility of all type classes in $T'$, as only those that are actually different than those in $T$ must be considered.
     183This improved performance comes at the cost of some flexibility, however, as the edit-tree link must be maintained between any two environments to be combined under this algorithm.
    182184
    183185The procedure for $combine(T, T')$ based on edit paths is as follows:
     
    193195
    194196In this section I present asymptotic analyses of the various approaches to a type environment data structure discussed in the previous section.
     197My results are summarized in Table~\ref{env-bounds-table}; in all cases, $n$ is the number of type classes, $m$ is the maximum size of a type class, and $p$ the number of edits between two environments or one environment and the empty environment.
     198$u(n)$ captures the recursive cost of class unification; I have kept this separate so that the $O(n)$ number of recursive class unifications can be separated from the direct cost of each recursive step.
    195199
    196200\begin{table}
     
    200204\begin{tabular}{rllll}
    201205        \hline
    202                                 & \textbf{Na\"{\i}ve}   & \textbf{Incremental}  & \textbf{Union-Find}           & \textbf{U-F with Classes}             \\
     206                                & \textbf{Na\"{\i}ve}   & \textbf{Incremental}  & \textbf{Union-Find}           & \textbf{Persistent U-F}               \\
    203207        \hline
    204208        $find$          & $O(n)$                                & $O(p)$                                & $O(\alpha(m))$                        & $O(\log m)$                                   \\
    205         $report$        & $O(m)$                                & $O(m)$                                & $O(n \log m)$                         & $O(m)$                                                \\
     209        $report$        & $O(m)$                                & $O(m)$                                & $O(n\alpha(m))$                               & $O(m)$                                                \\
    206210        $bound$         & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
    207211        $insert$        & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
    208         $add$           & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
     212        $add$           & $O(1)$                                & $O(m)$                                & $O(1)$                                        & $O(1)$                                                \\
    209213        $bind$          & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
    210         $unify$         & $O(m + u(n))$                 & $O(m + u(n))$                 & $O(\log m + u(n))$            & $O(\log m + u(n))$                    \\
     214        $unify$         & $O(m + u(n))$                 & $O(m + u(n))$                 & $O(1 + u(n))$         & $O(1 + u(n))$                 \\
    211215        $split$         & ---                                   & ---                                   & ---                                           & $O(\log m)$                                   \\
    212         $combine$       & $O(nm \cdot u(n))$    & $O(pm \cdot u(n))$    & $O(n \log m \cdot u(n))$      & $O(p \log m \cdot u(n))$              \\
     216        $combine$       & $O(n^2m + nu(n))$     & $O(p^2m + pu(n))$     & $O(n\alpha(m) + nu(n))$       & $O(p \log m + pu(n))$         \\
    213217        $save$          & $O(nm)$                               & $O(1)$                                & $O(nm)$                                       & $O(1)$                                                \\
    214218        $backtrack$     & $O(nm)$                               & $O(pm)$                               & $O(nm)$                                       & $O(p)$                                                \\
     
    217221\end{table}
    218222
     223\subsection{Na\"{\i}ve and Incremental}
     224\label{naive-incremental-analysis}
     225
     226The na\"{\i}ve type environment data structure does not have an overall index for type variables, but does have an index for each type class, assumed to be hash-based here.
     227As a result, every class's index must be consulted for a $find$ operation, at an overall cost of $O(n)$.
     228The incremental variant builds an overall hash-based index as it is queried, but may need to recursively check its parent environments if its local index does not contain a type variable, and may have as many parents as times it has been modified, for cost $O(p)$.
     229It should be noted that subsequent queries for the same variable execute in constant time.
     230
     231Since both na\"{\i}ve and incremental variants store complete type classes, the cost of a $report$ operation is simply the time needed to output the contained variables, $O(m)$.
     232Since the type classes store their bounds, $bound$ and $bind$ are both $O(1)$ given a type class.
     233Once a $find$ operation has already been performed to verify that a type variable does not exist in the environment, the data structures for both these variants support adding new type classes (the $insert$ operation) in $O(1)$.
     234Adding a variable to a type class (the $add$ operation) can be done in $O(1)$ for the na\"{\i}ve implementation, but the incremental implementation may need to copy the edited type class from a parent at cost $O(m)$.
     235
     236The linear storage of type classes in both variants also leads to $O(m)$ time for the variable-merging step in $unify$, plus the usual $u(n)$ recursion term for $unifyBound$.
     237The na\"{\i}ve $combine$ operation must traverse each of the classes of one environment, merging in any class of the other environment that shares a type variable.
     238Since there are at most $n$ classes to unify, the unification cost is $O(nm + nu(n))$, while traversal and $find$ costs to locate classes to merge total $O(n^2m)$, for an overall cost of $O(n^2m + nu(n))$.
     239The incremental $combine$ operation works similarly, but only needs to consider classes modified in either environment since the common ancestor of both environments, allowing the $n$ cost terms to be substituted for $p$, for an overall cost of $O(p^m + pu(n))$.
     240Neither variant supports the $split$ operation to undo a $unify$.
     241
     242The na\"{\i}ve environment does nothing to support $save$ and $backtrack$, so these operations must be implemented by making a deep copy of the environment on $save$, then a destructive overwrite on $backtrack$, each at a cost of $O(nm)$.
     243The incremental environment supports $O(1)$ $save$ by simply setting aside a reference to the current environment, then proceeding with a new, empty environment with the former environment as a parent.
     244$backtrack$ to a parent environment may involve destroying all the intermediate environments if this backtrack removes the last reference to the backtracked-from environment; this cost is $O(pm)$.
     245
     246\subsection{Union-Find}
     247
     248The union-find data structure is designed to support $find$ efficiently, and thus for any variant, the cost is simply the distance up the tree to the representative element.
     249For basic union-find, this is amortized to the inverse Ackermann function, $O(\alpha(m))$, essentially a small constant, though the loss of path compression in persistent union-find raises this cost to the still cheap $O(\log m)$.
     250Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type varaible, at cost $O(n\alpha(m))$.
     251Persistent union-find, on the other hand, uses the ``with classes'' extension to union-find described in Section~\ref{env-union-find-classes-approach} to run $report$ in $O(m)$ time.
     252
     253All union-find environment variants described here use a secondary hash table to map from class representatives to bindings, and as such can perform $bound$ and $bind$ in $O(1)$, given the representative.
     254$insert$ is also a $O(1)$ operation for both basic and persistent union-find.
     255Since $add$ simply involves attaching a new child to the class root, it is also a $O(1)$ operation for all union-find environment variants.
     256
     257Union-find is also designed to support $unify$ in constant time, and as such, given class representatives, the variable-merging most of $unify$ for both variants is $O(1)$ to make one representative the child of the other, plus the $O(u(n))$ term for $unifyBound$.
     258Basic union-find does not support $split$, but persistent union-find can accomplish it using the mechanism described in Section~\ref{env-union-find-classes-approach} in $O(\log m)$, the cost of traversing up to the root of a class from a leaf without path compression.
     259$combine$ on the basic union-find data structure works similarly to the data structures discussed above in Section~\ref{naive-incremental-analysis}, with a $O(nu(n))$ term for the $O(n)$ underlying $unify$ operations, and a $O(n\alpha(m))$ term to find the classes which need unification by checking the class representatives of each corresponding type variable in both environments for equality.
     260Persistent union-find uses a different approach for $combine$, discussed in Section~\ref{env-persistent-union-find}.
     261Discounting recursive $unify$ operations included in the $u(n)$ $unifyBound$ term, there may be at most $O(p)$ $unify$ operations performed, at cost $O(pu(n))$.
     262Each of the $O(p)$ steps on the edit path can be processed in the $O(\log m)$ time it takes to find the current representative of the modified type class, for a total runtime of $O(p \log m + pu(n))$.
     263
     264In terms of backtracking operations, the basic union-find data structure only supports deep copies, for $O(nm)$ cost for both $save$ and $backtrack$.
     265Persistent union-find, as the name suggests, is more optimized, with $O(1)$ cost to $save$ a backtrack-capable reference to the current environment state, and $O(p)$ cost to revert to that state (possibly destroying no-longer-used edit nodes along the path).
     266
     267\section{Experiments}
     268
    219269% Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node
Note: See TracChangeset for help on using the changeset viewer.