Changeset 5ebb1368


Ignore:
Timestamp:
Dec 12, 2018, 9:16:23 AM (3 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, no_list, persistent-indexer
Children:
85acec94, ac1ae2c
Parents:
200fcb3 (diff), 29207bf (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/type-environment.tex

    r200fcb3 r5ebb1368  
    2424        $add(T_i, v_{i,j})$ &                                                           & Add variable to class                 \\
    2525        $bind(T_i, b_i)$ &                                                                      & Set or update class bound             \\
    26         $remove(T, T_i)$ &                                                                      & Remove class from environment \\
    2726        $unify(T, T_i, T_j)$ & $\rightarrow \top | \bot$        & Combine two type classes              \\
     27        $split(T, T_i)$ & $\rightarrow T'$                                      & Revert the last $unify$ operation on $T_i$            \\
    2828        $combine(T, T')$ & $\rightarrow \top | \bot$            & Merge two environments                \\
    2929        $save(T)$ & $\rightarrow H$                                                     & Get handle for current state  \\
     
    4040The $add(T_i, v_{i,j})$ operation adds a new type variable $v_{i,j}$ to class $T_i$; again, $v_{i,j}$ cannot exist elsewhere in $T$.
    4141$bind(T_i, b_i)$ mutates the bound for a type class, setting or updating the current bound.
    42 The final basic mutation operation is $remove(T, T_i)$, which removes a class $T_i$ and all its type variables from an environment $T$.
    4342
    4443The $unify$ operation is the fundamental non-trivial operation a type environment data structure must support.
     
    5049For instance, unifying !R*! with !S*! for type variables !R! and !S! will result in a call to $unify(T, find($!R!$), find($!S!$))$, while unifying !R*! with !int*! will result in a call to $unifyBound$ on !int! and the bound type of the class containing !R!.
    5150As such, a call to $unify(T, T_i, T_j)$ may touch every type class in $T$, not just $T_i$ and $T_j$, collapsing the entirety of $T$ into a single type class in extreme cases.
     51For more information on \CFA{} unification, see \cite{Bilson03}.
     52The inverse of $unify$ is $split(T, T_i)$, which produces a new environment $T'$ which is the same as $T$ except that $T_i$ has been replaced by two classes corresponding to the arguments to the previous call to $unify$ on $T_i$.
     53If there has been no call to $unify$ on $T_i$ (\ie{} $T_i$ is a single-element class) $T_i$ is absent in $T'$.
    5254
    5355Given the nature of the expression resolution problem as backtracking search, caching and concurrency are both useful tools to decrease runtime.
     
    5759The invalid state of $T$ on failure is not important, given that a combination failure will result in the resolution algorithm backtracking to a different environment.
    5860$combine$ proceeds by calls to $insert$, $add$, and $unify$ as needed, and can be roughly thought of as calling $unify$ on every pair of classes in $T$ that have variables $v'_{i,j}$ and $v'_{i,k}$ in the same class $T'_i$ in $T'$.
    59 Like for $unify$, $combine$ can always find a mutually-consistent partition of type variables into classes (in the extreme case, all type variables from $T$ and $T'$ in a single type class), but may fail due to inconsistent bounds on merged type classes.
     61Like $unify$, $combine$ can always find a mutually-consistent partition of type variables into classes (in the extreme case, all type variables from $T$ and $T'$ in a single type class), but may fail due to inconsistent bounds on merged type classes.
    6062
    6163Finally, the backtracking access patterns of the compiler can be exploited to reduce memory usage or runtime through use of an appropriately designed data structure.
     
    103105\subsection{Union-Find with Classes} \label{env-union-find-classes-approach}
    104106
    105 Another type environment operation not supported directly by the union-find data structure is $report$, which lists the type variables in a given class, and similarly $remove$, which removes a class and all its type variables from the environment.
     107Another type environment operation not supported directly by the union-find data structure is $report$, which lists the type variables in a given class, and similarly $split$, which reverts a $unify$ operation.
    106108Since the union-find data structure stores only links from children to parents and not vice-versa, there is no way to reconstruct a class from one of its elements without a linear search over the entire data structure, with $find$ called on each element to check its membership in the class.
    107 The situation is even worse for the $remove$ operation, where a na\"{\i}ve removal of every element belonging to a specific class would likely remove some parents before their children, requiring either extra bookkeeping or passes through the data structure to remove the leaf elements of the class first.
     109The situation is even worse for the $split$ operation, which would require extra information to maintain the order that each child was added to its parent node.
    108110Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel.
    109111
     
    112114The core idea of this ``union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list.
    113115Aho, Hopcroft, and Ullman also include a circularly-linked list in their 1974 textbook~\cite{Aho74}.
    114 However, the algorithm presented by Aho~\etal{} has an entirely flat class hierarchy, where all elements were direct children of the representative, giving constant-time $find$ at the cost of linear-time $union$ operations.
     116However, the algorithm presented by Aho~\etal{} has an entirely flat class hierarchy, where all elements are direct children of the representative, giving constant-time $find$ at the cost of linear-time $union$ operations.
    115117In my version, the list data structure does not affect the layout of the union-find tree, maintaining the same asymptotic bounds as union-find.
    116118In more detail, each element is given a !next! pointer to another element in the same class; this !next! pointer initially points to the element itself.
     
    143145\TODO{port figure from slideshow}
    144146
    145 In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array index pointing either to the array or a different edit node.
     147In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array reference pointing either to the array or a different edit node.
    146148In this manner, a tree of edits is formed, rooted at the actual array.
    147149Read from the actual array at the root can be performed in constant time, as with a non-persistent array.
    148 The persistent array can be mutated by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array.
    149 This mutation algorithm is in some sense a special case of the key operation on persistent arrays, $reroot$.
     150The persistent array can be mutated in constant time by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array. If the current array reference is not the root, mutation consists simply of constructing a new edit node encoding the change and referring to the current array reference.
     151The mutation algorithm at the root is in some sense a special case of the key operation on persistent arrays, $reroot$.
    150152
    151153A rerooting operation takes any array reference and makes it the root node of the array.
     
    182184In this section I present asymptotic analyses of the various approaches to a type environment data structure discussed in the previous section.
    183185
     186\begin{table}
     187\caption[Type environment operation bounds]{Worst-case analysis of type environment operations. $n$ is the number of type classes, $m$ the maximum size of a type class, and $p$ the edit distance between two environments or a single environment and the empty environment; $u(n)$ captures the recursive cost of class unification.}
     188\label{env-bounds-table}
     189\centering
     190\begin{tabular}{rllll}
     191        \hline
     192                                & \textbf{Na\"{\i}ve}   & \textbf{Incremental}  & \textbf{Union-Find}           & \textbf{U-F with Classes}             \\
     193        \hline
     194        $find$          & $O(n)$                                & $O(p)$                                & $O(\alpha(m))$                        & $O(\log m)$                                   \\
     195        $report$        & $O(m)$                                & $O(m)$                                & $O(n \log m)$                         & $O(m)$                                                \\
     196        $bound$         & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
     197        $insert$        & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
     198        $add$           & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
     199        $bind$          & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
     200        $unify$         & $O(m + u(n))$                 & $O(m + u(n))$                 & $O(\log m + u(n))$            & $O(\log m + u(n))$                    \\
     201        $split$         & ---                                   & ---                                   & ---                                           & $O(\log m)$                                   \\
     202        $combine$       & $O(nm \cdot u(n))$    & $O(pm \cdot u(n))$    & $O(n \log m \cdot u(n))$      & $O(p \log m \cdot u(n))$              \\
     203        $save$          & $O(nm)$                               & $O(1)$                                & $O(nm)$                                       & $O(1)$                                                \\
     204        $backtrack$     & $O(nm)$                               & $O(pm)$                               & $O(nm)$                                       & $O(p)$                                                \\
     205        \hline
     206\end{tabular}
     207\end{table}
     208
    184209% Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node
Note: See TracChangeset for help on using the changeset viewer.