 Timestamp:
 Jan 28, 2019, 10:46:46 PM (5 years ago)
 Branches:
 ADT, aaronthesis, armeh, astexperimental, cleanupdtors, enum, forallpointerdecay, jacob/cs343translation, jenkinssandbox, master, newast, newastuniqueexpr, pthreademulation, qualifiedEnum
 Children:
 0e6a0be
 Parents:
 b4fd981
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

doc/theses/aaron_moss_PhD/phd/typeenvironment.tex
rb4fd981 r21cf101 144 144 145 145 \subsection{Persistent UnionFind} 146 \label{envpersistentunionfind} 146 147 147 148 Given the backtracking nature of the resolution algorithm discussed in Section~\ref{envdefnsec}, the abilities to quickly switch between related versions of a type environment and to deduplicate shared data between environments are both assets to performance. … … 167 168 In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{envunionfindapproach} for class bounds. 168 169 The actual unionfind data structure is slightly modified from this approach, with a !Base! node containing the root unionfind data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list; Figure~\ref{persistentunionfindfig} demonstrates the structure of a simple example. 169 Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providingsemantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.170 The singlenode approach, does, however, break under most pathcompression algorithms; !RemoveFrom! can be applied to the underlying data structure using the ``leaf of last union'' approach discussed in in Section~\ref{envunionfindclassesapproach}; this was judged an acceptable tradeoff for the added semantic information and shortened paths. 170 Making !AddTo! and !RemoveFrom! single nodes provides semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure. 171 !RemoveFrom! is implemented using the ``leaf of last union'' approach discussed in Section~\ref{envunionfindclassesapproach}; this does, however, preclude the use of pathcompression algorithms in this persistent unionfind data structure. 171 172 172 173 \begin{figure} … … 177 178 \end{figure} 178 179 179 Maintaining explicit information on $union$ operations in the persistent unionfind edit tree in the form of !AddTo! and !RemoveFrom! nodesexposes a new option for combining type environments.180 This added semantic information on $union$ operations in the persistent unionfind edit tree exposes a new option for combining type environments. 180 181 If the type environments are part of the same edit tree, one environment $T'$ can be combined with another $T$ by only testing the edits on the path from $T'$ to $T$ in both the persistent unionfind data structure describing the classes and the persistent hash table containing the class bounds. 181 182 This is generally more efficient than testing the compatibility of all type classes in $T'$, as only those that are actually different than those in $T$ must be considered. 183 This improved performance comes at the cost of some flexibility, however, as the edittree link must be maintained between any two environments to be combined under this algorithm. 182 184 183 185 The procedure for $combine(T, T')$ based on edit paths is as follows: … … 193 195 194 196 In this section I present asymptotic analyses of the various approaches to a type environment data structure discussed in the previous section. 197 My results are summarized in Table~\ref{envboundstable}; in all cases, $n$ is the number of type classes, $m$ is the maximum size of a type class, and $p$ the number of edits between two environments or one environment and the empty environment. 198 $u(n)$ captures the recursive cost of class unification; I have kept this separate so that the $O(n)$ number of recursive class unifications can be separated from the direct cost of each recursive step. 195 199 196 200 \begin{table} … … 200 204 \begin{tabular}{rllll} 201 205 \hline 202 & \textbf{Na\"{\i}ve} & \textbf{Incremental} & \textbf{UnionFind} & \textbf{ UF with Classes} \\206 & \textbf{Na\"{\i}ve} & \textbf{Incremental} & \textbf{UnionFind} & \textbf{Persistent UF} \\ 203 207 \hline 204 208 $find$ & $O(n)$ & $O(p)$ & $O(\alpha(m))$ & $O(\log m)$ \\ 205 $report$ & $O(m)$ & $O(m)$ & $O(n \log m)$ & $O(m)$ \\209 $report$ & $O(m)$ & $O(m)$ & $O(n\alpha(m))$ & $O(m)$ \\ 206 210 $bound$ & $O(1)$ & $O(1)$ & $O(1)$ & $O(1)$ \\ 207 211 $insert$ & $O(1)$ & $O(1)$ & $O(1)$ & $O(1)$ \\ 208 $add$ & $O(1)$ & $O( 1)$ & $O(1)$ & $O(1)$ \\212 $add$ & $O(1)$ & $O(m)$ & $O(1)$ & $O(1)$ \\ 209 213 $bind$ & $O(1)$ & $O(1)$ & $O(1)$ & $O(1)$ \\ 210 $unify$ & $O(m + u(n))$ & $O(m + u(n))$ & $O( \log m + u(n))$ & $O(\log m+ u(n))$ \\214 $unify$ & $O(m + u(n))$ & $O(m + u(n))$ & $O(1 + u(n))$ & $O(1 + u(n))$ \\ 211 215 $split$ &  &  &  & $O(\log m)$ \\ 212 $combine$ & $O(n m \cdot u(n))$ & $O(pm \cdot u(n))$ & $O(n \log m \cdot u(n))$ & $O(p \log m \cdotu(n))$ \\216 $combine$ & $O(n^2m + nu(n))$ & $O(p^2m + pu(n))$ & $O(n\alpha(m) + nu(n))$ & $O(p \log m + pu(n))$ \\ 213 217 $save$ & $O(nm)$ & $O(1)$ & $O(nm)$ & $O(1)$ \\ 214 218 $backtrack$ & $O(nm)$ & $O(pm)$ & $O(nm)$ & $O(p)$ \\ … … 217 221 \end{table} 218 222 223 \subsection{Na\"{\i}ve and Incremental} 224 \label{naiveincrementalanalysis} 225 226 The na\"{\i}ve type environment data structure does not have an overall index for type variables, but does have an index for each type class, assumed to be hashbased here. 227 As a result, every class's index must be consulted for a $find$ operation, at an overall cost of $O(n)$. 228 The incremental variant builds an overall hashbased index as it is queried, but may need to recursively check its parent environments if its local index does not contain a type variable, and may have as many parents as times it has been modified, for cost $O(p)$. 229 It should be noted that subsequent queries for the same variable execute in constant time. 230 231 Since both na\"{\i}ve and incremental variants store complete type classes, the cost of a $report$ operation is simply the time needed to output the contained variables, $O(m)$. 232 Since the type classes store their bounds, $bound$ and $bind$ are both $O(1)$ given a type class. 233 Once a $find$ operation has already been performed to verify that a type variable does not exist in the environment, the data structures for both these variants support adding new type classes (the $insert$ operation) in $O(1)$. 234 Adding a variable to a type class (the $add$ operation) can be done in $O(1)$ for the na\"{\i}ve implementation, but the incremental implementation may need to copy the edited type class from a parent at cost $O(m)$. 235 236 The linear storage of type classes in both variants also leads to $O(m)$ time for the variablemerging step in $unify$, plus the usual $u(n)$ recursion term for $unifyBound$. 237 The na\"{\i}ve $combine$ operation must traverse each of the classes of one environment, merging in any class of the other environment that shares a type variable. 238 Since there are at most $n$ classes to unify, the unification cost is $O(nm + nu(n))$, while traversal and $find$ costs to locate classes to merge total $O(n^2m)$, for an overall cost of $O(n^2m + nu(n))$. 239 The incremental $combine$ operation works similarly, but only needs to consider classes modified in either environment since the common ancestor of both environments, allowing the $n$ cost terms to be substituted for $p$, for an overall cost of $O(p^m + pu(n))$. 240 Neither variant supports the $split$ operation to undo a $unify$. 241 242 The na\"{\i}ve environment does nothing to support $save$ and $backtrack$, so these operations must be implemented by making a deep copy of the environment on $save$, then a destructive overwrite on $backtrack$, each at a cost of $O(nm)$. 243 The incremental environment supports $O(1)$ $save$ by simply setting aside a reference to the current environment, then proceeding with a new, empty environment with the former environment as a parent. 244 $backtrack$ to a parent environment may involve destroying all the intermediate environments if this backtrack removes the last reference to the backtrackedfrom environment; this cost is $O(pm)$. 245 246 \subsection{UnionFind} 247 248 The unionfind data structure is designed to support $find$ efficiently, and thus for any variant, the cost is simply the distance up the tree to the representative element. 249 For basic unionfind, this is amortized to the inverse Ackermann function, $O(\alpha(m))$, essentially a small constant, though the loss of path compression in persistent unionfind raises this cost to the still cheap $O(\log m)$. 250 Basic unionfind is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type varaible, at cost $O(n\alpha(m))$. 251 Persistent unionfind, on the other hand, uses the ``with classes'' extension to unionfind described in Section~\ref{envunionfindclassesapproach} to run $report$ in $O(m)$ time. 252 253 All unionfind environment variants described here use a secondary hash table to map from class representatives to bindings, and as such can perform $bound$ and $bind$ in $O(1)$, given the representative. 254 $insert$ is also a $O(1)$ operation for both basic and persistent unionfind. 255 Since $add$ simply involves attaching a new child to the class root, it is also a $O(1)$ operation for all unionfind environment variants. 256 257 Unionfind is also designed to support $unify$ in constant time, and as such, given class representatives, the variablemerging most of $unify$ for both variants is $O(1)$ to make one representative the child of the other, plus the $O(u(n))$ term for $unifyBound$. 258 Basic unionfind does not support $split$, but persistent unionfind can accomplish it using the mechanism described in Section~\ref{envunionfindclassesapproach} in $O(\log m)$, the cost of traversing up to the root of a class from a leaf without path compression. 259 $combine$ on the basic unionfind data structure works similarly to the data structures discussed above in Section~\ref{naiveincrementalanalysis}, with a $O(nu(n))$ term for the $O(n)$ underlying $unify$ operations, and a $O(n\alpha(m))$ term to find the classes which need unification by checking the class representatives of each corresponding type variable in both environments for equality. 260 Persistent unionfind uses a different approach for $combine$, discussed in Section~\ref{envpersistentunionfind}. 261 Discounting recursive $unify$ operations included in the $u(n)$ $unifyBound$ term, there may be at most $O(p)$ $unify$ operations performed, at cost $O(pu(n))$. 262 Each of the $O(p)$ steps on the edit path can be processed in the $O(\log m)$ time it takes to find the current representative of the modified type class, for a total runtime of $O(p \log m + pu(n))$. 263 264 In terms of backtracking operations, the basic unionfind data structure only supports deep copies, for $O(nm)$ cost for both $save$ and $backtrack$. 265 Persistent unionfind, as the name suggests, is more optimized, with $O(1)$ cost to $save$ a backtrackcapable reference to the current environment state, and $O(p)$ cost to revert to that state (possibly destroying nolongerused edit nodes along the path). 266 267 \section{Experiments} 268 219 269 % Future work: design multithreaded version of C&F persistent map  core idea is some sort of threadboundary edit node
Note: See TracChangeset
for help on using the changeset viewer.