source: doc/theses/aaron_moss_PhD/phd/type-environment.tex @ 9e43aff

ADTaaron-thesisarm-ehast-experimentalcleanup-dtorsenumforall-pointer-decayjacob/cs343-translationjenkins-sandboxnew-astnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since 9e43aff was 9e43aff, checked in by Aaron Moss <a3moss@…>, 6 years ago

thesis: introduce experimental results from prototype

  • Property mode set to 100644
File size: 33.3 KB
Line 
1\chapter{Type Environment}
2\label{env-chap}
3
4One key data structure for expression resolution is the \emph{type environment}.
5As discussed in Chapter~\ref{resolution-chap}, being able to efficiently determine which type variables are bound to which concrete types or whether two type environments are compatible is a core requirement of the resolution algorithm.
6Furthermore, expression resolution involves a search through many related possible solutions, so being able to re-use shared subsets of type environment data and to switch between environments quickly is desirable for performance.
7In this chapter I discuss and empirically compare a number of type environment data structure variants, including some novel variations on the union-find\cite{Galler64} data structure introduced in this thesis.
8
9\section{Definitions} \label{env-defn-sec}
10
11For purposes of this chapter, a \emph{type environment} $T$ is a set of \emph{type classes} $\myset{T_1, T_2, \cdots, T_{|T|}}$.
12Each type class $T_i$ contains a set of \emph{type variables} $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$; note that the sets of variables contained in two distinct classes in the same environment must be disjoint.
13Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} which the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples).
14
15\begin{table}
16\caption[Type environment operation summary]{Summary of type environment operations.}
17\label{env-op-table}
18\centering
19\begin{tabular}{r@{\hskip 0.25em}ll}
20        $find(T, v_{i,j})$ & $\rightarrow T_i | \bot$           & Locate class for variable             \\
21        $report(T_i)$ & $\rightarrow \{ v_{i,j} \cdots \}$      & List variables for class              \\
22        $bound(T_i)$ & $\rightarrow b_i | \bot$                         & Get bound for class                   \\
23        $insert(v_{i,1})$ &                                                                     & New single-variable class             \\
24        $add(T_i, v_{i,j})$ &                                                           & Add variable to class                 \\
25        $bind(T_i, b_i)$ &                                                                      & Set or update class bound             \\
26        $unify(T, T_i, T_j)$ & $\rightarrow \top | \bot$        & Combine two type classes              \\
27        $split(T, T_i)$ & $\rightarrow T'$                                      & Revert the last $unify$ operation on $T_i$            \\
28        $combine(T, T')$ & $\rightarrow \top | \bot$            & Merge two environments                \\
29        $save(T)$ & $\rightarrow H$                                                     & Get handle for current state  \\
30        $backtrack(T, H)$ &                                                                     & Return to handle state
31\end{tabular}
32\end{table}
33
34Given this basic structure, type environments in \CFACC{} need to support eleven basic operations, summarized in Table~\ref{env-op-table}.
35The first seven operations are straightforward queries and updates on these data structures:
36The lookup operation $find(T, v_{i,j})$ produces $T_i$, the type class in $T$ which contains variable $v_{i,j}$, or an invalid sentinel value for no such class.
37The other two query operations act on type classes, where $report(T_i)$ produces the set $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$ of all type variables in a class $T_i$ and $bound(T_i)$ produces the bound $b_i$ of that class, or a sentinel indicating no bound is set.
38
39The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property $v_{i,1}$ cannot belong to any other type class in $T$.
40The $add(T_i, v_{i,j})$ operation adds a new type variable $v_{i,j}$ to class $T_i$; again, $v_{i,j}$ cannot exist elsewhere in $T$.
41$bind(T_i, b_i)$ mutates the bound for a type class, setting or updating the current bound.
42
43The $unify$ operation is the fundamental non-trivial operation a type environment data structure must support.
44$unify(T, T_i, T_j)$ merges a type class $T_j$ into another $T_i$, producing a failure result and leaving $T$ in an invalid state if this merge fails.
45It is always possible to unify the type variables of both classes by simply taking the union of both sets; given the disjointness property, no checks for set containment are required, and the variable sets can simply be concatenated if supported by the underlying data structure.
46$unify$ depends on an internal $unifyBound$ operation which may fail.
47In \CFACC{}, $unifyBound(b_i, b_j) \rightarrow b'_i|\bot$ checks that the type classes contain the same sort of variable, takes the tighter of the two conversion permissions, and checks if the bound types can be unified.
48If the bound types cannot be unified (\eg{} !struct A! with !int*!), then $unifyBound$ fails, while other combinations of bound types may result in recursive calls.
49For instance, unifying !R*! with !S*! for type variables !R! and !S! will result in a call to $unify(T, find($!R!$), find($!S!$))$, while unifying !R*! with !int*! will result in a call to $unifyBound$ on !int! and the bound type of the class containing !R!.
50As such, a call to $unify(T, T_i, T_j)$ may touch every type class in $T$, not just $T_i$ and $T_j$, collapsing the entirety of $T$ into a single type class in extreme cases.
51For more information on \CFA{} unification, see \cite{Bilson03}.
52The inverse of $unify$ is $split(T, T_i)$, which produces a new environment $T'$ which is the same as $T$ except that $T_i$ has been replaced by two classes corresponding to the arguments to the previous call to $unify$ on $T_i$.
53If there has been no call to $unify$ on $T_i$ (\ie{} $T_i$ is a single-element class) $T_i$ is absent in $T'$.
54
55Given the nature of the expression resolution problem as backtracking search, caching and concurrency are both useful tools to decrease runtime.
56However, both of these approaches may produce multiple distinct descendants of the same initial type environment, which have possibly been mutated in incompatible ways.
57As such, to effectively employ either concurrency or caching, the type environment data structure must support an efficient method to check if two type environments are compatible and merge them if so.
58$combine(T,T')$ attempts to merge an environment $T'$ into another environment $T$, producing $\top$ if successful or leaving $T$ in an invalid state and producing $\bot$ otherwise.
59The invalid state of $T$ on failure is not important, given that a combination failure will result in the resolution algorithm backtracking to a different environment.
60$combine$ proceeds by calls to $insert$, $add$, and $unify$ as needed, and can be roughly thought of as calling $unify$ on every pair of classes in $T$ that have variables $v'_{i,j}$ and $v'_{i,k}$ in the same class $T'_i$ in $T'$.
61Like $unify$, $combine$ can always find a mutually-consistent partition of type variables into classes (in the extreme case, all type variables from $T$ and $T'$ in a single type class), but may fail due to inconsistent bounds on merged type classes.
62
63Finally, the backtracking access patterns of the compiler can be exploited to reduce memory usage or runtime through use of an appropriately designed data structure.
64The set of mutations to a type environment across the execution of the resolution algorithm produce an implicit tree of related environments, and the backtracking search typically focuses only on one leaf of the tree at once, or at most a small number of closely-related nodes as arguments to $combine$.
65As such, the ability to save and restore particular type environment states is useful, and supported by the $save(T) \rightarrow H$ and $backtrack(T, H)$ operations, which produce a handle for the current environment state and mutate an environment back to a previous state, respectively.
66These operations can be naively implemented by a deep copy of $T$ into $H$ and vice versa, but have more efficient implementations in persistency-aware data structures.
67
68\section{Approaches}
69
70\subsection{Na\"{\i}ve} \label{naive-env-sec}
71
72The type environment data structure used in Bilson's~\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables.
73This approach has the benefit of being easy to understand and not imposing life-cycle or inheritance constraints on its use, but, as can be seen in Table~\ref{env-bounds-table}, does not support many of the desired operations with any particular efficiency.
74Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set would reduce search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment would remove the need to check each type class individually to maintain the disjointness property.
75These improvements do not change the fundamental issues with this data structure, however.
76
77\subsection{Incremental Inheritance} \label{inc-env-sec}
78
79One more invasive modification to this data structure which I investigated is to support swifter combinations of closely-related environments in the backtracking tree by storing a reference to a \emph{parent} environment within each environment, and having that environment only store type classes which have been modified with respect to the parent.
80This approach provides constant-time copying of environments, as a new environment simply consists of an empty list of type classes and a reference to its (logically identical) parent; since many type environments are no different than their parent, this speeds backtracking in this common case.
81Since all mutations made to a child environment are by definition compatible with the parent environment, two descendants of a common ancestor environment can be combined by iteratively combining the changes made in one environment then that environment's parent until the common ancestor is reached, again re-using storage and reducing computation in many cases.
82
83For this environment I also employed a lazily-generated index of type variables to their containing class, which could be in either the current environment or an ancestor.
84Any mutation of a type class in an ancestor environment would cause that class to be copied into the current environment before mutation, as well as added to the index, ensuring that all local changes to the type environment are listed in its index.
85However, not adding type variables to the index until lookup or mutation preserves the constant-time environment copy operation in the common case in which the copy is not mutated from its parent during its life-cycle.
86
87This approach imposes some performance penalty on $combine$ if related environments are not properly linked together, as the entire environment needs to be combined rather than just the diff, but is correct as long as the ``null parent'' base case is properly handled.
88The life-cycle issues are somewhat more complex, as many environments may descend from a common parent, and all of these need their parent to stay alive for purposes of lookup.
89These issues can be solved by ``flattening'' parent nodes into their children before the parents leave scope, but given the tree structure of the inheritance graph it is more straightforward to store the parent nodes in reference-counted or otherwise automatically garbage-collected heap storage.
90
91\subsection{Union-Find} \label{env-union-find-approach}
92
93Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint set data structure~\cite{Galler64}.
94Union-find efficiently implements two operations over a partition of a collection of elements into disjoint sets; $find(x)$ locates the \emph{representative} of $x$, the element which canonically names its set, while $union(r, s)$ merges two sets represented by $r$ and $s$, respectively.
95The union-find data structure is based on providing each element with a reference to its parent element, such that the root of a tree of elements is the representative of the set of elements contained in the tree.
96$find$ is then implemented by a search up to the parent, generally combined with a \emph{path compression} step that links nodes more directly to their ancestors to speed up subsequent searches.
97$union$ involves making the representative of one set a child of the representative of the other, generally employing a rank- or size-based heuristic to ensure that the tree remains somewhat balanced.
98If both path compression and a balancing heuristic are employed, both $union$ and $find$ run in amortized $O(\alpha(n))$ worst-case time; this bound by the inverse Ackermann function is a small constant for all practical values of $n$.
99
100The union-find $find$ and $union$ operations have obvious applicability to the $find$ and $unify$ type environment operations in Table~\ref{env-op-table}, but the union-find data structure must be augmented to fully implement the type environment operations.
101In particular, the type class bound cannot be easily included in the union-find data structure, as the requirement to make it the class representative breaks the balancing properties of $union$, and requires too-close integration of the type environment $unifyBound$ internal operation.
102This issue can be solved by including a side map from class representatives to the type class bound.
103If placeholder values are inserted in this map for type classes without bounds than this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements.
104
105\subsection{Union-Find with Classes} \label{env-union-find-classes-approach}
106
107Another type environment operation not supported directly by the union-find data structure is $report$, which lists the type variables in a given class, and similarly $split$, which reverts a $unify$ operation.
108Since the union-find data structure stores only links from children to parents and not vice-versa, there is no way to reconstruct a class from one of its elements without a linear search over the entire data structure, with $find$ called on each element to check its membership in the class.
109The situation is even worse for the $split$ operation, which would require extra information to maintain the order that each child was added to its parent node.
110Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel.
111
112The core idea of this ``union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list.
113Aho, Hopcroft, and Ullman also include a circularly-linked list in their 1974 textbook~\cite{Aho74}.
114However, the algorithm presented by Aho~\etal{} has an entirely flat class hierarchy, where all elements are direct children of the representative, giving constant-time $find$ at the cost of linear-time $union$ operations.
115In my version, the list data structure does not affect the layout of the union-find tree, maintaining the same asymptotic bounds as union-find.
116In more detail, each element is given a !next! pointer to another element in the same class; this !next! pointer initially points to the element itself.
117When two classes are unified, the !next! pointers of the representatives of those classes are swapped, splicing the two circularly-linked lists together as illustrated in Figure~\ref{union-find-classes-fig}.
118Importantly, though this approach requires an extra pointer per element, it does maintain the linear space bound of union-find, and because it only requires updating the two root nodes in $union$ it does not asymptotically increase runtime either.
119The basic approach is compatible with all path-compression techniques, and allows the members of any class to be retrieved in time linear in the size of the class simply by following the !next! pointers from any element.
120
121\begin{figure}
122        \centering
123        \includegraphics{figures/union-find-with-classes}
124        \caption[Union operation for union-find with classes.]{Union operation for union-find with classes. Solid lines indicate parent pointers, dashed lines are \lstinline{next} pointers.}
125        \label{union-find-classes-fig}
126\end{figure}
127
128If the path-compression optimization is abandoned, union-find with classes also encodes a reversible history of all the $union$ operations applied to a given class.
129Theorem~\ref{env-reverse-thm} demonstrates that the !next! pointer of the representative of a class always points to a leaf from the last-added subtree.
130This property is sufficient to reverse the most-recent $union$ operation by finding the ancestor of that leaf that is an immediate child of the representative, breaking its parent link, and swapping the !next! pointers back\footnote{Union-by-size may be a more appropriate approach than union-by-rank in this instance, as adding two known sizes is a reversible operation, but the rank increment operation cannot be reliably reversed.}.
131Once the $union$ operation has been reversed, Theorem~\ref{env-reverse-thm} still holds for the reduced class, and the process can be repeated recursively until the entire set is split into its component elements.
132
133\begin{theorem} \label{env-reverse-thm}
134The !next! pointer of a class representative in the union-find with classes algorithm without path compression points to a leaf from the most-recently-added subtree.
135\end{theorem}
136
137\begin{proof}
138        By induction on the height of the tree. \\
139        \emph{Base case:} A height 1 tree by definition includes only a single item. In such a case, the representative's !next! pointer points to itself by construction, and the representative is the most-recently-added (and only) leaf in the tree. \\
140        \emph{Inductive case:} By construction, a tree $T$ of height greater than 1 has children of the root (representative) node that were representative nodes of classes merged by $union$. By definition, the most-recently-added subtree $T'$ has a smaller height than $T$, thus by the inductive hypothesis before the most-recent $union$ operation the !next! pointer of the root of $T'$ pointed to one of the leaf nodes of $T'$; by construction the !next! pointer of the root of $T$ points to this leaf after the $union$ operation.
141\end{proof}
142
143On its own, union-find, like the na\"{\i}ve approach, has no special constraints on life-cycle or inheritance, but it can be used as a building block in more sophisticated type environment data structures.
144
145\subsection{Persistent Union-Find}
146\label{env-persistent-union-find}
147
148Given the backtracking nature of the resolution algorithm discussed in Section~\ref{env-defn-sec}, the abilities to quickly switch between related versions of a type environment and to de-duplicate shared data between environments are both assets to performance.
149Conchon and Filli\^{a}tre~\cite{Conchon07} present a persistent union-find data structure based on the persistent array of Baker~\cite{Baker78,Baker91}.
150
151In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array reference pointing either to the array or a different edit node.
152In this manner, a tree of edits is formed, rooted at the actual array.
153Reads from the actual array at the root can be performed in constant time, as with a non-persistent array.
154The persistent array can be mutated in constant time by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array. If the current array reference is not the root, mutation consists simply of constructing a new edit node encoding the change and referring to the current array reference. 
155The mutation algorithm at the root is in some sense a special case of the key operation on persistent arrays, $reroot$.
156
157A rerooting operation takes any array reference and makes it the root node of the array.
158This is accomplished by tracing the path from some edit node to the root node of the array (always the underlying array), recursively applying the edits to the underlying array and replacing each edit node's successor with the inverse edit.
159In this way, any previous state of the persistent array can be restored in time proportional to the number of edits to the current state of the array.
160While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs means that it is not thread-safe, and must be used behind a lock in a concurrent context.
161Also, the root node with the actual array may in principle be anywhere in the tree, and does not provide information to report its leaf nodes, so some form of automatic garbage collection is generally required for the data structure.
162Since the graph of edit nodes is tree-structured, reference counting approaches suffice for garbage collection; Conchon and Filli\^{a}tre~\cite{Conchon07} also observe that if the only $reroot$ operations are for backtracking then the tail of inverse edit nodes may be elided, suggesting the possibility of stack-based memory management.
163
164While Conchon and Filli\^{a}tre~\cite{Conchon07} implement their persistent union-find data structure over a universe of integer elements in the fixed range $[1,N]$, the type environment problem needs more flexibility.
165In particular, an arbitrary number of type variables must be added to the environment.
166As such, a persistent hash table is a more suitable structure than a persistent array, providing the same expected asymptotic time bounds while allowing a dynamic number of elements.
167Besides replacing the underlying array with a hash table, the other major change in this approach is to replace the two types of array references, !Array! and !Edit!, with four node types, !Table!,  !Edit!, !Add!, and !Remove!, where !Add! adds a new key-value pair, !Remove! removes a key, and !Edit! mutates an existing key-value pair.
168In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds.
169The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list; Figure~\ref{persistent-union-find-fig} demonstrates the structure of a simple example.
170Making !AddTo! and !RemoveFrom! single nodes provides semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure.
171!RemoveFrom! is implemented using the ``leaf of last union'' approach discussed in Section~\ref{env-union-find-classes-approach}; this does, however, preclude the use of path-compression algorithms in this persistent union-find data structure.
172
173\begin{figure}
174        \centering
175        \includegraphics{figures/persistent-union-find}
176        \caption[Persistent union-find data structure.]{Persistent union-find data structure. Shows the edit nodes to reverse back to an empty structure.}
177        \label{persistent-union-find-fig}
178\end{figure}
179
180This added semantic information on $union$ operations in the persistent union-find edit tree exposes a new option for combining type environments.
181If the type environments are part of the same edit tree, one environment $T'$ can be combined with another $T$ by only testing the edits on the path from $T'$ to $T$ in both the persistent union-find data structure describing the classes and the persistent hash table containing the class bounds.
182This is generally more efficient than testing the compatibility of all type classes in $T'$, as only those that are actually different than those in $T$ must be considered.
183This improved performance comes at the cost of some flexibility, however, as the edit-tree link must be maintained between any two environments to be combined under this algorithm.
184
185The procedure for $combine(T, T')$ based on edit paths is as follows:
186The shared edit trees for classes and bindings are rerooted at $T$, and the path from $T'$ to $T$ is followed to create a list of actual edits.
187By tracking the state of each element, redundant changes such as an !Edit! followed by an !Edit! can be reduced to their form in $T'$ by dropping the later (more like $T$) !Edit! for the same key; !Add! and !Remove! cancel similarly.
188This procedure is repeated for both the class edit tree and the binding edit tree.
189When the list of net changes to the environment has been produced, the additive changes are applied to $T$.
190For example, if a type class exists in $T'$ but not $T$, the corresponding !Add! edit will be applied to $T$, but in the reverse situation the !Remove! edit will not be applied to $T$, as the intention is to produce a new environment representing the union of the two sets of type classes; similarly, !AddTo! edits are applied to unify type-classes in $T$ that are united in $T'$, but !RemoveFrom! edits that split type classes are not.
191The new environment, $T''$ can always be constructed with a consistent partitioning of type variables; in the extreme case, all variables from both $T$ and $T'$ will be united in a single type class in $T''$.
192Where $combine$ can fail is in unifying the bound types; if any class in $T'$ has a class bound which does not unify with the merged class in $T''$ than $combine$ fails.
193
194\section{Analysis}
195
196In this section I present asymptotic analyses of the various approaches to a type environment data structure discussed in the previous section.
197My results are summarized in Table~\ref{env-bounds-table}; in all cases, $n$ is the number of type classes, $m$ is the maximum size of a type class, and $p$ the number of edits between two environments or one environment and the empty environment.
198$u(n)$ captures the recursive cost of class unification; I have kept this separate so that the $O(n)$ number of recursive class unifications can be separated from the direct cost of each recursive step.
199
200\begin{table}
201\caption[Type environment operation bounds]{Worst-case analysis of type environment operations. $n$ is the number of type classes, $m$ the maximum size of a type class, and $p$ the edit distance between two environments or a single environment and the empty environment; $u(n)$ captures the recursive cost of class unification.}
202\label{env-bounds-table}
203\centering
204\begin{tabular}{rllll}
205        \hline
206                                & \textbf{Na\"{\i}ve}   & \textbf{Incremental}  & \textbf{Union-Find}           & \textbf{Persistent U-F}               \\
207        \hline
208        $find$          & $O(n)$                                & $O(p)$                                & $O(\alpha(m))$                        & $O(\log m)$                                   \\
209        $report$        & $O(m)$                                & $O(m)$                                & $O(n\alpha(m))$                               & $O(m)$                                                \\
210        $bound$         & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
211        $insert$        & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
212        $add$           & $O(1)$                                & $O(m)$                                & $O(1)$                                        & $O(1)$                                                \\
213        $bind$          & $O(1)$                                & $O(1)$                                & $O(1)$                                        & $O(1)$                                                \\
214        $unify$         & $O(m + u(n))$                 & $O(m + u(n))$                 & $O(1 + u(n))$         & $O(1 + u(n))$                 \\
215        $split$         & ---                                   & ---                                   & ---                                           & $O(\log m)$                                   \\
216        $combine$       & $O(n^2m + nu(n))$     & $O(p^2m + pu(n))$     & $O(n\alpha(m) + nu(n))$       & $O(p \log m + pu(n))$         \\
217        $save$          & $O(nm)$                               & $O(1)$                                & $O(nm)$                                       & $O(1)$                                                \\
218        $backtrack$     & $O(nm)$                               & $O(pm)$                               & $O(nm)$                                       & $O(p)$                                                \\
219        \hline
220\end{tabular}
221\end{table}
222
223\subsection{Na\"{\i}ve and Incremental} 
224\label{naive-incremental-analysis}
225
226The na\"{\i}ve type environment data structure does not have an overall index for type variables, but does have an index for each type class, assumed to be hash-based here.
227As a result, every class's index must be consulted for a $find$ operation, at an overall cost of $O(n)$.
228The incremental variant builds an overall hash-based index as it is queried, but may need to recursively check its parent environments if its local index does not contain a type variable, and may have as many parents as times it has been modified, for cost $O(p)$.
229It should be noted that subsequent queries for the same variable execute in constant time.
230
231Since both na\"{\i}ve and incremental variants store complete type classes, the cost of a $report$ operation is simply the time needed to output the contained variables, $O(m)$.
232Since the type classes store their bounds, $bound$ and $bind$ are both $O(1)$ given a type class.
233Once a $find$ operation has already been performed to verify that a type variable does not exist in the environment, the data structures for both these variants support adding new type classes (the $insert$ operation) in $O(1)$.
234Adding a variable to a type class (the $add$ operation) can be done in $O(1)$ for the na\"{\i}ve implementation, but the incremental implementation may need to copy the edited type class from a parent at cost $O(m)$.
235
236The linear storage of type classes in both variants also leads to $O(m)$ time for the variable-merging step in $unify$, plus the usual $u(n)$ recursion term for $unifyBound$.
237The na\"{\i}ve $combine$ operation must traverse each of the classes of one environment, merging in any class of the other environment that shares a type variable.
238Since there are at most $n$ classes to unify, the unification cost is $O(nm + nu(n))$, while traversal and $find$ costs to locate classes to merge total $O(n^2m)$, for an overall cost of $O(n^2m + nu(n))$.
239The incremental $combine$ operation works similarly, but only needs to consider classes modified in either environment since the common ancestor of both environments, allowing the $n$ cost terms to be substituted for $p$, for an overall cost of $O(p^m + pu(n))$.
240Neither variant supports the $split$ operation to undo a $unify$.
241
242The na\"{\i}ve environment does nothing to support $save$ and $backtrack$, so these operations must be implemented by making a deep copy of the environment on $save$, then a destructive overwrite on $backtrack$, each at a cost of $O(nm)$.
243The incremental environment supports $O(1)$ $save$ by simply setting aside a reference to the current environment, then proceeding with a new, empty environment with the former environment as a parent.
244$backtrack$ to a parent environment may involve destroying all the intermediate environments if this backtrack removes the last reference to the backtracked-from environment; this cost is $O(pm)$.
245
246\subsection{Union-Find}
247
248The union-find data structure is designed to support $find$ efficiently, and thus for any variant, the cost is simply the distance up the tree to the representative element.
249For basic union-find, this is amortized to the inverse Ackermann function, $O(\alpha(m))$, essentially a small constant, though the loss of path compression in persistent union-find raises this cost to the still cheap $O(\log m)$.
250Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type varaible, at cost $O(n\alpha(m))$.
251Persistent union-find, on the other hand, uses the ``with classes'' extension to union-find described in Section~\ref{env-union-find-classes-approach} to run $report$ in $O(m)$ time.
252
253All union-find environment variants described here use a secondary hash table to map from class representatives to bindings, and as such can perform $bound$ and $bind$ in $O(1)$, given the representative.
254$insert$ is also a $O(1)$ operation for both basic and persistent union-find.
255Since $add$ simply involves attaching a new child to the class root, it is also a $O(1)$ operation for all union-find environment variants.
256
257Union-find is also designed to support $unify$ in constant time, and as such, given class representatives, the variable-merging most of $unify$ for both variants is $O(1)$ to make one representative the child of the other, plus the $O(u(n))$ term for $unifyBound$.
258Basic union-find does not support $split$, but persistent union-find can accomplish it using the mechanism described in Section~\ref{env-union-find-classes-approach} in $O(\log m)$, the cost of traversing up to the root of a class from a leaf without path compression.
259$combine$ on the basic union-find data structure works similarly to the data structures discussed above in Section~\ref{naive-incremental-analysis}, with a $O(nu(n))$ term for the $O(n)$ underlying $unify$ operations, and a $O(n\alpha(m))$ term to find the classes which need unification by checking the class representatives of each corresponding type variable in both environments for equality.
260Persistent union-find uses a different approach for $combine$, discussed in Section~\ref{env-persistent-union-find}.
261Discounting recursive $unify$ operations included in the $u(n)$ $unifyBound$ term, there may be at most $O(p)$ $unify$ operations performed, at cost $O(pu(n))$.
262Each of the $O(p)$ steps on the edit path can be processed in the $O(\log m)$ time it takes to find the current representative of the modified type class, for a total runtime of $O(p \log m + pu(n))$.
263
264In terms of backtracking operations, the basic union-find data structure only supports deep copies, for $O(nm)$ cost for both $save$ and $backtrack$.
265Persistent union-find, as the name suggests, is more optimized, with $O(1)$ cost to $save$ a backtrack-capable reference to the current environment state, and $O(p)$ cost to revert to that state (possibly destroying no-longer-used edit nodes along the path).
266
267% Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node
Note: See TracBrowser for help on using the repository browser.