Changeset 57b0b1f
- Timestamp:
- Oct 23, 2018, 5:36:00 PM (6 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, no_list, persistent-indexer, pthread-emulation, qualifiedEnum
- Children:
- 167d5ae, df43791
- Parents:
- eeb0767
- Location:
- doc
- Files:
-
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/bibliography/pl.bib
reeb0767 r57b0b1f 5439 5439 } 5440 5440 5441 @inproceedings{Conchon07, 5442 keywords = {persistent array, union-find}, 5443 contributer = {a3moss@uwaterloo.ca}, 5444 title={A persistent union-find data structure}, 5445 author={Conchon, Sylvain and Filli{\^a}tre, Jean-Christophe}, 5446 booktitle={Proceedings of the 2007 workshop on Workshop on ML}, 5447 pages={37--46}, 5448 year={2007}, 5449 organization={ACM} 5450 } 5451 5441 5452 @article{poly, 5442 5453 keywords = {Poly, Standard ML, Russell, persistence}, … … 6360 6371 } 6361 6372 6373 @article{Baker78, 6374 keywords = {Algol display, FUNARG's, Lisp 1.5, deep binding, environment trees, multiprogramming, shallow binding}, 6375 contributer = {a3moss@uwaterloo.ca}, 6376 author = {Baker,Jr., Henry G.}, 6377 title = {Shallow Binding in Lisp 1.5}, 6378 journal = {Commun. ACM}, 6379 issue_date = {July 1978}, 6380 volume = {21}, 6381 number = {7}, 6382 month = jul, 6383 year = {1978}, 6384 issn = {0001-0782}, 6385 pages = {565--569}, 6386 numpages = {5}, 6387 url = {http://doi.acm.org.proxy.lib.uwaterloo.ca/10.1145/359545.359566}, 6388 doi = {10.1145/359545.359566}, 6389 acmid = {359566}, 6390 publisher = {ACM}, 6391 address = {New York, NY, USA} 6392 } 6393 6394 @article{Baker91, 6395 keywords = {shallow binding, functional arrays}, 6396 contributer = {a3moss@uwaterloo.ca}, 6397 author = {Baker, Henry G.}, 6398 title = {Shallow Binding Makes Functional Arrays Fast}, 6399 journal = {SIGPLAN Not.}, 6400 issue_date = {Aug. 1991}, 6401 volume = {26}, 6402 number = {8}, 6403 month = aug, 6404 year = {1991}, 6405 issn = {0362-1340}, 6406 pages = {145--147}, 6407 numpages = {3}, 6408 url = {http://doi.acm.org.proxy.lib.uwaterloo.ca/10.1145/122598.122614}, 6409 doi = {10.1145/122598.122614}, 6410 acmid = {122614}, 6411 publisher = {ACM}, 6412 address = {New York, NY, USA}, 6413 } 6414 6362 6415 @techreport{Adve95, 6363 6416 keywords = {shared memory, consistency models}, -
doc/theses/aaron_moss_PhD/phd/type-environment.tex
reeb0767 r57b0b1f 69 69 70 70 The type environment data structure used in Bilson's\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables. 71 This approach has the benefit of being easy to understand and not imposing life cycle or inheritance constraints on its use, but, as can be seen in Table~\ref{env-bounds-table}, does not support many of the desired operations with any particular efficiency.72 Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set would reduce search and update times from $O( log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment would remove the need to check each type class individually to maintain the disjointness property.71 This approach has the benefit of being easy to understand and not imposing life-cycle or inheritance constraints on its use, but, as can be seen in Table~\ref{env-bounds-table}, does not support many of the desired operations with any particular efficiency. 72 Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set would reduce search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment would remove the need to check each type class individually to maintain the disjointness property. 73 73 These improvements do not change the fundamental issues with this data structure, however. 74 74 … … 81 81 For this environment I also employed a lazily-generated index of type variables to their containing class, which could be in either the current environment or an ancestor. 82 82 Any mutation of a type class in an ancestor environment would cause that class to be copied into the current environment before mutation, as well as added to the index, ensuring that all local changes to the type environment are listed in its index. 83 However, not adding type variables to the index until lookup or mutation preserves the constant-time environment copy operation in the common case in which the copy is not mutated from its parent during its life cycle.83 However, not adding type variables to the index until lookup or mutation preserves the constant-time environment copy operation in the common case in which the copy is not mutated from its parent during its life-cycle. 84 84 85 85 This approach imposes some performance penalty on $combine$ if related environments are not properly linked together, as the entire environment needs to be combined rather than just the diff, but is correct as long as the ``null parent'' base case is properly handled. 86 The life cycle issues are somewhat more complex, as many environments may descend from a common parent, and all of these need their parent to stay alive for purposes of lookup.86 The life-cycle issues are somewhat more complex, as many environments may descend from a common parent, and all of these need their parent to stay alive for purposes of lookup. 87 87 These issues can be solved by ``flattening'' parent nodes into their children before the parents leave scope, but given the tree structure of the inheritance graph it is more straightforward to store the parent nodes in reference-counted or otherwise automatically garbage-collected heap storage. 88 88 89 \subsection{Union-Find} 89 \subsection{Union-Find} \label{env-union-find-approach} 90 90 91 91 Given the nature of the classes of type variables as disjoint sets, another natural approach to implementing a type environment is the union-find disjoint set data structure\cite{Galler64}. … … 101 101 If placeholder values are inserted in this map for type classes without bounds than this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements. 102 102 103 \subsection{Union-Find with Classes} 103 \subsection{Union-Find with Classes} \label{env-union-find-classes-approach} 104 104 105 105 Another type environment operation not supported directly by the union-find data structure is $report$, which lists the type variables in a given class, and similarly $remove$, which removes a class and all its type variables from the environment. … … 108 108 Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel. 109 109 110 \TODO{port figure from slideshow} 111 110 112 The core idea of this ``union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list. 111 Rem's algorithm\cite{Dijkstra76}, dating from the 1970s, also includes a circularly-linked list data structure. 112 \TODO{Check this from source} However, Rem's algorithmhas an entirely flat class hierarchy, where all elements were direct children of the representative, giving constant-time $find$ at the cost of linear-time $union$ operations.113 Aho, Hopcroft, and Ullman also include a circularly-linked list in their 1974 textbook~\cite{Aho74}. 114 However, the algorithm presented by Aho~\etal{} has an entirely flat class hierarchy, where all elements were direct children of the representative, giving constant-time $find$ at the cost of linear-time $union$ operations. 113 115 In my version, the list data structure does not affect the layout of the union-find tree, maintaining the same asymptotic bounds as union-find. 114 116 In more detail, each element is given a !next! pointer to another element in the same class; this !next! pointer initially points to the element itself. … … 132 134 \end{proof} 133 135 136 On its own, union-find, like the na\"{\i}ve approach, has no special constraints on life-cycle or inheritance, but it can be used as a building block in more sophisticated type environment data structures. 137 138 \subsection{Persistent Union-Find} 139 140 Given the backtracking nature of the resolution algorithm discussed in Section~\ref{env-defn-sec}, the abilities to quickly switch between related versions of a type environment and to de-duplicate shared data between environments are both assets to performance. 141 Conchon and Filli\^{a}tre~\cite{Conchon07} present a persistent union-find data structure based on the persistent array of Baker~\cite{Baker78,Baker91}. 142 143 \TODO{port figure from slideshow} 144 145 In Baker's persistent array, an array reference contains either a pointer to the array or a pointer to an \emph{edit node}; these edit nodes contain an array index, the value in that index, and another array index pointing either to the array or a different edit node. 146 In this manner, a tree of edits is formed, rooted at the actual array. 147 Read from the actual array at the root can be performed in constant time, as with a non-persistent array. 148 The persistent array can be mutated by directly modifying the underlying array, then replacing its array reference with an edit node containing the mutated index, the previous value at that index, and a reference to the mutated array. 149 This mutation algorithm is in some sense a special case of the key operation on persistent arrays, $reroot$. 150 151 A rerooting operation takes any array reference and makes it the root node of the array. 152 This is accomplished by tracing the path from some edit node to the root node of the array (always the underlying array), recursively applying the edits to the underlying array and replacing each edit node's successor with the inverse edit. 153 In this way, any previous state of the persistent array can be restored in time proportional to the number of edits to the current state of the array. 154 While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs means that it is not thread-safe, and must be used behind a lock in a concurrent context. 155 Also, the root node with the actual array may in principle be anywhere in the tree, and does not provide information to report its leaf nodes, so some form of automatic garbage collection is generally required for the data structure. 156 Since the graph of edit nodes is tree-structured, reference counting approaches suffice for garbage collection; Conchon and Filli\^{a}tre~\cite{Conchon07} also observe that if the only $reroot$ operations are for backtracking then the tail of inverse edit nodes may be elided, suggesting the possibility of stack-based memory management. 157 158 While Conchon and Filli\^{a}tre~\cite{Conchon07} implement their persistent union-find data structure over a universe of integer elements in the fixed range $[1,N]$, the type environment problem needs more flexibility. 159 In particular, an arbitrary number of type variables must be added to the environment. 160 As such, a persistent hash table is a more suitable structure than a persistent array, providing the same expected asymptotic time bounds while allowing a dynamic number of elements. 161 Besides replacing the underlying array with a hash table, the other major change in this approach is to replace the two types of array references, !Array! and !Edit!, with four node types, !Table!, !Edit!, !Add!, and !Remove!, where !Add! adds a new key-value pair, !Remove! removes a key, and !Edit! mutates an existing key-value pair. 162 In this variant of \CFACC{}, this persistent hash table is used as the side map discussed in Section~\ref{env-union-find-approach} for class bounds. 163 The actual union-find data structure is slightly modified from this approach, with a !Base! node containing the root union-find data structure, !Add! nodes adding new elements, !AddTo! nodes defining the union of two type classes, and !Remove! and !RemoveFrom! nodes as inverses of the previous two elements, for purposes of maintaining the edit list. 164 Making !AddTo! and !RemoveFrom! single nodes shortens the edit path for improved performance, while also providing semantic information missing from the raw array updates in Conchon and Filli\^{a}tre's data structure. 165 The single-node approach, does, however, break under most path-compression algorithms; !RemoveFrom! can be applied to the underlying data structure using the ``leaf of last union'' approach discussed in in Section~\ref{env-union-find-classes-approach}; this was judged an acceptable trade-off for the added semantic information and shortened paths. 166 167 Maintaining explicit information on $union$ operations in the persistent union-find edit tree in the form of !AddTo! and !RemoveFrom! nodes exposes a new option for combining type environments. 168 If the type environments are part of the same edit tree, one environment $T'$ can be combined with another $T$ by only testing the edits on the path from $T'$ to $T$ in both the persistent union-find data structure describing the classes and the persistent hash table containing the class bounds. 169 This is generally more efficient than testing the compatibility of all type classes in $T'$, as only those that are actually different than those in $T$ must be considered. 170 171 The procedure for $combine(T, T')$ based on edit paths is as follows: 172 The shared edit trees for classes and bindings are rerooted at $T$, and the path from $T'$ to $T$ is followed to create a list of actual edits. 173 By tracking the state of each element, redundant changes such as an !Edit! followed by an !Edit! can be reduced to their form in $T'$ by dropping the later (more like $T$) !Edit! for the same key; !Add! and !Remove! cancel similarly. 174 This procedure is repeated for both the class edit tree and the binding edit tree. 175 When the list of net changes to the environment has been produced, the additive changes are applied to $T$. 176 For example, if a type class exists in $T'$ but not $T$, the corresponding !Add! edit will be applied to $T$, but in the reverse situation the !Remove! edit will not be applied to $T$, as the intention is to produce a new environment representing the union of the two sets of type classes; similarly, !AddTo! edits are applied to unify type-classes in $T$ that are united in $T'$, but !RemoveFrom! edits that split type classes are not. 177 The new environment, $T''$ can always be constructed with a consistent partitioning of type variables; in the extreme case, all variables from both $T$ and $T'$ will be united in a single type class in $T''$. 178 Where $combine$ can fail is in unifying the bound types; if any class in $T'$ has a class bound which does not unify with the merged class in $T''$ than $combine$ fails. 179 180 \section{Analysis} 181 182 In this section I present asymptotic analyses of the various approaches to a type environment data structure discussed in the previous section. 183 134 184 % Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node
Note: See TracChangeset
for help on using the changeset viewer.