Changeset b3edf7f5


Ignore:
Timestamp:
Feb 21, 2019, 6:40:58 PM (5 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
d065ded
Parents:
f316c68
Message:

thesis: polish first draft of Ch.5

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/type-environment.tex

    rf316c68 rb3edf7f5  
    55As discussed in Chapter~\ref{resolution-chap}, being able to efficiently determine which type variables are bound to which concrete types or whether two type environments are compatible is a core requirement of the resolution algorithm.
    66Furthermore, expression resolution involves a search through many related possible solutions, so being able to re-use shared subsets of type environment data and to switch between environments quickly is desirable for performance.
    7 In this chapter I discuss and empirically compare a number of type environment data structure variants, including some novel variations on the union-find\cite{Galler64} data structure introduced in this thesis.
     7In this chapter I discuss a number of type environment data structure variants, including some novel variations on the union-find \cite{Galler64} data structure introduced in this thesis.
     8Chapter~\ref{expr-chap} contains empirical comparisons of the performance of these data structures when integrated into the resolution algorithm.
    89
    910\section{Definitions} \label{env-defn-sec}
     
    108109Since the union-find data structure stores only links from children to parents and not vice-versa, there is no way to reconstruct a class from one of its elements without a linear search over the entire data structure, with $find$ called on each element to check its membership in the class.
    109110The situation is even worse for the $split$ operation, which would require extra information to maintain the order that each child was added to its parent node.
    110 Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel.
     111Unfortunately, the literature \cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel.
    111112
    112113The core idea of this ``union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list.
     
    158159This is accomplished by tracing the path from some edit node to the root node of the array (always the underlying array), recursively applying the edits to the underlying array and replacing each edit node's successor with the inverse edit.
    159160In this way, any previous state of the persistent array can be restored in time proportional to the number of edits to the current state of the array.
    160 While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs means that it is not thread-safe, and must be used behind a lock in a concurrent context.
     161While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs break thread-safety, and thus it must be used behind a lock in a concurrent context.
    161162Also, the root node with the actual array may in principle be anywhere in the tree, and does not provide information to report its leaf nodes, so some form of automatic garbage collection is generally required for the data structure.
    162163Since the graph of edit nodes is tree-structured, reference counting approaches suffice for garbage collection; Conchon and Filli\^{a}tre~\cite{Conchon07} also observe that if the only $reroot$ operations are for backtracking then the tail of inverse edit nodes may be elided, suggesting the possibility of stack-based memory management.
     
    248249The union-find data structure is designed to support $find$ efficiently, and thus for any variant, the cost is simply the distance up the tree to the representative element.
    249250For basic union-find, this is amortized to the inverse Ackermann function, $O(\alpha(m))$, essentially a small constant, though the loss of path compression in persistent union-find raises this cost to the still cheap $O(\log m)$.
    250 Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type varaible, at cost $O(n\alpha(m))$.
     251Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type variable, at cost $O(n\alpha(m))$.
    251252Persistent union-find, on the other hand, uses the ``with classes'' extension to union-find described in Section~\ref{env-union-find-classes-approach} to run $report$ in $O(m)$ time.
    252253
     
    265266Persistent union-find, as the name suggests, is more optimized, with $O(1)$ cost to $save$ a backtrack-capable reference to the current environment state, and $O(p)$ cost to revert to that state (possibly destroying no-longer-used edit nodes along the path).
    266267
    267 % Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node
     268\section{Conclusion \& Future Work}
     269
     270This chapter has discussed the type environment data structure, as well as some more sophisticated approaches to optimize its performance for workloads encountered in the expression resolution problem, and asymptotic analysis of each approach.
     271Chapter~\ref{expr-chap} provides experimental performance results for a representative set of these approaches.
     272One contribution of this thesis is the union-find with classes data structure for efficient retrieval of union-find class members, along with a related algorithm for reversing the history of $union$ operations in this data structure.
     273This reversible history contributes to the second novel contribution of this chapter, a type environment data structure based off the persistent union-find data structure of Conchon and Filli\^{a}tre~\cite{Conchon07}.
     274This persistent union-find environment uses the $split$ operation introduced in union-find with classes and the edit history of the persistent data structure to support an environment-combining algorithm that only considers the edits between the environments to be merged.
     275
     276This persistent union-find data structure is efficient, but not thread-safe; as suggested in Section~\ref{resn-conclusion-sec}, it may be valuable to parallelize the \CFA{} expression resolver.
     277However, allowing multiple threads concurrent access to the persistent data structure is likely to result in a sort of ``reroot thrashing'', as different threads reroot the data structure to their own versions of interest.
     278It is possible this could be mitigated by partitioning the data structure into separate subtrees for each thread, with each subtree having its own root node, and the boundaries between them implemented with a lock-equipped !ThreadBoundary! edit node.
Note: See TracChangeset for help on using the changeset viewer.