# Changeset b3edf7f5 for doc/theses

Ignore:
Timestamp:
Feb 21, 2019, 6:40:58 PM (4 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
d065ded
Parents:
f316c68
Message:

thesis: polish first draft of Ch.5

File:
1 edited

### Legend:

Unmodified
 rf316c68 As discussed in Chapter~\ref{resolution-chap}, being able to efficiently determine which type variables are bound to which concrete types or whether two type environments are compatible is a core requirement of the resolution algorithm. Furthermore, expression resolution involves a search through many related possible solutions, so being able to re-use shared subsets of type environment data and to switch between environments quickly is desirable for performance. In this chapter I discuss and empirically compare a number of type environment data structure variants, including some novel variations on the union-find\cite{Galler64} data structure introduced in this thesis. In this chapter I discuss a number of type environment data structure variants, including some novel variations on the union-find \cite{Galler64} data structure introduced in this thesis. Chapter~\ref{expr-chap} contains empirical comparisons of the performance of these data structures when integrated into the resolution algorithm. \section{Definitions} \label{env-defn-sec} Since the union-find data structure stores only links from children to parents and not vice-versa, there is no way to reconstruct a class from one of its elements without a linear search over the entire data structure, with $find$ called on each element to check its membership in the class. The situation is even worse for the $split$ operation, which would require extra information to maintain the order that each child was added to its parent node. Unfortunately, the literature\cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel. Unfortunately, the literature \cite{Tarjan84,Galil91,Patwary10} on union-find does not present a way to keep references to children without breaking the asymptotic time bounds of the algorithm; I have discovered a method to do so which, despite its simplicity, seems to be novel. The core idea of this union-find with classes'' data structure and algorithm is to keep the members of each class stored in a circularly-linked list. This is accomplished by tracing the path from some edit node to the root node of the array (always the underlying array), recursively applying the edits to the underlying array and replacing each edit node's successor with the inverse edit. In this way, any previous state of the persistent array can be restored in time proportional to the number of edits to the current state of the array. While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs means that it is not thread-safe, and must be used behind a lock in a concurrent context. While $reroot$ does maintain the same value mapping in every version of the persistent array, the internal mutations it performs break thread-safety, and thus it must be used behind a lock in a concurrent context. Also, the root node with the actual array may in principle be anywhere in the tree, and does not provide information to report its leaf nodes, so some form of automatic garbage collection is generally required for the data structure. Since the graph of edit nodes is tree-structured, reference counting approaches suffice for garbage collection; Conchon and Filli\^{a}tre~\cite{Conchon07} also observe that if the only $reroot$ operations are for backtracking then the tail of inverse edit nodes may be elided, suggesting the possibility of stack-based memory management. The union-find data structure is designed to support $find$ efficiently, and thus for any variant, the cost is simply the distance up the tree to the representative element. For basic union-find, this is amortized to the inverse Ackermann function, $O(\alpha(m))$, essentially a small constant, though the loss of path compression in persistent union-find raises this cost to the still cheap $O(\log m)$. Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type varaible, at cost $O(n\alpha(m))$. Basic union-find is not designed to support the $report$ operation, however, so it must be simulated by checking the representative of every type variable, at cost $O(n\alpha(m))$. Persistent union-find, on the other hand, uses the with classes'' extension to union-find described in Section~\ref{env-union-find-classes-approach} to run $report$ in $O(m)$ time. Persistent union-find, as the name suggests, is more optimized, with $O(1)$ cost to $save$ a backtrack-capable reference to the current environment state, and $O(p)$ cost to revert to that state (possibly destroying no-longer-used edit nodes along the path). % Future work: design multi-threaded version of C&F persistent map --- core idea is some sort of thread-boundary edit node \section{Conclusion \& Future Work} This chapter has discussed the type environment data structure, as well as some more sophisticated approaches to optimize its performance for workloads encountered in the expression resolution problem, and asymptotic analysis of each approach. Chapter~\ref{expr-chap} provides experimental performance results for a representative set of these approaches. One contribution of this thesis is the union-find with classes data structure for efficient retrieval of union-find class members, along with a related algorithm for reversing the history of $union$ operations in this data structure. This reversible history contributes to the second novel contribution of this chapter, a type environment data structure based off the persistent union-find data structure of Conchon and Filli\^{a}tre~\cite{Conchon07}. This persistent union-find environment uses the $split$ operation introduced in union-find with classes and the edit history of the persistent data structure to support an environment-combining algorithm that only considers the edits between the environments to be merged. This persistent union-find data structure is efficient, but not thread-safe; as suggested in Section~\ref{resn-conclusion-sec}, it may be valuable to parallelize the \CFA{} expression resolver. However, allowing multiple threads concurrent access to the persistent data structure is likely to result in a sort of reroot thrashing'', as different threads reroot the data structure to their own versions of interest. It is possible this could be mitigated by partitioning the data structure into separate subtrees for each thread, with each subtree having its own root node, and the boundaries between them implemented with a lock-equipped !ThreadBoundary! edit node.