Apr 26, 2019, 4:56:16 PM
Thierry Delisle <tdelisle@…>
ADT, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
3898392 (diff), bd405fa (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.

Merge branch 'master' into jenkins-sandbox

1 edited


  • doc/theses/aaron_moss_PhD/phd/type-environment.tex

    r3898392 rcbef27b  
    1212For purposes of this chapter, a \emph{type environment} $T$ is a set of \emph{type classes} $\myset{T_1, T_2, \cdots, T_{|T|}}$.
    1313Each type class $T_i$ contains a set of \emph{type variables} $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$.
    14 Since the type classes represent an equivalence relation over the type variables the sets of variables contained in two distinct classes in the same environment must be disjoint.
    15 Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} that the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples).
     14Since the type classes represent an equivalence relation over the type variables, the sets of variables contained in two distinct classes in the same environment must be \emph{disjoint}.
     15Each individual type class $T_i$ may also be associated with a \emph{bound}, $b_i$; this bound contains the \emph{bound type} that the variables in the type class are replaced with, but also includes other information in \CFACC{}, including whether type conversions are permissible on the bound type and what sort of type variables are contained in the class (data types, function types, or variadic tuples).
     17The following example demonstrates the use of a type environment for unification:
     20forall(otype F) F f(F, F);
     21forall(otype G) G g(G);
     23f( g(10), g(20) );
     26Expression resolution starts from an empty type environment; from this empty environment, the calls to !g! can be independently resolved.
     27These resolutions result in two new type environments, $T = \{ \myset{\mathsf{G}_1} \rightarrow$ !int!$\}$ and $T' = \{ \myset{\mathsf{G}_2} \rightarrow$ !int!$\}$; the calls to !g! have generated distinct type variables !G!$_1$ and !G!$_2$, each bound to !int! by unification with the type of its argument (!10! and !20!, both !int!).
     28To complete resolution of the call to !f!, both environments must be combined; resolving the first argument to !f! produces a new type environment $T'' = \{ \myset{\mathsf{G}_1, \mathsf{F}_1} \rightarrow$ !int!$\}$: the new type variable !F!$_1$ has been introduced and unified with !G!$_1$ (the return type of !g(10)!), and consequently bound to !int!.
     29To resolve the second argument to !f!, $T''$ must be checked for compatibility with $T'$; since !F!$_1$ unifies with !G!$_2$, their type classes must be merged.
     30Since both !F!$_1$ and !G!$_2$ are bound to !int!, this merge succeeds, producing the final environment $T'' = \{ \myset{\mathsf{G}_1, \mathsf{F}_1, \mathsf{G}_2} \rightarrow$ !int!$\}$.
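The walk-through above can be replayed with a small model. This is a minimal sketch, not the \CFACC{} implementation: the dictionary layout, the helper names (`new_class`, `replay_example`), and the use of plain string equality in place of real bound unification are all assumptions made for illustration.

```python
# Hypothetical model: an environment is a list of classes, each a dict
# holding a set of variable names and a bound (None means "no bound").

def new_class(env, var, bound=None):
    """Add a fresh singleton class for var."""
    env.append({"vars": {var}, "bound": bound})

def find(env, var):
    """Return the class containing var, or None if there is none."""
    for cls in env:
        if var in cls["vars"]:
            return cls
    return None

def unify(env, a, b):
    """Merge the classes of a and b; return False if their bounds conflict.

    Assumption: two bounds are compatible only when equal, which stands in
    for the real bound-unification step.
    """
    ca, cb = find(env, a), find(env, b)
    if ca is cb:
        return True
    if ca["bound"] is not None and cb["bound"] is not None \
            and ca["bound"] != cb["bound"]:
        return False
    ca["vars"] |= cb["vars"]
    if ca["bound"] is None:
        ca["bound"] = cb["bound"]
    env[:] = [c for c in env if c is not cb]  # drop the absorbed class
    return True

def replay_example():
    env = []
    new_class(env, "G1", "int")      # T: g(10) binds G1 to int
    new_class(env, "F1")             # f introduces F1
    assert unify(env, "F1", "G1")    # T'': first argument of f
    new_class(env, "G2", "int")      # bring in T' from g(20)
    assert unify(env, "F1", "G2")    # second argument of f
    return env
```

Under these assumptions, `replay_example()` ends with a single class containing !G!$_1$, !F!$_1$, and !G!$_2$, bound to !int!, matching the final environment described above.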
    39 Given this basic structure, type environments in \CFACC{} need to support eleven basic operations, summarized in Table~\ref{env-op-table}.
     54Type environments in \CFACC{} need to support eleven basic operations, summarized in Table~\ref{env-op-table}.
    4055The first six operations are straightforward queries and updates on these data structures:
    4156The lookup operation $find(T, v_{i,j})$ produces $T_i$, the type class in $T$ that contains variable $v_{i,j}$, or an invalid sentinel value if no such class exists.
    4257The other two query operations act on type classes, where $report(T_i)$ produces the set $\myset{v_{i,1}, v_{i,2}, \cdots, v_{i,|T_i|}}$ of all type variables in a class $T_i$ and $bound(T_i)$ produces the bound $b_i$ of that class, or a sentinel indicating no bound is set.
    44 The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property, $v_{i,1}$ cannot belong to any other type class in $T$.
     59The update operation $insert(T, v_{i,1})$ creates a new type class $T_i$ in $T$ that contains only the variable $v_{i,1}$ and no bound; due to the disjointness property, $v_{i,1}$ must not belong to any other type class in $T$.
    4560The $add(T_i, v_{i,j})$ operation adds a new type variable $v_{i,j}$ to class $T_i$; again, $v_{i,j}$ must not exist elsewhere in $T$.
    4661$bind(T_i, b_i)$ mutates the bound for a type class, setting or updating the current bound.
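The six query and update operations can be sketched directly from their definitions. The following is an illustrative model only: the Python class names and the choice of `None` as the sentinel are assumptions, and the disjointness precondition is enforced here with an exception, which the source does not specify.

```python
class TypeClass:
    """One equivalence class: a set of variables plus an optional bound."""
    def __init__(self, var):
        self.vars = {var}
        self.bound = None  # sentinel: no bound set


class TypeEnvironment:
    """A set of disjoint type classes, stored as a plain list."""
    def __init__(self):
        self.classes = []

    def find(self, var):
        """find(T, v): the class containing v, or None as the sentinel."""
        for cls in self.classes:
            if var in cls.vars:
                return cls
        return None

    def insert(self, var):
        """insert(T, v): new singleton class; v must not already be in T."""
        if self.find(var) is not None:
            raise ValueError(f"{var} already belongs to a class")
        cls = TypeClass(var)
        self.classes.append(cls)
        return cls

    def add(self, cls, var):
        """add(T_i, v): extend a class; v must not exist elsewhere in T."""
        if self.find(var) is not None:
            raise ValueError(f"{var} already belongs to a class")
        cls.vars.add(var)

    @staticmethod
    def report(cls):
        """report(T_i): all variables in the class."""
        return frozenset(cls.vars)

    @staticmethod
    def bound(cls):
        """bound(T_i): the bound, or None if unset."""
        return cls.bound

    @staticmethod
    def bind(cls, new_bound):
        """bind(T_i, b): set or update the bound."""
        cls.bound = new_bound
```

In this sketch `find`, and therefore `insert` and `add`, scan every class, which is the cost the later sections aim to reduce.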
    7590\subsection{Na\"{\i}ve} \label{naive-env-sec}
    77 The type environment data structure used in Bilson's~\cite{Bilson03} original implementation of \CFACC{} is a straightforward translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables.
     92The type environment data structure used in Bilson's~\cite{Bilson03} original implementation of \CFACC{} is a simple translation of the definitions in Section~\ref{env-defn-sec} to \CC{} code; a !TypeEnvironment! contains a list of !EqvClass! type equivalence classes, each of which contains the type bound information and a tree-based sorted set of type variables.
    7893This approach has the benefit of being easy to understand and not imposing life-cycle or inheritance constraints on its use, but, as can be seen in Table~\ref{env-bounds-table}, does not support many of the desired operations with any particular efficiency.
    7994Some variations on this structure may improve performance somewhat; for instance, replacing the !EqvClass! variable storage with a hash-based set reduces search and update times from $O(\log n)$ to amortized $O(1)$, while adding an index for the type variables in the entire environment removes the need to check each type class individually to maintain the disjointness property.
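One way to picture the indexed variation is a single hash map from each variable to its class. This is a hedged sketch rather than anything taken from \CFACC{}: `IndexedEnvironment` and `class_of` are hypothetical names, and bounds are omitted to keep the focus on the index.

```python
class IndexedEnvironment:
    """Environment-wide index: variable -> the (hash) set that is its class."""
    def __init__(self):
        self.class_of = {}

    def find(self, var):
        # One hash lookup instead of a scan over every class.
        return self.class_of.get(var)

    def insert(self, var):
        # The index answers the disjointness check directly.
        if var in self.class_of:
            raise ValueError(f"{var} already belongs to a class")
        cls = {var}
        self.class_of[var] = cls
        return cls

    def add(self, cls, var):
        if var in self.class_of:
            raise ValueError(f"{var} already belongs to a class")
        cls.add(var)
        self.class_of[var] = cls
```

The trade-off in this sketch is that the index must be kept in step with the classes on every update, including any later merge of two classes.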
    106121In particular, the type-class bound cannot be easily included in the union-find data structure, as the requirement to make it the class representative breaks the balancing properties of $union$, and requires too-close integration of the type environment $unifyBound$ internal operation.
    107122This issue can be solved by including a side map from class representatives to the type-class bound.
    108 If placeholder values are inserted in this map for type classes without bounds than this also has the useful property that the key set of the map provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements.
     123If placeholder values are inserted in this map for type classes without bounds, then the map also has the useful property that its key set provides an easily obtainable list of all the class representatives, a list which cannot be derived from the union-find data structure without a linear search for class representatives through all elements.
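The side-map idea can be sketched as follows. This is an assumption-laden illustration, not the thesis implementation: `UnionFindEnv` and its method names are hypothetical, union by rank with path compression is one common balancing scheme chosen here for concreteness, and bound compatibility is reduced to equality in place of $unifyBound$.

```python
class UnionFindEnv:
    """Union-find over variables, with bounds kept in a separate map."""
    def __init__(self):
        self.parent = {}
        self.rank = {}
        self.bounds = {}  # representative -> bound (None is the placeholder)

    def insert(self, var):
        if var in self.parent:
            raise ValueError(f"{var} already belongs to a class")
        self.parent[var] = var
        self.rank[var] = 0
        self.bounds[var] = None

    def find(self, var):
        root = var
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[var] != root:  # path compression
            nxt = self.parent[var]
            self.parent[var] = root
            var = nxt
        return root

    def bind(self, var, bound):
        self.bounds[self.find(var)] = bound

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        ba, bb = self.bounds[ra], self.bounds[rb]
        if ba is not None and bb is not None and ba != bb:
            raise ValueError("conflicting bounds")
        # The bound plays no part in choosing the root, so balancing by
        # rank is unaffected.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        del self.bounds[rb]
        self.bounds[ra] = ba if ba is not None else bb
        return ra

    def representatives(self):
        # The key set of the side map lists every class representative.
        return list(self.bounds)
```

Because every class keeps an entry in `bounds`, even when that entry is only the placeholder, `representatives()` needs no scan over the elements.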
    110125\subsection{Union-Find with Classes} \label{env-union-find-classes-approach}
    244259The na\"{\i}ve $combine$ operation must traverse each of the classes of one environment, merging in any class of the other environment that shares a type variable.
    245260Since there are at most $n$ classes to unify, the unification cost is $O(nm + nu(n))$, while traversal and $find$ costs to locate classes to merge total $O(n^2m)$, for an overall cost of $O(n^2m + nu(n))$.
    246 The incremental $combine$ operation works similarly, but only needs to consider classes modified in either environment with respect to the common ancestor of both environments, allowing the $n$ cost terms to be substituted for $p$, for an overall cost of $O(p^m + pu(n))$.
     261The incremental $combine$ operation works similarly, but only needs to consider classes modified in either environment with respect to the common ancestor of both environments, allowing $p$ to be substituted for the $n$ cost terms, for an overall cost of $O(p^2m + pu(n))$.
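The two totals can be checked term by term. The derivation below is a sketch that assumes $n \ge 1$, so that $nm \le n^2m$, and reads $p$ as the count of modified classes used in the passage; the argument of $u$ stays $n$, as in the stated bound.

```latex
\[
  \underbrace{O(nm + n\,u(n))}_{\text{unification}}
  + \underbrace{O(n^2 m)}_{\text{traversal and } find}
  = O(n^2 m + nm + n\,u(n))
  = O(n^2 m + n\,u(n))
\]
\[
  \underbrace{O(pm + p\,u(n))}_{\text{unification}}
  + \underbrace{O(p^2 m)}_{\text{traversal and } find}
  = O(p^2 m + p\,u(n))
\]
```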
    247262Neither variant supports the $split$ operation to undo a $unify$.
    282297This persistent union-find data structure is efficient, but not thread-safe; as suggested in Section~\ref{resn-conclusion-sec}, it may be valuable to parallelize the \CFA{} expression resolver.
    283298However, allowing multiple threads concurrent access to the persistent data structure is likely to result in ``reroot thrashing'', as different threads reroot the data structure to their own versions of interest.
    284 This contention could be mitigated by partitioning the data structure into separate subtrees for each thread, with each subtree having its own root node, and the boundaries among them implemented with a lock-equipped !ThreadBoundary! edit node.
     299This contention could be mitigated by partitioning the data structure into separate subtrees for each thread, with each subtree having its own root node, and the boundaries among them implemented with a lock-equipped !ThreadBoundary! edit node.
     300Alternatively, the concurrent hash trie of Prokopec \etal{} \cite{Prokopec11,Prokopec12} may be a useful hash-table replacement.