Context Navigation

Reverse Diff

resolver_design.md [d14d96a:d5f1cfc]

File:

: 1 edited

doc/working/resolver_design.md (modified) (2 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/working/resolver_design.md

-              rd14d96a
+              rd5f1cfc
 ## Conversion Costs ##
+Each possible resolution of an expression has a _cost_ tuple consisting of
+the following components: _unsafe_ conversion cost, _polymorphic_
+specialization cost, _safe_ conversion cost, a count of _explicit_
+conversions, and _qualifier_ conversion cost.
+These components are lexically-ordered and can be summed element-wise;
+summation starts at `(0, 0, 0, 0, 0)`.
+Each possible resolution of an expression has a _cost_ consisting of four
+integer components: _unsafe_ conversion cost, _polymorphic_ specialization
+cost, _safe_ conversion cost, and a _count_ of conversions.
+These components form a lexically-ordered tuple which can be summed
+element-wise; summation starts at `(0, 0, 0, 0)`.
 ### Lvalue and Qualifier Conversions ###
 …
 programmers.
-## Resolver Architecture ##
-### Function Application Resolution ###
-Our resolution algorithm for function application expressions is based on
-Baker's[3] single-pass bottom-up algorithm, with Cormack's[4] single-pass
-top-down algorithm applied where appropriate as an optimization.
-Broadly speaking, the cost of this resolution per expression will be
-proportional to `i^d`, where `i` is the number of interpretations of each
-program symbol, and `d` is the maximum depth of the expression DAG.
-Since `d` is determined by the user programmer (in general, bounded by a small
-constant), opportunities for resolver optimization primarily revolve around
-minimizing `i`, the number of interpretations of each symbol that are
-considered.
-[3] Baker, Theodore P. A one-pass algorithm for overload resolution in Ada.
-ACM Transactions on Programming Languages and Systems (1982) 4:4 p.601-614
-[4] Cormack, Gordon V. An algorithm for the selection of overloaded functions
-in Ada. SIGPLAN Notices (1981) 16:2 p.48-52
-Unlike Baker, our system allows implicit type conversions for function
-arguments and return types; the problem then becomes to find the valid
-interpretation for an expression that has the unique minimal conversion cost,
-if such exists.
-Interpretations can be produced both by overloaded names and implicit
-conversions applied to existing interpretations; we have proposals to reduce
-the number of interpretations considered from both sources.
-To simplify the problem for this discussion, we will consider application
-resolution restricted to a domain of functions applied to variables, possibly
-in a nested manner (e.g. `f( g( x ), y )`, where `x` and `y` are variables and
-`f` and `g` are functions), and possibly in a typed context such as a variable
-initialization (e.g. `int i = f( x );`); the other aspects of Cforall type
-resolution should be able to be straightforwardly mapped into this model.
-The types of the symbol tables used for variable and function declarations
-look somewhat like the following:
-        variable_table = name_map( variable_name, variable_map )
-        function_table = name_map( function_name, function_map )
-        variable_map = multi_index( by_type( variable_type ),
-                                    variable_decl_set )
-        function_map = multi_index( by_int( n_params ),
-                                                                by_type( return_type ),
-                                                                function_decl_set )
-`variable_name` and `function_name` are likely simple strings, with `name_map`
-a hash table (or perhaps trie) mapping string keys to values.
-`variable_decl_set` and `function_decl_set` can be thought of for the moment
-as simple bags of typed declarations, where the declaration types are linked
-to the graph of available conversions for that type.
-In a typed context both the `variable_decl_set` and the `function_decl_set`
-should be able to be selected upon by type; this is accomplished by the
-`by_type` index of both `variable_map` and `function_map`.
-The `by_int` index of `function_map` also provides a way to select functions
-by their number of parameters; this index may be used to swiftly discard any
-function declaration which does not have the appropriate number of parameters
-for the argument interpretations being passed to it; given the likely small
-number of entries in this map, it is possible that a binary search of a sorted
-vector or even a linear search of an unsorted vector would be more efficient
-than the usual hash-based index.
-Given these data structures, the general outline of our algorithm follows
-Baker, with Cormack's algorithm used as a heuristic filter in typed contexts.
-In an untyped context, we use a variant of Baker's bottom-up algorithm.
-The leaves of the interpretation DAG are drawn from the variable symbol table,
-with entries in the table each producing zero-cost interpretations, and each
-implicit conversion available to be applied to the type of an existing entry
-producing a further interpretation with the same cost as the conversion.
-As in Baker, if two or more interpretations have the same type, only the
-minimum cost interpretation with that type is produced; if there is no unique
-minimum cost interpretation than resolution with that type is ambiguous, and
-not permitted.
-It should be relatively simple to produce the list of interpretations sorted
-by cost by producing the interpretations via a breadth-first search of the
-conversion graph from the initial interpretations provided in the variable
-symbol table.
-To match a function at one of the internal nodes of the DAG, we first look up
-the function's name in the function symbol table, the appropriate number of
-parameters for the arguments that are provided through the `by_int` index of
-the returned `function_map`, then go through the resulting `function_decl_set`
-searching for functions where the parameter types can unify with the provided
-argument lists; any such matching function produces an interpretation with a
-cost that is the sum of its argument costs.
-Though this is not included in our simplified model, this unification step may
-include binding of polymorphic variables, which introduces a cost for the
-function binding itself which must be added to the argument costs.
-Also, checking of function assertions would likely be done at this step as
-well, possibly eliminating some possible matching functions (if no suitable
-assertions can be satisfied), or adding further conversion costs for the
-assertion satisfaction.
-Once the set of valid function interpretations is produced, these may also be
-expanded by the graph of implicit conversions on their return types, as the
-variable interpretations were.
-This implicit conversion-based expansion of interpretations should be skipped
-for the top-level expression if used in an untyped (void) context, e.g. for
-`f` in `f( g ( x ) );` or `x` in `x;`.
-On the other hand, if the top-level expression specifies a type, e.g. in
-`int i = f( x );`, only top level expressions that return that type are
-relevant to the search, so the candidates for `f` can be filtered first by
-those that return `int` (or a type convertable to it); this can be
-accomplished by performing a top-down filter of the interpretations of `f` by
-the `by_type` index of the `function_map` in a manner similar to Cormack's[4]
-algorithm.
-In a typed context, such as an initialization expression
-`T x = f( g( y ), z );`, only interpretations of `f( g( y ), z )` which have
-type `T` are valid; since there are likely to be valid interpretations of
-`f( g( y ), z )` which cannot be used to initialize a variable of type `T`, we
-can use this information to reduce the number of interpretations considered.
-Drawing from Cormack[4], we first search for interpretations of `f` where the
-return type is `T`; by breadth-first-search of the conversion graph, it should
-be straightforward to order the interpretations of `f` by the cost to convert
-their return type to `T`.
-We can also filter out interpretations of `f` with less than two parameters,
-since both `g( y )` and `z` must produce at least one parameter; we may not,
-however, rule out interpretations of `f` with more than two parameters, as
-there may be a valid interpretation of `g( y )` as a function returning more
-than one parameter (if the expression was `f( y, z )` instead, we could use an
-exact parameter count, assuming that variables of tuple type don't exist).
-For each compatible interpretation of `f`, we can add the type of the first
-parameter of that interpretation of `f` to a set `S`, and recursively search
-for interpretations of `g( y )` that return some type `Si` in `S`, and
-similarly for interpretations of `z` that match the type of any of the second
-parameters of some `f`.
-Naturally, if no matching interpretation of `g( y )` can be found for the
-first parameter of some `f`, the type of the second parameter of that `f` will
-not be added to the set of valid types for `z`.
-Each node in this interpretation DAG is given a cost the same way it would be
-in the bottom-up approach, with the exception that when going top-down there
-must be a final bottom-up pass to sum the interpretation costs and sort them
-as appropriate.
-If a parameter type for some `f` is a polymorphic type variable that is left
-unbound by the return type (e.g. `forall(otype S) int f(S x, int y)`), the
-matching arguments should be found using the bottom-up algorithm above for
-untyped contexts, because the polymorphic type variable does not sufficiently
-constrain the available interpretations of the argument expression.
-Similarly, it would likely be an advantage to use top-down resolution for
-cast expressions (e.g. `(int)x`), even when those cast expressions are
-subexpressions of an otherwise untyped expression.
-It may also be fruitful to switch between the bottom-up and top-down
-algorithms if the number of valid interpretations for a subexpression or valid
-types for an argument exceeds some heuristic threshold, but finding such
-a threshold (if any exists) will require experimental data.
-This hybrid top-down/bottom-up search provides more opportunities for pruning
-interpretations than either a bottom-up or top-down approach alone, and thus
-may be more efficient than either.
-A top-down-only approach, however, devolves to linear search through every
-possible interpretation in the solution space in an untyped context, and is
-thus likely to be inferior to a strictly bottom-up approach, though this
-hypothesis needs to be empirically validated.
-Both Baker and Cormack explicitly generate all possible interpretations of a
-given expression; thinking of the set of interpretations of an expression as a
-list sorted by cost, this is an eager evaluation of the list.
-However, since we generally expect that user programmers will not often use
-high-cost implicit conversions, one potentially effective way to prune the
-search space would be to first find the minimal-cost interpretations of any
-given subexpression, then to save the resolution progress of the
-subexpressions and attempt to resolve the superexpression using only those
-subexpression interpretations.
-If no valid interpretation of the superexpression can be found, the resolver
-would then repeatedly find the next-most-minimal cost interpretations of the
-subexpressions and attempt to produce the minimal cost interpretation of the
-superexpression.
-This process would proceed until all possible subexpression interpretations
-have been found and considered.
-A middle ground between the eager and lazy approaches can be taken by
-considering the lexical order on the cost tuple; essentially, every
-interpretation in each of the classes below will be strictly cheaper than any
-interpretation in the class after it, so if a min-cost valid interpretation
-can be found while only generating interpretations in a given class, that
-interpretation is guaranteed to be the best possible one:
-. Interpretations without polymorphic functions or implicit conversions
-. Interpretations without polymorphic functions using only safe conversions
-. Interpretations using polymorphic functions without unsafe conversions
-. Interpretations using unsafe conversions
-In this lazy-eager approach, all the interpretations in one class would be
-eagerly generated, while the interpretations in the next class would only be
-considered if no match was found in the previous class.
 ## Appendix A: Partial and Total Orders ##
 The `<=` relation on integers is a commonly known _total order_, and

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changes in doc/working/resolver_design.md [d14d96a:d5f1cfc]

Legend:

doc/working/resolver_design.md

Download in other formats: