Context Navigation

-                      r38bfe32a
+                      rd14d96a
 ## Conversion Costs ##
+Each possible resolution of an expression has a _cost_ consisting of four
+integer components: _unsafe_ conversion cost, _polymorphic_ specialization
+cost, _safe_ conversion cost, and a _count_ of conversions.
+These components form a lexically-ordered tuple which can be summed
+element-wise; summation starts at `(0, 0, 0, 0)`.
+Each possible resolution of an expression has a _cost_ tuple consisting of
+the following components: _unsafe_ conversion cost, _polymorphic_
+specialization cost, _safe_ conversion cost, a count of _explicit_
+conversions, and _qualifier_ conversion cost.
+These components are lexically-ordered and can be summed element-wise;
+summation starts at `(0, 0, 0, 0, 0)`.
 ### Lvalue and Qualifier Conversions ###
 …
 programmers.
+## Resolver Architecture ##
+### Function Application Resolution ###
+Our resolution algorithm for function application expressions is based on
+Baker's[3] single-pass bottom-up algorithm, with Cormack's[4] single-pass
+top-down algorithm applied where appropriate as an optimization.
+Broadly speaking, the cost of this resolution per expression will be
+proportional to `i^d`, where `i` is the number of interpretations of each
+program symbol, and `d` is the maximum depth of the expression DAG.
+Since `d` is determined by the user programmer (in general, bounded by a small
+constant), opportunities for resolver optimization primarily revolve around
+minimizing `i`, the number of interpretations of each symbol that are
+considered.
+[3] Baker, Theodore P. A one-pass algorithm for overload resolution in Ada.
+ACM Transactions on Programming Languages and Systems (1982) 4:4 p.601-614
+[4] Cormack, Gordon V. An algorithm for the selection of overloaded functions
+in Ada. SIGPLAN Notices (1981) 16:2 p.48-52
+Unlike Baker, our system allows implicit type conversions for function
+arguments and return types; the problem then becomes to find the valid
+interpretation for an expression that has the unique minimal conversion cost,
+if such exists.
+Interpretations can be produced both by overloaded names and implicit
+conversions applied to existing interpretations; we have proposals to reduce
+the number of interpretations considered from both sources.
+To simplify the problem for this discussion, we will consider application
+resolution restricted to a domain of functions applied to variables, possibly
+in a nested manner (e.g. `f( g( x ), y )`, where `x` and `y` are variables and
+`f` and `g` are functions), and possibly in a typed context such as a variable
+initialization (e.g. `int i = f( x );`); the other aspects of Cforall type
+resolution should be able to be straightforwardly mapped into this model.
+The types of the symbol tables used for variable and function declarations
+look somewhat like the following:
+        variable_table = name_map( variable_name, variable_map )
+        function_table = name_map( function_name, function_map )
+        variable_map = multi_index( by_type( variable_type ),
+                                    variable_decl_set )
+        function_map = multi_index( by_int( n_params ),
+                                                                by_type( return_type ),
+                                                                function_decl_set )
+`variable_name` and `function_name` are likely simple strings, with `name_map`
+a hash table (or perhaps trie) mapping string keys to values.
+`variable_decl_set` and `function_decl_set` can be thought of for the moment
+as simple bags of typed declarations, where the declaration types are linked
+to the graph of available conversions for that type.
+In a typed context both the `variable_decl_set` and the `function_decl_set`
+should be able to be selected upon by type; this is accomplished by the
+`by_type` index of both `variable_map` and `function_map`.
+The `by_int` index of `function_map` also provides a way to select functions
+by their number of parameters; this index may be used to swiftly discard any
+function declaration which does not have the appropriate number of parameters
+for the argument interpretations being passed to it; given the likely small
+number of entries in this map, it is possible that a binary search of a sorted
+vector or even a linear search of an unsorted vector would be more efficient
+than the usual hash-based index.
+Given these data structures, the general outline of our algorithm follows
+Baker, with Cormack's algorithm used as a heuristic filter in typed contexts.
+In an untyped context, we use a variant of Baker's bottom-up algorithm.
+The leaves of the interpretation DAG are drawn from the variable symbol table,
+with entries in the table each producing zero-cost interpretations, and each
+implicit conversion available to be applied to the type of an existing entry
+producing a further interpretation with the same cost as the conversion.
+As in Baker, if two or more interpretations have the same type, only the
+minimum cost interpretation with that type is produced; if there is no unique
+minimum cost interpretation than resolution with that type is ambiguous, and
+not permitted.
+It should be relatively simple to produce the list of interpretations sorted
+by cost by producing the interpretations via a breadth-first search of the
+conversion graph from the initial interpretations provided in the variable
+symbol table.
+To match a function at one of the internal nodes of the DAG, we first look up
+the function's name in the function symbol table, the appropriate number of
+parameters for the arguments that are provided through the `by_int` index of
+the returned `function_map`, then go through the resulting `function_decl_set`
+searching for functions where the parameter types can unify with the provided
+argument lists; any such matching function produces an interpretation with a
+cost that is the sum of its argument costs.
+Though this is not included in our simplified model, this unification step may
+include binding of polymorphic variables, which introduces a cost for the
+function binding itself which must be added to the argument costs.
+Also, checking of function assertions would likely be done at this step as
+well, possibly eliminating some possible matching functions (if no suitable
+assertions can be satisfied), or adding further conversion costs for the
+assertion satisfaction.
+Once the set of valid function interpretations is produced, these may also be
+expanded by the graph of implicit conversions on their return types, as the
+variable interpretations were.
+This implicit conversion-based expansion of interpretations should be skipped
+for the top-level expression if used in an untyped (void) context, e.g. for
+`f` in `f( g ( x ) );` or `x` in `x;`.
+On the other hand, if the top-level expression specifies a type, e.g. in
+`int i = f( x );`, only top level expressions that return that type are
+relevant to the search, so the candidates for `f` can be filtered first by
+those that return `int` (or a type convertable to it); this can be
+accomplished by performing a top-down filter of the interpretations of `f` by
+the `by_type` index of the `function_map` in a manner similar to Cormack's[4]
+algorithm.
+In a typed context, such as an initialization expression
+`T x = f( g( y ), z );`, only interpretations of `f( g( y ), z )` which have
+type `T` are valid; since there are likely to be valid interpretations of
+`f( g( y ), z )` which cannot be used to initialize a variable of type `T`, we
+can use this information to reduce the number of interpretations considered.
+Drawing from Cormack[4], we first search for interpretations of `f` where the
+return type is `T`; by breadth-first-search of the conversion graph, it should
+be straightforward to order the interpretations of `f` by the cost to convert
+their return type to `T`.
+We can also filter out interpretations of `f` with less than two parameters,
+since both `g( y )` and `z` must produce at least one parameter; we may not,
+however, rule out interpretations of `f` with more than two parameters, as
+there may be a valid interpretation of `g( y )` as a function returning more
+than one parameter (if the expression was `f( y, z )` instead, we could use an
+exact parameter count, assuming that variables of tuple type don't exist).
+For each compatible interpretation of `f`, we can add the type of the first
+parameter of that interpretation of `f` to a set `S`, and recursively search
+for interpretations of `g( y )` that return some type `Si` in `S`, and
+similarly for interpretations of `z` that match the type of any of the second
+parameters of some `f`.
+Naturally, if no matching interpretation of `g( y )` can be found for the
+first parameter of some `f`, the type of the second parameter of that `f` will
+not be added to the set of valid types for `z`.
+Each node in this interpretation DAG is given a cost the same way it would be
+in the bottom-up approach, with the exception that when going top-down there
+must be a final bottom-up pass to sum the interpretation costs and sort them
+as appropriate.
+If a parameter type for some `f` is a polymorphic type variable that is left
+unbound by the return type (e.g. `forall(otype S) int f(S x, int y)`), the
+matching arguments should be found using the bottom-up algorithm above for
+untyped contexts, because the polymorphic type variable does not sufficiently
+constrain the available interpretations of the argument expression.
+Similarly, it would likely be an advantage to use top-down resolution for
+cast expressions (e.g. `(int)x`), even when those cast expressions are
+subexpressions of an otherwise untyped expression.
+It may also be fruitful to switch between the bottom-up and top-down
+algorithms if the number of valid interpretations for a subexpression or valid
+types for an argument exceeds some heuristic threshold, but finding such
+a threshold (if any exists) will require experimental data.
+This hybrid top-down/bottom-up search provides more opportunities for pruning
+interpretations than either a bottom-up or top-down approach alone, and thus
+may be more efficient than either.
+A top-down-only approach, however, devolves to linear search through every
+possible interpretation in the solution space in an untyped context, and is
+thus likely to be inferior to a strictly bottom-up approach, though this
+hypothesis needs to be empirically validated.
+Both Baker and Cormack explicitly generate all possible interpretations of a
+given expression; thinking of the set of interpretations of an expression as a
+list sorted by cost, this is an eager evaluation of the list.
+However, since we generally expect that user programmers will not often use
+high-cost implicit conversions, one potentially effective way to prune the
+search space would be to first find the minimal-cost interpretations of any
+given subexpression, then to save the resolution progress of the
+subexpressions and attempt to resolve the superexpression using only those
+subexpression interpretations.
+If no valid interpretation of the superexpression can be found, the resolver
+would then repeatedly find the next-most-minimal cost interpretations of the
+subexpressions and attempt to produce the minimal cost interpretation of the
+superexpression.
+This process would proceed until all possible subexpression interpretations
+have been found and considered.
+A middle ground between the eager and lazy approaches can be taken by
+considering the lexical order on the cost tuple; essentially, every
+interpretation in each of the classes below will be strictly cheaper than any
+interpretation in the class after it, so if a min-cost valid interpretation
+can be found while only generating interpretations in a given class, that
+interpretation is guaranteed to be the best possible one:
+. Interpretations without polymorphic functions or implicit conversions
+. Interpretations without polymorphic functions using only safe conversions
+. Interpretations using polymorphic functions without unsafe conversions
+. Interpretations using unsafe conversions
+In this lazy-eager approach, all the interpretations in one class would be
+eagerly generated, while the interpretations in the next class would only be
+considered if no match was found in the previous class.
 ## Appendix A: Partial and Total Orders ##
 The `<=` relation on integers is a commonly known _total order_, and

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset d14d96a

Legend:

doc/working/resolver_design.md

Download in other formats: