Changeset d14d96a for doc/working

Jun 13, 2016, 4:59:28 PM
Aaron Moss <a3moss@…>
aaron-thesis, arm-eh, cleanup-dtors, ctor, deferred_resn, demangler, gc_noraii, jacob/cs343-translation, jenkins-sandbox, master, memory, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc

Update resolver design with draft of architecture

1 edited


  • doc/working/

    r38bfe32a rd14d96a  
## Conversion Costs ##

Old (r38bfe32a):

Each possible resolution of an expression has a _cost_ consisting of four
integer components: _unsafe_ conversion cost, _polymorphic_ specialization
cost, _safe_ conversion cost, and a _count_ of conversions.
These components form a lexically-ordered tuple which can be summed
element-wise; summation starts at `(0, 0, 0, 0)`.

New (rd14d96a):

Each possible resolution of an expression has a _cost_ tuple consisting of
the following components: _unsafe_ conversion cost, _polymorphic_
specialization cost, _safe_ conversion cost, a count of _explicit_
conversions, and _qualifier_ conversion cost.
These components are lexically-ordered and can be summed element-wise;
summation starts at `(0, 0, 0, 0, 0)`.
### Lvalue and Qualifier Conversions ###
## Resolver Architecture ##

### Function Application Resolution ###
Our resolution algorithm for function application expressions is based on
Baker's[3] single-pass bottom-up algorithm, with Cormack's[4] single-pass
top-down algorithm applied where appropriate as an optimization.
Broadly speaking, the cost of this resolution per expression will be
proportional to `i^d`, where `i` is the number of interpretations of each
program symbol, and `d` is the maximum depth of the expression DAG.
Since `d` is determined by the user programmer (and is in general bounded by
a small constant), opportunities for resolver optimization primarily revolve
around minimizing `i`, the number of interpretations of each symbol that are
considered.
[3] Baker, Theodore P. A one-pass algorithm for overload resolution in Ada.
ACM Transactions on Programming Languages and Systems (1982) 4:4 p.601-614

[4] Cormack, Gordon V. An algorithm for the selection of overloaded functions
in Ada. SIGPLAN Notices (1981) 16:2 p.48-52
Unlike Baker, our system allows implicit type conversions for function
arguments and return types; the problem then becomes to find the valid
interpretation for an expression that has the unique minimal conversion cost,
if one exists.
Interpretations can be produced both by overloaded names and by implicit
conversions applied to existing interpretations; we have proposals to reduce
the number of interpretations considered from both sources.
To simplify the problem for this discussion, we will consider application
resolution restricted to a domain of functions applied to variables, possibly
in a nested manner (e.g. `f( g( x ), y )`, where `x` and `y` are variables and
`f` and `g` are functions), and possibly in a typed context such as a variable
initialization (e.g. `int i = f( x );`); the other aspects of Cforall type
resolution should map straightforwardly onto this model.
The types of the symbol tables used for variable and function declarations
look somewhat like the following:

	variable_table = name_map( variable_name, variable_map )

	function_table = name_map( function_name, function_map )

	variable_map = multi_index( by_type( variable_type ),
	                            variable_decl_set )

	function_map = multi_index( by_int( n_params ),
	                            by_type( return_type ),
	                            function_decl_set )
`variable_name` and `function_name` are likely simple strings, with `name_map`
a hash table (or perhaps a trie) mapping string keys to values.
`variable_decl_set` and `function_decl_set` can be thought of for the moment
as simple bags of typed declarations, where the declaration types are linked
into the graph of available conversions for that type.
In a typed context, both the `variable_decl_set` and the `function_decl_set`
should be selectable by type; this is accomplished by the `by_type` index of
both `variable_map` and `function_map`.
The `by_int` index of `function_map` also provides a way to select functions
by their number of parameters; this index may be used to swiftly discard any
function declaration which does not have the appropriate number of parameters
for the argument interpretations being passed to it.
Given the likely small number of entries in this map, a binary search of a
sorted vector or even a linear search of an unsorted vector may be more
efficient than the usual hash-based index.
Given these data structures, the general outline of our algorithm follows
Baker, with Cormack's algorithm used as a heuristic filter in typed contexts.
In an untyped context, we use a variant of Baker's bottom-up algorithm.
The leaves of the interpretation DAG are drawn from the variable symbol
table, with each entry in the table producing a zero-cost interpretation, and
each implicit conversion applicable to the type of an existing entry
producing a further interpretation with the same cost as the conversion.
As in Baker, if two or more interpretations have the same type, only the
minimum-cost interpretation with that type is produced; if there is no unique
minimum-cost interpretation, then resolution with that type is ambiguous, and
not permitted.
It should be relatively simple to produce the list of interpretations sorted
by cost by generating the interpretations via a breadth-first search of the
conversion graph from the initial interpretations provided in the variable
symbol table.
To match a function at one of the internal nodes of the DAG, we first look up
the function's name in the function symbol table, then select the candidates
with the appropriate number of parameters for the provided arguments through
the `by_int` index of the returned `function_map`, then go through the
resulting `function_decl_set` searching for functions whose parameter types
can unify with the provided argument lists; any such matching function
produces an interpretation with a cost that is the sum of its argument costs.
Though this is not included in our simplified model, this unification step
may include binding of polymorphic variables, which introduces a cost for the
function binding itself that must be added to the argument costs.
Checking of function assertions would likely be done at this step as well,
possibly eliminating some candidate functions (if no suitable assertions can
be satisfied), or adding further conversion costs for the assertion
satisfaction.
Once the set of valid function interpretations is produced, these may also be
expanded by the graph of implicit conversions on their return types, as the
variable interpretations were.
This implicit conversion-based expansion of interpretations should be skipped
for the top-level expression if used in an untyped (void) context, e.g. for
`f` in `f( g( x ) );` or `x` in `x;`.
On the other hand, if the top-level expression specifies a type, e.g. in
`int i = f( x );`, only top-level expressions that return that type are
relevant to the search, so the candidates for `f` can be filtered first by
those that return `int` (or a type convertible to it); this can be
accomplished by performing a top-down filter of the interpretations of `f` by
the `by_type` index of the `function_map`, in a manner similar to Cormack's[4]
algorithm.
In a typed context, such as an initialization expression
`T x = f( g( y ), z );`, only interpretations of `f( g( y ), z )` which have
type `T` are valid; since there are likely to be valid interpretations of
`f( g( y ), z )` which cannot be used to initialize a variable of type `T`,
we can use this information to reduce the number of interpretations
considered.
Drawing from Cormack[4], we first search for interpretations of `f` whose
return type is `T`; by breadth-first search of the conversion graph, it
should be straightforward to order the interpretations of `f` by the cost to
convert their return type to `T`.
We can also filter out interpretations of `f` with fewer than two parameters,
since both `g( y )` and `z` must produce at least one parameter; we may not,
however, rule out interpretations of `f` with more than two parameters, as
there may be a valid interpretation of `g( y )` as a function returning a
tuple that fills more than one parameter (if the expression were `f( y, z )`
instead, we could use an exact parameter count, assuming that variables of
tuple type don't exist).
For each compatible interpretation of `f`, we can add the type of the first
parameter of that interpretation of `f` to a set `S`, and recursively search
for interpretations of `g( y )` that return some type `Si` in `S`, and
similarly for interpretations of `z` that match the type of any of the second
parameters of some `f`.
Naturally, if no matching interpretation of `g( y )` can be found for the
first parameter of some `f`, the type of the second parameter of that `f`
will not be added to the set of valid types for `z`.
Each node in this interpretation DAG is given a cost the same way it would be
in the bottom-up approach, with the exception that when going top-down there
must be a final bottom-up pass to sum the interpretation costs and sort them
as appropriate.
If a parameter type for some `f` is a polymorphic type variable that is left
unbound by the return type (e.g. `forall(otype S) int f(S x, int y)`), the
matching arguments should be found using the bottom-up algorithm above for
untyped contexts, because the polymorphic type variable does not sufficiently
constrain the available interpretations of the argument expression.
Similarly, it would likely be an advantage to use top-down resolution for
cast expressions (e.g. `(int)x`), even when those cast expressions are
subexpressions of an otherwise untyped expression.
It may also be fruitful to switch between the bottom-up and top-down
algorithms if the number of valid interpretations for a subexpression or
valid types for an argument exceeds some heuristic threshold, but finding
such a threshold (if any exists) will require experimental data.
This hybrid top-down/bottom-up search provides more opportunities for pruning
interpretations than either a bottom-up or top-down approach alone, and thus
may be more efficient than either.
A top-down-only approach, however, devolves to linear search through every
possible interpretation in the solution space in an untyped context, and is
thus likely to be inferior to a strictly bottom-up approach, though this
hypothesis needs to be empirically validated.
Both Baker and Cormack explicitly generate all possible interpretations of a
given expression; thinking of the set of interpretations of an expression as
a list sorted by cost, this is an eager evaluation of the list.
However, since we generally expect that user programmers will not often use
high-cost implicit conversions, one potentially effective way to prune the
search space would be to first find the minimal-cost interpretations of any
given subexpression, then to save the resolution progress of the
subexpressions and attempt to resolve the superexpression using only those
subexpression interpretations.
If no valid interpretation of the superexpression can be found, the resolver
would then repeatedly find the next-most-minimal-cost interpretations of the
subexpressions and attempt to produce the minimal-cost interpretation of the
superexpression.
This process would proceed until all possible subexpression interpretations
have been found and considered.
A middle ground between the eager and lazy approaches can be taken by
considering the lexical order on the cost tuple; essentially, every
interpretation in each of the classes below will be strictly cheaper than any
interpretation in the class after it, so if a min-cost valid interpretation
can be found while only generating interpretations in a given class, that
interpretation is guaranteed to be the best possible one:

1. Interpretations without polymorphic functions or implicit conversions
2. Interpretations without polymorphic functions using only safe conversions
3. Interpretations using polymorphic functions without unsafe conversions
4. Interpretations using unsafe conversions

In this lazy-eager approach, all the interpretations in one class would be
eagerly generated, while the interpretations in the next class would only be
considered if no match was found in the previous class.
## Appendix A: Partial and Total Orders ##
The `<=` relation on integers is a commonly known _total order_, and