Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/working/resolver_design.md

    rd14d96a rd5f1cfc  
    8181
    8282## Conversion Costs ##
    83 Each possible resolution of an expression has a _cost_ tuple consisting of
    84 the following components: _unsafe_ conversion cost, _polymorphic_
    85 specialization cost, _safe_ conversion cost, a count of _explicit_
    86 conversions, and _qualifier_ conversion cost.
    87 These components are lexically-ordered and can be summed element-wise;
    88 summation starts at `(0, 0, 0, 0, 0)`.
     83Each possible resolution of an expression has a _cost_ consisting of four
     84integer components: _unsafe_ conversion cost, _polymorphic_ specialization
     85cost, _safe_ conversion cost, and a _count_ of conversions.
     86These components form a lexically-ordered tuple which can be summed
     87element-wise; summation starts at `(0, 0, 0, 0)`.
    8988
    9089### Lvalue and Qualifier Conversions ###
     
    12251224programmers.
    12261225
    1227 ## Resolver Architecture ##
    1228 
    1229 ### Function Application Resolution ###
    1230 Our resolution algorithm for function application expressions is based on
    1231 Baker's[3] single-pass bottom-up algorithm, with Cormack's[4] single-pass
    1232 top-down algorithm applied where appropriate as an optimization.
    1233 Broadly speaking, the cost of this resolution per expression will be
    1234 proportional to `i^d`, where `i` is the number of interpretations of each
    1235 program symbol, and `d` is the maximum depth of the expression DAG.
    1236 Since `d` is determined by the user programmer (in general, bounded by a small
    1237 constant), opportunities for resolver optimization primarily revolve around
    1238 minimizing `i`, the number of interpretations of each symbol that are
    1239 considered.
    1240 
    1241 [3] Baker, Theodore P. A one-pass algorithm for overload resolution in Ada.
    1242 ACM Transactions on Programming Languages and Systems (1982) 4:4 p.601-614
    1243 
    1244 [4] Cormack, Gordon V. An algorithm for the selection of overloaded functions
    1245 in Ada. SIGPLAN Notices (1981) 16:2 p.48-52
    1246 
    1247 Unlike Baker, our system allows implicit type conversions for function
    1248 arguments and return types; the problem then becomes to find the valid
    1249 interpretation for an expression that has the unique minimal conversion cost,
    1250 if such exists.
    1251 Interpretations can be produced both by overloaded names and implicit
    1252 conversions applied to existing interpretations; we have proposals to reduce
    1253 the number of interpretations considered from both sources.
    1254 To simplify the problem for this discussion, we will consider application
    1255 resolution restricted to a domain of functions applied to variables, possibly
    1256 in a nested manner (e.g. `f( g( x ), y )`, where `x` and `y` are variables and
    1257 `f` and `g` are functions), and possibly in a typed context such as a variable
    1258 initialization (e.g. `int i = f( x );`); the other aspects of Cforall type
    1259 resolution should be able to be straightforwardly mapped into this model.
    1260 The types of the symbol tables used for variable and function declarations
    1261 look somewhat like the following:
    1262 
    1263         variable_table = name_map( variable_name, variable_map )
    1264        
    1265         function_table = name_map( function_name, function_map )
    1266        
    1267         variable_map = multi_index( by_type( variable_type ),
    1268                                     variable_decl_set )
    1269 
    1270         function_map = multi_index( by_int( n_params ),
    1271                                                                 by_type( return_type ),
    1272                                                                 function_decl_set )
    1273 
    1274 `variable_name` and `function_name` are likely simple strings, with `name_map`
    1275 a hash table (or perhaps trie) mapping string keys to values.
    1276 `variable_decl_set` and `function_decl_set` can be thought of for the moment
    1277 as simple bags of typed declarations, where the declaration types are linked
    1278 to the graph of available conversions for that type.
    1279 In a typed context both the `variable_decl_set` and the `function_decl_set`
    1280 should be able to be selected upon by type; this is accomplished by the
    1281 `by_type` index of both `variable_map` and `function_map`.
    1282 The `by_int` index of `function_map` also provides a way to select functions
    1283 by their number of parameters; this index may be used to swiftly discard any
    1284 function declaration which does not have the appropriate number of parameters
    1285 for the argument interpretations being passed to it; given the likely small
    1286 number of entries in this map, it is possible that a binary search of a sorted
    1287 vector or even a linear search of an unsorted vector would be more efficient
    1288 than the usual hash-based index.
    1289 
    1290 Given these data structures, the general outline of our algorithm follows
    1291 Baker, with Cormack's algorithm used as a heuristic filter in typed contexts.
    1292 
    1293 In an untyped context, we use a variant of Baker's bottom-up algorithm.
    1294 The leaves of the interpretation DAG are drawn from the variable symbol table,
    1295 with entries in the table each producing zero-cost interpretations, and each
    1296 implicit conversion available to be applied to the type of an existing entry
    1297 producing a further interpretation with the same cost as the conversion.
    1298 As in Baker, if two or more interpretations have the same type, only the
    1299 minimum cost interpretation with that type is produced; if there is no unique
    1300 minimum cost interpretation than resolution with that type is ambiguous, and
    1301 not permitted.
    1302 It should be relatively simple to produce the list of interpretations sorted
    1303 by cost by producing the interpretations via a breadth-first search of the
    1304 conversion graph from the initial interpretations provided in the variable
    1305 symbol table.
    1306 
    1307 To match a function at one of the internal nodes of the DAG, we first look up
    1308 the function's name in the function symbol table, the appropriate number of
    1309 parameters for the arguments that are provided through the `by_int` index of
    1310 the returned `function_map`, then go through the resulting `function_decl_set`
    1311 searching for functions where the parameter types can unify with the provided
    1312 argument lists; any such matching function produces an interpretation with a
    1313 cost that is the sum of its argument costs.
    1314 Though this is not included in our simplified model, this unification step may
    1315 include binding of polymorphic variables, which introduces a cost for the
    1316 function binding itself which must be added to the argument costs.
    1317 Also, checking of function assertions would likely be done at this step as
    1318 well, possibly eliminating some possible matching functions (if no suitable
    1319 assertions can be satisfied), or adding further conversion costs for the
    1320 assertion satisfaction.
    1321 Once the set of valid function interpretations is produced, these may also be
    1322 expanded by the graph of implicit conversions on their return types, as the
    1323 variable interpretations were.
    1324 
    1325 This implicit conversion-based expansion of interpretations should be skipped
    1326 for the top-level expression if used in an untyped (void) context, e.g. for
    1327 `f` in `f( g ( x ) );` or `x` in `x;`.
    1328 On the other hand, if the top-level expression specifies a type, e.g. in
    1329 `int i = f( x );`, only top level expressions that return that type are
    1330 relevant to the search, so the candidates for `f` can be filtered first by
    1331 those that return `int` (or a type convertable to it); this can be
    1332 accomplished by performing a top-down filter of the interpretations of `f` by
    1333 the `by_type` index of the `function_map` in a manner similar to Cormack's[4]
    1334 algorithm.
    1335 
    1336 In a typed context, such as an initialization expression
    1337 `T x = f( g( y ), z );`, only interpretations of `f( g( y ), z )` which have
    1338 type `T` are valid; since there are likely to be valid interpretations of
    1339 `f( g( y ), z )` which cannot be used to initialize a variable of type `T`, we
    1340 can use this information to reduce the number of interpretations considered.
    1341 Drawing from Cormack[4], we first search for interpretations of `f` where the
    1342 return type is `T`; by breadth-first-search of the conversion graph, it should
    1343 be straightforward to order the interpretations of `f` by the cost to convert
    1344 their return type to `T`.
    1345 We can also filter out interpretations of `f` with less than two parameters,
    1346 since both `g( y )` and `z` must produce at least one parameter; we may not,
    1347 however, rule out interpretations of `f` with more than two parameters, as
    1348 there may be a valid interpretation of `g( y )` as a function returning more
    1349 than one parameter (if the expression was `f( y, z )` instead, we could use an
    1350 exact parameter count, assuming that variables of tuple type don't exist).
    1351 For each compatible interpretation of `f`, we can add the type of the first
    1352 parameter of that interpretation of `f` to a set `S`, and recursively search
    1353 for interpretations of `g( y )` that return some type `Si` in `S`, and
    1354 similarly for interpretations of `z` that match the type of any of the second
    1355 parameters of some `f`.
    1356 Naturally, if no matching interpretation of `g( y )` can be found for the
    1357 first parameter of some `f`, the type of the second parameter of that `f` will
    1358 not be added to the set of valid types for `z`.
    1359 Each node in this interpretation DAG is given a cost the same way it would be
    1360 in the bottom-up approach, with the exception that when going top-down there
    1361 must be a final bottom-up pass to sum the interpretation costs and sort them
    1362 as appropriate.
    1363 
    1364 If a parameter type for some `f` is a polymorphic type variable that is left
    1365 unbound by the return type (e.g. `forall(otype S) int f(S x, int y)`), the
    1366 matching arguments should be found using the bottom-up algorithm above for
    1367 untyped contexts, because the polymorphic type variable does not sufficiently
    1368 constrain the available interpretations of the argument expression.
    1369 Similarly, it would likely be an advantage to use top-down resolution for
    1370 cast expressions (e.g. `(int)x`), even when those cast expressions are
    1371 subexpressions of an otherwise untyped expression.
    1372 It may also be fruitful to switch between the bottom-up and top-down
    1373 algorithms if the number of valid interpretations for a subexpression or valid
    1374 types for an argument exceeds some heuristic threshold, but finding such
    1375 a threshold (if any exists) will require experimental data.
    1376 This hybrid top-down/bottom-up search provides more opportunities for pruning
    1377 interpretations than either a bottom-up or top-down approach alone, and thus
    1378 may be more efficient than either.
    1379 A top-down-only approach, however, devolves to linear search through every
    1380 possible interpretation in the solution space in an untyped context, and is
    1381 thus likely to be inferior to a strictly bottom-up approach, though this
    1382 hypothesis needs to be empirically validated.
    1383 
    1384 Both Baker and Cormack explicitly generate all possible interpretations of a
    1385 given expression; thinking of the set of interpretations of an expression as a
    1386 list sorted by cost, this is an eager evaluation of the list.
    1387 However, since we generally expect that user programmers will not often use
    1388 high-cost implicit conversions, one potentially effective way to prune the
    1389 search space would be to first find the minimal-cost interpretations of any
    1390 given subexpression, then to save the resolution progress of the
    1391 subexpressions and attempt to resolve the superexpression using only those
    1392 subexpression interpretations.
    1393 If no valid interpretation of the superexpression can be found, the resolver
    1394 would then repeatedly find the next-most-minimal cost interpretations of the
    1395 subexpressions and attempt to produce the minimal cost interpretation of the
    1396 superexpression.
    1397 This process would proceed until all possible subexpression interpretations
    1398 have been found and considered.
    1399 
    1400 A middle ground between the eager and lazy approaches can be taken by
    1401 considering the lexical order on the cost tuple; essentially, every
    1402 interpretation in each of the classes below will be strictly cheaper than any
    1403 interpretation in the class after it, so if a min-cost valid interpretation
    1404 can be found while only generating interpretations in a given class, that
    1405 interpretation is guaranteed to be the best possible one:
    1406 
    1407 1. Interpretations without polymorphic functions or implicit conversions
    1408 2. Interpretations without polymorphic functions using only safe conversions
    1409 3. Interpretations using polymorphic functions without unsafe conversions
    1410 4. Interpretations using unsafe conversions
    1411 
    1412 In this lazy-eager approach, all the interpretations in one class would be
    1413 eagerly generated, while the interpretations in the next class would only be
    1414 considered if no match was found in the previous class.
    1415 
    14161226## Appendix A: Partial and Total Orders ##
    14171227The `<=` relation on integers is a commonly known _total order_, and
Note: See TracChangeset for help on using the changeset viewer.