Changes in doc/working/resolver_design.md [d14d96a:d5f1cfc]
- File:
-
- 1 edited
-
doc/working/resolver_design.md (modified) (2 diffs)
Legend:
- Unmodified
- Added
- Removed
-
doc/working/resolver_design.md
rd14d96a rd5f1cfc 81 81 82 82 ## Conversion Costs ## 83 Each possible resolution of an expression has a _cost_ tuple consisting of 84 the following components: _unsafe_ conversion cost, _polymorphic_ 85 specialization cost, _safe_ conversion cost, a count of _explicit_ 86 conversions, and _qualifier_ conversion cost. 87 These components are lexically-ordered and can be summed element-wise; 88 summation starts at `(0, 0, 0, 0, 0)`. 83 Each possible resolution of an expression has a _cost_ consisting of four 84 integer components: _unsafe_ conversion cost, _polymorphic_ specialization 85 cost, _safe_ conversion cost, and a _count_ of conversions. 86 These components form a lexically-ordered tuple which can be summed 87 element-wise; summation starts at `(0, 0, 0, 0)`. 89 88 90 89 ### Lvalue and Qualifier Conversions ### … … 1225 1224 programmers. 1226 1225 1227 ## Resolver Architecture ##1228 1229 ### Function Application Resolution ###1230 Our resolution algorithm for function application expressions is based on1231 Baker's[3] single-pass bottom-up algorithm, with Cormack's[4] single-pass1232 top-down algorithm applied where appropriate as an optimization.1233 Broadly speaking, the cost of this resolution per expression will be1234 proportional to `i^d`, where `i` is the number of interpretations of each1235 program symbol, and `d` is the maximum depth of the expression DAG.1236 Since `d` is determined by the user programmer (in general, bounded by a small1237 constant), opportunities for resolver optimization primarily revolve around1238 minimizing `i`, the number of interpretations of each symbol that are1239 considered.1240 1241 [3] Baker, Theodore P. A one-pass algorithm for overload resolution in Ada.1242 ACM Transactions on Programming Languages and Systems (1982) 4:4 p.601-6141243 1244 [4] Cormack, Gordon V. An algorithm for the selection of overloaded functions1245 in Ada. SIGPLAN Notices (1981) 16:2 p.48-521246 1247 Unlike Baker, our system allows implicit type conversions for function1248 arguments and return types; the problem then becomes to find the valid1249 interpretation for an expression that has the unique minimal conversion cost,1250 if such exists.1251 Interpretations can be produced both by overloaded names and implicit1252 conversions applied to existing interpretations; we have proposals to reduce1253 the number of interpretations considered from both sources.1254 To simplify the problem for this discussion, we will consider application1255 resolution restricted to a domain of functions applied to variables, possibly1256 in a nested manner (e.g. `f( g( x ), y )`, where `x` and `y` are variables and1257 `f` and `g` are functions), and possibly in a typed context such as a variable1258 initialization (e.g. `int i = f( x );`); the other aspects of Cforall type1259 resolution should be able to be straightforwardly mapped into this model.1260 The types of the symbol tables used for variable and function declarations1261 look somewhat like the following:1262 1263 variable_table = name_map( variable_name, variable_map )1264 1265 function_table = name_map( function_name, function_map )1266 1267 variable_map = multi_index( by_type( variable_type ),1268 variable_decl_set )1269 1270 function_map = multi_index( by_int( n_params ),1271 by_type( return_type ),1272 function_decl_set )1273 1274 `variable_name` and `function_name` are likely simple strings, with `name_map`1275 a hash table (or perhaps trie) mapping string keys to values.1276 `variable_decl_set` and `function_decl_set` can be thought of for the moment1277 as simple bags of typed declarations, where the declaration types are linked1278 to the graph of available conversions for that type.1279 In a typed context both the `variable_decl_set` and the `function_decl_set`1280 should be able to be selected upon by type; this is accomplished by the1281 `by_type` index of both `variable_map` and `function_map`.1282 The `by_int` index of `function_map` also provides a way to select functions1283 by their number of parameters; this index may be used to swiftly discard any1284 function declaration which does not have the appropriate number of parameters1285 for the argument interpretations being passed to it; given the likely small1286 number of entries in this map, it is possible that a binary search of a sorted1287 vector or even a linear search of an unsorted vector would be more efficient1288 than the usual hash-based index.1289 1290 Given these data structures, the general outline of our algorithm follows1291 Baker, with Cormack's algorithm used as a heuristic filter in typed contexts.1292 1293 In an untyped context, we use a variant of Baker's bottom-up algorithm.1294 The leaves of the interpretation DAG are drawn from the variable symbol table,1295 with entries in the table each producing zero-cost interpretations, and each1296 implicit conversion available to be applied to the type of an existing entry1297 producing a further interpretation with the same cost as the conversion.1298 As in Baker, if two or more interpretations have the same type, only the1299 minimum cost interpretation with that type is produced; if there is no unique1300 minimum cost interpretation than resolution with that type is ambiguous, and1301 not permitted.1302 It should be relatively simple to produce the list of interpretations sorted1303 by cost by producing the interpretations via a breadth-first search of the1304 conversion graph from the initial interpretations provided in the variable1305 symbol table.1306 1307 To match a function at one of the internal nodes of the DAG, we first look up1308 the function's name in the function symbol table, the appropriate number of1309 parameters for the arguments that are provided through the `by_int` index of1310 the returned `function_map`, then go through the resulting `function_decl_set`1311 searching for functions where the parameter types can unify with the provided1312 argument lists; any such matching function produces an interpretation with a1313 cost that is the sum of its argument costs.1314 Though this is not included in our simplified model, this unification step may1315 include binding of polymorphic variables, which introduces a cost for the1316 function binding itself which must be added to the argument costs.1317 Also, checking of function assertions would likely be done at this step as1318 well, possibly eliminating some possible matching functions (if no suitable1319 assertions can be satisfied), or adding further conversion costs for the1320 assertion satisfaction.1321 Once the set of valid function interpretations is produced, these may also be1322 expanded by the graph of implicit conversions on their return types, as the1323 variable interpretations were.1324 1325 This implicit conversion-based expansion of interpretations should be skipped1326 for the top-level expression if used in an untyped (void) context, e.g. for1327 `f` in `f( g ( x ) );` or `x` in `x;`.1328 On the other hand, if the top-level expression specifies a type, e.g. in1329 `int i = f( x );`, only top level expressions that return that type are1330 relevant to the search, so the candidates for `f` can be filtered first by1331 those that return `int` (or a type convertable to it); this can be1332 accomplished by performing a top-down filter of the interpretations of `f` by1333 the `by_type` index of the `function_map` in a manner similar to Cormack's[4]1334 algorithm.1335 1336 In a typed context, such as an initialization expression1337 `T x = f( g( y ), z );`, only interpretations of `f( g( y ), z )` which have1338 type `T` are valid; since there are likely to be valid interpretations of1339 `f( g( y ), z )` which cannot be used to initialize a variable of type `T`, we1340 can use this information to reduce the number of interpretations considered.1341 Drawing from Cormack[4], we first search for interpretations of `f` where the1342 return type is `T`; by breadth-first-search of the conversion graph, it should1343 be straightforward to order the interpretations of `f` by the cost to convert1344 their return type to `T`.1345 We can also filter out interpretations of `f` with less than two parameters,1346 since both `g( y )` and `z` must produce at least one parameter; we may not,1347 however, rule out interpretations of `f` with more than two parameters, as1348 there may be a valid interpretation of `g( y )` as a function returning more1349 than one parameter (if the expression was `f( y, z )` instead, we could use an1350 exact parameter count, assuming that variables of tuple type don't exist).1351 For each compatible interpretation of `f`, we can add the type of the first1352 parameter of that interpretation of `f` to a set `S`, and recursively search1353 for interpretations of `g( y )` that return some type `Si` in `S`, and1354 similarly for interpretations of `z` that match the type of any of the second1355 parameters of some `f`.1356 Naturally, if no matching interpretation of `g( y )` can be found for the1357 first parameter of some `f`, the type of the second parameter of that `f` will1358 not be added to the set of valid types for `z`.1359 Each node in this interpretation DAG is given a cost the same way it would be1360 in the bottom-up approach, with the exception that when going top-down there1361 must be a final bottom-up pass to sum the interpretation costs and sort them1362 as appropriate.1363 1364 If a parameter type for some `f` is a polymorphic type variable that is left1365 unbound by the return type (e.g. `forall(otype S) int f(S x, int y)`), the1366 matching arguments should be found using the bottom-up algorithm above for1367 untyped contexts, because the polymorphic type variable does not sufficiently1368 constrain the available interpretations of the argument expression.1369 Similarly, it would likely be an advantage to use top-down resolution for1370 cast expressions (e.g. `(int)x`), even when those cast expressions are1371 subexpressions of an otherwise untyped expression.1372 It may also be fruitful to switch between the bottom-up and top-down1373 algorithms if the number of valid interpretations for a subexpression or valid1374 types for an argument exceeds some heuristic threshold, but finding such1375 a threshold (if any exists) will require experimental data.1376 This hybrid top-down/bottom-up search provides more opportunities for pruning1377 interpretations than either a bottom-up or top-down approach alone, and thus1378 may be more efficient than either.1379 A top-down-only approach, however, devolves to linear search through every1380 possible interpretation in the solution space in an untyped context, and is1381 thus likely to be inferior to a strictly bottom-up approach, though this1382 hypothesis needs to be empirically validated.1383 1384 Both Baker and Cormack explicitly generate all possible interpretations of a1385 given expression; thinking of the set of interpretations of an expression as a1386 list sorted by cost, this is an eager evaluation of the list.1387 However, since we generally expect that user programmers will not often use1388 high-cost implicit conversions, one potentially effective way to prune the1389 search space would be to first find the minimal-cost interpretations of any1390 given subexpression, then to save the resolution progress of the1391 subexpressions and attempt to resolve the superexpression using only those1392 subexpression interpretations.1393 If no valid interpretation of the superexpression can be found, the resolver1394 would then repeatedly find the next-most-minimal cost interpretations of the1395 subexpressions and attempt to produce the minimal cost interpretation of the1396 superexpression.1397 This process would proceed until all possible subexpression interpretations1398 have been found and considered.1399 1400 A middle ground between the eager and lazy approaches can be taken by1401 considering the lexical order on the cost tuple; essentially, every1402 interpretation in each of the classes below will be strictly cheaper than any1403 interpretation in the class after it, so if a min-cost valid interpretation1404 can be found while only generating interpretations in a given class, that1405 interpretation is guaranteed to be the best possible one:1406 1407 1. Interpretations without polymorphic functions or implicit conversions1408 2. Interpretations without polymorphic functions using only safe conversions1409 3. Interpretations using polymorphic functions without unsafe conversions1410 4. Interpretations using unsafe conversions1411 1412 In this lazy-eager approach, all the interpretations in one class would be1413 eagerly generated, while the interpretations in the next class would only be1414 considered if no match was found in the previous class.1415 1416 1226 ## Appendix A: Partial and Total Orders ## 1417 1227 The `<=` relation on integers is a commonly known _total order_, and
Note:
See TracChangeset
for help on using the changeset viewer.