Changeset d14d96a for doc/working/resolver_design.md
 Timestamp:
 Jun 13, 2016, 4:59:28 PM (6 years ago)
 Branches:
 aaronthesis, armeh, cleanupdtors, ctor, deferred_resn, demangler, gc_noraii, jacob/cs343translation, jenkinssandbox, master, memory, newast, newastuniqueexpr, newenv, no_list, persistentindexer, resolvnew, with_gc
 Children:
 c8c03683
 Parents:
 38bfe32a
 File:

 1 edited
Legend:
 Unmodified
 Added
 Removed

doc/working/resolver_design.md
r38bfe32a rd14d96a 81 81 82 82 ## Conversion Costs ## 83 Each possible resolution of an expression has a _cost_ consisting of four 84 integer components: _unsafe_ conversion cost, _polymorphic_ specialization 85 cost, _safe_ conversion cost, and a _count_ of conversions. 86 These components form a lexicallyordered tuple which can be summed 87 elementwise; summation starts at `(0, 0, 0, 0)`. 83 Each possible resolution of an expression has a _cost_ tuple consisting of 84 the following components: _unsafe_ conversion cost, _polymorphic_ 85 specialization cost, _safe_ conversion cost, a count of _explicit_ 86 conversions, and _qualifier_ conversion cost. 87 These components are lexicallyordered and can be summed elementwise; 88 summation starts at `(0, 0, 0, 0, 0)`. 88 89 89 90 ### Lvalue and Qualifier Conversions ### … … 1224 1225 programmers. 1225 1226 1227 ## Resolver Architecture ## 1228 1229 ### Function Application Resolution ### 1230 Our resolution algorithm for function application expressions is based on 1231 Baker's[3] singlepass bottomup algorithm, with Cormack's[4] singlepass 1232 topdown algorithm applied where appropriate as an optimization. 1233 Broadly speaking, the cost of this resolution per expression will be 1234 proportional to `i^d`, where `i` is the number of interpretations of each 1235 program symbol, and `d` is the maximum depth of the expression DAG. 1236 Since `d` is determined by the user programmer (in general, bounded by a small 1237 constant), opportunities for resolver optimization primarily revolve around 1238 minimizing `i`, the number of interpretations of each symbol that are 1239 considered. 1240 1241 [3] Baker, Theodore P. A onepass algorithm for overload resolution in Ada. 1242 ACM Transactions on Programming Languages and Systems (1982) 4:4 p.601614 1243 1244 [4] Cormack, Gordon V. An algorithm for the selection of overloaded functions 1245 in Ada. SIGPLAN Notices (1981) 16:2 p.4852 1246 1247 Unlike Baker, our system allows implicit type conversions for function 1248 arguments and return types; the problem then becomes to find the valid 1249 interpretation for an expression that has the unique minimal conversion cost, 1250 if such exists. 1251 Interpretations can be produced both by overloaded names and implicit 1252 conversions applied to existing interpretations; we have proposals to reduce 1253 the number of interpretations considered from both sources. 1254 To simplify the problem for this discussion, we will consider application 1255 resolution restricted to a domain of functions applied to variables, possibly 1256 in a nested manner (e.g. `f( g( x ), y )`, where `x` and `y` are variables and 1257 `f` and `g` are functions), and possibly in a typed context such as a variable 1258 initialization (e.g. `int i = f( x );`); the other aspects of Cforall type 1259 resolution should be able to be straightforwardly mapped into this model. 1260 The types of the symbol tables used for variable and function declarations 1261 look somewhat like the following: 1262 1263 variable_table = name_map( variable_name, variable_map ) 1264 1265 function_table = name_map( function_name, function_map ) 1266 1267 variable_map = multi_index( by_type( variable_type ), 1268 variable_decl_set ) 1269 1270 function_map = multi_index( by_int( n_params ), 1271 by_type( return_type ), 1272 function_decl_set ) 1273 1274 `variable_name` and `function_name` are likely simple strings, with `name_map` 1275 a hash table (or perhaps trie) mapping string keys to values. 1276 `variable_decl_set` and `function_decl_set` can be thought of for the moment 1277 as simple bags of typed declarations, where the declaration types are linked 1278 to the graph of available conversions for that type. 1279 In a typed context both the `variable_decl_set` and the `function_decl_set` 1280 should be able to be selected upon by type; this is accomplished by the 1281 `by_type` index of both `variable_map` and `function_map`. 1282 The `by_int` index of `function_map` also provides a way to select functions 1283 by their number of parameters; this index may be used to swiftly discard any 1284 function declaration which does not have the appropriate number of parameters 1285 for the argument interpretations being passed to it; given the likely small 1286 number of entries in this map, it is possible that a binary search of a sorted 1287 vector or even a linear search of an unsorted vector would be more efficient 1288 than the usual hashbased index. 1289 1290 Given these data structures, the general outline of our algorithm follows 1291 Baker, with Cormack's algorithm used as a heuristic filter in typed contexts. 1292 1293 In an untyped context, we use a variant of Baker's bottomup algorithm. 1294 The leaves of the interpretation DAG are drawn from the variable symbol table, 1295 with entries in the table each producing zerocost interpretations, and each 1296 implicit conversion available to be applied to the type of an existing entry 1297 producing a further interpretation with the same cost as the conversion. 1298 As in Baker, if two or more interpretations have the same type, only the 1299 minimum cost interpretation with that type is produced; if there is no unique 1300 minimum cost interpretation than resolution with that type is ambiguous, and 1301 not permitted. 1302 It should be relatively simple to produce the list of interpretations sorted 1303 by cost by producing the interpretations via a breadthfirst search of the 1304 conversion graph from the initial interpretations provided in the variable 1305 symbol table. 1306 1307 To match a function at one of the internal nodes of the DAG, we first look up 1308 the function's name in the function symbol table, the appropriate number of 1309 parameters for the arguments that are provided through the `by_int` index of 1310 the returned `function_map`, then go through the resulting `function_decl_set` 1311 searching for functions where the parameter types can unify with the provided 1312 argument lists; any such matching function produces an interpretation with a 1313 cost that is the sum of its argument costs. 1314 Though this is not included in our simplified model, this unification step may 1315 include binding of polymorphic variables, which introduces a cost for the 1316 function binding itself which must be added to the argument costs. 1317 Also, checking of function assertions would likely be done at this step as 1318 well, possibly eliminating some possible matching functions (if no suitable 1319 assertions can be satisfied), or adding further conversion costs for the 1320 assertion satisfaction. 1321 Once the set of valid function interpretations is produced, these may also be 1322 expanded by the graph of implicit conversions on their return types, as the 1323 variable interpretations were. 1324 1325 This implicit conversionbased expansion of interpretations should be skipped 1326 for the toplevel expression if used in an untyped (void) context, e.g. for 1327 `f` in `f( g ( x ) );` or `x` in `x;`. 1328 On the other hand, if the toplevel expression specifies a type, e.g. in 1329 `int i = f( x );`, only top level expressions that return that type are 1330 relevant to the search, so the candidates for `f` can be filtered first by 1331 those that return `int` (or a type convertable to it); this can be 1332 accomplished by performing a topdown filter of the interpretations of `f` by 1333 the `by_type` index of the `function_map` in a manner similar to Cormack's[4] 1334 algorithm. 1335 1336 In a typed context, such as an initialization expression 1337 `T x = f( g( y ), z );`, only interpretations of `f( g( y ), z )` which have 1338 type `T` are valid; since there are likely to be valid interpretations of 1339 `f( g( y ), z )` which cannot be used to initialize a variable of type `T`, we 1340 can use this information to reduce the number of interpretations considered. 1341 Drawing from Cormack[4], we first search for interpretations of `f` where the 1342 return type is `T`; by breadthfirstsearch of the conversion graph, it should 1343 be straightforward to order the interpretations of `f` by the cost to convert 1344 their return type to `T`. 1345 We can also filter out interpretations of `f` with less than two parameters, 1346 since both `g( y )` and `z` must produce at least one parameter; we may not, 1347 however, rule out interpretations of `f` with more than two parameters, as 1348 there may be a valid interpretation of `g( y )` as a function returning more 1349 than one parameter (if the expression was `f( y, z )` instead, we could use an 1350 exact parameter count, assuming that variables of tuple type don't exist). 1351 For each compatible interpretation of `f`, we can add the type of the first 1352 parameter of that interpretation of `f` to a set `S`, and recursively search 1353 for interpretations of `g( y )` that return some type `Si` in `S`, and 1354 similarly for interpretations of `z` that match the type of any of the second 1355 parameters of some `f`. 1356 Naturally, if no matching interpretation of `g( y )` can be found for the 1357 first parameter of some `f`, the type of the second parameter of that `f` will 1358 not be added to the set of valid types for `z`. 1359 Each node in this interpretation DAG is given a cost the same way it would be 1360 in the bottomup approach, with the exception that when going topdown there 1361 must be a final bottomup pass to sum the interpretation costs and sort them 1362 as appropriate. 1363 1364 If a parameter type for some `f` is a polymorphic type variable that is left 1365 unbound by the return type (e.g. `forall(otype S) int f(S x, int y)`), the 1366 matching arguments should be found using the bottomup algorithm above for 1367 untyped contexts, because the polymorphic type variable does not sufficiently 1368 constrain the available interpretations of the argument expression. 1369 Similarly, it would likely be an advantage to use topdown resolution for 1370 cast expressions (e.g. `(int)x`), even when those cast expressions are 1371 subexpressions of an otherwise untyped expression. 1372 It may also be fruitful to switch between the bottomup and topdown 1373 algorithms if the number of valid interpretations for a subexpression or valid 1374 types for an argument exceeds some heuristic threshold, but finding such 1375 a threshold (if any exists) will require experimental data. 1376 This hybrid topdown/bottomup search provides more opportunities for pruning 1377 interpretations than either a bottomup or topdown approach alone, and thus 1378 may be more efficient than either. 1379 A topdownonly approach, however, devolves to linear search through every 1380 possible interpretation in the solution space in an untyped context, and is 1381 thus likely to be inferior to a strictly bottomup approach, though this 1382 hypothesis needs to be empirically validated. 1383 1384 Both Baker and Cormack explicitly generate all possible interpretations of a 1385 given expression; thinking of the set of interpretations of an expression as a 1386 list sorted by cost, this is an eager evaluation of the list. 1387 However, since we generally expect that user programmers will not often use 1388 highcost implicit conversions, one potentially effective way to prune the 1389 search space would be to first find the minimalcost interpretations of any 1390 given subexpression, then to save the resolution progress of the 1391 subexpressions and attempt to resolve the superexpression using only those 1392 subexpression interpretations. 1393 If no valid interpretation of the superexpression can be found, the resolver 1394 would then repeatedly find the nextmostminimal cost interpretations of the 1395 subexpressions and attempt to produce the minimal cost interpretation of the 1396 superexpression. 1397 This process would proceed until all possible subexpression interpretations 1398 have been found and considered. 1399 1400 A middle ground between the eager and lazy approaches can be taken by 1401 considering the lexical order on the cost tuple; essentially, every 1402 interpretation in each of the classes below will be strictly cheaper than any 1403 interpretation in the class after it, so if a mincost valid interpretation 1404 can be found while only generating interpretations in a given class, that 1405 interpretation is guaranteed to be the best possible one: 1406 1407 1. Interpretations without polymorphic functions or implicit conversions 1408 2. Interpretations without polymorphic functions using only safe conversions 1409 3. Interpretations using polymorphic functions without unsafe conversions 1410 4. Interpretations using unsafe conversions 1411 1412 In this lazyeager approach, all the interpretations in one class would be 1413 eagerly generated, while the interpretations in the next class would only be 1414 considered if no match was found in the previous class. 1415 1226 1416 ## Appendix A: Partial and Total Orders ## 1227 1417 The `<=` relation on integers is a commonly known _total order_, and
Note: See TracChangeset
for help on using the changeset viewer.