Changeset a2565a2 for doc/aaron_comp_II/comp_II.tex
 Timestamp:
 Jul 27, 2016, 4:45:59 PM (8 years ago)
 Branches:
 ADT, aaronthesis, armeh, astexperimental, cleanupdtors, ctor, deferred_resn, demangler, enum, forallpointerdecay, jacob/cs343translation, jenkinssandbox, master, memory, newast, newastuniqueexpr, newenv, no_list, persistentindexer, pthreademulation, qualifiedEnum, resolvnew, with_gc
 Children:
 b1bd5d38
 Parents:
 c967ef9
 File:

 1 edited
doc/aaron_comp_II/comp_II.tex
rc967ef9 ra2565a2
245 245
246 246 \section{Expression Resolution}
247
248 \subsection{Argument-Directed}
249
250
251 \subsection{Parameter-Directed}
252 \textbf{TODO: Richard's algorithm isn't Baker (Cormack?), disentangle from this section \ldots}.
253 The expression resolution algorithm used by the existing iteration of {CFACC} is based on Baker's\cite{Baker82} algorithm for overload resolution in Ada.
254 The essential idea of this algorithm is to first find the possible interpretations of the most deeply nested subexpressions, then to use these interpretations to recursively generate valid interpretations of their superexpressions.
255 To simplify matters, the only expressions considered in this discussion of the algorithm are function application and literal expressions; other expression types can generally be considered to be variants of one of these for the purposes of the resolver, \eg variables are essentially zero-argument functions.
256 If we consider expressions as graph nodes with arcs connecting them to their subexpressions, these expressions form a DAG, generated by the algorithm from the bottom up.
257 Literal expressions are represented by leaf nodes, annotated with the type of the expression, while a function application will have a reference to the function declaration chosen, as well as arcs to the interpretation nodes for its argument expressions; functions are annotated with their return type (or types, in the case of multiple return values).
258
259 \textbf{TODO: Figure}
260
261 Baker's algorithm was designed to account for name overloading; Richard Bilson\cite{Bilson03} extended this algorithm to also handle polymorphic functions, implicit conversions \& multiple return types when designing the original \CFA compiler.
262 The core of the algorithm is a function which Baker refers to as $gen\_calls$.
263 $gen\_calls$ takes as arguments the name of a function $f$ and a list containing the set of possible subexpression interpretations $S_j$ for each argument of the function and returns a set of possible interpretations of calling that function on those arguments.
264 The subexpression interpretations are generally either singleton sets generated by the single valid interpretation of a literal expression, or the results of a previous call to $gen\_calls$.
265 If there are no valid interpretations of an expression, the set returned by $gen\_calls$ will be empty, at which point resolution can cease, since each subexpression must have at least one valid interpretation to produce an interpretation of the whole expression.
266 On the other hand, if for some type $T$ there is more than one valid interpretation of an expression with type $T$, all interpretations of that expression with type $T$ can be collapsed into a single \emph{ambiguous expression} of type $T$, since the only way to disambiguate expressions is by their return types.
267 If a subexpression interpretation is ambiguous, then any expression interpretation containing it will also be ambiguous.
268 In the variant of this algorithm including implicit conversions, the interpretation of an expression as type $T$ is ambiguous only if there is more than one \emph{minimal-cost} interpretation of the expression as type $T$, as cheaper expressions are always chosen in preference to more expensive ones.
269
270 Given this description of the behaviour of $gen\_calls$, its implementation is quite straightforward: for each function declaration $f_i$ matching the name of the function, consider each of the parameter types $p_j$ of $f_i$, attempting to match the type of an element of $S_j$ to $p_j$ (this may include checking of implicit conversions).
271 If no such element can be found, there is no valid interpretation of the expression using $f_i$, while if more than one such (minimal-cost) element is found then an ambiguous interpretation with the result type of $f_i$ is produced.
272 In the \CFA variant, which includes polymorphic functions, it is possible that a single polymorphic function definition $f_i$ can produce multiple valid interpretations by different choices of type variable bindings; these interpretations are unambiguous so long as the return type of $f_i$ is different for each type binding.
273 If all the parameters $p_j$ of $f_i$ can be uniquely matched to a candidate interpretation, then a valid interpretation based on $f_i$ and those $p_j$ is produced.
274 $gen\_calls$ collects the produced interpretations for each $f_i$ and returns them; a top-level expression is invalid if this list is empty, ambiguous if there is more than one (minimal-cost) result, or if this single result is ambiguous, and valid otherwise.
275
276 In this implementation, resolution of a single top-level expression takes time $O(\ldots)$, where \ldots. \textbf{TODO:} \textit{Look at 2.3.1 in Richard's thesis when working out complexity; I think he does get the Baker algorithm wrong on combinations though, maybe\ldots}
277
278 \textbf{TODO: Basic Lit Review} \textit{Look at 2.4 in Richard's thesis for any possible more recent citations of Baker\ldots} \textit{Look back at Baker's related work for other papers that look similar to what you're doing, then check their citations as well\ldots} \textit{Look at Richard's citations in 2.3.2 w.r.t.
type data structures\ldots}
279 \textit{CormackWright90 seems to describe a solution for the same problem, mostly focused on how to find the implicit parameters}
247 The expression resolution problem is essentially to determine an optimal matching between some combination of argument interpretations and the parameter list of some overloaded instance of a function; the argument interpretations are produced by recursive invocations of expression resolution, where the base case is zero-argument functions (which are, for purposes of this discussion, semantically equivalent to named variables or constant literal expressions).
248 Assume that the matching between a function's parameter list and a combination of argument interpretations can be done in $O(p^k)$ time, where $p$ is the number of parameters and $k$ is some positive number; if there are $O(i)$ valid interpretations for each subexpression, there will be $O(i)$ candidate functions and $O(i^p)$ possible argument combinations for each expression, so a single recursive call to expression resolution will take $O(i^{p+1} \cdot p^k)$ time if it compares all combinations.
249 Given this bound, resolution of a single top-level expression tree of depth $d$ takes $O(i^{p+1} \cdot p^{k \cdot d})$ time\footnote{The call tree will have leaves at depth $O(d)$, and each internal node will have $O(p)$ fan-out, producing $O(p^d)$ total recursive calls.}.
250 Expression resolution is somewhat unavoidably exponential in $p$, the number of function parameters, and $d$, the depth of the expression tree, but these values are fixed by the programmer, and generally bounded by reasonably small constants.
251 $k$, on the other hand, is mostly dependent on the representation of types in the system and the efficiency of type assertion checking; if a candidate argument combination can be compared to a function parameter list in linear time in the length of the list (\ie $k = 1$), then the $p^{k \cdot d}$ term is linear in the input size of the source code for the expression, otherwise the resolution algorithm will exhibit superlinear performance scaling on code containing more deeply nested expressions.
252 The number of valid interpretations of any subexpression, $i$, is bounded by the number of types in the system, which is possibly infinite, though practical resolution algorithms for \CFA must be able to place some finite bound on $i$, possibly at the expense of type system completeness.
253
254 The research goal of this project is to develop a performant expression resolver for \CFA; this analysis suggests two primary areas of investigation to accomplish that end.
255 The first is efficient argument-parameter matching; Bilson\cite{Bilson03} mentions significant optimization opportunities available in the current literature to improve on the existing {CFACC} compiler \textbf{TODO:} \textit{look up and lit review}.
256 The second, and likely more fruitful, area of investigation is heuristics and algorithmic approaches to reduce the number of argument interpretations considered in the common case; given the large ($p+1$) exponent on the number of interpretations considered in the runtime analysis, even small reductions here could have a significant effect on overall resolver runtime.
257 The discussion below presents a number of largely orthogonal axes for expression resolution algorithm design to be investigated, noting prior work where applicable.
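To make the per-call bound concrete, the following minimal Python sketch evaluates $i^{p+1} \cdot p^k$ with constant factors dropped, and enumerates the $i^p$ argument combinations that an exhaustive resolver would compare. The function names and the four type names are hypothetical, chosen only for illustration; they do not come from the existing compiler.

```python
from itertools import product

def call_cost_bound(i: int, p: int, k: int = 1) -> int:
    """Evaluate the i^(p+1) * p^k bound for one resolution call (constants dropped)."""
    return i ** (p + 1) * p ** k

def enumerate_combinations(interps_per_arg):
    """List every argument combination, as an exhaustive resolver would."""
    return list(product(*interps_per_arg))

# Three arguments, each with four candidate interpretations (hypothetical type names).
args = [["int", "long", "float", "double"]] * 3
print(len(enumerate_combinations(args)))  # 4^3 = 64 combinations per candidate function
print(call_cost_bound(4, 3))              # 4^4 * 3 = 768
print(call_cost_bound(3, 3))              # 3^4 * 3 = 243
```

Under these assumptions, pruning a single interpretation per subexpression (from four to three) cuts the bound for a three-parameter call by roughly two thirds, which is the effect the second area of investigation aims to exploit.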
258
259 \subsection{Argument-Parameter Matching}
260 The first axis we consider is argument-parameter matching: whether the type matching for a candidate function to a set of candidate arguments is directed by the argument types or the parameter types.
261
262 \subsubsection{Argument-directed (``Bottom-up'')}
263 Baker's algorithm for expression resolution\cite{Baker82} precomputes argument candidates, from the leaves of the expression tree up.
264 For each candidate function, Baker attempts to match argument types to parameter types in sequence, failing if any parameter cannot be matched.
265
266 Bilson\cite{Bilson03} similarly precomputes argument candidates in the original \CFA compiler, but then explicitly enumerates all possible argument combinations for a multi-parameter function; these argument combinations are matched to the parameter types of the candidate function as a unit rather than individual arguments.
267 This is less efficient than Baker's approach, as the same argument may be compared to the same parameter many times, but allows a more straightforward handling of polymorphic type binding and multiple return types.
268 It is possible the efficiency losses here relative to Baker could be significantly reduced by application of memoization to the argument-parameter type comparisons.
269
270 \subsubsection{Parameter-directed (``Top-down'')}
271 Unlike Baker and Bilson, Cormack's algorithm\cite{Cormack81} requests argument candidates which match the type of each parameter of each candidate function, from the top-level expression down; memoization of these requests is presented as an optimization.
272 As presented, this algorithm requires the result of the expression to have a known type; an algorithm based on Cormack's could reasonably request a candidate set of any return type, though such a set may be quite large.
273
274 \subsubsection{Hybrid}
275 This proposal includes the investigation of hybrid top-down/bottom-up argument-parameter matching.
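For reference, the two pure strategies that a hybrid would combine can be sketched side by side. The Python below is a simplified illustration only, not Baker's, Bilson's, or Cormack's actual algorithm. It assumes exact type matching with no implicit conversions or polymorphism, a hypothetical overload table `OVERLOADS`, and expressions encoded as nested tuples. The top-down variant returns only a yes/no answer, and `lru_cache` stands in for the memoization of requests.

```python
from functools import lru_cache

# Hypothetical overload table: name -> tuple of (parameter types, return type).
OVERLOADS = {
    "f": ((("char",), "ptr"), (("ptr",), "int")),
}

# Expressions are nested tuples: ("lit", type) or ("call", name, (args...)).

def resolve_bottom_up(expr):
    """Argument-directed: return every result type expr can have."""
    if expr[0] == "lit":
        return {expr[1]}
    _, name, args = expr
    candidates = [resolve_bottom_up(a) for a in args]  # precomputed from the leaves up
    results = set()
    for params, ret in OVERLOADS.get(name, ()):
        # Match arguments to parameters in sequence, failing on the first miss.
        if len(params) == len(candidates) and all(
            p in c for p, c in zip(params, candidates)
        ):
            results.add(ret)
    return results

@lru_cache(maxsize=None)  # memoize repeated (expression, type) requests
def resolve_top_down(expr, want):
    """Parameter-directed: can expr be interpreted with result type `want`?"""
    if expr[0] == "lit":
        return expr[1] == want
    _, name, args = expr
    return any(
        ret == want
        and len(params) == len(args)
        and all(resolve_top_down(a, p) for a, p in zip(args, params))
        for params, ret in OVERLOADS.get(name, ())
    )

nested = ("call", "f", (("call", "f", (("lit", "char"),)),))
print(resolve_bottom_up(nested))        # {'int'}
print(resolve_top_down(nested, "int"))  # True
print(resolve_top_down(nested, "ptr"))  # False
```

The bottom-up function needs no expected type but computes every interpretation of every subexpression, whereas the top-down function explores only interpretations that could satisfy the requested type and therefore needs that type supplied by the context.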
276 A reasonable hybrid approach might be to take a top-down approach when the expression to be matched is known to have a fixed type, and a bottom-up approach in untyped contexts.
277 This may include switches from one approach to the other at different levels of the expression tree, for instance:
278 \begin{lstlisting}
279 forall(otype T)
280 int f(T x); // (1)
281
282 void* f(char y); // (2)
283
284 int x = f( f( '!' ) );
285 \end{lstlisting}
286 Here, the outer call to ©f© must have a return type that is (implicitly convertible to) ©int©, so a top-down approach could be used to select \textit{(1)} as the proper interpretation of ©f©. \textit{(1)}'s parameter ©x© here, however, is an unbound type variable, and can thus take a value of any complete type, providing no guidance for the choice of candidate for the inner ©f©. The leaf expression ©'!'©, however, gives us a zero-cost interpretation of the inner ©f© as \textit{(2)}, providing a minimal-cost expression resolution where ©T© is bound to ©void*©.
287
288 Deciding when to switch between bottom-up and top-down resolution in a hybrid algorithm is a necessarily heuristic process, and though finding good heuristics for it is an open question, one reasonable approach might be to switch from top-down to bottom-up when the number of candidate functions exceeds some threshold.
289
290 \subsection{Implicit Conversion Application}
291 Baker's\cite{Baker82} and Cormack's\cite{Cormack81} algorithms do not account for implicit conversions\footnote{Baker does briefly comment on an approach for handling implicit conversions.}; both assume that there is at most one valid interpretation of a given expression for each distinct type.
292 Integrating implicit conversion handling into their algorithms provides some choice of implementation approach.
293
294 \subsubsection{On Parameters}
295 Bilson\cite{Bilson03} did account for implicit conversions in his algorithm, but it is not clear his approach is optimal.
296 His algorithm integrates checking for valid implicit conversions into the argument-parameter matching step, essentially trading more expensive matching for a smaller number of argument interpretations.
297 This approach may result in the same subexpression being checked for a type match with the same type multiple times, though again memoization may mitigate this cost, and this approach will not generate implicit conversions that are not useful to match the containing function.
298
299 \subsubsection{On Arguments}
300 Another approach would be to generate a set of possible implicit conversions for each set of interpretations of a given argument.
301 This would have the benefit of detecting ambiguous interpretations of arguments at the level of the argument rather than its containing call, and would also never find more than one interpretation of the argument with a given type.
302 On the other hand, this approach may unnecessarily generate argument interpretations that will never match a parameter, wasting work.
303
304 \subsection{Candidate Set Generation}
305
306 \subsubsection{Eager}
307
308 \subsubsection{Lazy}
309
310 \subsubsection{Stepwise Lazy}
311
312 %\subsection{Parameter-Directed}
313 %\textbf{TODO: Richard's algorithm isn't Baker (Cormack?), disentangle from this section \ldots}.
314 %The expression resolution algorithm used by the existing iteration of {CFACC} is based on Baker's\cite{Baker82} algorithm for overload resolution in Ada.
315 %The essential idea of this algorithm is to first find the possible interpretations of the most deeply nested subexpressions, then to use these interpretations to recursively generate valid interpretations of their superexpressions.
316 %To simplify matters, the only expressions considered in this discussion of the algorithm are function application and literal expressions; other expression types can generally be considered to be variants of one of these for the purposes of the resolver, \eg variables are essentially zero-argument functions.
317 %If we consider expressions as graph nodes with arcs connecting them to their subexpressions, these expressions form a DAG, generated by the algorithm from the bottom up.
318 %Literal expressions are represented by leaf nodes, annotated with the type of the expression, while a function application will have a reference to the function declaration chosen, as well as arcs to the interpretation nodes for its argument expressions; functions are annotated with their return type (or types, in the case of multiple return values).
319 %
320 %\textbf{TODO: Figure}
321 %
322 %Baker's algorithm was designed to account for name overloading; Richard Bilson\cite{Bilson03} extended this algorithm to also handle polymorphic functions, implicit conversions \& multiple return types when designing the original \CFA compiler.
323 %The core of the algorithm is a function which Baker refers to as $gen\_calls$.
324 %$gen\_calls$ takes as arguments the name of a function $f$ and a list containing the set of possible subexpression interpretations $S_j$ for each argument of the function and returns a set of possible interpretations of calling that function on those arguments.
325 %The subexpression interpretations are generally either singleton sets generated by the single valid interpretation of a literal expression, or the results of a previous call to $gen\_calls$.
326 %If there are no valid interpretations of an expression, the set returned by $gen\_calls$ will be empty, at which point resolution can cease, since each subexpression must have at least one valid interpretation to produce an interpretation of the whole expression.
327 %On the other hand, if for some type $T$ there is more than one valid interpretation of an expression with type $T$, all interpretations of that expression with type $T$ can be collapsed into a single \emph{ambiguous expression} of type $T$, since the only way to disambiguate expressions is by their return types.
328 %If a subexpression interpretation is ambiguous, then any expression interpretation containing it will also be ambiguous.
329 %In the variant of this algorithm including implicit conversions, the interpretation of an expression as type $T$ is ambiguous only if there is more than one \emph{minimal-cost} interpretation of the expression as type $T$, as cheaper expressions are always chosen in preference to more expensive ones.
330 %
331 %Given this description of the behaviour of $gen\_calls$, its implementation is quite straightforward: for each function declaration $f_i$ matching the name of the function, consider each of the parameter types $p_j$ of $f_i$, attempting to match the type of an element of $S_j$ to $p_j$ (this may include checking of implicit conversions).
332 %If no such element can be found, there is no valid interpretation of the expression using $f_i$, while if more than one such (minimal-cost) element is found then an ambiguous interpretation with the result type of $f_i$ is produced.
333 %In the \CFA variant, which includes polymorphic functions, it is possible that a single polymorphic function definition $f_i$ can produce multiple valid interpretations by different choices of type variable bindings; these interpretations are unambiguous so long as the return type of $f_i$ is different for each type binding.
334 %If all the parameters $p_j$ of $f_i$ can be uniquely matched to a candidate interpretation, then a valid interpretation based on $f_i$ and those $p_j$ is produced.
335 %$gen\_calls$ collects the produced interpretations for each $f_i$ and returns them; a top-level expression is invalid if this list is empty, ambiguous if there is more than one (minimal-cost) result, or if this single result is ambiguous, and valid otherwise.
336 %
337 %In this implementation, resolution of a single top-level expression takes time $O(\ldots)$, where \ldots. \textbf{TODO:} \textit{Look at 2.3.1 in Richard's thesis when working out complexity; I think he does get the Baker algorithm wrong on combinations though, maybe\ldots}
338 %
339 %\textbf{TODO: Basic Lit Review} \textit{Look at 2.4 in Richard's thesis for any possible more recent citations of Baker\ldots} \textit{Look back at Baker's related work for other papers that look similar to what you're doing, then check their citations as well\ldots} \textit{Look at Richard's citations in 2.3.2 w.r.t. type data structures\ldots}
340 %\textit{CormackWright90 seems to describe a solution for the same problem, mostly focused on how to find the implicit parameters}
280 341
281 342 \section{Proposal}
343 \textbf{TODO:} Talk about experimental setup here.
282 344
283 345 \section{Completion Timeline}