Changeset a2565a2 for doc/aaron_comp_II


Timestamp:
Jul 27, 2016, 4:45:59 PM (5 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, ctor, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, memory, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc
Children:
b1bd5d38
Parents:
c967ef9
Message:

Rework expression resolution section of Comp II to talk more clearly about axes

File:
1 edited

  • doc/aaron_comp_II/comp_II.tex

    rc967ef9 ra2565a2  
    245245
    246246\section{Expression Resolution}
    247 
    248 \subsection{Argument-Directed}
    249 
    250 
    251 \subsection{Parameter-Directed}
    252 \textbf{TODO: Richard's algorithm isn't Baker (Cormack?), disentangle from this section \ldots}.
    253 The expression resolution algorithm used by the existing iteration of {CFA-CC} is based on Baker's\cite{Baker82} algorithm for overload resolution in Ada.
    254 The essential idea of this algorithm is to first find the possible interpretations of the most deeply nested subexpressions, then to use these interpretations to recursively generate valid interpretations of their superexpressions.
    255 To simplify matters, the only expressions considered in this discussion of the algorithm are function application and literal expressions; other expression types can generally be considered to be variants of one of these for the purposes of the resolver, \eg variables are essentially zero-argument functions.
    256 If we consider expressions as graph nodes with arcs connecting them to their subexpressions, these expressions form a DAG, generated by the algorithm from the bottom up.
    257 Literal expressions are represented by leaf nodes, annotated with the type of the expression, while a function application will have a reference to the function declaration chosen, as well as arcs to the interpretation nodes for its argument expressions; functions are annotated with their return type (or types, in the case of multiple return values).
    258 
    259 \textbf{TODO: Figure}
    260 
    261 Baker's algorithm was designed to account for name overloading; Richard Bilson\cite{Bilson03} extended this algorithm to also handle polymorphic functions, implicit conversions \& multiple return types when designing the original \CFA compiler.
    262 The core of the algorithm is a function which Baker refers to as $gen\_calls$.
    263 $gen\_calls$ takes as arguments the name of a function $f$ and a list containing the set of possible subexpression interpretations $S_j$ for each argument of the function and returns a set of possible interpretations of calling that function on those arguments.
    264 The subexpression interpretations are generally either singleton sets generated by the single valid interpretation of a literal expression, or the results of a previous call to $gen\_calls$.
    265 If there are no valid interpretations of an expression, the set returned by $gen\_calls$ will be empty, at which point resolution can cease, since each subexpression must have at least one valid interpretation to produce an interpretation of the whole expression.
    266 On the other hand, if for some type $T$ there is more than one valid interpretation of an expression with type $T$, all interpretations of that expression with type $T$ can be collapsed into a single \emph{ambiguous expression} of type $T$, since the only way to disambiguate expressions is by their return types.
    267 If a subexpression interpretation is ambiguous, than any expression interpretation containing it will also be ambiguous.
    268 In the variant of this algorithm including implicit conversions, the interpretation of an expression as type $T$ is ambiguous only if there is more than one \emph{minimal-cost} interpretation of the expression as type $T$, as cheaper expressions are always chosen in preference to more expensive ones.
    269 
    270 Given this description of the behaviour of $gen\_calls$, its implementation is quite straightforward: for each function declaration $f_i$ matching the name of the function, consider each of the parameter types $p_j$ of $f_i$, attempting to match the type of an element of $S_j$ to $p_j$ (this may include checking of implicit conversions).
    271 If no such element can be found, there is no valid interpretation of the expression using $f_i$, while if more than one such (minimal-cost) element is found than an ambiguous interpretation with the result type of $f_i$ is produced.
    272 In the \CFA variant, which includes polymorphic functions, it is possible that a single polymorphic function definition $f_i$ can produce multiple valid interpretations by different choices of type variable bindings; these interpretations are unambiguous so long as the return type of $f_i$ is different for each type binding.
    273 If all the parameters $p_j$ of $f_i$ can be uniquely matched to a candidate interpretation, then a valid interpretation based on $f_i$ and those $p_j$ is produced.
    274 $gen\_calls$ collects the produced interpretations for each $f_i$ and returns them; a top level expression is invalid if this list is empty, ambiguous if there is more than one (minimal-cost) result, or if this single result is ambiguous, and valid otherwise.
    275 
    276 In this implementation, resolution of a single top-level expression takes time $O(\ldots)$, where \ldots. \textbf{TODO:} \textit{Look at 2.3.1 in Richard's thesis when working out complexity; I think he does get the Baker algorithm wrong on combinations though, maybe\ldots}
    277 
    278 \textbf{TODO: Basic Lit Review} \textit{Look at 2.4 in Richard's thesis for any possible more-recent citations of Baker\ldots} \textit{Look back at Baker's related work for other papers that look similar to what you're doing, then check their citations as well\ldots} \textit{Look at Richard's citations in 2.3.2 w.r.t. type data structures\ldots}
    279 \textit{CormackWright90 seems to describe a solution for the same problem, mostly focused on how to find the implicit parameters}
     247The expression resolution problem is essentially to determine an optimal matching between some combination of argument interpretations and the parameter list of some overloaded instance of a function; the argument interpretations are produced by recursive invocations of expression resolution, where the base case is zero-argument functions (which are, for purposes of this discussion, semantically equivalent to named variables or constant literal expressions).
     248Assuming that the matching between a function's parameter list and a combination of argument interpretations can be done in $O(p^k)$ time, where $p$ is the number of parameters and $k$ is some positive number, if there are $O(i)$ valid interpretations for each subexpression, there will be $O(i)$ candidate functions and $O(i^p)$ possible argument combinations for each expression, so a single recursive call to expression resolution will take $O(i^{p+1} \cdot p^k)$ time if it compares all combinations.
     249Given this bound, resolution of a single top-level expression tree of depth $d$ takes $O(i^{p+1} \cdot p^{k \cdot d})$ time\footnote{The call tree will have leaves at depth $O(d)$, and each internal node will have $O(p)$ fan-out, producing $O(p^d)$ total recursive calls.}.
     250Expression resolution is somewhat unavoidably exponential in $p$, the number of function parameters, and $d$, the depth of the expression tree, but these values are fixed by the user programmer, and generally bounded by reasonably small constants.
     251$k$, on the other hand, is mostly dependent on the representation of types in the system and the efficiency of type assertion checking; if a candidate argument combination can be compared to a function parameter list in linear time in the length of the list (\ie $k = 1$), then the $p^{k \cdot d}$ term is linear in the input size of the source code for the expression, otherwise the resolution algorithm will exhibit super-linear performance scaling on code containing more-deeply nested expressions.
     252The number of valid interpretations of any subexpression, $i$, is bounded by the number of types in the system, which is possibly infinite, though practical resolution algorithms for \CFA must be able to place some finite bound on $i$, possibly at the expense of type system completeness.
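As a purely illustrative instance of the per-call bound $O(i^{p+1} \cdot p^k)$ given above, the following worked arithmetic assumes $i = 3$ interpretations per subexpression and linear-time matching ($k = 1$); these values are chosen only for the example and are not measurements of any compiler.

```latex
% Assumed values: i = 3, k = 1; p is varied to show the exponential dependence.
\begin{align*}
p = 2:&\quad i^{p+1} \cdot p^k = 3^{3} \cdot 2^{1} = 54 \\
p = 3:&\quad i^{p+1} \cdot p^k = 3^{4} \cdot 3^{1} = 243 \\
p = 3,\ i = 2:&\quad i^{p+1} \cdot p^k = 2^{4} \cdot 3^{1} = 48
\end{align*}
```

In this toy setting, removing a single interpretation per subexpression at $p = 3$ reduces the bound by roughly a factor of five, whereas adding one parameter multiplies it by more than four.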
     253
     254The research goal of this project is to develop a performant expression resolver for \CFA; this analysis suggests two primary areas of investigation to accomplish that end.
     255The first is efficient argument-parameter matching; Bilson\cite{Bilson03} mentions significant optimization opportunities available in the current literature to improve on the existing {CFA-CC} compiler \textbf{TODO:} \textit{look up and lit review}.
     256The second, and likely more fruitful, area of investigation is heuristics and algorithmic approaches to reduce the number of argument interpretations considered in the common case; given the large ($p+1$) exponent on the number of interpretations considered in the runtime analysis, even small reductions here could have a significant effect on overall resolver runtime.
     257The discussion below presents a number of largely orthogonal axes for expression resolution algorithm design to be investigated, noting prior work where applicable.
     258
     259\subsection{Argument-Parameter Matching}
     260The first axis we consider is argument-parameter matching: whether the type matching for a candidate function to a set of candidate arguments is directed by the argument types or the parameter types.
     261
     262\subsubsection{Argument-directed (``Bottom-up'')}
     263Baker's algorithm for expression resolution\cite{Baker82} pre-computes argument candidates, from the leaves of the expression tree up.
     264For each candidate function, Baker attempts to match argument types to parameter types in sequence, failing if any parameter cannot be matched.
     265
     266Bilson\cite{Bilson03} similarly pre-computes argument candidates in the original \CFA compiler, but then explicitly enumerates all possible argument combinations for a multi-parameter function; these argument combinations are matched to the parameter types of the candidate function as a unit rather than individual arguments.
     267This is less efficient than Baker's approach, as the same argument may be compared to the same parameter many times, but allows a more straightforward handling of polymorphic type binding and multiple return types.
     268It is possible the efficiency losses here relative to Baker could be significantly reduced by application of memoization to the argument-parameter type comparisons.
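The memoization idea can be sketched as follows. This is a minimal toy model, not Bilson's implementation: types are plain strings, a match is simple equality, and the names \texttt{matches}, \texttt{resolve\_by\_combination} and \texttt{comparisons} are hypothetical. It enumerates every argument combination and matches each as a unit, while caching the individual argument-parameter comparisons.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy model: types are strings and a "match" is plain equality.
comparisons = 0  # counts how many uncached type comparisons actually run


@lru_cache(maxsize=None)
def matches(arg_type: str, param_type: str) -> bool:
    """Compare one argument type to one parameter type, memoized."""
    global comparisons
    comparisons += 1
    return arg_type == param_type


def resolve_by_combination(param_types, arg_candidates):
    """Enumerate every argument combination and match each one as a unit."""
    valid = []
    for combo in product(*arg_candidates):
        if all(matches(a, p) for a, p in zip(combo, param_types)):
            valid.append(combo)
    return valid


# Assumed example: one two-parameter function, three candidates per argument.
params = ("int", "char")
candidates = [["int", "double", "char"], ["char", "int", "void*"]]
result = resolve_by_combination(params, candidates)
```

With these assumed inputs there are nine argument combinations, but only six distinct argument-parameter pairs are ever compared, so the cached comparison count is bounded by the number of (argument type, parameter type) pairs rather than by the number of combinations.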
     269
     270\subsubsection{Parameter-directed (``Top-down'')}
     271Unlike Baker and Bilson, Cormack's algorithm\cite{Cormack81} requests argument candidates which match the type of each parameter of each candidate function, from the top-level expression down; memoization of these requests is presented as an optimization.
     272As presented, this algorithm requires the result of the expression to have a known type; an algorithm based on Cormack's could reasonably request a candidate set of any return type, though such a set may be quite large.
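A top-down request with memoization might look like the sketch below. It is only in the spirit of the description above and is not Cormack's published algorithm: the declaration table \texttt{DECLS}, the function \texttt{request}, and the encoding of expressions as nested tuples are all assumptions made for illustration, and costs, conversions and polymorphism are ignored.

```python
from functools import lru_cache

# Hypothetical declaration table: name -> list of (return type, parameter types).
DECLS = {
    "f": [("int", ("int",)), ("int", ("char",)), ("char", ("char",))],
    "x": [("char", ())],  # a variable modelled as a zero-argument function
}


@lru_cache(maxsize=None)
def request(expr, wanted):
    """Return the declarations of expr's head whose result type is `wanted`
    and whose arguments can each supply the required parameter type."""
    name, args = expr
    found = []
    for ret, params in DECLS.get(name, []):
        if ret != wanted or len(params) != len(args):
            continue
        # Ask each argument for candidates of exactly the parameter's type.
        per_arg = [request(a, p) for a, p in zip(args, params)]
        if all(per_arg):  # every argument has at least one candidate
            found.append((name, ret, params))
    return tuple(found)


# Expressions are (name, argument-tuple) pairs; this one is f( f( x ) ).
expr = ("f", (("f", (("x", ()),)),))
top = request(expr, "int")
```

Here the request for \texttt{x} as \texttt{char} is issued twice but computed once, which is the saving memoization is meant to provide. Both \texttt{int}-returning declarations of \texttt{f} survive at the top level, so a full resolver would still need a cost model to choose between them.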
     273
     274\subsubsection{Hybrid}
     275This proposal includes the investigation of hybrid top-down/bottom-up argument-parameter matching.
     276A reasonable hybrid approach might be to take a top-down approach when the expression to be matched is known to have a fixed type, and a bottom-up approach in untyped contexts.
     277This may include switches from one type to another at different levels of the expression tree, for instance:
     278\begin{lstlisting}
     279forall(otype T)
     280int f(T x);  // (1)
     281
     282void* f(char y);  // (2)
     283
     284int x = f( f( '!' ) );
     285\end{lstlisting}
     286Here, the outer call to ©f© must have a return type that is (implicitly convertible to) ©int©, so a top-down approach could be used to select \textit{(1)} as the proper interpretation of ©f©. \textit{(1)}'s parameter ©x© here, however, is an unbound type variable, and can thus take a value of any complete type, providing no guidance for the choice of candidate for the inner ©f©. The leaf expression ©'!'©, however, gives us a zero-cost interpretation of the inner ©f© as \textit{(2)}, providing a minimal-cost expression resolution where ©T© is bound to ©void*©.
     287
     288Deciding when to switch between bottom-up and top-down resolution in a hybrid algorithm is a necessarily heuristic process, and though finding good heuristics for it is an open question, one reasonable approach might be to switch from top-down to bottom-up when the number of candidate functions exceeds some threshold.
     289
     290\subsection{Implicit Conversion Application}
     291Baker's\cite{Baker82} and Cormack's\cite{Cormack81} algorithms do not account for implicit conversions\footnote{Baker does briefly comment on an approach for handling implicit conversions.}; both assume that there is at most one valid interpretation of a given expression for each distinct type.
     292Integrating implicit conversion handling into their algorithms provides some choice of implementation approach.
     293
     294\subsubsection{On Parameters}
     295Bilson\cite{Bilson03} did account for implicit conversions in his algorithm, but it is not clear his approach is optimal.
     296His algorithm integrates checking for valid implicit conversions into the argument-parameter matching step, essentially trading more expensive matching for a smaller number of argument interpretations.
     297This approach may result in the same subexpression being checked for a type match with the same type multiple times, though again memoization may mitigate this cost, and this approach will not generate implicit conversions that are not useful to match the containing function.
     298
     299\subsubsection{On Arguments}
     300Another approach would be to generate a set of possible implicit conversions for each set of interpretations of a given argument.
     301This would have the benefit of detecting ambiguous interpretations of arguments at the level of the argument rather than its containing call, and would also never find more than one interpretation of the argument with a given type.
     302On the other hand, this approach may unnecessarily generate argument interpretations that will never match a parameter, wasting work.
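The trade-off can be made concrete with a small sketch of conversion expansion performed on the argument side. Everything here is assumed for illustration: the conversion table \texttt{CONVERSIONS}, its costs, and the function \texttt{expand} are hypothetical, and only single-step conversions are applied.

```python
# Hypothetical conversion table: source type -> {target type: conversion cost}.
CONVERSIONS = {
    "char": {"int": 1, "long": 2},
    "int": {"long": 1},
}


def expand(interpretations):
    """Map each reachable type to (minimal cost, ambiguous?) for one argument.

    `interpretations` is a list of (type, cost) pairs for the argument.
    Only single-step conversions are applied in this sketch.
    """
    best = {}
    for ty, cost in interpretations:
        # The identity "conversion" costs nothing; others come from the table.
        reachable = {ty: 0, **CONVERSIONS.get(ty, {})}
        for target, extra in reachable.items():
            total = cost + extra
            if target not in best or total < best[target][0]:
                best[target] = (total, False)
            elif total == best[target][0]:
                best[target] = (total, True)  # tie at minimal cost
    return best


# Assumed argument with two interpretations: char at cost 0 and int at cost 1.
expanded = expand([("char", 0), ("int", 1)])
```

The result holds exactly one entry per type, and the ties for \texttt{int} and \texttt{long} are flagged as ambiguous at the argument itself rather than at the containing call. The \texttt{long} entry is computed whether or not any candidate function has a \texttt{long} parameter, which is the wasted work noted above.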
     303
     304\subsection{Candidate Set Generation}
     305
     306\subsubsection{Eager}
     307
     308\subsubsection{Lazy}
     309
     310\subsubsection{Stepwise Lazy}
     311
     312%\subsection{Parameter-Directed}
     313%\textbf{TODO: Richard's algorithm isn't Baker (Cormack?), disentangle from this section \ldots}.
     314%The expression resolution algorithm used by the existing iteration of {CFA-CC} is based on Baker's\cite{Baker82} algorithm for overload resolution in Ada.
     315%The essential idea of this algorithm is to first find the possible interpretations of the most deeply nested subexpressions, then to use these interpretations to recursively generate valid interpretations of their superexpressions.
     316%To simplify matters, the only expressions considered in this discussion of the algorithm are function application and literal expressions; other expression types can generally be considered to be variants of one of these for the purposes of the resolver, \eg variables are essentially zero-argument functions.
     317%If we consider expressions as graph nodes with arcs connecting them to their subexpressions, these expressions form a DAG, generated by the algorithm from the bottom up.
     318%Literal expressions are represented by leaf nodes, annotated with the type of the expression, while a function application will have a reference to the function declaration chosen, as well as arcs to the interpretation nodes for its argument expressions; functions are annotated with their return type (or types, in the case of multiple return values).
     319%
     320%\textbf{TODO: Figure}
     321%
     322%Baker's algorithm was designed to account for name overloading; Richard Bilson\cite{Bilson03} extended this algorithm to also handle polymorphic functions, implicit conversions \& multiple return types when designing the original \CFA compiler.
     323%The core of the algorithm is a function which Baker refers to as $gen\_calls$.
     324%$gen\_calls$ takes as arguments the name of a function $f$ and a list containing the set of possible subexpression interpretations $S_j$ for each argument of the function and returns a set of possible interpretations of calling that function on those arguments.
     325%The subexpression interpretations are generally either singleton sets generated by the single valid interpretation of a literal expression, or the results of a previous call to $gen\_calls$.
     326%If there are no valid interpretations of an expression, the set returned by $gen\_calls$ will be empty, at which point resolution can cease, since each subexpression must have at least one valid interpretation to produce an interpretation of the whole expression.
     327%On the other hand, if for some type $T$ there is more than one valid interpretation of an expression with type $T$, all interpretations of that expression with type $T$ can be collapsed into a single \emph{ambiguous expression} of type $T$, since the only way to disambiguate expressions is by their return types.
     328%If a subexpression interpretation is ambiguous, then any expression interpretation containing it will also be ambiguous.
     329%In the variant of this algorithm including implicit conversions, the interpretation of an expression as type $T$ is ambiguous only if there is more than one \emph{minimal-cost} interpretation of the expression as type $T$, as cheaper expressions are always chosen in preference to more expensive ones.
     330%
     331%Given this description of the behaviour of $gen\_calls$, its implementation is quite straightforward: for each function declaration $f_i$ matching the name of the function, consider each of the parameter types $p_j$ of $f_i$, attempting to match the type of an element of $S_j$ to $p_j$ (this may include checking of implicit conversions).
     332%If no such element can be found, there is no valid interpretation of the expression using $f_i$, while if more than one such (minimal-cost) element is found then an ambiguous interpretation with the result type of $f_i$ is produced.
     333%In the \CFA variant, which includes polymorphic functions, it is possible that a single polymorphic function definition $f_i$ can produce multiple valid interpretations by different choices of type variable bindings; these interpretations are unambiguous so long as the return type of $f_i$ is different for each type binding.
     334%If all the parameters $p_j$ of $f_i$ can be uniquely matched to a candidate interpretation, then a valid interpretation based on $f_i$ and those $p_j$ is produced.
     335%$gen\_calls$ collects the produced interpretations for each $f_i$ and returns them; a top level expression is invalid if this list is empty, ambiguous if there is more than one (minimal-cost) result, or if this single result is ambiguous, and valid otherwise.
     336%
     337%In this implementation, resolution of a single top-level expression takes time $O(\ldots)$, where \ldots. \textbf{TODO:} \textit{Look at 2.3.1 in Richard's thesis when working out complexity; I think he does get the Baker algorithm wrong on combinations though, maybe\ldots}
     338%
     339%\textbf{TODO: Basic Lit Review} \textit{Look at 2.4 in Richard's thesis for any possible more-recent citations of Baker\ldots} \textit{Look back at Baker's related work for other papers that look similar to what you're doing, then check their citations as well\ldots} \textit{Look at Richard's citations in 2.3.2 w.r.t. type data structures\ldots}
     340%\textit{CormackWright90 seems to describe a solution for the same problem, mostly focused on how to find the implicit parameters}
    280341
    281342\section{Proposal}
     343\textbf{TODO:} Talk about experimental setup here.
    282344
    283345\section{Completion Timeline}