Changeset f728971

Timestamp:

Feb 20, 2019, 2:00:37 PM (7 years ago)

Author:

Aaron Moss <a3moss@…>

Branches:

ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum

Children:

a2971cc

Parents:

95c0ebe (diff), 7e9fa47 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.

Message:

Merge branch 'aaron-thesis' of plg.uwaterloo.ca:software/cfa/cfa-cc into aaron-thesis

Location:

doc/theses/aaron_moss_PhD/phd

Files:

27 added
6 edited

Makefile (modified) (2 diffs)
evaluation/cfa-cc/cfa-bu.csv (added)
evaluation/cfa-cc/cfa-co.csv (added)
evaluation/cfa-cc/cfa-dca.csv (added)
evaluation/cfa-cc/cfa-def.csv (added)
evaluation/cfa-cc/cfa-imm.csv (added)
evaluation/cfa-mem-by-time.tsv (added)
evaluation/cfa-mem.tsv (added)
evaluation/cfa-plots.gp (added)
evaluation/cfa-time.tsv (added)
evaluation/data.xlsx (modified)
evaluation/mem-by-max-assns.tsv (added)
evaluation/per-prob-scatter.gp (added)
evaluation/per-prob.gp (added)
evaluation/per-prob.tsv (added)
evaluation/per_prob/imgui-per-prob.csv (added)
evaluation/per_prob/io1-per-prob.csv (added)
evaluation/per_prob/io2-per-prob.csv (added)
evaluation/per_prob/kernel-per-prob.csv (added)
evaluation/per_prob/math1-per-prob.csv (added)
evaluation/per_prob/math2-per-prob.csv (added)
evaluation/per_prob/math3-per-prob.csv (added)
evaluation/per_prob/math4-per-prob.csv (added)
evaluation/per_prob/minmax-per-prob.csv (added)
evaluation/per_prob/preemption-per-prob.csv (added)
evaluation/per_prob/rational-per-prob.csv (added)
evaluation/per_prob/searchsort-per-prob.csv (added)
evaluation/per_prob/swap-per-prob.csv (added)
evaluation/time-by-max-assns.tsv (added)
experiments.tex (modified) (4 diffs)
figures/safe-conv-graph.eps (modified)
figures/safe-conv-graph.odg (modified)
resolution-heuristics.tex (modified) (6 diffs)

doc/theses/aaron_moss_PhD/phd/Makefile

-              r95c0ebe
+              rf728971
 generic-timing \
 tests-completed \
+per-prob-histo \
+per-prob-depth \
+cfa-time \
+}
 …
         gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/algo-summary.gp
+per-prob-histo.tex : per-prob.gp per-prob.tsv ${BUILD}
+        gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/per-prob.gp
+per-prob-depth.tex : per-prob-scatter.gp ${BUILD}
+        gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/per-prob-scatter.gp
+cfa-time.tex : cfa-plots.gp cfa-time.tsv cfa-mem.tsv ${BUILD}
+        gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/cfa-plots.gp
 ${BUILD}:
         mkdir -p ${BUILD}

doc/theses/aaron_moss_PhD/phd/experiments.tex

-              r95c0ebe
+              rf728971
 \TODO{test performance; shouldn't be too hard to change \texttt{resolveAssertions} to use unification}
 \section{Prototype Experiments}
+\section{Prototype Experiments} \label{proto-exp-sec}
 The primary performance experiments for this thesis were conducted using the resolver prototype on problem instances generated from actual \CFA{} code using the method described in Section~\ref{rp-features-sec}.
 …
 Terminal output was suppressed for all tests to avoid confounding factors in the timing results, and all tests were run three times in series, with the median result reported in all cases.
 The medians are representative data points; considering test cases that took at least 0.2~s to run, the average run was within 2\% of the reported median runtime, and no run diverged by more than 20\% of median runtime or 5.5~s.
+The memory results are even more consistent, with no run exceeding 2\% difference from median in peak resident set size, and 93\% of tests not recording any difference within the 1~KB granularity of the measurement software.
+All tests were run on a machine with 128~GB of RAM and 64 cores running at 2.2~GHz.
 As a matter of experimental practicality, test runs which exceeded 8~GB of peak resident memory usage were excluded from the data set.
 …
 \section{Instance Difficulty}
+\section{\CFA{} Results}
+To characterize the difficulty of expression resolution problem instances, the test suites must be explored at a finer granularity.
+As discussed in Section~\ref{resn-analysis-sec}, a single top-level expression is the fundamental problem instance for resolution, yet the test inputs discussed above are composed of thousands of top-level expressions, like the actual source code they are derived from.
+To pull out the effects of these individual problems, I instrumented the resolver prototype to time resolution for each expression, and also report some relevant properties of the expression.
+This instrumented resolver was then run on a set of difficult test instances; to limit the data collection task, these runs were limited to the best-performing \textsc{bu-dca-per} algorithm and test inputs which that algorithm took more than 1~s to complete.
+The 13 test inputs thus selected contain 20632 top-level expressions between them, which are separated into order-of-magnitude bins by runtime in Figure~\ref{per-prob-histo-fig}.
+As can be seen from this figure, overall runtime is dominated by a few particularly difficult problem instances --- the 60\% of expressions which resolve in under 0.1~ms collectively take less time to resolve than any of the 0.2\% of expressions which take at least 100~ms to resolve.
+On the other hand, the 46 expressions in that 0.2\% take 38\% of the overall time in this difficult test suite, while the 201 expressions that take between 10 and 100~ms to resolve consume another 30\%.
+\begin{figure}
+        \centering
+        \input{per-prob-histo}
+        \caption[Histogram of top-level expressions]{Histogram of top-level expression resolution runtime, binned by order-of-magnitude. The left series counts the expressions in each bin according to the left axis, while the right series reports the summed runtime of resolution for all expressions in that bin. Note that both y-axes are log-scaled.} \label{per-prob-histo-fig}
+\end{figure}
+Since the top centile of expression resolution instances requires approximately two-thirds of the resolver's time, optimizing the resolver for specific hard problem instances has proven to be an effective technique for reducing overall runtime.
+The data below indicate that the number of assertions necessary to resolve an expression has the greatest effect on runtime, as seen in
+Figure~\ref{per-prob-assns-fig}.
+However, since the number of assertions required is only known once resolution is finished, the most-promising pre-resolution metric of difficulty is the nesting depth of the expression; as seen in Figure~\ref{per-prob-depth-fig}, expressions of depth $> 10$ in this dataset are uniformly difficult.
+Figure~\ref{per-prob-subs-fig} presents a similar pattern for number of subexpressions, though given that the expensive tail of problem instances occurs at approximately twice the depth values, it is reasonable to believe that the difficult expressions in question are deeply-nested invocations of binary functions rather than wider but shallowly-nested expressions.
+% TODO statistics to tease out difficulty? Is ANOVA the right keyword?
+% TODO maybe metrics to sum number of poly-overloads invoked
+\begin{figure}
+\centering
+\input{per-prob-assns}
+\caption[Top-level expression resolution time by number of assertions resolved.]{Top-level expression resolution time by number of assertions resolved. Note log scales on both axes.} \label{per-prob-assns-fig}
+\end{figure}
+\begin{figure}
+\centering
+\input{per-prob-depth}
+\caption[Top-level expression resolution time by maximum nesting depth of expression.]{Top-level expression resolution time by maximum nesting depth of expression. Note log scales on both axes.} \label{per-prob-depth-fig}
+\end{figure}
+\begin{figure}
+\centering
+\input{per-prob-subs}
+\caption[Top-level expression resolution time by number of subexpressions.]{Top-level expression resolution time by number of subexpressions. Note log scales on both axes.} \label{per-prob-subs-fig}
+\end{figure}
+\section{\CFA{} Results} \label{cfa-results-sec}
+I have integrated most of the algorithmic techniques discussed in this chapter into \CFACC{}.
+This integration took place over a period of months while \CFACC{} was under active development on a number of other fronts, so it is not possible to completely isolate the effects of the algorithmic changes, but I have generated some data.
+To generate this data, representative commits from the \texttt{git} history of the project were checked out and compiled, then run on the same machine used for the resolver prototype experiments discussed in Section~\ref{proto-exp-sec}.
+To negate the effects of changes to the \CFA{} standard library on the timing results, 55 test files from the test suite of the oldest \CFA{} variant were compiled with the \texttt{-E} flag to inline their library dependencies, and these inlined files were used to test the remaining \CFACC{} versions.
+I performed two rounds of modification to \CFACC{}; the first round moved from Bilson's original combined-bottom-up algorithm to an un-combined bottom-up algorithm, denoted \textsc{cfa-co} and \textsc{cfa-bu}, respectively.
+A top-down algorithm was not attempted in \CFACC{} due to its poor performance in the prototype.
+The second round of modifications addressed assertion satisfaction, taking Bilson's original \textsc{cfa-imm} algorithm, and iteratively modifying it, first to use the deferred approach \textsc{cfa-def}, then caching those results in the \textsc{cfa-dca} algorithm.
+The new environment data structures discussed in Section~\ref{proto-exp-sec} have not been successfully merged into \CFACC{} due to their dependencies on the garbage-collection framework in the prototype; I spent several months modifying \CFACC{} to use similar garbage collection, but because \CFACC{} was not designed to use such memory management, the performance of the modified compiler was non-viable.
+It is possible that the persistent union-find environment could be modified to use a reference-counted pointer internally without changing the entire memory-management framework of \CFACC{}, but such an attempt is left to future work.
+As can be seen in Figures~\ref{cfa-time-fig} and~\ref{cfa-mem-fig}, which show the time and peak memory results for these five versions of \CFACC{} on the same test suite, assertion resolution dominates total resolution cost, with the \textsc{cfa-def} and \textsc{cfa-dca} variants running consistently faster than the others on more expensive test cases.
+The results from \CFACC{} do not exactly mirror those from the prototype; I conjecture this is mostly due to the different memory-management schemes and sorts of data required to run type unification and assertion satisfaction calculations, as \CFACC{} performance has proven to be particularly sensitive to the amount of heap allocation performed.
+This data also shows a noticeable regression in compiler performance in the eleven months between \textsc{cfa-bu} and \textsc{cfa-imm}; this regression is not due to expression resolution, as no integration work happened in this time, but I am unable to ascertain its actual cause.
+It should also be noted with regard to the peak memory results in Figure~\ref{cfa-mem-fig} that the peak memory usage does not always occur during the resolution phase of the compiler.
+\begin{figure}
+\centering
+\input{cfa-time}
+\caption[\CFACC{} runtime against \textsc{cfa-co} baseline.]{\CFACC{} runtime against \textsc{cfa-co} baseline. Note log scales on both axes.} \label{cfa-time-fig}
+\end{figure}
+\begin{figure}
+\centering
+\input{cfa-mem}
+\caption[\CFACC{} peak memory usage against \textsc{cfa-co} baseline runtime.]{\CFACC{} peak memory usage against \textsc{cfa-co} baseline runtime. Note log scales on both axes.} \label{cfa-mem-fig}
+\end{figure}
 % use Jenkins daily build logs to rebuild speedup graph with more data
 …
 % look back at Resolution Algorithms section for threads to tie up "does the algorithm look like this?"
+\section{Conclusion}
+As can be seen from the prototype results, per-expression benchmarks, and \CFACC{} results, the dominant factor in the cost of \CFA{} expression resolution is assertion satisfaction.
+Reducing the total number of assertion satisfaction problems solved, as in the deferred satisfaction algorithm, is consistently effective at reducing runtime, and caching results of these satisfaction problems has shown promise in the prototype system.
+The results presented here also demonstrate that a bottom-up approach to expression resolution is superior to top-down, settling an open question from Baker~\cite{Baker82}.
+The persistent union-find type environment introduced in Chapter~\ref{env-chap} has also been demonstrated to be a modest performance improvement on the na\"{\i}ve approach.
+Given the consistently strong performance of the \textsc{bu-dca-imm} and \textsc{bu-dca-per} variants of the resolver prototype, the results in this chapter demonstrate that it is possible to develop a \CFA{} compiler with acceptable runtime performance for widespread use, an important and previously unaddressed consideration for the practical viability of the language.
+However, the less-marked improvement in Section~\ref{cfa-results-sec} from retrofitting these algorithmic changes onto the existing compiler leaves the actual development of a performant \CFA{} compiler to future work.
+Characterization and elimination of the performance deficits in the existing \CFACC{} has proven difficult, though runtime is generally dominated by the expression resolution phase; as such, building a new \CFA{} compiler based on the resolver prototype contributed by this work may prove to be an effective strategy.
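The order-of-magnitude binning described for the per-expression histogram (Figure per-prob-histo-fig) can be sketched as follows. This is a minimal Python illustration, not the gnuplot script added in this changeset; the function name `bin_by_magnitude` and the sample runtimes are hypothetical, and every runtime is assumed to be a positive value in milliseconds.

```python
import math
from collections import defaultdict


def bin_by_magnitude(runtimes_ms):
    """Group runtimes into order-of-magnitude bins.

    Returns {exponent: (count, total_ms)}, where a runtime t falls in the
    bin with exponent floor(log10(t)), i.e. 10**exponent <= t < 10**(exponent + 1).
    Assumes every runtime is strictly positive (hypothetical input format).
    """
    bins = defaultdict(lambda: [0, 0.0])
    for t in runtimes_ms:
        if t <= 0:
            raise ValueError("runtimes must be positive")
        exponent = math.floor(math.log10(t))
        bins[exponent][0] += 1   # left series: number of expressions in the bin
        bins[exponent][1] += t   # right series: summed runtime of the bin
    return {e: (count, total) for e, (count, total) in sorted(bins.items())}


# Hypothetical sample: many cheap expressions, one expensive one.
sample = [0.05, 0.07, 0.5, 150.0]
histogram = bin_by_magnitude(sample)
```

With such a table, the two series in the figure correspond to the first and second entry of each bin, which makes it easy to compare how many expressions a bin holds against how much of the total time it consumes.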

doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

-              r95c0ebe
+              rf728971
 \begin{itemize}
 \item If either operand is a floating-point type, the common type is the size of the largest floating-point type. If either operand is !_Complex!, the common type is also !_Complex!.
 \item If both operands are of integral type, the common type has the same size\footnote{Technically, the C standard defines a notion of \emph{rank}, a distinct value for each \lstinline{signed} and \lstinline{unsigned} pair; integral types of the same size thus may have distinct ranks. For instance, if \lstinline{int} and \lstinline{long} are the same size, \lstinline{long} will have greater rank. The standard-defined types are declared to have greater rank than any types of the same size added as compiler extensions.} as the larger type.
+\item If both operands are of integral type, the common type has the same size\footnote{Technically, the C standard defines a notion of \emph{rank} in \cite[\S{}6.3.1.1]{C11}, a distinct value for each \lstinline{signed} and \lstinline{unsigned} pair; integral types of the same size thus may have distinct ranks. For instance, if \lstinline{int} and \lstinline{long} are the same size, \lstinline{long} will have greater rank. The standard-defined types are declared to have greater rank than any types of the same size added as compiler extensions.} as the larger type.
 \item If the operands have opposite signedness, the common type is !signed! if the !signed! operand is strictly larger, or !unsigned! otherwise. If the operands have the same signedness, the common type shares it.
 \end{itemize}
 Beginning with the work of Bilson\cite{Bilson03}, \CFA{} has defined a \emph{conversion cost} for each function call in a way that generalizes C's conversion rules.
+Beginning with the work of Bilson\cite{Bilson03}, \CFA{} defines a \emph{conversion cost} for each function call in a way that generalizes C's conversion rules.
 Loosely defined, the conversion cost counts the implicit conversions utilized by an interpretation.
 With more specificity, the cost is a lexicographically-ordered tuple, where each element corresponds to a particular kind of conversion.
 In Bilson's \CFA{} design, conversion cost is a 3-tuple, $(unsafe, poly, safe)$, where $unsafe$ is the count of unsafe (narrowing) conversions, $poly$ is the count of polymorphic type bindings, and $safe$ is the sum of the degree of safe (widening) conversions.
 Degree of safe conversion is calculated as path weight in a weighted directed graph of safe conversions between types; the current version of this graph is in Figure~\ref{safe-conv-graph-fig}.
 The safe conversion graph designed such that the common type $c$ of two types $u$ and $v$ is compatible with the C standard definitions from \cite[\S{}6.3.1.8]{C11} and can be calculated as the unique type minimizing the sum of the path weights of $\overrightarrow{uc}$ and $\overrightarrow{vc}$.
+In Bilson's design, conversion cost is a 3-tuple, $(unsafe, poly, safe)$, where $unsafe$ is the count of unsafe (narrowing) conversions, $poly$ is the count of polymorphic type bindings, and $safe$ is the sum of the degree of safe (widening) conversions.
+Degree of safe conversion is calculated as path weight in a directed graph of safe conversions between types; both Bilson's version and the current version of this graph are in Figure~\ref{safe-conv-graph-fig}.
+The safe conversion graph is designed such that the common type $c$ of two types $u$ and $v$ is compatible with the C standard definitions from \cite[\S{}6.3.1.8]{C11} and can be calculated as the unique type minimizing the sum of the path weights of $\overrightarrow{uc}$ and $\overrightarrow{vc}$.
 The following example lists the cost in the Bilson model of calling each of the following functions with two !int! parameters:
 \begin{cfa}
 void f(char, long); $\C{// (1,0,1)}$
+void f(short, long); $\C{// (1,0,1)}$
 forall(otype T) void f(T, long); $\C{// (0,1,1)}$
 void f(long, long); $\C{// (0,0,2)}$
 …
 \end{cfa}
+Note that safe and unsafe conversions are handled differently; \CFA{} counts distance of safe conversions (\eg{} !int! to !long! is cheaper than !int! to !unsigned long!), while only counting the number of unsafe conversions (\eg{} !int! to !char! and !int! to !short! both have unsafe cost 1).
+Note that safe and unsafe conversions are handled differently; \CFA{} counts distance of safe conversions (\eg{} !int! to !long! is cheaper than !int! to !unsigned long!), while only counting the number of unsafe conversions (\eg{} !int! to !char! and !int! to !short! both have unsafe cost 1, as in the first two declarations above).
+These costs are summed over the parameters in a call; in the example above, the costs of the two !int! to !long! conversions for the fourth declaration sum to the same cost as the one !int! to !unsigned long! conversion in the fifth.
 As part of adding reference types to \CFA{} (see Section~\ref{type-features-sec}), Schluntz added a new $reference$ element to the cost tuple, which counts the number of implicit reference-to-rvalue conversions performed so that candidate interpretations can be distinguished by how closely they match the nesting of reference types; since references are meant to act almost indistinguishably from lvalues, this $reference$ element is the least significant in the lexicographic comparison of cost tuples.
 I have also refined the \CFA{} cost model as part of this thesis work.
+Bilson's \CFA{} cost model includes the cost of polymorphic type bindings from a function's type assertions in the $poly$ element of the cost tuple; this has the effect of making more-constrained functions more expensive than less-constrained functions.
+Bilson's \CFA{} cost model includes the cost of polymorphic type bindings from a function's type assertions in the $poly$ element of the cost tuple; this has the effect of making more-constrained functions more expensive than less-constrained functions, as in the following example:
+\begin{cfa}
+forall(dtype T | { T& ++?(T&); }) T& advance(T&, int);
+forall(dtype T | { T& ++?(T&); T& ?+=?(T&, int); }) T& advance(T&, int);
+\end{cfa}
+In resolving a call to !advance!, the binding to the !T&! parameter in the assertions is added to the $poly$ cost in Bilson's model.
 However, type assertions actually make a function \emph{less} polymorphic, and as such functions with more type assertions should be preferred in type resolution.
+As an example, some iterator-based algorithms can work on a forward iterator that only provides an increment operator, but are more efficient on a random-access iterator that can be incremented by an arbitrary number of steps in a single operation.
+The random-access iterator has more type constraints, but should be chosen whenever those constraints can be satisfied.
+As such, I have added a $specialization$ element to the \CFA{} cost tuple, the values of which are always negative.
+Each type assertion subtracts 1 from $specialization$, so that more-constrained functions will cost less and thus be chosen over less-constrained functions, all else being equal.
+In the example above, the more-constrained second function can be implemented more efficiently, and as such should be chosen whenever its added constraint can be satisfied.
+As such, a $specialization$ element is now included in the \CFA{} cost tuple, the values of which are always negative.
+Each type assertion subtracts 1 from $specialization$, so that more-constrained functions cost less, and thus are chosen over less-constrained functions, all else being equal.
 A more sophisticated design would define a partial order over sets of type assertions by set inclusion (\ie{} one function would only cost less than another if it had a strict superset of assertions,  rather than just more total assertions), but I did not judge the added complexity of computing and testing this order to be worth the gain in specificity.
 I have also incorporated an unimplemented aspect of Ditchfield's earlier cost model.
+I also incorporated an unimplemented aspect of Ditchfield's earlier cost model.
 In the example below, adapted from \cite[p.89]{Ditchfield92}, Bilson's cost model only distinguished between the first two cases by accounting extra cost for the extra set of !otype! parameters, which, as discussed above, is not a desirable solution:
 …
 \end{cfa}
 I account for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
+The new cost model accounts for the fact that functions with more polymorphic variables are less constrained by introducing a $var$ cost element that counts the number of type variables on a candidate function.
 In the example above, the first !f! has $var = 2$, while the remainder have $var = 1$.
 My new \CFA{} cost model also accounts for a nuance un-handled by Ditchfield or Bilson, in that it makes the more specific fourth function above cheaper than the more generic third function.
+The new cost model also accounts for a nuance un-handled by Ditchfield or Bilson, in that it makes the more specific fourth function above cheaper than the more generic third function.
 The fourth function is presumably somewhat optimized for handling pointers, but the prior \CFA{} cost model could not account for the more specific binding, as it simply counted the number of polymorphic unifications.
 In my modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element.
 Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, \CFA{} will pick the more specific !T*! binding.
 This process is recursive, such that !T**! produces a -2 specialization cost, as opposed to the -1 cost for !T*!.
 This works similarly for generic types, \eg{} !box(T)! also has specialization cost -1.
+In the modified model, each level of constraint on a polymorphic type in the parameter list results in a decrement of the $specialization$ cost element, which is shared with the count of assertions due to their common nature as constraints on polymorphic type bindings.
+Thus, all else equal, if both a binding to !T! and a binding to !T*! are available, the model chooses the more specific !T*! binding with $specialization = -1$.
+This process is recursive, such that !T**! has $specialization = -2$.
+This calculation works similarly for generic types, \eg{} !box(T)! also has specialization cost -1.
 For multi-argument generic types, the least-specialized polymorphic parameter sets the specialization cost, \eg{} the specialization cost of !pair(T, S*)! is -1 (from !T!) rather than -2 (from !S!).
+Since the user programmer provides parameters, but cannot provide guidance on return type, specialization cost is not counted for the return type list.
+Specialization cost is not counted on the return type list; since $specialization$ is a property of the function declaration, a lower specialization cost prioritizes one declaration over another.
+User programmers can choose between functions with varying parameter lists by adjusting the arguments, but the same is not true of varying return types, so the return types are omitted from the $specialization$ element.
 Since both $var$ and $specialization$ are properties of the declaration rather than any particular interpretation, they are prioritized less than the interpretation-specific conversion costs from Bilson's original 3-tuple.
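The lexicographic cost comparison and the recursive specialization count described above can be sketched in Python. This is a hedged illustration rather than the \CFACC{} implementation: the class and function names are hypothetical, the exact ordering of the later tuple elements (here `sign`, `var`, `specialization`, `reference`) is an assumption consistent with the text, and summing the specialization over parameters is likewise an assumption.

```python
from dataclasses import dataclass
from typing import NamedTuple, Tuple, Union


class Cost(NamedTuple):
    """Cost tuple; Python compares tuples lexicographically, left to right.
    The field order after `safe` is an assumption based on the text."""
    unsafe: int = 0
    poly: int = 0
    safe: int = 0
    sign: int = 0
    var: int = 0
    specialization: int = 0
    reference: int = 0


@dataclass(frozen=True)
class TypeVar:          # a bare polymorphic variable such as T
    name: str


@dataclass(frozen=True)
class Pointer:          # T*, T**, ...
    base: "PolyType"


@dataclass(frozen=True)
class Generic:          # box(T), pair(T, S*), ...
    name: str
    args: Tuple["PolyType", ...]


PolyType = Union[TypeVar, Pointer, Generic]


def specialization(t: PolyType) -> int:
    """Each level of constraint around a type variable decrements the cost."""
    if isinstance(t, TypeVar):
        return 0
    if isinstance(t, Pointer):
        return specialization(t.base) - 1
    if isinstance(t, Generic):
        # The least-specialized (largest) argument sets the cost.
        return max(specialization(a) for a in t.args) - 1
    raise TypeError(f"unsupported type: {t!r}")


def declaration_specialization(params, assertion_count: int) -> int:
    """Assumed combination: sum over parameters, minus one per type assertion."""
    return sum(specialization(p) for p in params) - assertion_count


T, S = TypeVar("T"), TypeVar("S")
# Costs from the Bilson-model example: f(long, long) beats the polymorphic
# candidate, which in turn beats the candidate needing an unsafe conversion.
ranking = sorted([Cost(unsafe=1, safe=1), Cost(poly=1, safe=1), Cost(safe=2)])
```

Because $specialization$ sits after the conversion elements in the tuple, a more constrained declaration only wins when the interpretation-specific conversion costs are tied.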
 A final refinement I have made to the \CFA{} cost model is with regard to choosing between arithmetic conversions.
 The C standard states that the common type of !int! and !unsigned int! is !unsigned int! and that the common type of !int! and !long! is !long!, but does not provide guidance for making a choice between those two conversions.
 Bilson's \CFACC{} used conversion costs based off a graph similar to that in Figure~\ref{safe-conv-graph-fig}, but with arcs selectively removed to disambiguate the costs of such conversions.
 However, the arc removal in Bilson's design resulted in inconsistent and somewhat surprising costs, with conversion to the next-larger same-sign type generally (but not always) double the cost of conversion to the !unsigned! type of the same size.
 In my redesign, for consistency with the approach of the usual arithmetic conversions,which select a common type primarily based on size, but secondarily on sign, the costs of arcs in the new graph are defined to be $1$ to go to a larger size, but $1 + \varepsilon$ to change the sign.
+A final refinement I have made to the \CFA{} cost model is with regard to choosing among arithmetic conversions.
+The C standard \cite[\S{}6.3.1.8]{C11} states that the common type of !int! and !unsigned int! is !unsigned int! and that the common type of !int! and !long! is !long!, but does not provide guidance for making a choice among conversions.
+Bilson's \CFACC{} uses conversion costs based off the left graph in Figure~\ref{safe-conv-graph-fig}.
+However, Bilson's design results in inconsistent and somewhat surprising costs, with conversion to the next-larger same-sign type generally (but not always) double the cost of conversion to the !unsigned! type of the same size.
+In the redesign, for consistency with the approach of the usual arithmetic conversions, which select a common type primarily based on size, but secondarily on sign, arcs in the new graph are annotated with whether they represent a sign change, and such sign changes are summed in a new $sign$ cost element that lexicographically succeeds $safe$.
 This means that sign conversions are approximately the same cost as widening conversions, but slightly more expensive (as opposed to less expensive in Bilson's graph).
-The $\varepsilon$ portion of the arc cost is implemented by adding a new $sign$ cost lexicographically after $safe$ which counts sign conversions.
 \begin{figure}
         \centering
         \includegraphics{figures/safe-conv-graph}
         \caption[Safe conversion graph.]{Safe conversion graph; plain arcs have cost $1$ while dashed sign-conversion arcs have cost $1+ \varepsilon$. The arc from \lstinline{unsigned long} to \lstinline{long long} is deliberately omitted, as on the presented system \lstinline{sizeof(long) == sizeof(long long)}.}
+        \caption[Safe conversion graphs.]{Safe conversion graphs; Bilson's on the left, the extended graph on the right. In both graphs, plain arcs have cost $safe = 1, sign = 0$ while dashed sign-conversion arcs have cost $safe = 1, sign = 1$. As per \cite[\S{}6.3.1.8]{C11}, types promote to types of the same signedness with greater rank, from \lstinline{signed} to \lstinline{unsigned} with the same rank, and from \lstinline{unsigned} to \lstinline{signed} with greater size. The arc from \lstinline{unsigned long} to \lstinline{long long} is deliberately omitted in the modified graph, as on the presented system \lstinline{sizeof(long) == sizeof(long long)}.}
         \label{safe-conv-graph-fig}
 \end{figure}
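The common-type calculation over the safe conversion graph can be illustrated with a small Python sketch. The four-node graph below is a hypothetical fragment built only from the arc rules stated in the caption (same signedness to greater rank, signed to unsigned at the same rank, unsigned to signed of greater size), assuming `long` is larger than `int`; it is not the full graph from the figure, and the function names are illustrative.

```python
import heapq

# Hypothetical fragment of the safe conversion graph.
# Each arc is (target, is_sign_change); every arc has safe = 1,
# and sign-conversion arcs additionally have sign = 1.
ARCS = {
    "int": [("long", False), ("unsigned int", True)],
    "unsigned int": [("unsigned long", False), ("long", True)],
    "long": [("unsigned long", True)],
    "unsigned long": [],
}


def conversion_costs(src):
    """Cheapest (safe, sign) path cost from src to every reachable type,
    using Dijkstra's algorithm with lexicographically compared tuples."""
    best = {src: (0, 0)}
    heap = [((0, 0), src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale heap entry
        for target, sign_change in ARCS[node]:
            candidate = (cost[0] + 1, cost[1] + int(sign_change))
            if target not in best or candidate < best[target]:
                best[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return best


def common_type(u, v):
    """Type c minimizing the summed path weights of u -> c and v -> c."""
    from_u, from_v = conversion_costs(u), conversion_costs(v)
    shared = from_u.keys() & from_v.keys()
    return min(
        shared,
        key=lambda c: (from_u[c][0] + from_v[c][0], from_u[c][1] + from_v[c][1]),
    )
```

On this fragment, `common_type("int", "unsigned int")` selects `unsigned int` and `common_type("int", "long")` selects `long`, matching the usual arithmetic conversions, while `int` to `long` at cost `(1, 0)` is slightly cheaper than `int` to `unsigned int` at `(1, 1)`.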
 …
 \end{equation*}
 \subsection{Expression Cost}
+\subsection{Expression Cost} \label{expr-cost-sec}
 The mapping from \CFA{} expressions to cost tuples is described by Bilson in \cite{Bilson03}, and remains effectively unchanged modulo the refinements to the cost tuple described above.
 …
 To resolve the outermost !wrap!, the resolver must check that !pair(pair(int))! unifies with itself, but at three levels of nesting, !pair(pair(int))! is more complex than either !pair(T)! or !T!, the types in the declaration of !wrap!.
 Accordingly, the cost of a single argument-parameter unification is $O(d)$, where !d! is the depth of the expression tree, and the cost of argument-parameter unification for a single candidate for a given function call expression is $O(pd)$, where $p$ is the number of parameters.
+Accordingly, the cost of a single argument-parameter unification is $O(d)$, where $d$ is the depth of the expression tree, and the cost of argument-parameter unification for a single candidate for a given function call expression is $O(pd)$, where $p$ is the number of parameters.
 Implicit conversions are also checked in argument-parameter matching, but the cost of checking for the existence of an implicit conversion is again proportional to the complexity of the type, $O(d)$.
 …
 % Mention relevance of work to C++20 concepts
+% Mention more compact representations of the (growing) cost tuple
