Index: doc/theses/aaron_moss_PhD/phd/Makefile
===================================================================
--- doc/theses/aaron_moss_PhD/phd/Makefile	(revision 7c6eb646410144a5ce00d55dddb6822a82db919e)
+++ doc/theses/aaron_moss_PhD/phd/Makefile	(revision 4c41b17f9e72343a9e9c2c718c0ce820021b603a)
@@ -39,4 +39,5 @@
 generic-timing \
 tests-completed \
+per-prob-histo \
 }
 
@@ -69,4 +70,7 @@
 	gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/algo-summary.gp
 
+per-prob-histo.tex : per-prob.gp per-prob.tsv ${BUILD}
+	gnuplot -e BUILD="'${BUILD}/'" ${EVALDIR}/per-prob.gp
+
 ${BUILD}: 
 	mkdir -p ${BUILD}
Index: doc/theses/aaron_moss_PhD/phd/evaluation/per-prob.gp
===================================================================
--- doc/theses/aaron_moss_PhD/phd/evaluation/per-prob.gp	(revision 4c41b17f9e72343a9e9c2c718c0ce820021b603a)
+++ doc/theses/aaron_moss_PhD/phd/evaluation/per-prob.gp	(revision 4c41b17f9e72343a9e9c2c718c0ce820021b603a)
@@ -0,0 +1,29 @@
+# set terminal pdfcairo linewidth 3 size 6,3
+# set output "per-prob-histo.pdf"
+set terminal pslatex size 6.25,2.125 color solid
+set output BUILD."per-prob-histo.tex"
+
+set linetype 1 lc rgb 'black'
+set linetype 2 lc rgb 'red'
+set linetype 3 lc rgb 'blue'
+set linetype 4 lc rgb 'green'
+
+set style data histogram
+set style histogram cluster # gap 2
+set style fill pattern 4 border lt -1
+
+set xlabel "expression time (ms)"
+set xtics nomirror
+
+set ylabel "count"
+set logscale y
+set format y "$%.0f$"
+set ytics autofreq nomirror
+
+set y2label "ms"
+set logscale y2
+set format y2 "$%.0f$"
+set y2tics autofreq nomirror
+
+plot 'evaluation/per-prob.tsv' using 2:xticlabels(1) title 'expressions (count)',\
+     'evaluation/per-prob.tsv' using (column(3)/1000):xticlabels(1) title 'total time (ms)' axes x1y2
Index: doc/theses/aaron_moss_PhD/phd/evaluation/per-prob.tsv
===================================================================
--- doc/theses/aaron_moss_PhD/phd/evaluation/per-prob.tsv	(revision 4c41b17f9e72343a9e9c2c718c0ce820021b603a)
+++ doc/theses/aaron_moss_PhD/phd/evaluation/per-prob.tsv	(revision 4c41b17f9e72343a9e9c2c718c0ce820021b603a)
@@ -0,0 +1,7 @@
+#"bin"	"in bin"	"summed time"	"% exprs"	"% time"
+"$<0.01$"	10866	20422	0.527	0.001
+"$<0.1$"	1514	68775	0.073	0.003
+"$<1$"	4584	1748119	0.222	0.065
+"$<10$"	3421	6613230	0.166	0.248
+"$<100$"	201	8133176	0.010	0.304
+"$<1000$"	46	10135929	0.002	0.379
Index: doc/theses/aaron_moss_PhD/phd/experiments.tex
===================================================================
--- doc/theses/aaron_moss_PhD/phd/experiments.tex	(revision 7c6eb646410144a5ce00d55dddb6822a82db919e)
+++ doc/theses/aaron_moss_PhD/phd/experiments.tex	(revision 4c41b17f9e72343a9e9c2c718c0ce820021b603a)
@@ -98,5 +98,6 @@
 Terminal output was suppressed for all tests to avoid confounding factors in the timing results, and all tests were run three times in series, with the median result reported in all cases.
 The medians are representative data points; considering test cases that took at least 0.2~s to run, the average run was within 2\% of the reported median runtime, and no run diverged by more than 20\% of median runtime or 5.5~s. 
-The memory results are even more consistent, with no run exceeding 2\% difference from median in peak resident set size, and 93\% of tests not recording any difference within the 1~KB granularity of the measurement software.
+The memory results are even more consistent, with no run exceeding 2\% difference from median in peak resident set size, and 93\% of tests not recording any difference within the 1~KB granularity of the measurement software. 
+All tests were run on a machine with 128~GB of RAM and 64 cores running at 2.2~GHz.
 
 As a matter of experimental practicality, test runs which exceeded 8~GB of peak resident memory usage were excluded from the data set. 
@@ -156,4 +157,22 @@
 \section{Instance Difficulty}
 
+To characterize the difficulty of expression resolution problem instances, the test suites must be explored at a finer granuarity. 
+As discussed in Section~\ref{resn-analysis-sec}, a single top-level expression is the fundamental problem instance for resolution, yet the test inputs discussed above are composed of thousands of top-level expressions, like the actual source code they are derived from. 
+To pull out the effects of these individual problems, I instrumented the resolver prototype to time resolution for each expression, and also report some relevant properties of the expression. 
+This instrumented resolver was then run on a set of difficult test instances; to limit the data collection task, these runs were limited to the best-performing \textsc{bu-dca-per} algorithm and test inputs which that algorithm took more than 1~s to complete. 
+
+The 13 test inputs thus selected contain 20632 top-level expressions between them, which are separated into order-of-magnitude bins by runtime in Figure~\ref{per-prob-histo-fig}.
+As can be seen from this figure, overall runtime is dominated by a few particularly difficult problem instances --- the 60\% of expressions which resolve in under 0.1~ms collectively take less time to resolve than any of the 0.2\% of expressions which take at least 100~ms to resolve. 
+On the other hand, the 46 expressions in that 0.2\% take 38\% of the overall time in this difficult test suite, while the 201 expressions that take between 10 and 100~ms to resolve consume another 30\%. 
+
+\begin{figure}
+	\centering
+	\input{per-prob-histo}
+	\caption[Histogram of top-level expressions]{Histogram of top-level expression resolution runtime, binned by order-of-magnitude. The left series counts the expressions in each bin according to the left axis, while the right series reports the summed runtime of resolution for all expressions in that bin. Note that both y-axes are log-scaled.} \label{per-prob-histo-fig}
+\end{figure}
+
+Since the top centile of expression resolution instances requires approximately two-thirds of the resolver's time, optimizing the resolver for specific hard problem instances has proven to be an effective technique for reducing overall runtime. 
+\TODO{Discuss metrics of difficulty.}
+
 % TODO: look at overloads
 % TODO: histogram of hard cases
Index: doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex
===================================================================
--- doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex	(revision 7c6eb646410144a5ce00d55dddb6822a82db919e)
+++ doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex	(revision 4c41b17f9e72343a9e9c2c718c0ce820021b603a)
@@ -197,5 +197,5 @@
 
 To resolve the outermost !wrap!, the resolver must check that !pair(pair(int))! unifies with itself, but at three levels of nesting, !pair(pair(int))! is more complex than either !pair(T)! or !T!, the types in the declaration of !wrap!. 
-Accordingly, the cost of a single argument-parameter unification is $O(d)$, where !d! is the depth of the expression tree, and the cost of argument-parameter unification for a single candidate for a given function call expression is $O(pd)$, where $p$ is the number of parameters. 
+Accordingly, the cost of a single argument-parameter unification is $O(d)$, where $d$ is the depth of the expression tree, and the cost of argument-parameter unification for a single candidate for a given function call expression is $O(pd)$, where $p$ is the number of parameters. 
 
 Implicit conversions are also checked in argument-parameter matching, but the cost of checking for the existence of an implicit conversion is again proportional to the complexity of the type, $O(d)$.