Index: doc/papers/general/.gitignore
===================================================================
--- doc/papers/general/.gitignore	(revision d52a55bf2a648f0a8e04111bde526f1d746496a1)
+++ doc/papers/general/.gitignore	(revision cfc3e0facc59b4b72c5c4ade00f35a2478f94cf2)
@@ -8,2 +8,3 @@
 Paper.out.ps
 WileyNJD-AMA.bst
+evaluation.zip
Index: doc/papers/general/Makefile
===================================================================
--- doc/papers/general/Makefile	(revision d52a55bf2a648f0a8e04111bde526f1d746496a1)
+++ doc/papers/general/Makefile	(revision cfc3e0facc59b4b72c5c4ade00f35a2478f94cf2)
@@ -45,4 +45,7 @@
 	@rm -frv ${DOCUMENT} ${BASE}.ps WileyNJD-AMA.bst ${BASE}.out.ps ${Build}
 
+evaluation.zip :
+	zip -x evaluation/.gitignore  -x evaluation/timing.xlsx -x evaluation/timing.dat -r evaluation.zip evaluation
+
 # File Dependencies #
 
@@ -66,11 +69,11 @@
 ## Define the default recipes.
 
-${Build}:
+${Build} :
 	mkdir -p ${Build}
 
-${BASE}.out.ps: ${Build}
+${BASE}.out.ps : ${Build}
 	ln -fs ${Build}/Paper.out.ps .
 
-WileyNJD-AMA.bst:
+WileyNJD-AMA.bst :
 	ln -fs ../AMA/AMA-stix/ama/WileyNJD-AMA.bst .
 
Index: doc/papers/general/Paper.tex
===================================================================
--- doc/papers/general/Paper.tex	(revision d52a55bf2a648f0a8e04111bde526f1d746496a1)
+++ doc/papers/general/Paper.tex	(revision cfc3e0facc59b4b72c5c4ade00f35a2478f94cf2)
@@ -174,4 +174,10 @@
 \lstMakeShortInline@%
 
+\let\OLDthebibliography\thebibliography
+\renewcommand\thebibliography[1]{
+  \OLDthebibliography{#1}
+  \setlength{\parskip}{0pt}
+  \setlength{\itemsep}{4pt plus 0.3ex}
+}
 
 \title{\texorpdfstring{\protect\CFA : Adding Modern Programming Language Features to C}{Cforall : Adding Modern Programming Language Features to C}}
@@ -191,6 +197,7 @@
 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects.
 This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
-Nevertheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive.
-The goal of the \CFA project is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers.
+Nevertheless, C, first standardized almost forty years ago, lacks many features that make programming in more modern languages safer and more productive.
+
+The goal of the \CFA project (pronounced ``C-for-all'') is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers.
 Prior projects have attempted similar goals but failed to honour C programming-style; for instance, adding object-oriented or functional programming with garbage collection is a non-starter for many C developers.
 Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and programmers.
@@ -226,5 +233,5 @@
 Love it or hate it, C is extremely popular, highly used, and one of the few systems languages.
 In many cases, \CC is often used solely as a better C.
-Nevertheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive.
+Nevertheless, C, first standardized almost forty years ago~\cite{ANSI89:C}, lacks many features that make programming in more modern languages safer and more productive.
 
 \CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that adds modern language-features to C, while maintaining both source and runtime compatibility with C and a familiar programming model for programmers.
@@ -324,8 +331,8 @@
 int max( int a, int b ) { return a < b ? b : a; }  $\C{// (3)}$
 double max( double a, double b ) { return a < b ? b : a; }  $\C{// (4)}\CRT$
-max( 7, -max );						$\C[2.75in]{// uses (3) and (1), by matching int from constant 7}$
+max( 7, -max );						$\C{// uses (3) and (1), by matching int from constant 7}$
 max( max, 3.14 );					$\C{// uses (4) and (2), by matching double from constant 3.14}$
-max( max, -max );					$\C{// ERROR: ambiguous}$
-int m = max( max, -max );			$\C{// uses (3) and (1) twice, by matching return type}\CRT$
+max( max, -max );					$\C{// ERROR, ambiguous}$
+int m = max( max, -max );			$\C{// uses (3) and (1) twice, by matching return type}$
 \end{cfa}
 
@@ -336,5 +343,5 @@
 As is shown later, there are a number of situations where \CFA takes advantage of available type information to disambiguate, where other programming languages generate ambiguities.
 
-\Celeven added @_Generic@ expressions, which is used in preprocessor macros to provide a form of ad-hoc polymorphism;
+\Celeven added @_Generic@ expressions~\cite[\S~6.5.1.1]{C11}, which are used with preprocessor macros to provide ad-hoc polymorphism;
 however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading. 
 The macro wrapping the generic expression imposes some limitations;
@@ -369,10 +376,10 @@
 \begin{cfa}
 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x `+` x; }  $\C{// ? denotes operands}$
-int val = twice( twice( 3.7 ) );
+int val = twice( twice( 3.7 ) );  $\C{// val == 14}$
 \end{cfa}
 which works for any type @T@ with a matching addition operator.
 The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@.
 There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type~\cite{Cormack81,Baker82,Ada} in its type analysis.
-The first approach has a late conversion from @double@ to @int@ on the final assignment, while the second has an eager conversion to @int@.
+The first approach has a late conversion from @double@ to @int@ on the final assignment, while the second has an early conversion to @int@.
 \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition.
 
@@ -420,7 +427,7 @@
 \begin{cfa}
 forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ }
-{
+int main() {
 	int ?<?( double x, double y ) { return x `>` y; } $\C{// locally override behaviour}$
-	qsort( vals, size );					$\C{// descending sort}$
+	qsort( vals, 10 );							$\C{// descending sort}$
 }
 \end{cfa}
@@ -534,5 +541,5 @@
 \begin{cquote}
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l|@{\hspace{2\parindentlnth}}l@{}}
+\begin{tabular}{@{}l|@{\hspace{\parindentlnth}}l@{}}
 \begin{cfa}
 forall( otype R, otype S ) struct pair {
@@ -578,6 +585,5 @@
 \begin{cfa}
 struct _pair_conc0 {
-	const char * first;
-	int second;
+	const char * first;  int second;
 };
 \end{cfa}
@@ -587,6 +593,5 @@
 \begin{cfa}
 struct _pair_conc1 {
-	void * first;
-	void * second;
+	void * first, * second;
 };
 \end{cfa}
@@ -645,5 +650,5 @@
 \begin{cquote}
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
+\begin{tabular}{@{}l|@{\hspace{\parindentlnth}}l@{}}
 \begin{cfa}
 forall( dtype Unit ) struct scalar { unsigned long value; };
@@ -661,5 +666,5 @@
 							half_marathon;
 scalar(litres) two_pools = pool + pool;
-`marathon + pool;`	// compilation ERROR
+`marathon + pool;`	// ERROR, mismatched types
 \end{cfa}
 \end{tabular}
@@ -1006,20 +1011,15 @@
 \begin{cfa}
 forall( dtype T0, dtype T1 | sized(T0) | sized(T1) ) struct _tuple2 {
-	T0 field_0;								$\C{// generated before the first 2-tuple}$
-	T1 field_1;
+	T0 field_0;  T1 field_1;					$\C{// generated before the first 2-tuple}$
 };
 _tuple2(int, int) f() {
 	_tuple2(double, double) x;
 	forall( dtype T0, dtype T1, dtype T2 | sized(T0) | sized(T1) | sized(T2) ) struct _tuple3 {
-		T0 field_0;							$\C{// generated before the first 3-tuple}$
-		T1 field_1;
-		T2 field_2;
+		T0 field_0;  T1 field_1;  T2 field_2;	$\C{// generated before the first 3-tuple}$
 	};
 	_tuple3(int, double, int) y;
 }
 \end{cfa}
-{\sloppy
-Tuple expressions are then simply converted directly into compound literals, \eg @[5, 'x', 1.24]@ becomes @(_tuple3(int, char, double)){ 5, 'x', 1.24 }@.
-\par}%
+Tuple expressions are then converted directly into compound literals, \eg @[5, 'x', 1.24]@ becomes @(_tuple3(int, char,@ @double)){ 5, 'x', 1.24 }@.
 
 \begin{comment}
@@ -1105,5 +1105,5 @@
 \begin{cquote}
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
+\begin{tabular}{@{}l|@{\hspace{2\parindentlnth}}l@{}}
 \multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{c}{\textbf{C}}	\\
 \begin{cfa}
@@ -1174,6 +1174,6 @@
 \centering
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
-\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{c}{\textbf{C}}	\\
+\begin{tabular}{@{}l|@{\hspace{\parindentlnth}}l@{}}
+\multicolumn{1}{c|@{\hspace{\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{c}{\textbf{C}}	\\
 \begin{cfa}
 `choose` ( day ) {
@@ -1220,6 +1220,6 @@
 \centering
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
-\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{non-terminator}}	& \multicolumn{1}{c}{\textbf{target label}}	\\
+\begin{tabular}{@{}l|@{\hspace{\parindentlnth}}l@{}}
+\multicolumn{1}{c|@{\hspace{\parindentlnth}}}{\textbf{non-terminator}}	& \multicolumn{1}{c}{\textbf{target label}}	\\
 \begin{cfa}
 choose ( ... ) {
@@ -1264,6 +1264,6 @@
 \begin{figure}
 \lstDeleteShortInline@%
-\begin{tabular}{@{\hspace{\parindentlnth}}l@{\hspace{\parindentlnth}}l@{\hspace{\parindentlnth}}l@{}}
-\multicolumn{1}{@{\hspace{\parindentlnth}}c@{\hspace{\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{@{\hspace{\parindentlnth}}c}{\textbf{C}}	\\
+\begin{tabular}{@{\hspace{\parindentlnth}}l|@{\hspace{\parindentlnth}}l@{\hspace{\parindentlnth}}l@{}}
+\multicolumn{1}{@{\hspace{\parindentlnth}}c|@{\hspace{\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{@{\hspace{\parindentlnth}}c}{\textbf{C}}	\\
 \begin{cfa}
 `LC:` {
@@ -1349,5 +1349,5 @@
 \subsection{Exception Handling}
 
-The following framework for \CFA exception handling is in place, excluding some runtime type-information and virtual functions.
+The following framework for \CFA exception-handling is in place, excluding some runtime type-information and virtual functions.
 \CFA provides two forms of exception handling: \newterm{fix-up} and \newterm{recovery} (see Figure~\ref{f:CFAExceptionHandling})~\cite{Buhr92b,Buhr00a}.
 Both mechanisms provide dynamic call to a handler using dynamic name-lookup, where fix-up has dynamic return and recovery has static return from the handler.
@@ -1360,6 +1360,6 @@
 \begin{cquote}
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
-\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{Resumption}}	& \multicolumn{1}{c}{\textbf{Termination}}	\\
+\begin{tabular}{@{}l|@{\hspace{\parindentlnth}}l@{}}
+\multicolumn{1}{c|@{\hspace{\parindentlnth}}}{\textbf{Resumption}}	& \multicolumn{1}{c}{\textbf{Termination}}	\\
 \begin{cfa}
 `exception R { int fix; };`
@@ -1852,21 +1852,6 @@
 This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
 Secondly, \CFA references are rebindable, whereas \CC references have a fixed address.
-\newsavebox{\LstBox}
-\begin{lrbox}{\LstBox}
-\lstset{basicstyle=\footnotesize\linespread{0.9}\sf}
-\begin{cfa}
-int & r = *new( int );
-...											$\C{// non-null reference}$
-delete &r;									$\C{// unmanaged (programmer) memory-management}$
-r += 1;										$\C{// undefined reference}$
-\end{cfa}
-\end{lrbox}
 Rebinding allows \CFA references to be default-initialized (\eg to a null pointer\footnote{
-While effort has been made into non-null reference checking in \CC and Java, the exercise seems moot for any non-managed languages (C/\CC), given that it only handles one of many different error situations:
-\begin{cquote}
-\usebox{\LstBox}
-\end{cquote}
-}%
-) and point to different addresses throughout their lifetime, like pointers.
+While effort has been put into non-null reference checking in \CC and Java, the exercise seems moot for any non-managed language (C/\CC), given that it only handles one of many different error situations, \eg using a pointer after its storage is deleted.}) and point to different addresses throughout their lifetime, like pointers.
 Rebinding is accomplished by extending the existing syntax and semantics of the address-of operator in C.
 
@@ -1880,5 +1865,5 @@
 \begin{itemize}
 \item
-if @R@ is an rvalue of type {@T &@$_1 \cdots$@ &@$_r$} where $r \ge 1$ references (@&@ symbols) than @&R@ has type {@T `*`&@$_{\color{red}2} \cdots$@ &@$_{\color{red}r}$}, \\ \ie @T@ pointer with $r-1$ references (@&@ symbols).
+if @R@ is an rvalue of type {@T &@$_1 \cdots$@ &@$_r$} where $r \ge 1$ references (@&@ symbols) then @&R@ has type {@T `*`&@$_{\color{red}2} \cdots$@ &@$_{\color{red}r}$}, \\ \ie @T@ pointer with $r-1$ references (@&@ symbols).
 	
 \item
@@ -1914,5 +1899,5 @@
 \end{cfa}
 This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value.
-\CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \newterm{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
+\CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \newterm{const poisoning} problem~\cite{Taylor10}, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
 
 
@@ -1928,5 +1913,4 @@
 \begin{tabular}{@{}l@{\hspace{3em}}l|l@{}}
 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{C Type Nesting}}	& \multicolumn{1}{c|}{\textbf{C Implicit Hoisting}}	& \multicolumn{1}{c}{\textbf{\CFA}}	\\
-\hline
 \begin{cfa}
 struct S {
@@ -2259,5 +2243,5 @@
 	W w, heavy = { 20 };
 	w = 155|_lb|;
-	w = 0b1111|_lb|;       // error, binary unsupported
+	// binary unsupported
 	w = 0${\color{red}\LstBasicStyle{'}}$233|_lb|;          // quote separator
 	w = 0x9b|_kg|;
@@ -2307,11 +2291,9 @@
 \begin{cquote}
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
-\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{c}{\textbf{C}}	\\
+\begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}}
+\multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{c}{\textbf{C}}	\\
 \begin{cfa}
 MIN
-
 MAX
-
 PI
 E
@@ -2319,8 +2301,6 @@
 &
 \begin{cfa}
-SCHAR_MIN, CHAR_MIN, SHRT_MIN, INT_MIN, LONG_MIN,
-	LLONG_MIN, FLT_MIN, DBL_MIN, LDBL_MIN
-SCHAR_MAX, UCHAR_MAX, SHRT_MAX, INT_MAX, LONG_MAX,
-	LLONG_MAX, FLT_MAX, DBL_MAX, LDBL_MAX
+CHAR_MIN, SHRT_MIN, INT_MIN, LONG_MIN, LLONG_MIN, FLT_MIN, DBL_MIN, LDBL_MIN
+UCHAR_MAX, SHRT_MAX, INT_MAX, LONG_MAX, LLONG_MAX, FLT_MAX, DBL_MAX, LDBL_MAX
 M_PI, M_PIl
 M_E, M_El
@@ -2441,4 +2421,6 @@
 
 \begin{table}
+\caption{Storage-Management Operations}
+\label{t:StorageManagementOperations}
 \centering
 \lstDeleteShortInline@%
@@ -2460,6 +2442,4 @@
 \lstDeleteShortInline~%
 \lstMakeShortInline@%
-\caption{Storage-Management Operations}
-\label{t:StorageManagementOperations}
 \end{table}
 
@@ -2589,5 +2569,5 @@
 \end{cquote}
 There is a weak similarity between the \CFA logical-or operator and the Shell pipe-operator for moving data, where data flows in the correct direction for input but the opposite direction for output.
-
+\begin{comment}
 The implicit separator character (space/blank) is a separator not a terminator.
 The rules for implicitly adding the separator are:
@@ -2608,4 +2588,5 @@
 }%
 \end{itemize}
+\end{comment}
 There are functions to set and get the separator string, and manipulators to toggle separation on and off in the middle of output.
 
@@ -2656,10 +2637,13 @@
 
 
-\section{Evaluation}
+\section{Polymorphism Evaluation}
 \label{sec:eval}
 
-Though \CFA provides significant added functionality over C, these features have a low runtime penalty.
-In fact, \CFA's features for generic programming can enable faster runtime execution than idiomatic @void *@-based C code.
-This claim is demonstrated through a set of generic-code-based micro-benchmarks in C, \CFA, and \CC (see stack implementations in Appendix~\ref{sec:BenchmarkStackImplementations}).
+\CFA adds parametric polymorphism to C.
+A runtime evaluation is performed to compare the cost of alternative styles of polymorphism.
+The goal is to compare just the underlying mechanism for implementing different kinds of polymorphism.
+% Though \CFA provides significant added functionality over C, these features have a low runtime penalty.
+% In fact, it is shown that \CFA's generic programming can enable faster runtime execution than idiomatic @void *@-based C code.
+The experiment is a set of generic-stack micro-benchmarks~\cite{CFAStackEvaluation} in C, \CFA, and \CC (see implementations in Appendix~\ref{sec:BenchmarkStackImplementations}).
 Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
 A more illustrative comparison measures the costs of idiomatic usage of each language's features.
@@ -2692,9 +2676,9 @@
 \end{figure}
 
-The structure of each benchmark implemented is: C with @void *@-based polymorphism, \CFA with the presented features, \CC with templates, and \CC using only class inheritance for polymorphism, called \CCV.
+The structure of each benchmark implemented is: C with @void *@-based polymorphism, \CFA with parametric polymorphism, \CC with templates, and \CC using only class inheritance for polymorphism, called \CCV.
 The \CCV variant illustrates an alternative object-oriented idiom where all objects inherit from a base @object@ class, mimicking a Java-like interface;
 hence runtime checks are necessary to safely down-cast objects.
 The most notable difference among the implementations is in memory layout of generic types: \CFA and \CC inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV lack such a capability and instead must store generic objects via pointers to separately-allocated objects.
-Note that the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
+Note, the C benchmark uses unchecked casts as C has no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
 
 Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents.
@@ -2711,7 +2695,7 @@
 
 \begin{table}
-\centering
 \caption{Properties of benchmark code}
 \label{tab:eval}
+\centering
 \newcommand{\CT}[1]{\multicolumn{1}{c}{#1}}
 \begin{tabular}{rrrrr}
@@ -2726,8 +2710,9 @@
 The C and \CCV variants are generally the slowest with the largest memory footprint, because of their less-efficient memory layout and the pointer-indirection necessary to implement generic types;
 this inefficiency is exacerbated by the second level of generic types in the pair benchmarks.
-By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of @short@ and @char@ because the storage layout is equivalent, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead.
+By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair because of equivalent storage layout, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead.
 \CCV is slower than C largely due to the cost of runtime type-checking of down-casts (implemented with @dynamic_cast@);
-The outlier in the graph for \CFA, pop @pair@, results from the complexity of the generated-C polymorphic code.
-The gcc compiler is unable to optimize some dead code and condense nested calls; a compiler designed for \CFA could easily perform these optimizations.
+The outlier for \CFA, pop @pair@, results from the complexity of the generated-C polymorphic code.
+The gcc compiler is unable to optimize some dead code and condense nested calls;
+a compiler designed for \CFA could easily perform these optimizations.
 Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries.
 
@@ -2746,4 +2731,6 @@
 The \CFA benchmark is able to eliminate all redundant type annotations through use of the polymorphic @alloc@ function discussed in Section~\ref{sec:libraries}.
 
+We conjecture these results scale across most generic data-types as the underlying polymorphic implementation is constant.
+
 
 \section{Related Work}
@@ -2751,4 +2738,11 @@
 
 \subsection{Polymorphism}
+
+ML~\cite{ML} was the first language to support parametric polymorphism.
+Like \CFA, it supports universal type parameters, but not the use of assertions and traits to constrain type arguments.
+Haskell~\cite{Haskell10} combines ML-style polymorphism, polymorphic data types, and type inference with the notion of type classes, collections of overloadable methods that correspond in intent to traits in \CFA.
+Unlike \CFA, Haskell requires an explicit association between types and their classes that specifies the implementation of operations.
+These associations determine the functions that are assertion arguments for particular combinations of class and type, in contrast to \CFA where the assertion arguments are selected at function call sites based upon the set of operations in scope at that point.
+Haskell also severely restricts the use of overloading: an overloaded name can only be associated with a single class, and methods with overloaded names can only be defined as part of instance declarations.
 
 \CC provides three disjoint polymorphic extensions to C: overloading, inheritance, and templates.
@@ -2804,5 +2798,5 @@
 Go does not have tuples but supports MRVF.
 Java's variadic functions appear similar to C's but are type-safe using homogeneous arrays, which are less useful than \CFA's heterogeneously-typed variadic functions.
-Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML~\cite{sml} and~\cite{Scala}, which decompose tuples using pattern matching.
+Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML~\cite{sml}, Haskell, and Scala~\cite{Scala}, which decompose tuples using pattern matching.
 
 
@@ -2835,10 +2829,8 @@
 Finally, we demonstrate that \CFA performance for some idiomatic cases is better than C and close to \CC, showing the design is practically applicable.
 
-There is ongoing work on a wide range of \CFA features, including arrays with size, runtime type-information, virtual functions, user-defined conversions, concurrent primitives, and modules.
-While all examples in the paper compile and run, a public beta-release of \CFA will take another 8--12 months to finalize these extensions.
-There are also interesting future directions for the polymorphism design.
-Notably, \CC template functions trade compile time and code bloat for optimal runtime of individual instantiations of polymorphic functions.
-\CFA polymorphic functions use dynamic virtual-dispatch;
-the runtime overhead of this approach is low, but not as low as inlining, and it may be beneficial to provide a mechanism for performance-sensitive code.
+While all examples in the paper compile and run, a public beta-release of \CFA will take 6--8 months to reduce compilation time, provide better debugging, and add a few more libraries.
+There is also new work on a number of \CFA features, including arrays with size, runtime type-information, virtual functions, user-defined conversions, and modules.
+While \CFA polymorphic functions use dynamic virtual-dispatch with low runtime overhead (see Section~\ref{sec:eval}), this overhead is not as low as that of \CC template-inlining.
+Hence it may be beneficial to provide a mechanism for performance-sensitive code.
 Two promising approaches are an @inline@ annotation at polymorphic function call sites to create a template-specialization of the function (provided the code is visible) or placing an @inline@ annotation on polymorphic function-definitions to instantiate a specialized version for some set of types (\CC template specialization).
 These approaches are not mutually exclusive and allow performance optimizations to be applied only when necessary, without suffering global code-bloat.
@@ -2849,9 +2841,10 @@
 
 The authors would like to recognize the design assistance of Glen Ditchfield, Richard Bilson, Thierry Delisle, Andrew Beach and Brice Dobry on the features described in this paper, and thank Magnus Madsen for feedback on the writing.
-This work is supported by a corporate partnership with Huawei Ltd.\ (\url{http://www.huawei.com}), and Aaron Moss and Peter Buhr are partially funded by the Natural Sciences and Engineering Research Council of Canada.
-
-
+Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}), and Aaron Moss and Peter Buhr are partially funded by the Natural Sciences and Engineering Research Council of Canada.
+
+{%
+\fontsize{9bp}{12bp}\selectfont%
 \bibliography{pl}
-
+}%
 
 \appendix
Index: doc/papers/general/evaluation/timing.gp
===================================================================
--- doc/papers/general/evaluation/timing.gp	(revision d52a55bf2a648f0a8e04111bde526f1d746496a1)
+++ doc/papers/general/evaluation/timing.gp	(revision cfc3e0facc59b4b72c5c4ade00f35a2478f94cf2)
@@ -25,5 +25,5 @@
 
 set label "23.9" at 7.125,10.5
-
+set style fill pattern 4 border lt -1
 # set datafile separator ","
 plot for [COL=2:5] 'evaluation/timing.dat' using (column(COL)/SCALE):xticlabels(1) title columnheader
Index: doc/papers/general/response
===================================================================
--- doc/papers/general/response	(revision cfc3e0facc59b4b72c5c4ade00f35a2478f94cf2)
+++ doc/papers/general/response	(revision cfc3e0facc59b4b72c5c4ade00f35a2478f94cf2)
@@ -0,0 +1,373 @@
+Date: Thu, 19 Apr 2018 17:01:14 -0400 (EDT)
+From: "Software: Practice and Experience" <onbehalfof@manuscriptcentral.com>
+Reply-To: judithbishop@outlook.com
+To: a3moss@uwaterloo.ca, rschlunt@uwaterloo.ca, pabuhr@uwaterloo.ca
+Subject: Software: Practice and Experience - Decision on Manuscript ID SPE-18-0065
+
+19-Apr-2018
+
+Dear Dr Buhr,
+
+Many thanks for submitting SPE-18-0065 entitled "Cforall : Adding Modern Programming
+Language Features to C" to Software: Practice and Experience. The paper has now
+been reviewed and the comments of the referee(s) are included at the bottom of
+this letter.
+
+I am delighted to inform you that the referee(s) have recommended publication,
+but also suggest some minor revisions to your manuscript.  Therefore, I invite
+you to respond to the referee(s)' comments and revise your manuscript. All of
+the referees' comments are important, but I would like you to pay special
+attention to the following general points:
+
+1. If there is any more evaluation that a stack, please add it.
+2. How is the compiler implemented?
+3. What features are actually implemented?
+4. The article lacks some related work. such as Haskell, ML etc.
+5. Most of the content in Section 10 RELATED WORK appears to belong to Section 1 INTRODUCTION as a Subsection or as a new Section after Section 1.
+6. Many references are not properly formatted
+7. A statement about any presence or absence of conflicts of interest with Huawei should be explicitly added.
+
+The paper is long by SPE standards (33 pages). We have a maximum of 40
+pages. Please do not extend the paper beyond 35 pages. If necessary, find ways
+to cut the examples or text. If you have an accompanying website for the system
+where some examples are stored, please mention it.
+
+You have 42 days from the date of this email to submit your revision. If you
+are unable to complete the revision within this time, please contact me to
+request a short extension.
+
+You can upload your revised manuscript and submit it through your Author
+Center. Log into https://mc.manuscriptcentral.com/spe and enter your Author
+Center, where you will find your manuscript title listed under "Manuscripts
+with Decisions".
+
+When submitting your revised manuscript, you will be able to respond to the
+comments made by the referee(s) in the space provided.  You can use this space
+to document any changes you make to the original manuscript.
+
+If you feel that your paper could benefit from English language polishing, you
+may wish to consider having your paper professionally edited for English
+language by a service such as Wiley's at
+http://wileyeditingservices.com. Please note that while this service will
+greatly improve the readability of your paper, it does not guarantee acceptance
+of your paper by the journal.
+
+Once again, thank you for submitting your manuscript to Software: Practice and Experience. I look forward to receiving your revision.
+
+Sincerely,
+
+Dr Judith Bishop
+Editor, Software: Practice and Experience
+judithbishop@outlook.com
+
+Referee(s)' Comments to Author:
+
+Reviewing: 1
+
+   Most of the content in Section 10 RELATED WORK appears to belong to Section
+   1 INTRODUCTION as a Subsection or as a new Section after Section 1. (Please
+   also see #4.1 below.) Remaining discussion that cannot be moved earlier can
+   become a DISCUSSION Section or a Subsection within the last Section of the
+   paper.
+
+Sometimes it is appropriate to put related work at the start of a paper and
+sometimes at the end. For this paper, it seems appropriate to put the related
+work at the end of the paper. The purpose of the related work in this paper is
+twofold: to introduce prior work and to contrast it with Cforall.  Only at the
+end of the paper does the reader have sufficient knowledge about Cforall to
+make detailed contrasts with other programming languages possible. If the
+related work is moved to the end of the introduction, the reader knows nothing
+about Cforall so talking about other programming languages in isolation makes
+little sense, especially non-C-related languages, like Java, Go, Rust,
+Haskell. We see no easy way to separate the related work into a general
+discussion at the start and a specific discussion at the end. We explicitly
+attempt to deal with the reader's anticipation at the end of the introduction:
+
+ Finally, it is impossible to describe a programming language without usages
+ before definitions.  Therefore, syntax and semantics appear before
+ explanations; hence, patience is necessary until details are presented.
+
+
+   2. Presentation
+
+   2.1 More information should be moved from the text and added to Figure 10 and
+   Table 2 so that readers can understand the comparison quickly. Imagine a reader
+   read the summary and jump directly to these two display elements. Questions
+   would be raised about the binary size and pop pair result of Cforall and it
+   would take time to find answers in the text.
+
+This suggestion is an alternative writing style. The experiment is complex
+enough that it is unlikely a reader could jump to the table/graph and
+understand the experiment without putting a substantive amount of the text from
+Section 9 into the table and figure, which the reader then has to read anyway.
+In fact, we prefer a writing style where the reader does not have to look at
+the table/figure to understand the experiment and the results, i.e., the
+table/figure are only there to complement the discussion.
+
+   2.2 The pronunciation of ("C-for-all") should be provided in the summary
+   (page 1 line 22) so that people not having an access to the full-text can
+   see it.
+
+Done.
+
+   2.3 Error comment in the code should be written with the same capitalization
+   and it will be helpful if you say specifically compilation error or runtime
+   error. (Please see attached annotated manuscript.)
+
+Fixed. All errors in the paper are compilation errors because they are related
+to the type system.
+
+   2.4 It is possible to provide a bit more information in Appendix A e.g. how
+   many lines/bytes of code and some details about software/hardware can be
+   added/moved here. The aim is to provide sufficient information for readers
+   to reproduce the results and to appreciate the context of the comparison.
+
+Table 2 indicates the source-code size in lines of code. The third paragraph of
+Section 9 gives precise details of the software/hardware used in the
+experiments.
+
+   3. Practical information about the work
+
+   There are three separate pieces of information on pages 2 ("All features
+   discussed in this paper are working, unless otherwise stated as under
+   construction."),
+
+This sentence is replaced with:
+
+ All language features discussed in this paper are working, except some
+ advanced exception-handling features.
+
+and Section 5.4 Exception Handling states:
+
+ The following framework for Cforall exception handling is in place, excluding
+ some runtime type-information and virtual functions.
+
+   page 4 ("Under construction is a mechanism to distribute...")
+
+The feature on page 4 is now complete.
+
+   and page 33 ("There is ongoing work on a wide range ... ")
+
+This sentence is replaced to indicate that the ongoing work is future work.
+
+ While all examples in the paper compile and run, a public beta-release of
+ Cforall will take 6-8 months to reduce compilation time, provide better
+ debugging, and add a few more libraries.  There is also new work on a number
+ of Cforall features, including arrays with size, runtime type-information,
+ virtual functions, user-defined conversions, and modules.
+
+   My recommendation is to move them to an appendix so that the length is
+   preserved.
+
+There is nothing to move into an appendix, except 3 sentences. We do not intend
+to discuss these items in this paper.
+
+   3.1 Any under construction work (only small part of page 4) should not be
+   mingled into the main part of the manuscript.
+
+See above.
+
+   3.2 Instructions on how to access/use the working functionality of Cforall
+   should be given.
+
+We will announce the release of Cforall in a public location once we believe the
+code base is acceptable. In the interim, we have made public all the
+experimental code from section 9, and there is a reference in the paper to
+access this code. We can make a private beta-copy of Cforall available to the
+SP&E editor for distribution to the referees so they can verify our claims.
+
+   3.3 Planned work should be given a specific time of completion/release not
+   just "8-12 months".
+
+Software development is not a rigorous engineering discipline. Given our small
+research development-team and the size of the project, we cannot give a
+specific time for completion of anything associated with the project. Having
+said that, we have reduced our expected time for Cforall release to 6-8 months
+as work is progressing well.
+
+
+   4. Citations
+
+   4.1 The impression after reading Section 1 INTRODUCTION is that the
+   referencing is poor. It is not until Section 10 RELATED WORK where majority
+   of the prior literature are discussed. Please consider moving the content
+   and improve citations - at least cite all main variations of C languages.
+
+See point 1.
+
+   4.2 I also would like to see citations at these specific places: Page 2
+   after Phil Karlton, page 22 after const hell problem.
+
+The Phil-Karlton quote is an urban legend without a specific academic citation:
+
+  https://skeptics.stackexchange.com/questions/19836/has-phil-karlton-ever-said-there-are-only-two-hard-things-in-computer-science
+
+The term "const hell" is replaced with "const poisoning" with a citation.
+
+   5.1 Footnotes and citations will need to have different schemes - number and
+   perhaps letter.
+
+The LaTeX macros from Wiley generate those symbols. I assume during
+copy-editing the format is changed to suit the journal format.
+
+   5.2 Many references are not properly formatted e.g. date is incomplete,
+   extra/missing white spaces, extra dots, use of page number or section number
+   as part of superscript ref number. Please refer to attached document.
+
+Agreed. The bibtex BST macros are at fault. I have fixed some issues but I
+cannot fix them all as my BST macro-knowledge is limited.
+
+   5.3 Typos:
+   - Page 3 "eager" should be "earlier"
+
+Fixed.
+
+   - Page 4 "vals" should be "arr"
+
+Actually it is "vals", and the example is changed so it is clear why.
+
+   - Page 21 "than" should be "then"
+
+Fixed.
+
+
+   6. Conflict of interest
+   I see that the work is partially supported by Huawei. Perhaps statement
+   about any presence or absence of conflicts of interest should be explicitly
+   added. Please get a clear direction on this from the editor of the journal.
+
+The paper now states the project is open-source, hence there is no conflict of
+interest with the funding received from Huawei.
+
+
+Reviewing: 2
+
+Comments to the Author
+
+   Overloading requires the compiler to mangle a function's signature into its
+   name in the object file.  I'm pretty sure that this will complicate the
+   build process of mixed Cforall/C projects.
+
+There is no complexity with building Cforall/C programs, and there is an
+existence proof because C++ has name mangling for overloading and has no problem
+interacting with C.
+
+   I found the evaluation underwhelming.  There were only ~200 LoC ported from
+   C to Cforall.  This is too less to encounter potential caveats Cforall's
+   type system might impose.
+
+
+
+   Also, how is the compiler implemented?  I guess, Cforall is a
+   source-to-source compiler (from Cforall to C).  But this is left in the
+   dark.  What features are actually implemented?
+
+The following paragraph has been added to the introduction to address this
+comment:
+
+ All language features discussed in this paper are working, except some
+ advanced exception-handling features.  Not discussed in this paper are the
+ integrated concurrency-constructs and user-level
+ threading-library~\cite{Delisle18}.  Cforall is an open-source project
+ implemented as a source-to-source translator from Cforall to the gcc-dialect
+ of C~\cite{GCCExtensions}, allowing it to leverage the portability and code
+ optimizations provided by gcc, meeting goals (1)--(3).  Ultimately, a compiler
+ is necessary for advanced features and optimal performance.  The Cforall
+ translator is 200+ files and 46,000+ lines of code written in C/C++.  Starting
+ with a translator versus a compiler makes it easier and faster to generate and
+ debug C object-code rather than intermediate, assembler or machine code.  The
+ translator design is based on the visitor pattern, allowing multiple passes
+ over the abstract code-tree, which works well for incrementally adding new
+ features through additional visitor passes.  At the heart of the translator is
+ the type resolver, which handles the polymorphic routine/type
+ overload-resolution.  The Cforall runtime system is 100+ files and 11,000+
+ lines of code, written in Cforall.  Currently, the Cforall runtime is the
+ largest user of Cforall providing a vehicle to test the language features and
+ implementation.  The Cforall tests are 290+ files and 27,000+ lines of code.
+ The tests illustrate syntactic and semantic features in Cforall, plus a
+ growing number of runtime benchmarks.  The tests check for correctness and are
+ used for daily regression testing of commits (3800+).
+
+   Furthermore, the article lacks some related work.  Many proposed features
+   are present in functional languages such as Haskell, ML etc.  In particular,
+   the dealing of parametric polymorphism reminds me of Haskell.
+
+The following paragraph has been added at the start of Section 10.1:
+
+ ML~\cite{ML} was the first language to support parametric polymorphism.  Like
+ Cforall, it supports universal type parameters, but not the use of assertions
+ and traits to constrain type arguments.  Haskell~\cite{Haskell10} combines
+ ML-style polymorphism, polymorphic data types, and type inference with the
+ notion of type classes, collections of overloadable methods that correspond in
+ intent to traits in Cforall.  Unlike Cforall, Haskell requires an explicit
+ association between types and their classes that specifies the implementation
+ of operations.  These associations determine the functions that are assertion
+ arguments for particular combinations of class and type, in contrast to
+ Cforall where the assertion arguments are selected at function call sites
+ based upon the set of operations in scope at that point.  Haskell also
+ severely restricts the use of overloading: an overloaded name can only be
+ associated with a single class, and methods with overloaded names can only be
+ defined as part of instance declarations.
+
+   Cforall's approach to tuples is also quite similar to many functional
+   languages.
+
+At the end of Section 10.2, we state:
+
+ Tuples are a fundamental abstraction in most functional programming languages,
+ such as Standard ML, Haskell, and Scala, which decompose tuples using pattern
+ matching.
+
+
+From: Judith Bishop <judithbishop@outlook.com>
+To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
+Subject: RE: Software: Practice and Experience - Decision on Manuscript ID
+ SPE-18-0065
+Date: Tue, 24 Apr 2018 16:45:51 +0000
+Accept-Language: em-US
+
+Hi Peter
+
+Great to hear from you. I am also glad your paper got through, as it is in the
+mainline of the SPE scope.
+
+It is important to mention that the software is open source. People really
+value that. In the acknowledgements, you can refer to Huawei for funding. It is
+quite normal to have industrial funding, and in fact it is a plus.
+
+I think that sorts out the comment from the referee.
+
+Looking forward to your revised submission.
+
+Kind regards
+
+Judith Bishop
+Extraordinary Professor, Computer Science
+Stellenbosch University, South Africa
+082 301 5220 / 021 671 5133
+judithbishop@outlook.com
+
+-----Original Message-----
+From: Peter A. Buhr <pabuhr@uwaterloo.ca> 
+Sent: Tuesday, April 24, 2018 6:25 PM
+To: judithbishop@outlook.com
+Cc: a3moss@uwaterloo.ca; rschlunt@uwaterloo.ca
+Subject: Re: Software: Practice and Experience - Decision on Manuscript ID SPE-18-0065
+
+Hi Judy! Hope all is well.
+
+We are over-the-moon to get our paper accepted at SP&E, and we are actively
+working on your and the referee's comments.
+
+One comment where we need assistance is:
+
+  7. A statement about any presence or absence of conflicts of interest with
+     Huawei should be explicitly added.
+
+We forgot to mention in the paper that our project is open-source. So Huawei
+was funding an open-source project. In fact, the Huawei funding ends soon, so
+there will be no direct affiliation in a couple of months, although there are a
+few people at Huawei who remain very interested in the project.
+
+So does stating that the Cforall project is an open-source project deal with
+the issue of conflict of interest?