Index: doc/papers/general/Paper.tex
===================================================================
--- doc/papers/general/Paper.tex	(revision ad4458f259c8209f2c8cef98f845f5d060220fb3)
+++ doc/papers/general/Paper.tex	(revision ab3251e7b1e2774d12d8f27428f99ce3416ce632)
@@ -199,5 +199,5 @@
 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects.
 This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
-The TIOBE~\cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \Csharp 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail.
+The TIOBE index~\cite{TIOBE} ranks the top 5 most \emph{popular} programming languages as: Java 15\%, \Textbf{C 12\%}, \Textbf{\CC 5.5\%}, Python 5\%, \Csharp 4.5\% = 42\%, where each of the next 50 languages is below 4\%, with a long tail.
 The top 3 rankings over the past 30 years are:
 \begin{center}
@@ -205,8 +205,8 @@
 \lstDeleteShortInline@%
 \begin{tabular}{@{}rccccccc@{}}
-		& 2017	& 2012	& 2007	& 2002	& 1997	& 1992	& 1987		\\ \hline
-Java	& 1		& 1		& 1		& 1		& 12	& -		& -			\\
-\Textbf{C}	& \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}	\\
-\CC		& 3		& 3		& 3		& 3		& 2		& 2		& 4			\\
+		& 2018	& 2013	& 2008	& 2003	& 1998	& 1993	& 1988	\\ \hline
+Java	& 1		& 2		& 1		& 1		& 18	& -		& -		\\
+\Textbf{C}& \Textbf{2} & \Textbf{1} & \Textbf{2} & \Textbf{2} & \Textbf{1} & \Textbf{1} & \Textbf{1} \\
+\CC		& 3		& 4		& 3		& 3		& 2		& 2		& 5		\\
 \end{tabular}
 \lstMakeShortInline@%
@@ -227,15 +227,21 @@
 \CFA is currently implemented as a source-to-source translator from \CFA to the gcc-dialect of C~\cite{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by gcc, meeting goals (1)--(3).
 Ultimately, a compiler is necessary for advanced features and optimal performance.
-
-This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types avoiding those shortcomings.
+All of the features discussed in this paper are implemented, unless a feature is explicitly noted as future work.
+
+
+\section{Polymorphic Functions}
+
+\CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}.
+This section identifies shortcomings in existing approaches to generic and variadic data types in C-like languages, and shows how these shortcomings are avoided in \CFA.
 Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions.
-The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance.
-
-\section{Polymorphic Functions}
-
-\CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}.
+The new constructs are empirically compared with C and \CC approaches via performance experiments in Section~\ref{sec:eval}.
+
 
 \subsection{Name Overloading}
-
+\label{s:NameOverloading}
+
+\begin{quote}
+There are only two hard things in Computer Science: cache invalidation and \emph{naming things}. -- Phil Karlton
+\end{quote}
 C already has a limited form of ad-hoc polymorphism in the form of its basic arithmetic operators, which apply to a variety of different types using identical syntax. 
 \CFA extends the built-in operator overloading by allowing users to define overloads for any function, not just operators, and even any variable;
@@ -245,21 +251,23 @@
 
 \begin{cfa}
-int max(int a, int b) { return a < b ? b : a; }  // (1)
-double max(double a, double b) { return a < b ? b : a; }  // (2)
-
-int max = INT_MAX;     // (3)
-double max = DBL_MAX;  // (4)
-
-max(7, -max);   $\C{// uses (1) and (3), by matching int from constant 7}$
-max(max, 3.14); $\C{// uses (2) and (4), by matching double from constant 3.14}$
-
-//max(max, -max);  $\C{// ERROR: ambiguous}$
-int m = max(max, -max); $\C{// uses (1) once and (3) twice, by matching return type}$
-\end{cfa}
-
-\Celeven did add @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading. 
+int max = 2147483647;						$\C[3.75in]{// (1)}$
+double max = 1.7976931348623157E+308;	$\C{// (2)}$
+int max( int a, int b ) { return a < b ? b : a; }  $\C{// (3)}$
+double max( double a, double b ) { return a < b ? b : a; }  $\C{// (4)}\CRT$
+max( 7, -max );								$\C{// uses (3) and (1), by matching int from constant 7}$
+max( max, 3.14 );							$\C{// uses (4) and (2), by matching double from constant 3.14}$
+max( max, -max );							$\C{// ERROR: ambiguous}$
+int m = max( max, -max );					$\C{// uses (3) and (1) twice, by matching return type}$
+\end{cfa}
+\CFA maximizes the ability to reuse names to aggressively address the naming problem.
+In some cases, hundreds of names can be reduced to tens, significantly reducing a programmer's cognitive load.
+In the above example, the name @max@ has a consistent meaning, and a programmer only needs to remember the single concept: maximum.
+To prevent significant ambiguities, \CFA uses the return type in selecting overloads, \eg in the assignment to @m@, the compiler uses @m@'s type to unambiguously select the most appropriate call to function @max@ (as does Ada).
+As shown later, there are a number of situations where \CFA takes advantage of available type information to disambiguate expressions that other programming languages reject as ambiguous.
+
+\Celeven added @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading. 
 The macro wrapping the generic expression imposes some limitations; as an example, it could not implement the example above, because the variables @max@ would collide with the functions @max@. 
 Ergonomic limitations of @_Generic@ include the necessity to put a fixed list of supported types in a single place and manually dispatch to appropriate overloads, as well as possible namespace pollution from the functions dispatched to, which must all have distinct names.
-Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA does implement @_Generic@ for backwards-compatibility purposes. \TODO{actually implement that}
+Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA implements @_Generic@ for backwards-compatibility purposes. \TODO{actually implement that}
 
 % http://fanf.livejournal.com/144696.html
@@ -288,5 +296,5 @@
 For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
 \begin{cfa}
-forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; }	$\C{// ? denotes operands}$
+forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x `+` x; }	$\C{// ? denotes operands}$
 int val = twice( twice( 3.7 ) );
 \end{cfa}
@@ -343,5 +351,6 @@
 \begin{cfa}
 forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ }
-{	int ?<?( double x, double y ) { return x `>` y; }	$\C{// locally override behaviour}$
+{
+	int ?<?( double x, double y ) { return x `>` y; }	$\C{// locally override behaviour}$
 	qsort( vals, size );					$\C{// descending sort}$
 }
@@ -414,10 +423,10 @@
 \section{Generic Types}
 
-One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms.
+A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
 Broadly speaking, there are three approaches to implement abstract data-structures in C.
 One approach is to write bespoke data-structures for each context in which they are needed.
 While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures.
-A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@; an approach which does allow reuse of code for common functionality.
-However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that would not otherwise be needed.
+A second approach is to use @void *@-based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@, which allows reuse of code with common functionality.
+However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is not otherwise needed.
 A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type-checked, but errors may be difficult to interpret.
 Furthermore, writing and using preprocessor macros can be unnatural and inflexible.
@@ -434,13 +443,13 @@
 };
 forall( otype T ) T value( pair( const char *, T ) p ) { return p.second; }
-forall( dtype F, otype T ) T value_p( pair( F *, T * ) p ) { return * p.second; }
+forall( dtype F, otype T ) T value( pair( F *, T * ) p ) { return *p.second; }
 
 pair( const char *, int ) p = { "magic", 42 };
-int magic = value( p );
+int i = value( p );
 pair( void *, int * ) q = { 0, &p.second };
-magic = value_p( q );
+i = value( q );
 double d = 1.0;
 pair( double *, double * ) r = { &d, &d };
-d = value_p( r );
+d = value( r );
 \end{cfa}
 
@@ -589,5 +598,5 @@
 [ double ] foo$\(_2\)$( int );
 void bar( int, double, double );
-bar( foo( 3 ), foo( 3 ) );
+`bar`( foo( 3 ), foo( 3 ) );
 \end{cfa}
 The type-resolver only has the tuple return-types to resolve the call to @bar@ as the @foo@ parameters are identical, which involves unifying the possible @foo@ functions with @bar@'s parameter list.
@@ -835,5 +844,5 @@
 Since @sum@\(_0\) does not accept any arguments, it is not a valid candidate function for the call @sum(10, 20, 30)@.
 In order to call @sum@\(_1\), @10@ is matched with @x@, and the argument resolution moves on to the argument pack @rest@, which consumes the remainder of the argument list and @Params@ is bound to @[20, 30]@.
-The process continues unitl @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion.
+The process continues until @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion.
 Effectively, this algorithm traces as @sum(10, 20, 30)@ $\rightarrow$ @10 + sum(20, 30)@ $\rightarrow$ @10 + (20 + sum(30))@ $\rightarrow$ @10 + (20 + (30 + sum()))@ $\rightarrow$ @10 + (20 + (30 + 0))@.
 
@@ -1161,9 +1170,12 @@
 @case@ clauses are made disjoint by the @break@ statement.
 While the ability to fall through \emph{is} a useful form of control flow, it does not match well with programmer intuition, resulting in many errors from missing @break@ statements.
-For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through:
-\begin{cquote}
+For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through (see Figure~\ref{f:ChooseSwitchStatements}).
+Collectively, these enhancements reduce programmer burden and increase readability and safety.
+
+\begin{figure}
+\centering
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}}
-\multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{c}{\textbf{C}}	\\
+\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
+\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{\CFA}}	& \multicolumn{1}{c}{\textbf{C}}	\\
 \begin{cfa}
 `choose` ( day ) {
@@ -1199,6 +1211,7 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{cquote}
-Collectively, these enhancements reduce programmer burden and increase readability and safety.
+\caption{\lstinline|choose| versus \lstinline|switch| Statements}
+\label{f:ChooseSwitchStatements}
+\end{figure}
 
 \begin{comment}
@@ -1368,6 +1381,6 @@
 \begin{cquote}
 \lstDeleteShortInline@%
-\begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}}
-\multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{Resumption}}	& \multicolumn{1}{c}{\textbf{Recovery}}	\\
+\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
+\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{Resumption}}	& \multicolumn{1}{c}{\textbf{Termination}}	\\
 \begin{cfa}
 `exception R { int fix; };`
@@ -2067,5 +2080,5 @@
 In many cases, the interface is an inline wrapper providing overloading during compilation but zero cost at runtime.
 The following sections give a glimpse of the interface reduction to many C libraries.
-In many cases, @signed@/@unsigned@ @char@ and @short@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results.
+In many cases, @signed@/@unsigned@ @char@, @short@, and @_Complex@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results.
 
 
@@ -2099,14 +2112,18 @@
 \begin{cfa}
 MIN
+
 MAX
-M_PI
-M_E
+
+PI
+E
 \end{cfa}
 &
 \begin{cfa}
 SCHAR_MIN, CHAR_MIN, SHRT_MIN, INT_MIN, LONG_MIN, LLONG_MIN,
+		FLT_MIN, DBL_MIN, LDBL_MIN
 SCHAR_MAX, UCHAR_MAX, SHRT_MAX, INT_MAX, LONG_MAX, LLONG_MAX,
-M_PI, M_PIl, M_CPI, M_CPIl,
-M_E, M_El, M_CE, M_CEl
+		 FLT_MAX, DBL_MAX, LDBL_MAX
+M_PI, M_PIl
+M_E, M_El
 \end{cfa}
 \end{tabular}
@@ -2158,5 +2175,5 @@
 While \Celeven has type-generic math~\cite[\S~7.25]{C11} in @tgmath.h@ to provide a similar mechanism, these macros are limited, matching a routine name with a single set of floating type(s).
 For example, it is impossible to overload @atan@ for both one and two arguments;
-instead the names @atan@ and @atan2@ are required.
+instead the names @atan@ and @atan2@ are required (see Section~\ref{s:NameOverloading}).
 The key observation is that only a restricted set of type-generic macros are provided for a limited set of routine names, which do not generalize across the type system, as in \CFA.
 
@@ -2446,25 +2463,22 @@
 \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt]
 int main( int argc, char * argv[] ) {
-	FILE * out = fopen( "cfa-out.txt", "w" );
-	int maxi = 0, vali = 42;
-	stack(int) si, ti;
-
-	REPEAT_TIMED( "push_int", N, push( &si, vali ); )
-	TIMED( "copy_int", ti = si; )
-	TIMED( "clear_int", clear( &si ); )
-	REPEAT_TIMED( "pop_int", N,
-		int xi = pop( &ti ); if ( xi > maxi ) { maxi = xi; } )
-	REPEAT_TIMED( "print_int", N/2, print( out, vali, ":", vali, "\n" ); )
-
-	pair(_Bool, char) maxp = { (_Bool)0, '\0' }, valp = { (_Bool)1, 'a' };
-	stack(pair(_Bool, char)) sp, tp;
-
-	REPEAT_TIMED( "push_pair", N, push( &sp, valp ); )
-	TIMED( "copy_pair", tp = sp; )
-	TIMED( "clear_pair", clear( &sp ); )
-	REPEAT_TIMED( "pop_pair", N,
-		pair(_Bool, char) xp = pop( &tp ); if ( xp > maxp ) { maxp = xp; } )
-	REPEAT_TIMED( "print_pair", N/2, print( out, valp, ":", valp, "\n" ); )
-	fclose(out);
+	ofstream out = { "cfa-out.txt" };
+	int max = 0, val = 42;
+	stack( int ) si, t;
+
+	REPEAT_TIMED( "push_int", N, push( si, val ); )
+	TIMED( "copy_int", t = si; )
+	TIMED( "clear_int", clear( si ); )
+	REPEAT_TIMED( "pop_int", N, int x = pop( t ); max = max( x, max ); )
+	REPEAT_TIMED( "print_int", N/2, out | val | ':' | val | endl; )
+
+	pair( _Bool, char ) max = { (_Bool)0, '\0' }, val = { (_Bool)1, 'a' };
+	stack( pair( _Bool, char ) ) s, t;
+
+	REPEAT_TIMED( "push_pair", N, push( s, val ); )
+	TIMED( "copy_pair", t = s; )
+	TIMED( "clear_pair", clear( s ); )
+	REPEAT_TIMED( "pop_pair", N, pair(_Bool, char) x = pop( t ); max = max( x, max ); )
+	REPEAT_TIMED( "print_pair", N/2, out | val | ':' | val | endl; )
 }
 \end{cfa}
@@ -2640,13 +2654,17 @@
 \CFA
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
+forall(otype T) struct stack_node;
+forall(otype T) struct stack {
+	stack_node(T) * head;
+};
 forall(otype T) struct stack_node {
 	T value;
 	stack_node(T) * next;
 };
-forall(otype T) void ?{}(stack(T) * s) { (&s->head){ 0 }; }
-forall(otype T) void ?{}(stack(T) * s, stack(T) t) {
-	stack_node(T) ** crnt = &s->head;
+forall(otype T) void ?{}( stack(T) & s ) { (s.head){ 0 }; }
+forall(otype T) void ?{}( stack(T) & s, stack(T) t ) {
+	stack_node(T) ** crnt = &s.head;
 	for ( stack_node(T) * next = t.head; next; next = next->next ) {
-		*crnt = ((stack_node(T) *)malloc()){ next->value }; /***/
+		*crnt = malloc(){ next->value };
 		stack_node(T) * acrnt = *crnt;
 		crnt = &acrnt->next;
@@ -2654,30 +2672,30 @@
 	*crnt = 0;
 }
-forall(otype T) stack(T) ?=?(stack(T) * s, stack(T) t) {
-	if ( s->head == t.head ) return *s;
-	clear(s);
+forall(otype T) stack(T) ?=?( stack(T) & s, stack(T) t ) {
+	if ( s.head == t.head ) return s;
+	clear( s );
 	s{ t };
-	return *s;
-}
-forall(otype T) void ^?{}(stack(T) * s) { clear(s); }
-forall(otype T) _Bool empty(const stack(T) * s) { return s->head == 0; }
-forall(otype T) void push(stack(T) * s, T value) {
-	s->head = ((stack_node(T) *)malloc()){ value, s->head }; /***/
-}
-forall(otype T) T pop(stack(T) * s) {
-	stack_node(T) * n = s->head;
-	s->head = n->next;
+	return s;
+}
+forall(otype T) void ^?{}( stack(T) & s) { clear( s ); }
+forall(otype T) _Bool empty( const stack(T) & s ) { return s.head == 0; }
+forall(otype T) void push( stack(T) & s, T value ) {
+	s.head = malloc(){ value, s.head };
+}
+forall(otype T) T pop( stack(T) & s ) {
+	stack_node(T) * n = s.head;
+	s.head = n->next;
 	T x = n->value;
 	^n{};
-	free(n);
+	free( n );
 	return x;
 }
-forall(otype T) void clear(stack(T) * s) {
-	for ( stack_node(T) * next = s->head; next; ) {
+forall(otype T) void clear( stack(T) & s ) {
+	for ( stack_node(T) * next = s.head; next; ) {
 		stack_node(T) * crnt = next;
 		next = crnt->next;
-		delete(crnt);
+		delete( crnt );
 	}
-	s->head = 0;
+	s.head = 0;
 }
 \end{cfa}
@@ -2828,5 +2846,4 @@
 
 \begin{comment}
-
 \subsubsection{bench.h}
 (\texttt{bench.hpp} is similar.)
