Index: doc/papers/general/Paper.tex
===================================================================
--- doc/papers/general/Paper.tex	(revision fb2ce273bf7ea018ec95cd8701c1ca1eea477a46)
+++ doc/papers/general/Paper.tex	(revision 28bc8c8060f9366d54baea1d0ac5681a7a636197)
@@ -2547,30 +2547,30 @@
 In fact, \CFA's features for generic programming can enable faster runtime execution than idiomatic @void *@-based C code.
 This claim is demonstrated through a set of generic-code-based micro-benchmarks in C, \CFA, and \CC (see stack implementations in Appendix~\ref{sec:BenchmarkStackImplementation}).
-Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks would show little runtime variance, other than in length and clarity of source code.
+Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks would show little runtime variance, differing only in length and clarity of source code.
 A more illustrative benchmark measures the costs of idiomatic usage of each language's features.
-Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked-list, a generic pair-data-structure, and a variadic @print@ function similar to that in Section~\ref{sec:variadic-tuples}.
+Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked-list.
 The benchmark test is similar for C and \CC.
-The experiment uses element types @int@ and @pair(_Bool, char)@, and pushes $N=40M$ elements on a generic stack, copies the stack, clears one of the stacks, finds the maximum value in the other stack, and prints $N/2$ (to reduce graph height) constants.
+The experiment uses element types @int@ and @pair(short, char)@, and pushes $N=40M$ elements on a generic stack, copies the stack, clears one of the stacks, and finds the maximum value in the other stack.
 
 \begin{figure}
 \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt]
-int main( int argc, char * argv[] ) {
+int main() {
 	int max = 0, val = 42;
 	stack( int ) si, ti;
 
 	REPEAT_TIMED( "push_int", N, push( si, val ); )
-	TIMED( "copy_int", ti = si; )
+	TIMED( "copy_int", ti{ si }; )
 	TIMED( "clear_int", clear( si ); )
 	REPEAT_TIMED( "pop_int", N, 
 		int x = pop( ti ); if ( x > max ) max = x; )
 
-	pair( _Bool, char ) max = { (_Bool)0, '\0' }, val = { (_Bool)1, 'a' };
-	stack( pair( _Bool, char ) ) sp, tp;
+	pair( short, char ) max = { 0h, '\0' }, val = { 42h, 'a' };
+	stack( pair( short, char ) ) sp, tp;
 
 	REPEAT_TIMED( "push_pair", N, push( sp, val ); )
-	TIMED( "copy_pair", tp = sp; )
+	TIMED( "copy_pair", tp{ sp }; )
 	TIMED( "clear_pair", clear( sp ); )
 	REPEAT_TIMED( "pop_pair", N,
-		pair(_Bool, char) x = pop( tp ); if ( x > max ) max = x; )
+		pair(short, char) x = pop( tp ); if ( x > max ) max = x; )
 }
 \end{cfa}
@@ -2583,10 +2583,9 @@
 hence runtime checks are necessary to safely down-cast objects.
 The most notable difference among the implementations is in memory layout of generic types: \CFA and \CC inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV lack such a capability and instead must store generic objects via pointers to separately-allocated objects.
-For the print benchmark, idiomatic printing is used: the C and \CFA variants used @stdio.h@, while the \CC and \CCV variants used @iostream@; preliminary tests show this distinction has negligible runtime impact.
-Note, the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
+Note that the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
 
 Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents.
 The graph plots the median of 5 consecutive runs of each program, with an initial warm-up run omitted.
-All code is compiled at \texttt{-O2} by gcc or g++ 6.2.0, with all \CC code compiled as \CCfourteen.
+All code is compiled at \texttt{-O2} by gcc or g++ 6.3.0, with all \CC code compiled as \CCfourteen.
 The benchmarks are run on an Ubuntu 16.04 workstation with 16 GB of RAM and a 6-core AMD FX-6300 CPU with 3.5 GHz maximum clock frequency.
 
@@ -2606,22 +2605,21 @@
 									& \CT{C}	& \CT{\CFA}	& \CT{\CC}	& \CT{\CCV}		\\ \hline
 maximum memory usage (MB)			& 10001		& 2502		& 2503		& 11253			\\
-source code size (lines)			& 247		& 222		& 165		& 339			\\
-redundant type annotations (lines)	& 39		& 2			& 2			& 15			\\
-binary size (KB)					& 14		& 229		& 18		& 38			\\
+source code size (lines)			& 187		& 188		& 133		& 303			\\
+redundant type annotations (lines)	& 25		& 0			& 2			& 16			\\
+binary size (KB)					& 14		& 257		& 14		& 37			\\
 \end{tabular}
 \end{table}
 
 The C and \CCV variants are generally the slowest with the largest memory footprint, because of their less-efficient memory layout and the pointer-indirection necessary to implement generic types;
-this inefficiency is exacerbated by the second level of generic types in the pair-based benchmarks.
-By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of @_Bool@ and @char@ because the storage layout is equivalent, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead.
+this inefficiency is exacerbated by the second level of generic types in the pair benchmarks.
+By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of @short@ and @char@ because the storage layout is equivalent, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead.
 \CCV is slower than C largely due to the cost of runtime type-checking of down-casts (implemented with @dynamic_cast@).
-There are two outliers in the graph for \CFA: all prints and pop of @pair@.
-Both of these cases result from the complexity of the C-generated polymorphic code, so that the gcc compiler is unable to optimize some dead code and condense nested calls.
-A compiler designed for \CFA could easily perform these optimizations.
+The outlier in the graph for \CFA, pop @pair@, results from the complexity of the generated-C polymorphic code.
+The gcc compiler is unable to eliminate some dead code and condense nested calls in this generated code; a compiler designed for \CFA could easily perform these optimizations.
 Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries.
 
-\CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 73 and 54 lines, respectively.
+\CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, although \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Using these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 41 and 42 lines, respectively.
 On the other hand, C does not have a generic collections-library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers.
-\CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @bool@, @char@, @int@, and @const char *@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library;
+\CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @char@, @short@, and @int@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library;
 with their omission, the \CCV line count is similar to C.
 We justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose.
@@ -2629,13 +2627,10 @@
 Raw line-count, however, is a fairly rough measure of code complexity;
 another important factor is how much type information the programmer must manually specify, especially where that information is not checked by the compiler.
-Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointers arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts (\eg @object@ to @integer@ when popping a stack, or @object@ to @printable@ when printing the elements of a @pair@).
+Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointer arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts (\eg @object@ to @integer@ when popping a stack, or @object@ to @printable@ when printing the elements of a @pair@).
 To quantify this, the ``redundant type annotations'' line in Table~\ref{tab:eval} counts the number of lines on which the type of a known variable is re-specified, either as a format specifier, explicit downcast, type-specific function, or by name in a @sizeof@, struct literal, or @new@ expression.
 The \CC benchmark uses two redundant type annotations to create new stack nodes, while the C and \CCV benchmarks have several such annotations spread throughout their code.
-The two instances in which the \CFA benchmark still uses redundant type specifiers are to cast the result of a polymorphic @malloc@ call (the @sizeof@ argument is inferred by the compiler).
-These uses are similar to the @new@ expressions in \CC, though the \CFA compiler's type resolver should shortly render even these type casts superfluous.
-
+The \CFA benchmark eliminates all redundant type annotations by using the polymorphic @alloc@ function discussed in Section~\ref{sec:libraries}.
 
 \section{Related Work}
-
 
 \subsection{Polymorphism}
@@ -2765,21 +2760,16 @@
 \CFA
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
-forall(otype T) struct stack_node;
-forall(otype T) struct stack {
-	stack_node(T) * head;
-};
 forall(otype T) struct stack_node {
 	T value;
 	stack_node(T) * next;
 };
+forall(otype T) struct stack { stack_node(T) * head; };
 forall(otype T) void ?{}( stack(T) & s ) { (s.head){ 0 }; }
 forall(otype T) void ?{}( stack(T) & s, stack(T) t ) {
 	stack_node(T) ** crnt = &s.head;
 	for ( stack_node(T) * next = t.head; next; next = next->next ) {
-		stack_node(T) * new_node = ((stack_node(T)*)malloc());
-		(*new_node){ next->value }; /***/
-		*crnt = new_node;
-		stack_node(T) * acrnt = *crnt;
-		crnt = &acrnt->next;
+		*crnt = alloc();
+		((*crnt)->value){ next->value };
+		crnt = &(*crnt)->next;
 	}
 	*crnt = 0;
@@ -2794,22 +2784,24 @@
 forall(otype T) _Bool empty( const stack(T) & s ) { return s.head == 0; }
 forall(otype T) void push( stack(T) & s, T value ) with( s ) {
-	stack_node(T) * new_node = ((stack_node(T)*)malloc());
-	(*new_node){ value, s.head }; /***/
-	s.head = new_node;
+	stack_node(T) * n = alloc();
+	(*n){ value, head };
+	head = n;
 }
 forall(otype T) T pop( stack(T) & s ) with( s ) {
-	stack_node(T) * n = s.head;
-	s.head = n->next;
-	T v = n->value;
-	delete( n );
-	return v;
+	stack_node(T) * n = head;
+	head = n->next;
+	T x = n->value;
+	^(*n){};
+	free( n );
+	return x;
 }
 forall(otype T) void clear( stack(T) & s ) with( s ) {
-	for ( stack_node(T) * next = s.head; next; ) {
+	for ( stack_node(T) * next = head; next; ) {
 		stack_node(T) * crnt = next;
 		next = crnt->next;
-		delete( crnt );
+		^(*crnt){};
+		free( crnt );
 	}
-	s.head = 0;
+	head = 0;
 }
 \end{cfa}
@@ -2818,5 +2810,5 @@
 \CC
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
-template<typename T> class stack {
+template<typename T> struct stack {
 	struct node {
 		T value;
@@ -2825,5 +2817,5 @@
 	};
 	node * head;
-	void copy(const stack<T>& o) {
+	void copy(const stack<T> & o) {
 		node ** crnt = &head;
 		for ( node * next = o.head; next; next = next->next ) {
@@ -2833,10 +2825,9 @@
 		*crnt = nullptr;
 	}
-  public:
 	stack() : head(nullptr) {}
-	stack(const stack<T>& o) { copy(o); }
+	stack(const stack<T> & o) { copy(o); }
 	stack(stack<T> && o) : head(o.head) { o.head = nullptr; }
 	~stack() { clear(); }
-	stack & operator= (const stack<T>& o) {
+	stack & operator= (const stack<T> & o) {
 		if ( this == &o ) return *this;
 		clear();
@@ -2877,4 +2868,5 @@
 	struct stack_node * next;
 };
+struct stack { struct stack_node* head; };
 struct stack new_stack() { return (struct stack){ NULL }; /***/ }
 void copy_stack(struct stack * s, const struct stack * t, void * (*copy)(const void *)) {
@@ -2882,8 +2874,8 @@
 	for ( struct stack_node * next = t->head; next; next = next->next ) {
 		*crnt = malloc(sizeof(struct stack_node)); /***/
-		**crnt = (struct stack_node){ copy(next->value) }; /***/
+		(*crnt)->value = copy(next->value);
 		crnt = &(*crnt)->next;
 	}
-	*crnt = 0;
+	*crnt = NULL;
 }
 _Bool stack_empty(const struct stack * s) { return s->head == NULL; }
@@ -2914,46 +2906,53 @@
 \CCV
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
-stack::node::node( const object & v, node * n ) : value( v.new_copy() ), next( n ) {}
-void stack::copy(const stack & o) {
-	node ** crnt = &head;
-	for ( node * next = o.head; next; next = next->next ) {
-		*crnt = new node{ *next->value };
-		crnt = &(*crnt)->next;
+struct stack {
+	struct node {
+		ptr<object> value;
+		node* next;
+		node( const object & v, node * n ) : value( v.new_copy() ), next( n ) {}
+	};
+	node* head;
+	void copy(const stack & o) {
+		node ** crnt = &head;
+		for ( node * next = o.head; next; next = next->next ) {
+			*crnt = new node{ *next->value }; /***/
+			crnt = &(*crnt)->next;
+		}
+		*crnt = nullptr;
 	}
-	*crnt = nullptr;
-}
-stack::stack() : head(nullptr) {}
-stack::stack(const stack & o) { copy(o); }
-stack::stack(stack && o) : head(o.head) { o.head = nullptr; }
-stack::~stack() { clear(); }
-stack & stack::operator= (const stack & o) {
-	if ( this == &o ) return *this;
-	clear();
-	copy(o);
-	return *this;
-}
-stack & stack::operator= (stack && o) {
-	if ( this == &o ) return *this;
-	head = o.head;
-	o.head = nullptr;
-	return *this;
-}
-bool stack::empty() const { return head == nullptr; }
-void stack::push(const object & value) { head = new node{ value, head }; /***/ }
-ptr<object> stack::pop() {
-	node * n = head;
-	head = n->next;
-	ptr<object> x = std::move(n->value);
-	delete n;
-	return x;
-}
-void stack::clear() {
-	for ( node * next = head; next; ) {
-		node * crnt = next;
-		next = crnt->next;
-		delete crnt;
+	stack() : head(nullptr) {}
+	stack(const stack & o) { copy(o); }
+	stack(stack && o) : head(o.head) { o.head = nullptr; }
+	~stack() { clear(); }
+	stack & operator= (const stack & o) {
+		if ( this == &o ) return *this;
+		clear();
+		copy(o);
+		return *this;
 	}
-	head = nullptr;
-}
+	stack & operator= (stack && o) {
+		if ( this == &o ) return *this;
+		head = o.head;
+		o.head = nullptr;
+		return *this;
+	}
+	bool empty() const { return head == nullptr; }
+	void push(const object & value) { head = new node{ value, head }; /***/ }
+	ptr<object> pop() {
+		node * n = head;
+		head = n->next;
+		ptr<object> x = std::move(n->value);
+		delete n;
+		return x;
+	}
+	void clear() {
+		for ( node * next = head; next; ) {
+			node * crnt = next;
+			next = crnt->next;
+			delete crnt;
+		}
+		head = nullptr;
+	}
+};
 \end{cfa}
 
Index: doc/papers/general/evaluation/cfa-bench.c
===================================================================
--- doc/papers/general/evaluation/cfa-bench.c	(revision fb2ce273bf7ea018ec95cd8701c1ca1eea477a46)
+++ doc/papers/general/evaluation/cfa-bench.c	(revision 28bc8c8060f9366d54baea1d0ac5681a7a636197)
@@ -3,5 +3,5 @@
 #include "cfa-pair.h"
 
-int main( int argc, char * argv[] ) {
+int main() {
 	int max = 0, val = 42;
 	stack( int ) si, ti;
Index: doc/papers/general/evaluation/cfa-stack.c
===================================================================
--- doc/papers/general/evaluation/cfa-stack.c	(revision fb2ce273bf7ea018ec95cd8701c1ca1eea477a46)
+++ doc/papers/general/evaluation/cfa-stack.c	(revision 28bc8c8060f9366d54baea1d0ac5681a7a636197)
@@ -12,5 +12,5 @@
 	stack_node(T) ** crnt = &s.head;
 	for ( stack_node(T) * next = t.head; next; next = next->next ) {
-		*crnt = malloc();
+		*crnt = alloc();
 		((*crnt)->value){ next->value };
 		crnt = &(*crnt)->next;
@@ -31,5 +31,5 @@
 
 forall(otype T) void push( stack(T) & s, T value ) with( s ) {
-	stack_node(T)* n = malloc();
+	stack_node(T)* n = alloc();
 	(*n){ value, head };
 	head = n;
Index: doc/papers/general/evaluation/timing.dat
===================================================================
--- doc/papers/general/evaluation/timing.dat	(revision fb2ce273bf7ea018ec95cd8701c1ca1eea477a46)
+++ doc/papers/general/evaluation/timing.dat	(revision 28bc8c8060f9366d54baea1d0ac5681a7a636197)
@@ -1,10 +1,9 @@
 "400 million repetitions"	"C"	"\\CFA{}"	"\\CC{}"	"\\CC{obj}"
-"push\nint"	2976	2225	1522	3266
-"copy\nnt"	2932	7072	1526	3110
-"clear\nint"	1380	731	750	1488
-"pop\nint"	1444	1196	756	5156
-"push\npair"	3695	2257	953	6840
-"copy\npair"	6034	6650	994	7224
-"clear\npair"	2832	848	742	3297
-"pop\npair"	3009	5348	797	25235
-
+"push\nint"	3002	2459	1520	3305
+"copy\nint"	2985	2057	1521	3152
+"clear\nint"	1374	827	718	1469
+"pop\nint"	1416	1221	717	5467
+"push\npair"	4214	2752	946	6826
+"copy\npair"	6127	2105	993	7330
+"clear\npair"	2881	885	711	3564
+"pop\npair"	3046	5434	783	26538