Index: doc/bibliography/pl.bib
===================================================================
--- doc/bibliography/pl.bib	(revision ed5023d1a93d52e928c7120920781fefeb7fe26b)
+++ doc/bibliography/pl.bib	(revision 0393fda8cd4ef8e0811b9035990f83c2271392e8)
@@ -302,4 +302,16 @@
     year	= 2016,
     note	= {\url{https://docs.adacore.com/live/wave/arm12/pdf/arm12/arm-12.pdf}},
+}
+
+@manual{Ada22,
+    keywords	= {ISO/IEC Ada},
+    contributer	= {pabuhr@plg},
+    author	= {Ada22},
+    title	= {Ada Reference Manual},
+    edition	= {4th},
+    organization= {AXE Consultants},
+    address	= {Madison WI, USA},
+    year	= 2023,
+    note	= {\url{https://docs.adacore.com/live/wave/arm22/pdf/arm22/arm-22.pdf}},
 }
 
@@ -3921,13 +3933,10 @@
     address	= {USA},
     volume	= {13},
-    number	= {1–2},
+    number	= {1-2},
     journal	= {Higher Order Symbol. Comput.},
     year	= {2000},
     month	= apr,
-    pages	= {11–49},
-}
-
-  
-
+    pages	= {11-49},
+}
 
 @article{Eisenberg72,
Index: doc/theses/fangren_yu_MMath/content1.tex
===================================================================
--- doc/theses/fangren_yu_MMath/content1.tex	(revision ed5023d1a93d52e928c7120920781fefeb7fe26b)
+++ 	(revision )
@@ -1,852 +1,0 @@
-\chapter{\CFA Features and Type System Interactions}
-\label{c:content1}
-
-This chapter discusses \CFA features introduced over time by multiple people and their interactions with the type system.
-
-
-\section{Reference Types}
-
-Reference types were added to \CFA by Robert Schluntz and Aaron Moss~\cite{Moss18}.
-The \CFA reference type generalizes the \CC reference type (and its equivalent in other modern programming languages) by providing both mutable and immutable forms and cascading referencing and dereferencing.
-Specifically, \CFA attempts to extend programmer intuition about pointers to references.
-That is, use a pointer when its primary purpose is manipulating the address of storage, \eg a top/head/tail pointer or link field in a mutable data structure.
-Here, manipulating the pointer address is the primary operation, while dereferencing the pointer to its value is the secondary operation.
-For example, \emph{within} a data structure, \eg stack or queue, all operations involve pointer addresses and the pointer may never be dereferenced because the referenced object is opaque.
-Alternatively, use a reference when its primary purpose is to alias a value, \eg a function parameter that does not copy the argument (performance reason).
-Here, manipulating the value is the primary operation, while changing the pointer address is the secondary operation.
-Succinctly, if the address changes often, use a pointer;
-if the value changes often, use a reference.
-Java has mutable references but no pointers.
-\CC has mutable pointers but immutable references;
-here, references match with functional programming.
-However, the consequence is asymmetric semantics between pointers and references.
-\CFA adopts a uniform policy between pointers and references where mutability is a separate property made at the declaration.
-
-The following example shows how pointers and references are treated uniformly in \CFA.
-\begin{cfa}[numbers=left,numberblanklines=false]
-int x = 1, y = 2, z = 3;$\label{p:refexamples}$
-int * p1 = &x, ** p2 = &p1,  *** p3 = &p2,	$\C{// pointers to x}$
-	@&@ r1 = x,  @&&@ r2 = r1,   @&&&@ r3 = r2;	$\C{// references to x}$
-int * p4 = &z, & r4 = z;
-
-*p1 = 3; **p2 = 3; ***p3 = 3;				$\C{// different ways to change x to 3}$
- r1 =  3;	r2 = 3;		r3 = 3;				$\C{// change x: implicit dereference *r1, **r2, ***r3}$
-**p3 = &y;	*p3 = &p4;						$\C{// change p1, p2}$
-// cancel implicit dereferences (&*)**r3, (&(&*)*)*r3, &(&*)r4
-@&@r3 = @&@y; @&&@r3 = @&&@r4;				$\C{// change r1, r2}$
-\end{cfa}
-Like pointers, references can be cascaded, \ie a reference to a reference, \eg @&& r2@.\footnote{
-\CC uses \lstinline{&&} for rvalue reference, a feature for move semantics and handling the \lstinline{const} Hell problem.}
-Usage of a reference variable automatically performs the same number of dereferences as the number of references in its declaration, \eg @r2@ becomes @**r2@.
-Finally, reassigning a reference's address requires a mechanism to stop the auto-dereferencing, which is accomplished by using a single reference @&@ to cancel all the auto-dereferencing, \eg @&r3 = &y@ resets @r3@'s address to point to @y@.
-\CFA's reference type (including multi-de/references) is powerful enough to describe the lvalue rules in C by types only.
-As a result, the \CFA type checker now works on just types without using the notion of lvalue in an expression.
-(\CFA internals still use lvalue for code generation purposes.)
-
-The current reference typing rules in \CFA are summarized as follows:
-\begin{enumerate}
-    \item For a variable $x$ with declared type $T$, the variable-expression $x$ has type reference to $T$, even if $T$ itself is a reference type.
-    \item For an expression $e$ with type $T\ \&_1...\&_n$, \ie $T$ followed by $n$ references, where $T$ is not a reference type, the expression $\&e$ (address of $e$) has type $T *$ followed by $n - 1$ references.
-    \item For an expression $e$ with type $T *\&_1...\&_n$, \ie $T *$ followed by $n$ references, the expression $*e$ (dereference $e$) has type $T$ followed by $n + 1$ references.
-	This rule is the reverse of the previous rule, such that address-of and dereference operators are perfect inverses.
-    \item When matching argument and parameter types at a function call, the number of references on the argument type is stripped off to match the number of references on the parameter type.\footnote{
-	\CFA handles the \lstinline{const} Hell problem by allowing rvalue expressions to be converted to reference values by implicitly creating a temporary variable, with some restrictions.
-	As well, there is a warning that the output nature of the reference is lost.
-	Hence, a single function handles \lstinline{const} and non-\lstinline{const} as constness is handled at the call site.}
-	In an assignment context, the left-hand-side operand-type is always reduced to a single reference.
-\end{enumerate}
-Under this ruleset, a type parameter is never bound to a reference type in a function-call context.
-\begin{cfa}
-forall( T ) void f( T & );
-int & x;
-f( x );  // implicit dereference
-\end{cfa}
-The call applies an implicit dereference once to @x@ so the call is typed @f( int & )@ with @T = int@, rather than with @T = int &@.
-
-As for a pointer type, a reference type may have qualifiers, where @const@ is most common.
-\begin{cfa}
-int x = 3; $\C{// mutable}$
-const int cx = 5; $\C{// immutable}$
-int * const cp = &x, $\C{// immutable pointer/reference}$
-	& const cr = cx;
-const int * const ccp = &cx, $\C{// immutable value and pointer/reference}$
-			& const ccr = cx;
-\end{cfa}
-\begin{cquote}
-\setlength{\tabcolsep}{26pt}
-\begin{tabular}{@{}lll@{}}
-pointer & reference & \\
-\begin{cfa}
-*cp = 7;
-cp = &x;
-*ccp = 7;
-ccp = &cx;
-\end{cfa}
-&
-\begin{cfa}
-cr = 7;
-cr = &x;
-*ccr = 7;
-ccr = &cx;
-\end{cfa}
-&
-\begin{cfa}
-// allowed
-// error, assignment of read-only variable
-// error, assignment of read-only location
-// error, assignment of read-only variable
-\end{cfa}
-\end{tabular}
-\end{cquote}
-Interestingly, C does not give a warning/error if a @const@ pointer is not initialized, while \CC does.
-Hence, type @& const@ is similar to a \CC reference, but \CFA does not preclude initialization with a non-variable address.
-For example, in systems programming, there are cases where an immutable address is initialized to a specific memory location.
-\begin{cfa}
-int & const mem_map = *0xe45bbc67@p@; $\C{// hardware mapped registers ('p' for pointer)}$
-\end{cfa}
-Finally, qualification is generalized across all pointer/reference declarations.
-\begin{cfa}
-const * const * const * const ccccp = ...
-const & const & const & const ccccr = ...
-\end{cfa}
-
-In the initial \CFA reference design, the goal was to make the reference type a \emph{real} data type \vs a restricted \CC reference, which is mostly used for choosing the argument-passing method, \ie by-value or by-reference.
-However, there is an inherent ambiguity for auto-dereferencing: every argument expression involving a reference variable can potentially mean passing the reference's value or address.
-For example, in
-\begin{cfa}
-int & x;
-forall( T ) void foo( T );
-forall( T ) void bar( T & );
-foo( x ); $\C{// means pass by value}$
-bar( x ); $\C{// means pass by reference}$
-\end{cfa}
-the call to @foo@ must pass @x@ by value, implying auto-dereference, while the call to @bar@ must pass @x@ by reference, implying no auto-dereference.
-Without any restrictions, this ambiguity limits the behaviour of reference types in \CFA polymorphic functions, where a type @T@ can bind to a reference or non-reference type.
-This ambiguity prevents the type system from treating reference types the same way as other types, even if type variables could be bound to reference types.
-The reason is that \CFA uses a common \emph{object trait}\label{p:objecttrait} (constructor, destructor and assignment operators) to handle passing dynamic concrete type arguments into polymorphic functions, and the reference types are handled differently in these contexts so they do not satisfy this common interface.
-
-Moreover, there is some discrepancy in how reference types are treated in initialization and assignment expressions.
-For example, in line 3 of the example code on \VPageref{p:refexamples}:
-\begin{cfa}
-int @&@ r1 = x,  @&&@ r2 = r1,   @&&&@ r3 = r2;	$\C{// references to x}$
-\end{cfa}
-each initialization expression is implicitly dereferenced to match the types, \eg @&x@, because an address is always required and a variable normally returns its value;
-\CC does the same implicit dereference when initializing its reference variables.
-For lines 6 and 9 of the previous example code:
-\begin{cfa}
- r1 =  3;	r2 = 3;   r3 = 3;				$\C{// change x: implicit dereference *r1, **r2, ***r3}$
-@&@r3 = @&@y; @&&@r3 = @&&@r4;				$\C{// change r1, r2}$
-\end{cfa}
-there are no actual assignment operators defined for reference types that can be overloaded;
-instead, all reference assignments are handled by semantic actions in the type system.
-In fact, the reassignment of reference variables is set up internally to use the assignment operators for pointer types.
-Finally, there is an annoying issue (although purely syntactic) for setting a mutable reference to a specific address like null, @int & r1 = *0p@, which looks like dereferencing a null pointer.
-Here, the expression is rewritten as @int & r1 = &(*0p)@, like the variable dereference of @x@ above.
-However, the implicit @&@ needs to be cancelled for an address, which is done with the @*@, \ie @&*@ cancel each other, giving @0p@.
-Therefore, the dereferencing operation does not actually happen and the expression is translated into directly initializing the reference variable with the address.
-Note, the same explicit reference is used in \CC to set a reference variable to null.
-\begin{c++}
-int & ip = @*@(int *)nullptr;
-\end{c++}
-which is used in certain systems-programming situations.
-
-When generic types were introduced to \CFA~\cite{Moss19}, some thought was given to allow reference types as type arguments.
-\begin{cfa}
-forall( T ) struct vector { T t; }; $\C{// generic type}$
-vector( int @&@ ) vec; $\C{// vector of references to ints}$
-\end{cfa}
-While it is possible to write a reference type as the argument to a generic type, assertion checking disallows it if the generic type requires the object trait \see{\VPageref{p:objecttrait}} for the type argument, a fairly common use case.
-Even if the object trait can be made optional, the current type system often misbehaves by adding undesirable auto-dereference on the referenced-to value rather than the reference variable itself, as intended.
-Some tweaks are necessary to accommodate reference types in polymorphic contexts and it is unclear what can or cannot be achieved.
-Currently, there are contexts where the \CFA programmer is forced to use a pointer type, giving up the benefits of auto-dereference operations and better syntax with reference types.
-
-
-\section{Tuple Types}
-
-The addition of tuples to \CFA can be traced back to the original design by David Till in \mbox{K-W C}~\cite{Till89,Buhr94a}, a predecessor project of \CFA.
-The primary purpose of tuples is to eliminate output parameters and the need to create an aggregate type when returning multiple values from a function, called a multiple-value-returning (MVR) function.
-Traditionally, returning multiple values is accomplished via (in/)output parameters or packing the results in a structure.
-The following examples show these two techniques for a function returning three values.
-\begin{cquote}
-\begin{tabular}{@{}l@{\hspace{20pt}}l@{}}
-\begin{cfa}
-int foo( int &p1, int &p2 );  // in/out parameters
-int x, y = 3, z = 4;
-x = foo( y, z );  // return 3 values: 1 out, 2 in/out
-\end{cfa}
-&
-\begin{cfa}
-struct Ret { int x, y, z; } ret;
-Ret foo( int p1, int p2 );  // return structure
-ret = foo( 3, 4 );  // return 3 values: 3 out
-\end{cfa}
-\end{tabular}
-\end{cquote}
-Like Go, K-W C allows direct return of multiple values into a tuple.
-\begin{cfa}
-@[int, int, int]@ foo( int p1, int p2 );
-@[x, y, z]@ = foo( 3, 4 );  // return 3 values into a tuple
-\end{cfa}
-Along with making returning multiple values a first-class feature, tuples were extended to simplify a number of other common contexts that normally require multiple statements and/or additional declarations, all of which reduce coding time and errors.
-\begin{cfa}
-[x, y, z] = 3; $\C[2in]{// x = 3; y = 3; z = 3, where types may be different}$
-[x, y] = [y, x]; $\C{// int tmp = x; x = y; y = tmp;}$
-void bar( int, int, int );
-bar( foo( 3, 4 ) ); $\C{// int t0, t1, t2; [t0, t1, t2] = foo( 3, 4 ); bar( t0, t1, t2 );}$
-x = foo( 3, 4 )@.1@; $\C{//  int t0, t1, t2; [t0, t1, t2] = foo( 3, 4 ); x = t1;}\CRT$
-\end{cfa}
-For the call to @bar@, the three results (tuple value) from @foo@ are \newterm{flattened} into individual arguments.
-Flattening is how tuples interact with parameter and subscript lists, and with other tuples, \eg:
-\begin{cfa}
-[ [ x, y ], z, [a, b, c] ] = [2, [3, 4], foo( 3, 4) ]  $\C{// structured}$
-[ x, y, z, a, b, c] = [2, 3, 4, foo( 3, 4) ]  $\C{// flattened, where foo results are t0, t1, t2}$
-\end{cfa}
-Note, in most cases, a tuple is just compile-time syntactic-sugar for a number of individual assignment statements and possibly temporary variables.
-Only when returning a tuple from a function is there the notion of a tuple value.
-
-Overloading in the \CFA type-system must support complex composition of tuples and C type conversions using a costing scheme giving lower cost to widening conversions that do not truncate a value.
-\begin{cfa}
-[ int, int ] foo$\(_1\)$( int );			$\C{// overloaded foo functions}$
-[ double ] foo$\(_2\)$( int );
-void bar( int, double, double );
-bar( @foo@( 3 ), @foo@( 3 ) );
-\end{cfa}
-The type resolver only has the tuple return types to resolve the call to @bar@ as the @foo@ parameters are identical.
-The resolution involves unifying the flattened @foo@ return values with @bar@'s parameter list.
-However, no combination of @foo@s is an exact match with @bar@'s parameters;
-thus, the resolver applies C conversions to obtain a best match.
-The resulting minimal cost expression is @bar( foo@$_1$@( 3 ), foo@$_2$@( 3 ) )@, where the two possible conversions are (@int@, {\color{red}@int@}, @double@) to (@int@, {\color{red}@double@}, @double@) with a safe (widening) conversion from @int@ to @double@ versus ({\color{red}@double@}, {\color{red}@int@}, {\color{red}@int@}) to ({\color{red}@int@}, {\color{red}@double@}, {\color{red}@double@}) with one unsafe (narrowing) conversion from @double@ to @int@ and two safe conversions from @int@ to @double@.
-Go provides a simplified mechanism where only one tuple returning function call is allowed and there are no implicit type conversions.
-\begin{lstlisting}[language=Go]
-func foo( int ) ( int, int, int ) { return 3, 7, 8 }
-func bar( int, int, int ) { ... } // types must match
-bar( foo( 3 ) ) // only one tuple returning call 
-\end{lstlisting}
-Hence, programmers cannot take advantage of the full power of tuples, but type matching is straightforward.
-
-K-W C also supported tuple variables, but with a strong distinction between tuples and tuple values/variables.
-\begin{quote}
-Note that tuple variables are not themselves tuples.
-Tuple variables reference contiguous areas of storage, in which tuple values are stored;
-tuple variables and tuple values are entities which appear during program execution.
-Tuples, on the other hand, are compile-time constructs;
-they are lists of expressions, whose values may not be stored contiguously.~\cite[p.~30]{Till89}
-\end{quote}
-Fundamentally, a tuple value/variable is just a structure (contiguous areas) with an alternate (tuple) interface.
-A tuple value/variable is assignable (like structures), its fields can be accessed using position rather than name qualification, and it can interact with regular tuples.
-\begin{cfa}
-[ int, int, int ] t1, t2;
-t1 = t2;  			$\C{// tuple assignment}$
-t1@.1@ = t2@.0@;  	$\C{// position qualification}$
-int x, y, z;
-t1 = [ x, y, z ];  	$\C{// interact with regular tuples}$
-[ x, y, z ] = t1;
-bar( t2 );			$\C{// bar defined above}$
-\end{cfa}
-\VRef[Figure]{f:Nesting} shows the difference in nesting of structures and tuples.
-The left \CC nested-structure is named so it is not flattened.
-The middle C/\CC nested-structure is unnamed and flattened, causing an error because @i@ and @j@ are duplicate names.
-The right \CFA nested tuple cannot be named and is flattened.
-C allows named nested-structures, but they have issues \see{\VRef{s:inlineSubstructure}}.
-Note, it is common in C to have an unnamed @union@ so its fields do not require qualification.
-
-\begin{figure}
-\setlength{\tabcolsep}{20pt}
-\begin{tabular}{@{}ll@{\hspace{90pt}}l@{}}
-\multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{C/\CC} & \multicolumn{1}{c}{tuple} \\
-\begin{cfa}
-struct S {
-	struct @T@ { // not flattened
-		int i, j;
-	};
-	int i, j;
-};
-\end{cfa}
-&
-\begin{cfa}
-struct S2 {
-	struct ${\color{red}/* unnamed */}$ { // flatten
-		int i, j;
-	};
-	int i, j;
-};
-\end{cfa}
-&
-\begin{cfa}
-[
-	[ // flatten
-		1, 2
-	],
-	1, 2
-]
-\end{cfa}
-\end{tabular}
-\caption{Nesting}
-\label{f:Nesting}
-\end{figure}
-
-The primary issues for tuples in the \CFA type system are polymorphism and conversions.
-Specifically, does it make sense to have a generic (polymorphic) tuple type, as is possible for a structure?
-\begin{cfa}
-forall( T, S ) [ T, S ] GT; // polymorphic tuple type
-GT( int, char ) @gt@;
-GT( int, double ) @gt@;
-@gt@ = [ 3, 'a' ];  // select correct gt
-@gt@ = [ 3, 3.5 ];
-\end{cfa}
-and what is the cost model for C conversions across multiple values?
-\begin{cfa}
-gt = [ 'a', 3L ];  // select correct gt
-\end{cfa}
-
-
-\section{Tuple Implementation}
-
-As noted, traditional languages manipulate multiple values by in/out parameters and/or structures.
-K-W C adopted the structure for tuple values or variables, and as needed, the fields are extracted by field access operations.
-As well, for the tuple-assignment implementation, the left-hand tuple expression is expanded into assignments of each component, creating temporary variables to avoid unexpected side effects.
-For example, the tuple value returned from @foo@ is a structure, and its fields are individually assigned to a left-hand tuple, @x@, @y@, @z@, \emph{or} copied directly into a corresponding tuple variable.
-
-In the second implementation of \CFA tuples by Rodolfo Gabriel Esteves~\cite{Esteves04}, a different strategy is taken to handle MVR functions.
-The return values are converted to output parameters passed in by pointers.
-When the return values of a MVR function are directly used in an assignment expression, the addresses of the left-hand operands can be directly passed into the function;
-composition of MVR functions is handled by creating temporaries for the returns.
-For example, given a function returning two values:
-\begin{cfa}
-[int, int] gives_two() { int r1, r2; ... return [ r1, r2 ]; }
-int x, y;
-[x, y] = gives_two();
-\end{cfa}
-\VRef[Figure]{f:AlternateTupleImplementation} shows the two implementation approaches.
-In the left approach, the return statement is rewritten to pack the return values into a structure, which is returned by value, and the structure fields are individually assigned to the left-hand side of the assignment.
-In the right approach, the return statement is rewritten as direct assignments into the passed-in argument addresses.
-The upside of the right implementation is consistency and no copying.
-The downside is indirection within @gives_two@ to access values, unless values get hoisted into registers for some period of time, which is common.
-
-\begin{figure}
-\begin{cquote}
-\setlength{\tabcolsep}{20pt}
-\begin{tabular}{@{}ll@{}}
-Till K-W C implementation & Esteves \CFA implementation \\
-\begin{cfa}
-struct _tuple2 { int _0; int _1; }
-struct _tuple2 gives_two() {
-	... struct _tuple2 ret = { r1, r2 };
-	return ret;
-}
-int x, y;
-struct _tuple2 _tmp = gives_two();
-x = _tmp._0; y = _tmp._1;
-\end{cfa}
-&
-\begin{cfa}
-
-void gives_two( int * r1, int * r2 ) {
-	... *r1 = ...; *r2 = ...;
-	return;
-}
-int x, y;
-
-gives_two( &x, &y );
-\end{cfa}
-\end{tabular}
-\end{cquote}
-\caption{Alternate Tuple Implementation}
-\label{f:AlternateTupleImplementation}
-\end{figure}
-
-Interestingly, in the third implementation of \CFA tuples by Robert Schluntz~\cite[\S~3]{Schluntz17}, MVR functions reverted to being structure-based, which remains the approach in the current version of \CFA.
-The reason for the reversion is a uniform approach for tuple values/variables, making tuples first-class types in \CFA, \ie allowing tuples with corresponding tuple variables.
-This reversion was possible, because in parallel with Schluntz's work, generic types were added independently by Moss~\cite{Moss19}, and the tuple variables leveraged the same implementation techniques as for generic variables~\cite[\S~3.7]{Schluntz17}.
-For example, these two tuples:
-\begin{cfa}
-[double, double] x;
-[int, double, int] y;
-\end{cfa}
-are transformed internally into two generic structures:
-\begin{cfa}
-forall( T0 &, T1 & | sized( T0 ) | sized( T1 ) )
-struct _tuple2_ {
-	T0 field_0 ;   T1 field_1 ;
-};
-forall( T0 &, T1 &, T2 & | sized( T0 ) | sized( T1 ) | sized( T2 ) )
-struct _tuple3_ {
-	T0 field_0 ;   T1 field_1 ;   T2 field_2 ;
-};
-\end{cfa}
-and the declarations become instances of these generic structure types:
-\begin{cfa}
-_tuple2_( double, double ) x;
-_tuple3_( int, double, int ) y;
-\end{cfa}
-Now types @_tuple2_@ and @_tuple3_@ are available for any further 2 or 3 tuple-types in the translation unit, simplifying internal code transformations by memoizing a small set of tuple structures.
-Ultimately, these generic types are lowered to specific C structures during code generation.
-Scala, like \CC, provides tuple types through a library using this structural expansion, \eg Scala provides tuple sizes 1 through 22 via hand-coded generic data-structures.
-
-However, after experience gained building the \CFA runtime system, making tuple-types first-class seems to add little benefit.
-The main reason is that tuple usage is largely unstructured,
-\begin{cfa}
-[int, int] foo( int, int ); $\C[2in]{// unstructured function}$
-typedef [int, int] Pair; $\C{// tuple type}$
-Pair bar( Pair ); $\C{// structured function}$
-int x = 3, y = 4;
-[x, y] = foo( x, y ); $\C{// unstructured call}$
-Pair ret = [3, 4]; $\C{// tuple variable}$
-ret = bar( ret ); $\C{// structured call}\CRT$
-\end{cfa}
-Basically, creating the tuple-type @Pair@ is largely equivalent to creating a @struct@ type, and creating more types and names defeats the simplicity that tuples are trying to achieve.
-Furthermore, since operator overloading in \CFA is implemented by treating operators as overloadable functions, tuple types are very rarely used in a structured way.
-When a tuple-type expression appears in a function call (except assignment expressions, which are handled differently by mass- or multiple-assignment expansions), it is always flattened, and the tuple structure of function parameters is not considered part of the function signature.
-For example, these two prototypes for @f@:
-\begin{cfa}
-void f( int, int );
-void f( @[@ int, int @]@ );
-f( 3, 4 );  // ambiguous call
-\end{cfa}
-have the same signature (a function taking two @int@s and returning nothing), and are therefore invalid overloads.
-Note, the ambiguity error occurs at the call rather than at the second declaration of @f@, because it is possible to have multiple equivalent prototype definitions of a function.
-Furthermore, ordinary polymorphic type-parameters are not allowed to have tuple types.
-\begin{cfa}
-forall( T ) T foo( T );
-int x, y, z;
-[x, y, z] = foo( [x, y, z] );  // substitute tuple type for T
-\end{cfa}
-Without this restriction, the expression resolution algorithm can create too many argument-parameter matching options.
-For example, with multiple type parameters,
-\begin{cfa}
-forall( T, U ) void f( T, U );
-f( [1, 2, 3, 4] );
-\end{cfa}
-the call to @f@ can be interpreted as @T = [1]@ and @U = [2, 3, 4]@, or @T = [1, 2]@ and @U = [3, 4]@, and so on.
-The restriction ensures type checking remains tractable and does not take too long to compute.
-Therefore, tuple types are never present in any fixed-argument function calls, because of the flattening.
-
-\begin{comment}
-Date: Mon, 13 Jan 2025 10:09:06 -0500
-Subject: Re: structure / tuple
-To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
-CC: Andrew Beach <ajbeach@uwaterloo.ca>,
-        Michael Brooks <mlbrooks@uwaterloo.ca>,
-        Fangren Yu <f37yu@uwaterloo.ca>, Jiada Liang <j82liang@uwaterloo.ca>,
-        Alvin Zhang <alvin.zhang@uwaterloo.ca>,
-        Kyoung Seo <lseo@plg.uwaterloo.ca>
-From: Gregor Richards <gregor.richards@uwaterloo.ca>
-
-Languages support tuples to abbreviate syntax where the meaning of several
-values is obvious from context, such as returns from functions, or where the
-effort of creating a dedicated type is not worth the reward of using that type
-in exactly one location. The positions always have meanings which could be
-given names, and are only not given names for brevity. Whether that brevity is
-a good idea or not is the programmer's problem to deal with. I don't think
-there's any pragmatic value to tuples beyond brevity. (From a theoretical
-perspective, having the empty tuple is useful for type-theoretical reasons, and
-tuples are usually easier to reason about than structures, but that only
-applies to theoretical reasoning, not to actual programming.)
-
-Your distinction unstructured tuples could just as well be made for structs as
-well, if you had named arguments (or named returns?).  Personally, I think that
-having these be a syntactic distinction is a mistake. Other languages return
-fully codified tuples, and if you immediately destructure them, even the most
-naive optimizer will manage to never create an actual tuple in memory. In my
-opinion, since tuples are for brevity, they should always be declared with your
-"unstructured" syntax, and it's up to the optimizer to realize when you've
-never stored them. But, you live closer to the metal in CFA than most
-languages, so perhaps communicating that intent is of sufficient value.
-
-The only value of tuples beyond that is to make it possible for annoying
-students to use std::pair in place of ever creating their own class hierarchy
-or naming things. Then again, I hear that that is one of the hard problems in
-computer science.
-
-With valediction,
-  - Gregor Richards
-
-On 1/13/25 09:11, Peter A. Buhr wrote:
-> The CFA team has been discussing the difference between a structure and
-> tuple.  Basically, a structure has named fields and a tuple has anonymous
-> fields. As a result, structure access uses field names and tuple access uses
-> position.
->
->    struct S { int i, j, k ; };
->    S s;
->    s.i; s.j; // field access
->
->    tuple T { int, int };
->    T t;
->    t.0; t.1; // position access, zero origin
->    t[0]; t[1]; // alternate access
->
-> Hence the difference is small.
->
-> In CFA, we differentiate between unstructured and structured tuples. An
-> unstructured tuple is a lexical grouping of potentially disjoint variables.
->
->    [ int, int, int ] f();
->    void g( int, int, int );
->    x, y, z = f(); // Go unstructured tuple, flatten tuple
->    g( foo() ); // flatten tuple
->
-> Here, the tuple returned from f is flattened into disjoint variables.  A
-> structured tuple is like above and has contiguous memory.
->
-> CFA has fancy unstructured stuff like
->
->    s.[i,k] += 1; // add 1 to each field
->    t.[1,0] = 1; // don't think this works but could
->
-> which is just an unstructured tuple access (sugar).
->
-> What is your opinion of structures and tuples since the difference is
-> small. Why do many languages support both features? Are we missing some
-> important aspect of tuples that differentiates them from structures?  Is CFA
-> unique in having both unstructured and structured tuples?
-\end{comment}
-
-Finally, a type-safe variadic argument signature was added by Robert Schluntz~\cite[\S~4.1.2]{Schluntz17} using @forall@ and a new tuple parameter-type, denoted by the keyword @ttype@ in Schluntz's implementation, but changed to the ellipsis syntax similar to \CC's template parameter pack.
-For C variadics, \eg @va_list@, the number and types of the arguments must be conveyed in some way, \eg @printf@ uses a format string indicating the number and types of the arguments.
-\begin{cfa}
-int printf( const char * format, ${\color{red}\LARGE ...}$ );  // variadic list of variables to print
-\end{cfa}
-\VRef[Figure]{f:CVariadicMaxFunction} shows an $N$ argument @maxd@ function using the C untyped @va_list@ interface.
-In the example, the first argument is the number of following arguments, and the following arguments are assumed to be @double@;
-looping is used to traverse the argument pack from left to right.
-The @va_list@ interface walks up the stack (by address) looking at the arguments pushed by the caller.
-(Magic knowledge is needed for arguments pushed using registers.)
-
-\begin{figure}
-\begin{cfa}
-double maxd( int @count@, @...@ ) { // ellipsis parameter
-	double max = 0;
-	va_list args;
-	va_start( args, count );
-	for ( int i = 0; i < count; i += 1 ) {
-		double num = va_arg( args, double );
-		if ( num > max ) max = num;
-	}
-	va_end(args);
-	return max;
-}
-printf( "%g\n", maxd( @4@, 25.0, 27.3, 26.9, 25.7 ) );
-\end{cfa}
-\caption{C Variadic Maximum Function}
-\label{f:CVariadicMaxFunction}
-\end{figure}
-
-There are two common patterns for using variadic functions in \CFA.
-\begin{enumerate}[leftmargin=*]
-\item
-Argument forwarding to another function, \eg:
-\begin{cfa}
-forall( T *, TT ... | { @void ?{}( T &, TT );@ } ) // constructor assertion
-T * new( TT tp ) { return ((T *)malloc())@{@ tp @}@; }  // call constructor on storage
-\end{cfa}
-Note, the assertion on @T@ requires it to have a constructor @?{}@.
-The function @new@ calls @malloc@ to obtain storage and then invokes @T@'s constructor passing the tuple pack by flattening it over the constructor's arguments, \eg:
-\begin{cfa}
-struct S { int i, j; };
-void ?{}( S & s, int i, int j ) { s.[ i, j ] = [ i, j ]; }  // constructor
-S * sp = new( 3, 4 );  // flatten [3, 4] into call ?{}( 3, 4 );
-\end{cfa}
-\item
-Structural recursion for processing the argument-pack values one at a time, \eg:
-\begin{cfa}
-forall( T | { int ?<?( T, T ); } )
-T max( T v1, T v2 ) { return v1 < v2 ? v2 : v1; }
-$\vspace{-10pt}$
-forall( T, TT ... | { T max( T, T ); T max( TT ); } )
-T max( T arg, TT args ) { return max( arg, max( args ) ); }
-\end{cfa}
-The first non-recursive @max@ function is the polymorphic base-case for the recursion, \ie, find the maximum of two identically typed values with a less-than (@<@) operator.
-The second recursive @max@ function takes two parameters, @T@ and the @TT@ tuple pack, handling all argument lengths greater than two.
-The recursive function computes the maximum for the first argument and the maximum value of the rest of the tuple pack.
-The call of @max@ with one argument is the recursive call, where the tuple pack is converted into two arguments by taking the first value (lisp @car@) from the tuple pack as the first argument (flattening) and the remaining pack becomes the second argument (lisp @cdr@).
-The recursion stops when the argument pack is empty.
-For example, @max( 2, 3, 4 )@ matches with the recursive function, which performs @return max( 2, max( [3, 4] ) )@ and one more step yields @return max( 2, max( 3, 4 ) )@, at which point the tuple pack is empty and the base case applies.
-\end{enumerate}
-
-As an aside, polymorphic functions are precise with respect to their parameter types, \eg @max@ states all argument values must be the same type, which logically makes sense.
-However, this precision precludes normal C conversions among the base types, \eg, the mixed-mode call @max( 2h, 2l, 3.0f, 3.0ld )@ fails because the types are not the same.
-Unfortunately, this failure violates programmer intuition because there are specialized two-argument non-polymorphic versions of @max@ that work, \eg @max( 3, 3.5 )@.
-Allowing C conversions for polymorphic types would require a significant change to the type resolver.
-
-Currently in \CFA, variadic polymorphic functions are the only place tuple types are used.
-Because \CFA compiles polymorphic functions, rather than expanding templates, many wrapper functions are generated to implement both user-defined generic-types and polymorphism with variadics.
-Fortunately, the only permitted operations on polymorphic function parameters are given by the list of assertion (trait) functions.
-Nevertheless, this small set of functions eventually needs to be called with flattened tuple arguments.
-Unfortunately, packing the variadic arguments into a rigid @struct@ type and generating all the required wrapper functions is significant work and largely wasted because most are never called.
-Interested readers can refer to pages 77--80 of Robert Schluntz's thesis to see how verbose the translator output is to implement a simple variadic call with 3 arguments.
-As the number of arguments increases, \eg a call with 5 arguments, the translator generates concrete @struct@ types for a 4-tuple and a 3-tuple, along with all the polymorphic type data for them.
-An alternative approach is to put the variadic arguments into an array, along with an offset array to retrieve each individual argument.
-This method is similar to how the C @va_list@ object is used (and how \CFA accesses polymorphic fields in a generic type), but the \CFA variadics generate the required type information to guarantee type safety (like the @printf@ format string).
-For example, given the following heterogeneous, variadic, typed @print@ and usage:
-\begin{cquote}
-\begin{tabular}{@{}ll@{}}
-\begin{cfa}
-forall( T, TT ... | { void print( T ); void print( TT ); } )
-void print( T arg , TT rest ) {
-	print( arg );
-	print( rest );
-}
-\end{cfa}
-&
-\begin{cfa}
-void print( int i ) { printf( "%d ", i ); }
-void print( double d ) { printf( "%g ", d ); }
-... // other types
-int i = 3 ; double d = 3.5;
-print( i, d );
-\end{cfa}
-\end{tabular}
-\end{cquote}
-it would look like the following using the offset-array implementation approach.
-\begin{cfa}
-void print( T arg, char * _data_rest, size_t * _offset_rest ) {
-	print( arg );
-	print( *((typeof rest.0)*) _data_rest,  $\C{// first element of rest}$
-		  _data_rest + _offset_rest[0],  $\C{// remainder of data}$
-		  _offset_rest + 1);  $\C{// remainder of offset array}$
-}
-\end{cfa}
-where the fixed-arg polymorphism for @T@ can be handled by the standard @void *@-based \CFA polymorphic calling conventions, and the type information can be deduced at the call site.
-Note, the variadic @print@ supports heterogeneous types because the polymorphic @T@ is not returned (unlike variadic @max@), so there is no cascade of type relationships.
-
-Turning tuples into first-class values in \CFA does have a few benefits, namely allowing pointers to tuples and arrays of tuples to exist.
-However, it seems unlikely that these types have realistic use cases that cannot be achieved with structures.
-And having a pointer-to-tuple type potentially forbids the simple offset-array implementation of variadic polymorphism.
-For example, if a type assertion requests the pointer type @TT *@ in the above example, it forces the tuple type to be a @struct@, thus incurring a high cost.
-My conclusion is that tuples should not be structured (first-class); rather, they should be unstructured.
-This agrees with Rodolfo's original description:
-\begin{quote}
-As such, their [tuples] use does not enforce a particular memory layout, and in particular, does not guarantee that the components of a tuple occupy a contiguous region of memory.~\cite[pp.~74--75]{Esteves04}
-\end{quote}
-allowing the simplified implementations for MVR and variadic functions.
-
-Finally, a missing feature for tuples is using an MVR in an initialization context.
-Currently, this feature is \emph{only} possible when declaring a tuple variable.
-\begin{cfa}
-[int, int] ret = gives_two();  $\C{// no constructor call (unstructured)}$
-Pair ret = gives_two();  $\C{// constructor call (structured)}$
-\end{cfa}
-However, this forces the programmer to use a tuple variable and possibly a tuple type to support a constructor, when they actually want separate variables with separate constructors.
-And as stated previously, type variables (structured tuples) are rare in general \CFA programming so far.
-To address this issue, while retaining the ability to leverage constructors, I proposed the following new tuple-like declaration syntax.
-\begin{cfa}
-[ int x, int y ] = gives_two();
-\end{cfa}
-where the semantics is:
-\begin{cfa}
-T t0, t1;
-[ t0, t1 ] = gives_two();
-T x = t0;  // constructor call
-T y = t1;  // constructor call
-\end{cfa}
-and the implementation performs as much copy elision as possible.
-Currently, this new declaration form is parsed by \CFA, showing its syntax is viable, but it is unimplemented because of downstream resolver issues.
-
-
-\section{\lstinline{inline} Substructure}
-\label{s:inlineSubstructure}
-
-As mentioned, C allows an anonymous aggregate type (@struct@ or @union@) to be embedded (nested) within another one \see{\VRef[Figure]{f:Nesting}}, \eg a tagged union.
-\begin{cfa}
-struct S {
-	unsigned int tag;
-	union { // anonymous nested aggregate
-		int x;  double y;  char z;
-	};
-} s;
-\end{cfa}
-Here, the @union@ combines its fields into a common block of storage, and because there is no variable-name overloading in C, all of the union field names must be unique.
-Furthermore, because the union is unnamed, these field-names are hoisted into the @struct@, giving direct access, \eg @s.x@;
-hence, the union field names must also be distinct from the structure field names.
-The same semantics applies to a nested anonymous @struct@:
-\begin{cquote}
-\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
-original & rewritten \\
-\begin{cfa}
-struct S {
-	struct { int i, j; };
-	struct { int k, l; };
-};
-\end{cfa}
-&
-\begin{cfa}
-struct S {
-	int i, j;
-	int k, l;
-};
-\end{cfa}
-\end{tabular}
-\end{cquote}
-However, unlike the union which provides storage sharing, there is no semantic difference between the nested anonymous structure and its rewritten counterpart.
-Hence, the nested anonymous structure provides no useful capability.
-
-Nested \emph{named} aggregates are allowed in C but there is no qualification operator, like the \CC type operator `@::@', to access an inner type.
-\emph{To compensate for the missing type operator, all named nested aggregates are hoisted to global scope, regardless of the nesting depth, and type usages within the nested type are replaced with the global type name.}
-Hoisting nested types can result in name collisions among types at the global level, which defeats the purpose of nesting the type.
-\VRef[Figure]{f:NestedNamedAggregate} shows the nested type @T@ is hoisted to the global scope and the declaration rewrites within structure @S@.
-Hence, the possible accesses are:
-\begin{cfa}
-struct S s;
-s.i = 1;
-s.t.i = 2;
-s.w = (struct T){ 7, 8 };
-struct T x = { 5, 6 }; // use (un)nested type name
-s.t = (struct T){ 2, 3 };
-\end{cfa}
-where @T@ is used without qualification even though it is nested in @S@.
-It is for these reasons that nested types are not used in C, and if used, are extremely confusing.
-
-\begin{figure}
-\begin{cquote}
-\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
-original & rewritten \\
-\begin{cfa}
-struct S {
-	@struct T@ {
-		int i, j;
-	} t; // warning without declaration
-	struct T w;
-	int k;
-};
-
-\end{cfa}
-&
-\begin{cfa}
-@struct T@ {
-	int i, j;
-};
-struct S {
-	@struct T t@;
-	struct T w;
-	int k;
-};
-\end{cfa}
-\end{tabular}
-\end{cquote}
-\caption{Nested Named Aggregate}
-\label{f:NestedNamedAggregate}
-\end{figure}
-
-For good reasons, \CC chose to change this semantics:
-\begin{cquote}
-\begin{description}[leftmargin=*,topsep=0pt,itemsep=0pt,parsep=0pt]
-\item[Change:] A struct is a scope in C++, not in C.
-\item[Rationale:] Class scope is crucial to C++, and a struct is a class.
-\item[Effect on original feature:] Change to semantics of well-defined feature.
-\item[Difficulty of converting:] Semantic transformation.
-\item[How widely used:] C programs use @struct@ extremely frequently, but the change is only noticeable when @struct@, enumeration, or enumerator names are referred to outside the @struct@.
-The latter is probably rare.
-\end{description}
-\hfill ISO/IEC 14882:1998 (\CC Programming Language Standard)~\cite[C.1.2.3.3]{ANSI98:C++}
-\end{cquote}
-\CFA chose to adopt the \CC non-compatible change for nested types, since \CC's change has already forced certain coding changes in C libraries that must be parsed by \CC.
-\CFA also added the ability to access from a variable through a type to a field.
-\begin{cfa}
-struct S s;  @s.i@;  @s.T@.i;
-\end{cfa}
-See the use case for this feature at the end of this section.
-
-% https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
-
-A polymorphic extension to nested aggregates appears in the Plan-9 C dialect, used in the Bell Labs' Plan-9 research operating-system.
-The feature is called \newterm{unnamed substructures}~\cite[\S~3.3]{Thompson90new}, which continues to be supported by @gcc@ and @clang@ using the extension (@-fplan9-extensions@).
-The goal is to provide the same effect as a nested aggregate with the aggregate type defined elsewhere, which requires it be named.
-\begin{cfa}
-union U {  $\C{// unnested named}$
-	int x;  double y;  char z;
-} u;
-struct W {
-	int i;  double j;  char k;
-} w;
-struct S {
-	@struct W;@  $\C{// Plan-9 substructure}$
-	unsigned int tag;
-	@union U;@  $\C{// Plan-9 substructure}$
-} s;
-\end{cfa}
-Note, the position of the substructure is normally unimportant, unless there is some form of memory or @union@ overlay.
-Like an anonymous nested type, a named Plan-9 nested type has its field names hoisted into @struct S@, so there is direct access, \eg @s.x@ and @s.i@.
-Hence, the field names must be unique, unlike \CC nested types, but the type names are at a nested scope level, unlike type nesting in C.
-In addition, a pointer to a structure is automatically converted to a pointer to an anonymous field for assignments and function calls, providing containment inheritance with implicit subtyping, \ie @U@ $\subset$ @S@ and @W@ $\subset$ @S@, \eg:
-\begin{cfa}
-void f( union U * u );
-void g( struct W * );
-union U * up;   struct W * wp;   struct S * sp;
-up = &s; $\C{// assign pointer to U in S}$
-wp = &s; $\C{// assign pointer to W in S}$
-f( &s ); $\C{// pass pointer to U in S}$
-g( &s ); $\C{// pass pointer to W in S}$
-\end{cfa}
-Note, there is no value assignment, such as, @w = s@, to copy the @W@ field from @S@.
-
-Unfortunately, the Plan-9 designers did not look ahead to other useful features, specifically nested types.
-This nested type compiles in \CC and \CFA.
-\begin{cfa}
-struct R {
-	@struct T;@		$\C[2in]{// forward declaration, conflicts with Plan-9 syntax}$
-	struct S {		$\C{// nested types, mutually recursive reference}\CRT$
-		S * sp;   T * tp;  ...
-	};
-	struct T {
-		S * sp;   T * tp;  ...
-	};
-};
-\end{cfa}
-Note, the syntax for the forward declaration conflicts with the Plan-9 declaration syntax.
-
-\CFA extends the Plan-9 substructure by allowing polymorphism for values and pointers, where the extended substructure is denoted using @inline@.
-\begin{cfa}
-struct S {
-	@inline@ struct W;  $\C{// extended Plan-9 substructure}$
-	unsigned int tag;
-	@inline@ U;  $\C{// extended Plan-9 substructure}$
-} s;
-\end{cfa}
-Note, the declaration of @U@ is not prefixed with @union@.
-Like \CC, \CFA allows optional prefixing of type names with their kind, \eg @struct@, @union@, and @enum@, unless there is ambiguity with variable names in the same scope.
-In addition, a semi-non-compatible change is made so that Plan-9 syntax means a forward declaration in a nested type.
-Since the Plan-9 extension is not part of C and rarely used, this change has minimal impact.
-Hence, all Plan-9 semantics are denoted by the @inline@ qualifier, which is good ``eye-candy'' when reading a structure definition to spot Plan-9 definitions.
-Finally, the following code shows the value and pointer polymorphism.
-\begin{cfa}
-void f( U, U * ); $\C{// value, pointer}$
-void g( W, W * ); $\C{// value, pointer}$
-U u, * up;   S s, * sp;   W w, * wp;
-u = s;   up = sp; $\C{// value, pointer}$
-w = s;   wp = sp; $\C{// value, pointer}$
-f( s, &s ); $\C{// value, pointer}$
-g( s, &s ); $\C{// value, pointer}$
-\end{cfa}
-
-In general, non-standard C features (@gcc@) do not need any special treatment, as they are directly passed through to the C compiler.
-However, the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the \CFA type resolver must take this information into account.
-Therefore, the \CFA resolver must implement the Plan-9 features and insert necessary type conversions into the translated code output.
-In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C arithmetic conversions.
-
-Plan-9 polymorphism can result in duplicate field names.
-For example, the \newterm{diamond pattern}~\cite[\S~6.1]{Stroustrup89}\cite[\S~4]{Cargill91} can result in nested fields being embedded twice.
-\begin{cfa}
-struct A { int x; };
-struct B { inline A; };
-struct C { inline A; };
-struct D {
-	inline B;  // B.x
-	inline C;  // C.x
-} d;
-\end{cfa}
-Because the @inline@ structures are flattened, the expression @d.x@ is ambiguous, as it can refer to the embedded field either from @B@ or @C@.
-@gcc@ generates a syntax error about the duplicate member @x@.
-The equivalent \CC definition compiles:
-\begin{c++}
-struct A { int x; };
-struct B : public A {};
-struct C : public A {};
-struct D : @public B, C@ {  // multiple inheritance
-} d;
-\end{c++}
-and again the expression @d.x@ is ambiguous.
-While \CC has no direct syntax to disambiguate @x@, \ie @d.B.x@ or @d.C.x@, it is possible with casts, @((B)d).x@ or @((C)d).x@.
-Like \CC, \CFA compiles the Plan-9 version and provides direct qualification and casts to disambiguate @x@.
-While ambiguous definitions are allowed, duplicate field names are poor practice and should be avoided if possible.
-However, when a programmer does not control all code, this problem can occur and a naming workaround must exist.
Index: c/theses/fangren_yu_MMath/content2.tex
===================================================================
--- doc/theses/fangren_yu_MMath/content2.tex	(revision ed5023d1a93d52e928c7120920781fefeb7fe26b)
+++ 	(revision )
@@ -1,553 +1,0 @@
-\chapter{Resolution Algorithms}
-\label{c:content2}
-
-The \CFA type-system provides expressive polymorphism: variables can be overloaded; functions can be overloaded by argument and return types; tuple types are supported; generic (polymorphic) functions and types (aggregates) can have multiple type parameters with assertion restrictions;
-in addition, C's multiple implicit type-conversions must be respected.
-This generality leads to internal complexity and correspondingly higher compilation cost directly related to type resolution.
-The reason is that the type resolver must analyze \emph{each} component of an expression with many possible forms of overloading, type restrictions, and conversions.
-Designing a ruleset that behaves as expected, \ie matches C programmer expectations, and implementing an efficient algorithm is a very challenging task.
-
-My first work on the \CFA type-system was as a co-op student.
-At that time, compiling a medium-sized \CFA program using advanced polymorphism took multiple minutes (5+ minutes)~\cite[\S~5]{Yu20}.
-I found a number of type-resolution problems, which resulted in the following changes to the type-resolution algorithm.
-\begin{enumerate}[itemsep=0pt]
-\item
-new AST data structure
-\item
-special symbol table and argument-dependent lookup
-\item
-late assertion-satisfaction
-\item
-revised function-type representation
-\item
-skip pruning on expressions for function types
-\end{enumerate}
-\VRef[Table]{t:SelectedFileByCompilerBuild} shows improvements for selected tests with accumulated reductions in compile time across each of the 5 fixes.
-The large reduction in compilation time significantly improved the development of the \CFA runtime because of its frequent compilation cycles.
-
-\begin{table}[htb]
-\centering
-\caption{Compile time of selected files by compiler build in seconds}
-\label{t:SelectedFileByCompilerBuild}
-\lstset{deletekeywords={mutex,thread}}
-\setlength{\extrarowheight}{1pt}
-\vspace*{-4pt}
-\begin{tabular}{l|r|rrrrr}
-				& \multicolumn{1}{c|}{Original}	& \multicolumn{5}{c}{Accumulative fixes}	\\
-\cline{3-7}				
-Test case		& \multicolumn{1}{c|}{resolver}	& \multicolumn{1}{c}{1}	& \multicolumn{1}{c}{2}	& \multicolumn{1}{c}{3}	& \multicolumn{1}{c}{4}	& \multicolumn{1}{c}{5}	\\
-\hline
-@lib/fstream@	& 86.4	& 25.2	& 10.8	& 9.5	& 7.8	& 7.1	\\
-@lib/mutex@		& 210.4	& 77.4	& 16.7	& 15.1	& 12.6	& 11.7	\\
-@lib/vector@	& 17.2	& 8.9	& 3.1	& 2.8	& 2.4	& 2.2	\\
-@lib/stdlib@	& 16.6	& 8.3	& 3.2	& 2.9	& 2.6	& 2.4	\\
-@test/io2@		& 300.8	& 53.6	& 43.2	& 27.9	& 19.1	& 16.3	\\
-@test/thread@	& 210.9	& 73.5	& 17.0	& 15.1	& 12.6	& 11.8	\\
-\end{tabular}
-\medskip
-\newline
-Results are the average of 5 runs (3 runs if time exceeds 100 seconds)
-\end{table}
-
-Since then, many new features utilizing the expressiveness of \CFA's type system have been implemented, such as generic container types similar to those in \CC's standard template library.
-During the development of multiple \CFA programs and libraries, more weaknesses and design flaws have been discovered within the type system.
-Some of those problems arise from the newly introduced language features described in the previous chapter.
-In addition, fixing unexpected interactions within the type system has presented challenges.
-This chapter describes in detail the type-resolution rules currently in use and some major problems that have been identified.
-Not all of those problems have immediate solutions, because fixing them may require redesigning parts of the \CFA type system at a larger scale, which correspondingly affects the language design.
-
-
-\section{Expression Cost-Model}
-
-\CFA has been using a total-order expression cost-model to resolve ambiguity of overloaded expressions from the very beginning.
-Most \CFA operators can be overloaded in \CFA\footnote{
-\CFA excludes overloading the comma operator, the short-circuit logical operators \lstinline{&&} and \lstinline{||}, and the ternary conditional operator \lstinline{?}, all of which are short-hand control structures rather than operators.
-\CC overloads the comma and short-circuit logical operators, where the overloaded comma is esoteric and the overloaded short-circuit operators do not exhibit short-circuit semantics, which is confusing.};
-hence, they are treated the same way as other function calls with the same rules for overload resolution.
-
-In \CFA, candidates of an overloaded expression are ranked by numerical cost elements, which account for any type conversions needed from a call-site argument type to the matching function parameter type, as well as polymorphic types and restrictions introduced by the @forall@ clause.
-Currently, the expression cost has the following components, ranked from highest to lowest.
-\begin{enumerate}
-\item \textbf{Unsafe} cost representing a narrowing conversion of arithmetic types, \eg @int@ to @short@, and qualifier-dropping conversions for pointer and reference types.
-\item \textbf{Polymorphic} cost for a function parameter type that is or contains a polymorphic type variable.
-\item \textbf{Safe} cost representing a widening conversion \eg @short@ to @int@, qualifier-adding conversions for pointer and reference types, and value conversion for enumeration constants.
-\item \textbf{Variable} cost that counts the number of polymorphic variables, if any, introduced by the @forall@ clause in the function declaration.
-\item \textbf{Specialization} cost counting the number of restrictions introduced by type assertions.
-\end{enumerate}
-The comparison of cost tuples is by lexicographical order, from unsafe (highest) to specialization (lowest), with ties broken by the next lower element.
-At a subexpression level, the lowest cost candidate for each result type is included as a possible interpretation of the expression;
-at the top level, all possible interpretations of different types are considered (generating a total ordering) and the overall lowest cost is selected as the final interpretation of the expression.
-Glen Ditchfield first proposed this costing model~\cite[\S~4.4.5]{Ditchfield92} to generate a resolution behaviour that is reasonable to C programmers based on existing conversions in the C programming language.
-This model carried over into the first implementation of the \CFA type-system by Richard Bilson~\cite[\S~2.2]{Bilson03}, and was extended but not redesigned by Aaron Moss~\cite[chap.~4]{Moss19}.
-Moss's work began to show problems with the underlying costing model;
-these design issues are part of this work.
-
-\begin{comment}
-From: Richard Bilson <rcbilson@gmail.com>
-Date: Tue, 10 Dec 2024 09:54:59 -0500
-Subject: Re: cost model
-To: "Peter A. Buhr" <pabuhr@fastmail.fm>
-
-I didn't invent it although I may have refined it somewhat. The idea of the
-conversion cost is from Glen's thesis, see for instance page 90
-
-On Tue, Dec 10, 2024 at 9:35AM Peter A. Buhr <pabuhr@fastmail.fm> wrote:
-> Cforall has a costing model based on an N-tuple using lexicographical ordering.
-> (unsafe, polymorphic, safe, variable, specialization)
->
-> Did you invent this costing model as a cheap and cheerful way to encode Glen's
-> type system?
-
-From: Richard Bilson <rcbilson@gmail.com>
-Date: Tue, 10 Dec 2024 17:04:26 -0500
-Subject: Re: cost model
-To: "Peter A. Buhr" <pabuhr@fastmail.fm>
-
-Yes, I think that's fair to say.
-
-On Tue, Dec 10, 2024 at 2:22PM Peter A. Buhr <pabuhr@fastmail.fm> wrote:
-> Found it thanks. But Glen never implemented anything (small examples). So it
-> feels like you took his conversion-cost idea and created an implementation table
-> with the lexicographical comparison.
-\end{comment}
-
-In many languages that support function/method overloading, such as \CC and Java, a partial-order system decides which interpretation of the expression gets selected.
-Hence, sometimes the candidates are incomparable (none are considered a better match), and only when one candidate is considered better than all others (maximal with respect to the partial order) is the expression unambiguous.
-Specifically, the resolution algorithms used in \CC and Java are greedy, selecting the best match for each sub-expression without considering the higher-level ones (bottom-up).
-Therefore, at each resolution step, the arguments are already given unique interpretations, so the ordering only needs to compare different sets of conversion targets (function parameter types) on the same set of input.
-\begin{c++}
-int f( int );       // 1
-double f( double ); // 2
-void g( double );
-g( f( 1 ) );  // bottom-up: argument f(1) resolves to 1 in isolation, result then converts to double
-\end{c++}
-
-In \CFA, trying to use such a system is problematic because of the presence of return-type overloading of functions and variables.
-Specifically, \CFA expression resolution considers multiple interpretations of argument subexpressions with different types, \eg:
-\begin{cfa}
-int x;   double x;  $\C{// variable overloading}$
-void g( double );
-g( x );  $\C{// two interpretations of x; double x is an exact match, int x needs a safe conversion}$
-\end{cfa}
-so it is possible that both the selected function and the set of arguments are different, and cannot be compared with a partial-ordering system.
-This situation arises often in \CFA, even in the simple expression @f(x)@, where both the function name @f@ and variable name @x@ are overloaded.
-
-Ada is another programming language that has overloading based on return type.
-Although Ada also allows implicit type conversions of function arguments, it is fairly conservative on resolving ambiguities.
-\begin{cfa}
-function F return Integer;
-function F return Float;
-X : Integer := F;  -- return-type overloading, resolved from the target type
-\end{cfa}
-There are only two preference rules in Ada overload resolution, one for primitive arithmetic operators and one for universal access types (analogous to @void *@ in C);
-any other cases where an expression has multiple legal interpretations are considered ambiguous.
-The current overload resolution system for \CFA is on the other end of the spectrum, as it tries to order every legal interpretation of an expression and chooses the best one according to cost, occasionally giving unexpected results.
-
-Interestingly, the \CFA cost-based model can sometimes make expression resolution too permissive because it always attempts to select the lowest cost option, and only when there are multiple options tied at the lowest cost does it report the expression is ambiguous.
-The reason is that, with so many elements in the cost tuple, the type system ``tries too hard to discover a match'', and therefore, ties are uncommon.
-Other than the case of multiple exact matches, where all have cost zero, incomparable candidates under a partial ordering of being more specific can often have different expression costs since different kinds of implicit conversions are involved, resulting in seemingly arbitrary overload selections.
-
-There are currently at least three different situations where the polymorphic cost element of the cost model does not yield a candidate selection that is clearly justifiable, and one of them is simply wrong.
-\begin{enumerate}[leftmargin=*]
-\item Polymorphic exact match versus non-polymorphic inexact match.
-\begin{cfa}
-forall( T ) void f( T );		$\C[2.5in]{// 1}$
-void f( long );					$\C{// 2}$
-f( 42 );						$\C{// currently selects 2}\CRT$
-\end{cfa}
-Under the current cost model, option 1 incurs a polymorphic cost from matching the argument type @int@ to type variable @T@, and option 2 incurs a safe cost from integer promotion of type @int@ to @long@.
-Since polymorphic cost is ranked above safe conversion cost, option 2 is considered to have lower cost and gets selected.
-
-In contrast, the template deduction and overload resolution rules in \CC select option 1 (converting @forall@ to the equivalent function template declaration).
-\CC performs template argument deduction and overload candidate ranking in two separate steps.
-\begin{itemize}
-\item
-In the first step, the type parameters are deduced for each primary function template, and if the corresponding template instantiation succeeds, the resulting function prototype is added to the resolution candidate set.
-\item
-In the second step, the implicit conversions (if any) applied to argument types are compared after taking away top-level qualifiers and references.
-It then prefers an exact match, followed by basic type promotions (roughly corresponds to safe conversion in \CFA), and then other kinds of conversions (roughly corresponds to unsafe conversion in \CFA).
-Only when the type conversions are the same does it prioritize a non-template candidate.
-\end{itemize}
-In this example, option 1 produces the prototype @void f( int )@, which gives an exact match and therefore takes priority.
-The \CC resolution rules effectively make option 2 a specialization that only applies to type @long@ exactly,\footnote{\CC does have explicit template specializations, however they do not participate directly in overload resolution and can sometimes lead to unintuitive results.} while the current \CFA rules make option 2 apply for all integral types below @long@.
-This difference could be explained as compensating for \CFA polymorphic functions being separately compiled versus template inlining;
-hence, calling them requires passing type information and assertions, increasing the runtime cost.
-\item
-Having a lower total polymorphic cost does not always mean a function is more specialized.
-The following example is from Moss's thesis, which discusses some improvements to the \CFA cost-model.
-He claims the following function prototypes are increasingly more constrained:
-\begin{cfa}
-forall( T, U ) void f( T, U );	$\C[2.75in]{// 1 polymorphic}$
-forall( T ) void f( T, T );		$\C{// 2 less polymorphic}$
-forall( T ) void f( T, int );	$\C{// 3 even less polymorphic}$
-forall( T ) void f( T *, int ); $\C{// 4 least polymorphic}\CRT$
-\end{cfa}
-This argument is not entirely correct.
-Although it is true that both the sequence 1, 2 and 1, 3, 4 are increasingly more constrained on the argument types, option 2 is not comparable to either of option 3 or 4;
-they actually describe independent constraints on the two arguments.
-Specifically, option 2 says the two arguments must have the same type, while option 3 states the second argument must have type @int@.
-Because the two constraints can be satisfied independently, neither should be considered a better match when trying to resolve a call to @f@ with argument types @(int, int)@;
-reporting such an expression as ambiguous is more appropriate.
-The example illustrates the limitation of using a numerical cost value as it cannot represent these complicated cases.
-\item
-A generic type can require more type variables to describe a more specific type.
-For example, a generic function taking a @pair@-type, requires two type variables.
-\begin{cfa}
-forall( T, U ) void f( pair( T, U ) ); $\C[2.75in]{// 1}$
-\end{cfa}
-Add a function taking any type.
-\begin{cfa}
-forall( T ) void f( T );		$\C{// 2}\CRT$
-\end{cfa}
-Passing a @pair@ variable to @f@ gives a cost of 1 poly, 2 variable for the @pair@ overload, versus a cost of 1 poly, 1 variable for the unconstrained overload.
-Programmer expectation is to select the former, but the cost model selects the latter;
-either could work.
-As a result, simply counting the number of polymorphic type variables no longer correctly orders the function candidates by how constrained they are.
-\end{enumerate}
-
-These inconsistencies are not easily solvable in the current cost-model, meaning the current \CFA codebase has to work around these defects.
-One potential solution is to mix the conversion cost and \CC-like partial ordering of specializations.
-For example, observe that the first three elements (unsafe, polymorphic and safe conversions) in the \CFA cost-tuple are related to the argument/parameter types, while the other two elements (polymorphic variable and assertion counts) are properties of the function declaration.
-Using this observation, it is reasonable to have an ordering that compares the argument/parameter conversion costs first and uses the partial ordering of specializations as a tiebreaker.
-Hence, the \CC template-specialization ordering can be applied to \CFA with a slight modification.
-
-In the meantime, other improvements have been made to the expression cost system.
-Most notably, the expression resolution algorithm now consistently uses the globally minimal-cost interpretation, as discussed in a previous technical report.
-While implementing this change, two detailed issues needed to be addressed for the new rules to work fully.
-
-The first deals with a common idiom in \CFA that creates many overloads with equal cost.
-Many C library functions have multiple versions for different argument types.
-For example, the absolute-value function is defined for basic arithmetic types with different names, since C does not support overloading.
-\begin{cquote}
-\begin{tabular}{@{}lll@{}}
-\begin{cfa}
-int abs( int );
-long labs( long );
-long long int llabs( long long int );
-\end{cfa}
-&
-\begin{cfa}
-double fabs( double );
-float fabsf( float );
-long double fabsl( long double );
-\end{cfa}
-&
-\begin{cfa}
-cabs, cabsf, $and$ cabsl
-$for$ _Complex
-
-\end{cfa}
-\end{tabular}
-\end{cquote}
-It is cumbersome for programmers to remember these function names and select the correct one.
-If an incorrect version is picked, the program compiles but with potential negative consequences such as using an integral version with a floating-point argument.
-In \CFA, these functions are wrapped by functions with the overloaded name @abs@, which results in multiple overloads with the same total cost when some conversion is needed.
-For example, @long x = abs( 42 )@ resolves to @long abs( long )@ with @int@ argument 42 converted to @long@ or @int abs( int )@ converting the result to @long@.
-In this example, the choice could be arbitrary because both yield identical results.
-However, for @int i = abs( -9223372036854775807LL )@, the result is @-1@; here, reporting some kind of type error is better than silently generating a logically incorrect result.
-There are multiple overload groupings of C functions into a single \CFA name, so usage should not report an ambiguity or warning unless there is a significant chance of error.
-
-While testing the effects of the updated cost rule, the following example was found in the \CFA standard library.
-\begin{cfa}
-static inline double __to_readyQ_avg( unsigned long long intsc ) {
-	if ( unlikely( 0 == intsc ) ) return 0.0; 
-	else return log2( @intsc@ ); // implicit conversion happens here
-}
-\end{cfa}
-This helper function is used for performance logging as part of computing a geometric mean;
-it is called during summing of logarithmic values.
-However, the function name @log2@ is overloaded in \CFA for both integer and floating point types.
-In this case, the integer overload returns an integral result, truncating any fractional part of the logarithm, so the sum is slightly incorrect.
-When experimenting with the updated cost rules, the resolver flagged this @log2@ call as an ambiguous expression.
-When asked, the developer expected the floating-point overload because of return-type overloading.
-This mistake went unnoticed because the truncated component was insignificant in the performance logging.
-Investigation of this example led to the decision that the resolution algorithm favors a lower conversion cost up the expression tree when the total global cost is equal.
-
-Another change addresses the issue that C arithmetic expressions have unique meanings governed by C's usual arithmetic conversion rules.
-\begin{enumerate}[leftmargin=*,topsep=5pt,itemsep=4pt]
-\item
-First, if the corresponding real type of either operand is @long double@, the other operand is converted, without change of type domain, to a type whose corresponding real type is @long double@.
-\item
-Otherwise, if the corresponding real type of either operand is @double@, the other operand is converted, without change of type domain, to a type whose corresponding real type is @double@.
-\item
-Otherwise, if the corresponding real type of either operand is @float@, the other operand is converted, without change of type domain, to a type whose corresponding real type is @float@.\footnote{
-For example, addition of a \lstinline{double _Complex} and a \lstinline{float} entails just the conversion of the \lstinline{float} operand to \lstinline{double} (and yields a \lstinline{double _Complex} result).}
-\item
-Otherwise, the integer promotions are performed on both operands.
-Then the following rules are applied to the promoted operands:
-\begin{enumerate}[topsep=5pt,itemsep=4pt]
-\item
-If both operands have the same type, then no further conversion is needed.
-\item
-Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
-\item
-\label{p:SignedToUnsignedSafe}
-Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
-\item
-\label{p:UnsignedToSignedUnsafe}
-Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
-\item
-\label{p:Miscellaneous}
-Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
-
-\hfill\cite[\S~6.3.1.8]{C11}
-\end{enumerate}
-\end{enumerate}
-\VRef[Figure]{f:CExpressionConversions} shows the C arithmetic conversions graphically.
-\VRef[Rule]{p:SignedToUnsignedSafe} converts the signed operand to the unsigned type when the unsigned type has greater or equal rank, while \VRef[rule]{p:UnsignedToSignedUnsafe} converts the unsigned operand to the signed type when the signed type can represent all of its values.
-Together, these two rules cover every possible mixed-sign case.
-\VRef[Rule]{p:Miscellaneous} is the catch-all to accommodate any kind of exotic integral representation.
-These conversions are applied greedily at local points within an expression.
-Because there is no overloading in C, except for builtin operators, no cost model is needed to differentiate among alternatives that could result in ambiguous matches.
-
-\begin{figure}
-\input{C_expression_conversion.pstex_t}
-\caption{C Expression Conversions: T1 operator T2}
-\label{f:CExpressionConversions}
-
-\smallskip
-\input{CFA_curr_arithmetic_conversion.pstex_t}
-\caption{\CFA Total-Ordering Expression Conversions}
-\label{f:CFACurrArithmeticConversions}
-
-\smallskip
-\input{CFA_arithmetic_conversion.pstex_t}
-\caption{\CFA Partial-Ordering Expression Conversions}
-\label{f:CFAArithmeticConversions}
-\end{figure}
-
-\VRef[Figure]{f:CFACurrArithmeticConversions} shows the current \CFA total-order arithmetic-conversions graphically.
-Here, the unsafe cost of signed to unsigned is factored into the ranking, so the safe conversion is selected over an unsafe one.
-Furthermore, an integral option is taken before considering a floating option.
-This model locally matches the C approach, but provides an ordering when there are many overloaded alternatives.
-However, as Moss pointed out, overload resolution by total cost has problems, \eg handling cast expressions.
-\begin{cquote}
-\ldots if a cast argument has an unambiguous interpretation as a conversion argument then it must be interpreted as such, even if the ascription interpretation would have a lower overall cost.
-This is demonstrated in the following example, adapted from the C standard library:
-\begin{cfa}
-unsigned long long x;
-(unsigned)( x >> 32 );
-\end{cfa}
-\vspace{5pt}
-In C semantics, this example is unambiguously upcasting 32 to @unsigned long long@, performing the shift, then downcasting the result to @unsigned@, at cost (1, 0, 3, 1, 0, 0, 0).
-If ascription were allowed to be a first-class interpretation of a cast expression, it would be cheaper to select the @unsigned@ interpretation of @?>>?@ by downcasting @x@ to @unsigned@ and upcasting 32 to @unsigned@, at a total cost of (1, 0, 1, 1, 0, 0, 0).
-\PAB{This example feels incorrect. Assume a word size of 4 bits (long long). Given the value 1001, shifting it >> 2 gives 10, which fits in a half word (int). Now truncate 1001 to a half word, 01, and shift it >> 2, giving 00. So the lower-cost alternative does not generate the correct result. Basically, truncation is an unsafe operation and hence has a huge negative cost.}
-However, this break from C semantics is not backwards compatible, so to maintain C compatibility, the \CFA resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to @unsigned long long@ in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.\cite[pp.~46-47]{Moss19}
-\end{cquote}
-However, a cast expression is not necessary to create such an inconsistency with C semantics.
-An implicit argument-parameter type conversion in a function call can replicate this issue without an explicit cast.
-\begin{cfa}
-unsigned long long x;
-void f( unsigned );
-f( x >> 32 );
-\end{cfa}
-The argument generation has the same effect as an explicit cast coercing the type of expression @x >> 32@ to @unsigned@.
-This example shows the problem comes not from cast expressions, but from modelling the C builtin operators as overloaded functions.
-As a result, a different rule is used to select among the builtin function candidates to fix this problem:
-if an expression has any legal interpretations as a C builtin operation, only the lowest-cost one is kept, regardless of the result type.
-
-\VRef[Figure]{f:CFAArithmeticConversions} shows an alternative \CFA partial-order arithmetic-conversions graphically.
-The idea here is to first look for the best integral alternative because integral calculations are exact and cheap.
-If no integral solution is found, then there are different rules to select among the floating-point alternatives.
-This approach reduces the search space by partitioning: first consider only integral operations, and then only floating-point operations.
-
-
-\section{Type Unification}
-
-Type unification is the algorithm that assigns a value to each (free) type parameter such that the types of the provided arguments and function parameters match.
-
-\CFA does not attempt to do any type \textit{inference} \see{\VRef{s:IntoTypeInferencing}}: it has no anonymous functions (\ie lambdas, commonly found in functional programming and also used in \CC and Java), and the variable types must all be explicitly defined (no auto typing).
-This restriction makes the unification problem more tractable in \CFA, as the argument types at each call site are usually all specified.
-There is a single exception: when the function return type contains a free type variable that does not occur in any of the argument types, the binding is determined by the parent expression. One such example is the \CFA wrapper for @malloc@, which infers the size argument from the deduced return type.
-\begin{cfa}
-forall (T*) T* malloc() {
-	return (T*) malloc (sizeof(T)); // calls C malloc with the size inferred from context
-}
-\end{cfa}
-A top-level expression whose type still contains an unbound type variable is considered ill-formed, as such an expression is inherently ambiguous.
-
-The unification algorithm in \CFA was originally presented in Richard Bilson's thesis and remains the basis of the algorithm currently in use.
-Some additions have been made to accommodate the type features added to the language since then.
-To summarize, \CFA type unification has two minor variants: an \textit{exact} mode and an \textit{inexact} mode.
-The inexact mode is applied at top-level argument-parameter matching, and attempts to find an assignment to the type variables such that the argument types can be converted to the parameter types with minimal cost, as defined in the previous section.
-The exact mode is required because the type-matching algorithm operates recursively and the inner types often have to match exactly; for example, there is no common type for the pointer types @int *@ and @long *@, while there is for @int@ and @long@.
-With the introduction of generic record types, their type parameters must also match exactly; currently, neither covariance nor contravariance is supported for generics.
-
-One simplification was made to the \CFA language that makes modelling the type system easier: polymorphic function pointer types are no longer allowed in declarations.
-The polymorphic function declarations themselves are still treated as function pointer types internally; however, the change means formal parameter types can no longer be polymorphic.
-Previously, it was possible to write function prototypes such as
-\begin{cfa}
-void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
-\end{cfa}
-Notably, the unification algorithm implemented in the \CFA compiler has never correctly traced the assertion parameters on such formal types, and the problem of determining whether two assertion sets are compatible may well be undecidable in general, given the ability to synthesize ever more complicated types by nesting generics.
-Ultimately, the reason for disallowing such constructs is that they mostly do not provide useful type features for actual programming tasks.
-A subroutine operates on the arguments provided at the call site together with any local and global variables, and even though the subroutine can be polymorphic, the types are fixed at each call site.
-On each invocation, the types operated on can be determined from the arguments provided, so there should be no need to pass a polymorphic function pointer, which can in principle accept any type.
-
-For example, consider a polymorphic function that takes one argument of type @T@ and another pointer to a subroutine
-\begin{cfa}
-forall (T) void f (T x, forall (U) void (*g) (U));
-\end{cfa}
-Making @g@ polymorphic in this context is almost certainly unnecessary, since it can only be called inside the body of @f@, where the argument types are already known, although they can potentially depend on @T@.
-Moreover, requiring a function parameter to work on any input type at all imposes too strong a constraint on the arguments, as it only needs to make each call inside the body of @f@ valid.
-
-Rewriting the prototype to
-\begin{cfa}
-forall (T) void f (T x, void (*g) (T));
-\end{cfa}
-will be sufficient (or potentially, some compound type synthesized from @T@), in which case @g@ is no longer a polymorphic type itself.
-This ``monomorphization'' conversion is readily supported in \CFA, either by explicitly assigning a polymorphic function name to a compatible function-pointer type, or implicitly when deducing assertion parameters (discussed in the next section).
-The technique applies directly to argument passing, which is essentially just assignment to function parameter variables.
-There might be edge cases where the supplied subroutine @g@ is called on arguments of different types inside the body of @f@, and so must be declared polymorphic, but such use cases are rare, and the benefit of allowing such constructs seems minimal in practice.
-
-The result of this change is that the unification algorithm no longer needs to distinguish open and closed type-variables, as the latter no longer exist.
-The only type variables that need to be handled are those introduced by the @forall@ clause of the function prototype.
-The subtype relationship between function types is also rendered redundant, since none of the function parameter or return types can be polymorphic, and no basic types or non-polymorphic function types are subtypes of any other type.
-Therefore, the goal of (exact) type unification simply becomes finding a substitution that produces identical types.
-The assertion set needing resolution is also always just the declarations on the function prototype, which slightly simplifies the assertion satisfaction algorithm, as discussed further in the next section.
-
-The type unification results are stored in a type-environment data structure, which represents all the type variables currently in scope as equivalence classes, together with their bound types and some extra information, such as whether the bound type is allowed to be opaque (\ie a forward declaration without a definition in scope) and whether the bindings are allowed to be widened.
-In the more general approach commonly used in functional languages, the unification variables are given a lower bound and an upper bound to account for covariance and contravariance of types.
-\CFA currently does not implement any variance with its generic types, and does not allow polymorphic function types, therefore no explicit upper bound is needed and one simple binding value for each equivalence class suffices.
-However, since type conversions are allowed in \CFA, the type environment needs to keep track of which type variables permit conversions.
-This behavior is notably different from \CC template argument deduction which enforces an exact match everywhere unless the template argument types are explicitly given.
-For example, a polymorphic maximum function in \CFA can be called with arguments of different arithmetic types, and the result follows the usual arithmetic conversion rules, while such an expression is not allowed by \CC:
-\begin{cfa}
-forall (T | {int ?<? (T, T); }) T max (T, T);
-
-max (42, 3.14); // OK, T=double; requires explicit type annotation in C++ such as max<double>(42, 3.14);
-\end{cfa}
-
-From a theoretical point of view, the simplified implementation of the type environment has its shortcomings.
-Some cases do not work nicely with this implementation, and some compromises have to be made.
-A more detailed discussion follows at the end of this chapter.
-
-
-\section{Satisfaction of Assertions}
-
-The assertion satisfaction problem greatly increases the complexity of \CFA expression resolution.
-Past experiments have shown that, for the expressions that take the longest to resolve, the majority of time is spent resolving assertions.
-Even though a few heuristic optimizations have been introduced into the compiler, this remains the most costly part of compiling a \CFA program.
-The major difficulty of resolving assertions is that the problem can become recursive, since the expression used to satisfy an outstanding assertion can have its own assertions, and in theory this can go on indefinitely.
-Detecting such infinite recursion in general is not algorithmically possible, and it is not attempted in the compiler.
-Instead, a fixed maximum depth of recursive assertions is imposed.
-This approach is also taken by \CC compilers, as template argument deduction is similarly undecidable in general.
-
-
-In previous versions of \CFA, this limit was set at 4; as the compiler has become more optimized and capable of handling complex expressions in a reasonable amount of time, I have increased the limit to 8, and on most occasions this does not lead to trouble.
-Very rarely, there is a case where the infinite recursion produces an exponentially growing assertion set, wasting minutes of time before the limit is reached.
-Fortunately, it is very hard to run into this situation with realistic \CFA code, and the cases that were found all share some clear characteristics, which can be prevented by some clever tricks.
-In fact, some of the performance optimizations came from analyzing these problematic cases.
-One such example is presented later in this section.
-
-While the assertion satisfaction problem in isolation looks like just another expression to resolve, its recursive nature makes some techniques applicable to assertion-free expression resolution no longer possible.
-The most significant impact is that type unification has a side effect, namely editing the type environment (equivalence classes and bindings), which means that if one expression has multiple associated assertions, they are not independent: the changes to the type environment must be compatible for all the assertions to be resolved.
-In particular, if one assertion parameter can be resolved in multiple different ways, all of the results need to be checked to make sure the changes to the type-variable bindings are compatible with the other assertions being resolved.
-A naive algorithm that simply picks any pending assertion and continues in a depth-first search can be very inefficient and especially prone to falling into an infinite loop, while in many cases this can be avoided by examining other assertions that provide insight on the desired type binding: if an assertion parameter can only be matched by a unique option, the type bindings can be updated confidently without the need for backtracking.
-
-
-The algorithm currently used in the \CFA compiler was designed by Aaron Moss through a simplified prototype experiment that captures most of the \CFA type-system features, and then ported back to the actual language.
-It can be described as a mix of breadth- and depth-first search in a staged approach.
-
-
-To resolve a set of assertions, the algorithm first attempts to resolve each assertion item individually.
-There are three possible outcomes on resolving each assertion:
-\begin{enumerate}
-\item If no matches are found, the algorithm terminates with a failure.
-\item If exactly one match is found, the type environment is updated immediately, and used in resolving any remaining assertions.
-\item If multiple matches are found, the assertion candidates with their updated type environments are stored in a list that will be checked for compatibility at the end.
-\end{enumerate}
-When all assertion items are resolved successfully, the algorithm attempts to combine the ambiguously resolved assertions to produce mutually compatible assignments.
-If any new assertions are introduced by the selected candidates, the algorithm is applied recursively, until none are pending resolution or the recursion limit is reached, which results in a failure.
-
-It has been discovered in practice that the efficiency of this algorithm can sometimes be very sensitive to the order in which assertions are resolved.
-Suppose an unbound type variable @T@ appears in two assertions: one can be uniquely resolved, allowing the type @T@ to be inferred immediately, while the other can be resolved in many ways, each binding @T@ to a different concrete type.
-If the first assertion is examined first, the deduced type can be utilized in resolving the second assertion, eliminating many incorrect options without producing a list of candidates pending further checks.
-In practice, this has a significant impact when an unbound type @T@ is declared to satisfy the basic \emph{object assertions}\footnote{The term is borrowed from object-oriented languages, although \CFA is not object-oriented in principle.} of having a default constructor, destructor, and copy-assignment operations.
-Since these are defined for every type currently in scope, there are often hundreds or even thousands of matches to these functions with an unspecified operand type; in most cases, the value of @T@ can be deduced by resolving another assertion first, which then allows the specific object-lifetime functions to be looked up, since they are sorted internally by operand type, greatly reducing the number of wasted resolution attempts.
-
-Currently, this issue also limits the capability of the assertion resolution algorithm.
-Assertion matching is implemented more restrictively than expression resolution in general, in that the parameter types must match exactly, rather than merely being callable.
-For example, if a function declaration includes an assertion @void f(T)@ and only @f(long)@ is currently in scope, trying to resolve the assertion with @T = int@ does not work.
-Loosening the assertion-matching requirement causes many assertion variables to have multiple matches and makes the delayed combination step too costly.
-
-Given all the issues caused by assertion resolution potentially creating new type-variable bindings, a natural progression is to restrict free type variables such that all type variables are bound by the time an expression reaches the assertion resolution stage.
-A type variable introduced by the @forall@ clause of a function declaration can appear in parameter types, return types, and assertion variables.
-If it appears in the parameter types, it is bound when matching the arguments to parameters at the call site.
-If it only appears in the return type, it can in principle be determined from the context.
-The current \CFA compiler does not do enough return-type deduction, as it performs eager assertion resolution, and the return type cannot be known in general before the parent expression is resolved, unless the expression is in an initialization context, in which the type of the variable being initialized is certainly known.
-By delaying assertion resolution until the return type becomes known, this problem can be circumvented.
-The truly problematic case occurs when a type variable does not appear in either the parameter or return types and only appears in some assertion variables.
-Such cases are very rare, and it is not evident that forcing every type variable to appear at least once in the parameter or return types limits the expressiveness of the \CFA type system to a significant extent.
-The next chapter discusses a proposal to include type declarations in traits, rather than having all type variables appear in the trait parameter list, which could provide functionality equivalent to an unbound type parameter in assertion variables, and also address some of the variable-cost issues discussed in Section~4.1.
-
-
-\subsection*{Caching Assertion Resolution Results}
-
-In Aaron Moss's prototype design and experiments, a potential optimization of caching the results of already-resolved assertions is discussed.
-Based on the experimental results, this approach can improve the performance of expression resolution in general, and sometimes allows hard instances of the assertion resolution problem to be solved that are otherwise infeasible, for example, when the resolution would encounter infinite loops.
-
-What makes this approach tricky to implement correctly is that the resolution algorithm has side effects, namely modifying the type bindings in the environment.
-Caching the results that modify type bindings requires storing the changes to the bindings as well; and in cases where multiple candidates can satisfy one assertion parameter, all of them need to be cached, including those not eventually selected, since the side effects can produce different results depending on the context.
-
-In the original design of \CFA, which included unrestricted polymorphic formal parameters that can have assertions on themselves, the problem is even more complicated, as new declarations can be introduced into scope during expression resolution.
-Here is one such example, taken from Bilson:
-\begin{cfa}
-void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
-forall( U, V | { U -?( U ); V -?( V ); } ) U g( U, V );
-f( g );
-\end{cfa}
-The inner assertion parameter on the \textit{closed} type variable @T@ is used to satisfy the assertions on @U@ and @V@ in this example.
-
-However, as per the previous discussion on this topic, polymorphic function pointer types have been removed from \CFA, since correctly implementing assertion matching for them is not possible in general.
-Without closed parameters (and therefore without a have-set for assertions), the set of declarations in scope remains unchanged while resolving any expression.
-The current \CFA implementation also does not attempt to widen any already-bound type parameters to satisfy an assertion.
-Note that this restriction means certain kinds of expressions cannot be resolved, for example:
-\begin{cfa}
-forall (T | {void f(T);}) void g(T);
-void f (long);
-g(42);
-\end{cfa}
-The call @g(42)@ is rejected, since no attempt is made to widen the parameter type @T@ from @int@ to @long@.
-This problem could be mitigated by allowing inexact matches of assertions, but it cannot be eliminated completely if @T@ is matched in a parameterized type, including pointers and references:
-\begin{cfa}
-forall (T | {void f(T*);}) void g(T);
-void f (long *);
-g(42);
-\end{cfa}
-Here, the only way to resolve the call @g(42)@ is to allow assertion resolution to widen the parameter @T@, since even with inexact matching, @int *@ cannot be converted to @long *@.
-
-
-\section{Compiler Implementation Considerations}
-\CFA is still an experimental language, and there is no formal specification of its expression resolution rules yet.
-Presently, there is also only one reference implementation, namely the cfa-cc translator, which is under active development; the specific behavior of the implementation can change frequently as new features are added.
-
-Ideally, the goal of expression resolution involving polymorphic functions would be to find the set of type-variable assignments such that the global conversion cost is minimal and all assertion variables are satisfied.
-Unfortunately, with the many complications involving implicit conversions and assertion variables, fully achieving this goal is not realistic, and the \CFA compiler is specifically not designed to accommodate every edge case.
-Instead, it makes a few restrictions to simplify the algorithm, so that most expressions encountered in actual code still pass type checking within a reasonable amount of time.
-
-As previously mentioned, \CFA polymorphic type resolution is based on a modified version of the unification algorithm, where both equivalence (exact) and subtyping (inexact) relations are considered.
-However, the implementation of the type environment is simplified: it only stores a tentative type binding, with a flag indicating whether \textit{widening} is possible for an equivalence class of type variables.
-Formally speaking, this means the type environment used in the \CFA compiler is only capable of representing \textit{lower bound} constraints.
-This simplification still works well most of the time, given the following properties of the existing \CFA type system and the resolution algorithms in use:
-
-\begin{enumerate}
-	\item Type resolution almost exclusively proceeds in bottom-up order, which naturally produces lower bound constraints. Since all identifiers can be overloaded in \CFA, not much definite information can be gained from top-down. In principle it would be possible to detect non-overloaded function names and perform top-down resolution for those; however, the prototype experiments have shown that such optimization does not give a meaningful performance benefit, and therefore it is not implemented. 
-	\item Few nontrivial subtyping relationships are present in \CFA, namely the arithmetic types presented in \VRef[Figure]{f:CFACurrArithmeticConversions}, and qualified pointer/reference types. In particular, \CFA lacks the nominal inheritance subtyping present in object-oriented languages, and the generic types do not support covariance on type parameters. As a result, named types such as structs are always matched by strict equivalence, including any type parameters should they exist.
-	\item Unlike in functional programming where subtyping between arrow types exists, \ie if $T_2 <: T_1$ and $U_1 <: U_2$ then $T_1 \rightarrow T_2 <: U_1 \rightarrow U_2$, \CFA uses C function pointer types and the parameter/return types must match exactly to be compatible.
-\end{enumerate}
-
-\CFA does attempt to incorporate type information propagated from upstream in the case of a variable declaration with an initializer, since the type of the variable being initialized is definitely known. The current type-environment representation is known to be flawed in handling such type deduction when the return type of the initializer expression is polymorphic, and an inefficient workaround has to be performed in certain cases. An annotated example is included in the \CFA compiler source code:
-
-\begin{cfa}
-// If resolution is unsuccessful with a target type, try again without, since it
-// will sometimes succeed when it wouldn't with a target type binding.
-// For example:
-forall( otype T ) T & ?[]( T *, ptrdiff_t );
-const char * x = "hello world";
-int ch = x[0];
-// Fails with simple return type binding (xxx -- check this!) as follows:
-// * T is bound to int
-// * (x: const char *) is unified with int *, which fails
-\end{cfa}
-
-The problem here is that the simplified type environment can only represent the constraints $T = int$ and $int <: T$, but since the type information flows in the opposite direction, the proper constraint for this case is $T <: int$, which cannot be represented. Currently, an attempt to resolve with the equality constraint generated from the initialized variable is still made, since it is often the correct type binding (especially when the initialized variable is a struct), and when that attempt fails, the resolution algorithm is rerun without the initialization context.
-
-One additional remark to make here is that \CFA does not provide a mechanism to explicitly specify values for polymorphic type parameters. In \CC for example, users may specify template arguments in angle brackets, which could be useful when automatic deduction fails, \eg @max<double>(42, 3.14)@. 
-There are some partial workarounds such as adding casts to the arguments, but they are not guaranteed to work in all cases. If a type parameter appears in the function return type, however, using the ascription (return) cast will force the desired type binding, since the cast only compiles when the expression type matches exactly with the target.
-
-\section{Related Work}
-
-
-
Index: doc/theses/fangren_yu_MMath/features.tex
===================================================================
--- doc/theses/fangren_yu_MMath/features.tex	(revision 0393fda8cd4ef8e0811b9035990f83c2271392e8)
+++ doc/theses/fangren_yu_MMath/features.tex	(revision 0393fda8cd4ef8e0811b9035990f83c2271392e8)
@@ -0,0 +1,852 @@
+\chapter{\CFA Features and Type System Interactions}
+\label{c:content1}
+
+This chapter discusses \CFA features introduced over time by multiple people and their interactions with the type system.
+
+
+\section{Reference Types}
+
+Reference types were added to \CFA by Robert Schluntz and Aaron Moss~\cite{Moss18}.
+The \CFA reference type generalizes the \CC reference type (and its equivalent in other modern programming languages) by providing both mutable and immutable forms and cascading referencing and dereferencing.
+Specifically, \CFA attempts to extend programmer intuition about pointers to references.
+That is, use a pointer when its primary purpose is manipulating the address of storage, \eg a top/head/tail pointer or link field in a mutable data structure.
+Here, manipulating the pointer address is the primary operation, while dereferencing the pointer to its value is the secondary operation.
+For example, \emph{within} a data structure, \eg stack or queue, all operations involve pointer addresses and the pointer may never be dereferenced because the referenced object is opaque.
+Alternatively, use a reference when its primary purpose is to alias a value, \eg a function parameter that does not copy the argument (performance reason).
+Here, manipulating the value is the primary operation, while changing the reference's address is the secondary operation.
+Succinctly, if the address changes often, use a pointer;
+if the value changes often, use a reference.
+Java has mutable references but no pointers.
+\CC has mutable pointers but immutable references;
+here, references match with functional programming.
+However, the consequence is asymmetric semantics between pointer and reference.
+\CFA adopts a uniform policy between pointers and references where mutability is a separate property made at the declaration.
+
+The following example shows how pointers and references are treated uniformly in \CFA.
+\begin{cfa}[numbers=left,numberblanklines=false]
+int x = 1, y = 2, z = 3;$\label{p:refexamples}$
+int * p1 = &x, ** p2 = &p1,  *** p3 = &p2,	$\C{// pointers to x}$
+	@&@ r1 = x,  @&&@ r2 = r1,   @&&&@ r3 = r2;	$\C{// references to x}$
+int * p4 = &z, & r4 = z;
+
+*p1 = 3; **p2 = 3; ***p3 = 3;				$\C{// different ways to change x to 3}$
+ r1 =  3;	r2 = 3;		r3 = 3;				$\C{// change x: implicit dereference *r1, **r2, ***r3}$
+**p3 = &y;	*p3 = &p4;						$\C{// change p1, p2}$
+// cancel implicit dereferences (&*)**r3, (&(&*)*)*r3, &(&*)r4
+@&@r3 = @&@y; @&&@r3 = @&&@r4;				$\C{// change r1, r2}$
+\end{cfa}
+Like pointers, references can be cascaded, \ie a reference to a reference, \eg @&& r2@.\footnote{
+\CC uses \lstinline{&&} for rvalue reference, a feature for move semantics and handling the \lstinline{const} Hell problem.}
+Usage of a reference variable automatically performs the same number of dereferences as the number of references in its declaration, \eg @r2@ becomes @**r2@.
+Finally, reassigning a reference's address requires a mechanism to stop the auto-dereferencing, which is accomplished by using the address-of operator @&@ to cancel the auto-dereferences, \eg @&r3 = &y@ resets @r3@'s address to point to @y@.
+\CFA's reference type (including multi-de/references) is powerful enough to describe the lvalue rules in C by types only.
+As a result, the \CFA type checker now works on just types without using the notion of lvalue in an expression.
+(\CFA internals still use lvalue for code generation purposes.)
+
+The current reference typing rules in \CFA are summarized as follows:
+\begin{enumerate}
+    \item For a variable $x$ with declared type $T$, the variable-expression $x$ has type reference to $T$, even if $T$ itself is a reference type.
+    \item For an expression $e$ with type $T\ \&_1...\&_n$, \ie $T$ followed by $n$ references, where $T$ is not a reference type, the expression $\&e$ (address of $e$) has type $T *$ followed by $n - 1$ references.
+    \item For an expression $e$ with type $T *\&_1...\&_n$, \ie $T *$ followed by $n$ references, the expression $*e$ (dereference $e$) has type $T$ followed by $n + 1$ references.
+	This rule is the reverse of the previous rule, such that address-of and dereference operators are perfect inverses.
+    \item When matching argument and parameter types at a function call, the number of references on the argument type is stripped off to match the number of references on the parameter type.\footnote{
+	\CFA handles the \lstinline{const} Hell problem by allowing rvalue expressions to be converted to reference values by implicitly creating a temporary variable, with some restrictions.
+	As well, there is a warning that the output nature of the reference is lost.
+	Hence, a single function handles \lstinline{const} and non-\lstinline{const} as constness is handled at the call site.}
+	In an assignment context, the left-hand-side operand-type is always reduced to a single reference.
+\end{enumerate}
+Under this ruleset, a type parameter is never bound to a reference type in a function-call context.
+\begin{cfa}
+forall( T ) void f( T & );
+int & x;
+f( x );  // implicit dereference
+\end{cfa}
+The call applies an implicit dereference once to @x@ so the call is typed @f( int & )@ with @T = int@, rather than with @T = int &@.
+
+As for a pointer type, a reference type may have qualifiers, where @const@ is most common.
+\begin{cfa}
+int x = 3; $\C{// mutable}$
+const int cx = 5; $\C{// immutable}$
+int * const cp = &x, $\C{// immutable pointer/reference}$
+	& const cr = cx;
+const int * const ccp = &cx, $\C{// immutable value and pointer/reference}$
+			& const ccr = cx;
+\end{cfa}
+\begin{cquote}
+\setlength{\tabcolsep}{26pt}
+\begin{tabular}{@{}lll@{}}
+pointer & reference & \\
+\begin{cfa}
+*cp = 7;
+cp = &x;
+*ccp = 7;
+ccp = &cx;
+\end{cfa}
+&
+\begin{cfa}
+cr = 7;
+cr = &x;
+*ccr = 7;
+ccr = &cx;
+\end{cfa}
+&
+\begin{cfa}
+// allowed
+// error, assignment of read-only variable
+// error, assignment of read-only location
+// error, assignment of read-only variable
+\end{cfa}
+\end{tabular}
+\end{cquote}
+Interestingly, C does not give a warning/error if a @const@ pointer is not initialized, while \CC does.
+Hence, type @& const@ is similar to a \CC reference, but \CFA does not preclude initialization with a non-variable address.
+For example, in systems programming, there are cases where an immutable address is initialized to a specific memory location.
+\begin{cfa}
+int & const mem_map = *0xe45bbc67@p@; $\C{// hardware mapped registers ('p' for pointer)}$
+\end{cfa}
+Finally, qualification is generalized across all pointer/reference declarations.
+\begin{cfa}
+const * const * const * const ccccp = ...
+const & const & const & const ccccr = ...
+\end{cfa}
+
+In the initial \CFA reference design, the goal was to make the reference type a \emph{real} data type \vs a restricted \CC reference, which is mostly used for choosing the argument-passing method, \ie by-value or by-reference.
+However, there is an inherent ambiguity for auto-dereferencing: every argument expression involving a reference variable can potentially mean passing the reference's value or address.
+For example, in
+\begin{cfa}
+int & x;
+forall( T ) void foo( T );
+forall( T ) void bar( T & );
+foo( x ); $\C{// means pass by value}$
+bar( x ); $\C{// means pass by reference}$
+\end{cfa}
+the call to @foo@ must pass @x@ by value, implying auto-dereference, while the call to @bar@ must pass @x@ by reference, implying no auto-dereference.
+Without any restrictions, this ambiguity limits the behaviour of reference types in \CFA polymorphic functions, where a type @T@ can bind to a reference or non-reference type.
+This ambiguity prevents the type system from treating reference types the same way as other types, even if type variables could be bound to reference types.
+The reason is that \CFA uses a common \emph{object trait}\label{p:objecttrait} (constructor, destructor and assignment operators) to handle passing dynamic concrete type arguments into polymorphic functions, and the reference types are handled differently in these contexts so they do not satisfy this common interface.
+
+Moreover, there is also some discrepancy in how reference types are treated in initialization and assignment expressions.
+For example, in line 3 of the example code on \VPageref{p:refexamples}:
+\begin{cfa}
+int @&@ r1 = x,  @&&@ r2 = r1,   @&&&@ r3 = r2;	$\C{// references to x}$
+\end{cfa}
+each initialization expression implicitly has its address taken to match the types, \eg @x@ becomes @&x@, because an address is always required and a variable normally returns its value;
+\CC performs the same implicit operation when initializing its reference variables.
+For lines 6 and 9 of the previous example code:
+\begin{cfa}
+ r1 =  3;	r2 = 3;   r3 = 3;				$\C{// change x: implicit dereference *r1, **r2, ***r3}$
+@&@r3 = @&@y; @&&@r3 = @&&@r4;				$\C{// change r1, r2}$
+\end{cfa}
+there are no actual assignment operators defined for reference types that can be overloaded;
+instead, all reference assignments are handled by semantic actions in the type system.
+In fact, the reassignment of reference variables is set up internally to use the assignment operators for pointer types.
+Finally, there is an annoying issue (although purely syntactic) for setting a mutable reference to a specific address like null, @int & r1 = *0p@, which looks like dereferencing a null pointer.
+Here, the expression is rewritten as @int & r1 = &(*0p)@, like the variable dereference of @x@ above.
+However, the implicit @&@ needs to be cancelled for an address, which is done with the @*@, \ie @&*@ cancel each other, giving @0p@.
+Therefore, the dereferencing operation does not actually happen and the expression is translated into directly initializing the reference variable with the address.
+Note, the same explicit dereference is used in \CC to set a reference variable to null.
+\begin{c++}
+int & ip = @*@(int *)nullptr;
+\end{c++}
+which is used in certain systems-programming situations.
+
+When generic types were introduced to \CFA~\cite{Moss19}, some thought was given to allowing reference types as type arguments.
+\begin{cfa}
+forall( T ) struct vector { T t; }; $\C{// generic type}$
+vector( int @&@ ) vec; $\C{// vector of references to ints}$
+\end{cfa}
+While it is possible to write a reference type as the argument to a generic type, it is disallowed by assertion checking if the generic type requires the object trait \see{\VPageref{p:objecttrait}} for the type argument, a fairly common use case.
+Even if the object trait can be made optional, the current type system often misbehaves by applying undesirable auto-dereferencing to the referred-to value rather than the reference variable itself, as intended.
+Some tweaks are necessary to accommodate reference types in polymorphic contexts, and it is unclear what can or cannot be achieved.
+Currently, there are contexts where the \CFA programmer is forced to use a pointer type, giving up the benefits of auto-dereference operations and better syntax with reference types.
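+For example, a minimal sketch of the pointer workaround, reusing the illustrative generic type @vector@ from above:
+\begin{cfa}
+forall( T ) struct vector { T t; }; $\C{// generic type from above}$
+vector( int @&@ ) vr; $\C{// rejected: reference type fails the object-trait assertions}$
+vector( int @*@ ) vp; $\C{// workaround: pointer type}$
+int x = 3;
+vp.t = &x; $\C{// explicit address-of at each use}$
+*vp.t = 4; $\C{// explicit dereference at each use}$
+\end{cfa}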
+
+
+\section{Tuple Types}
+
+The addition of tuples to \CFA can be traced back to the original design by David Till in \mbox{K-W C}~\cite{Till89,Buhr94a}, a predecessor project of \CFA.
+The primary purpose of tuples is to eliminate output parameters or the need to create an aggregate type in order to return multiple values from a function, called a multiple-value-returning (MVR) function.
+Traditionally, returning multiple values is accomplished via (in/)output parameters or packing the results in a structure.
+The following examples show these two techniques for a function returning three values.
+\begin{cquote}
+\begin{tabular}{@{}l@{\hspace{20pt}}l@{}}
+\begin{cfa}
+int foo( int &p1, int &p2 );  // in/out parameters
+int x, y = 3, z = 4;
+x = foo( y, z );  // return 3 values: 1 out, 2 in/out
+\end{cfa}
+&
+\begin{cfa}
+struct Ret { int x, y, z; } ret;
+Ret foo( int p1, int p2 );  // return structure
+ret = foo( 3, 4 );  // return 3 values: 3 out
+\end{cfa}
+\end{tabular}
+\end{cquote}
+Like Go, K-W C allows direct return of multiple values into a tuple.
+\begin{cfa}
+@[int, int, int]@ foo( int p1, int p2 );
+@[x, y, z]@ = foo( 3, 4 );  // return 3 values into a tuple
+\end{cfa}
+Along with making returning multiple values a first-class feature, tuples were extended to simplify a number of other common contexts that normally require multiple statements and/or additional declarations, all of which reduces coding time and errors.
+\begin{cfa}
+[x, y, z] = 3; $\C[2in]{// x = 3; y = 3; z = 3, where types may be different}$
+[x, y] = [y, x]; $\C{// int tmp = x; x = y; y = tmp;}$
+void bar( int, int, int );
+bar( foo( 3, 4 ) ); $\C{// int t0, t1, t2; [t0, t1, t2] = foo( 3, 4 ); bar( t0, t1, t2 );}$
+x = foo( 3, 4 )@.1@; $\C{//  int t0, t1, t2; [t0, t1, t2] = foo( 3, 4 ); x = t1;}\CRT$
+\end{cfa}
+For the call to @bar@, the three results (tuple value) from @foo@ are \newterm{flattened} into individual arguments.
+Flattening is how tuples interact with parameter and subscript lists, and with other tuples, \eg:
+\begin{cfa}
+[ [ x, y ], z, [a, b, c] ] = [2, [3, 4], foo( 3, 4) ]  $\C{// structured}$
+[ x, y, z, a, b, c] = [2, 3, 4, foo( 3, 4) ]  $\C{// flattened, where foo results are t0, t1, t2}$
+\end{cfa}
+Note, in most cases, a tuple is just compile-time syntactic sugar for a number of individual assignment statements and possibly temporary variables.
+Only when returning a tuple from a function is there the notion of a tuple value.
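+For example, a sketch of the compile-time expansion for the tuple swap above (the temporary names are illustrative):
+\begin{cfa}
+int x = 3, y = 4;
+[ x, y ] = [ y, x ]; $\C{// tuple swap}$
+// expands to (sketch):
+int t0 = y, t1 = x; $\C{// evaluate right-hand side into temporaries}$
+x = t0; y = t1; $\C{// assign componentwise, avoiding aliasing problems}$
+\end{cfa}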
+
+Overloading in the \CFA type-system must support complex composition of tuples and C type conversions using a costing scheme giving lower cost to widening conversions that do not truncate a value.
+\begin{cfa}
+[ int, int ] foo$\(_1\)$( int );			$\C{// overloaded foo functions}$
+[ double ] foo$\(_2\)$( int );
+void bar( int, double, double );
+bar( @foo@( 3 ), @foo@( 3 ) );
+\end{cfa}
+The type resolver only has the tuple return types to resolve the call to @bar@ as the @foo@ parameters are identical.
+The resolution involves unifying the flattened @foo@ return values with @bar@'s parameter list.
+However, no combination of @foo@s is an exact match with @bar@'s parameters;
+thus, the resolver applies C conversions to obtain a best match.
+The resulting minimal cost expression is @bar( foo@$_1$@( 3 ), foo@$_2$@( 3 ) )@, where the two possible conversions are (@int@, {\color{red}@int@}, @double@) to (@int@, {\color{red}@double@}, @double@) with a safe (widening) conversion from @int@ to @double@ versus ({\color{red}@double@}, {\color{red}@int@}, {\color{red}@int@}) to ({\color{red}@int@}, {\color{red}@double@}, {\color{red}@double@}) with one unsafe (narrowing) conversion from @double@ to @int@ and two safe conversions from @int@ to @double@.
+Go provides a simplified mechanism where only one tuple returning function call is allowed and there are no implicit type conversions.
+\begin{lstlisting}[language=Go]
+func foo( int ) ( int, int, int ) { return 3, 7, 8 }
+func bar( int, int, int ) { ... } // types must match
+bar( foo( 3 ) ) // only one tuple returning call 
+\end{lstlisting}
+Hence, programmers cannot take advantage of the full power of tuples, but type matching is straightforward.
+
+K-W C also supported tuple variables, but with a strong distinction between tuples and tuple values/variables.
+\begin{quote}
+Note that tuple variables are not themselves tuples.
+Tuple variables reference contiguous areas of storage, in which tuple values are stored;
+tuple variables and tuple values are entities which appear during program execution.
+Tuples, on the other hand, are compile-time constructs;
+they are lists of expressions, whose values may not be stored contiguously.~\cite[p.~30]{Till89}
+\end{quote}
+Fundamentally, a tuple value/variable is just a structure (contiguous areas) with an alternate (tuple) interface.
+A tuple value/variable is assignable (like structures), its fields can be accessed using position rather than name qualification, and it can interact with regular tuples.
+\begin{cfa}
+[ int, int, int ] t1, t2;
+t1 = t2;  			$\C{// tuple assignment}$
+t1@.1@ = t2@.0@;  	$\C{// position qualification}$
+int x, y, z;
+t1 = [ x, y, z ];  	$\C{// interact with regular tuples}$
+[ x, y, z ] = t1;
+bar( t2 );			$\C{// bar defined above}$
+\end{cfa}
+\VRef[Figure]{f:Nesting} shows the difference in nesting of structures and tuples.
+The left \CC nested-structure is named, so it is not flattened.
+The middle C/\CC nested-structure is unnamed and flattened, causing an error because @i@ and @j@ are duplicate names.
+The right \CFA nested tuple cannot be named and is flattened.
+C allows named nested-structures, but they have issues \see{\VRef{s:inlineSubstructure}}.
+Note, it is common in C to have an unnamed @union@ so its fields do not require qualification.
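+For example, a small sketch of the unnamed-@union@ idiom (the @Node@ type is illustrative):
+\begin{cfa}
+struct Node {
+	int tag; $\C{// discriminates the union}$
+	union { $\C{// unnamed, so fields need no qualification}$
+		int ival;
+		double dval;
+	};
+};
+struct Node n;
+n.tag = 0; n.ival = 3; $\C{// n.ival, not n.u.ival}$
+\end{cfa}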
+
+\begin{figure}
+\setlength{\tabcolsep}{20pt}
+\begin{tabular}{@{}ll@{\hspace{90pt}}l@{}}
+\multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{C/\CC} & \multicolumn{1}{c}{tuple} \\
+\begin{cfa}
+struct S {
+	struct @T@ { // not flattened
+		int i, j;
+	};
+	int i, j;
+};
+\end{cfa}
+&
+\begin{cfa}
+struct S2 {
+	struct ${\color{red}/* unnamed */}$ { // flatten
+		int i, j;
+	};
+	int i, j;
+};
+\end{cfa}
+&
+\begin{cfa}
+[
+	[ // flatten
+		1, 2
+	]
+	1, 2
+]
+\end{cfa}
+\end{tabular}
+\caption{Nesting}
+\label{f:Nesting}
+\end{figure}
+
+The primary issues for tuples in the \CFA type system are polymorphism and conversions.
+Specifically, does it make sense to have a generic (polymorphic) tuple type, as is possible for a structure?
+\begin{cfa}
+forall( T, S ) [ T, S ] GT; // polymorphic tuple type
+GT( int, char ) @gt@;
+GT( int, double ) @gt@;
+@gt@ = [ 3, 'a' ];  // select correct gt
+@gt@ = [ 3, 3.5 ];
+\end{cfa}
+and what is the cost model for C conversions across multiple values?
+\begin{cfa}
+gt = [ 'a', 3L ];  // select correct gt
+\end{cfa}
+
+
+\section{Tuple Implementation}
+
+As noted, traditional languages manipulate multiple values by in/out parameters and/or structures.
+K-W C adopted the structure for tuple values or variables, and as needed, the fields are extracted by field access operations.
+As well, for the tuple-assignment implementation, the left-hand tuple expression is expanded into assignments of each component, creating temporary variables to avoid unexpected side effects.
+For example, the tuple value returned from @foo@ is a structure, and its fields are individually assigned to a left-hand tuple, @x@, @y@, @z@, \emph{or} copied directly into a corresponding tuple variable.
+
+In the second implementation of \CFA tuples by Rodolfo Gabriel Esteves~\cite{Esteves04}, a different strategy is taken to handle MVR functions.
+The return values are converted to output parameters passed in by pointers.
+When the return values of an MVR function are directly used in an assignment expression, the addresses of the left-hand operands can be directly passed into the function;
+composition of MVR functions is handled by creating temporaries for the returns.
+For example, given a function returning two values:
+\begin{cfa}
+[int, int] gives_two() { int r1, r2; ... return [ r1, r2 ]; }
+int x, y;
+[x, y] = gives_two();
+\end{cfa}
+\VRef[Figure]{f:AlternateTupleImplementation} shows the two implementation approaches.
+In the left approach, the return statement is rewritten to pack the return values into a structure, which is returned by value, and the structure fields are individually assigned to the left-hand side of the assignment.
+In the right approach, the return statement is rewritten as direct assignments into the passed-in argument addresses.
+The upside of the right implementation is consistency and no copying.
+The downside is indirection within @gives_two@ to access values, unless values get hoisted into registers for some period of time, which is common.
+
+\begin{figure}
+\begin{cquote}
+\setlength{\tabcolsep}{20pt}
+\begin{tabular}{@{}ll@{}}
+Till K-W C implementation & Esteves \CFA implementation \\
+\begin{cfa}
+struct _tuple2 { int _0; int _1; }
+struct _tuple2 gives_two() {
+	... struct _tuple2 ret = { r1, r2 };
+	return ret;
+}
+int x, y;
+struct _tuple2 _tmp = gives_two();
+x = _tmp._0; y = _tmp._1;
+\end{cfa}
+&
+\begin{cfa}
+
+void gives_two( int * r1, int * r2 ) {
+	... *r1 = ...; *r2 = ...;
+	return;
+}
+int x, y;
+
+gives_two( &x, &y );
+\end{cfa}
+\end{tabular}
+\end{cquote}
+\caption{Alternate Tuple Implementation}
+\label{f:AlternateTupleImplementation}
+\end{figure}
+
+Interestingly, in the third implementation of \CFA tuples by Robert Schluntz~\cite[\S~3]{Schluntz17}, the MVR functions reverted back to being structure-based, which remains the approach in the current version of \CFA.
+The reason for the reversion is a uniform approach for tuple values/variables, making tuples first-class types in \CFA, \ie allowing tuples with corresponding tuple variables.
+This reversion was possible, because in parallel with Schluntz's work, generic types were added independently by Moss~\cite{Moss19}, and the tuple variables leveraged the same implementation techniques as for generic variables~\cite[\S~3.7]{Schluntz17}.
+For example, these two tuples:
+\begin{cfa}
+[double, double] x;
+[int, double, int] y;
+\end{cfa}
+are transformed internally into two generic structures:
+\begin{cfa}
+forall( T0 &, T1 & | sized( T0 ) | sized( T1 ) )
+struct _tuple2_ {
+	T0 field_0 ;   T1 field_1 ;
+};
+forall( T0 &, T1 &, T2 & | sized( T0 ) | sized( T1 ) | sized( T2 ) )
+struct _tuple3_ {
+	T0 field_0 ;   T1 field_1 ;   T2 field_2 ;
+};
+\end{cfa}
+and the declarations become instances of these generic structure types:
+\begin{cfa}
+_tuple2_( double, double ) x;
+_tuple3_( int, double, int ) y;
+\end{cfa}
+Now types @_tuple2_@ and @_tuple3_@ are available for any further 2- or 3-element tuple types in the translation unit, simplifying internal code transformations by memoizing a small set of tuple structures.
+Ultimately, these generic types are lowered to specific C structures during code generation.
+Scala, like \CC, provides tuple types through a library using this structural expansion, \eg Scala provides tuple sizes 1 through 22 via hand-coded generic data-structures.
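+For example, the instantiation @_tuple2_( double, double )@ might be lowered to a plain C structure along the following lines (a sketch; the actual generated names differ):
+\begin{cfa}
+struct _tuple2_double_double_ { $\C{// specialized C structure}$
+	double field_0; double field_1;
+};
+struct _tuple2_double_double_ x; $\C{// [double, double] x}$
+x.field_0 = 3.5; x.field_1 = 4.5; $\C{// x.0 and x.1}$
+\end{cfa}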
+
+However, after experience gained building the \CFA runtime system, making tuple-types first-class seems to add little benefit.
+The main reason is that tuple usage is largely unstructured,
+\begin{cfa}
+[int, int] foo( int, int ); $\C[2in]{// unstructured function}$
+typedef [int, int] Pair; $\C{// tuple type}$
+Pair bar( Pair ); $\C{// structured function}$
+int x = 3, y = 4;
+[x, y] = foo( x, y ); $\C{// unstructured call}$
+Pair ret = [3, 4]; $\C{// tuple variable}$
+ret = bar( ret ); $\C{// structured call}\CRT$
+\end{cfa}
+Basically, creating the tuple-type @Pair@ is largely equivalent to creating a @struct@ type, and creating more types and names defeats the simplicity that tuples are trying to achieve.
+Furthermore, since operator overloading in \CFA is implemented by treating operators as overloadable functions, tuple types are very rarely used in a structured way.
+When a tuple-type expression appears in a function call (except assignment expressions, which are handled differently by mass- or multiple-assignment expansions), it is always flattened, and the tuple structure of a function parameter is not considered part of the function signature.
+For example, these two prototypes for @f@:
+\begin{cfa}
+void f( int, int );
+void f( @[@ int, int @]@ );
+f( 3, 4 );  // ambiguous call
+\end{cfa}
+have the same signature (a function taking two @int@s and returning nothing), and are therefore invalid overloads.
+Note, the ambiguity error occurs at the call rather than at the second declaration of @f@, because it is possible to have multiple equivalent prototype definitions of a function.
+Furthermore, ordinary polymorphic type-parameters are not allowed to have tuple types.
+\begin{cfa}
+forall( T ) T foo( T );
+int x, y, z;
+[x, y, z] = foo( [x, y, z] );  // error: substitutes tuple type for T
+\end{cfa}
+Without this restriction, the expression resolution algorithm can create too many argument-parameter matching options.
+For example, with multiple type parameters,
+\begin{cfa}
+forall( T, U ) void f( T, U );
+f( [1, 2, 3, 4] );
+\end{cfa}
+the call to @f@ can be interpreted as @T = [1]@ and @U = [2, 3, 4]@, or @T = [1, 2]@ and @U = [3, 4]@, and so on.
+The restriction ensures type checking remains tractable and does not take too long to compute.
+Therefore, tuple types are never present in any fixed-argument function calls, because of the flattening.
+
+\begin{comment}
+Date: Mon, 13 Jan 2025 10:09:06 -0500
+Subject: Re: structure / tuple
+To: "Peter A. Buhr" <pabuhr@uwaterloo.ca>
+CC: Andrew Beach <ajbeach@uwaterloo.ca>,
+        Michael Brooks <mlbrooks@uwaterloo.ca>,
+        Fangren Yu <f37yu@uwaterloo.ca>, Jiada Liang <j82liang@uwaterloo.ca>,
+        Alvin Zhang <alvin.zhang@uwaterloo.ca>,
+        Kyoung Seo <lseo@plg.uwaterloo.ca>
+From: Gregor Richards <gregor.richards@uwaterloo.ca>
+
+Languages support tuples to abbreviate syntax where the meaning of several
+values is obvious from context, such as returns from functions, or where the
+effort of creating a dedicated type is not worth the reward of using that type
+in exactly one location. The positions always have meanings which could be
+given names, and are only not given names for brevity. Whether that brevity is
+a good idea or not is the programmer's problem to deal with. I don't think
+there's any pragmatic value to tuples beyond brevity. (From a theoretical
+perspective, having the empty tuple is useful for type-theoretical reasons, and
+tuples are usually easier to reason about than structures, but that only
+applies to theoretical reasoning, not to actual programming.)
+
+Your distinction unstructured tuples could just as well be made for structs as
+well, if you had named arguments (or named returns?).  Personally, I think that
+having these be a syntactic distinction is a mistake. Other languages return
+fully codified tuples, and if you immediately destructure them, even the most
+naive optimizer will manage to never create an actual tuple in memory. In my
+opinion, since tuples are for brevity, they should always be declared with your
+"unstructured" syntax, and it's up to the optimizer to realize when you've
+never stored them. But, you live closer to the metal in CFA than most
+languages, so perhaps communicating that intent is of sufficient value.
+
+The only value of tuples beyond that is to make it possible for annoying
+students to use std::pair in place of ever creating their own class hierarchy
+or naming things. Then again, I hear that that is one of the hard problems in
+computer science.
+
+With valediction,
+  - Gregor Richards
+
+On 1/13/25 09:11, Peter A. Buhr wrote:
+> The CFA team has been discussing the difference between a structure and
+> tuple.  Basically, a structure has named fields and a tuple has anonymous
+> fields. As a result, structure access uses field names and tuple access uses
+> position.
+>
+>    struct S { int i, j, k ; };
+>    S s;
+>    s.i; s.j; // field access
+>
+>    tuple T { int, int };
+>    T t;
+>    t.0; t.1; // position access, zero origin
+>    t[0]; t[1]; // alternate access
+>
+> Hence the difference is small.
+>
+> In CFA, we differentiate between unstructured and structured tuples. An
+> unstructured tuple is a lexical grouping of potentially disjoint variables.
+>
+>    [ int, int, int ] f();
+>    void g( int, int, int );
+>    x, y, z = f(); // Go unstructured tuple, flatten tuple
+>    g( foo() ); // flatten tuple
+>
+> Here, the tuple returned from f is flattened into disjoint variables.  A
+> structured tuple is like above and has contiguous memory.
+>
+> CFA has fancy unstructured stuff like
+>
+>    s.[i,k] += 1; // add 1 to each field
+>    t.[1,0] = 1; // don't think this works but could
+>
+> which is just an unstructured tuple access (sugar).
+>
+> What is your opinion of structures and tuples since the difference is
+> small. Why do many languages support both features? Are we missing some
+> important aspect of tuples that differentiates them from structures?  Is CFA
+> unique in having both unstructured and structured tuples?
+\end{comment}
+
+Finally, a type-safe variadic argument signature was added by Robert Schluntz~\cite[\S~4.1.2]{Schluntz17} using @forall@ and a new tuple parameter-type, denoted by the keyword @ttype@ in Schluntz's implementation, but later changed to an ellipsis syntax similar to the \CC template parameter-pack.
+For C variadics, \eg @va_list@, the number and types of the arguments must be conveyed in some way, \eg @printf@ uses a format string indicating the number and types of the arguments.
+\begin{cfa}
+int printf( const char * format, ${\color{red}\LARGE ...}$ );  // variadic list of variables to print
+\end{cfa}
+\VRef[Figure]{f:CVariadicMaxFunction} shows an $N$ argument @maxd@ function using the C untyped @va_list@ interface.
+In the example, the first argument is the number of following arguments, and the following arguments are assumed to be @double@;
+looping is used to traverse the argument pack from left to right.
+The @va_list@ interface walks up the stack (by address) examining the arguments pushed by the caller.
+(Magic knowledge is needed for arguments passed in registers.)
+
+\begin{figure}
+\begin{cfa}
+double maxd( int @count@, @...@ ) { // ellipsis parameter
+	double max = 0;
+	va_list args;
+	va_start( args, count );
+	for ( int i = 0; i < count; i += 1 ) {
+		double num = va_arg( args, double );
+		if ( num > max ) max = num;
+	}
+	va_end(args);
+	return max;
+}
+printf( "%g\n", maxd( @4@, 25.0, 27.3, 26.9, 25.7 ) );
+\end{cfa}
+\caption{C Variadic Maximum Function}
+\label{f:CVariadicMaxFunction}
+\end{figure}
+
+There are two common patterns for using variadic functions in \CFA.
+\begin{enumerate}[leftmargin=*]
+\item
+Argument forwarding to another function, \eg:
+\begin{cfa}
+forall( T *, TT ... | { @void ?{}( T &, TT );@ } ) // constructor assertion
+T * new( TT tp ) { return ((T *)malloc())@{@ tp @}@; }  // call constructor on storage
+\end{cfa}
+Note, the assertion on @T@ requires it to have a constructor @?{}@.
+The function @new@ calls @malloc@ to obtain storage and then invokes @T@'s constructor passing the tuple pack by flattening it over the constructor's arguments, \eg:
+\begin{cfa}
+struct S { int i, j; };
+void ?{}( S & s, int i, int j ) { s.[ i, j ] = [ i, j ]; }  // constructor
+S * sp = new( 3, 4 );  // flatten [3, 4] into call ?{}( 3, 4 );
+\end{cfa}
+\item
+Structural recursion for processing the argument-pack values one at a time, \eg:
+\begin{cfa}
+forall( T | { int ?<?( T, T ); } )
+T max( T v1, T v2 ) { return v1 < v2 ? v2 : v1; }
+$\vspace{-10pt}$
+forall( T, TT ... | { T max( T, T ); T max( TT ); } )
+T max( T arg, TT args ) { return max( arg, max( args ) ); }
+\end{cfa}
+The first non-recursive @max@ function is the polymorphic base-case for the recursion, \ie, find the maximum of two identically typed values with a less-than (@<@) operator.
+The second recursive @max@ function takes two parameters, @T@ and the @TT@ tuple pack, handling all argument lengths greater than two.
+The recursive function computes the maximum for the first argument and the maximum value of the rest of the tuple pack.
+The call of @max@ with one argument is the recursive call, where the tuple pack is converted into two arguments by taking the first value (lisp @car@) from the tuple pack as the first argument (flattening) and the remaining pack becomes the second argument (lisp @cdr@).
+The recursion stops when the argument pack is empty.
+For example, @max( 2, 3, 4 )@ matches the recursive function, which performs @return max( 2, max( [3, 4] ) )@; one more step yields @return max( 2, max( 3, 4 ) )@, at which point the tuple pack is empty and the base case applies.
+\end{enumerate}
+
+As an aside, polymorphic functions are precise with respect to their parameter types, \eg @max@ states all argument values must be the same type, which logically makes sense.
+However, this precision precludes normal C conversions among the base types, \eg the mixed-mode call @max( 2h, 2l, 3.0f, 3.0ld )@ fails because the types are not the same.
+Unfortunately, this failure violates programmer intuition because there are specialized two-argument non-polymorphic versions of @max@ that work, \eg @max( 3, 3.5 )@.
+Allowing C conversions for polymorphic types will require a significant change to the type resolver.
+
+Currently in \CFA, variadic polymorphic functions are the only place tuple types are used.
+Because \CFA compiles polymorphic functions rather than expanding templates, many wrapper functions are generated to implement both user-defined generic types and variadic polymorphism.
+Fortunately, the only permitted operations on polymorphic function parameters are given by the list of assertion (trait) functions.
+Nevertheless, this small set of functions eventually needs to be called with flattened tuple arguments.
+Unfortunately, packing the variadic arguments into a rigid @struct@ type and generating all the required wrapper functions is significant work and largely wasted because most are never called.
+Interested readers can refer to pages 77--80 of Robert Schluntz's thesis~\cite{Schluntz17} to see how verbose the translator output is to implement a simple variadic call with 3 arguments.
+As the number of arguments increases, \eg a call with 5 arguments, the translator generates concrete @struct@ types for a 4-tuple and a 3-tuple, along with all the polymorphic type data for them.
+An alternative approach is to put the variadic arguments into an array, along with an offset array to retrieve each individual argument.
+This method is similar to how the C @va_list@ object is used (and how \CFA accesses polymorphic fields in a generic type), but the \CFA variadics generate the required type information to guarantee type safety (like the @printf@ format string).
+For example, given the following heterogeneous, variadic, typed @print@ and usage:
+\begin{cquote}
+\begin{tabular}{@{}ll@{}}
+\begin{cfa}
+forall( T, TT ... | { void print( T ); void print( TT ); } )
+void print( T arg , TT rest ) {
+	print( arg );
+	print( rest );
+}
+\end{cfa}
+&
+\begin{cfa}
+void print( int i ) { printf( "%d ", i ); }
+void print( double d ) { printf( "%g ", d ); }
+... // other types
+int i = 3 ; double d = 3.5;
+print( i, d );
+\end{cfa}
+\end{tabular}
+\end{cquote}
+it would look like the following using the offset-array implementation approach.
+\begin{cfa}
+void print( T arg, char * _data_rest, size_t * _offset_rest ) {
+	print( arg );
+	print( *( typeof( rest.0 ) * )_data_rest,  $\C{// first element of rest}$
+		  _data_rest + _offset_rest[0],  $\C{// remainder of data}$
+		  _offset_rest + 1);  $\C{// remainder of offset array}$
+}
+\end{cfa}
+where the fixed-arg polymorphism for @T@ can be handled by the standard @void *@-based \CFA polymorphic calling conventions, and the type information can be deduced at the call site.
+Note, the variadic @print@ supports heterogeneous types because the polymorphic @T@ is not returned (unlike variadic @max@), so there is no cascade of type relationships.
+
+Turning tuples into first-class values in \CFA does have a few benefits, namely allowing pointers to tuples and arrays of tuples to exist.
+However, it seems unlikely that these types have realistic use cases that cannot be achieved with structures.
+And having a pointer-to-tuple type potentially forbids the simple offset-array implementation of variadic polymorphism.
+For example, if a type assertion requests the pointer type @TT *@ in the above example, it forces the tuple type to be a @struct@, incurring a high cost.
+My conclusion is that tuples should not be structured (first-class), rather they should be unstructured.
+This agrees with Rodolfo's original description:
+\begin{quote}
+As such, their [tuples] use does not enforce a particular memory layout, and in particular, does not guarantee that the components of a tuple occupy a contiguous region of memory.~\cite[pp.~74--75]{Esteves04}
+\end{quote}
+allowing the simplified implementations for MVR and variadic functions.
+
+Finally, a missing feature for tuples is using an MVR in an initialization context.
+Currently, this feature is \emph{only} possible when declaring a tuple variable.
+\begin{cfa}
+[int, int] ret = gives_two();  $\C{// no constructor call (unstructured)}$
+Pair ret = gives_two();  $\C{// constructor call (structured)}$
+\end{cfa}
+However, this forces the programmer to use a tuple variable and possibly a tuple type to support a constructor, when they actually want separate variables with separate constructors.
+And as stated previously, type variables (structured tuples) are rare in general \CFA programming so far.
+To address this issue, while retaining the ability to leverage constructors, I proposed the following new tuple-like declaration syntax.
+\begin{cfa}
+[ int x, int y ] = gives_two();
+\end{cfa}
+where the semantics is:
+\begin{cfa}
+T t0, t1;
+[ t0, t1 ] = gives_two();
+T x = t0;  // constructor call
+T y = t1;  // constructor call
+\end{cfa}
+and the implementation performs as much copy elision as possible.
+Currently, this new declaration form is parsed by \CFA, showing its syntax is viable, but it is unimplemented because of downstream resolver issues.
+
+
+\section{\lstinline{inline} Substructure}
+\label{s:inlineSubstructure}
+
+As mentioned, C allows an anonymous aggregate type (@struct@ or @union@) to be embedded (nested) within another one \see{\VRef[Figure]{f:Nesting}}, \eg a tagged union.
+\begin{cfa}
+struct S {
+	unsigned int tag;
+	union { // anonymous nested aggregate
+		int x;  double y;  char z;
+	};
+} s;
+\end{cfa}
+Here, the @union@ combines its fields into a common block of storage, and because there is no variable-name overloading in C, all of the union's field names must be unique.
+Furthermore, because the union is unnamed, these field names are hoisted into the @struct@, giving direct access, \eg @s.x@;
+hence, the union's field names must also be distinct from the structure's field names.
+The same semantics applies to a nested anonymous @struct@:
+\begin{cquote}
+\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
+original & rewritten \\
+\begin{cfa}
+struct S {
+	struct { int i, j; };
+	struct { int k, l; };
+};
+\end{cfa}
+&
+\begin{cfa}
+struct S {
+	int i, j;
+	int k, l;
+};
+\end{cfa}
+\end{tabular}
+\end{cquote}
+However, unlike the union which provides storage sharing, there is no semantic difference between the nested anonymous structure and its rewritten counterpart.
+Hence, the nested anonymous structure provides no useful capability.
+
+Nested \emph{named} aggregates are allowed in C but there is no qualification operator, like the \CC type operator `@::@', to access an inner type.
+\emph{To compensate for the missing type operator, all named nested aggregates are hoisted to global scope, regardless of the nesting depth, and type usages within the nested type are replaced with the global type name.}
+Hoisting nested types can result in name collisions among types at the global level, which defeats the purpose of nesting the type.
+\VRef[Figure]{f:NestedNamedAggregate} shows the nested type @T@ is hoisted to the global scope and the declaration rewrites within structure @S@.
+Hence, the possible accesses are:
+\begin{cfa}
+struct S s;
+s.i = 1;
+s.t.i = 2;
+s.w = (struct T){ 7, 8 };
+struct T x = { 5, 6 }; // use (un)nested type name
+s.t = (struct T){ 2, 3 };
+\end{cfa}
+where @T@ is used without qualification even though it is nested in @S@.
+It is for these reasons that nested types are not used in C, and if used, are extremely confusing.
+
+\begin{figure}
+\begin{cquote}
+\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
+original & rewritten \\
+\begin{cfa}
+struct S {
+	@struct T@ {
+		int i, j;
+	} t; // warning without declaration
+	struct T w;
+	int k;
+};
+
+\end{cfa}
+&
+\begin{cfa}
+@struct T@ {
+	int i, j;
+};
+struct S {
+	@struct T t@;
+	struct T w;
+	int k;
+};
+\end{cfa}
+\end{tabular}
+\end{cquote}
+\caption{Nested Named Aggregate}
+\label{f:NestedNamedAggregate}
+\end{figure}
+
+For good reasons, \CC chose to change this semantics:
+\begin{cquote}
+\begin{description}[leftmargin=*,topsep=0pt,itemsep=0pt,parsep=0pt]
+\item[Change:] A struct is a scope in C++, not in C.
+\item[Rationale:] Class scope is crucial to C++, and a struct is a class.
+\item[Effect on original feature:] Change to semantics of well-defined feature.
+\item[Difficulty of converting:] Semantic transformation.
+\item[How widely used:] C programs use @struct@ extremely frequently, but the change is only noticeable when @struct@, enumeration, or enumerator names are referred to outside the @struct@.
+The latter is probably rare.
+\end{description}
+\hfill ISO/IEC 14882:1998 (\CC Programming Language Standard)~\cite[C.1.2.3.3]{ANSI98:C++}
+\end{cquote}
+\CFA chose to adopt the \CC non-compatible change for nested types, since \CC's change has already forced certain coding changes in C libraries that must be parsed by \CC.
+\CFA also added the ability to access from a variable through a type to a field.
+\begin{cfa}
+struct S s;  @s.i@;  @s.T@.i;
+\end{cfa}
+See the use case for this feature at the end of this section.
+
+% https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
+
+A polymorphic extension to nested aggregates appears in the Plan-9 C dialect, used in the Bell Labs' Plan-9 research operating-system.
+The feature is called \newterm{unnamed substructures}~\cite[\S~3.3]{Thompson90new}, which continues to be supported by @gcc@ and @clang@ using the extension (@-fplan9-extensions@).
+The goal is to provide the same effect as a nested aggregate but with the aggregate type defined elsewhere, which requires it to be named.
+\begin{cfa}
+union U {  $\C{// unnested named}$
+	int x;  double y;  char z;
+} u;
+struct W {
+	int i;  double j;  char k;
+} w;
+struct S {
+	@struct W;@  $\C{// Plan-9 substructure}$
+	unsigned int tag;
+	@union U;@  $\C{// Plan-9 substructure}$
+} s;
+\end{cfa}
+Note, the position of the substructure is normally unimportant, unless there is some form of memory or @union@ overlay.
+Like an anonymous nested type, a named Plan-9 nested type has its field names hoisted into @struct S@, so there is direct access, \eg @s.x@ and @s.i@.
+Hence, the field names must be unique, unlike \CC nested types, but the type names are at a nested scope level, unlike type nesting in C.
+In addition, a pointer to a structure is automatically converted to a pointer to an anonymous field for assignments and function calls, providing containment inheritance with implicit subtyping, \ie @U@ $\subset$ @S@ and @W@ $\subset$ @S@, \eg:
+\begin{cfa}
+void f( union U * u );
+void g( struct W * );
+union U * up;   struct W * wp;   struct S * sp;
+up = &s; $\C{// assign pointer to U in S}$
+wp = &s; $\C{// assign pointer to W in S}$
+f( &s ); $\C{// pass pointer to U in S}$
+g( &s ); $\C{// pass pointer to W in S}$
+\end{cfa}
+Note, there is no value assignment, such as, @w = s@, to copy the @W@ field from @S@.
+
+Unfortunately, the Plan-9 designers did not lookahead to other useful features, specifically nested types.
+This nested type compiles in \CC and \CFA.
+\begin{cfa}
+struct R {
+	@struct T;@		$\C[2in]{// forward declaration, conflicts with Plan-9 syntax}$
+	struct S {		$\C{// nested types, mutually recursive reference}\CRT$
+		S * sp;   T * tp;  ...
+	};
+	struct T {
+		S * sp;   T * tp;  ...
+	};
+};
+\end{cfa}
+Note, the syntax for the forward declaration conflicts with the Plan-9 declaration syntax.
+
+\CFA extends the Plan-9 substructure by allowing polymorphism for values and pointers, where the extended substructure is denoted using @inline@.
+\begin{cfa}
+struct S {
+	@inline@ struct W;  $\C{// extended Plan-9 substructure}$
+	unsigned int tag;
+	@inline@ U;  $\C{// extended Plan-9 substructure}$
+} s;
+\end{cfa}
+Note, the declaration of @U@ is not prefixed with @union@.
+Like \CC, \CFA allows optional prefixing of type names with their kind, \eg @struct@, @union@, and @enum@, unless there is ambiguity with variable names in the same scope.
+In addition, a semi-non-compatible change is made so that Plan-9 syntax means a forward declaration in a nested type.
+Since the Plan-9 extension is not part of C and rarely used, this change has minimal impact.
+Hence, all Plan-9 semantics are denoted by the @inline@ qualifier, which is good ``eye-candy'' when reading a structure definition to spot Plan-9 definitions.
+Finally, the following code shows the value and pointer polymorphism.
+\begin{cfa}
+void f( U, U * ); $\C{// value, pointer}$
+void g( W, W * ); $\C{// value, pointer}$
+U u, * up;   S s, * sp;   W w, * wp;
+u = s;   up = sp; $\C{// value, pointer}$
+w = s;   wp = sp; $\C{// value, pointer}$
+f( s, &s ); $\C{// value, pointer}$
+g( s, &s ); $\C{// value, pointer}$
+\end{cfa}
+
+In general, non-standard C features (@gcc@) do not need any special treatment, as they are directly passed through to the C compiler.
+However, the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the \CFA type resolver must take this information into account.
+Therefore, the \CFA resolver must implement the Plan-9 features and insert necessary type conversions into the translated code output.
+In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C arithmetic conversions.
+
+Plan-9 polymorphism can result in duplicate field names.
+For example, the \newterm{diamond pattern}~\cite[\S~6.1]{Stroustrup89}\cite[\S~4]{Cargill91} can result in nested fields being embedded twice.
+\begin{cfa}
+struct A { int x; };
+struct B { inline A; };
+struct C { inline A; };
+struct D {
+	inline B;  // B.x
+	inline C;  // C.x
+} d;
+\end{cfa}
+Because the @inline@ structures are flattened, the expression @d.x@ is ambiguous, as it can refer to the embedded field either from @B@ or @C@.
+@gcc@ generates a syntax error about the duplicate member @x@.
+The equivalent \CC definition compiles:
+\begin{c++}
+struct A { int x; };
+struct B : public A {};
+struct C : public A {};
+struct D : @public B, C@ {  // multiple inheritance
+} d;
+\end{c++}
+and again the expression @d.x@ is ambiguous.
+While \CC does not support the \CFA qualification syntax @d.B.x@, it can disambiguate with the scope operator, \eg @d.B::x@, or with casts, \eg @((B &)d).x@ or @((C &)d).x@.
+Like \CC, \CFA compiles the Plan-9 version and provides direct qualification and casts to disambiguate @x@.
+While ambiguous definitions are allowed, duplicate field names are poor practice and should be avoided if possible.
+However, when a programmer does not control all code, this problem can occur and a naming workaround must exist.
Index: doc/theses/fangren_yu_MMath/future.tex
===================================================================
--- doc/theses/fangren_yu_MMath/future.tex	(revision 0393fda8cd4ef8e0811b9035990f83c2271392e8)
+++ doc/theses/fangren_yu_MMath/future.tex	(revision 0393fda8cd4ef8e0811b9035990f83c2271392e8)
@@ -0,0 +1,143 @@
+\chapter{Future Work}
+
+This chapter presents feature requests related to type-system enhancements that came up during the development of the \CFA language and library, but have not yet been implemented.
+Many current implementations must work around these missing features, sometimes at a cost in efficiency.
+
+\section{Closed trait types}
+
+\CFA currently has no closed types, as new functions can be added at any time.
+It is also possible to locally declare a function,\footnote{Local functions are not a standard feature in C, but are supported by mainstream C compilers such as gcc, and are allowed in \CFA too.} or a function-pointer variable, to make a type satisfy a trait temporarily and be used as such within a limited scope.
+However, the lack of closed types in such a ``duck typing'' scheme poses two problems.
+For library implementors, it is usually undesirable for the defined set of operations to be overridden, changing the behaviour of polymorphic invocations.
+For the compiler, caching and reusing resolution results is unreliable, because newly introduced declarations can participate in assertion resolution, making a previously invalid expression valid, or the other way around by introducing ambiguity in assertions.
+
+Some of these interfaces are fairly complicated; for example, the I/O library traits @istream@ and @ostream@ each have over 20 operations.
+Without the ability to store and reuse assertion-resolution results, each time the compiler encounters an I/O operation in the source code (mainly the pipe operator @?|?@ used to represent stream operations in \CFA), it has to resolve the same set of assertions again, causing a lot of repetitive work.
+Previous experiments show that the I/O assertions often account for over half of the assertions resolved in a \CFA translation unit.
+Eliminating repeated resolution of assertions that are unlikely to change under new overloads can therefore significantly improve compiler performance.
+
+The output stream trait in \CFA looks like this:
+
+\begin{cfa}
+forall( ostype & )
+trait basic_ostream {
+	// private
+	bool sepPrt$\$$( ostype & );							// get separator state (on/off)
+	void sepReset$\$$( ostype & );							// set separator state to default state
+	void sepReset$\$$( ostype &, bool );					// set separator and default state
+	const char * sepGetCur$\$$( ostype & );				// get current separator string
+	void sepSetCur$\$$( ostype &, const char [] );			// set current separator string
+	bool getNL$\$$( ostype & );							// get newline
+	bool setNL$\$$( ostype &, bool );						// set newline
+	bool getANL$\$$( ostype & );							// get auto newline (on/off)
+	bool setANL$\$$( ostype &, bool );						// set auto newline (on/off), and return previous state
+	bool getPrt$\$$( ostype & );							// get fmt called in output cascade
+	bool setPrt$\$$( ostype &, bool );						// set fmt called in output cascade
+	// public
+	void nlOn( ostype & );								// turn auto-newline state on
+	void nlOff( ostype & );								// turn auto-newline state off
+
+	void sep( ostype & );								// turn separator state on
+	void nosep( ostype & );								// turn separator state off
+	bool sepOn( ostype & );								// set default state to on, and return previous state
+	bool sepOff( ostype & );							// set default state to off, and return previous state
+	const char * sepGet( ostype & );					// get separator string
+	void sepSet( ostype &, const char [] );				// set separator to string (15 character maximum)
+	const char * sepGetTuple( ostype & );				// get tuple separator string
+	void sepSetTuple( ostype &, const char [] );		// set tuple separator to string (15 character maximum)
+
+	void ends( ostype & );								// end of output statement
+	int fmt( ostype &, const char format[], ... ) __attribute__(( format(printf, 2, 3) ));
+}; // basic_ostream
+\end{cfa}
+
+and the overloaded output operator is declared as
+
+\begin{cfa}
+forall( ostype & | basic_ostream( ostype ) ) ostype & ?|?( ostype &, int ); // also defined for all other basic types
+\end{cfa}
+
+The \CFA polymorphic calling convention makes the output operator take all the trait functions above as implicit arguments, which are repeatedly resolved every time an output operator is called, creating both compile-time and run-time overheads.
+\CFA compilation spends most of its time resolving assertions, and it is not uncommon in the current codebase and test suite for the majority of a program's compilation time to come from a series of I/O statements, even though resolving the assertions for the I/O operators could be a one-time job.
+Repeatedly pushing tens of arguments onto the stack to call these operators also has some run-time impact, although in the I/O case it is less significant, since these operations involve system calls that are already slow.
+
+Adding closed trait types would make a \CFA trait behave similarly to a Haskell type class, requiring an explicit instance declaration as well.
+The syntax for closed traits is not settled yet, but it may look something like the following:
+
+\begin{cfa}
+forall( ostype & )
+closed trait basic_ostream {
+	// all the above declarations
+};
+
+struct ofstream {
+	// ...
+};
+
+// implementation of trait functions for ofstream
+
+__cfa_trait_instance( ofstream, basic_ostream );
+\end{cfa}
+
+At the point of the instance declaration, all required trait functions must be declared in scope, and they are inferred and specialized in the usual way.
+The list of inferred and specialized functions is then saved into a table of function pointers, so polymorphic functions using the closed trait type can simply take the instance pointer instead of a list of implicit parameters.
+
+\begin{cfa}
+// original declaration
+forall( ostype & | basic_ostream( ostype ) ) ostype & ?|?( ostype & os, int i );
+
+// translated code with closed trait:
+// polymorphic pointer and reference types are erased
+void * ?|?( void * os, int i, void * _instance_basic_ostream );
+
+// call site
+sout | 42;
+
+// generated code
+?|?( sout, 42, __cfa_instance_ofstream__basic_ostream /* generated by __cfa_trait_instance decl */ );
+\end{cfa}
+
+Introducing closed trait types has the additional benefit of better encapsulation.
+In the above @basic_ostream@ example, the functions ending with a dollar sign (\$) are meant to be private (internal to the library), and client code should not be able to see or call them directly.
+This is impossible with the current trait and assertion system, because the satisfying declarations must be visible at the call site for a polymorphic call to resolve.
+With closed traits and instance types, it becomes possible to forward declare a closed trait (which would not even make sense otherwise), and the instance pointer of a type implementing the trait, without exposing any details to the client: all the information necessary to call the trait functions is captured in a single pointer to a table of functions built at compile time, and the type of that pointer is opaque to the client.
+
+\section{Associated Types}
+
+The analysis presented in Section~4.3 shows that if all type parameters must be bound before assertion resolution, the complexity of resolving assertions becomes much lower, as every assertion parameter can be resolved independently.
+By fully utilizing information from higher up the expression tree for return-value overloading, most type bindings can be resolved.
+However, there are scenarios requiring intermediate types that participate in certain operations but are neither input nor output types.
+
+The \CFA standard library provides a few polymorphic container types similar to those in the \CC standard template library.
+Naturally, we would like a uniform iterator interface across the different container types.
+The feature is still under development and has not been finalized, but for the purpose of this discussion, we use an approach similar to the \CC standard iterator contract.
+The equivalent type signatures can be given as a \CFA trait:
+
+\begin{cfa}
+forall( Container, Iter, Elem ) trait iterable {
+	Iter begin( Container );
+	Iter end( Container );
+	Elem & *?( Iter );
+	Iter ++?( Iter );
+	bool ?==?( Iter, Iter );
+};
+\end{cfa}
+
+These are exactly the operators required by the \CC range-based for loop @for ( Elem & e : container )@.
+The problem with this trait declaration is that the iterator type @Iter@ must be given explicitly in the trait parameter list, even though it is intended to be deduced from the @begin@ and @end@ operations on the container type; the same is true for the element type.
+The iterable trait is meant to describe a property of the container type, but involves two additional types: the iterator type and the element type.
+If unbound type parameters in assertions were disallowed, without adding any features to the type system, this iterable property could not be described in \CFA.
+
+To solve this problem, I propose adding associated-type declarations alongside the existing (free) @forall@ type parameters.
+In the body of a trait declaration, new types can be introduced as the return type of an expression involving the type parameters.
+Following the above example, the iterator contract can be rewritten as:
+\begin{cfa}
+forall( Container ) trait iterable {
+	type Iter = begin( Container );
+	type Elem & = *?( Iter );
+	Iter end( Container );
+	Iter ++?( Iter );
+	bool ?==?( Iter, Iter );
+};
+\end{cfa}
+
+This better matches the concept that the iterable trait is about one type (the container) rather than three; the iterator and element types can be viewed as properties of the container type, not independent type parameters.
+
+One design decision remains: whether the expressions defining associated types must resolve uniquely.
+If the associated type does not appear in parameter or return types, such a requirement seems reasonable, because the expression cost does not consider conversions occurring during assertion-parameter matching, which essentially means that having multiple ways to resolve an associated type always results in ambiguity.
+A more interesting case is when the associated type also appears in the return type, where both resolving the return-type overload from context and resolving the assertion that defines the associated type can help; for the whole expression to resolve, they must agree on a common result.
+Moss gave the following example to illustrate the expressiveness of the \CFA assertion system; it can be viewed as having an associated type as a return type, which can vary based on the context:
+
+\begin{cfa}
+forall( Ptr, Elem ) trait pointer_like {
+	Elem & *?( Ptr );  // Ptr can be dereferenced to Elem
+};
+struct list {
+	int value;
+	list * next;  // may omit struct on type names
+};
+typedef list * list_iterator;
+int & *?( list_iterator it ) {
+	return it->value;
+}
+\end{cfa}
+
+Note that the type @list *@ satisfies both @pointer_like( list *, int )@ and @pointer_like( list *, list )@ (the latter by the built-in pointer-dereference operator), so the expression @*it@ could be either a @struct list@ or an @int@.
+Requiring associated types to be unique would make the @pointer_like@ trait inapplicable to @list *@ here, which is not desirable.
+I have not attempted to implement associated types in the \CFA compiler, but based on the above discussion, one option is to let associated-type resolution and return-type overloading coexist: when the associated type appears in the return type, it is deduced from the context and the trait is then verified with ordinary assertion resolution; when it does not, the type must instead be uniquely determined by the expression that defines it.
+
+\section{User-defined conversions}
Index: c/theses/fangren_yu_MMath/performance.tex
===================================================================
--- doc/theses/fangren_yu_MMath/performance.tex	(revision ed5023d1a93d52e928c7120920781fefeb7fe26b)
+++ 	(revision )
@@ -1,143 +1,0 @@
-\chapter{Future Work}
-
-These are feature requests related to type-system enhancement that came up during the development of the \CFA language and library but have not yet been implemented. Many current implementations must work around the limits of the language features, which sometimes leads to inefficiency.
-
-\section{Closed trait types}
-
-\CFA currently has no closed types, as new functions can be added at any time. It is also possible to locally declare a function,\footnote{Local functions are not a standard feature in C but are supported by mainstream C compilers such as gcc, and are allowed in \CFA too.} or a function-pointer variable, to make a type satisfy a certain trait temporarily and be used as such within that limited scope. However, the lack of closed types in such a ``duck typing'' scheme poses two problems. For library implementors, it is common to not want the defined set of operations to be overridden, changing the behaviour of polymorphic invocations. For the compiler, it means caching and reusing resolution results is unreliable, as newly introduced declarations can participate in assertion resolution, making a previously invalid expression valid, or the other way around by introducing ambiguity in assertions. Some of these interfaces are fairly complicated; for example, the I/O library traits @istream@ and @ostream@ each have over 20 operations. Without the ability to store and reuse assertion-resolution results, each time the compiler encounters an I/O operation in the source code (mainly the pipe operator @?|?@ used for stream operations in \CFA), it must resolve the same set of assertions again, causing a lot of repetitive work. Previous experiments have shown that I/O assertions often account for over half of the assertions resolved in a \CFA translation unit. Eliminating these repetitive assertion resolutions, which are very unlikely to change with new overloads, can therefore significantly improve compiler performance.
-
-The output stream trait in \CFA looks like this:
-
-\begin{cfa}
-forall( ostype & )
-trait basic_ostream {
-	// private
-	bool sepPrt$\$$( ostype & );							// get separator state (on/off)
-    
-	void sepReset$\$$( ostype & );							// set separator state to default state
-	void sepReset$\$$( ostype &, bool );					// set separator and default state
-	const char * sepGetCur$\$$( ostype & );				// get current separator string
-	void sepSetCur$\$$( ostype &, const char [] );			// set current separator string
-	bool getNL$\$$( ostype & );							// get newline
-	bool setNL$\$$( ostype &, bool );						// set newline
-	bool getANL$\$$( ostype & );							// get auto newline (on/off)
-	bool setANL$\$$( ostype &, bool );						// set auto newline (on/off), and return previous state
-	bool getPrt$\$$( ostype & );							// get fmt called in output cascade
-	bool setPrt$\$$( ostype &, bool );						// set fmt called in output cascade
-	// public
-	void nlOn( ostype & );								// turn auto-newline state on
-	void nlOff( ostype & );								// turn auto-newline state off
-
-	void sep( ostype & );								// turn separator state on
-	void nosep( ostype & );								// turn separator state off
-	bool sepOn( ostype & );								// set default state to on, and return previous state
-	bool sepOff( ostype & );							// set default state to off, and return previous state
-	const char * sepGet( ostype & );					// get separator string
-	void sepSet( ostype &, const char [] );				// set separator to string (15 character maximum)
-	const char * sepGetTuple( ostype & );				// get tuple separator string
-	void sepSetTuple( ostype &, const char [] );		// set tuple separator to string (15 character maximum)
-
-	void ends( ostype & );								// end of output statement
-	int fmt( ostype &, const char format[], ... ) __attribute__(( format(printf, 2, 3) ));
-}; // basic_ostream
-
-\end{cfa}
-
-and the overloaded output operator is declared as
-
-\begin{cfa}
-forall( ostype & | basic_ostream( ostype ) ) ostype & ?|?( ostype &, int ); // also defined for all other basic types
-\end{cfa}
-
-The \CFA polymorphic calling convention makes the output operator take all the trait functions above as implicit arguments, which are repeatedly resolved every time an output operator is called. This creates both compile-time and run-time overhead. \CFA compilation spends the most time resolving assertions, and it is not uncommon in the current codebase and test suite for the majority of a program's compile time to be due to a series of I/O statements, while resolving those assertions for I/O operators could have been a one-time job. Repeatedly pushing tens of arguments onto the stack to call these operators at run-time can also impact performance, although in the I/O case it is not as significant, since these operations involve system calls that are already slow.
-
-Adding closed trait types would make a \CFA trait behave similarly to a Haskell type class, requiring an explicit instance declaration. The syntax for closed traits is not settled yet, but it may look something like the following:
-
-\begin{cfa}
-    forall(ostype &)
-    closed trait basic_ostream{
-        // all the above declarations
-    };
-
-    struct ofstream {
-        // ... 
-    };
-
-    // implementation of trait functions for ofstream
-
-    __cfa_trait_instance(ofstream, basic_ostream);
-\end{cfa}
-
-At the point of the instance declaration, all the required trait functions must be declared in scope, and they are inferred and specialized in the usual way. The list of inferred and specialized functions is then saved into an array of function pointers, and polymorphic functions using the closed trait type can simply take the instance pointer instead of a list of implicit parameters.
-
-\begin{cfa}
-    // original declaration
-    forall( ostype & | basic_ostream( ostype ) ) ostype & ?|?( ostype & os, int i); 
-
-    // translated code with closed trait
-    void* ?|?(void* os, int i, void* _instance_basic_ostream); 
-    // polymorphic pointer and reference types are erased
-
-    // call site
-    sout | 42;
-    
-    // generated code
-    ?|?(sout, 42, __cfa_instance_ofstream__basic_ostream /* generated by __cfa_trait_instance decl */);
-\end{cfa}
-
-Introducing closed trait types also brings the additional benefit of better encapsulation. In the above @basic_ostream@ example, the functions ending with a dollar sign \$ are meant to be private (internal to the library), and client code should not be able to see or call them directly. This is impossible with the current trait and assertion system, because the satisfying declarations must be visible at the call site for a polymorphic call to resolve. With closed traits and instance types, it becomes possible to forward-declare a closed trait (which would not even make sense otherwise) and the instance pointer of a type implementing the trait, without exposing any details to the client, since all the information necessary to call the trait functions is captured in a single pointer to a table of functions built at compile time, and the type of that pointer is opaque to the client.
-
-
-
-
-\section{Associated Types}
-
-The analysis presented in section 4.3 shows that if we mandate all type parameters be bound before assertion resolution, the complexity of resolving assertions becomes much lower, as every assertion parameter can be resolved independently. By fully utilizing information from higher up the expression tree for return-value overloading, most type bindings can be resolved. However, there are certain scenarios where we would like some intermediate types, which are neither input nor output types, to be involved in certain operations.
-
-The \CFA standard library provides a few polymorphic container types similar to those found in the \CC standard template library. Naturally, we would like a harmonized iterator interface across the different container types. The feature is still under development and has not been finalized, but for the purpose of this discussion, we use an approach similar to the \CC standard iterator contract. The equivalent type signatures can be given as a \CFA trait:
-
-\begin{cfa}
-    forall (Container, Iter, Elem) trait iterable {
-        Iter begin(Container);
-        Iter end(Container);
-        Elem& *?(Iter);
-        Iter ++?(Iter);
-        bool ?==?(Iter,Iter);
-    }
-\end{cfa}
-
-These are exactly the operators \CC requires for the for-loop comprehension
-@for (Elem & e: container)@. The problem with this trait declaration is that the iterator type @Iter@ must be explicitly given in the trait parameter list, but is intended to be deduced from the @begin@ and @end@ operations on the container type; the same can be said for the element type. The iterable trait is meant to describe a property of the container type, but it involves two additional types, one for the iterator and one for the element. If we were to disallow unbound type parameters in assertions without adding any features to the type system, we would not be able to describe this iterable property in \CFA.
-
-To solve this problem, I propose adding associated-type declarations in addition to the existing (free) @forall@ type parameters. In the body of a trait declaration, new types can be introduced as the return type of an expression involving the type parameters. Following the above example, the iterator contract can be rewritten as
-\begin{cfa}
-    forall (Container) trait iterable {
-        type Iter = begin(Container);
-        type Elem& = *?(Iter);
-        Iter end(Container);
-        Iter ++?(Iter);
-        bool ?==?(Iter,Iter);
-    }
-\end{cfa}
-
-This better matches the concept that the iterable trait is about one type (the container) rather than about three: the iterator and element types can be viewed as properties of the container type, not independent type parameters.
-
-There remains one design decision: whether the expressions that define associated types must be uniquely resolved. If the associated type does not appear in parameter or return types, such a requirement seems reasonable, because the expression cost does not consider any conversions that occur in assertion-parameter matching, which essentially means having multiple ways to resolve an associated type always results in ambiguity. A more interesting case is when the associated type also appears in the return type, where both resolving the return-type overload based on context and resolving the assertion that defines the associated type can help; for the whole expression to resolve, the two must agree on a common result. Moss gave the following example to illustrate the expressiveness of the \CFA assertion system; it can be viewed as having an associated type as the return, which can vary based on the context:
-
-\begin{cfa}
-    forall(Ptr, Elem) trait pointer_like {
-        Elem& *?(Ptr); // Ptr can be dereferenced to Elem
-    };
-    struct list {
-        int value;
-        list* next; // may omit struct on type names
-    };
-    typedef list* list_iterator;
-    int& *?(list_iterator it) {
-        return it->value;
-    }
-\end{cfa}
-
-Note that the type @list*@ satisfies both @pointer_like(list*, int)@ and @pointer_like(list*, list)@ (the latter by the built-in pointer-dereference operator), so the expression @*it@ could be either a @struct list@ or an @int@. Requiring associated types to be unique would make the @pointer_like@ trait inapplicable to @list*@ here, which is not desirable. I have not attempted to implement associated types in the \CFA compiler, but based on the above discussion, one option is to make associated-type resolution and return-type overloading coexist: when the associated type appears in the return, deduce it from the context and then verify the trait with ordinary assertion resolution; when it does not, require the type to be uniquely determined by the expression that defines the associated type.
-
-\section{User-defined conversions}
Index: doc/theses/fangren_yu_MMath/resolution.tex
===================================================================
--- doc/theses/fangren_yu_MMath/resolution.tex	(revision 0393fda8cd4ef8e0811b9035990f83c2271392e8)
+++ doc/theses/fangren_yu_MMath/resolution.tex	(revision 0393fda8cd4ef8e0811b9035990f83c2271392e8)
@@ -0,0 +1,553 @@
+\chapter{Resolution Algorithms}
+\label{c:content2}
+
+The \CFA type-system provides expressive polymorphism: variables can be overloaded; functions can be overloaded by argument and return types; tuple types group multiple values; and generic (polymorphic) functions and types (aggregates) can have multiple type parameters with assertion restrictions;
+in addition, C's multiple implicit type-conversions must be respected.
+This generality leads to internal complexity and correspondingly higher compilation cost directly related to type resolution.
+The reason is that the type resolver must analyze \emph{each} component of an expression with many possible forms of overloading, type restrictions, and conversions.
+Designing a ruleset that behaves as expected, \ie matches C programmer expectations, and implementing an efficient algorithm is a very challenging task.
+
+My first work on the \CFA type-system was as a co-op student.
+At that time, compiling a medium-sized \CFA program using advanced polymorphism could take more than five minutes~\cite[\S~5]{Yu20}.
+I found a number of type-resolution problems, which resulted in the following changes to the type-resolution algorithm.
+\begin{enumerate}[itemsep=0pt]
+\item
+new AST data structure
+\item
+special symbol table and argument-dependent lookup
+\item
+late assertion-satisfaction
+\item
+revised function-type representation
+\item
+skip pruning on expressions for function types
+\end{enumerate}
+\VRef[Table]{t:SelectedFileByCompilerBuild} shows improvements for selected tests with accumulated reductions in compile time across each of the 5 fixes.
+The large reduction in compilation time significantly improved development of the \CFA runtime, which has frequent compilation cycles.
+
+\begin{table}[htb]
+\centering
+\caption{Compile time of selected files by compiler build in seconds}
+\label{t:SelectedFileByCompilerBuild}
+\lstset{deletekeywords={mutex,thread}}
+\setlength{\extrarowheight}{1pt}
+\vspace*{-4pt}
+\begin{tabular}{l|r|rrrrr}
+				& \multicolumn{1}{c|}{Original}	& \multicolumn{5}{c}{Accumulative fixes}	\\
+\cline{3-7}				
+Test case		& \multicolumn{1}{c|}{resolver}	& \multicolumn{1}{c}{1}	& \multicolumn{1}{c}{2}	& \multicolumn{1}{c}{3}	& \multicolumn{1}{c}{4}	& \multicolumn{1}{c}{5}	\\
+\hline
+@lib/fstream@	& 86.4	& 25.2	& 10.8	& 9.5	& 7.8	& 7.1	\\
+@lib/mutex@		& 210.4	& 77.4	& 16.7	& 15.1	& 12.6	& 11.7	\\
+@lib/vector@	& 17.2	& 8.9	& 3.1	& 2.8	& 2.4	& 2.2	\\
+@lib/stdlib@	& 16.6	& 8.3	& 3.2	& 2.9	& 2.6	& 2.4	\\
+@test/io2@		& 300.8	& 53.6	& 43.2	& 27.9	& 19.1	& 16.3	\\
+@test/thread@	& 210.9	& 73.5	& 17.0	& 15.1	& 12.6	& 11.8	\\
+\end{tabular}
+\medskip
+\newline
+Results are the average of 5 runs (3 runs if the time exceeds 100 seconds)
+\end{table}
+
+Since then, many new features utilizing the expressiveness of \CFA's type system have been implemented, such as generic container types similar to those in \CC's standard template library.
+During the development of multiple \CFA programs and libraries, more weaknesses and design flaws have been discovered within the type system.
+Some of those problems arise from the newly introduced language features described in the previous chapter.
+In addition, fixing unexpected interactions within the type system has presented challenges.
+This chapter describes in detail the type-resolution rules currently in use and some major problems that have been identified.
+Not all of those problems have immediate solutions, because fixing them may require redesigning parts of the \CFA type system at a larger scale, which correspondingly affects the language design.
+
+
+\section{Expression Cost-Model}
+
+From the very beginning, \CFA has used a total-order expression cost-model to resolve ambiguity among overloaded expressions.
+Most \CFA operators can be overloaded\footnote{
+\CFA excludes overloading the comma operator, the short-circuit logical operators \lstinline{&&} and \lstinline{||}, and the ternary conditional operator \lstinline{?:}, all of which are short-hand control structures rather than operators.
+\CC allows overloading the comma and short-circuit logical operators, but the overloaded comma is esoteric and overloaded short-circuit operators lose their short-circuit semantics, which is confusing.};
+hence, operators are treated the same way as other function calls, with the same rules for overload resolution.
+
+In \CFA, candidates for an overloaded expression are ranked by numerical cost elements, which account for any type conversions needed from a call-site argument type to the matching function-parameter type, as well as for polymorphic types and restrictions introduced by the @forall@ clause.
+Currently, the expression cost has the following components, ranked from highest to lowest.
+\begin{enumerate}
+\item \textbf{Unsafe} cost representing a narrowing conversion of arithmetic types, \eg @int@ to @short@, and qualifier-dropping conversions for pointer and reference types.
+\item \textbf{Polymorphic} cost for a function parameter type that is or contains a polymorphic type variable.
+\item \textbf{Safe} cost representing a widening conversion \eg @short@ to @int@, qualifier-adding conversions for pointer and reference types, and value conversion for enumeration constants.
+\item \textbf{Variable} cost that counts the number of polymorphic variables, if any, introduced by the @forall@ clause in the function declaration.
+\item \textbf{Specialization} cost counting the number of restrictions introduced by type assertions.
+\end{enumerate}
+Cost tuples are compared in lexicographical order, from unsafe (highest) to specialization (lowest), with ties broken by the next-lower element.
+At a subexpression level, the lowest cost candidate for each result type is included as a possible interpretation of the expression;
+at the top level, all possible interpretations of different types are considered (generating a total ordering) and the overall lowest cost is selected as the final interpretation of the expression.
+Glen Ditchfield first proposed this costing model~\cite[\S~4.4.5]{Ditchfield92} to generate a resolution behaviour that is reasonable to C programmers based on existing conversions in the C programming language.
+This model carried over into the first implementation of the \CFA type-system by Richard Bilson~\cite[\S~2.2]{Bilson03}, and was extended but not redesigned by Aaron Moss~\cite[chap.~4]{Moss19}.
+Moss's work began to show problems with the underlying costing model;
+these design issues are part of this work.
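+
+As an illustrative sketch (with hypothetical declarations, not from the \CFA library), the lexicographic comparison selects among overloads as follows:
+\begin{cfa}
+int max( int, int );			// exact match for int arguments: no conversion cost
+long max( long, long );			// safe cost: each int argument widens to long
+forall( T ) T max( T, T );		// polymorphic cost, plus a variable cost for T
+int m = max( 2, 3 );			// selects int max( int, int ), the lexicographically least cost
+\end{cfa}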
+
+\begin{comment}
+From: Richard Bilson <rcbilson@gmail.com>
+Date: Tue, 10 Dec 2024 09:54:59 -0500
+Subject: Re: cost model
+To: "Peter A. Buhr" <pabuhr@fastmail.fm>
+
+I didn't invent it although I may have refined it somewhat. The idea of the
+conversion cost is from Glen's thesis, see for instance page 90
+
+On Tue, Dec 10, 2024 at 9:35AM Peter A. Buhr <pabuhr@fastmail.fm> wrote:
+> Cforall has a costing model based on an N-tuple using lexicographical ordering.
+> (unsafe, polymorphic, safe, variable, specialization)
+>
+> Did you invent this costing model as a cheap and cheerful way to encode Glen's
+> type system?
+
+From: Richard Bilson <rcbilson@gmail.com>
+Date: Tue, 10 Dec 2024 17:04:26 -0500
+Subject: Re: cost model
+To: "Peter A. Buhr" <pabuhr@fastmail.fm>
+
+Yes, I think that's fair to say.
+
+On Tue, Dec 10, 2024 at 2:22PM Peter A. Buhr <pabuhr@fastmail.fm> wrote:
+> Found it thanks. But Glen never implemented anything (small examples). So it
+> feels like you took his conversion-cost idea and created an implementation table
+> with the lexicographical comparison.
+\end{comment}
+
+In many languages that support function/method overloading, such as \CC and Java, a partial-order system decides which interpretation of the expression gets selected.
+Hence, sometimes the candidates are incomparable (none are considered a better match), and only when one candidate is considered better than all others (maximal with respect to the partial order) is the expression unambiguous.
+Specifically, the resolution algorithms used in \CC and Java are greedy, selecting the best match for each sub-expression without considering the higher-level ones (bottom-up).
+Therefore, at each resolution step, the arguments are already given unique interpretations, so the ordering only needs to compare different sets of conversion targets (function parameter types) on the same set of input.
+A hypothetical \CC-style example (illustrative declarations):
+\begin{cfa}
+void g( int, long );	// 1
+void g( long, int );	// 2
+g( 1, 2 );	// ambiguous in C++: 1 matches the first argument better, 2 the second
+\end{cfa}
+
+In \CFA, trying to use such a system is problematic because of the presence of return-type overloading of functions and variables.
+Specifically, \CFA expression resolution considers multiple interpretations of argument subexpressions with different types, \eg:
+\begin{cfa}
+int x = 42;	// hypothetical overloaded declarations
+double x = 4.2;
+void f( int );
+void f( double );
+f( x );	// f( int ) with int x, or f( double ) with double x
+\end{cfa}
+so it is possible that both the selected function and the set of arguments are different, and cannot be compared with a partial-ordering system.
+This situation arises often in \CFA, even in the simple expression @f(x)@, where both the function name @f@ and variable name @x@ are overloaded.
+
+Ada is another programming language that has overloading based on return type.
+Although Ada also allows implicit type conversions of function arguments, it is fairly conservative in resolving ambiguities.
+\begin{cfa}
+-- hypothetical Ada example: overloading on the return type, resolved by context
+function F return Integer;
+function F return Float;
+procedure P( X : Integer );
+procedure P( X : Float );
+I : Integer := F;	-- legal: the expected type selects the Integer version
+P( F );	-- ambiguous: both pairings of P and F are legal
+\end{cfa}
+There are only two preference rules in Ada overload resolution, one for primitive arithmetic operators and one for universal access types (analogous to @void *@ in C);
+any other cases where an expression has multiple legal interpretations are considered ambiguous.
+The current overload-resolution system for \CFA is at the other end of the spectrum: it tries to order every legal interpretation of an expression and chooses the best one according to cost, occasionally giving unexpected results.
+
+Interestingly, the \CFA cost-based model can sometimes make expression resolution too permissive, because it always attempts to select the lowest-cost option, and only when multiple options tie at the lowest cost does it report the expression as ambiguous.
+Because there are so many elements in the cost tuple, the type system ``tries too hard to discover a match'', and therefore ties are uncommon.
+Other than the case of multiple exact matches, where all have cost zero, candidates that are incomparable under a partial ordering by specificity often have different expression costs, since different kinds of implicit conversions are involved, resulting in seemingly arbitrary overload selections.
+
+There are currently at least three situations where the polymorphic element of the cost model does not yield a clearly justifiable candidate selection, and in one of them the selection is simply wrong.
+\begin{enumerate}[leftmargin=*]
+\item Polymorphic exact match versus non-polymorphic inexact match.
+\begin{cfa}
+forall( T ) void f( T );		$\C[2.5in]{// 1}$
+void f( long );					$\C{// 2}$
+f( 42 );						$\C{// currently selects 2}\CRT$
+\end{cfa}
+Under the current cost model, option 1 incurs a polymorphic cost from matching the argument type @int@ to type variable @T@, and option 2 incurs a safe cost from integer promotion of type @int@ to @long@.
+Since polymorphic cost is ranked above safe conversion cost, option 2 is considered to have lower cost and gets selected.
+
+In contrast, the template-deduction and overload-resolution rules in \CC select option 1 (converting the @forall@ declaration to the equivalent function template).
+\CC performs template argument deduction and overload candidate ranking in two separate steps.
+\begin{itemize}
+\item
+In the first step, the type parameters are deduced for each primary function template, and if the corresponding template instantiation succeeds, the resulting function prototype is added to the resolution candidate set.
+\item
+In the second step, the implicit conversions (if any) applied to argument types are compared after taking away top-level qualifiers and references.
+It then prefers an exact match, followed by basic type promotions (roughly corresponding to safe conversions in \CFA), and then other kinds of conversions (roughly corresponding to unsafe conversions in \CFA).
+Only when the type conversions are the same does it prioritize a non-template candidate.
+\end{itemize}
+In this example, option 1 produces the prototype @void f( int )@, which gives an exact match and therefore takes priority.
+The \CC resolution rules effectively make option 2 a specialization that applies only to type @long@ exactly,\footnote{\CC does have explicit template specializations; however, they do not participate directly in overload resolution and can sometimes lead to unintuitive results.} while the current \CFA rules make option 2 apply to @long@ and all integral types below it.
+This difference could be explained as compensating for \CFA polymorphic functions being separately compiled, versus template inlining in \CC;
+hence, calling them requires passing type information and assertions, increasing the runtime cost.
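+
+The corresponding \CC declarations (a sketch of the analogue) make the two-step behaviour concrete:
+\begin{cfa}
+template< typename T > void f( T );	// 1: deduction yields f( int ), an exact match
+void f( long );				// 2: requires the promotion int to long
+f( 42 );				// C++ selects 1, whereas CFA currently selects 2
+\end{cfa}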
+\item
+Having a lower total polymorphic cost does not always mean a function is more specialized.
+The following example is from Moss's thesis, which discusses some improvements to the \CFA cost-model.
+He claims the following function prototypes are increasingly more constrained:
+\begin{cfa}
+forall( T, U ) void f( T, U );	$\C[2.75in]{// 1 polymorphic}$
+forall( T ) void f( T, T );		$\C{// 2 less polymorphic}$
+forall( T ) void f( T, int );	$\C{// 3 even less polymorphic}$
+forall( T ) void f( T *, int ); $\C{// 4 least polymorphic}\CRT$
+\end{cfa}
+This argument is not entirely correct.
+Although both the sequence 1, 2 and the sequence 1, 3, 4 are increasingly constrained on the argument types, option 2 is not comparable to option 3 or 4;
+they describe independent constraints on the two arguments.
+Specifically, option 2 says the two arguments must have the same type, while option 3 states the second argument must have type @int@.
+Because the two constraints can be satisfied independently, neither should be considered a better match when resolving a call to @f@ with argument types @(int, int)@;
+reporting such an expression as ambiguous is more appropriate.
+The example illustrates a limitation of a numerical cost value: it cannot represent these complicated cases.
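+
+For example, a call sketch:
+\begin{cfa}
+f( 42, 42 );	// argument types (int, int): matches 2 with T = int and 3 with T = int;
+		// the constraints are independent, so neither candidate is a better match
+\end{cfa}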
+\item
+A generic type can require more type variables to describe a more specific type.
+For example, a generic function taking a @pair@-type, requires two type variables.
+\begin{cfa}
+forall( T, U ) void f( pair( T, U ) ); $\C[2.75in]{// 1}$
+\end{cfa}
+Add a function taking any type.
+\begin{cfa}
+forall( T ) void f( T );		$\C{// 2}\CRT$
+\end{cfa}
+Passing a @pair@ variable to @f@ gives a cost of 1 poly and 2 variable for the @pair@ overload, versus a cost of 1 poly and 1 variable for the unconstrained overload.
+The programmer expects the former to be selected, but the cost model selects the latter;
+either could work.
+As a result, simply counting the number of polymorphic type variables no longer orders the function candidates correctly by how constrained they are.
+\end{enumerate}
+
+These inconsistencies are not easily solvable in the current cost-model, meaning the current \CFA codebase has to work around these defects.
+One potential solution is to mix the conversion cost and \CC-like partial ordering of specializations.
+For example, observe that the first three elements (unsafe, polymorphic and safe conversions) in the \CFA cost-tuple are related to the argument/parameter types, while the other two elements (polymorphic variable and assertion counts) are properties of the function declaration.
+Using this observation, it is reasonable to have an ordering that compares the argument/parameter conversion costs first and uses the partial ordering of specializations as a tiebreaker.
+Hence, the \CC template-specialization ordering can be applied to \CFA with a slight modification.
+
+In the meantime, some other improvements have been made to the expression cost system.
+Most notably, the expression-resolution algorithm now consistently uses the globally minimal-cost interpretation, as discussed in a previous technical report.
+While implementing this change, two detailed issues needed to be addressed for the new rules to fully work.
+
+The first deals with a common idiom in \CFA that creates many overloads with equal cost.
+Many C library functions have multiple versions for different argument types.
+For example, the absolute-value function is defined for basic arithmetic types with different names, since C does not support overloading.
+\begin{cquote}
+\begin{tabular}{@{}lll@{}}
+\begin{cfa}
+int abs( int );
+long labs( long );
+long long int llabs( long long int );
+\end{cfa}
+&
+\begin{cfa}
+double fabs( double );
+float fabsf( float );
+long double fabsl( long double );
+\end{cfa}
+&
+\begin{cfa}
+cabs, cabsf, $and$ cabsl
+$for$ _Complex
+
+\end{cfa}
+\end{tabular}
+\end{cquote}
+It is cumbersome for programmers to remember these function names and select the correct one.
+If an incorrect version is picked, the program compiles but with potential negative consequences such as using an integral version with a floating-point argument.
+In \CFA, these functions are wrapped by functions with the overloaded name @abs@, which results in multiple overloads with the same total cost when some conversion is needed.
+For example, @long x = abs( 42 )@ resolves to @long abs( long )@ with @int@ argument 42 converted to @long@ or @int abs( int )@ converting the result to @long@.
+In this example, the choice could be arbitrary because both yield identical results.
+However, for @int i = abs( -9223372036854775807LL )@, the result is @-1@, suggesting some kind of type error rather than silently generating a logically incorrect result.
+There are multiple overload groupings of C functions into a single \CFA name, so usage should not report an ambiguity or warning unless there is a significant chance of error.
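+
+A sketch of the two cases above:
+\begin{cfa}
+long x = abs( 42 );	// benign tie: abs( long ) on a widened argument, or abs( int ) with a widened result
+int i = abs( -9223372036854775807LL );	// dangerous: the long long magnitude truncates to int, giving -1
+\end{cfa}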
+
+While testing the effects of the updated cost rule, the following example was found in the \CFA standard library.
+\begin{cfa}
+static inline double __to_readyQ_avg( unsigned long long intsc ) {
+	if ( unlikely( 0 == intsc ) ) return 0.0; 
+	else return log2( @intsc@ ); // implicit conversion happens here
+}
+\end{cfa}
+This helper function is used for performance logging as part of computing a geometric mean;
+it is called while summing logarithmic values.
+However, the function name @log2@ is overloaded in \CFA for both integer and floating-point types.
+In this case, the integer overload returns an integral result, truncating any fractional part of the logarithm, so the sum is slightly incorrect.
+The updated cost rules flagged this @log2@ call as an ambiguous expression.
+When asked, the developer expected the floating-point overload because of return-type overloading.
+The mistake went unnoticed because the truncated component was insignificant in the performance logging.
+Investigation of this example led to the decision that the resolution algorithm favors a lower conversion cost higher up the expression tree when the total global cost is equal.
+
+Another change addresses the issue that C arithmetic expressions have unique meanings governed by C's usual arithmetic conversion rules:
+\begin{enumerate}[leftmargin=*,topsep=5pt,itemsep=4pt]
+\item
+First, if the corresponding real type of either operand is @long double@, the other operand is converted, without change of type domain, to a type whose corresponding real type is @long double@.
+\item
+Otherwise, if the corresponding real type of either operand is @double@, the other operand is converted, without change of type domain, to a type whose corresponding real type is @double@.
+\item
+Otherwise, if the corresponding real type of either operand is @float@, the other operand is converted, without change of type domain, to a type whose corresponding real type is @float@.\footnote{
+For example, addition of a \lstinline{double _Complex} and a \lstinline{float} entails just the conversion of the \lstinline{float} operand to \lstinline{double} (and yields a \lstinline{double _Complex} result).}
+\item
+Otherwise, the integer promotions are performed on both operands.
+Then the following rules are applied to the promoted operands:
+\begin{enumerate}[topsep=5pt,itemsep=4pt]
+\item
+If both operands have the same type, then no further conversion is needed.
+\item
+Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
+\item
+\label{p:SignedToUnsignedSafe}
+Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
+\item
+\label{p:UnsignedToSignedUnsafe}
+Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
+\item
+\label{p:Miscellaneous}
+Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
+
+\hfill\cite[\S~6.3.1.8]{C11}
+\end{enumerate}
+\end{enumerate}
+\VRef[Figure]{f:CExpressionConversions} shows the C arithmetic conversions graphically.
+\VRef[Rule]{p:SignedToUnsignedSafe} converts the signed operand to the unsigned type when the unsigned operand has greater or equal rank, which is potentially unsafe, while \VRef[rule]{p:UnsignedToSignedUnsafe} converts the unsigned operand to the signed type when the signed type can represent all of its values, which is safe.
+Together, these two rules cover the common cases.
+\VRef[Rule]{p:Miscellaneous} is the catch-all for the remaining case, where the signed type has greater rank but cannot represent all values of the unsigned type, and accommodates any kind of exotic integral representation.
+These conversions are applied greedily at local points within an expression.
+Because there is no overloading in C, except for builtin operators, no cost model is needed to differentiate among alternatives that could result in ambiguous matches.
+
+\begin{figure}
+\input{C_expression_conversion.pstex_t}
+\caption{C Expression Conversions: T1 operator T2}
+\label{f:CExpressionConversions}
+
+\smallskip
+\input{CFA_curr_arithmetic_conversion.pstex_t}
+\caption{\CFA Total-Ordering Expression Conversions}
+\label{f:CFACurrArithmeticConversions}
+
+\smallskip
+\input{CFA_arithmetic_conversion.pstex_t}
+\caption{\CFA Partial-Ordering Expression Conversions}
+\label{f:CFAArithmeticConversions}
+\end{figure}
+
+\VRef[Figure]{f:CFACurrArithmeticConversions} shows the current \CFA total-order arithmetic-conversions graphically.
+Here, the unsafe cost of signed to unsigned is factored into the ranking, so the safe conversion is selected over an unsafe one.
+Furthermore, an integral option is taken before considering a floating option.
+This model locally matches the C approach, but provides an ordering when there are many overloaded alternatives.
+However, as Moss pointed out, overload resolution by total cost has problems, \eg handling cast expressions.
+\begin{cquote}
+\ldots if a cast argument has an unambiguous interpretation as a conversion argument then it must be interpreted as such, even if the ascription interpretation would have a lower overall cost.
+This is demonstrated in the following example, adapted from the C standard library:
+\begin{cfa}
+unsigned long long x;
+(unsigned)( x >> 32 );
+\end{cfa}
+\vspace{5pt}
+In C semantics, this example is unambiguously upcasting 32 to @unsigned long long@, performing the shift, then downcasting the result to @unsigned@, at cost (1, 0, 3, 1, 0, 0, 0).
+If ascription were allowed to be a first-class interpretation of a cast expression, it would be cheaper to select the @unsigned@ interpretation of @?>>?@ by downcasting @x@ to @unsigned@ and upcasting 32 to @unsigned@, at a total cost of (1, 0, 1, 1, 0, 0, 0).
+\PAB{This example feels incorrect. Assume a word size of 4 bits (long long). Given the value 1001, shift it >> 2, gives 10, which fits in a half word (int). Now truncate 1001 to a halfword 01 and shift it >> 2, gives 00. So the lower-cost alternative does generate the correct result. Basically truncation is an unsafe operation and hence has a huge negative cost.}
+However, this break from C semantics is not backwards compatible, so to maintain C compatibility, the \CFA resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to @unsigned long long@ in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.\cite[pp.~46-47]{Moss19}
+\end{cquote}
+However, a cast expression is not necessary to create such an inconsistency with C semantics.
+An implicit argument-parameter type conversion in a function call can replicate this issue without an explicit cast.
+\begin{cfa}
+unsigned long long x;
+void f( unsigned );
+f( x >> 32 );
+\end{cfa}
+The argument generation has the same effect as using an explicit cast to coerce the type of the expression @x >> 32@ to @unsigned@.
+This example shows the problem does not come from cast expressions, but from modelling the C builtin operators as overloaded functions.
+As a result, a different rule is used to select among builtin-function candidates to fix this problem:
+if an expression has any legal interpretations as a C builtin operation, only the lowest-cost one is kept, regardless of its result type.
+
+\VRef[Figure]{f:CFAArithmeticConversions} shows an alternative \CFA partial-order arithmetic-conversions graphically.
+The idea here is to first look for the best integral alternative, because integral calculations are exact and cheap.
+If no integral solution is found, then different rules select among the floating-point alternatives.
+This approach reduces the search space by partitioning: first consider only integral operations, and then only floating-point operations.
+
+
+\section{Type Unification}
+
+Type unification is the algorithm that assigns a value to each (free) type parameter such that the types of the provided arguments match the corresponding function parameter types.
+
+\CFA does not attempt to do any type \textit{inference} \see{\VRef{s:IntoTypeInferencing}}: it has no anonymous functions (\ie lambdas, commonly found in functional programming and also used in \CC and Java), and the variable types must all be explicitly defined (no auto typing).
+This restriction makes the unification problem more tractable in \CFA, as the argument types at each call site are usually all specified.
+There is a single exception: when the function return type contains a free type variable that does not occur in any of the argument types, the variable's binding is determined by the parent expression.
+One such example is the \CFA wrapper for @malloc@, which infers the allocation size from the deduced return type.
+\begin{cfa}
+forall( T ) T * malloc() {
+	return (T *)malloc( sizeof(T) ); // calls C malloc with the size inferred from context
+}
+int * ip = malloc(); // T bound to int from the initialization context
+\end{cfa}
+A top-level expression whose type still contains an unbound type variable is considered ill-formed, as such an expression is inherently ambiguous.
+
+The unification algorithm in \CFA was originally presented in Richard Bilson's thesis and remains the basis of the algorithm currently in use.
+Some additions have been made to accommodate the type features added to the language since.
+To summarize, \CFA type unification has two minor variants: an \textit{exact} mode and an \textit{inexact} mode.
+The inexact mode is applied at top-level argument-parameter matching, and attempts to find an assignment to the type variables such that the argument types can be converted to the parameter types with minimal cost, as defined in the previous section.
+The exact mode is required because the matching algorithm operates recursively and inner types often have to match exactly; for example, there is no common type for the pointer types @int *@ and @long *@, while there is for @int@ and @long@.
+With the introduction of generic record types, their type parameters must match exactly as well; currently, neither covariance nor contravariance is supported for generics.
+
+One simplification was made to the \CFA language that makes modelling the type system easier: polymorphic function-pointer types are no longer allowed in declarations.
+Polymorphic function declarations themselves are still treated as function-pointer types internally; however, the change means that formal parameter types can no longer be polymorphic.
+Previously, it was possible to write function prototypes such as
+\begin{cfa}
+void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
+\end{cfa}
+Notably, the unification algorithm implemented in the \CFA compiler never managed to track the assertion parameters on such formal types at all, and the problem of determining whether two assertion sets are compatible may well be undecidable in general, given the ability to synthesize more complicated types by nesting generics.
+Ultimately, the reason for disallowing such constructs is that they mostly do not provide useful type features for actual programming tasks.
+A subroutine operates on the arguments provided at the call site together with any local and global variables, and even though the subroutine can be polymorphic, the types are fixed at each call site.
+On each invocation, the types to operate on can be determined from the arguments provided, so there should be no need to pass a polymorphic function pointer, which can accept any type in principle.
+
+For example, consider a polymorphic function that takes one argument of type @T@ and another pointer to a subroutine
+\begin{cfa}
+forall (T) void f (T x, forall (U) void (*g) (U));
+\end{cfa}
+Making @g@ polymorphic in this context is almost certainly unnecessary, since it can only be called inside the body of @f@, where the types of its arguments are already known, although they can potentially depend on @T@.
+Moreover, requiring a function parameter to work on any input type at all imposes too strong a constraint on the arguments, as it only needs to make each call inside the body of @f@ valid.
+
+Rewriting the prototype to
+\begin{cfa}
+forall (T) void f (T x, void (*g) (T));
+\end{cfa}
+will be sufficient (or potentially, some compound type synthesized from @T@), in which case @g@ is no longer polymorphic itself.
+This ``monomorphization'' conversion is readily supported in \CFA, either by explicitly assigning a polymorphic function name to a compatible function-pointer type, or implicitly when deducing assertion parameters (discussed in the next section).
+The same technique applies directly to argument passing, which is essentially just assignment to function parameter variables.
+There may be edge cases where the supplied subroutine @g@ is called on arguments of different types inside the body of @f@ and hence must be declared polymorphic, but such use cases are rare and the benefit of allowing such constructs appears minimal in practice.
+
+The result of this change is that the unification algorithm no longer needs to distinguish open and closed type variables, as the latter can no longer exist.
+The only type variables that need to be handled are those introduced by the @forall@ clause of the function prototype.
+The subtype relationship between function types is also rendered redundant, since no function parameter or return type can be polymorphic, and no basic type or monomorphic function type is a subtype of any other type.
+Therefore, the goal of (exact) type unification simply becomes finding a substitution that produces identical types.
+The assertion set that needs to be resolved is always just the declarations on the function prototype, which also simplifies the assertion-satisfaction algorithm somewhat, as discussed further in the next section.
+
+The type unification results are stored in a type-environment data structure, which represents all the type variables currently in scope as equivalence classes, together with their bound types and some extra information, such as whether the bound type is allowed to be opaque (\ie a forward declaration without a definition in scope) and whether the bindings are allowed to be widened.
+In the more general approach commonly used in functional languages, a unification variable is given a lower bound and an upper bound to account for covariance and contravariance of types.
+\CFA currently does not implement any variance for its generic types and does not allow polymorphic function types, so no explicit upper bound is needed and a single binding value for each equivalence class suffices.
+However, since type conversions are allowed in \CFA, the type environment needs to keep track of which type variables permit conversions.
+This behavior is notably different from \CC template-argument deduction, which enforces an exact match everywhere unless the template arguments are explicitly given.
+For example, a polymorphic maximum function in \CFA can be called with arguments of different arithmetic types, and the result follows the usual arithmetic conversion rules, while such an expression is not allowed in \CC:
+\begin{cfa}
+forall (T | {int ?<? (T, T); }) T max (T, T);
+
+max (42, 3.14); // OK, T=double; requires explicit type annotation in C++ such as max<double>(42, 3.14);
+\end{cfa}
+
+From a theoretical point of view, the simplified implementation of the type environment has its shortcomings: some cases do not work well with it, and compromises had to be made.
+A more detailed discussion follows at the end of this chapter.
+
+
+\section{Satisfaction of Assertions}
+
+The assertion satisfaction problem greatly increases the complexity of \CFA expression resolution.
+Past experiments have shown that the majority of time is spent resolving the assertions of the expressions that take the longest to resolve.
+Even though a few heuristic optimizations have been introduced to the compiler, this remains the most costly part of compiling a \CFA program.
+The major difficulty in resolving assertions is that the problem can become recursive, since the expression used to satisfy an outstanding assertion can have its own assertions, and in theory this can go on indefinitely.
+Detecting infinite recursion in general is not algorithmically possible and is not attempted in the compiler.
+Instead, a fixed maximum depth of recursive assertions is imposed.
+\CC compilers take the same approach, as template-argument deduction is similarly undecidable in general.
+
+
+In previous versions of \CFA this limit was set at 4; as the compiler has become more optimized and capable of handling more complex expressions in a reasonable amount of time, I have increased the limit to 8, and on most occasions it does not cause trouble.
+Very rarely, an infinite recursion produces an exponentially growing assertion set, wasting minutes before the limit is reached.
+Fortunately, it is very hard to run into this situation with realistic \CFA code, and the cases that were found all share clear characteristics, which can be prevented by some clever tricks.
+In fact, some of the performance optimizations come from analyzing these problematic cases.
+One such example is presented later in this section.
+
+While the assertion satisfaction problem in isolation looks like just another expression to resolve, the recursive nature makes some techniques applicable to assertion-free expression resolution no longer possible.
+The most significant impact is that type unification has a side effect, namely editing the type environment (equivalence classes and bindings), which means that if one expression has multiple associated assertions, they are not independent: the changes to the type environment must be compatible for all the assertions to be resolved.
+In particular, if one assertion parameter can be resolved in multiple ways, all of the results need to be checked to make sure the changes to the type-variable bindings are compatible with the other assertions.
+A naive algorithm that simply picks any pending assertion and continues in a depth-first search can be very inefficient and is especially prone to falling into an infinite loop, while in many cases this can be avoided by examining other assertions that provide insight into the desired type binding: if an assertion parameter can only be matched by a unique option, the type bindings can be updated confidently without backtracking.
+
+
+The algorithm currently used in the \CFA compiler was designed by Aaron Moss through a simplified prototype experiment that captures most \CFA type-system features, and then ported back to the actual language.
+It can be described as a mix of breadth- and depth-first search in a staged approach.
+
+
+To resolve a set of assertions, the algorithm first attempts to resolve each assertion item individually.
+There are three possible outcomes on resolving each assertion:
+\begin{enumerate}
+\item If no matches are found, the algorithm terminates with a failure.
+\item If exactly one match is found, the type environment is updated immediately, and used in resolving any remaining assertions.
+\item If multiple matches are found, the assertion candidates with their updated type environments are stored in a list that will be checked for compatibility at the end.
+\end{enumerate}
+When all assertion items are resolved successfully, the algorithm attempts to combine the ambiguously resolved assertions to produce mutually compatible assignments.
+If any new assertions are introduced by the selected candidates, the algorithm is applied recursively, until none remain pending resolution or the recursion limit is reached, which results in a failure.
+
+It has been discovered in practice that the efficiency of this algorithm can be very sensitive to the order in which assertions are resolved.
+Suppose an unbound type variable @T@ appears in two assertions: one can be uniquely resolved, allowing @T@ to be inferred immediately, while the other can be resolved in many ways, each binding @T@ to a different concrete type.
+If the algorithm examines the first assertion first, the deduced type can then be used in resolving the second assertion, eliminating many incorrect options without producing a list of candidates pending further checks.
+In practice, this has a significant impact when an unbound type @T@ is declared to satisfy the basic \emph{object assertions}\footnote{The term is borrowed from object-oriented languages, although \CFA is not object-oriented in principle.} of having a default constructor, destructor, and copy-assignment operations.
+Since these are defined for every type in scope, there are often hundreds or even thousands of matches to these functions with an unspecified operand type; in most cases the value of @T@ can be deduced by resolving another assertion first, which then allows the specific object-lifetime functions to be looked up directly (they are indexed internally by operand type), greatly reducing the number of wasted resolution attempts.
+
+This ordering issue also limits the capability of the assertion-resolution algorithm.
+Assertion matching is implemented more restrictively than expression resolution in general, in that the parameter types must match exactly rather than merely be callable.
+If a function declaration includes an assertion @void f(T)@ and only @f(long)@ is currently in scope, trying to resolve the assertion with @T = int@ does not work.
+Loosening the assertion-matching requirement causes many assertion variables to have multiple matches and makes the delayed combination step too costly.
+
+Given all the issues caused by assertion resolution potentially creating new type-variable bindings, a natural progression is to restrict free type variables such that all type variables are bound by the time an expression reaches the assertion-resolution stage.
+A type variable introduced by the @forall@ clause of a function declaration can appear in the parameter types, the return type, and assertion variables.
+If it appears in the parameter types, it is bound when matching the arguments to parameters at the call site.
+If it appears only in the return type, it can in principle be determined from the context.
+The current \CFA compiler does not perform enough return-type deduction because it resolves assertions eagerly, and the return-type information cannot generally be known before the parent expression is resolved, unless the expression is in an initialization context, where the type of the variable being initialized is known.
+Delaying assertion resolution until the return type becomes known circumvents this problem.
+The truly problematic case occurs when a type variable appears only in assertion variables and in neither the parameter nor the return types.
+Such cases are very rare, and it is not evident that forcing every type variable to appear at least once in the parameter or return types significantly limits the expressiveness of the \CFA type system.
+In the next chapter, I discuss a proposal to include type declarations in traits rather than having all type variables appear in the trait parameter list, which can provide functionality equivalent to an unbound type parameter in assertion variables, and also addresses some of the variable-cost issues discussed in section 4.1.
+
+
+\subsection*{Caching Assertion Resolution Results}
+
+In Aaron Moss's prototype design and experiments, a potential optimization of caching the results of already-resolved assertions is discussed.
+Based on the experimental results, this approach can improve the performance of expression resolution in general, and sometimes allows hard instances of the assertion resolution problem to be solved that are otherwise infeasible, for example when resolution would encounter an infinite loop.
+
+What makes this approach tricky to implement correctly is that the resolution algorithm has side effects, namely modifying the type bindings in the environment.
+To cache results that modify type bindings, the changes to the bindings must be stored as well, and when multiple candidates can satisfy one assertion parameter, all of them need to be cached, including those not eventually selected, since the side effects can produce different results depending on the context.
+
+In the original design of \CFA, which includes unrestricted polymorphic formal parameters that can themselves have assertions, the problem is even more complicated, as new declarations can be introduced into scope during expression resolution.
+Here is one such example, taken from Bilson:
+\begin{cfa}
+void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
+forall( U, V | { U -?( U ); V -?( V ); } ) U g( U, V );
+f( g );
+\end{cfa}
+The inner assertion parameter on the \textit{closed} type variable @T@ is used to satisfy the assertions on @U@ and @V@ in this example.
+
+However, as discussed previously, polymorphic function-pointer types have been removed from \CFA, since correctly implementing assertion matching for them is not possible in general.
+Without closed parameters (and therefore no have-set for assertions), the set of declarations in scope remains unchanged while resolving any expression.
+The current \CFA implementation also does not attempt to widen an already-bound type parameter to satisfy an assertion.
+Note that this restriction means certain kinds of expressions cannot be resolved, for example:
+\begin{cfa}
+forall (T | {void f(T);}) void g(T);
+void f (long);
+g(42);
+\end{cfa}
+The call @g(42)@ is rejected since no attempt is made to widen the parameter type @T@ from @int@ to @long@.
+This problem could be mitigated by allowing inexact matching of assertions, but it cannot be eliminated completely when @T@ is matched inside a parameterized type, including pointers and references:
+\begin{cfa}
+forall (T | {void f(T*);}) void g(T);
+void f (long *);
+g(42);
+\end{cfa}
+Here the only way to resolve the call @g(42)@ is to allow assertion resolution to widen the parameter @T@, since even with inexact matching, @int *@ cannot be converted to @long *@.
+
+
+\section{Compiler Implementation Considerations}
+\CFA is still an experimental language, and there is no formal specification of its expression resolution rules yet.
+Presently there is also only one reference implementation, the cfa-cc translator, which is under active development, and the specific behavior of the implementation can change frequently as new features are added.
+
+Ideally, the goal of expression resolution involving polymorphic functions would be to find the set of type-variable assignments such that the global conversion cost is minimal and all assertion variables can be satisfied.
+Unfortunately, with the many complications involving implicit conversions and assertion variables, fully achieving this goal is not realistic, and the \CFA compiler is deliberately not designed to accommodate every edge case.
+Instead, it makes a few restrictions to simplify the algorithm so that most expressions encountered in actual code can still pass type checking within a reasonable amount of time.
+
+As previously mentioned, \CFA polymorphic type resolution is based on a modified version of the unification algorithm, where both equivalence (exact) and subtyping (inexact) relations are considered. However, the implementation of the type environment is simplified: it only stores a tentative type binding, with a flag indicating whether \textit{widening} is possible for an equivalence class of type variables.
+Formally speaking, this means the type environment used in the \CFA compiler is only capable of representing \textit{lower-bound} constraints.
+This simplification still works well most of the time, given the following properties of the existing \CFA type system and the resolution algorithms in use:
+
+\begin{enumerate}
+	\item Type resolution almost exclusively proceeds in bottom-up order, which naturally produces lower-bound constraints. Since all identifiers can be overloaded in \CFA, little definite information can be gained top-down. In principle, it would be possible to detect non-overloaded function names and perform top-down resolution for them; however, the prototype experiments showed that such an optimization does not give a meaningful performance benefit, and therefore it is not implemented.
+	\item Few nontrivial subtyping relationships are present in \CFA, namely the arithmetic types presented in \VRef[Figure]{f:CFACurrArithmeticConversions} and qualified pointer/reference types. In particular, \CFA lacks the nominal inheritance subtyping present in object-oriented languages, and generic types do not support covariance on their type parameters. As a result, named types such as structs are always matched by strict equivalence, including any type parameters they may have.
+	\item Unlike in functional programming, where subtyping between arrow types exists, \ie if $U_1 <: T_1$ and $T_2 <: U_2$ then $T_1 \rightarrow T_2 <: U_1 \rightarrow U_2$, \CFA uses C function-pointer types, and the parameter/return types must match exactly to be compatible.
+\end{enumerate}
+
+\CFA does attempt to incorporate type information propagated from above in the case of a variable declaration with an initializer, since the type of the variable being initialized is definitely known. However, the current type-environment representation is flawed in handling such type deduction when the return type in the initializer is polymorphic, and an inefficient workaround has to be performed in certain cases. An annotated example is included in the \CFA compiler source code:
+
+\begin{cfa}
+// If resolution is unsuccessful with a target type, try again without, since it
+// will sometimes succeed when it wouldn't with a target type binding.
+// For example:
+forall( otype T ) T & ?[]( T *, ptrdiff_t );
+const char * x = "hello world";
+int ch = x[0];
+// Fails with simple return type binding (xxx -- check this!) as follows:
+// * T is bound to int
+// * (x: const char *) is unified with int *, which fails
+\end{cfa}
+
+The problem here is that the simplified type environment can only represent the constraints $T = int$ and $int <: T$, but since the type information flows in the opposite direction, the proper constraint for this case is $T <: int$, which cannot be represented. Currently, an attempt is still made to resolve with the equality constraint generated from the initialized variable, since it often yields the correct type binding (especially when the initialized variable is a struct), and when that attempt fails, the resolution algorithm is rerun without the initialization context.
+
+One additional remark: \CFA does not provide a mechanism to explicitly specify values for polymorphic type parameters. In \CC, for example, users may specify template arguments in angle brackets, which is useful when automatic deduction fails, \eg @max<double>(42, 3.14)@.
+There are partial workarounds, such as adding casts to the arguments, but they are not guaranteed to work in all cases. If a type parameter appears in the function return type, however, an ascription (return) cast forces the desired type binding, since the cast only compiles when the expression type matches the target exactly.
+
+\section{Related Work}
+
+
+
