Context Navigation

← Previous Changeset
Next Changeset →

Changeset 7d405eb

Timestamp:

Apr 7, 2025, 9:22:23 PM (8 months ago)

Author:

Peter A. Buhr <pabuhr@…>

Branches:

Children:

Parents:

Message:

more proofreading, change file names

Location:

doc/theses/fangren_yu_MMath

Files:

: 5 edited

conclusion.tex (modified) (1 diff)
future.tex (modified) (1 diff)
intro.tex (modified) (18 diffs)
resolution.tex (modified) (20 diffs)
uw-ethesis.tex (modified) (2 diffs)

Legend:

: Unmodified
: Added
: Removed

doc/theses/fangren_yu_MMath/conclusion.tex

r0393fda8	r7d405eb
3	3	The goal of this thesis is to ...
4	4
5		~~\section{Future Work}~~

doc/theses/fangren_yu_MMath/future.tex

r0393fda8	r7d405eb
5	5	\section{Closed trait types}
6	6
7		\CFA as it currently is does not have any closed types, as new functions can be added at any time. It is also possible to locally declare a function,\footnote{Local functions are not a standard feature in C but supported by mainstream C compilers such as gcc, and allowed in \CFA too.} or a function pointer variable to make a type satisfy a certain trait temporarily and be used as such in this limited scope. However, the lack of closed types in such a "duck typing" scheme proposes two problems. For library implementors, it is common to not want the defined set of operations to be overwritten and cause the behavior of polymorphic invocations to change. For the compiler, it means caching and reusing the result of resolution is not reliable as newly introduced declarations can participate in assertion resolution, making a previously invalid expression valid, or the other way around by introducing ambiguity in assertions. Sometimes those interfaces are fairly complicated, for example the I/O library traits \textbf{istream} and \textbf{ostream} each has over 20 operations. Without the ability to store and reuse assertion resolution results, each time the compiler encounters an I/O operation in the source code (mainly the pipe operator \textbf{?\|?} used to represent stream operations in \CFA) it has to resolve the same set of assertions again, causing a lot of repetitive work. Previous experiments have shown that the I/O assertions often account for over half of the number of assertions resolved in a \CFA translation unit. Introducing a way to eliminate the need of doing such repetitive assertion resolutions that are very unlikely to change by new overloads can therefore provide significant improvement to the performance of the compiler.
	7	\CFA as it currently is does not have any closed types, as new functions can be added at any time. It is also possible to locally declare a function,\footnote{Local functions are not a standard feature in C but supported by mainstream C compilers such as gcc, and allowed in \CFA too.} or a function pointer variable to make a type satisfy a certain trait temporarily and be used as such in this limited scope. However, the lack of closed types in such a "duck typing" scheme proposes two problems. For library implementors, it is common to not want the defined set of operations to be overwritten and cause the behaviour of polymorphic invocations to change. For the compiler, it means caching and reusing the result of resolution is not reliable as newly introduced declarations can participate in assertion resolution, making a previously invalid expression valid, or the other way around by introducing ambiguity in assertions. Sometimes those interfaces are fairly complicated, for example the I/O library traits \textbf{istream} and \textbf{ostream} each has over 20 operations. Without the ability to store and reuse assertion resolution results, each time the compiler encounters an I/O operation in the source code (mainly the pipe operator \textbf{?\|?} used to represent stream operations in \CFA) it has to resolve the same set of assertions again, causing a lot of repetitive work. Previous experiments have shown that the I/O assertions often account for over half of the number of assertions resolved in a \CFA translation unit. Introducing a way to eliminate the need of doing such repetitive assertion resolutions that are very unlikely to change by new overloads can therefore provide significant improvement to the performance of the compiler.
8	8
9	9	The output stream trait in \CFA looks like this:

doc/theses/fangren_yu_MMath/intro.tex

-              r0393fda8
+              r7d405eb
 \chapter{Introduction}
 This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically, the \newterm{type-resolver} used to satisfy call-site assertions among overloaded variable and function names to allow polymorphic routine calls.
+This thesis is exploratory work I did to understand, fix, and extend the \CFA type-system, specifically, the \newterm{type-resolver} used to satisfy call-site assertions among overloaded variable and function names to allow polymorphic function calls.
 Assertions are the operations a function uses within its body to perform its computation.
 For example, a polymorphic function summing an array needs a size, zero, assignment, and plus for the array element-type, and a subscript operation for the array type.
 …
 T sum( T a[$\,$], size_t size ) {
         @T@ total = { @0@ };  $\C[1.75in]{// size, 0 for type T}$
         for ( size_t i = 0; i < size; i += 1 )
+        for ( i; size )
                 total @+=@ a@[@i@]@; $\C{// + and subscript for T}\CRT$
         return total;
 …
 Experience from \CC and \CFA developers shows the type system can implicitly and correctly disambiguates the majority of overloaded names, \ie it is rare to get an incorrect selection or ambiguity, even among hundreds of overloaded (variables and) functions.
 In many cases, a programmer is unaware of name clashes, as they are silently resolved, simplifying the development process.
+Disambiguating among overloads is implemented by examining each call site and selecting the best matching overloaded function based on criteria like the types and number of arguments and the return context.
+Since the hardware does not support mixed-mode operands, @2 + 3.5@, the type system must disallow it or (safely) convert the operands to a common type.
+Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
+This approach matches with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
+Depending on the language, mix-mode conversions can be explicitly controlled using a cast.
 \newterm{Namespace pollution} refers to loading the global or other namespaces with many names, resulting in paranoia that the compiler could make wrong choices for overloaded names causing failure.
 …
 \newterm{General overloading} occurs when the type-system \emph{knows} a function's parameters and return types (or a variable's type for variable overloading).
 In functional programming-languages, there is always a return type (except for a monad).
 If a return type is specified, the compiler does not have to inference the routine body.
+If a return type is specified, the compiler does not have to inference the function body.
 For example, the compiler has complete knowledge about builtin types and their overloaded arithmetic operators.
 In this context, there is a fixed set of overloads for a given name that are completely specified.
 …
 \begin{comment}
 \begin{lstlisting}[language=Haskell]
+\begin{haskell}
 f( int, int );  f( int, float ); -- return types to be determined
 g( int, int );  g( float, int );
 let x = curry f( 3, _ ); -- which f
 let y = curry g( _ , 3 ); -- which g
 \end{lstlisting}
+\end{haskell}
 For the currying to succeed, there cannot be overloaded function names resulting in ambiguities.
 To allow currying to succeed requires an implicit disambiguating mechanism, \ie a kind of transfer function.
 …
 A type-class instance binds a specific type to the generic operations to form concrete instances, giving a name type-class.
 Then Qualified types concisely express the operations required to convert an overloaded
 The name type-class is used as a transfer function to convert an overloaded routine into a polymorphic routine that is uniquely qualified with the  name type-class.
+The name type-class is used as a transfer function to convert an overloaded function into a polymorphic function that is uniquely qualified with the  name type-class.
 \begin{cfa}
 void foo_int_trait( special int trait for operations in this foo );
 …
 > I'm making this up. The Haskell type-class is a trait, like an interface or
 > abstract class, and its usage/declaration/binding creates a specific trait
 > instance for bound types, which is a vtable filled with the typed routines
+> instance for bound types, which is a vtable filled with the typed functions
 > instantiated/located for the trait. The vtables are present at runtime and
 > passed implicitly to ad-hoc polymorphic routines allowing differentiate of
+> passed implicitly to ad-hoc polymorphic functions allowing differentiate of
 > overloaded functions based on the number of traits and their specialization.
 > (Major geek talk, YA! 8-)
 …
 Virtually all programming languages provide general overloading of the arithmetic operators across the basic computational types using the number and type of parameters and returns.
 However, in some languages, arithmetic operators may not be first class, and hence, cannot be overloaded.
 Like \CC, \CFA does allow general operator overloading for user-defined types.
+Like \CC, \CFA allows general operator overloading for user-defined types.
 The \CFA syntax for operator names uses the @'?'@ character to denote a parameter, \eg left and right unary operators: @?++@ and @++?@, and binary operators @?+?@ and @?<=?@.
 Here, a user-defined type is extended with an addition operation with the same syntax as a builtin type.
 …
 s1 = @?+?@( s1, s2 );   $\C{// direct call}\CRT$
 \end{cfa}
+The type system examines each call site and selects the best matching overloaded function based on the number and types of arguments.
+If there are mixed-mode operands, @2 + 3.5@, the type system attempts (safe) conversions, like in C/\CC, converting the argument type(s) to the parameter type(s).
+Conversions are necessary because the hardware rarely supports mix-mode operations, so both operands must be converted to a common type.
+Like overloading, the majority of mixed-mode conversions are silently resolved, simplifying the development process.
+This approach matches with programmer intuition and expectation, regardless of any \emph{safety} issues resulting from converted values.
+Depending on the language, mix-mode conversions can be explicitly controlled using some form of cast.
+\CFA excludes overloading the comma operator, short-circuit logical operators \lstinline{&&} and \lstinline{||}, and ternary conditional operator \lstinline{?} (\eg @max = x > y ? x : y@), all of which are short-hand control structures rather than operators.
+\CC \emph{does} overload the comma and short-circuit logical operators, where overloading the comma operator is esoteric and rarely used, and the short-circuit operators do not exhibit short-circuit semantics, both of which are confusing and/or inconsistent.
 …
 auto x = @...@
 \end{cfa}
 and the need to write a routine to compute using @x@
+and the need to write a function to compute using @x@
 \begin{cfa}
 void rtn( @type of x@ parm );
 …
 void qsort( void * base, size_t nmemb, size_t size, int (*cmp)( const void *, const void * ) );
 \end{cfa}
 Here, the polymorphism is type-erasure, and the parametric assertion is the comparison routine, which is explicitly passed.
+Here, the polymorphism is type-erasure, and the parametric assertion is the comparison function, which is explicitly passed.
 \begin{cfa}
 enum { N = 5 };
 …
 \subsection{Traits}
+\label{s:Traits}
 \CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
 …
 \begin{tabular}{@{}l|@{\hspace{10pt}}l@{}}
 \begin{cfa}
 trait @sumable@( T ) {
+forall( T ) trait @sumable@ {
         void @?{}@( T &, zero_t ); // 0 literal constructor
         T ?+?( T, T );           // assortment of additions
         T @?+=?@( T &, T );
+        T @?+=?@( T &, T );              // assortment of additions
+        T ?+?( T, T );
         T ++?( T & );
         T ?++( T & );
 …
+&
 \begin{cfa}
 forall( T @| sumable( T )@ ) // use trait
+forall( T @| sumable( T )@ )   // use trait
 T sum( T a[$\,$], size_t size ) {
         @T@ total = { @0@ };  // initialize by 0 constructor
         for ( size_t i = 0; i < size; i += 1 )
                 total @+=@ a[i]; // select appropriate +
+        @T@ total = 0;          // initialize by 0 constructor
+        for ( i; size )
+                total @+=@ a[i];    // select appropriate +
         return total;
+}
 …
 };
 \end{cfa}
 These implicit routines are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
+These implicit functions are used by the @sumable@ operator @?+=?@ for the right side of @?+=?@ and return.
 If the array type is not a builtin type, an extra type parameter and assertions are required, like subscripting.
 …
 The difference between fixed and dynamic is the complexity and cost of field access.
 For fixed, field offsets are computed (known) at compile time and embedded as displacements in instructions.
 For dynamic, field offsets are compile-time computed at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic routine.
+For dynamic, field offsets are compile-time computed at the call site, stored in an array of offset values, passed as a polymorphic parameter, and added to the structure address for each field dereference within a polymorphic function.
 See~\cite[\S~3.2]{Moss19} for complete implementation details.
 …
 general\footnote{overloadable entities: V $\Rightarrow$ variable, O $\Rightarrow$ operator, F $\Rightarrow$ function, M $\Rightarrow$ member}
                                                 & O\footnote{except assignment}/F       & O/F/M & V/O/F & M\footnote{not universal}     & O/M   & O/F/M & no    & no    \\
+general constraints\footnote{parameter \# $\Rightarrow$ number, T $\Rightarrow$ type, N $\Rightarrow$ name; R $\Rightarrow$ return type}
+                                                & \#/T/R        & \#/T  & \#/T/R        & \#/T  & \#/T/N/R      & \#/T/N/R      & \#/T/N        & T/R \\
+type parameters                 & no            & yes   & yes           & yes   & yes           & yes           & yes   & yes \\
+type constraint\footnote{T $\Rightarrow$ trait/concept, B $\Rightarrow$ type bounds, TC $\Rightarrow$ type class}
+                                                & no            & T             & T                     & T             & B                     & T                     & T     & TC \\
+general constraints\footnote{T $\Rightarrow$ parameter type, \# $\Rightarrow$ parameter number, N $\Rightarrow$ parameter name; R $\Rightarrow$ return type}
+                                                & T/\#/N/R      & T/\#  & T/\#/R        & T/\#  & T/\#/N/R      & T/\#/N/R      & T/\#/N        & T/R \\
+univ. type constraints\footnote{A $\Rightarrow$ concept, T $\Rightarrow$ interface/trait/type-class, B $\Rightarrow$ type bounds}
+                                                & no\footnote{generic cannot be overloaded}             & C             & T                     & B             & B                     & T                     & T                     & T \\
 Safe/Unsafe arg. conv.  & no            & S/U\footnote{no conversions allowed during template parameter deduction}      & S/U
                                                 & S\footnote{unsafe (narrowing) conversion only allowed in assignment or initialization to a primitive (builtin) type}  & S
 …
 \end{tabular}
 \end{cquote}
 The second row classifies the specialization mechanisms used to distinguish among the general overload capabilities.
 \begin{cquote}
 \begin{tabular}{@{}llll@{}}
+\multicolumn{1}{c}{\textbf{number}} & \multicolumn{1}{c}{\textbf{type}} & \multicolumn{1}{c}{\textbf{name}} & \multicolumn{1}{c}{\textbf{return}} \\
+\begin{lstlisting}[language=swift]
+\multicolumn{1}{c}{\textbf{type}} & \multicolumn{1}{c}{\textbf{number}} & \multicolumn{1}{c}{\textbf{name}} & \multicolumn{1}{c}{\textbf{return}} \\
+\begin{swift}
+func foo( _ x : Int )
+func foo( _ x : Double )
+foo( 3 )
+foo( 3.5 )
+\end{swift}
+&
+\begin{swift}
 func foo( _ x : Int )
 func foo( _ x : Int, _ y : Int )
 foo( 3 )
 foo( 3, 3 )
+\end{lstlisting}
+&
+\begin{lstlisting}[language=swift]
+func foo( _ x : Int )
+func foo( _ x : Double )
+foo( 3 )
+foo( 3.5 )
+\end{lstlisting}
+&
+\begin{lstlisting}[language=swift]
+\end{swift}
+&
+\begin{swift}
 func foo( x : Int )
 func foo( y : Int )
 foo( x : 3 )
 foo( y : 3 );
 \end{lstlisting}
+&
 \begin{lstlisting}[language=swift]
+\end{swift}
+&
+\begin{swift}
 func foo() -> Int
 func foo() -> String
 var i : Int = foo()
 var s : String = foo();
 \end{lstlisting}
+\end{swift}
 \end{tabular}
 \end{cquote}
+The third row classifies if generic functions can be overloaded based on the number and differing type variables.
+\begin{cfa}
+forall( T ) T foo( T t );
+forall( T ) T foo( T t, T s );
+forall( T, U ) T foo( T t, U s );
+\end{cfa}
+The fourth row classifies the mechanism used to specialize the type variables to make them practical.
+\begin{cfa}
+forall( T | T ?+?( T, T ) ) T foo( T t );
+\end{cfa}
+The fifth row classifies if conversions are attempted beyond exact match.
+\begin{cfa}
+int foo( double ); // 1
+double foo( int ); // 2
+int i = foo( 3 ); // 1 : 3 => 3.0
+double d = foo( 3.5 ); // 1 : int result => double
+The third row classifies if generic functions can be overloaded based on differing type variables, number of type variables, and type-variable constraints.
+The following examples illustrates the first two overloading cases.
+\begin{swift}
+func foo<T>( a : T );  $\C[2.25in]{// foo1}$
+func foo<T>( a : T,  b : T );  $\C{// foo2}$
+func foo<T,U>( a : T,  b :  U );  $\C{// foo3}$
+foo( a : 3.5 );  $\C{// foo1}$
+foo( a : 2, b : 2 );  $\C{// foo2}$
+foo( a : 2, b : 2.5 );  $\C{// foo3}\CRT$
+\end{swift}
+Here, the overloaded calls are disambiguated using argument types and their number.
+However, the parameter operations are severely restricted because universal types have few operations.
+For example, swift provides a @print@ operation for its universal type, and the java @Object@ class provides general methods: @toString@, @hashCode@, @equals@, @finalize@, \etc.
+This restricted mechanism still supports a few useful functions, where the parameters are abstract entities, \eg:
+\begin{swift}
+func swap<T>( _ a : inout T, _ b : inout T ) { let temp : T = a;  a = b;  b = temp; }
+var ix = 4, iy = 3;
+swap( &ix, &iy );
+var fx = 4.5, fy = 3.5;
+swap( &fx, &fy );
+\end{swift}
+To make a universal function useable, an abstract description is needed for the operations used on the parameters within the function body.
+Type matching these operations can occur by discover using techniques like \CC template expansion, or explicit stating, \eg interfaces, subtyping (inheritance), assertions (traits), type classes, type bounds.
+The mechanism chosen can affect separate compilation or require runtime type information (RTTI).
+\begin{description}
+\item[concept] ~
+\begin{c++}
+concept range = requires( T& t ) {
+        ranges::begin(t); // equality-preserving for forward iterators
+        ranges::end (t);
+};
+\end{c++}
+\item[traits/type-classes] ~
+\begin{cfa}
+forall( T | { T ?+?( T, T ) } )
+\end{cfa}
+\item[inheritance/type bounds] ~
+\begin{scala}
+class CarPark[Vehicle  >: bike  <: bus](val lot : Vehicle)
+\end{scala}
+\end{description}
+\begin{figure}
+\setlength{\tabcolsep}{15pt}
+\begin{tabular}{@{}ll@{}}
+\multicolumn{1}{c}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{Haskell}} \\
+\begin{cfa}
+forall( T ) trait sumable {
+        void ?{}( T &, zero_t );
+        T ?+=?( T &, T );
+};
+forall( T | sumable( T ) )
+T sum( T a[], size_t size ) {
+        T total = 0;
+        for ( i; size ) total += a[i];
+        return total;
+}
+struct S { int i, j; };
+void ?{}( S & s, zero_t ) { s.[i, j] = 0; }
+void ?{}( S & s ) { s{0}; }
+void ?{}( S & s, int i, int j ) { s.[i, j] = [i, j]; }
+S ?+=?( S & l, S r ) { l.[i, j] += r.[i, j]; }
+int main() {
+        int ia[] = { 1, 2, 3 };
+        sout | sum( ia, 3 );        // trait inference
+        double da[] = { 1.5, 2.5, 3.5 };
+        sout | sum( da, 3 );        // trait inference
+        S sa[] = { {1, 1}, {2, 2}, {3, 3 } };
+        sout | sum( sa, 3 ).[i, j]; // trait inference
+}
+\end{cfa}
+&
+\begin{haskell}
+class Sumable a where
+        szero :: a
+        sadd :: a -> a -> a
+ssum ::  Sumable a $=>$ [a] -> a
+ssum (x:xs) = sadd x (ssum xs)
+ssum [] = szero
+data S = S Int Int deriving Show
+@instance Sumable Int@ where
+        szero = 0
+        sadd = (+)
+@instance Sumable Float@ where
+        szero = 0.0
+   sadd = (+)
+@instance Sumable S@ where
+        szero = S 0 0
+        sadd (S l1 l2) (S r1 r2) = S (l1 + r1) (l2 + r2)
+main = do
+        print ( ssum [1::Int, 2, 3] )
+        print ( ssum [1.5::Float, 2.5, 3.5] )
+        print ( ssum [(S 1 1), (S 2 2), (S 3 3)] )
+\end{haskell}
+\end{tabular}
+\caption{Implicitly/Explicitly Trait Inferencing}
+\label{f:ImplicitlyExplicitlyTraitInferencing}
+\end{figure}
+One differentiating feature among these specialization techniques is the ability to implicitly or explicitly infer the trait information at a class site.
+\VRef[Figure]{f:ImplicitlyExplicitlyTraitInferencing} compares the @sumable@ trait and polymorphic @sum@ function \see{\VRef{s:Traits}} for \CFA and Haskell.
+Here, the \CFA type system inferences the trait functions at each call site, so no additional specification is necessary by the programmer.
+The Haskell program requires the programmer to explicitly bind the trait and to each type that can be summed.
+(The extra casts in the Haskell program are necessary to differentiate between the two kinds of integers and floating-point numbers.)
+Ada also requires explicit binding, creating a new function name for each generic instantiation.
+\begin{ada}
+function int_Twice is new Twice( Integer, "+" => "+" );
+function float_Twice is new Twice( Float, "+" => "+" );
+\end{ada}
+Finally, there is a belief that certain type systems cannot support general overloading, \eg Haskell.
+As \VRef[Table]{t:OverloadingFeatures} shows, there are multiple languages with both general and parametric overloading, so the decision to not support general overloading is based on the opinion of the language designers and the type system they choose, not any reason in type theory.
+The fourth row classifies if conversions are attempted beyond exact match.
+\begin{cfa}
+int foo( double ); $\C[1.75in]{// 1}$
+double foo( int ); $\C{// 2}$
+int i = foo( 3 ); $\C{// 1 : 3 to 3.0 argument conversion => all conversions are safe}$
+double d = foo( 3.5 ); $\C{// 1 : int to double result conversion => all conversions are safe}\CRT$
 \end{cfa}

doc/theses/fangren_yu_MMath/resolution.tex

-              r0393fda8
+              r7d405eb
 \label{c:content2}
 The \CFA's type-system provides expressive polymorphism: variables can be overloaded, functions can be overloaded by argument and return types, tuple types, generic (polymorphic) functions and types (aggregates) can have multiple type parameters with assertion restrictions;
+Recapping, the \CFA's type-system provides expressive polymorphism: variables can be overloaded, functions can be overloaded by argument and return types, tuple types, generic (polymorphic) functions and types (aggregates) can have multiple type parameters with assertion restrictions;
 in addition, C's multiple implicit type-conversions must be respected.
 This generality leads to internal complexity and correspondingly higher compilation cost directly related to type resolution.
 The reason is that the type resolver must analyze \emph{each} component of an expression with many possible forms of overloading, type restrictions, and conversions.
 Designing a ruleset that behaves as expected, \ie matches C programmer expectations, and implementing an efficient algorithm is a very challenging task.
 My first work on the \CFA type-system was as a co-op student.
+Designing a ruleset that is expressive, behaves as expected, \ie matches C programmer expectations, and can be efficiently implemented is a very challenging task.
+I first worked on the \CFA type-system as a co-op student.
 At that time, compiling a medium-sized \CFA program using advanced polymorphism took multiple minutes (5+ minutes)~\cite[\S~5]{Yu20}.
 I found a number of type-resolution problems, which resulted in the following changes to the type-resolution algorithm.
 …
 \end{enumerate}
 \VRef[Table]{t:SelectedFileByCompilerBuild} shows improvements for selected tests with accumulated reductions in compile time across each of the 5 fixes.
 The large reduction in compilation time significantly improved the development of the \CFA's runtime because of its frequent compilation cycles.
+To this day, the large reduction in compilation time significantly improves the development of the \CFA's runtime because of its frequent compilation cycles.
 \begin{table}[htb]
 …
 \section{Expression Cost-Model}
+\label{s:ExpressionCostModel}
 \CFA has been using a total-order expression cost-model to resolve ambiguity of overloaded expressions from the very beginning.
+Most \CFA operators can be overloaded in \CFA\footnote{
+\CFA excludes overloading the comma operator, short-circuit logical operators \lstinline{&&} and \lstinline{||}, and ternary conditional operator \lstinline{?}, all of which are short-hand control structures rather than operators.
+\CC overloads the comma and short-circuit logical operators, where the overloaded comma is esoteric short-circuit operators do not exhibit short-circuit semantics, which is confusing.};
+Most \CFA operators can be overloaded in \CFA;
 hence, they are treated the same way as other function calls with the same rules for overload resolution.
 In \CFA, candidates of an overloaded expression are ranked by numerical cost elements, which account any type conversions needed from a call-site argument type to the matching function parameter type, as well as polymorphic types and restrictions introduced by the @forall@ clause.
 Currently, the expression cost has the following components, ranked from highest to lowest.
 \begin{enumerate}
+In \CFA, candidates of an overloaded expression are ranked by numerical cost elements, which accounts for any type conversions needed from a call-site argument type to the matching function parameter type, as well as polymorphic types and restrictions introduced by the @forall@ clause.
+Currently, the expression cost has the following components, ranked from highest to lowest. where lower cost is better.
+\begin{enumerate}[leftmargin=*]
 \item \textbf{Unsafe} cost representing a narrowing conversion of arithmetic types, \eg @int@ to @short@, and qualifier-dropping conversions for pointer and reference types.
+Narrowing conversions have the potential to lose (truncation) data.
+A programmer must decide if the computed data-range can safely be shorted in the smaller storage.
+Warnings for unsafe conversions are helpful.
+\begin{cfa}
+void f( short p );
+f( 3L ); $\C[2.5in]{// unsafe conversion, possible warning}\CRT$
+\end{cfa}
 \item \textbf{Polymorphic} cost for a function parameter type that is or contains a polymorphic type variable.
+The fewer polymorphic parameters means a more specific function, which may be able to compute faster or more accurately, like specialization templates in \CC.
+\begin{cfa}
+forall( T ) f( T p, int i ); $\C[2.5in]{// 1}$
+forall( T ) f( T p, T i );   $\C{// 2}$
+f( 3, 4 ); $\C{// 1}\CRT$
+\end{cfa}
 \item \textbf{Safe} cost representing a widening conversion \eg @short@ to @int@, qualifier-adding conversions for pointer and reference types, and value conversion for enumeration constants.
+Even when conversions are safe, the fewest conversions it ranked better, \eg @short@ to @int@ versus @short@ to @long int@.
+\begin{cfa}
+void f( long int p ); $\C[2.5in]{// 1}$
+void f( double p );   $\C{// 2}$
+f( 3h ); $\C{// 1, short constant}\CRT$
+\end{cfa}
 \item \textbf{Variable} cost that counts the number of polymorphic variables, if any, introduced by the @forall@ clause in the function declaration.
+Fewer polymorphic variables implies more specificity.
+\begin{cfa}
+forall( T, T ) f( T p1, T p2 ); $\C[2.5in]{// 1}$
+forall( T, U ) f( T p1, U p2 ); $\C{// 2}$
+f( 3, 4 ); $\C{// 1}$
+f( 3, 4.5 ); $\C{// 2}\CRT$
+\end{cfa}
 \item \textbf{Specialization} cost counting the number of restrictions introduced by type assertions.
+Fewer restriction means fews parametric variables passed at the function call giving better performance.
+\begin{cfa}
+forall( T | { T ?+?( T, T ) } ) void f( T ); $\C[3.25in]{// 1}$
+forall( T | { T ?+?( T, T ), void f( T, T ) } ); $\C{// 2}$
+f( 42 ); $\C{// 1}\CRT$
+\end{cfa}
 \end{enumerate}
 The comparison of cost tuple is by lexicographical order, from unsafe (highest) to specialization (lowest), with ties moving to the next lowest item.
+Cost tuples are compared by lexicographical order, from unsafe (highest) to specialization (lowest), with ties moving to the next lowest item.
 At a subexpression level, the lowest cost candidate for each result type is included as a possible interpretation of the expression;
 at the top level, all possible interpretations of different types are considered (generating a total ordering) and the overall lowest cost is selected as the final interpretation of the expression.
 …
 \end{comment}
 In many languages that support function/method overloading, such as \CC and Java, a partial-order system decides which interpretation of the expression gets selected.
 Hence, sometimes the candidates are incomparable (none are considered a better match), and only when one candidate is considered better than all others (maximal with respect to the partial order) is the expression unambiguous.
 Specifically, the resolution algorithms used in \CC and Java are greedy, selecting the best match for each sub-expression without considering the higher-level ones (bottom-up).
+In many languages that support function/method overloading, such as \CC and Java, a partial-order system decides which interpretation of the expression is selected.
+Hence, sometimes the candidates are incomparable (none are considered a best match), and only when one candidate is considered better than all others (maximal with respect to the partial order) is the expression unambiguous.
+Specifically, the resolution algorithms used in \CC and Java are greedy, selecting the best match for each subexpression without considering the higher-level ones (bottom-up).
 Therefore, at each resolution step, the arguments are already given unique interpretations, so the ordering only needs to compare different sets of conversion targets (function parameter types) on the same set of input.
 \begin{cfa}
 @generate a C++ example here@
+read more
 \end{cfa}
 …
 \begin{cfa}
 @generate a CFA example here@
+read more
 \end{cfa}
 so it is possible that both the selected function and the set of arguments are different, and cannot be compared with a partial-ordering system.
 …
 Ada is another programming language that has overloading based on return type.
 Although Ada also allows implicit type conversions of function arguments, it is fairly conservative on resolving ambiguities.
+Although Ada also allows implicit type conversions of function arguments, it is fairly conservative on resolving ambiguities~\cite[\S~8.6]{Ada22}.
 \begin{cfa}
 @generate an Ada example here@
+\end{cfa}
+There are only two preference rules in Ada overload resolution, one for primitive arithmetic operators and one for universal access types (analogous to @void *@ in C);
+section 8.6 the context of overload resolution
+page 468, item number 28 - 30
+\end{cfa}
+There are only two preference rules in Ada overload resolution, one for primitive arithmetic operators and one for universal access types (pointer type);
+\begin{ada}
+Function "-"( L, R : Float ) return Integer is begin
+        return Integer(L) + (-Integer(R)); --  prevent infinite recursion
+end;
+Integer i;
+i := 7 - 3; -- prefer
+i := 7.2 - 3.5
+\end{ada}
 any other cases where an expression has multiple legal interpretations are considered ambiguous.
 The current overload resolution system for \CFA is on the other end of the spectrum, as it tries to order every legal interpretations of an expression and chooses the best one according to cost, occasionally giving unexpected results.
+The current overload resolution system for \CFA is on the other end of the spectrum, as it tries to order every legal interpretations of an expression and chooses the best one according to cost, occasionally giving unexpected results rather than an ambiguity.
 Interestingly, the \CFA cost-based model can sometimes make expression resolution too permissive because it always attempts to select the lowest cost option, and only when there are multiple options tied at the lowest cost does it report the expression is ambiguous.
 The reason is that there are so many elements in the cost tuple, the type system ``tries too hard to discover a match'', and therefore, ties are uncommon.
 Other than the case of multiple exact matches, where all have cost zero, incomparable candidates under a partial ordering of being more specific can often have different expression costs since different kinds of implicit conversions are involved, resulting in seemingly arbitrary overload selections.
+Other than the case of multiple exact matches, where all have cost zero, incomparable candidates under a partial ordering can often have different expression costs since different kinds of implicit conversions are involved, resulting in seemingly arbitrary overload selections.
 There are currently at least three different situations where the polymorphic cost element of the cost model does not yield a candidate selection that is clearly justifiable, and one of them is straight up wrong.
 …
 Since polymorphic cost is ranked above safe conversion cost, option 2 is considered to have lower cost and gets selected.
 In contrast, the template deduction and overload resolution rules in \CC selects option 1 (converting @forall@ to the equivalent function template declaration).
 \CC performs template argument deduction and overload candidate ranking in two separate steps.
+In contrast, the template inferencing and overload resolution rules in \CC selects option 1 (converting @forall@ to the equivalent function template declaration).
+\CC performs template argument inferencing and overload candidate ranking in two separate steps.
 \begin{itemize}
 \item
 …
 This difference could be explained as compensating for \CFA polymorphic functions being separately compiled versus template inlining;
 hence, calling them requires passing type information and assertions increasing the runtime cost.
+We are learning that having the type system consider performance may be inappropriate.
 \item
 Having a lower total polymorphic cost does not always mean a function is more specialized.
 …
 \item
 A generic type can require more type variables to describe a more specific type.
 For example, a generic function taking a @pair@-type, requires two type variables.
+For example, consider a generic function taking a @pair@-type requires two type variables.
 \begin{cfa}
 forall( T, U ) void f( pair( T, U ) ); $\C[2.75in]{// 1}$
 …
 forall( T ) void f( T );                $\C{// 2}\CRT$
 \end{cfa}
+Passing a @pair@ variable to @f@ gives a cost of 1 poly, 2 variable for the @pair@ overload, versus a cost of 1 poly, 1 variable for the unconstrained overload.
+Programmer expectation is to select the former, but the cost model selects the latter;
+either could work.
+Passing a @pair@ variable to @f@
+\begin{cfa}
+pair p;
+f( p );
+\end{cfa}
+gives a cost of 1 poly, 2 variable for the @pair@ overload, versus a cost of 1 poly, 1 variable for the unconstrained overload.
+Programmer expectation is to select option 1 because of the exact match, but the cost model selects 2;
+while either could work, the type system should select a call that meets expectation of say the call is ambiguous, forcing the programmer to mediate.
 As a result, simply counting the number of polymorphic type variables is no longer correct to order the function candidates as being more constrained.
 \end{enumerate}
 …
 Hence, the \CC template-specialization ordering can be applied to \CFA with a slight modification.
 At the meantime, some other improvements have been made to the expression cost system.
+In the meantime, some other improvements have been made to the expression cost system.
 Most notably, the expression resolution algorithm now consistently uses the globally minimal cost interpretation, as discussed in a previous technical report.
 While implementing the change, there are also two detailed issues that need to be addressed for the new rules to fully work.
 …
 For example, @long x = abs( 42 )@ resolves to @long abs( long )@ with @int@ argument 42 converted to @long@ or @int abs( int )@ converting the result to @long@.
 In this example, the choice could be arbitrary because both yield identical results.
 However, for @int i = abs( -9223372036854775807LL )@, the result is @-1@, suggesting some kind of type error rather than silently generating a logically incorrect result.
 There are multiple overload groupings of C functions into a single \CFA name, so usage should not report an ambiguity or warning unless there is a significant chance of error.
+However, for @int i = abs( -9223372036854775807LL )@, the result is @-1@ due to the narrowing conversion from @long@ to @int@ on the assignment, suggesting at warning to the programmer to reconsider the type of @i@ or the compiler generates an error as the argument is too large for an @int@ parameter.
+The \CFA system library has multiple overload groupings of C functions into a single name, so usage should not report an ambiguity or warning unless there is a significant chance of error.
 While testing the effects of the updated cost rule, the following example was found in the \CFA standard library.
 …
 When asked, the developer expected the floating-point overload because of return-type overloading.
 This mistake went unnoticed because the truncated component was insignificant in the performance logging.
 Investigation of this example leads to the decision that the resolution algorithm favors a lower conversion cost up the expression tree when the total global cost is equal.
+To correct this mistake, I changed the resolution algorithm to favour a lower conversion cost up the expression tree when the total global cost is equal.
 Another change addresses the issue that C arithmetic expressions have unique meanings governed by its arithmetic conversion rules.
+From the C language reference manual:
 \begin{enumerate}[leftmargin=*,topsep=5pt,itemsep=4pt]
 \item
 …
 \VRef[Rule]{p:SignedToUnsignedSafe} says an unsigned type is safely convertible to an signed type with greater rank, while \VRef[rule]{p:UnsignedToSignedUnsafe} says a signed type is unsafely convertible to an unsigned type.
 Therefore, these two rules cover every possible case.
+\VRef[Rule]{p:Miscellaneous} is the catch-all to accommodate any kinds of exotic integral representations.
+These conversions are applied greedily at local points within an expression.
+\VRef[Rule]{p:Miscellaneous} is the odd-ball rule because it is really a demotion because signed to unsigned (marked in red) is unsafe.
+Finally, assignment allows demotion of any larger type into a smaller type, with a possibly lossy conversion (and often no warning).
+These promotion conversions are applied greedily at local points within an expression.
 Because there is no overloading in C, except for builtin operators, no cost model is needed to differentiate among alternatives that could result in ambiguous matches.
 …
 This is demonstrated in the following example, adapted from the C standard library:
 \begin{cfa}
 unsigned long long x;
 (unsigned)( x >> 32 );
+        unsigned long long x;
+        (unsigned)( x >> 32 );
 \end{cfa}
 \vspace{5pt}
 …
 However, this break from C semantics is not backwards compatible, so to maintain C compatibility, the \CFA resolver selects the lowest-cost interpretation of the cast argument for which a conversion or coercion to the target type exists (upcasting to @unsigned long long@ in the example above, due to the lack of unsafe downcasts), using the cost of the conversion itself only as a tie-breaker.\cite[pp.~46-47]{Moss19}
 \end{cquote}
 However, a cast expression is not necessary to have such inconsistency to C semantics.
+However, a cast expression is unnecessary to have such inconsistency to C semantics.
 An implicit argument-parameter type conversion in a function calls can replicate this issue without an explicit cast.
 \begin{cfa}
 …
 The idea here is to first look for the best integral alternative because integral calculations are exact and cheap.
 If no integral solution is found, than there are different rules to select among floating-point alternatives.
 This approach reduces the search space by partitioning: only look at integral operations, and then only look at float-point operations.
+This approach reduces the search space by partitioning into two categories.
 …
 \CFA does not attempt to do any type \textit{inference} \see{\VRef{s:IntoTypeInferencing}}: it has no anonymous functions (\ie lambdas, commonly found in functional programming and also used in \CC and Java), and the variable types must all be explicitly defined (no auto typing).
 This restriction makes the unification problem more tractable in \CFA, as the argument types at each call site are usually all specified.
+There is a single exception case when the function return type contains a free type variable that does not occur in any of the argument types, and subsequently passed into the parent expression. One such example is the \CFA wrapper for @malloc@ which also infers size argument from the deducted return type.
+\begin{cfa}
+forall (T*) T* malloc() {
+        return (T*) malloc (sizeof(T)); // calls C malloc with the size inferred from context
+There is a single exception case when the function return type contains a free type variable that does not occur in any of the argument types, and subsequently passed into the parent expression.
+One such example is the \CFA wrapper for @malloc@ which also infers size argument from the inferred return type.
+\begin{cfa}
+forall( T * ) T * malloc() {
+        return ( T *)malloc( sizeof(T) ); // calls C malloc with the size inferred from context
+}
+int * i = malloc();   // infer int for T from LHS
 \end{cfa}
 A top level expression whose type still contains an unbounded type variable is considered ill-formed as such an expression is inherently ambiguous.
 …
 Some additions have been made in order to accommodate for the newly added type features to the language.
 To summarize, the \CFA type unification has two minor variants: an \textit{exact} mode and an \textit{inexact} mode.
+The exact mode is required since the type matching algorithm operates recursively and the inner types often have to match exactly, \eg there is no common type for the pointer types @int *@ and @long *@ while there is for @int@ and @long@.
 The inexact mode is applied at top level argument-parameter matching, and attempts to find an assignment to the type variables such that the argument types can be converted to parameter types with minimal cost as defined in the previous section.
-The exact mode is required since the type matching algorithm operates recursively and the inner types often have to match exactly, for example there is no common type for the pointer types \textbf{int*} and \textbf{long*} while there is for @int@ and @long@.
 With the introduction of generic record types, the parameters must match exactly as well; currently there are no covariance or contravariance supported for the generics.
 One simplification was made to the \CFA language that makes modelling the type system easier: polymorphic function pointer types are no longer allowed in declarations.
+One simplification was made to the \CFA language that makes modelling the type system easier: polymorphic function pointer types are no longer allowed.
 The polymorphic function declarations themselves are still treated as function pointer types internally, however the change means that formal parameter types can no longer be polymorphic.
 Previously it is possible to write function prototypes such as
 \begin{cfa}
 void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
 \end{cfa}
 Notably, the unification algorithm implemented in the \CFA compiler has never managed to trace the assertion parameters on the formal types at all, and the problem of determining if two assertion sets are compatible may very likely be undecidable in general, given the ability of synthesizing more complicated types by the nesting of generics.
 Eventually, the reason of not allowing such constructs is that they mostly do not provide useful type features for actual programming tasks.
 A subroutine of a program operates on the arguments provided at the call site together with (if any) local and global variables, and even though the subroutine can be polymorphic, the types will be supported at each call site.
 On each invocation the types to be operate on can be determined from the arguments provided, and therefore there should not be a need to pass a polymorphic function pointer, which can take any type in principle.
 For example, consider a polymorphic function that takes one argument of type @T@ and another pointer to a subroutine
 \begin{cfa}
 forall (T) void f (T x, forall (U) void (*g) (U));
 \end{cfa}
 Making @g@ polymorphic in this context would almost certainly be unnecessary, since it can only be called inside the body of @f@ and the types of the argument would have been known anyways, although it can potentially depend on @T@.
 Moreover, requesting a function parameter to be able to potentially work on any input type at all would always impose too much constraint on the arguments, as it only needs to make each calls inside the body of @f@ valid.
+Rewriting the prototype to
 \begin{cfa}
 forall (T) void f (T x, void (*g) (T));
 \end{cfa}
 will be sufficient (or potentially, some compound type synthesized from @T@), in which case @g@ is no longer a polymorphic type on itself.
 The ``monomorphization'' conversion is readily supported in \CFA, either by explicitly assigning a polymorphic function name to a compatible function pointer type, or implicitly done in deducing assertion parameters (which will be discussed in the next section).
 Such technique can be directly applied to argument passing, which is essentially just assignment to function parameter variables.
 There might be some edge cases where the supplied subroutine @g@ is called on arguments of different types inside the body of @f@ and so declared as polymorphic, but such use case is rare and the benefit of allowing such constructs seems to be minimal in practice.
 The result of this change is that the unification algorithm no longer needs to distinguish open and closed type-variables, as the latter is not allowed to exist.
+Previously it was possible to write function prototypes such as
+\begin{cfa}
+void f( forall( T | { T -?( T ); } ) T (@*p@)( T, T ) );
+\end{cfa}
+Notably, the unification algorithm implemented in the \CFA compiler has never managed to trace the assertion parameters on the formal types at all, and the problem of determining if two assertion sets are compatible is likely undecidable in general, given the ability of synthesizing more complicated types by the nesting of generics.
+Essentially, the reason for disallowing this construct is that it does not provide a useful type feature.
+A function operates on the call-site arguments together with any local and global variables.
+When the function is polymorphic, the types are inferred at each call site.
+On each invocation, the types to be operate on are determined from the arguments provided, and therefore, there is no need to pass a polymorphic function pointer, which can take any type in principle.
+For example, consider a polymorphic function that takes one argument of type @T@ and polymorphic function pointer.
+\begin{cfa}
+forall( T ) void f( T x, forall( U ) void (* g)( U ) );
+\end{cfa}
+Making @g@ polymorphic in this context is almost certainly unnecessary, since it can only be called inside the body of @f@ and the types of the argument must be known, although it can potentially depend on @T@.
+Moreover, requesting a function parameter to be able to work on any input type would impose too much constraint on the arguments, as it only needs to make each call inside the body of @f@ valid.
+Hence, rewriting the prototype (or potentially, some compound type synthesized from @T@):
+\begin{cfa}
+forall( T ) void f( T x, void (* g)( T ) );
+\end{cfa}
+is sufficient, so @g@ is no longer a polymorphic type itself.
+This \emph{monomorphization} conversion is readily supported in \CFA, either by explicitly assigning a polymorphic function name to a compatible function pointer type, or implicitly done in inferring assertion parameters (which is discussed next).
+This technique is applicable to argument passing, which is just an assignment to a function parameter variable.
+In theory, there might be edge cases where the supplied function @g@ is called on arguments of different types inside the body of @f@ and so needs to be polymorphic, but this case is rare and its benefit seems to be minimal in practice.
+The result of this change is that the unification algorithm no longer needs to distinguish open (unbounded) and closed (bounded) type-variables, as the latter is not allowed to exist.
 The only type variables that need to be handled are those introduced by the @forall@ clause from the function prototype.
 The subtype relationship between function types is now also rendered redundant since none of the function parameter or return types can be polymorphic, and no basic types or non-polymorphic function types are subtypes of any other type.
 Therefore the goal of (exact) type unification now simply becomes finding a substitution that produces identical types.
 The assertion set need to be resolved is also always just the declarations on the function prototype, which also simplifies the assertion satisfaction algorithm by a bit, as will be discussed further in the next section.
 The type unification results are stored in a type environment data structure, which represents all the type variables currently in scope as equivalent classes, together with their bound types and some other extra information, such as whether the bound type is allowed to be opaque (i.e.
 a forward declaration without definition in scope), and whether the bounds are allowed to be widened.
 In the more general approach commonly used in functional languages, the unification variables are given a lower bound and an upper bound to account for covariance and contravariance of types.
 \CFA currently does not implement any variance with its generic types, and does not allow polymorphic function types, therefore no explicit upper bound is needed and one simple binding value for each equivalence class suffices.
 However, since type conversions are allowed in \CFA, the type environment needs to keep track on which type variables are allowed conversions.
 This behavior is notably different from \CC template argument deduction which enforces an exact match everywhere unless the template argument types are explicitly given.
+For example, a polymorphic maximum function in \CFA can be called with arguments of different arithmetic types and the result follows the usual arithmetic conversion rules, while such expression is not allowed by \CC:
+\begin{cfa}
+forall (T | {int ?<? (T, T); }) T max (T, T);
+max (42, 3.14); // OK, T=double; requires explicit type annotation in C++ such as max<double>(42, 3.14);
+\end{cfa}
 From a theoretical point of view, the simplified implementation of type environment has its shortcomings. There are some cases that do not work nicely with this implementation and some compromise has to be made. A more detailed discussion follows at the end of this chapter.
 \section{Satisfaction of Assertions}
 The assertion satisfaction problem greatly increases the complexity of \CFA expression resolution.
 Past experiments have shown that the majority of time is spent in resolving the assertions for those expressions that takes the longest time to resolve.
 Even though a few heuristics-based optimizations are introduced to the compiler now, this remains to be the most costly part of compiling a \CFA program.
+The subtype relationship among function types is now also rendered redundant since none of the function parameter or return types can be polymorphic, and no basic types or non-polymorphic function types are subtypes of any other type.
+Therefore, the goal of (exact) type unification becomes finding a substitution that produces identical types.
+The assertion set that needs to be resolved is just the declarations on the function prototype, which also simplifies the assertion satisfaction algorithm, which is discussed further in the next section.
+An implementation sketch stores type unification results in a type-environment data-structure, which represents all the type variables currently in scope as equivalent classes, together with their bound types and information such as whether the bound type is allowed to be opaque (\ie a forward declaration without definition in scope) and whether the bounds are allowed to be widened.
+In the general approach commonly used in functional languages, the unification variables are given a lower bound and an upper bound to account for covariance and contravariance of types.
+\CFA does not implement any variance with its generic types and does not allow polymorphic function types, therefore no explicit upper bound is needed and one binding value for each equivalence class suffices.
+However, to handle type conversions, the type environment needs to keep track of which type variables can be converted.
+This behaviour is notably different from \CC template argument inferencing that enforces an exact match everywhere unless the template argument types are explicitly given.
+For example, a polymorphic maximum function in \CFA can be called with arguments of different arithmetic types and the result follows the usual arithmetic conversion rules.
+\begin{cfa}
+forall( T | {int ?<? ( T, T ); } ) T max ( T, T );
+max( 42, 3.14 );   $\C[2.5in]{\LstCommentStyle // \CFA implicitly infers T == double}$
+max<double>(42, 3.14); $\C{\LstCommentStyle // \CC requires explicit type specification}\CRT$
+\end{cfa}
+From a theoretical standpoint, the simplified implementation of the type environment has its shortcomings.
+Some cases do not work nicely with this implementation, and hence, some compromise has to be made.
+A more detailed discussion follows in \VRef{s:CompilerImplementationConsiderations}.
+\section{Assertion Satisfaction}
+The assertion-satisfaction problem greatly increases the complexity of \CFA expression resolution.
+Past experiments have shown that the majority of compilation time is spent in resolving the assertions for those expressions that takes the longest time to resolve.
+Even though a few heuristics-based optimizations have been introduced to the compiler, this remains the most costly part of a \CFA compilation.
 The major difficulty of resolving assertions is that the problem can become recursive, since the expression used to satisfy an outstanding assertion can have its own assertions, and in theory this can go on indefinitely.
 Detecting infinite recursion cases in general is not algorithmically possible and it is not attempted in the compiler.
 Instead, a fixed maximum depth of recursive assertions is imposed.
 This approach is also taken by \CC compilers as template argument deduction is also similarly undecidable in general.
 In previous versions of \CFA this number was set at 4; as the compiler becomes more optimized and capable of handling more complex expressions in a reasonable amount of time, I have increased the limit to 8 and in most occasions it does not lead to trouble.
 Very rarely there will be a case where the infinite recursion produces an exponentially growing assertion set, causing minutes of time wasted before the limit is reached.
 Fortunately it is very hard to run into this situation with realistic \CFA code, and the ones that were found all have some clear characteristics, which can be prevented by some clever tricks.
 In fact, some of the performance optimizations come from analyzing these problematic cases.
 One example of such will be presented later in this section.
 While the assertion satisfaction problem in isolation looks like just another expression to resolve, the recursive nature makes some techniques applied to expression resolution without assertions no longer possible.
 The most significant impact is that the type unification has a side effect, namely editing the type environment (equivalence classes and bindings), which means that if one expression has multiple associated assertions, they are not independent as the changes to the type environment must be compatible for all the assertions to be resolved.
+This approach is also taken by \CC compilers as template argument inferencing is also similarly undecidable in general.
+In previous versions of \CFA, this number was set at 4; as the compiler becomes more optimized and capable of handling more complex expressions in a reasonable amount of time, I have increased the limit to 8 and it does not lead to problems.
+Only rarely is there a case where the infinite recursion produces an exponentially growing assertion set, causing minutes of time wasted before the limit is reached.
+Fortunately, it is very hard to generate this situation with realistic \CFA code, and the ones that have occurred have clear characteristics, which can be prevented by alternative approaches.
+In fact, some of my performance optimizations come from analyzing these problematic cases.
+One example is analysed in this section.
+While the assertion satisfaction problem in isolation looks like just another expression to resolve, its recursive nature makes some techniques for expression resolution no longer possible.
+The most significant impact is that type unification has a side effect, namely editing the type environment (equivalence classes and bindings), which means if one expression has multiple associated assertions it is dependent, as the changes to the type environment must be compatible for all the assertions to be resolved.
 Particularly, if one assertion parameter can be resolved in multiple different ways, all of the results need to be checked to make sure the change to type variable bindings are compatible with other assertions to be resolved.
+A naive algorithm that simply picks any pending assertion to resolve and continue in a depth-first search could be very inefficient and especially prone of falling into an infinite loop, while in many cases it can be avoided by examining other assertions that can provide insight on the desired type binding: if one assertion parameter can only be matched by a unique option, we can then update the type bindings confidently without the need of backtracking.
+The algorithm currently used in \CFA compiler is designed by Aaron Moss through a simplified prototype experiment that captures most of \CFA type system features and ported back to the actual language.
+It can be described as a mix of breadth- and depth-first search in a staged approach.
+A naive algorithm that simply picks any pending assertion to resolve and continue in a depth-first search could be very inefficient and especially prone to an infinite loop.
+In many cases, these problems can be avoided by examining other assertions that provide insight on the desired type binding: if one assertion parameter can only be matched by a unique option, the type bindings can be updated confidently without the need for backtracking.
+The Moss algorithm currently used in \CFA was developed using a simplified type-simulator that capture most of \CFA type-system features.
+The simulation results were then ported back to the actual language.
+The simulator used a mix of breadth- and depth-first search in a staged approach.
 To resolve a set of assertions, the algorithm first attempts to resolve each assertion item individually.
 There are three possible outcomes on resolving each assertion:
 \begin{enumerate}
 \item If no matches are found, the algorithm terminates with a failure.
 \item If exactly one match is found, the type environment is updated immediately, and used in resolving any remaining assertions.
 \item If multiple matches are found, the assertion candidates with their updated type environments are stored in a list that will be checked for compatibility at the end.
+\item If no matches are found, the algorithm terminates with a failure (ambiguity).
+\item If exactly one match is found, the type environment is updated immediately with this result, affecting the resolution of remaining assertions.
+\item If multiple matches are found, the assertion candidates with their updated type environments are stored in a list that is checked for compatibility at the end.
 \end{enumerate}
 When all assertion items are resolved successfully, the algorithm attempts to combine the ambiguously resolved assertions to produce mutually compatible assignments.
+If any new assertions are introduced by the selected candidates, the algorithm is applied recursively, until there are none pending resolution or the recursion limit is reached which results in a failure.
+It has been discovered in practice that the efficiency of such algorithm can sometimes be very sensitive to the order of resolving assertions.
+Suppose an unbound type variable @T@ appears in two assertions, one can be uniquely resolved and allow the type @T@ to be inferred immediately, and another has many ways to be resolved, each results in @T@ being bound to a different concrete type.
+If the first assertion is examined first by the algorithm, the deducted type can then be utilized in resolving the second assertion and eliminate many incorrect options without producing the list of candidates pending further checks.
+In practice, this have a significant impact when an unbound type @T@ is declared to satisfy the basic \emph{object assertions}\footnote{The term is borrowed from object-oriented languages although \CFA is not object-oriented in principle.} of having a default constructor, destructor, and copy assignment operations.
+Since they are defined for every type currently in scope, there are often hundreds or even thousands of matches to these functions with an unspecified operand type, and in most of the cases the value of @T@ can be deduced by resolving another assertion first, which then allows specific object lifetime functions to be looked up since they are sorted internally by the operand type, and greatly reduces the number of wasted resolution attempts.
+Currently this issue also causes the capability of the assertion resolution algorithm to be limited.
+Assertion matching is implemented to be more restricted than expression resolution in general, in that the parameter types must match exactly, rather than just merely callable.
+If one function declaration includes an assertion of @void f(T)@ and only a @f(long)@ is currently in scope, trying to resolve the assertion with @T=int@ would not work.
+Loosening the assertion matching requirement causes many assertion variables to have multiple matches and makes the delayed combination step too costly.
+Given all the issues caused by assertion resolution potentially creating new type variable bindings, a natural progression is to put some restrictions on free type variables such that all the type variables will be bound when the expression reaches assertion resolution stage.
+If any new assertions are introduced by the selected candidates, the algorithm is applied recursively, until there are none pending resolution or the recursion limit is reached, which results in a failure.
+However, in practice the efficiency of this algorithm can be sensitive to the order of resolving assertions.
+Suppose an unbound type variable @T@ appears in two assertions:
+\begin{cfa}
+forall( @T@ | { void foo( @T@ ), void bar( @T@ ) } ) T f( T );
+void foo( int );
+void bar( int );   void bar( double );   ...
+f( 3 );
+\end{cfa}
+where one can be uniquely resolved and allow the type @T@ to be inferred immediately, and another has many ways to be resolved, each resulting in @T@ being bound to a different concrete type.
+If the first assertion is examined by the algorithm, the inferred type can then be utilized in resolving the second assertion eliminating many incorrect options without producing a list of candidates requiring further checks.
+In practice, this have a significant impact when an unbound type @T@ is declared to satisfy the basic \emph{object assertions}\footnote{The term is borrowed from object-oriented languages although \CFA is not object-oriented.} of having a default constructor, destructor, and copy assignment operations.
+Since these functions are implicitly defined for almost every type in scope, there can be hundreds or even thousands of matches to these functions with an unspecified operand type.
+In most cases, the value of @T@ can be inferred by resolving other assertions first, and then the object lifetime-functions can easily be fulfilled since functions are sorted internally by the operand type;
+because of its size, this optimization greatly reduces the number of wasted resolution attempts.
+This issue also limits the capability of the assertion resolution algorithm.
+Assertion matching is implemented to be more restrictive than expression resolution in general, in that the parameter types must match exactly, rather than just merely callable.
+If a function declaration includes the assertion @void f(T)@ and only a @f(long)@ is in scope, trying to resolve the assertion with @T == int@ does not work.
+However, loosening the assertion matching requirement causes many assertion variables to have multiple matches and makes the delayed combination step too costly.
+Given all the issues caused by assertion resolution creating new type variable bindings, a natural progression is to restrict free type-variables such that all the type variables are bound when the expression reaches the assertion resolution stage.
 A type variable introduced by the @forall@ clause of function declaration can appear in parameter types, return types and assertion variables.
 If it appears in the parameter types, it will be bound when matching the arguments to parameters at the call site.
 If it only appears in the return type, it can be eventually figured out by the context in principle.
 The current implementation in \CFA compiler does not do enough return type deduction as it performs eager assertion resolution, and the return type information cannot be known in general before the parent expression is resolved, unless the expression is in an initialization context, in which the type of variable to be initialized is certainly known.
+If it appears in parameter types, it can be bound when matching the arguments to parameters at the call site.
+If it only appears in the return type, it can be eventually be determined from the call-site context.
+Currently, type resolution cannot do enough return-type inferencing while performing eager assertion resolution: the return type information is unknown before the parent expression is resolved, unless the expression is an initialization context where the variable type is known.
 By delaying the assertion resolution until the return type becomes known, this problem can be circumvented.
+The truly problematic case occurs if a type variable does not appear in either of the parameter or return types and only appears in some assertion variables.
+Such case is very rare and it is not evident that forcing every type variable to appear at least once in parameter or return types limits the expressiveness of \CFA type system to a significant extent.
+In the next chapter I will discuss about a proposal of including type declarations in traits rather than having all type variables appear in the trait parameter list, which could be helpful for providing equivalent functionality of having an unbound type parameter in assertion variables, and also addressing some of the variable cost issue discussed in section 4.1.
+\subsection*{Caching Assertion Resolution Results}
+The truly problematic case occurs if a type variable does not appear in either of the parameter or return types and only appears in assertions or variables (associate types).
+\begin{cfa}
+forall( T | { void foo( @T@ ) } ) int f( float ) {
+        @T@ t;
+}
+\end{cfa}
+This case is rare so forcing every type variable to appear at least once in parameter or return types limits does not limit the expressiveness of \CFA type system to a significant extent.
+The next section presents a proposal for including type declarations in traits rather than having all type variables appear in the trait parameter list, which is provides equivalent functionality to an unbound type parameter in assertion variables, and also addresses some of the variable cost issue discussed in \VRef{s:ExpressionCostModel}.
+\subsection{Caching Assertion Resolution Results}
 In Aaron Moss's prototype design and experiments, a potential optimization of caching the result of already resolved assertions is discussed.
+Based on the experiment results, this approach can improve the performance of expression resolution in general, and sometimes allow hard instances of assertion resolution problems to be solved that are otherwise infeasible, for example when the resolution would encounter infinite loops.
+The problem that makes this approach tricky to be implemented correctly is that the resolution algorithm has side effects, namely modifying the type bindings in the environment.
+If we were to cache those results that cause the type bindings to be modified, it would be necessary to store the changes to type bindings too, and in case where multiple candidates can be used to satisfy one assertion parameter, all of them needs to be cached including those that are not eventually selected, since the side effect can produce different results depending on the context.
+In the original design of \CFA that includes unrestricted polymorphic formal parameters that can have assertions on themselves, the problem is even more complicated as new declarations can be introduced in scope during expression resolution.
+Here is one such example taken from Bilson:
+Based on the experiment results, this approach can improve the performance of expression resolution in general, and sometimes allow difficult instances of assertion resolution problems to be solved that are otherwise infeasible, \eg when the resolution encounters an infinite loop.
+The tricky problem in implementing this approach is that the resolution algorithm has side effects, namely modifying the type bindings in the environment.
+If the modifications are cached, \ie the results that cause the type bindings to be modified, it is also necessary to store the changes to type bindings, too.
+Furthermore, in cases where multiple candidates can be used to satisfy one assertion parameter, all of them must be cached including those that are not eventually selected, since the side effect can produce different results depending on the context.
+The original design of \CFA includes unrestricted polymorphic formal parameters with assertions on themselves, making the problem more complicated as new declarations can be introduced in scope during expression resolution.
+Here is an example taken from Bilson:
 \begin{cfa}
 void f( forall( T | { T -?( T ); } ) T (*p)( T, T ) );
 forall( U, V | { U -?( U ); V -?( V ); } ) U g( U, V ) );
 f( g );
 \end{cfa}
 The inner assertion parameter on the \textit{closed} type variable @T@ is used to satisfy the assertions on @U@ and @V@ in this example.
 However, as per the previous discussions on this topic, polymorphic function pointer types have been removed from \CFA, since correctly implementing assertion matching is not possible in general.
 Without closed parameters (and therefore no have-set for assertions) the set of declarations in scope remains unchanged while resolving any expression.
 The current \CFA implementation also does not attempt to widen any already bound type parameters to satisfy an assertion.
 Note that such restriction does mean that certain kinds of expressions cannot be resolved, for example:
 \begin{cfa}
 forall (T | {void f(T);}) void g(T);
 void f (long);
 g(42);
+forall( U, V | { U -?( U ); V -?( V ); } )  U @g@( U, V ) );
+f( @g@ );
+\end{cfa}
+The inner assertion parameter on the \emph{closed} type-variable @T@ is used to satisfy the assertions on @U@ and @V@ in this example.
+However, as per the previous discussions on this topic, polymorphic function-pointer types have been removed from \CFA, since correctly implementing assertion matching is impossible in general.
+Without closed parameters (and therefore no have-set for assertions) the set of declarations in scope remains unchanged, while resolving an expression.
+The current \CFA implementation also does not attempt to widen any bound type-parameters to satisfy an assertion.
+Note, this restriction does mean certain kinds of expressions cannot be resolved, \eg:
+\begin{cfa}
+forall( T | { void f( T ); } )  void g( T );
+void f( long );
+g( 42 );
 \end{cfa}
 The call @g(42)@ is rejected since no attempt is made to widen the parameter type @T@ from @int@ to @long@.
 Such problem could be mitigated if we allow inexact matches of assertions, but cannot be eliminated completely, if @T@ is matched in a parameterized type, including pointers and references:
 \begin{cfa}
 forall (T | {void f(T*);}) void g(T);
 void f (long *);
 g(42);
 \end{cfa}
 Here the only way to resolve the call @g(42)@ is to allow assertion resolution to widen the parameter @T@, since even with inexact matching, @int*@ cannot be converted to @long*@.
+Such problem could be mitigated by allowing inexact matches of assertions, but cannot be eliminated completely, if @T@ is matched in a parameterized type, including pointers and references:
+\begin{cfa}
+forall( T | {void f( T*);})  void g( T );
+void f( long * );
+g( 42 );
+\end{cfa}
+Here the only way to resolve the call @g(42)@ is to allow assertion resolution to widen the parameter @T@, but even with inexact matching, @int *@ cannot be converted to @long *@.
 \section{Compiler Implementation Considerations}
+\label{s:CompilerImplementationConsiderations}
 \CFA is still an experimental language and there is no formal specification of expression resolution rules yet.
+Presently there is also only one reference implementation, namely the cfa-cc translator, which is under active development and the specific behavior of the implementation can change frequently as new features are added.
+Ideally, the goal of expression resolution involving polymorphic functions would be to find the set of type variable assignments such that the global conversion cost is minimal and all assertion variables can be satisfied.
+Unfortunately, with a lot of complications involving implicit conversions and assertion variables, fully achieving this goal is not realistic. The \CFA compiler is specifically not designed to accommodate for all edge cases, either.
+Instead it makes a few restrictions to simplify the algorithm so that most expressions that will be encountered in actual code can still pass type checking within a reasonable amount of time.
+As previously mentioned, \CFA polymorphic type resolution is based on a modified version of unification algorithm, where both equivalence (exact) and subtyping (inexact) relations are considered. However, the implementation of type environment is simplified; it only stores a tentative type binding with a flag indicating whether \textit{widening} is possible for an equivalence class of type variables.
+Formally speaking, this means the type environment used in \CFA compiler is only capable of representing \textit{lower bound} constraints.
+This simplification can still work well most of the time, given the following properties of the existing \CFA type system and the resolution algorithms in use:
+Currently, there is only one reference implementation, namely the @cfa-cc@ translator, which is under active development and the specific behaviour of the implementation can change frequently as new features are added.
+(This situation is standard for any new programming language.)
+Ideally, the goal of expression resolution involving polymorphic functions is to find the set of type variable assignments such that the global conversion cost is minimal and all assertion variables can be satisfied.
+Unfortunately, there are a lot of complications involving implicit conversions and assertion variables;
+hence, fully achieving this goal is unrealistic.
+And as noted, the \CFA compiler is not covering all the edge cases for its current type-system design.
+Instead it makes a few restrictions to simplify the resolution algorithm so most expressions in actual code still pass type checking within a reasonable amount of time.
+As previously mentioned, \CFA polymorphic type resolution is based on a modified version of a unification algorithm, where both equivalence (exact) and subtyping (inexact) relations are considered.
+However, the implementation of the type environment is simplified;
+it only stores a tentative type binding with a flag indicating whether \emph{widening} is possible for an equivalence class of type variables.
+Formally speaking, this means the type environment used in \CFA is only capable of representing \emph{lower-bound} constraints.
+This simplification works most of the time, given the following properties of the existing \CFA type system and the resolution algorithms:
 \begin{enumerate}
+        \item Type resolution almost exclusively proceeds in bottom-up order, which naturally produces lower bound constraints. Since all identifiers can be overloaded in \CFA, not much definite information can be gained from top-down. In principle it would be possible to detect non-overloaded function names and perform top-down resolution for those; however, the prototype experiments have shown that such optimization does not give a meaningful performance benefit, and therefore it is not implemented.
+        \item Few nontrivial subtyping relationships are present in \CFA, namely the arithmetic types presented in \VRef[Figure]{f:CFACurrArithmeticConversions}, and qualified pointer/reference types. In particular, \CFA lacks the nominal inheritance subtyping present in object-oriented languages, and the generic types do not support covariance on type parameters. As a result, named types such as structs are always matched by strict equivalence, including any type parameters should they exist.
+        \item Unlike in functional programming where subtyping between arrow types exists, \ie if $T_2 <: T_1$ and $U_1 <: U_2$ then $T_1 \rightarrow T_2 <: U_1 \rightarrow U_2$, \CFA uses C function pointer types and the parameter/return types must match exactly to be compatible.
+        \item
+        Type resolution almost exclusively proceeds in bottom-up order, which naturally produces lower bound constraints.
+        Since all identifiers can be overloaded in \CFA, little information can be gained from top-down analysis.
+        In principle, it is possible to detect non-overloaded function names and perform top-down resolution for those;
+        however, Moss' prototype experiments showed this optimization does not give a meaningful performance benefit, and therefore was not implemented.
+        \item
+        Few nontrivial subtyping relationships are present in \CFA, \eg the arithmetic types presented in \VRef[Figure]{f:CFACurrArithmeticConversions} and qualified pointer/reference types.
+        The reason is that \CFA lacks nominal inheritance subtyping present in object-oriented languages, and the generic types do not support covariance on type parameters.
+        As a result, named types such as structures are always matched by strict equivalence, even when type parameters exist.
+        \item
+        Unlike functional programming where subtyping between arrow types exists, \ie if $T_2 <: T_1$ and $U_1 <: U_2$ then $T_1 \rightarrow T_2 <: U_1 \rightarrow U_2$, \CFA uses C function pointer types and the parameter/return types must match exactly to be compatible.
 \end{enumerate}
+\CFA does attempt to incorporate type information propagated from upstream in the case of variable declaration with initializer, since the type of the variable being initialized is definitely known. It is known that the current type environment representation is flawed in handling such type deduction when the return type in the initializer is polymorphic, and an inefficient workaround has to be performed in certain cases. An annotated example is included in the \CFA compiler source code:
+\begin{cfa}
+// If resolution is unsuccessful with a target type, try again without, since it
+// will sometimes succeed when it wouldn't with a target type binding.
+// For example:
+forall( otype T ) T & ?[]( T *, ptrdiff_t );
+\CFA does attempt to incorporate upstream type information propagated from variable declaration with initializer, since the type of the variable being initialized is definitely known.
+However, the current type-environment representation is flawed in handling such type inferencing, when the return type in the initializer is polymorphic.
+Currently, an inefficient workaround is performed to efficiency the necessary effect.
+The following is an annotated example of the workaround.
+\begin{cfa}
+// If resolution is unsuccessful with a target type, try again without, since it sometimes succeeds
+// when it does not with a target type binding. For example:
+forall( T ) T & ?[]( T *, ptrdiff_t );
 const char * x = "hello world";
 int ch = x[0];
 …
 // * (x: const char *) is unified with int *, which fails
 \end{cfa}
+The problem here is that we can only represent the constraints $T = int$ and $int <: T$, but since the type information flows in the opposite direction, the proper constraint for this case is $T <: int$, which cannot be represented in the simplified type environment. Currently, an attempt to resolve with equality constraint generated from the initialized variable is still made, since it is often the correct type binding (especially in the case where the initialized variable is a struct), and when such attempt fails, the resolution algorithm is rerun without the initialization context.
+One additional remark to make here is that \CFA does not provide a mechanism to explicitly specify values for polymorphic type parameters. In \CC for example, users may specify template arguments in angle brackets, which could be useful when automatic deduction fails, \eg @max<double>(42, 3.14)@.
+There are some partial workarounds such as adding casts to the arguments, but they are not guaranteed to work in all cases. If a type parameter appears in the function return type, however, using the ascription (return) cast will force the desired type binding, since the cast only compiles when the expression type matches exactly with the target.
+\section{Related Work}
+The problem is that the type system can only represent the constraints $T = int$ and $int <: T$, but since the type information flows in the opposite direction, the proper constraint for this case is $T <: int$, which cannot be represented in the simplified type environment. Currently, an attempt to resolve with equality constraint generated from the initialized variable is still made, since it is often the correct type binding (especially in the case where the initialized variable is a structure), and when such attempt fails, the resolution algorithm is rerun without the initialization context.
+One additional remark to make here is that \CFA does not provide a mechanism to explicitly specify values for polymorphic type parameters. In \CC for example, users may specify template arguments in angle brackets, which is necessary when automatic inferencing fails, \eg @max<double>(42, 3.14)@.
+There are some partial workarounds such as adding casts to the arguments, but they are not guaranteed to work in all cases.
+If a type parameter appears in the function return type, however, using an ascription (return) cast
+\begin{cfa}
+(@return@ double (*)( double, double ))max( 42, 3.14 );   $\C[2.5in]{// tell resolver what type to use}\CRT$
+\end{cfa}
+forces the desired type binding, since the cast only compiles when the expression type matches exactly with the target.

doc/theses/fangren_yu_MMath/uw-ethesis.tex

-              r0393fda8
+              r7d405eb
 \lstset{language=cfa,belowskip=-1pt} % set default language to CFA
 \lstnewenvironment{c++}[1][]{\lstset{language=[GNU]C++,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
+\lstnewenvironment{haskell}[1][]{\lstset{language=Haskell,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
+\lstnewenvironment{swift}[1][]{\lstset{language=swift,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
+\lstnewenvironment{scala}[1][]{\lstset{language=scala,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
+\lstnewenvironment{ada}[1][]{\lstset{language=Ada,escapechar=\$,moredelim=**[is][\color{red}]{@}{@},}\lstset{#1}}{}
 \newcommand{\PAB}[1]{{\color{red}PAB: #1}}
 …
 \input{intro}
 \input{background}
 \input{content1}
 \input{content2}
 \input{performance}
+\input{features}
+\input{resolution}
+\input{future}
 \input{conclusion}

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: