Context Navigation

← Previous Change
Next Change →

Changeset a2971cc for doc/theses/aaron_moss_PhD

Timestamp:

Feb 20, 2019, 5:37:55 PM (7 years ago)

Author:

Aaron Moss <a3moss@…>

Branches:

ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum

Children:

Parents:

Message:

thesis: first-pass edits to ch.1-3

Location:

doc/theses/aaron_moss_PhD/phd

Files:

: 4 edited

background.tex (modified) (16 diffs)
generic-types.tex (modified) (16 diffs)
introduction.tex (modified) (1 diff)
resolution-heuristics.tex (modified) (1 diff)

Legend:

: Unmodified
: Added
: Removed

doc/theses/aaron_moss_PhD/phd/background.tex

-              rf728971
+              ra2971cc
 To provide background for the contributions in subsequent chapters, this chapter provides a summary of the features of \CFA{} at the time this work was conducted.
 The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism}\cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
 Building on Ditchfield's design for contextual polymorphism as well as KW-C\cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson\cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
+The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
+Building on Ditchfield's design for contextual polymorphism as well as KW-C \cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson \cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
 This early \CFACC{} provided basic functionality, but incorporated a number of poor algorithmic choices due to a rushed implementation time frame, and as such lacked the runtime performance required for practical use; this thesis is substantially concerned with rectifying those deficits.
 The \CFA{} project was revived in 2015 with the intention of building a production-ready language and compiler; at the time of this writing, both \CFA{} and \CFACC{} have been under active development continuously since.
 As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal\cite{Moss18} provides a reasonable summary of the current design.
 Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors\cite{Schluntz17}, improved support for tuples\cite{Schluntz17}, reference types\cite{Moss18}, first-class concurrent and parallel programming support\cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library\cite{Moss18}.
+As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal \cite{Moss18} provides a reasonable summary of the current design.
+Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors \cite{Schluntz17}, improved support for tuples \cite{Schluntz17}, reference types \cite{Moss18}, first-class concurrent and parallel programming support \cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library \cite{Moss18}.
 \section{\CFA{} Features}
 …
 This choice is in marked contrast to \CC{}, which, though it has backward-compatibility with C on the source code level, is a much larger and more complex language, and requires extensive developer re-training to write idiomatic, efficient code in \CC{}'s object-oriented paradigm.
 \CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions''; while these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
+\CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11}; while these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
 Particularly, \CFA{} has no concept of \emph{subclass}, and thus no need to integrate an inheritance-based form of polymorphism with its parametric and overloading-based polymorphism.
 The graph structure of the \CFA{} type conversions is also markedly different than an inheritance hierarchy; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance hierarchies.
+The graph structure of the \CFA{} type conversions (discussed in Section~\ref{conv-cost-sec}) is also markedly different than an inheritance hierarchy; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance hierarchies.
 Another aspect of \CFA{}'s procedural paradigm is that it retains C's translation-unit-based encapsulation model, rather than class-based encapsulation such as in \CC{}.
 …
 \subsection{Name Overloading} \label{overloading-sec}
 In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding \lstinline{struct}, \lstinline{union}, and \lstinline{enum} tags, one holding labels, one holding \lstinline{typedef} names, variable, function, and enumerator identifiers, and one for each \lstinline{struct} and \lstinline{union} type holding the field names\cit{}.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
+In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding \lstinline{struct}, \lstinline{union}, and \lstinline{enum} tags, one holding labels, one holding \lstinline{typedef} names, variable, function, and enumerator identifiers, and one for each \lstinline{struct} and \lstinline{union} type holding the field names \cite[\S{}6.2.3]{C11}.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
 This restriction makes finding the proper declaration to match to a variable expression or function application a simple matter of symbol-table lookup, which can be easily and efficiently implemented.
 \CFA{}, on the other hand, allows overloading of variable and function names so long as the overloaded declarations do not have the same type, avoiding the multiplication of variable and function names for different types common in the C standard library, as in the following example:
 …
 While the wisdom of giving both the maximum value of a type and the function to take the maximum of two values the same name is debatable, \eg{} some developers may prefer !MAX! for the former, the guiding philosophy of \CFA{} is ``describe, don't prescribe'' --- we prefer to trust programmers with powerful tools, and there is no technical reason to restrict overloading between variables and functions.
 However, the expressivity of \CFA{}'s name overloading has the consequence that simple table lookup is insufficient to match identifiers to declarations, and a type-matching algorithm must be part of expression resolution.
+However, the expressivity of \CFA{}'s name overloading does have the consequence that simple table lookup is insufficient to match identifiers to declarations, and a type-matching algorithm must be part of expression resolution.
 \subsubsection{Operator Overloading}
 …
 \end{cfa}
 Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions''\cit{}, so as to present user programmers with only a single set of overloading rules.
+Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11}, so as to present user programmers with only a single set of overloading rules.
 \subsubsection{Special Literal Types}
 …
 \CFA{} provides a special type for the !0! literal, !zero_t!, so that users can define a zero value for their own types without being forced to create a conversion from an integer or pointer type (though \CFA{} also includes implicit conversions from !zero_t! to the integer and pointer types for backward compatibility).
 According to the C standard\cit{}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
 By this rule, boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
 \CFACC{} applies this rewriting in all boolean contexts, so any type !T! can be made ``truthy'' (that is, given a boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}; unlike \CC{} prior to the addition of explicit casts in \CCeleven{}, this design does not add comparability or convertablity to arbitrary integer types.
+According to the C standard \cite[\S{}6.8.4.1]{C11}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
+By this rule, Boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
+\CFACC{} applies this rewriting in all Boolean contexts, so any type !T! can be made ``truthy'' (that is, given a Boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}; unlike \CC{} prior to the addition of explicit casts in \CCeleven{}, this design does not add comparability or convertibility to arbitrary integer types.
 \CFA{} also includes a special type for !1!, !one_t!; like !zero_t!, !one_t! has built-in implicit conversions to the various integral types so that !1! maintains its expected semantics in legacy code.
 …
 \CFA{} previously allowed !0! and !1! to be the names of polymorphic variables, with separate overloads for !int 0!, !int 1!, and !forall(dtype T) T* 0!.
 As revealed in my own work on generic types (Chapter~\ref{generic-chap}), the parameteric polymorphic zero variable was not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
+As revealed in the work presented here on generic types (Chapter~\ref{generic-chap}), the parametric polymorphic zero variable was not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
 As such, variables that could represent !0! and !1! were phased out in favour of functions that could generate those values for a given type as appropriate.
 …
 \CFA{} passes the size and alignment of the type represented by an !otype! parameter, as well as a default constructor, copy constructor, assignment operator, and destructor.
 Types which do not have one or more of these operators visible cannot be bound to !otype! parameters.
+In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; experiments have shown this overhead to be similar to \CC{} virtual function calls. \TODO{rerun experiments, possibly look at vtable variant}
+In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; the experiments in Chapter~\ref{generic-chap} indicate that this overhead is comparable to that of \CC{} virtual function calls.
+% \TODO{rerun experiments, possibly look at vtable variant}
 One benefit of this design is that it allows polymorphic functions to be separately compiled.
 …
 If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that that function is examined as a candidate for its own assertion unboundedly repeatedly.
 To avoid such infinite loops, \CFACC{} imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC{} compilers for template expansion; this restriction means that there are some semantically well-typed expressions that cannot be resolved by \CFACC{}.
+\TODO{Update this with final state} One contribution made in the course of this thesis was modifying \CFACC{} to use the more flexible expression resolution algorithm for assertion matching, rather than the simpler but limited previous approach of unification on the types of the functions.
+\subsubsection{Deleted Declarations}
+Particular type combinations can be exempted from matching a given polymorphic function through use of a \emph{deleted function declaration}:
+\begin{cfa}
+int somefn(char) = void;
+\end{cfa}
+This feature is based on a \CCeleven{} feature typically used to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC{}, a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC{}-specific rules about private members and implicitly-generated functions; the deleted function form is both clearer and less verbose.} or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC{} versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers. A similar effect can be produced on an ad-hoc basis at the appropriate call sites through use of casts to determine the function type. In both cases, the deleted-function form is clearer and more concise.}.
+Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
+If this deleted declaration is selected as the unique minimal-cost interpretation of an expression than an error is produced. \TODO{Check this is implemented at print.}
+% \subsubsection{Deleted Declarations}
+% Particular type combinations can be exempted from matching a given polymorphic function through use of a \emph{deleted function declaration}:
+% \begin{cfa}
+% int somefn(char) = void;
+% \end{cfa}
+% This feature is based on a \CCeleven{} feature typically used to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC{}, a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC{}-specific rules about private members and implicitly-generated functions; the deleted function form is both clearer and less verbose.} or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC{} versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers. A similar effect can be produced on an ad-hoc basis at the appropriate call sites through use of casts to determine the function type. In both cases, the deleted-function form is clearer and more concise.}.
+% Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
+% If this deleted declaration is selected as the unique minimal-cost interpretation of an expression than an error is produced.
 \subsubsection{Traits}
 \CFA{} provides \emph{traits} as a means to name a group of type assertions, as in the example below\footnote{This example uses \CFA{}'s reference types, constructors, and zero type, described in Section~\ref{type-features-sec}.}:
+\CFA{} provides \emph{traits} as a means to name a group of type assertions, as in the example below\footnote{This example uses \CFA{}'s reference types and constructors, described in Section~\ref{type-features-sec}.}:
 \begin{cfa}
 …
 The flexibility of \CFA{}'s implicit trait-satisfaction mechanism provides programmers with a great deal of power, but also blocks some optimization approaches for expression resolution.
 The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgements difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
+The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgments difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
 \subsection{Implicit Conversions} \label{implicit-conv-sec}
 In addition to the multiple interpretations of an expression produced by name overloading and polymorphic functions, for backward compatibility \CFA{} must support all of the implicit conversions present in C, producing further candidate interpretations for expressions.
 As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'\cit{} define which of the built-in types are implicitly convertible to which other types, and the relative cost of any pair of such conversions from a single source type.
+As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions' \cite[\S{}6.3.1.8]{C11} define which of the built-in types are implicitly convertible to which other types, as well as which implicit conversions to apply for mixed arguments to binary operators.
 \CFA{} adds rules to the usual arithmetic conversions defining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg{} !int! to !char!, but more expensive than any \emph{safe} (widening) conversion, \eg{} !int! to !double!.
 One contribution of this thesis, discussed in Section \TODO{add to resolution chapter}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
+One contribution of this thesis, discussed in Section~\ref{conv-cost-sec}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
 The expression resolution problem which is the focus of Chapter~\ref{resolution-chap} is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
 …
 \subsubsection{Reference Types}
 One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz\cite{Schluntz17}.
+One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz \cite{Schluntz17}.
 Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer.
 These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy easy aliasing of nested values with a similarly convenient syntax.
 …
 The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named.
 In \CFA{}, the distinction between lvalue and rvalue can be reframed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
+In \CFA{}, the distinction between lvalue and rvalue can be re-framed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
 \CFA{} references preserve the existing qualifier-dropping implicit lvalue-to-rvalue conversion from C (\eg{} a !const volatile int&! can be implicitly copied to a bare !int!)
 To make reference types more easily usable in legacy pass-by-value code, \CFA{} also adds an implicit rvalue-to-lvalue conversion, implemented by storing the value in a fresh compiler-generated temporary variable and passing a reference to that temporary.
 …
 The primary issue with \CC{} references is that it is impossible to extract the address of the reference variable rather than the address of the referred-to variable.
 This breaks a number of the usual compositional properties of the \CC{} type system, \eg{} a reference cannot be re-bound to another variable, nor is it possible to take a pointer to, array of, or reference to a reference.
 \CFA{} supports all of these use cases \TODO{test array} without further added syntax.
+\CFA{} supports all of these use cases without further added syntax.
 The key to this syntax-free feature support is an observation made by the author that the address of a reference is a lvalue.
 In C, the address-of operator !&x! can only be applied to lvalue expressions, and always produces an immutable rvalue; \CFA{} supports reference re-binding by assignment to the address of a reference, and pointers to references by repeating the address-of operator:
 …
 \subsubsection{Resource Management}
 \CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz\cite{Schluntz17}.
+\CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz \cite{Schluntz17}.
 This idiom allows a safer and more principled approach to resource management by tying acquisition of a resource to object initialization, with the corresponding resource release executed automatically at object finalization.
 A wide variety of conceptual resources may be conveniently managed by this scheme, including heap memory, file handles, and software locks.
 …
 The implementation of tuples in \CFACC{}'s code generation is based on the generic types introduced in Chapter~\ref{generic-chap}, with one compiler-generated generic type for each tuple arity.
 This allows tuples to take advantage of the same runtime optimizations available to generic types, while reducing code bloat.
 An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example shows the basics:
+An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example demonstrates the basic features:
 \begin{cfa}
 …
 This precludes some argument-parameter matching strategies for expression resolution, as well as cheap interpretation filters based on comparing number of parameters and arguments.
 As an example, in the call to !swap( x, x )! above, the second !x! can be resolved starting at the second or third parameter of !swap!, depending which interpretation of !x! was chosen for the first argument.
+\section{Conclusion}
+\CFA{} adds a significant number of features to standard C, increasing the expressivity and re-usability of \CFA{} code while maintaining backwards compatibility for both code and larger language paradigms.
+This flexibility does incur significant added compilation costs, however, the mitigation of which are the primary concern of this thesis.

doc/theses/aaron_moss_PhD/phd/generic-types.tex

-              rf728971
+              ra2971cc
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.h> $\C{// for malloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 struct int_list { int value; struct int_list* next; };
 …
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.h> $\C{// for malloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 // single code implementation
 …
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.h> $\C{// for malloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 $\C[\textwidth]{// code is nested in macros}$
 …
 \CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation.
 A generic type can be declared in \CFA{} by placing a !forall! specifier on a !struct! or !union! declaration, and instantiated using a parenthesized list of types after the generic name.
 An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig} \TODO{test this code}.
+An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig}.
 \begin{figure}
         \begin{cfa}
+                #include <stdlib.hfa> $\C{// for alloc}$
+                #include <stdio.h>  $\C{// for printf}$
                 forall(otype T) struct list { T value; list(T)* next; };
 …
 Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism presented some unique design constraints for this project.
 The guiding principle was to maintain an unsurprising language model for C programmers without compromising runtime efficiency.
 A key insight for this design was that C already possesses a handful of built-in generic types (\emph{compound types} in the language of the standard\cit{}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
+A key insight for this design was that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
 \subsection{Related Work}
 One approach to the design of generic types is that taken by \CC{} templates\cite{C++}.
+One approach to the design of generic types is that taken by \CC{} templates \cite{C++}.
 The template approach is closely related to the macro-expansion approach to C polymorphism demonstrated in Figure~\ref{macro-generic-fig}, but where the macro-expansion syntax has been given first-class language support.
 Template expansion has the benefit of generating code with near-optimal runtime efficiency, as distinct optimizations can be applied for each instantiation of the template.
 On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case\cit{}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.
+On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case \cite{Haberman16}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.
 The most significant restriction of the \CC{} template model is that it breaks separate compilation and C's translation-unit-based encapsulation mechanisms.
 Because a \CC{} template is not actually code, but rather a sort of ``recipe'' to generate code, template code must be visible at its call site to be used.
 Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts\cite{C++Concepts} are standardized in \CCtwenty{}.
+Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts \cite{C++Concepts} are standardized in \CCtwenty{}.
 C code, by contrast, only needs a !struct! or function declaration to call that function or use (by-pointer) values of that type, a desirable property to maintain for \CFA{}.
 Java\cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
+Java \cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
 The Java approach has much more in common with the !void*!-polymorphism shown in Figure~\ref{void-generic-fig}; since in Java nearly all data is stored by reference, the Java approach to polymorphic data is to store pointers to arbitrary data and insert type-checked implicit casts at compile-time.
 This process of \emph{type erasure} has the benefit of allowing a single instantiation of polymorphic code, but relies heavily on Java's object model and garbage collector.
 To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward.
 Cyclone\cite{Grossman06} is another language extending C, and also provides capabilities for polymorphic functions and existential types, similar to \CFA{}'s !forall! functions and generic types.
+Cyclone \cite{Grossman06} is another language extending C, and also provides capabilities for polymorphic functions and existential types, similar to \CFA{}'s !forall! functions and generic types.
 Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is tedious and error-prone compared to \CFA{}'s implicit assertion satisfaction.
 Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as !void*!, \ie{} only pointer types and !int!.
 …
 Many other languages include some form of generic types.
 As a brief survey, ML\cite{ML} was the first language to support parameteric polymorphism, but unlike \CFA{} does not support the use of assertions and traits to constrain type arguments.
 Haskell\cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}.
 Objective-C\cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently\cite{xcode7}, and it's garbage-collected, message-passing object-oriented model is a radical departure from C.
 Go\cite{Go}, and Rust\cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits, \emph{interfaces} in Go and \emph{traits} in Rust.
 Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall paramters.
+As a brief survey, ML \cite{ML} was the first language to support parametric polymorphism, but unlike \CFA{} does not support the use of assertions and traits to constrain type arguments.
+Haskell \cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}.
+Objective-C \cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently \cite{xcode7}, and its garbage-collected, message-passing object-oriented model is a radical departure from C.
+Go \cite{Go}, and Rust \cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits, \emph{interfaces} in Go and \emph{traits} in Rust.
+Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall parameters.
 Go does not, however, allow user code to define generic types, restricting Go programmers to the small set of generic types defined by the compiler.
 Rust has powerful abstractions for generic programming, including explicit implemenation of traits and options for both separately-compiled virtual dispatch and template-instantiated static dispatch in functions.
+Rust has powerful abstractions for generic programming, including explicit implementation of traits and options for both separately-compiled virtual dispatch and template-instantiated static dispatch in functions.
 On the other hand, the safety guarantees of Rust's \emph{lifetime} abstraction and borrow checker impose a distinctly idiosyncratic programming style and steep learning curve; \CFA{}, with its more modest safety features, allows direct ports of C code while maintaining the idiomatic style of the original source.
 …
 Like \CC{} template types, generic !struct!s and !union!s in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.
 The fact that the storage layout of any instantiation of a \CFA{} generic type is identical to that of the monomorphic type produced by simple macro replacement of the generic type parameters is important to provide consistent and predictable runtime performance, and to not impose any undue abstraction penalty on generic code.
+As an example, consider the following generic type and function \TODO{test this}:
+As an example, consider the following generic type and function:
+% TODO whatever the bug is with initializer-expressions not working, it affects this
 \begin{cfa}
 forall( otype R, otype S ) struct pair { R first; S second; };
 pair(const char*, int) with_len( const char* s ) {
         return (pair(const char*), int){ s, strlen(s) };
+        return (pair(const char*, int)){ s, strlen(s) };
+}
 \end{cfa}
 …
 If its return type was !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types.
 !with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate redeclaration and demangling flags.
 However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation\cite{Moss18}) and can be re-used for a wide variety of !pair! instantiations.
+!with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate re-declaration and demangling flags.
+However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation \cite{Schluntz17}) and can be re-used for a wide variety of !pair! instantiations.
 Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can eliminate any runtime overhead of this polymorphic call.
 \CFA{} deliberately does not support \CC{}-style partial specializations of generic types.
 A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bitstring rather than the array representation of the other !vector! instantiations.
+A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bit-string rather than the array representation of the other !vector! instantiations.
 Complications from this inconsistency (chiefly the fact that a single bit is not addressable, unlike an array element) make the \CC{} !vector! unpleasant to use in generic contexts due to the break in its public interface.
 Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more principled position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code.
 Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows one to be defined, and objects of types implementing that !trait! to be operated on in the same polymorphic functions.
+Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more consistent position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code.
+Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows such an interface to be defined, and objects of types implementing that !trait! to be operated on using the same polymorphic functions.
 Since \CFA{} polymorphic functions can operate over polymorphic generic types, functions over such types can be partially or completely specialized using the usual overload selection rules.
 …
 Instead, my design for generic type support in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters.
 A \emph{dtype-static} type has polymorphic parameters but is still concrete.
 Polymorphic pointers are an example of dtype-static types; given some type variable !T!, T is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
+Polymorphic pointers are an example of dtype-static types; given some type variable !T!, !T! is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
 In particular, generic types where all parameters are un-!sized! (\ie{} they do not conform to the built-in !sized! trait because the compiler does not know their size and alignment) are always concrete, as there is no possibility for their layout to vary based on type parameters of unknown size and alignment.
 More precisely, a type is concrete if and only if all of its !sized! type parameters are concrete, and a concrete type is dtype-static if any of its type parameters are (possibly recursively) polymorphic.
+To illustrate, the following code using the !pair! type from above \TODO{test this} has each use of !pair! commented with its class:
+To illustrate, the following code using the !pair! type from above has each use of !pair! commented with its class:
+% TODO constructor bugs here too
 \begin{cfa}
 //dynamic, layout varies based on T
 …
 Access to members of a dynamic structure is provided at runtime via base displacement addressing the structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime.
 the offset arrays are statically generated where possible.
+The offset arrays are statically generated where possible.
 If a dynamic generic type is passed or returned by value from a polymorphic function, \CFACC{} can safely assume that the generic type is complete (\ie{} has a known layout) at any call site, and the offset array is passed from the caller; if the generic type is concrete at the call site, the elements of this offset array can even be statically generated using the C !offsetof! macro.
 As an example, the body of the second !value! function above is implemented as
 …
 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
 !_offsetof_pair! is the offset array passed into !value!; this array is generated at the call site as
+!_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as
 \begin{cfa}
 …
 For instance, modularity is generally provided in C by including an opaque forward declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately-compiled \texttt{.c} file.
 \CFA{} supports this pattern for generic types, implying that the caller of a polymorphic function may not know the actual layout or size of a dynamic generic type and only holds it by pointer.
 \CFACC{} automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that functions's caller.
+\CFACC{} automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
 These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all !sized! parameters to the generic structure.
 Un!sized! parameters not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types.
+Un-!sized! parameters are not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types.
 Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller will not know how large to allocate the array of member offsets.
 The C standard does not specify a memory layout for structs, but the POSIX ABI for x86\cit{} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
+The C standard does not specify a memory layout for structs, but the POSIX ABI for x86 \cite{POSIX08} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
 This algorithm, sketched below in pseudo-\CFA{}, is a straightforward mapping of consecutive fields into the first properly-aligned offset in the !struct! layout; layout functions for !union! types omit the offset array and simply calculate the maximum size and alignment over all union variants.
 Since \CFACC{} generates a distinct layout function for each type, constant-folding and loop unrolling are applied.
 …
 Results of layout function calls are cached so that they are only computed once per type per function.
 Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implemenation-hiding constraint of the design.
+Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implementation-hiding constraint of the design.
 For instance, a function that strips duplicate values from an unsorted !list(T)! likely has a reference to the list as its only explicit parameter, but uses some sort of !set(T)! internally to test for duplicate values.
 This function could acquire the layout for !set(T)! by calling its layout function, providing as an argument the layout of !T! implicitly passed into that function.
 Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type paramters.
 This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of $Box(T)$ can be separately compiled, and callers from other translation units know the proper calling conventions to use.
+Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type parameters.
+This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of !Box(T)! can be separately compiled, and callers from other translation units know the proper calling conventions to use.
 In an alternate design where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout.
 However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and we judged preserving separate compilation (and the associated C compatibility) in the implemented design to be an acceptable trade-off.
 …
 \end{cfa}
 The four versions of the benchmark implemented are C with !void*!-based polymorphism, \CFA{} with parameteric polymorphism, \CC{} with templates, and \CC{} using only class inheritance for polymorphism, denoted \CCV{}.
+The four versions of the benchmark implemented are C with !void*!-based polymorphism, \CFA{} with parametric polymorphism, \CC{} with templates, and \CC{} using only class inheritance for polymorphism, denoted \CCV{}.
 The \CCV{} variant illustrates an alternative object-oriented idiom where all objects inherit from a base !object! class, mimicking a Java-like interface; in particular, runtime checks are necessary to safely downcast objects.
 The most notable difference among the implementations is the memory layout of generic types: \CFA{} and \CC{} inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV{} lack such capability and, instead, must store generic objects via pointers to separately allocated objects.
 …
 Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA{} and \CC{} code to 39 and 42 lines, respectively.
 The difference between the \CFA{} and \CC{} line counts is primarily declaration duplication to implement separate compilation; a header-only \CFA{} library is similar in length to the \CC{} version.
 On the other hand, due to the language shortcomings mentioned at the beginning of the chapter, C does not have a generic collections library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers.
+On the other hand, due to the language shortcomings mentioned at the beginning of the chapter, C does not have a generic collections library in its standard distribution, resulting in frequent re-implementation of such collection types by C programmers.
 \CCV{} does not use the \CC{} standard template library by construction, and, in fact, includes the definition of !object! and wrapper classes for !char!, !short!, and !int! in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library.
 I justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose.
 …
 Line count is a fairly rough measure of code complexity; another important factor is how much type information the programmer must specify manually, especially where that information is not type-checked.
 Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs and is much less common in \CFA{} than C, with its manually specified function pointer arguments and format codes, or \CCV{}, with its extensive use of un-type-checked downcasts, \eg{} !object! to !integer! when popping a stack.
 To quantify this manual typing, the ``redundant type annotations'' line in Table~\ref{generic-eval-table} counts the number of lines on which the known type of a variable is respecified, either as a format specifier, explicit downcast, type-specific function, or by name in a !sizeof!, !struct! literal, or !new! expression.
+To quantify this manual typing, the ``redundant type annotations'' line in Table~\ref{generic-eval-table} counts the number of lines on which the known type of a variable is re-specified, either as a format specifier, explicit downcast, type-specific function, or by name in a !sizeof!, !struct! literal, or !new! expression.
 The \CC{} benchmark uses two redundant type annotations to create new stack nodes, whereas the C and \CCV{} benchmarks have several such annotations spread throughout their code.
 The \CFA{} benchmark is able to eliminate \emph{all} redundant type annotations through use of the return-type polymorphic !alloc! function in the \CFA{} standard library.

doc/theses/aaron_moss_PhD/phd/introduction.tex

-              rf728971
+              ra2971cc
 Particular contributions of this work include design and implementation of
 parametric-polymorphic (``generic'') types in a manner compatible with the existing polymorphism design of \CFA{} (Chapter~\ref{generic-chap}), a type environment data structure based on a novel variant of the union-find algorithm (Chapter~\ref{env-chap}), and a new expression resolution algorithm designed to quickly locate the optimal declarations for a \CFA{} expression (Chapter~\ref{resolution-chap}).
+This expression resolution algorithm was designed with the aid of a purpose-built prototype system which encapsulates the essential aspects of the \CFA{} type system without incurring the technical debt of the existing system or the friction-inducing necessity of maintaining a working compiler; the goal of this prototype system was to discover effective heuristics to avoid performing unnecessary work in the process of locating the optimal \CFA{} expression resolution.
+This expression resolution algorithm was designed with the aid of a purpose-built prototype system which encapsulates the essential aspects of the \CFA{} type system without incurring the technical debt of the existing system or the friction-inducing necessity of maintaining a working compiler; the goal of this prototype system was to discover effective heuristics to avoid performing unnecessary work in the process of locating the optimal \CFA{} expression resolution.
+This prototype system (described in Chapter~\ref{expr-chap}) shows promise as a vehicle for future research in compiler design, and is itself a technical contribution of this thesis.
 Though the direction and validation of this work was fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
 In particular, with the addition of \emph{concepts} in \CCtwenty{}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used in the expression resolution algorithm presented here may prove useful.
+In particular, with the addition of \emph{concepts} in \CCtwenty{} \cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
 Type environments are also widely modelled in compiler implementations, particularly of functional languages, though also increasingly commonly in other languages (such as Rust) which perform type inference; the type environment presented here may be useful to those language implementers.

doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

rf728971	ra2971cc
22	22	\subsection{Conversion Cost} \label{conv-cost-sec}
23	23
24		C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions''\cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly.
	24	C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly.
25	25	The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
26	26	Since for backward-compatibility purposes the conversion costs of \CFA{} must produce an equivalent result to these common type rules, it is appropriate to summarize \cite[\S{}6.3.1.8]{C11} here:

Note: See TracChangeset for help on using the changeset viewer.

Download in other formats: