Ignore:
Timestamp:
Feb 20, 2019, 5:37:55 PM (5 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
f316c68
Parents:
f728971
Message:

thesis: first-pass edits to ch.1-3

Location:
doc/theses/aaron_moss_PhD/phd
Files:
4 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/background.tex

    rf728971 ra2971cc  
    55To provide background for the contributions in subsequent chapters, this chapter provides a summary of the features of \CFA{} at the time this work was conducted.
    66
    7 The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism}\cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
    8 Building on Ditchfield's design for contextual polymorphism as well as KW-C\cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson\cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
     7The core design of \CFA{} is laid out in Glen Ditchfield's 1992 PhD thesis, \emph{Contextual Polymorphism} \cite{Ditchfield92}; in that thesis, Ditchfield presents the theoretical underpinnings of the \CFA{} polymorphism model.
     8Building on Ditchfield's design for contextual polymorphism as well as KW-C \cite{Buhr94a}, an earlier set of (largely syntactic) extensions to C, Richard Bilson \cite{Bilson03} built the first version of the \CFA{} compiler, \CFACC{}, in the early 2000's.
    99This early \CFACC{} provided basic functionality, but incorporated a number of poor algorithmic choices due to a rushed implementation time frame, and as such lacked the runtime performance required for practical use; this thesis is substantially concerned with rectifying those deficits.
    1010
    1111The \CFA{} project was revived in 2015 with the intention of building a production-ready language and compiler; at the time of this writing, both \CFA{} and \CFACC{} have been under active development continuously since.
    12 As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal\cite{Moss18} provides a reasonable summary of the current design.
    13 Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors\cite{Schluntz17}, improved support for tuples\cite{Schluntz17}, reference types\cite{Moss18}, first-class concurrent and parallel programming support\cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library\cite{Moss18}.
     12As this development has been proceeding concurrently with the work described in this thesis, the state of \CFA{} has been somewhat of a moving target; however, Moss~\etal \cite{Moss18} provides a reasonable summary of the current design.
     13Notable features added during this period include generic types (Chapter~\ref{generic-chap}), constructors and destructors \cite{Schluntz17}, improved support for tuples \cite{Schluntz17}, reference types \cite{Moss18}, first-class concurrent and parallel programming support \cite{Delisle18}, as well as numerous pieces of syntactic sugar and the start of an idiomatic standard library \cite{Moss18}.
    1414
    1515\section{\CFA{} Features}
     
    2424This choice is in marked contrast to \CC{}, which, though it has backward-compatibility with C on the source code level, is a much larger and more complex language, and requires extensive developer re-training to write idiomatic, efficient code in \CC{}'s object-oriented paradigm.
    2525
    26 \CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions''; while these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
     26\CFA{} does have a system of implicit type conversions derived from C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11}; while these conversions may be thought of as something like an inheritance hierarchy, the underlying semantics are significantly different and such an analogy is loose at best.
    2727Particularly, \CFA{} has no concept of \emph{subclass}, and thus no need to integrate an inheritance-based form of polymorphism with its parametric and overloading-based polymorphism.
    28 The graph structure of the \CFA{} type conversions is also markedly different than an inheritance hierarchy; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance hierarchies.
     28The graph structure of the \CFA{} type conversions (discussed in Section~\ref{conv-cost-sec}) is also markedly different than an inheritance hierarchy; it has neither a top nor a bottom type, and does not satisfy the lattice properties typical of inheritance hierarchies.
    2929
    3030Another aspect of \CFA{}'s procedural paradigm is that it retains C's translation-unit-based encapsulation model, rather than class-based encapsulation such as in \CC{}.
     
    3333\subsection{Name Overloading} \label{overloading-sec}
    3434
    35 In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding \lstinline{struct}, \lstinline{union}, and \lstinline{enum} tags, one holding labels, one holding \lstinline{typedef} names, variable, function, and enumerator identifiers, and one for each \lstinline{struct} and \lstinline{union} type holding the field names\cit{}.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
     35In C, no more than one variable or function in the same scope may share the same name\footnote{Technically, C has multiple separated namespaces, one holding \lstinline{struct}, \lstinline{union}, and \lstinline{enum} tags, one holding labels, one holding \lstinline{typedef} names, variable, function, and enumerator identifiers, and one for each \lstinline{struct} and \lstinline{union} type holding the field names \cite[\S{}6.2.3]{C11}.}, and variable or function declarations in inner scopes with the same name as a declaration in an outer scope hide the outer declaration.
    3636This restriction makes finding the proper declaration to match to a variable expression or function application a simple matter of symbol-table lookup, which can be easily and efficiently implemented.
    3737\CFA{}, on the other hand, allows overloading of variable and function names so long as the overloaded declarations do not have the same type, avoiding the multiplication of variable and function names for different types common in the C standard library, as in the following example:
     
    5252
    5353While the wisdom of giving both the maximum value of a type and the function to take the maximum of two values the same name is debatable, \eg{} some developers may prefer !MAX! for the former, the guiding philosophy of \CFA{} is ``describe, don't prescribe'' --- we prefer to trust programmers with powerful tools, and there is no technical reason to restrict overloading between variables and functions.
    54 However, the expressivity of \CFA{}'s name overloading has the consequence that simple table lookup is insufficient to match identifiers to declarations, and a type-matching algorithm must be part of expression resolution.
     54However, the expressivity of \CFA{}'s name overloading does have the consequence that simple table lookup is insufficient to match identifiers to declarations, and a type-matching algorithm must be part of expression resolution.
    5555
    5656\subsubsection{Operator Overloading}
     
    7272\end{cfa}
    7373
    74 Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions''\cit{}, so as to present user programmers with only a single set of overloading rules.
     74Together, \CFA{}'s backward-compatibility with C and the inclusion of this operator overloading feature imply that \CFA{} must select among function overloads using a method compatible with C's ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11}, so as to present user programmers with only a single set of overloading rules.
    7575
    7676\subsubsection{Special Literal Types}
     
    7979\CFA{} provides a special type for the !0! literal, !zero_t!, so that users can define a zero value for their own types without being forced to create a conversion from an integer or pointer type (though \CFA{} also includes implicit conversions from !zero_t! to the integer and pointer types for backward compatibility).
    8080
    81 According to the C standard\cit{}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
    82 By this rule, boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
    83 \CFACC{} applies this rewriting in all boolean contexts, so any type !T! can be made ``truthy'' (that is, given a boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}; unlike \CC{} prior to the addition of explicit casts in \CCeleven{}, this design does not add comparability or convertablity to arbitrary integer types.
     81According to the C standard \cite[\S{}6.8.4.1]{C11}, !0! is the only false value; any value that compares equal to zero is false, while any value that does not is true.
     82By this rule, Boolean contexts such as !if ( x )! can always be equivalently rewritten as \lstinline{if ( (x) != 0 )}.
     83\CFACC{} applies this rewriting in all Boolean contexts, so any type !T! can be made ``truthy'' (that is, given a Boolean interpretation) in \CFA{} by defining an operator overload \lstinline{int ?!=?(T, zero_t)}; unlike \CC{} prior to the addition of explicit casts in \CCeleven{}, this design does not add comparability or convertibility to arbitrary integer types.
    8484
    8585\CFA{} also includes a special type for !1!, !one_t!; like !zero_t!, !one_t! has built-in implicit conversions to the various integral types so that !1! maintains its expected semantics in legacy code.
     
    8787
    8888\CFA{} previously allowed !0! and !1! to be the names of polymorphic variables, with separate overloads for !int 0!, !int 1!, and !forall(dtype T) T* 0!.
    89 As revealed in my own work on generic types (Chapter~\ref{generic-chap}), the parameteric polymorphic zero variable was not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
     89As revealed in the work presented here on generic types (Chapter~\ref{generic-chap}), the parametric polymorphic zero variable was not generalizable to other types; though all null pointers have the same in-memory representation, the same cannot be said of the zero values of arbitrary types.
    9090As such, variables that could represent !0! and !1! were phased out in favour of functions that could generate those values for a given type as appropriate.
    9191
     
    104104\CFA{} passes the size and alignment of the type represented by an !otype! parameter, as well as a default constructor, copy constructor, assignment operator, and destructor.
    105105Types which do not have one or more of these operators visible cannot be bound to !otype! parameters.
    106 In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; experiments have shown this overhead to be similar to \CC{} virtual function calls. \TODO{rerun experiments, possibly look at vtable variant}
     106In this design, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; the experiments in Chapter~\ref{generic-chap} indicate that this overhead is comparable to that of \CC{} virtual function calls.
     107% \TODO{rerun experiments, possibly look at vtable variant}
    107108
    108109One benefit of this design is that it allows polymorphic functions to be separately compiled.
     
    139140If a polymorphic function can be used to satisfy one of its own type assertions, this recursion may not terminate, as it is possible that that function is examined as a candidate for its own assertion unboundedly repeatedly.
    140141To avoid such infinite loops, \CFACC{} imposes a fixed limit on the possible depth of recursion, similar to that employed by most \CC{} compilers for template expansion; this restriction means that there are some semantically well-typed expressions that cannot be resolved by \CFACC{}.
    141 \TODO{Update this with final state} One contribution made in the course of this thesis was modifying \CFACC{} to use the more flexible expression resolution algorithm for assertion matching, rather than the simpler but limited previous approach of unification on the types of the functions.
    142 
    143 \subsubsection{Deleted Declarations}
    144 
    145 Particular type combinations can be exempted from matching a given polymorphic function through use of a \emph{deleted function declaration}:
    146 
    147 \begin{cfa}
    148 int somefn(char) = void;
    149 \end{cfa}
    150 
    151 This feature is based on a \CCeleven{} feature typically used to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC{}, a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC{}-specific rules about private members and implicitly-generated functions; the deleted function form is both clearer and less verbose.} or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC{} versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers. A similar effect can be produced on an ad-hoc basis at the appropriate call sites through use of casts to determine the function type. In both cases, the deleted-function form is clearer and more concise.}.
    152 Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
    153 If this deleted declaration is selected as the unique minimal-cost interpretation of an expression than an error is produced. \TODO{Check this is implemented at print.}
     142
     143% \subsubsection{Deleted Declarations}
     144
     145% Particular type combinations can be exempted from matching a given polymorphic function through use of a \emph{deleted function declaration}:
     146
     147% \begin{cfa}
     148% int somefn(char) = void;
     149% \end{cfa}
     150
     151% This feature is based on a \CCeleven{} feature typically used to make a type non-copyable by deleting its copy constructor and assignment operator\footnote{In previous versions of \CC{}, a type could be made non-copyable by declaring a private copy constructor and assignment operator, but not defining either. This idiom is well-known, but depends on some rather subtle and \CC{}-specific rules about private members and implicitly-generated functions; the deleted function form is both clearer and less verbose.} or forbidding some interpretations of a polymorphic function by specifically deleting the forbidden overloads\footnote{Specific polymorphic function overloads can also be forbidden in previous \CC{} versions through use of template metaprogramming techniques, though this advanced usage is beyond the skills of many programmers. A similar effect can be produced on an ad-hoc basis at the appropriate call sites through use of casts to determine the function type. In both cases, the deleted-function form is clearer and more concise.}.
     152% Deleted function declarations are implemented in \CFACC{} by adding them to the symbol table as usual, but with a flag set that indicates that the function is deleted.
     153% If this deleted declaration is selected as the unique minimal-cost interpretation of an expression than an error is produced.
    154154
    155155\subsubsection{Traits}
    156156
    157 \CFA{} provides \emph{traits} as a means to name a group of type assertions, as in the example below\footnote{This example uses \CFA{}'s reference types, constructors, and zero type, described in Section~\ref{type-features-sec}.}:
     157\CFA{} provides \emph{traits} as a means to name a group of type assertions, as in the example below\footnote{This example uses \CFA{}'s reference types and constructors, described in Section~\ref{type-features-sec}.}:
    158158
    159159\begin{cfa}
     
    212212
    213213The flexibility of \CFA{}'s implicit trait-satisfaction mechanism provides programmers with a great deal of power, but also blocks some optimization approaches for expression resolution.
    214 The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgements difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
     214The ability of types to begin or cease to satisfy traits when declarations go into or out of scope makes caching of trait satisfaction judgments difficult, and the ability of traits to take multiple type parameters can lead to a combinatorial explosion of work in any attempt to pre-compute trait satisfaction relationships.
    215215
    216216\subsection{Implicit Conversions} \label{implicit-conv-sec}
    217217
    218218In addition to the multiple interpretations of an expression produced by name overloading and polymorphic functions, for backward compatibility \CFA{} must support all of the implicit conversions present in C, producing further candidate interpretations for expressions.
    219 As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions'\cit{} define which of the built-in types are implicitly convertible to which other types, and the relative cost of any pair of such conversions from a single source type.
     219As mentioned above, C does not have an inheritance hierarchy of types, but the C standard's rules for the ``usual arithmetic conversions' \cite[\S{}6.3.1.8]{C11} define which of the built-in types are implicitly convertible to which other types, as well as which implicit conversions to apply for mixed arguments to binary operators.
    220220\CFA{} adds rules to the usual arithmetic conversions defining the cost of binding a polymorphic type variable in a function call; such bindings are cheaper than any \emph{unsafe} (narrowing) conversion, \eg{} !int! to !char!, but more expensive than any \emph{safe} (widening) conversion, \eg{} !int! to !double!.
    221 One contribution of this thesis, discussed in Section \TODO{add to resolution chapter}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
     221One contribution of this thesis, discussed in Section~\ref{conv-cost-sec}, is a number of refinements to this cost model to more efficiently resolve polymorphic function calls.
    222222
    223223The expression resolution problem which is the focus of Chapter~\ref{resolution-chap} is to find the unique minimal-cost interpretation of each expression in the program, where all identifiers must be matched to a declaration, and implicit conversions or polymorphic bindings of the result of an expression may increase the cost of the expression.
     
    235235\subsubsection{Reference Types}
    236236
    237 One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz\cite{Schluntz17}.
     237One of the key ergonomic improvements in \CFA{} is reference types, designed and implemented by Robert Schluntz \cite{Schluntz17}.
    238238Given some type !T!, a !T&! (``reference to !T!'') is essentially an automatically dereferenced pointer.
    239239These types allow seamless pass-by-reference for function parameters, without the extraneous dereferencing syntax present in C; they also allow easy easy aliasing of nested values with a similarly convenient syntax.
     
    241241
    242242The C standard makes heavy use of the concept of \emph{lvalue}, an expression with a memory address; its complement, \emph{rvalue} (a non-addressable expression) is not explicitly named.
    243 In \CFA{}, the distinction between lvalue and rvalue can be reframed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
     243In \CFA{}, the distinction between lvalue and rvalue can be re-framed in terms of reference and non-reference types, with the benefit of being able to express the difference in user code.
    244244\CFA{} references preserve the existing qualifier-dropping implicit lvalue-to-rvalue conversion from C (\eg{} a !const volatile int&! can be implicitly copied to a bare !int!)
    245245To make reference types more easily usable in legacy pass-by-value code, \CFA{} also adds an implicit rvalue-to-lvalue conversion, implemented by storing the value in a fresh compiler-generated temporary variable and passing a reference to that temporary.
     
    260260The primary issue with \CC{} references is that it is impossible to extract the address of the reference variable rather than the address of the referred-to variable.
    261261This breaks a number of the usual compositional properties of the \CC{} type system, \eg{} a reference cannot be re-bound to another variable, nor is it possible to take a pointer to, array of, or reference to a reference.
    262 \CFA{} supports all of these use cases \TODO{test array} without further added syntax.
     262\CFA{} supports all of these use cases without further added syntax.
    263263The key to this syntax-free feature support is an observation made by the author that the address of a reference is a lvalue.
    264264In C, the address-of operator !&x! can only be applied to lvalue expressions, and always produces an immutable rvalue; \CFA{} supports reference re-binding by assignment to the address of a reference, and pointers to references by repeating the address-of operator:
     
    275275\subsubsection{Resource Management}
    276276
    277 \CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz\cite{Schluntz17}.
     277\CFA{} also supports the RAII (``Resource Acquisition is Initialization'') idiom originated by \CC{}, thanks to the object lifetime work of Robert Schluntz \cite{Schluntz17}.
    278278This idiom allows a safer and more principled approach to resource management by tying acquisition of a resource to object initialization, with the corresponding resource release executed automatically at object finalization.
    279279A wide variety of conceptual resources may be conveniently managed by this scheme, including heap memory, file handles, and software locks.
     
    332332The implementation of tuples in \CFACC{}'s code generation is based on the generic types introduced in Chapter~\ref{generic-chap}, with one compiler-generated generic type for each tuple arity.
    333333This allows tuples to take advantage of the same runtime optimizations available to generic types, while reducing code bloat.
    334 An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example shows the basics:
     334An extended presentation of the tuple features of \CFA{} can be found in \cite{Moss18}, but the following example demonstrates the basic features:
    335335
    336336\begin{cfa}
     
    355355This precludes some argument-parameter matching strategies for expression resolution, as well as cheap interpretation filters based on comparing number of parameters and arguments.
    356356As an example, in the call to !swap( x, x )! above, the second !x! can be resolved starting at the second or third parameter of !swap!, depending which interpretation of !x! was chosen for the first argument.
     357
     358\section{Conclusion}
     359
     360\CFA{} adds a significant number of features to standard C, increasing the expressivity and re-usability of \CFA{} code while maintaining backwards compatibility for both code and larger language paradigms.
     361This flexibility does incur significant added compilation costs, however, the mitigation of which are the primary concern of this thesis.
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    rf728971 ra2971cc  
    1414\begin{figure}
    1515        \begin{cfa}
     16                #include <stdlib.h> $\C{// for malloc}$
     17                #include <stdio.h>  $\C{// for printf}$
     18
    1619                struct int_list { int value; struct int_list* next; };
    1720
     
    5558\begin{figure}
    5659        \begin{cfa}
     60                #include <stdlib.h> $\C{// for malloc}$
     61                #include <stdio.h>  $\C{// for printf}$
     62
    5763                // single code implementation
    5864
     
    96102\begin{figure}
    97103        \begin{cfa}
     104                #include <stdlib.h> $\C{// for malloc}$
     105                #include <stdio.h>  $\C{// for printf}$
     106
    98107                $\C[\textwidth]{// code is nested in macros}$
    99108
     
    138147\CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation.
    139148A generic type can be declared in \CFA{} by placing a !forall! specifier on a !struct! or !union! declaration, and instantiated using a parenthesized list of types after the generic name.
    140 An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig} \TODO{test this code}.
     149An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig}.
    141150
    142151\begin{figure}
    143152        \begin{cfa}
     153                #include <stdlib.hfa> $\C{// for alloc}$
     154                #include <stdio.h>  $\C{// for printf}$
     155
    144156                forall(otype T) struct list { T value; list(T)* next; };
    145157
     
    175187Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism presented some unique design constraints for this project.
    176188The guiding principle was to maintain an unsurprising language model for C programmers without compromising runtime efficiency.
    177 A key insight for this design was that C already possesses a handful of built-in generic types (\emph{compound types} in the language of the standard\cit{}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
     189A key insight for this design was that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
    178190
    179191\subsection{Related Work}
    180192
    181 One approach to the design of generic types is that taken by \CC{} templates\cite{C++}.
     193One approach to the design of generic types is that taken by \CC{} templates \cite{C++}.
    182194The template approach is closely related to the macro-expansion approach to C polymorphism demonstrated in Figure~\ref{macro-generic-fig}, but where the macro-expansion syntax has been given first-class language support.
    183195Template expansion has the benefit of generating code with near-optimal runtime efficiency, as distinct optimizations can be applied for each instantiation of the template.
    184 On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case\cit{}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.
     196On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case \cite{Haberman16}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.
    185197The most significant restriction of the \CC{} template model is that it breaks separate compilation and C's translation-unit-based encapsulation mechanisms.
    186198Because a \CC{} template is not actually code, but rather a sort of ``recipe'' to generate code, template code must be visible at its call site to be used.
    187 Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts\cite{C++Concepts} are standardized in \CCtwenty{}.
     199Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts \cite{C++Concepts} are standardized in \CCtwenty{}.
    188200C code, by contrast, only needs a !struct! or function declaration to call that function or use (by-pointer) values of that type, a desirable property to maintain for \CFA{}.
    189201
    190 Java\cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
     202Java \cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
    191203The Java approach has much more in common with the !void*!-polymorphism shown in Figure~\ref{void-generic-fig}; since in Java nearly all data is stored by reference, the Java approach to polymorphic data is to store pointers to arbitrary data and insert type-checked implicit casts at compile-time.
    192204This process of \emph{type erasure} has the benefit of allowing a single instantiation of polymorphic code, but relies heavily on Java's object model and garbage collector.
    193205To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward.
    194206
    195 Cyclone\cite{Grossman06} is another language extending C, and also provides capabilities for polymorphic functions and existential types, similar to \CFA{}'s !forall! functions and generic types.
     207Cyclone \cite{Grossman06} is another language extending C, and also provides capabilities for polymorphic functions and existential types, similar to \CFA{}'s !forall! functions and generic types.
    196208Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is tedious and error-prone compared to \CFA{}'s implicit assertion satisfaction.
    197209Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as !void*!, \ie{} only pointer types and !int!.
     
    200212
    201213Many other languages include some form of generic types.
    202 As a brief survey, ML\cite{ML} was the first language to support parameteric polymorphism, but unlike \CFA{} does not support the use of assertions and traits to constrain type arguments.
    203 Haskell\cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}.
    204 Objective-C\cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently\cite{xcode7}, and it's garbage-collected, message-passing object-oriented model is a radical departure from C.
    205 Go\cite{Go}, and Rust\cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits, \emph{interfaces} in Go and \emph{traits} in Rust.
    206 Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall paramters.
     214As a brief survey, ML \cite{ML} was the first language to support parametric polymorphism, but unlike \CFA{} does not support the use of assertions and traits to constrain type arguments.
     215Haskell \cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}.
     216Objective-C \cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently \cite{xcode7}, and its garbage-collected, message-passing object-oriented model is a radical departure from C.
     217Go \cite{Go}, and Rust \cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits, \emph{interfaces} in Go and \emph{traits} in Rust.
     218Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall parameters.
    207219Go does not, however, allow user code to define generic types, restricting Go programmers to the small set of generic types defined by the compiler.
    208 Rust has powerful abstractions for generic programming, including explicit implemenation of traits and options for both separately-compiled virtual dispatch and template-instantiated static dispatch in functions.
     220Rust has powerful abstractions for generic programming, including explicit implementation of traits and options for both separately-compiled virtual dispatch and template-instantiated static dispatch in functions.
    209221On the other hand, the safety guarantees of Rust's \emph{lifetime} abstraction and borrow checker impose a distinctly idiosyncratic programming style and steep learning curve; \CFA{}, with its more modest safety features, allows direct ports of C code while maintaining the idiomatic style of the original source.
    210222
     
    214226Like \CC{} template types, generic !struct!s and !union!s in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.
    215227The fact that the storage layout of any instantiation of a \CFA{} generic type is identical to that of the monomorphic type produced by simple macro replacement of the generic type parameters is important to provide consistent and predictable runtime performance, and to not impose any undue abstraction penalty on generic code.
    216 As an example, consider the following generic type and function \TODO{test this}:
    217 
     228As an example, consider the following generic type and function:
     229
     230% TODO whatever the bug is with initializer-expressions not working, it affects this
    218231\begin{cfa}
    219232forall( otype R, otype S ) struct pair { R first; S second; };
    220233
    221234pair(const char*, int) with_len( const char* s ) {
    222         return (pair(const char*), int){ s, strlen(s) };
     235        return (pair(const char*, int)){ s, strlen(s) };
    223236}
    224237\end{cfa}
     
    227240If its return type was !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types.
    228241
    229 !with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate redeclaration and demangling flags.
    230 However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation\cite{Moss18}) and can be re-used for a wide variety of !pair! instantiations.
     242!with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate re-declaration and demangling flags.
     243However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation \cite{Schluntz17}) and can be re-used for a wide variety of !pair! instantiations.
    231244Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can eliminate any runtime overhead of this polymorphic call.
    232245
    233246\CFA{} deliberately does not support \CC{}-style partial specializations of generic types.
    234 A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bitstring rather than the array representation of the other !vector! instantiations.
     247A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bit-string rather than the array representation of the other !vector! instantiations.
    235248Complications from this inconsistency (chiefly the fact that a single bit is not addressable, unlike an array element) make the \CC{} !vector! unpleasant to use in generic contexts due to the break in its public interface.
    236 Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more principled position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code.
    237 Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows one to be defined, and objects of types implementing that !trait! to be operated on in the same polymorphic functions.
     249Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more consistent position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code.
     250Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows such an interface to be defined, and objects of types implementing that !trait! to be operated on using the same polymorphic functions.
    238251
    239252Since \CFA{} polymorphic functions can operate over polymorphic generic types, functions over such types can be partially or completely specialized using the usual overload selection rules.
     
    262275Instead, my design for generic type support in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters.
    263276A \emph{dtype-static} type has polymorphic parameters but is still concrete.
    264 Polymorphic pointers are an example of dtype-static types; given some type variable !T!, T is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
     277Polymorphic pointers are an example of dtype-static types; given some type variable !T!, !T! is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
    265278In particular, generic types where all parameters are un-!sized! (\ie{} they do not conform to the built-in !sized! trait because the compiler does not know their size and alignment) are always concrete, as there is no possibility for their layout to vary based on type parameters of unknown size and alignment.
    266279More precisely, a type is concrete if and only if all of its !sized! type parameters are concrete, and a concrete type is dtype-static if any of its type parameters are (possibly recursively) polymorphic.
    267 To illustrate, the following code using the !pair! type from above \TODO{test this} has each use of !pair! commented with its class:
    268 
     280To illustrate, the following code using the !pair! type from above has each use of !pair! commented with its class:
     281
     282% TODO constructor bugs here too
    269283\begin{cfa}
    270284//dynamic, layout varies based on T
     
    310324Access to members of a dynamic structure is provided at runtime via base displacement addressing the structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime.
    311325
    312 the offset arrays are statically generated where possible.
     326The offset arrays are statically generated where possible.
    313327If a dynamic generic type is passed or returned by value from a polymorphic function, \CFACC{} can safely assume that the generic type is complete (\ie{} has a known layout) at any call site, and the offset array is passed from the caller; if the generic type is concrete at the call site, the elements of this offset array can even be statically generated using the C !offsetof! macro.
    314328As an example, the body of the second !value! function above is implemented as
     
    319333
    320334Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
    321 !_offsetof_pair! is the offset array passed into !value!; this array is generated at the call site as
     335!_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as
    322336
    323337\begin{cfa}
     
    330344For instance, modularity is generally provided in C by including an opaque forward declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately-compiled \texttt{.c} file.
    331345\CFA{} supports this pattern for generic types, implying that the caller of a polymorphic function may not know the actual layout or size of a dynamic generic type and only holds it by pointer.
    332 \CFACC{} automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that functions's caller.
     346\CFACC{} automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
    333347These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all !sized! parameters to the generic structure.
    334 Un!sized! parameters not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types.
     348Un-!sized! parameters are not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types.
    335349Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller will not know how large to allocate the array of member offsets.
    336350
    337 The C standard does not specify a memory layout for structs, but the POSIX ABI for x86\cit{} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
     351The C standard does not specify a memory layout for structs, but the POSIX ABI for x86 \cite{POSIX08} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
    338352This algorithm, sketched below in pseudo-\CFA{}, is a straightforward mapping of consecutive fields into the first properly-aligned offset in the !struct! layout; layout functions for !union! types omit the offset array and simply calculate the maximum size and alignment over all union variants.
    339353Since \CFACC{} generates a distinct layout function for each type, constant-folding and loop unrolling are applied.
     
    361375
    362376Results of layout function calls are cached so that they are only computed once per type per function.
    363 Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implemenation-hiding constraint of the design.
     377Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implementation-hiding constraint of the design.
    364378For instance, a function that strips duplicate values from an unsorted !list(T)! likely has a reference to the list as its only explicit parameter, but uses some sort of !set(T)! internally to test for duplicate values.
    365379This function could acquire the layout for !set(T)! by calling its layout function, providing as an argument the layout of !T! implicitly passed into that function.
    366380
    367 Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type paramters.
    368 This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of $Box(T)$ can be separately compiled, and callers from other translation units know the proper calling conventions to use.
     381Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type parameters.
     382This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of !Box(T)! can be separately compiled, and callers from other translation units know the proper calling conventions to use.
    369383In an alternate design where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout.
    370384However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and we judged preserving separate compilation (and the associated C compatibility) in the implemented design to be an acceptable trade-off.
     
    435449\end{cfa}
    436450
    437 The four versions of the benchmark implemented are C with !void*!-based polymorphism, \CFA{} with parameteric polymorphism, \CC{} with templates, and \CC{} using only class inheritance for polymorphism, denoted \CCV{}.
     451The four versions of the benchmark implemented are C with !void*!-based polymorphism, \CFA{} with parametric polymorphism, \CC{} with templates, and \CC{} using only class inheritance for polymorphism, denoted \CCV{}.
    438452The \CCV{} variant illustrates an alternative object-oriented idiom where all objects inherit from a base !object! class, mimicking a Java-like interface; in particular, runtime checks are necessary to safely downcast objects.
    439453The most notable difference among the implementations is the memory layout of generic types: \CFA{} and \CC{} inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV{} lack such capability and, instead, must store generic objects via pointers to separately allocated objects.
     
    475489Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA{} and \CC{} code to 39 and 42 lines, respectively.
    476490The difference between the \CFA{} and \CC{} line counts is primarily declaration duplication to implement separate compilation; a header-only \CFA{} library is similar in length to the \CC{} version.
    477 On the other hand, due to the language shortcomings mentioned at the beginning of the chapter, C does not have a generic collections library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers.
     491On the other hand, due to the language shortcomings mentioned at the beginning of the chapter, C does not have a generic collections library in its standard distribution, resulting in frequent re-implementation of such collection types by C programmers.
    478492\CCV{} does not use the \CC{} standard template library by construction, and, in fact, includes the definition of !object! and wrapper classes for !char!, !short!, and !int! in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library.
    479493I justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose.
     
    481495Line count is a fairly rough measure of code complexity; another important factor is how much type information the programmer must specify manually, especially where that information is not type-checked.
    482496Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs and is much less common in \CFA{} than C, with its manually specified function pointer arguments and format codes, or \CCV{}, with its extensive use of un-type-checked downcasts, \eg{} !object! to !integer! when popping a stack.
    483 To quantify this manual typing, the ``redundant type annotations'' line in Table~\ref{generic-eval-table} counts the number of lines on which the known type of a variable is respecified, either as a format specifier, explicit downcast, type-specific function, or by name in a !sizeof!, !struct! literal, or !new! expression.
     497To quantify this manual typing, the ``redundant type annotations'' line in Table~\ref{generic-eval-table} counts the number of lines on which the known type of a variable is re-specified, either as a format specifier, explicit downcast, type-specific function, or by name in a !sizeof!, !struct! literal, or !new! expression.
    484498The \CC{} benchmark uses two redundant type annotations to create new stack nodes, whereas the C and \CCV{} benchmarks have several such annotations spread throughout their code.
    485499The \CFA{} benchmark is able to eliminate \emph{all} redundant type annotations through use of the return-type polymorphic !alloc! function in the \CFA{} standard library.
  • doc/theses/aaron_moss_PhD/phd/introduction.tex

    rf728971 ra2971cc  
    3030Particular contributions of this work include design and implementation of
    3131parametric-polymorphic (``generic'') types in a manner compatible with the existing polymorphism design of \CFA{} (Chapter~\ref{generic-chap}), a type environment data structure based on a novel variant of the union-find algorithm (Chapter~\ref{env-chap}), and a new expression resolution algorithm designed to quickly locate the optimal declarations for a \CFA{} expression (Chapter~\ref{resolution-chap}).
    32 This expression resolution algorithm was designed with the aid of a purpose-built prototype system which encapsulates the essential aspects of the \CFA{} type system without incurring the technical debt of the existing system or the friction-inducing necessity of maintaining a working compiler; the goal of this prototype system was to discover effective heuristics to avoid performing unnecessary work in the process of locating the optimal \CFA{} expression resolution.
     32This expression resolution algorithm was designed with the aid of a purpose-built prototype system which encapsulates the essential aspects of the \CFA{} type system without incurring the technical debt of the existing system or the friction-inducing necessity of maintaining a working compiler; the goal of this prototype system was to discover effective heuristics to avoid performing unnecessary work in the process of locating the optimal \CFA{} expression resolution.
     33This prototype system (described in Chapter~\ref{expr-chap}) shows promise as a vehicle for future research in compiler design, and is itself a technical contribution of this thesis.
    3334
    3435Though the direction and validation of this work was fairly narrowly focused on the \CFA{} programming language, the tools used and results obtained should be of interest to a wider compiler and programming language design community.
    35 In particular, with the addition of \emph{concepts} in \CCtwenty{}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used in the expression resolution algorithm presented here may prove useful.
     36In particular, with the addition of \emph{concepts} in \CCtwenty{} \cite{C++Concepts}, conforming \CC{} compilers must support a model of type assertions very similar to that in \CFA{}, and the algorithmic techniques used here may prove useful.
    3637Type environments are also widely modelled in compiler implementations, particularly of functional languages, though also increasingly commonly in other languages (such as Rust) which perform type inference; the type environment presented here may be useful to those language implementers.
  • doc/theses/aaron_moss_PhD/phd/resolution-heuristics.tex

    rf728971 ra2971cc  
    2222\subsection{Conversion Cost} \label{conv-cost-sec}
    2323
    24 C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions''\cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly.
     24C does not have an explicit cost model for implicit conversions, but the ``usual arithmetic conversions'' \cite[\S{}6.3.1.8]{C11} used to decide which arithmetic operators to use define one implicitly.
    2525The only context in which C has name overloading is the arithmetic operators, and the usual arithmetic conversions define a \emph{common type} for mixed-type arguments to binary arithmetic operators.
    2626Since for backward-compatibility purposes the conversion costs of \CFA{} must produce an equivalent result to these common type rules, it is appropriate to summarize \cite[\S{}6.3.1.8]{C11} here:
Note: See TracChangeset for help on using the changeset viewer.