Changeset 4eaefd1 for doc


Ignore:
Timestamp:
Feb 27, 2019, 10:26:18 PM (6 years ago)
Author:
Aaron Moss <a3moss@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
d1b1063
Parents:
4cdfcbd
Message:

thesis: second draft to end of 3.2

Location:
doc/theses/aaron_moss_PhD/phd
Files:
2 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/background.tex

    r4cdfcbd r4eaefd1  
    115115\subsection{Type Assertions}
    116116
    117 Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{Subscript not included in source code}:
     117Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{Subscript not included in source code.\label{sub-foot}}:
    118118
    119119\begin{cfa}
     
    129129
    130130Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions.
    131 For instance, !twice! could have been defined like this\footnotemark[5]:
     131For instance, !twice! could have been defined like this\footref{sub-foot}:
    132132
    133133\begin{cfa}
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    r4cdfcbd r4eaefd1  
    77While this approach is flexible and supports integration with the C type checker and tooling, it is also tedious and error prone, especially for more complex data structures.
    88A second approach is to use !void*!-based polymorphism, \eg{} the C standard library functions !bsearch! and !qsort!, which allow for the reuse of common functionality.
    9 However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise not needed.
     9However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary.
    1010A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type checked, but errors in such code may be difficult to locate and debug.
    1111Furthermore, writing and using preprocessor macros is unnatural and inflexible.
    12 Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*!- and !#define!-based polymorphism, respectively.
     12Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*! and !#define!-based polymorphism, respectively.
    1313
    1414\begin{figure}
     
    185185\section{Design}
    186186
    187 Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism presented some unique design constraints for this project.
    188 The guiding principle was to maintain an unsurprising language model for C programmers without compromising runtime efficiency.
    189 A key insight for this design was that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
     187Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism present some unique design constraints for \CFA{} generics.
     188The guiding principle is to maintain an unsurprising language model for C programmers without compromising runtime efficiency.
     189A key insight for this design is that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
    190190
    191191\subsection{Related Work}
     
    194194The template approach is closely related to the macro-expansion approach to C polymorphism demonstrated in Figure~\ref{macro-generic-fig}, but where the macro-expansion syntax has been given first-class language support.
    195195Template expansion has the benefit of generating code with near-optimal runtime efficiency, as distinct optimizations can be applied for each instantiation of the template.
    196 On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case \cite{Haberman16}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.
     196On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case \cite{Haberman16}, and the costs of increased compilation time and instruction cache pressure cannot be ignored.
    197197The most significant restriction of the \CC{} template model is that it breaks separate compilation and C's translation-unit-based encapsulation mechanisms.
    198 Because a \CC{} template is not actually code, but rather a sort of ``recipe'' to generate code, template code must be visible at its call site to be used.
     198Because a \CC{} template is not actually code, but rather a ``recipe'' to generate code, template code must be visible at its call site to be used.
    199199Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts \cite{C++Concepts} are standardized in \CCtwenty{}.
    200 C code, by contrast, only needs a !struct! or function declaration to call that function or use (by-pointer) values of that type, a desirable property to maintain for \CFA{}.
     200C code, by contrast, only needs a function declaration to call that function or a !struct! declaration to use (by-pointer) values of that type, desirable properties to maintain in \CFA{}.
    201201
    202202Java \cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
     
    205205To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward.
    206206
    207 Cyclone \cite{Grossman06} is another language extending C, and also provides capabilities for polymorphic functions and existential types, similar to \CFA{}'s !forall! functions and generic types.
     207Cyclone \cite{Grossman06} extends C and also provides capabilities for polymorphic functions and existential types which are similar to \CFA{}'s !forall! functions and generic types.
    208208Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is tedious and error-prone compared to \CFA{}'s implicit assertion satisfaction.
    209209Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as !void*!, \ie{} only pointer types and !int!.
     
    215215Haskell \cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}.
    216216Objective-C \cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently \cite{xcode7}, and its garbage-collected, message-passing object-oriented model is a radical departure from C.
    217 Go \cite{Go}, and Rust \cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits, \emph{interfaces} in Go and \emph{traits} in Rust.
     217Go \cite{Go}, and Rust \cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits: \emph{interfaces} in Go and \emph{traits} in Rust.
    218218Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall parameters.
    219219Go does not, however, allow user code to define generic types, restricting Go programmers to the small set of generic types defined by the compiler.
     
    223223\subsection{\CFA{} Generics}
    224224
    225 The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing the better aspects of each.
    226 Like \CC{} template types, generic !struct!s and !union!s in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.
     225The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing useful aspects of each.
     226Like \CC{} template types, generic !struct! and !union! types in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.
    227227The fact that the storage layout of any instantiation of a \CFA{} generic type is identical to that of the monomorphic type produced by simple macro replacement of the generic type parameters is important to provide consistent and predictable runtime performance, and to not impose any undue abstraction penalty on generic code.
    228228As an example, consider the following generic type and function:
     
    238238
    239239In this example, !with_len! is defined at the same scope as !pair!, but it could be called from any context that can see the definition of !pair! and a declaration of !with_len!.
    240 If its return type was !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types.
    241 
    242 !with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate re-declaration and demangling flags.
     240If its return type were !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types.
     241
     242!with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given appropriate re-declarations and demangling flags.
    243243However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation \cite{Schluntz17}) and can be re-used for a wide variety of !pair! instantiations.
    244 Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can eliminate any runtime overhead of this polymorphic call.
     244Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can in principle eliminate any runtime overhead of this polymorphic call.
    245245
    246246\CFA{} deliberately does not support \CC{}-style partial specializations of generic types.
     
    251251
    252252Since \CFA{} polymorphic functions can operate over polymorphic generic types, functions over such types can be partially or completely specialized using the usual overload selection rules.
    253 As an example, the !with_len! function above could be an optimization of the following more general function:
     253As an example, the following generalization of !with_len! is a semantically-equivalent function which works for any type that has a !len! function declared, making use of both the ad-hoc (overloading) and parametric (!forall!) polymorphism features of \CFA{}:
    254254
    255255\begin{cfa}
     
    260260\end{cfa}
    261261
    262 \CFA{} generic types also support the type constraints from !forall! functions.
     262\CFA{} generic types also support type constraints, as in !forall! functions.
    263263For example, the following declaration of a sorted set type ensures that the set key implements equality and relational comparison:
    264264
     
    267267\end{cfa}
    268268
    269 These constraints are implemented by applying equivalent constraints to the compiler-generated constructors for this type.
     269These constraints are enforced by applying equivalent constraints to the compiler-generated constructors for this type.
    270270
    271271\section{Implementation} \label{generic-impl-sec}
    272272
    273 The ability to use generic types in polymorphic contexts means that the \CFA{} implementation in \CFACC{} must support a mechanism for accessing fields of generic types dynamically at runtime.
    274 While \CFACC{} could in principle use this same mechanism for accessing fields of all generic types, such an approach would throw away compiler knowledge of static types and impose an unnecessary runtime cost, limiting the utility of the generic type design.
    275 Instead, my design for generic type support in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters.
     273The ability to use generic types in polymorphic contexts means that the \CFA{} implementation must support a mechanism for accessing fields of generic types dynamically.
     274While \CFACC{} could in principle use this same mechanism for accessing fields of generic types in monomorphic contexts as well, such an approach would throw away compiler knowledge of static types and impose an unnecessary runtime cost.
     275Instead, my design for generic types in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters.
     276
    276277A \emph{dtype-static} type has polymorphic parameters but is still concrete.
    277278Polymorphic pointers are an example of dtype-static types; given some type variable !T!, !T! is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
     
    283284\begin{cfa}
    284285//dynamic, layout varies based on T
    285 forall(otype T) T value( pair(const char*, T) p ) { return p.second; }
     286forall(otype T) T value$\(_1\)$( pair(const char*, T) p ) { return p.second; }
    286287
    287288// dtype-static, F* and T* are concrete but recursively polymorphic
    288 forall(dtype F, otype T) T value( pair(F*, T*) ) { return *p.second; }
     289forall(dtype F, otype T) T value$\(_2\)$( pair(F*, T*) ) { return *p.second; }
    289290
    290291pair(const char*, int) p = {"magic", 42}; $\C[2.5in]{// concrete}$
    291292int i = value(p);
    292 pair(void*, int*) q = {0, &p.second}; $\C[2.5in]{// concrete}$
     293pair(void*, int*) q = {0, &i}; $\C[2.5in]{// concrete}$
    293294i = value(q);
    294295double d = 1.0;
     
    299300\subsection{Concrete Generic Types}
    300301
    301 The \CFACC{} translator template expands concrete generic types into new structure types, affording maximal inlining.
     302The \CFACC{} translator template-expands concrete generic types into new structure types, affording maximal inlining.
    302303To enable interoperation among equivalent instantiations of a generic type, \CFACC{} saves the set of instantiations currently in scope and reuses the generated structure declarations where appropriate.
    303304In particular, tuple types are implemented as a single compiler-generated generic type definition per tuple arity, and can be instantiated and reused according to the usual rules for generic types.
    304305A function declaration that accepts or returns a concrete generic type produces a declaration for the instantiated structure in the same scope, which all callers may reuse.
    305 As an example, the concrete instantiation for !pair(const char*, int)! is\footnote{This omits the field name mangling performed by \CFACC{} for overloading purposes.\label{mangle-foot}}
     306As an example, the concrete instantiation for !pair(const char*, int)! is\footnote{Field name mangling for overloading purposes is omitted.\label{mangle-foot}}:
    306307
    307308\begin{cfa}
     
    310311
    311312A concrete generic type with dtype-static parameters is also expanded to a structure type, but this type is used for all matching instantiations.
    312 In the example above, the !pair(F*, T*)! parameter to !value! is such a type; its expansion is below\footref{mangle-foot}, and it is used as the type of the variables !q! and !r! as well, with casts for member access where appropriate.
     313In the example above, the !pair(F*, T*)! parameter to !value! is such a type; its expansion is below\footref{mangle-foot}, and it is used as the type of the variables !q! and !r! as well, with casts for member access where appropriate:
    313314
    314315\begin{cfa}
     
    322323The design for generic types presented here adds an \emph{offset array} containing structure-member offsets for dynamic generic !struct! types.
    323324A dynamic generic !union! needs no such offset array, as all members are at offset 0, but size and alignment are still necessary.
    324 Access to members of a dynamic structure is provided at runtime via base displacement addressing the structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime.
     325Access to members of a dynamic structure is provided at runtime via base-displacement addressing of the structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime.
    325326
    326327The offset arrays are statically generated where possible.
    327328If a dynamic generic type is passed or returned by value from a polymorphic function, \CFACC{} can safely assume that the generic type is complete (\ie{} has a known layout) at any call site, and the offset array is passed from the caller; if the generic type is concrete at the call site, the elements of this offset array can even be statically generated using the C !offsetof! macro.
    328 As an example, the body of the second !value! function above is implemented as
     329As an example, the body of !value!$_2$ above is implemented as:
    329330
    330331\begin{cfa}
     
    333334
    334335Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
    335 !_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as
     336!_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as:
    336337
    337338\begin{cfa}
     
    347348These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all !sized! parameters to the generic structure.
    348349Un-!sized! parameters are not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types.
    349 Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller will not know how large to allocate the array of member offsets.
     350Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller does not know how large to allocate the array of member offsets.
    350351
    351352The C standard does not specify a memory layout for structs, but the POSIX ABI for x86 \cite{POSIX08} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
     
    356357forall(dtype T1, dtype T2, ... | sized(T1) | sized(T2) | ...)
    357358void layout(size_t* size, size_t* align, size_t* offsets) {
    358         // initialize values
    359359        *size = 0; *align = 1;
    360360        // set up members
     
    374374\end{cfa}
    375375
    376 Results of layout function calls are cached so that they are only computed once per type per function.
     376Results of layout-function calls are cached so that they are only computed once per type per function.
    377377Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implementation-hiding constraint of the design.
    378378For instance, a function that strips duplicate values from an unsorted !list(T)! likely has a reference to the list as its only explicit parameter, but uses some sort of !set(T)! internally to test for duplicate values.
     
    380380
    381381Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type parameters.
    382 This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of !Box(T)! can be separately compiled, and callers from other translation units know the proper calling conventions to use.
    383 In an alternate design where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout.
    384 However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and we judged preserving separate compilation (and the associated C compatibility) in the implemented design to be an acceptable trade-off.
     382This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of !Box(T)! can be separately compiled, and callers from other translation units know the proper calling conventions.
     383In an alternate design, where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout.
     384However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and preserving separate compilation (and the associated C compatibility) is a more important design metric.
    385385
    386386\subsection{Applications of Dtype-static Types} \label{dtype-static-sec}
     
    401401Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \emph{tag structures}.
    402402Sometimes, information is only used for type checking and can be omitted at runtime.
    403 In the example below, !scalar! is a dtype-static type; hence, all uses have a single structure definition containing !unsigned long! and can share the same implementations of common functions like !?+?!.
     403In the example below, !scalar! is a dtype-static type; hence, all uses have a single structure definition containing !unsigned long! and can share the same implementations of common functions, like !?+?!.
    404404These implementations may even be separately compiled, unlike \CC{} template functions.
    405405However, the \CFA{} type checker ensures matching types are used by all calls to !?+?!, preventing nonsensical computations like adding a length to a volume.
     
    422422\section{Performance Experiments} \label{generic-performance-sec}
    423423
    424 To validate the practicality of this generic type design I have conducted microbenchmark-based tests against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}.
    425 Since all these languages are compiled with the same compiler backend and share a subset essentially comprising standard C, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
     424To validate the practicality of this generic type design, microbenchmark-based tests were conducted against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}.
     425Since all these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
    426426A more illustrative comparison measures the costs of idiomatic usage of each language's features.
    427427The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other other languages.
     
    429429
    430430\begin{cfa}
     431#define N 4000000
    431432int main() {
    432433        int max = 0, val = 42;
Note: See TracChangeset for help on using the changeset viewer.