Ignore:
Timestamp:
Apr 26, 2019, 4:59:48 PM (5 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
Children:
5b11c25
Parents:
ffe2fad (diff), 1bc5975 (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' into ctxswitch

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    rffe2fad r673cd63  
    2727                int int_list_head( const struct int_list* ls ) { return ls->value; }
    2828
    29                 $\C[\textwidth]{// all code must be duplicated for every generic instantiation}$
     29                // all code must be duplicated for every generic instantiation
    3030
    3131                struct string_list { const char* value; struct string_list* next; };
     
    4040                        { return ls->value; }
    4141
    42                 $\C[\textwidth]{// use is efficient and idiomatic}$
     42                // use is efficient and idiomatic
    4343
    4444                int main() {
     
    6565                struct list { void* value; struct list* next; };
    6666
    67                 $\C[\textwidth]{// internal memory management requires helper functions}$
     67                // internal memory management requires helper functions
    6868
    6969                void list_insert( struct list** ls, void* x, void* (*copy)(void*) ) {
     
    7575                void* list_head( const struct list* ls ) { return ls->value; }
    7676
    77                 $\C[\textwidth]{// helpers duplicated per type}$
     77                // helpers duplicated per type
    7878
    7979                void* int_copy(void* x) {
     
    105105                #include <stdio.h>  $\C{// for printf}$
    106106
    107                 $\C[\textwidth]{// code is nested in macros}$
     107                // code is nested in macros
    108108
    109109                #define list(N) N ## _list
     
    127127                define_list(string, const char*); $\C[3in]{// defines string\_list}$
    128128
    129                 $\C[\textwidth]{// use is efficient, but syntactically idiosyncratic}$
     129                // use is efficient, but syntactically idiosyncratic
    130130
    131131                int main() {
     
    143143\end{figure}
    144144
    145 \CC{}, Java, and other languages use \emph{generic types} to produce type-safe abstract data types.
    146 Design and implementation of generic types for \CFA{} is the first major contribution of this thesis, a summary of which is published in \cite{Moss18} and from which this chapter is closely based.
     145\CC{}, Java, and other languages use \emph{generic types} (or \emph{parameterized types}) to produce type-safe abstract data types.
     146Design and implementation of generic types for \CFA{} is the first major contribution of this thesis, a summary of which is published in \cite{Moss18} and on which this chapter is closely based.
    147147\CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation.
    148148A generic type can be declared in \CFA{} by placing a !forall! specifier on a !struct! or !union! declaration, and instantiated using a parenthesized list of types after the generic name.
     
    156156                forall(otype T) struct list { T value; list(T)* next; };
    157157
    158                 $\C[\textwidth]{// single polymorphic implementation of each function}$
    159                 $\C[\textwidth]{// overloading reduces need for namespace prefixes}$
     158                // single polymorphic implementation of each function
     159                // overloading reduces need for namespace prefixes
    160160
    161161                forall(otype T) void insert( list(T)** ls, T x ) {
     
    167167                forall(otype T) T head( const list(T)* ls ) { return ls->value; }
    168168
    169                 $\C[\textwidth]{// use is clear and efficient}$
     169                // use is clear and efficient
    170170
    171171                int main() {
     
    189189A key insight for this design is that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
    190190
    191 \subsection{Related Work}
     191\subsection{Related Work} \label{generic-related-sec}
    192192
    193193One approach to the design of generic types is that taken by \CC{} templates \cite{C++}.
     
    202202Java \cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}.
    203203The Java approach has much more in common with the !void*!-polymorphism shown in Figure~\ref{void-generic-fig}; since in Java nearly all data is stored by reference, the Java approach to polymorphic data is to store pointers to arbitrary data and insert type-checked implicit casts at compile-time.
    204 This process of \emph{type erasure} has the benefit of allowing a single instantiation of polymorphic code, but relies heavily on Java's object model and garbage collector.
     204This process of \emph{type erasure} has the benefit of allowing a single instantiation of polymorphic code, but relies heavily on Java's object model.
    205205To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward.
    206206
     
    277277A \emph{dtype-static} type has polymorphic parameters but is still concrete.
    278278Polymorphic pointers are an example of dtype-static types; given some type variable !T!, !T! is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation.
    279 In particular, generic types where all parameters are un-!sized! (\ie{} they do not conform to the built-in !sized! trait because the compiler does not know their size and alignment) are always concrete, as there is no possibility for their layout to vary based on type parameters of unknown size and alignment.
     279In particular, generic types where all parameters are un-!sized! (\ie{} they do not conform to the built-in !sized! trait, which is satisfied by all types the compiler knows the size and alignment of) are always concrete, as there is no possibility for their layout to vary based on type parameters of unknown size and alignment.
    280280More precisely, a type is concrete if and only if all of its !sized! type parameters are concrete, and a concrete type is dtype-static if any of its type parameters are (possibly recursively) polymorphic.
    281281To illustrate, the following code using the !pair! type from above has each use of !pair! commented with its class:
     
    333333\end{cfa}
    334334
    335 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
     335Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code\footnote{A GCC extension allows arithmetic on \lstinline{void*}, calculated as if \lstinline{sizeof(void) == 1}.}), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types.
    336336!_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as:
    337337
     
    350350Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller does not know how large to allocate the array of member offsets.
    351351
    352 The C standard does not specify a memory layout for structs, but the POSIX ABI for x86 \cite{POSIX08} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}.
     352The C standard does not specify a memory layout for structs, but the System V ABI \cite{SysVABI} does; compatibility with this standard is sufficient for \CFA{}'s currently-supported architectures, though future ports may require different layout-function generation algorithms.
    353353This algorithm, sketched below in pseudo-\CFA{}, is a straightforward mapping of consecutive fields into the first properly-aligned offset in the !struct! layout; layout functions for !union! types omit the offset array and simply calculate the maximum size and alignment over all union variants.
    354354Since \CFACC{} generates a distinct layout function for each type, constant-folding and loop unrolling are applied.
     
    423423
    424424To validate the practicality of this generic type design, microbenchmark-based tests were conducted against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}.
    425 Since all these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
     425Since these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.
    426426A more illustrative comparison measures the costs of idiomatic usage of each language's features.
    427 The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other other languages, code for which is included in Appendix~\ref{generic-bench-app}.
    428 The experiment uses element types !int! and !pair(short, char)! and pushes $N = 40M$ elements on a generic stack, copies the stack, clears one of the stacks, and finds the maximum value in the other stack.
     427The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other languages, code for which is included in Appendix~\ref{generic-bench-app}.
     428The experiment uses element types !int! and !pair(short, char)! and pushes $N = 4M$ elements on a generic stack, copies the stack, clears one of the stacks, and finds the maximum value in the other stack.
    429429
    430430\begin{cfa}
     
    451451
    452452The four versions of the benchmark implemented are C with !void*!-based polymorphism, \CFA{} with parametric polymorphism, \CC{} with templates, and \CC{} using only class inheritance for polymorphism, denoted \CCV{}.
    453 The \CCV{} variant illustrates an alternative object-oriented idiom where all objects inherit from a base !object! class, mimicking a Java-like interface; in particular, runtime checks are necessary to safely downcast objects.
     453The \CCV{} variant illustrates an alternative object-oriented idiom where all objects inherit from a base !object! class, a language design similar to Java 4; in particular, runtime checks are necessary to safely downcast objects.
    454454The most notable difference among the implementations is the memory layout of generic types: \CFA{} and \CC{} inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV{} lack such capability and, instead, must store generic objects via pointers to separately allocated objects.
    455455Note that the C benchmark uses unchecked casts as C has no runtime mechanism to perform such checks, whereas \CFA{} and \CC{} provide type safety statically.
     
    481481
    482482The C and \CCV{} variants are generally the slowest and have the largest memory footprint, due to their less-efficient memory layout and the pointer indirection necessary to implement generic types in those languages; this inefficiency is exacerbated by the second level of generic types in the pair benchmarks.
    483 By contrast, the \CFA{} and \CC{} variants run in roughly equivalent time for both the integer and pair because of the equivalent storage layout, with the inlined libraries (\ie{} no separate compilation) and greater maturity of the \CC{} compiler contributing to its lead.
     483By contrast, the \CFA{} and \CC{} variants run in noticeably less time for both the integer and pair because of the equivalent storage layout, with the inlined libraries (\ie{} no separate compilation) and greater maturity of the \CC{} compiler contributing to its lead.
    484484\CCV{} is slower than C largely due to the cost of runtime type checking of downcasts (implemented with !dynamic_cast!); the outlier for \CFA{}, pop !pair!, results from the complexity of the generated-C polymorphic code.
    485485The gcc compiler is unable to optimize some dead code and condense nested calls; a compiler designed for \CFA{} could more easily perform these optimizations.
    486 Finally, the binary size for \CFA{} is larger because of static linking with \CFA{} libraries.
     486Finally, the binary size for \CFA{} is larger because of static linking with the \CFA{} prelude library, which includes function definitions for all the built-in operators.
    487487
    488488\CFA{} is also competitive in terms of source code size, measured as a proxy for programmer effort.
Note: See TracChangeset for help on using the changeset viewer.