Changes in / [560812b:a32346b]


Ignore:
Location:
doc/theses/aaron_moss_PhD/phd
Files:
5 added
5 edited

Legend:

Unmodified
Added
Removed
  • doc/theses/aaron_moss_PhD/phd/.gitignore

    r560812b ra32346b  
    11templates/
     2code/a.out
    23thesis.pdf
    34thesis.aux
  • doc/theses/aaron_moss_PhD/phd/Makefile

    r560812b ra32346b  
    1717introduction \
    1818background \
     19generic-types \
    1920type-environment \
    2021resolution-heuristics \
  • doc/theses/aaron_moss_PhD/phd/cfa-macros.tex

    r560812b ra32346b  
    4343tabsize=5,                                                                                              % N space tabbing
    4444xleftmargin=\parindentlnth,                                                             % indent code to paragraph indentation
    45 %mathescape=true,                                                                               % LaTeX math escape in CFA code $...$
    4645escapechar=\$,                                                                                  % LaTeX escape in CFA code
    4746keepspaces=true,                                                                                %
  • doc/theses/aaron_moss_PhD/phd/frontpgs.tex

    r560812b ra32346b  
    160160% L I S T   O F   F I G U R E S
    161161% -----------------------------
    162 % \addcontentsline{toc}{chapter}{List of Figures}
    163 % \listoffigures
    164 % \cleardoublepage
    165 % \phantomsection               % allows hyperref to link to the correct page
     162\addcontentsline{toc}{chapter}{List of Figures}
     163\listoffigures
     164\cleardoublepage
     165\phantomsection         % allows hyperref to link to the correct page
    166166
    167167% GLOSSARIES (Lists of definitions, abbreviations, symbols, etc. provided by the glossaries-extra package)
  • doc/theses/aaron_moss_PhD/phd/generic-types.tex

    r560812b ra32346b  
    22\label{generic-chap}
    33
    4 Talk about generic types. Pull from Moss~\etal\cite{Moss18}.
     4A significant shortcoming in standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
     5Broadly speaking, there are three approaches to implement abstract data structures in C.
     6One approach is to write bespoke data structures for each context in which they are needed.
     7While this approach is flexible and supports integration with the C type checker and tooling, it is also tedious and error prone, especially for more complex data structures.
     8A second approach is to use !void*!-based polymorphism, \eg{} the C standard library functions !bsearch! and !qsort!, which allow for the reuse of common functionality.
     9However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise not needed.
     10A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type checked, but errors in such code may be difficult to locate and debug.
     11Furthermore, writing and using preprocessor macros is unnatural and inflexible.
     12Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*!- and !#define!-based polymorphism, respectively.
     13
     14\begin{figure}
     15        \begin{cfa}
     16                struct int_list { int value; struct int_list* next; };
     17
     18                void int_list_insert( struct int_list** ls, int x ) {
     19                        struct int_list* node = malloc(sizeof(struct int_list));
     20                        node->value = x; node->next = *ls;
     21                        *ls = node;
     22                }
     23
     24                int int_list_head( const struct int_list* ls ) { return ls->value; }
     25
     26                $\C[\textwidth]{// all code must be duplicated for every generic instantiation}$
     27
     28                struct string_list { const char* value; struct string_list* next; };
     29
     30                void string_list_insert( struct string_list** ls, const char* x ) {
     31                        struct string_list* node = malloc(sizeof(struct string_list));
     32                        node->value = x; node->next = *ls;
     33                        *ls = node;
     34                }
     35
     36                const char* string_list_head( const struct string_list* ls )
     37                        { return ls->value; }
     38
     39                $\C[\textwidth]{// use is efficient and idiomatic}$
     40
     41                int main() {
     42                        struct int_list* il = NULL;
     43                        int_list_insert( &il, 42 );
     44                        printf("%d\n", int_list_head(il));
     45                       
     46                        struct string_list* sl = NULL;
     47                        string_list_insert( &sl, "hello" );
     48                        printf("%s\n", string_list_head(sl));
     49                }
     50        \end{cfa}
     51
     52        \caption{Bespoke code for linked list implementation.} \label{bespoke-generic-fig}
     53\end{figure}
     54
     55\begin{figure}
     56        \begin{cfa}
     57                // single code implementation
     58
     59                struct list { void* value; struct list* next; };
     60
     61                $\C[\textwidth]{// internal memory management requires helper functions}$
     62
     63                void list_insert( struct list** ls, void* x, void* (*copy)(void*) ) {
     64                        struct list* node = malloc(sizeof(struct list));
     65                        node->value = copy(x); node->next = *ls;
     66                        *ls = node;
     67                }
     68
     69                void* list_head( const struct list* ls ) { return ls->value; }
     70
     71                $\C[\textwidth]{// helpers duplicated per type}$
     72
     73                void* int_copy(void* x) {
     74                        int* n = malloc(sizeof(int));
     75                        *n = *(int*)x;
     76                        return n;
     77                }
     78
     79                void* string_copy(void* x) { return strdup((const char*)x); }
     80
     81                int main() {
     82                        struct list* il = NULL;
     83                        int i = 42;
     84                        list_insert( &il, &i, int_copy );
     85                        printf("%d\n", *(int*)list_head(il));  $\C[2in]{// unsafe type cast}$
     86                       
     87                        struct list* sl = NULL;
     88                        list_insert( &sl, "hello", string_copy );
     89                        printf("%s\n", (char*)list_head(sl));  $\C[2in]{// unsafe type cast}$
     90                }
     91        \end{cfa}
     92
     93        \caption{\lstinline{void*}-polymorphic code for linked list implementation.} \label{void-generic-fig}
     94\end{figure}
     95
     96\begin{figure}
     97        \begin{cfa}
     98                $\C[\textwidth]{// code is nested in macros}$
     99
     100                #define list(N) N ## _list
     101
     102                #define list_insert(N) N ## _list_insert
     103
     104                #define list_head(N) N ## _list_head
     105
     106                #define define_list(N, T) $\C[0.25in]{ \textbackslash }$
     107                        struct list(N) { T value; struct list(N)* next; }; $\C[0.25in]{ \textbackslash }$
     108                        $\C[0.25in]{ \textbackslash }$
     109                        void list_insert(N)( struct list(N)** ls, T x ) { $\C[0.25in]{ \textbackslash }$
     110                                struct list(N)* node = malloc(sizeof(struct list(N))); $\C[0.25in]{ \textbackslash }$
     111                                node->value = x; node->next = *ls; $\C[0.25in]{ \textbackslash }$
     112                                *ls = node; $\C[0.25in]{ \textbackslash }$
     113                        } $\C[0.25in]{ \textbackslash }$
     114                        $\C[0.25in]{ \textbackslash }$
     115                        T list_head(N)( const struct list(N)* ls ) { return ls->value; }
     116
     117                define_list(int, int); $\C[3in]{// defines int\_list}$
     118                define_list(string, const char*); $\C[3in]{// defines string\_list}$
     119
     120                $\C[\textwidth]{// use is efficient, but syntactically idiosyncratic}$
     121
     122                int main() {
     123                        struct list(int)* il = NULL; $\C[3in]{// does not match compiler-visible name}$
     124                        list_insert(int)( &il, 42 );
     125                        printf("%d\n", list_head(int)(il));
     126                       
     127                        struct list(string)* sl = NULL;
     128                        list_insert(string)( &sl, "hello" );
     129                        printf("%s\n", list_head(string)(sl));
     130                }
     131        \end{cfa}
     132
     133        \caption{Macros for generic linked list implementation.} \label{macro-generic-fig}
     134\end{figure}
     135
     136\CC{}, Java, and other languages use \emph{generic types} to produce type-safe abstract data types.
     137Design and implementation of generic types for \CFA{} is the first major contribution of this thesis, a summary of which is published in \cite{Moss18}.
     138\CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation.
     139A generic type can be declared in \CFA{} by placing a !forall! specifier on a !struct! or !union! declaration, and instantiated using a parenthesized list of types after the generic name.
     140An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig} \TODO{test this code}.
     141
     142\begin{figure}
     143        \begin{cfa}
     144                forall(otype T) struct list { T value; list(T)* next; };
     145
     146                $\C[\textwidth]{// single polymorphic implementation of each function}$
     147                $\C[\textwidth]{// overloading reduces need for namespace prefixes}$
     148
     149                forall(otype T) void insert( list(T)** ls, T x ) {
     150                        list(T)* node = alloc();  $\C{// type-inferring alloc}$
     151                        (*node){ x, *ls };  $\C{// concise constructor syntax}$
     152                        *ls = node;
     153                }
     154
     155                forall(otype T) T head( const list(T)* ls ) { return ls->value; }
     156
     157                $\C[\textwidth]{// use is clear and efficient}$
     158
     159                int main() {
     160                        list(int)* il = 0;
     161                        insert( &il, 42 );  $\C{// inferred polymorphic T}$
     162                        printf("%d\n", head(il));
     163                       
     164                        list(const char*)* sl = 0;
     165                        insert( &sl, "hello" );
     166                        printf("%s\n", head(sl));
     167                }
     168        \end{cfa}
     169
     170        \caption{\CFA{} generic linked list implementation.} \label{cfa-generic-fig}
     171\end{figure}
     172
     173\section{Design}
     174
     175Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism presented some unique design constraints for this project.
     176The guiding principle was to maintain an unsurprising language model for C programmers without compromising runtime efficiency.
     177A key insight for this design was that C already possesses a handful of built-in generic types (\emph{compound types} in the language of the standard\cit{}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.
     178
     179\subsection{Related Work}
     180
     181One approach to the design of generic types is that taken by \CC{} templates\cit{}.
     182The template approach is closely related to the macro-expansion approach to C polymorphism demonstrated in Figure~\ref{macro-generic-fig}, but where the macro-expansion syntax has been given first-class language support.
     183Template expansion has the benefit of generating code with near-optimal runtime efficiency, as distinct optimizations can be applied for each instantiation of the template.
     184On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case\cit{}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.
     185The most significant restriction of the \CC{} template model is that it breaks separate compilation and C's translation-unit-based encapsulation mechanisms.
     186Because a \CC{} template is not actually code, but rather a sort of ``recipe'' to generate code, template code must be visible at its call site to be used.
     187C code, by contrast, only needs a !struct! or function declaration to call that function or use (by-pointer) values of that type, a desirable property to maintain for \CFA{}.
     188
     189Java\cit{} has another prominent implementation for generic types, based on a significantly different approach than \CC{}.
     190The Java approach has much more in common with the !void*!-polymorphism shown in Figure~\ref{void-generic-fig}; since in Java nearly all data is stored by reference, the Java approach to polymorphic data is to store pointers to arbitrary data and insert type-checked implicit casts at compile-time.
     191This process of \emph{type erasure} has the benefit of allowing a single instantiation of polymorphic code, but relies heavily on Java's object model and garbage collector.
     192To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward.
     193
     194\TODO{Talk about Go, maybe Rust, Swift, etc. as well; specifically mention ``fat pointer'' polymorphism}
     195
     196\TODO{Talk about Cyclone as well, and why my generics are more powerful}
     197
     198\subsection{\CFA{} Generics}
     199
     200The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing the better aspects of each.
     201Like \CC{} template types, generic !struct!s and !union!s in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.
     202The fact that the storage layout of any instantiation of a \CFA{} generic type is identical to that of the monomorphic type produced by simple macro replacement of the generic type parameters is important to provide consistent and predictable runtime performance, and to not impose any undue abstraction penalty on generic code.
     203As an example, consider the following generic type and function \TODO{test this}:
     204
     205\begin{cfa}
     206forall( otype R, otype S ) struct pair { R first; S second; };
     207
     208pair(const char*, int) with_len( const char* s ) {
     209        return (pair(const char*), int){ s, strlen(s) };
     210}
     211\end{cfa}
     212
     213In this example, !with_len! is defined at the same scope as !pair!, but it could be called from any context that can see the definition of !pair! and a declaration of !with_len!.
     214If its return type was !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair;! to call it, in accordance with the usual C rules for opaque types.
     215
     216!with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate redeclaration and demangling flags.
     217However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation\cite{Moss18}) and can be re-used for a wide variety of !pair! instantiations.
     218Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can eliminate any runtime overhead of this polymorphic call.
     219
     220\CFA{} deliberately does not support \CC{}-style partial specializations of generic types.
     221A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bitstring rather than the array representation of the other !vector! instantiations.
     222Complications from this inconsistency (chiefly the fact that a single bit is not addressable, unlike an array element) make the \CC{} !vector! unpleasant to use in generic contexts due to the break in its public interface.
     223Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more principled position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code.
     224Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows one to be defined, and objects of types implementing that !trait! to be operated on in the same polymorphic functions.
     225
     226Since \CFA{} polymorphic functions can operate over polymorphic generic types, functions over such types can be partially or completely specialized using the usual overload selection rules.
     227As an example, the !with_len! function above could be an optimization of the following more general function:
     228
     229\begin{cfa}
     230forall(otype T, otype I | { I len(T); })
     231pair(T, I) with_len( T s ) {
     232        return (pair(T,I)){ s, len(s) };
     233}
     234\end{cfa}
     235
     236\section{Implementation}
     237
     238% forall constraints on struct/union constrain default constructor (TODO check with Rob)
    5239
    6240% TODO discuss layout function algorithm, application to separate compilation
    7 % TODO put a static const field in for _n_fields for each generic, describe utility for separate compilation
    8 
    9 % TODO mention impetus for zero_t design
    10 
    11 % TODO mention use in tuple-type implementation
     241% TODO put a static const field in for _n_fields for each generic, describe utility for separate compilation (actually no, you need to be able to see the type for it to be sized)
     242
     243% mention that tuples are implemented on top of a per-arity generic type
     244
     245\section{Performance Experiments}
    12246
    13247% TODO pull benchmarks from Moss et al.
     248
     249\section{Future Work}
     250
     251% mention future work adding non-type generic parameters, like ints
     252
     253% taking advantage of generic layout functions to provide field assertions in forall qualifiers
     254
     255% mention packed generic layouts (significantly more complex layout function, but possible)
Note: See TracChangeset for help on using the changeset viewer.