Changeset 3db60cb for doc/generic_types


Ignore:
Timestamp:
Apr 6, 2017, 3:39:39 PM (8 years ago)
Author:
Thierry Delisle <tdelisle@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
3a48e283, b0fedd4
Parents:
03bb816 (diff), 115a868c (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc/generic_types
Files:
11 added
2 edited

Legend:

Unmodified
Added
Removed
  • TabularUnified doc/generic_types/acmart.cls

    r03bb816 r3db60cb  
    354354\let\@footnotemark@nolink\@footnotemark
    355355\let\@footnotetext@nolink\@footnotetext
    356 \RequirePackage[bookmarksnumbered]{hyperref}
     356\RequirePackage[bookmarksnumbered,breaklinks]{hyperref}
    357357\urlstyle{rm}
    358358\ifcase\ACM@format@nr
  • TabularUnified doc/generic_types/generic_types.tex

    r03bb816 r3db60cb  
    2828\newcommand{\CCseventeen}{\rm C\kern-.1em\hbox{+\kern-.25em+}17\xspace} % C++17 symbolic name
    2929\newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name
    30 
    31 \newcommand{\TODO}[1]{\textbf{TODO}: #1} % TODO included
     30\newcommand{\CS}{C\raisebox{-0.7ex}{\Large$^\sharp$}\xspace}
     31\newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}}
     32
     33\newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
    3234%\newcommand{\TODO}[1]{} % TODO elided
    3335\newcommand{\eg}{\textit{e}.\textit{g}.,\xspace}
     
    4951stringstyle=\tt,                                                                                % use typewriter font
    5052tabsize=4,                                                                                              % 4 space tabbing
    51 xleftmargin=\parindent,                                                                 % indent code to paragraph indentation
     53xleftmargin=\parindentlnth,                                                             % indent code to paragraph indentation
    5254%mathescape=true,                                                                               % LaTeX math escape in CFA code $...$
    5355escapechar=\$,                                                                                  % LaTeX escape in CFA code
     
    9092}
    9193
    92 \terms{generic, tuple, types}
    93 \keywords{generic types, tuple types, polymorphic functions, C, Cforall}
     94\terms{generic, tuple, variadic, types}
     95\keywords{generic types, tuple types, variadic types, polymorphic functions, C, Cforall}
    9496
    9597\begin{CCSXML}
     
    118120
    119121\begin{abstract}
    120 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA project is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers. Prior projects have attempted similar goals but failed to honour C programming-style; for instance, adding object-oriented or functional programming with garbage collection is a non-starter for many C developers. Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and engineers. This paper describes only two \CFA extensions, generic and tuple types, and how they are implemented in accordance with these principles.
     122The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA project is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers. Prior projects have attempted similar goals but failed to honour C programming-style; for instance, adding object-oriented or functional programming with garbage collection is a non-starter for many C developers. Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and engineers. This paper describes two \CFA extensions, generic and tuple types, details how their design avoids shortcomings of similar features in C and other C-like languages, and presents experimental results validating the design.
    121123\end{abstract}
    122124
     
    124126\maketitle
    125127
    126 \section{Introduction \& Background}
    127 
    128 \CFA\footnote{Pronounced ``C-for-all'', and written \CFA or Cforall.} is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. Four key design goals were set out in the original design of \CFA~\citep{Bilson03}:
    129 \begin{enumerate}
    130 \item The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler.
    131 \item Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler.
    132 \item \CFA code must be at least as portable as standard C code.
    133 \item Extensions introduced by \CFA must be translated in the most efficient way possible.
    134 \end{enumerate}
    135 These goals ensure existing C code-bases can be converted to \CFA incrementally and with minimal effort, and C programmers can productively generate \CFA code without training beyond the features they wish to employ. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance.
    136 
    137 \CFA has been previously extended with polymorphic functions and name overloading (including operator overloading) by \citet{Bilson03}, and deterministically-executed constructors and destructors by \citet{Schluntz17}. This paper describes how generic and tuple types are designed and implemented in \CFA in accordance with both the backward compatibility goals and existing features described above.
     128
     129\section{Introduction and Background}
     130
     131The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
     132The \citet{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \CS 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail. The top 3 rankings over the past 30 years are:
     133\lstDeleteShortInline@
     134\begin{center}
     135\setlength{\tabcolsep}{10pt}
     136\begin{tabular}{@{}r|c|c|c|c|c|c|c@{}}
     137                & 2017  & 2012  & 2007  & 2002  & 1997  & 1992  & 1987          \\
     138\hline
     139Java    & 1             & 1             & 1             & 3             & 13    & -             & -                     \\
     140\hline
     141\Textbf{C}      & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}& \Textbf{1}    \\
     142\hline
     143\CC             & 3             & 3             & 3             & 3             & 2             & 2             & 4                     \\
     144\end{tabular}
     145\end{center}
     146\lstMakeShortInline@
     147Love it or hate it, C is extremely popular, highly used, and one of the few system's languages.
     148In many cases, \CC is often used solely as a better C.
     149Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive.
     150
     151\CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. The four key design goals for \CFA~\citep{Bilson03} are:
     152(1) The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler;
     153(2) Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler;
     154(3) \CFA code must be at least as portable as standard C code;
     155(4) Extensions introduced by \CFA must be translated in the most efficient way possible.
     156These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance.
     157
     158This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types avoiding those shortcomings. Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance.
    138159
    139160
     
    141162\label{sec:poly-fns}
    142163
    143 \CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions; such functions are written using a @forall@ clause (which gives the language its name):
     164\CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions where functions are generalized using a @forall@ clause (giving the language its name):
    144165\begin{lstlisting}
    145166`forall( otype T )` T identity( T val ) { return val; }
    146167int forty_two = identity( 42 );                         $\C{// T is bound to int, forty\_two == 42}$
    147168\end{lstlisting}
    148 The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@ that encode sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''.
    149 
    150 Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C separate compilation.
    151 
    152 Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. For instance, @twice@ can be defined using the \CFA syntax for operator overloading:
    153 \begin{lstlisting}
    154 forall( otype T | { T `?`+`?`(T, T); } )        $\C{// ? denotes operands}$
    155   T twice( T x ) { return x + x; }                      $\C{// (2)}$
     169The @identity@ function above can be applied to any complete \emph{object type} (or @otype@). The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \emph{data type} (or @dtype@).
     170
     171Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead is similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate} compilation, preventing code bloat.
     172
     173Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
     174\begin{lstlisting}
     175forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? denotes operands}$
    156176int val = twice( twice( 3.7 ) );
    157177\end{lstlisting}
    158 which works for any type @T@ with an addition operator defined. The translator accomplishes this polymorphism by creating a wrapper function for calling @+@ with @T@ bound to @double@, then providing this function to the first call of @twice@. It then has the option of using the same @twice@ again and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type in its type analysis. The first approach has a late conversion from integer to floating-point on the final assignment, while the second has an eager conversion to integer. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach.
    159 
    160 Monomorphic specializations of polymorphic functions can satisfy polymorphic type-assertions.
    161 % \begin{lstlisting}
    162 % forall(otype T `| { T twice(T); }`)           $\C{// type assertion}$
    163 % T four_times(T x) { return twice( twice(x) ); }
    164 % double twice(double d) { return d * 2.0; }    $\C{// (1)}$
    165 % double magic = four_times(10.5);                      $\C{// T bound to double, uses (1) to satisfy type assertion}$
    166 % \end{lstlisting}
    167 \begin{lstlisting}
    168 forall( otype T `| { int ?<?( T, T ); }` )      $\C{// type assertion}$
    169   void qsort( const T * arr, size_t size );
    170 forall( otype T `| { int ?<?( T, T ); }` )      $\C{// type assertion}$
    171   T * bsearch( T key, const T * arr, size_t size );
    172 double vals[10] = { /* 10 floating-point values */ };
    173 qsort( vals, 10 );                                                      $\C{// sort array}$
    174 double * val = bsearch( 5.0, vals, 10 );        $\C{// binary search sorted array for key}$
    175 \end{lstlisting}
    176 @qsort@ and @bsearch@ can only be called with arguments for which there exists a function named @<@ taking two arguments of the same type and returning an @int@ value.
    177 Here, the built-in monomorphic specialization of @<@ for type @double@ is passed as an additional implicit parameter to the calls of @qsort@ and @bsearch@.
    178 
    179 Crucial to the design of a new programming language are the libraries to access thousands of external features.
    180 \CFA inherits a massive compatible library-base, where other programming languages have to rewrite or provide fragile inter-language communication with C.
    181 A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@, shown here searching a floating-point array:
     178which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type~\cite{Ada} in its type analysis. The first approach has a late conversion from @int@ to @double@ on the final assignment, while the second has an eager conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition.
     179
     180Crucial to the design of a new programming language are the libraries to access thousands of external software features.
     181\CFA inherits a massive compatible library-base, where other programming languages must rewrite or provide fragile inter-language communication with C.
     182A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted floating-point array:
    182183\begin{lstlisting}
    183184void * bsearch( const void * key, const void * base, size_t nmemb, size_t size,
     
    185186int comp( const void * t1, const void * t2 ) { return *(double *)t1 < *(double *)t2 ? -1 :
    186187                                *(double *)t2 < *(double *)t1 ? 1 : 0; }
     188double vals[10] = { /* 10 floating-point values */ };
    187189double key = 5.0;
    188 double * val = (double *)bsearch( &key, vals, size, sizeof(vals[0]), comp );
    189 \end{lstlisting}
    190 but providing a type-safe \CFA overloaded wrapper.
     190double * val = (double *)bsearch( &key, vals, 10, sizeof(vals[0]), comp );      $\C{// search sorted array}$
     191\end{lstlisting}
     192which can be augmented simply with a generalized, type-safe, \CFA-overloaded wrappers:
    191193\begin{lstlisting}
    192194forall( otype T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) {
    193195        int comp( const void * t1, const void * t2 ) { /* as above with double changed to T */ }
    194         return (T *)bsearch( &key, arr, size, sizeof(T), comp );
    195 }
     196        return (T *)bsearch( &key, arr, size, sizeof(T), comp ); }
    196197forall( otype T | { int ?<?( T, T ); } ) unsigned int bsearch( T key, const T * arr, size_t size ) {
    197198        T *result = bsearch( key, arr, size );  $\C{// call first version}$
    198         return result ? result - arr : size;            $\C{// pointer subtraction includes sizeof(T)}$
    199 }
     199        return result ? result - arr : size; }  $\C{// pointer subtraction includes sizeof(T)}$
    200200double * val = bsearch( 5.0, vals, 10 );        $\C{// selection based on return type}$
    201201int posn = bsearch( 5.0, vals, 10 );
    202202\end{lstlisting}
    203203The nested routine @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result.
    204 As well, an alternate kind of return is made available, position versus pointer to found element.
     204As well, an alternate kind of return is made available: position versus pointer to found element.
    205205\CC's type-system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a templated @bsearch@.
    206206
    207 Call-site inferencing and nested functions provide a localized form of inheritance. For example, @qsort@ only sorts in ascending order using @<@. However, it is trivial to locally change this behaviour:
    208 \begin{lstlisting}
    209 {   int ?<?( double x, double y ) { return x `>` y; }   $\C{// override behaviour}$
     207\CFA has replacement libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions, all without rewriting the actual computations.
     208For example, it is possible to write a type-safe \CFA wrapper @malloc@ based on the C @malloc@:
     209\begin{lstlisting}
     210forall( dtype T | sized(T) ) T * malloc( void ) { return (T *)(void *)malloc( (size_t)sizeof(T) ); }
     211int * ip = malloc();                                            $\C{// select type and size from left-hand side}$
     212double * dp = malloc();
     213struct S {...} * sp = malloc();
     214\end{lstlisting}
     215where the return type supplies the type/size of the allocation, which is impossible in most type systems.
     216
     217Call-site inferencing and nested functions provide a localized form of inheritance. For example, the \CFA @qsort@ only sorts in ascending order using @<@. However, it is trivial to locally change this behaviour:
     218\begin{lstlisting}
     219forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ }
     220{       int ?<?( double x, double y ) { return x `>` y; }       $\C{// locally override behaviour}$
    210221        qsort( vals, size );                                    $\C{// descending sort}$
    211222}
    212223\end{lstlisting}
    213224Within the block, the nested version of @<@ performs @>@ and this local version overrides the built-in @<@ so it is passed to @qsort@.
    214 Hence, programmers can easily form new local environments to maximize reuse of existing functions and types.
    215 
    216 Finally, variables may be overloaded:
     225Hence, programmers can easily form a local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types.
     226
     227Finally, \CFA allows variable overloading:
    217228\lstDeleteShortInline@
    218229\par\smallskip
     
    233244\smallskip\par\noindent
    234245Hence, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@.
     246As well, restricted constant overloading is allowed for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
     247In addition, several operations are defined in terms values @0@ and @1@.
     248For example,
     249\begin{lstlisting}
     250int x;
     251if (x)        // if (x != 0)
     252        x++;    //   x += 1;
     253\end{lstlisting}
     254Every if statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
     255Due to these rewrite rules, the values @0@ and @1@ have the types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts.
     256The types @zero_t@ and @one_t@ have special built in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal.
     257
    235258
    236259\subsection{Traits}
    237260
    238 \CFA provides \emph{traits} to name a group of type assertions:
    239 % \begin{lstlisting}
    240 % trait has_magnitude(otype T) {
    241 %     _Bool ?<?(T, T);                                          $\C{// comparison operator for T}$
    242 %     T -?(T);                                                          $\C{// negation operator for T}$
    243 %     void ?{}(T*, zero_t);                                     $\C{// constructor from 0 literal}$
    244 % };
    245 % forall(otype M | has_magnitude(M))
    246 % M abs( M m ) {
    247 %     M zero = { 0 };                                                   $\C{// uses zero\_t constructor from trait}$
    248 %     return m < zero ? -m : m;
    249 % }
    250 % forall(otype M | has_magnitude(M))
    251 % M max_magnitude( M a, M b ) {
    252 %     return abs(a) < abs(b) ? b : a;
    253 % }
    254 % \end{lstlisting}
     261\CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
    255262\begin{lstlisting}
    256263trait summable( otype T ) {
    257         void ?{}(T*, zero_t);                                   $\C{// constructor from 0 literal}$
     264        void ?{}( T *, zero_t );                                $\C{// constructor from 0 literal}$
    258265        T ?+?( T, T );                                                  $\C{// assortment of additions}$
    259266        T ?+=?( T *, T );
    260267        T ++?( T * );
    261         T ?++( T * );
    262 };
    263 forall( otype T | summable( T ) )
    264   T sum( T a[$\,$], size_t size ) {
    265         T total = { 0 };                                                $\C{// instantiate T from 0}$
     268        T ?++( T * ); };
     269forall( otype T `| summable( T )` ) T sum( T a[$\,$], size_t size ) {  // use trait
     270        `T` total = { `0` };                                    $\C{// instantiate T from 0 by calling its constructor}$
    266271        for ( unsigned int i = 0; i < size; i += 1 )
    267                 total += a[i];                                          $\C{// select appropriate +}$
    268         return total;
    269 }
    270 \end{lstlisting}
    271 The trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
     272                total `+=` a[i];                                        $\C{// select appropriate +}$
     273        return total; }
     274\end{lstlisting}
    272275
    273276In fact, the set of operators is incomplete, \eg no assignment, but @otype@ is syntactic sugar for the following implicit trait:
    274277\begin{lstlisting}
    275 trait otype( dtype T | sized(T) ) {
    276         // sized is a compiler-provided pseudo-trait for types with known size and alignment}
     278trait otype( dtype T | sized(T) ) {  // sized is a pseudo-trait for types with known size and alignment
    277279        void ?{}( T * );                                                $\C{// default constructor}$
    278280        void ?{}( T *, T );                                             $\C{// copy constructor}$
    279281        void ?=?( T *, T );                                             $\C{// assignment operator}$
    280         void ^?{}( T * );                                               $\C{// destructor}$
    281 };
    282 \end{lstlisting}
    283 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
    284 \begin{lstlisting}
    285 void abs( size_t _sizeof_M, size_t _alignof_M,
    286                 void (*_ctor_M)(void*), void (*_copy_M)(void*, void*),
    287                 void (*_assign_M)(void*, void*), void (*_dtor_M)(void*),
    288                 _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*),
    289         void (*_ctor_M_zero)(void*, int),
    290                 void* m, void* _rtn ) {                         $\C{// polymorphic parameter and return passed as void*}$
    291                                                                                         $\C{// M zero = { 0 };}$
    292         void* zero = alloca(_sizeof_M);                 $\C{// stack allocate zero temporary}$
    293         _ctor_M_zero(zero, 0);                                  $\C{// initialize using zero\_t constructor}$
    294                                                                                         $\C{// return m < zero ? -m : m;}$
    295         void *_tmp = alloca(_sizeof_M);
    296         _copy_M( _rtn,                                                  $\C{// copy-initialize return value}$
    297                 _lt_M( m, zero ) ?                                      $\C{// check condition}$
    298                  (_neg_M(m, _tmp), _tmp) :                      $\C{// negate m}$
    299                  m);
    300         _dtor_M(_tmp); _dtor_M(zero);                   $\C{// destroy temporaries}$
    301 }
    302 \end{lstlisting}
    303 
    304 Semantically, traits are simply a named lists of type assertions, but they may be used for many of the same purposes that interfaces in Java or abstract base classes in \CC are used for. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form of structural inheritance, similar to implementation of an interface in Go, as opposed to the nominal inheritance model of Java and \CC. Nominal inheritance can be simulated with traits using marker variables or functions:
     282        void ^?{}( T * ); };                                    $\C{// destructor}$
     283\end{lstlisting}
     284Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted.
     285% As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
     286% \begin{lstlisting}
     287% void abs( size_t _sizeof_M, size_t _alignof_M,
     288%               void (*_ctor_M)(void*), void (*_copy_M)(void*, void*),
     289%               void (*_assign_M)(void*, void*), void (*_dtor_M)(void*),
     290%               _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*),
     291%       void (*_ctor_M_zero)(void*, int),
     292%               void* m, void* _rtn ) {                         $\C{// polymorphic parameter and return passed as void*}$
     293%                                                                                       $\C{// M zero = { 0 };}$
     294%       void* zero = alloca(_sizeof_M);                 $\C{// stack allocate zero temporary}$
     295%       _ctor_M_zero(zero, 0);                                  $\C{// initialize using zero\_t constructor}$
     296%                                                                                       $\C{// return m < zero ? -m : m;}$
     297%       void *_tmp = alloca(_sizeof_M);
     298%       _copy_M( _rtn,                                                  $\C{// copy-initialize return value}$
     299%               _lt_M( m, zero ) ?                                      $\C{// check condition}$
     300%                (_neg_M(m, _tmp), _tmp) :                      $\C{// negate m}$
     301%                m);
     302%       _dtor_M(_tmp); _dtor_M(zero);                   $\C{// destroy temporaries}$
     303% }
     304% \end{lstlisting}
     305
     306Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC.
     307
     308Nominal inheritance can be simulated with traits using marker variables or functions:
    305309\begin{lstlisting}
    306310trait nominal(otype T) {
    307311    T is_nominal;
    308312};
    309 
    310313int is_nominal;                                                         $\C{// int now satisfies the nominal trait}$
    311314\end{lstlisting}
    312315
    313 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
     316Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
    314317\begin{lstlisting}
    315318trait pointer_like(otype Ptr, otype El) {
    316319    lvalue El *?(Ptr);                                          $\C{// Ptr can be dereferenced into a modifiable value of type El}$
    317320}
    318 
    319321struct list {
    320322    int value;
    321     list *next;                                                         $\C{// may omit "struct" on type names}$
     323    list *next;                                                         $\C{// may omit "struct" on type names as in \CC}$
    322324};
    323 
    324325typedef list *list_iterator;
    325326
     
    334335One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to create data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures. A second approach is to use @void*@-based polymorphism. This approach is taken by the C standard library functions @qsort@ and @bsearch@, and does allow the use of common code for common functionality. However, basing all polymorphism on @void*@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requires a number of extra function parameters, and also adds pointer indirection and dynamic allocation to algorithms and data structures that would not otherwise require them. A third approach to generic code is to use pre-processor macros to generate it -- this approach does allow the generated code to be both generic and type-checked, though any errors produced may be difficult to interpret. Furthermore, writing and invoking C code as preprocessor macros is unnatural and somewhat inflexible.
    335336
    336 Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. The authors have chosen to implement generic types as well, with some care taken that the generic types design for \CFA integrates efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there is not extra overhead for the use of a generic type.
     337Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. \CFA implements generic types with some care taken that the generic types design for \CFA integrates efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there is no extra overhead for the use of a generic type, as for \CC templates.
    337338
    338339A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name:
     
    359360\end{lstlisting}
    360361
    361 \CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Dynamic generic types vary in their in-memory layout depending on their type parameters, while concrete generic types have a fixed memory layout regardless of type parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation.
    362 
    363 \CFA generic types may also specify constraints on their argument type to be checked by the compiler. For example, consider the following declaration of a sorted set type, which ensures that the set key supports comparison and tests for equality:
     362\CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Concrete generic types have a fixed memory layout regardless of type parameters, while dynamic generic types vary in their in-memory layout depending on their type parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation.
     363
     364\CFA generic types may also specify constraints on their argument type to be checked by the compiler. For example, consider the following declaration of a sorted set-type, which ensures that the set key supports equality and relational comparison:
    364365\begin{lstlisting}
    365366forall(otype Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); })
    366 struct sorted_set;
     367  struct sorted_set;
    367368\end{lstlisting}
    368369
     
    384385};
    385386\end{lstlisting}
     387
    386388
    387389\subsection{Dynamic Generic Types}
     
    860862Cyclone also provides capabilities for polymorphic functions and existential types~\citep{Grossman06}, similar in concept to \CFA's @forall@ functions and generic types. Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, a tedious and potentially error-prone process. Furthermore, Cyclone's polymorphic functions and types are restricted in that they may only abstract over types with the same layout and calling convention as @void*@, in practice only pointer types and @int@ - in \CFA terms, all Cyclone polymorphism must be dtype-static. This design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, but is more restrictive than \CFA's more general model.
    861863
    862 Go \citep{Go} and Rust \citep{Rust} are both modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in Go and \emph{traits} in Rust. However, both languages represent dramatic departures from C in terms of language model, and neither has the same level of compatibility with C as \CFA. Go is a garbage-collected language, imposing the associated runtime overhead, and complicating foreign-function calls with the necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. It also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source.
     864Apple's Objective-C \citep{obj-c-book} is another industrially successful set of extensions to C. The Objective-C language model is a fairly radical departure from C, adding object-orientation and message-passing. Objective-C implements variadic functions using the C @va_arg@ mechanism, and did not support type-checked generics until recently \citep{xcode7}, historically using less-efficient and more error-prone runtime checking of object types instead. The GObject framework \citep{GObject} also adds object-orientation with runtime type-checking and reference-counting garbage-collection to C; these are much more intrusive feature additions than those provided by \CFA, in addition to the runtime overhead of reference-counting. The Vala programming language \citep{Vala} compiles to GObject-based C, and so adds the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code-bases. Java \citep{Java8} has had generic types and variadic functions since Java~5; Java's generic types are type-checked at compilation and type-erased at runtime, similar to \CFA's, though in Java each object carries its own table of method pointers, while \CFA passes the method pointers separately so as to maintain a C-compatible struct layout. Java variadic functions are simply syntactic sugar for an array of a single type, and therefore less useful than \CFA's heterogeneously-typed variadic functions. Java is also a garbage-collected, object-oriented language, with the associated resource usage and C-interoperability burdens.
     865
     866D \citep{D}, Go \citep{Go}, and Rust \citep{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go and \emph{traits} in Rust. However, each language represents dramatic departures from C in terms of language model, and none has the same level of compatibility with C as \CFA. D and Go are garbage-collected languages, imposing the associated runtime overhead. The necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime complicates foreign-function interface between Go and C. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. D restricts garbage collection to its own heap by default, while Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. Rust also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source.
    863867
    864868\section{Conclusion \& Future Work}
    865869
    866 In conclusion, the authors' design for generic types and tuples imposes minimal runtime overhead while still supporting a full range of C features, including separately-compiled modules. There is ongoing work on a wide range of \CFA feature extensions, including reference types, exceptions, and concurrent programming primitives. In addition to this work, there are some interesting future directions the polymorphism design could take. Notably, \CC template functions trade compile time and code bloat for optimal runtime of individual instantiations of polymorphic functions. \CFA polymorphic functions, by contrast, use an approach that is essentially dynamic virtual dispatch. The runtime overhead of this approach is low, but not as low as \CC template functions, and it may be beneficial to provide a mechanism for particularly performance-sensitive code to close this gap. Further research is needed, but two promising approaches are to allow an annotation on polymorphic function call sites that tells the translator to create a template-specialization of the function (provided the code is visible in the current translation unit) or placing an annotation on polymorphic function definitions that instantiates a version of the polymorphic function specialized to some set of types. These approaches are not mutually exclusive, and would allow these performance optimizations to be applied only where most useful to increase performance, without suffering the code bloat or loss of generality of a template expansion approach where it is unnecessary.
     870There is ongoing work on a wide range of \CFA feature extensions, including reference types, exceptions, and concurrent programming primitives. In addition to this work, there are some interesting future directions the polymorphism design could take. Notably, \CC template functions trade compile time and code bloat for optimal runtime of individual instantiations of polymorphic functions. \CFA polymorphic functions, by contrast, use an approach that is essentially dynamic virtual dispatch. The runtime overhead of this approach is low, but not as low as \CC template functions, and it may be beneficial to provide a mechanism for particularly performance-sensitive code to close this gap. Further research is needed, but two promising approaches are to allow an annotation on polymorphic function call sites that tells the translator to create a template-specialization of the function (provided the code is visible in the current translation unit) or placing an annotation on polymorphic function definitions that instantiates a version of the polymorphic function specialized to some set of types. These approaches are not mutually exclusive, and would allow these performance optimizations to be applied only where most useful to increase performance, without suffering the code bloat or loss of generality of a template expansion approach where it is unnecessary.
     871
     872In conclusion, the authors' design for generic types and tuples, unlike those available in existing work, is both reusable and type-checked, while still supporting a full range of C features, including separately-compiled modules. We have experimentally validated the performance of our design against both \CC and standard C, showing it is \TODO{shiny, cap'n}.
    867873
    868874\begin{acks}
Note: See TracChangeset for help on using the changeset viewer.