% doc/generic_types/generic_types.tex (revision r115a868, updated from rae6cc8b)
\newcommand{\CCseventeen}{\rm C\kern-.1em\hbox{+\kern-.25em+}17\xspace} % C++17 symbolic name
\newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name
\newcommand{\CS}{C\raisebox{-0.7ex}{\Large$^\sharp$}\xspace} % C# symbolic name
\newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}} % bold red text for emphasis

\newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
     
stringstyle=\tt,							% use typewriter font
tabsize=4,								% 4 space tabbing
xleftmargin=\parindentlnth,				% indent code to paragraph indentation
%mathescape=true,						% LaTeX math escape in CFA code $...$
escapechar=\$,							% LaTeX escape in CFA code
     
\maketitle
\section{Introduction and Background}

The C programming language is a foundational technology for modern computing, with millions of lines of code implementing everything from commercial operating systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
The TIOBE index~\citep{TIOBE} ranks the five most popular programming languages as Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \CS 4\%, and Python 4\%, totalling 36\%, while each of the next 50 languages has less than 3\%, with a long tail. The rankings of the current top three languages over the past 30 years are:
\lstDeleteShortInline@
\begin{center}
\setlength{\tabcolsep}{10pt}
\begin{tabular}{@{}r|c|c|c|c|c|c|c@{}}
			& 2017		& 2012		& 2007		& 2002		& 1997		& 1992		& 1987		\\
\hline
Java		& 1			& 1			& 1			& 3			& 13		& -			& -			\\
\hline
\Textbf{C}	& \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}& \Textbf{1}	\\
\hline
\CC			& 3			& 3			& 3			& 3			& 2			& 2			& 4			\\
\end{tabular}
\end{center}
\lstMakeShortInline@
Love it or hate it, C is extremely popular, heavily used, and one of the few systems programming languages.
In many cases, \CC is used solely as a better C.
Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive.

\CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for C programmers. The four key design goals for \CFA~\citep{Bilson03} are:
(1) the behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler;
(2) standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler;
(3) \CFA code must be at least as portable as standard C code; and
(4) extensions introduced by \CFA must be translated in the most efficient way possible.
These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)--(3). Ultimately, a dedicated compiler is necessary for advanced features and optimal performance.

This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types that avoids those shortcomings. Specifically, the solution is both reusable and type-checked, conforms to the design goals of \CFA, and makes ergonomic use of existing C abstractions. The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance.
\subsection{Polymorphic Functions}
\label{sec:poly-fns}

\CFA's polymorphism was originally formalized by \citet{Ditchfield92} and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions, which are generalized using a @forall@ clause (giving the language its name):
\begin{lstlisting}
`forall( otype T )` T identity( T val ) { return val; }
int forty_two = identity( 42 );				$\C{// T is bound to int, forty\_two == 42}$
\end{lstlisting}
The @identity@ function above can be applied to any complete \emph{object type} (or @otype@). The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \emph{data type} (or @dtype@).

Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead is similar to that of \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate} compilation, avoiding the code bloat of generating a separate copy of the function for each type.
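To make the implicit parameters concrete, the following C sketch shows one way a translator could lower @identity@: the type variable @T@ becomes explicit size, alignment and copy-constructor arguments, and the value and result travel through @void *@. The names (@identity_poly@, @copy_int@, @call_identity_int@, and so on) and the parameter order are hypothetical, not the code the \CFA translator actually emits; only the copy constructor is shown, and the other lifecycle operations are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowering of: forall( otype T ) T identity( T val ) { return val; }
   The otype parameter T becomes explicit layout and lifecycle arguments. */
typedef void (*copy_fn)( void * dst, const void * src );   /* copy constructor */

void identity_poly( size_t sizeof_T, size_t alignof_T, copy_fn copy_T,
                    void * ret, const void * val ) {
    (void)sizeof_T; (void)alignof_T;   /* unused here, but passed for every otype */
    copy_T( ret, val );                /* copy-initialize the return slot */
}

/* Monomorphic helpers supplied by the caller when T is bound to int or double. */
static void copy_int( void * dst, const void * src ) { *(int *)dst = *(const int *)src; }
static void copy_double( void * dst, const void * src ) { *(double *)dst = *(const double *)src; }

/* Call site for: int forty_two = identity( 42 ); */
int call_identity_int( int v ) {
    int ret;
    identity_poly( sizeof(int), _Alignof(int), copy_int, &ret, &v );
    return ret;
}

double call_identity_double( double v ) {
    double ret;
    identity_poly( sizeof(double), _Alignof(double), copy_double, &ret, &v );
    return ret;
}
```

A single compiled @identity_poly@ serves every type, which is what keeps the design compatible with separate compilation; the price is the extra arguments on each call.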

Since bare polymorphic types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to supply further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
\begin{lstlisting}
forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; }	$\C{// ? denotes operands}$
int val = twice( twice( 3.7 ) );
\end{lstlisting}
which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@, because \CFA uses the return type in its type analysis, as in Ada~\citep{Ada}. The first approach has a late conversion from @double@ to @int@ on the final assignment, while the second has an eager conversion of the inner result to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition.
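The wrapper function and the two conversion strategies can be sketched in plain C. This is a hypothetical lowering (the names @twice_poly@, @add_double@, @late_conversion@ and @eager_conversion@ are illustrative only): the assertion @T ?+?(T, T)@ becomes a function-pointer parameter, and only that parameter is shown, not the size and lifecycle arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowering of:
   forall( otype T | { T ?+?(T, T); } ) T twice( T x ) { return x + x; }
   The assertion becomes a function-pointer parameter; values travel as void *. */
typedef void (*add_fn)( void * ret, const void * a, const void * b );

void twice_poly( add_fn add_T, void * ret, const void * x ) {
    add_T( ret, x, x );                          /* return x + x; */
}

/* Wrappers adapting the built-in + operators to the uniform signature. */
static void add_double( void * ret, const void * a, const void * b ) {
    *(double *)ret = *(const double *)a + *(const double *)b;
}
static void add_int( void * ret, const void * a, const void * b ) {
    *(int *)ret = *(const int *)a + *(const int *)b;
}

/* First approach: T bound to double for both calls, one late double-to-int conversion. */
int late_conversion( double v ) {
    double inner, outer;
    twice_poly( add_double, &inner, &v );
    twice_poly( add_double, &outer, &inner );
    return (int)outer;                           /* conversion on the final assignment */
}

/* Second approach: inner result eagerly converted to int, outer call with T bound to int. */
int eager_conversion( double v ) {
    double inner;
    twice_poly( add_double, &inner, &v );
    int truncated = (int)inner;                  /* fractional part lost here */
    int outer;
    twice_poly( add_int, &outer, &truncated );
    return outer;
}
```

With the argument @3.7@ both strategies give @14@, but with @3.3@ the late conversion gives @13@ while the eager one gives @12@, which is the kind of information loss the selection rule avoids.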

Crucial to the design of a new programming language are the libraries to access thousands of external software features.
\CFA inherits a massive compatible library base, whereas other programming languages must rewrite it or provide fragile inter-language communication with C.
A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted floating-point array:
\begin{lstlisting}
void * bsearch( const void * key, const void * base, size_t nmemb, size_t size,
				int (* compar)( const void *, const void * ) );
int comp( const void * t1, const void * t2 ) { return *(double *)t1 < *(double *)t2 ? -1 :
				*(double *)t2 < *(double *)t1 ? 1 : 0; }
double vals[10] = { /* 10 floating-point values */ };
double key = 5.0;
double * val = (double *)bsearch( &key, vals, 10, sizeof(vals[0]), comp );	$\C{// search sorted array}$
\end{lstlisting}
which can be augmented simply with generalized, type-safe, \CFA-overloaded wrappers:
\begin{lstlisting}
forall( otype T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) {
	int comp( const void * t1, const void * t2 ) { /* as above with double changed to T */ }
	return (T *)bsearch( &key, arr, size, sizeof(T), comp ); }
forall( otype T | { int ?<?( T, T ); } ) unsigned int bsearch( T key, const T * arr, size_t size ) {
	T *result = bsearch( key, arr, size );	$\C{// call first version}$
	return result ? result - arr : size; }	$\C{// pointer subtraction includes sizeof(T)}$
double * val = bsearch( 5.0, vals, 10 );	$\C{// selection based on return type}$
int posn = bsearch( 5.0, vals, 10 );
\end{lstlisting}
The nested routine @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result.
In addition, an alternate kind of return is made available: the position of the found element (or @size@ if the key is absent) instead of a pointer to it.
\CC's type system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a templated @bsearch@.
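For comparison, the following plain C sketch makes the untyped @bsearch@ example complete and runnable. Because C has no overloading, the pointer-returning and position-returning searches need two distinct names, here the hypothetical @find@ and @find_posn@, and every call relies on the programmer pairing @comp@ with the right element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison function from the listing, written const-correctly. */
int comp( const void * t1, const void * t2 ) {
    double a = *(const double *)t1, b = *(const double *)t2;
    return a < b ? -1 : b < a ? 1 : 0;
}

/* Pointer-returning search: the (double *) cast is the caller's responsibility in plain C. */
double * find( double key, const double * vals, size_t n ) {
    return (double *)bsearch( &key, vals, n, sizeof(vals[0]), comp );
}

/* Position-returning search, mirroring the second overload: n means "not found". */
size_t find_posn( double key, const double * vals, size_t n ) {
    const double * result = find( key, vals, n );
    return result ? (size_t)( result - vals ) : n;   /* pointer subtraction scales by sizeof(double) */
}
```

Nothing in C stops @find@ from being handed an @int@ array with the @double@ comparison; the \CFA wrappers rule that out by deriving @comp@ and @sizeof(T)@ from the argument types.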

\CFA has replacement libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions, all without rewriting the actual computations.
For example, it is possible to write a type-safe \CFA wrapper @malloc@ based on the C @malloc@:
\begin{lstlisting}
forall( dtype T | sized(T) ) T * malloc( void ) { return (T *)(void *)malloc( (size_t)sizeof(T) ); }
int * ip = malloc();							$\C{// select type and size from left-hand side}$
double * dp = malloc();
struct S {...} * sp = malloc();
\end{lstlisting}
where the return type supplies the type and size of the allocation, which is impossible in most type systems.
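Plain C has no return-type overloading, so the type cannot flow from the left-hand side into @malloc@. A common C idiom approximates the \CFA wrapper by computing the size from the destination pointer; the sketch below packages it in a hypothetical @ALLOC@ macro, which, unlike the \CFA version, needs the destination as an explicit argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain C cannot select an overload from the left-hand side, so the
   closest idiom derives the size from the destination pointer itself.
   ALLOC is a hypothetical helper macro, not part of any library. */
#define ALLOC( p ) ( (p) = malloc( sizeof *(p) ) )

struct S { int i; double d; };
```

Usage: @ALLOC( ip )@ for an @int * ip@ allocates @sizeof(int)@ bytes, and the same macro works unchanged for @double *@ and @struct S *@.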

Call-site inferencing and nested functions provide a localized form of inheritance. For example, the \CFA @qsort@ only sorts in ascending order using @<@. However, it is trivial to locally change this behaviour:
\begin{lstlisting}
forall( otype T | { int ?<?( T, T ); } ) void qsort( T * arr, size_t size ) { /* use C qsort */ }
{	int ?<?( double x, double y ) { return x `>` y; }	$\C{// locally override behaviour}$
	qsort( vals, 10 );							$\C{// descending sort}$
}
\end{lstlisting}
Within the block, the nested version of @<@ performs @>@, and this local version overrides the built-in @<@, so it is passed to @qsort@.
Hence, programmers can easily form local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types.
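In plain C the same effect requires passing a different comparison function explicitly at every call site, because there is no call-site inference of @<@. The hypothetical @sort_doubles@ helper below makes that explicit choice with a flag; reversing the arguments of the ascending comparison plays the role of the local @<@ that performs @>@.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Ascending order, the C analogue of the built-in ?<? for double. */
static int cmp_ascending( const void * a, const void * b ) {
    double x = *(const double *)a, y = *(const double *)b;
    return x < y ? -1 : y < x ? 1 : 0;
}

/* Descending order: the comparison is reversed, like the local ?<? that performs >. */
static int cmp_descending( const void * a, const void * b ) {
    return cmp_ascending( b, a );
}

void sort_doubles( double * vals, size_t n, int descending ) {
    qsort( vals, n, sizeof(vals[0]), descending ? cmp_descending : cmp_ascending );
}
```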

Finally, \CFA allows variable overloading:
\lstDeleteShortInline@
\par\smallskip
     
\smallskip\par\noindent
Hence, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@.
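Standard C11 offers only a partial analogue through @_Generic@ selection, sketched below with a hypothetical @MAX_OF@ macro. Unlike the \CFA overloaded variable, whose meaning is chosen from context, the selection is driven by the type of an explicit argument expression, and every supported type must be listed in one place.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* C11 _Generic approximates a single overloaded MAX by dispatching on the
   type of an expression; MAX_OF is a hypothetical macro name. */
#define MAX_OF( x ) _Generic( (x), short: SHRT_MAX, int: INT_MAX, double: DBL_MAX )
```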
In addition, restricted constant overloading is allowed for the values @0@ and @1@, which have special status in C: the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
Furthermore, several operations are defined in terms of the values @0@ and @1@.
For example,
\begin{lstlisting}
int x;
if ( x )									$\C{// if ( x != 0 )}$
	x++;								$\C{//   x += 1;}$
\end{lstlisting}
Every @if@ statement in C compares its condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
Due to these rewrite rules, the values @0@ and @1@ have the types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts.
The types @zero_t@ and @one_t@ have special built-in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal.
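The rewrite rules that motivate @zero_t@ and @one_t@ can be checked directly in plain C. The sketch below (with hypothetical function names) pairs each implicit form with its explicit @0@ or @1@ equivalent, and shows @0@ acting as a null pointer constant.

```c
#include <assert.h>
#include <stddef.h>

/* Implicit form: the condition is compared with 0, and ++ adds 1. */
int step_if_nonzero( int x ) {
    if ( x ) x++;
    return x;
}

/* Explicit form of the same rewrite rules. */
int step_if_nonzero_rewritten( int x ) {
    if ( x != 0 ) x += 1;
    return x;
}

/* The literal 0 is also a null pointer constant, so its meaning depends on context. */
int is_null( const int * p ) {
    return p == 0;            /* same as: p == NULL */
}
```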
\subsection{Traits}
\CFA provides \emph{traits} to name a group of type assertions; the trait name allows the same set of assertions to be specified in multiple locations, preventing repetition mistakes at each function declaration:
\begin{lstlisting}
trait summable( otype T ) {
	void ?{}( T *, zero_t );				$\C{// constructor from 0 literal}$
	T ?+?( T, T );						$\C{// assortment of additions}$
	T ?+=?( T *, T );
	T ++?( T * );
	T ?++( T * ); };
forall( otype T `| summable( T )` ) T sum( T a[$\,$], size_t size ) {	$\C{// use trait}$
	`T` total = { `0` };					$\C{// instantiate T from 0 by calling its constructor}$
	for ( unsigned int i = 0; i < size; i += 1 )
		total `+=` a[i];					$\C{// select appropriate +}$
	return total; }
\end{lstlisting}
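As with @twice@, one way to picture how a trait-constrained function could be compiled is to gather the assertion functions into a table passed alongside untyped data. The C sketch below is hypothetical: @summable_ops@, @sum_poly@ and the helpers are illustrative names, it bundles the operations into a struct for brevity (whereas the \CFA translator, as described earlier, passes the size and each assertion as separate implicit parameters), and it shows only the operations @sum@ uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowering of the summable trait: one function pointer per
   assertion that sum actually uses (constructor from 0 and +=). */
struct summable_ops {
    size_t size;                                        /* sizeof(T) */
    void (*ctor_zero)( void * obj );                    /* void ?{}( T *, zero_t ) */
    void (*add_assign)( void * obj, const void * rhs ); /* T ?+=?( T *, T ) */
};

/* Lowered sum: T values are handled through untyped pointers. */
void sum_poly( const struct summable_ops * ops, void * total,
               const void * a, size_t size ) {
    ops->ctor_zero( total );                            /* T total = { 0 }; */
    const unsigned char * elem = a;
    for ( size_t i = 0; i < size; i += 1 )
        ops->add_assign( total, elem + i * ops->size ); /* total += a[i]; */
}

static void int_zero( void * obj ) { *(int *)obj = 0; }
static void int_add( void * obj, const void * rhs ) { *(int *)obj += *(const int *)rhs; }
static void dbl_zero( void * obj ) { *(double *)obj = 0.0; }
static void dbl_add( void * obj, const void * rhs ) { *(double *)obj += *(const double *)rhs; }

const struct summable_ops int_summable = { sizeof(int), int_zero, int_add };
const struct summable_ops double_summable = { sizeof(double), dbl_zero, dbl_add };
```

The element stride comes from the passed size, which is why @sized(T)@ information must accompany every @otype@.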

In fact, the set of operators in @summable@ is incomplete, \eg there is no assignment, but the missing operations are supplied because @otype@ is syntactic sugar for the following implicit trait:
\begin{lstlisting}
trait otype( dtype T | sized(T) ) {			$\C{// sized is a pseudo-trait for types with known size and alignment}$
	void ?{}( T * );						$\C{// default constructor}$
	void ?{}( T *, T );						$\C{// copy constructor}$
	void ?=?( T *, T );						$\C{// assignment operator}$
	void ^?{}( T * ); };					$\C{// destructor}$
\end{lstlisting}
Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and destroyed.
% As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
% \begin{lstlisting}
% void abs( size_t _sizeof_M, size_t _alignof_M,
%		void (*_ctor_M)(void*), void (*_copy_M)(void*, void*),
%		void (*_assign_M)(void*, void*), void (*_dtor_M)(void*),
%		_Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*),
%		void (*_ctor_M_zero)(void*, int),
%		void* m, void* _rtn ) {				$\C{// polymorphic parameter and return passed as void*}$
%										$\C{// M zero = { 0 };}$
%	void* zero = alloca(_sizeof_M);			$\C{// stack allocate zero temporary}$
%	_ctor_M_zero(zero, 0);					$\C{// initialize using zero\_t constructor}$
%										$\C{// return m < zero ? -m : m;}$
%	void *_tmp = alloca(_sizeof_M);
%	_copy_M( _rtn,							$\C{// copy-initialize return value}$
%		_lt_M( m, zero ) ?					$\C{// check condition}$
%		 (_neg_M(m, _tmp), _tmp) :			$\C{// negate m}$
%		 m);
%	_dtor_M(_tmp); _dtor_M(zero);			$\C{// destroy temporaries}$
% }
% \end{lstlisting}
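The claim that an @otype@ variable can be stack-allocated follows from the size and alignment parameters. The following hypothetical lowering of a polymorphic @swap@ sketches this in C; it substitutes a C99 variable-length array of @max_align_t@ for the @alloca@ builtin, uses the alignment only to check that this storage is suitable, and omits the destructor call for brevity. The names @swap_poly@, @assign_int@ and @assign_double@ are illustrative.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

typedef void (*copy_fn)( void * dst, const void * src );

/* Hypothetical lowering of a polymorphic swap:
     forall( otype T ) void swap( T * a, T * b ) { T tmp = *a; *a = *b; *b = tmp; }
   The otype supplies the size, alignment, copy constructor and assignment. */
void swap_poly( size_t sizeof_T, size_t alignof_T,
                copy_fn copy_T, copy_fn assign_T, void * a, void * b ) {
    /* A VLA of max_align_t is suitably aligned for any T with a
       fundamental alignment (every scalar type qualifies). */
    assert( alignof_T <= alignof(max_align_t) );
    max_align_t tmp[ ( sizeof_T + sizeof(max_align_t) - 1 ) / sizeof(max_align_t) ];
    copy_T( tmp, a );     /* T tmp = *a;  (copy constructor) */
    assign_T( a, b );     /* *a = *b;     (assignment)       */
    assign_T( b, tmp );   /* *b = tmp;    (assignment)       */
    /* a destructor call for tmp would go here; omitted in this sketch */
}

static void assign_int( void * dst, const void * src ) { *(int *)dst = *(const int *)src; }
static void assign_double( void * dst, const void * src ) { *(double *)dst = *(const double *)src; }
```

For @int@ and @double@ the copy constructor and assignment coincide, so the same helper is passed for both.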

Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to the traits they satisfy; this is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC.

Nominal inheritance can be simulated with traits using marker variables or functions:
\begin{lstlisting}
trait nominal(otype T) {
	T is_nominal;
};
int is_nominal;								$\C{// int now satisfies the nominal trait}$
\end{lstlisting}

Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
\begin{lstlisting}
trait pointer_like(otype Ptr, otype El) {
	lvalue El *?(Ptr);						$\C{// Ptr can be dereferenced into a modifiable value of type El}$
}
struct list {
	int value;
	list *next;								$\C{// may omit "struct" on type names as in \CC}$
};
typedef list *list_iterator;
     
One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to create data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures. A second approach is to use @void*@-based polymorphism. This approach is taken by the C standard library functions @qsort@ and @bsearch@, and does allow the use of common code for common functionality. However, basing all polymorphism on @void*@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requires a number of extra function parameters, and also adds pointer indirection and dynamic allocation to algorithms and data structures that would not otherwise require them. A third approach to generic code is to use preprocessor macros to generate it, which does allow the generated code to be both generic and type-checked, though any errors produced may be difficult to interpret. Furthermore, writing and invoking C code as preprocessor macros is unnatural and somewhat inflexible.
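As an illustration of the third approach, the hypothetical @DEFINE_STACK@ macro below generates a separate, type-checked, fixed-capacity stack type and its functions for each element type. Errors inside the macro body are reported at the expansion site, and the token pasting used to form the type name only works when the element type is a single identifier, which illustrates the inflexibility noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Approach three: a preprocessor macro stamps out a bespoke, type-checked
   fixed-capacity stack for each element type. All names are hypothetical. */
#define DEFINE_STACK( T, CAP )                                          \
    typedef struct { T items[CAP]; size_t len; } stack_##T;             \
    static inline int stack_##T##_push( stack_##T * s, T v ) {          \
        if ( s->len == (CAP) ) return 0;   /* full */                   \
        s->items[s->len++] = v; return 1;                               \
    }                                                                   \
    static inline int stack_##T##_pop( stack_##T * s, T * out ) {       \
        if ( s->len == 0 ) return 0;       /* empty */                  \
        *out = s->items[--s->len]; return 1;                            \
    }

DEFINE_STACK( int, 8 )
DEFINE_STACK( double, 8 )
```

An element type such as @unsigned long@ or @struct S@ would need a @typedef@ first, since the macro pastes @T@ into identifiers.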

Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. \CFA also implements generic types. They are designed to integrate efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C, and maintaining separate compilation is a particularly important constraint on this design. Furthermore, where the concrete parameters of a generic type are known, using the generic type incurs no extra overhead, as is the case for \CC templates.

A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name:
     
\end{lstlisting}

\CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Concrete generic types have a fixed memory layout regardless of their type parameters, while the in-memory layout of dynamic generic types varies with their type parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types: @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation.
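
Here is a minimal plain C sketch of the distinction. It assumes a hand-written erasure and is not the output of the \CFA translator. The hypothetical dtype-static type @ptr_pair@, which in \CFA would be a pair of @T*@ for any @dtype T@, is represented by a single struct of two @void*@ fields that every instantiation shares. The hypothetical dynamic type @box@ holds a @char@ tag followed by a @T@ value, so it needs a distinct layout for each instantiation. Those layouts are shown here for @T@ as @char@ and as @double@.
\begin{lstlisting}
#include <stddef.h>
#include <stdio.h>

// Hypothetical erasure of a dtype-static generic type: every instantiation
// ptr_pair(T) shares this one layout, because T* is represented as void*.
struct ptr_pair { void *first; void *second; };

// Hypothetical dynamic generic type box(T) = { char tag; T val; }: the size
// and the offset of val depend on T, so each T needs its own layout.
struct box_char   { char tag; char val; };
struct box_double { char tag; double val; };

int main(void) {
    int i = 1;
    double d = 2.0;
    struct ptr_pair pi = { &i, &i };    // plays the role of ptr_pair(int)
    struct ptr_pair pd = { &d, &d };    // ptr_pair(double): same struct, same size
    printf("%d %g\n", *(int *)pi.first, *(double *)pd.first);
    printf("ptr_pair: %zu bytes for any T\n", sizeof pi);
    printf("offset of val: box(char) %zu, box(double) %zu\n",
           offsetof(struct box_char, val), offsetof(struct box_double, val));
    return 0;
}
\end{lstlisting}
On common ABIs the offset of @val@ differs between the two @box@ instantiations. Code that manipulates a dynamic generic type without knowing @T@ therefore needs layout information such as sizes and field offsets at runtime. Code that manipulates @ptr_pair@ never does.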

\CFA generic types may also specify constraints on their argument types to be checked by the compiler. For example, consider the following declaration of a sorted-set type, which ensures that the set key supports equality and relational comparison:
\begin{lstlisting}
forall(otype Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); })
  struct sorted_set;
\end{lstlisting}
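
One way to picture what such a constraint provides is dictionary passing. The required operations are supplied as function pointers separately from the data, so the data keeps a C-compatible layout. The sketch below is a hypothetical hand-written C analogue, not the code generated by the \CFA translator. The names @key_ops@ and @sorted_insert@ are our assumptions, and so is the sorted-array representation, because the declaration above leaves @sorted_set@ incomplete.
\begin{lstlisting}
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical dictionary for the assertions ?==? and ?<? on Key,
// passed alongside the data instead of being stored inside it.
struct key_ops {
    bool (*eq)(const void *, const void *);
    bool (*lt)(const void *, const void *);
};

static bool int_eq(const void *a, const void *b) { return *(const int *)a == *(const int *)b; }
static bool int_lt(const void *a, const void *b) { return *(const int *)a < *(const int *)b; }

// Insert *key into a sorted, duplicate-free array of n elements of the given
// size, using only the operations in ops; returns the new element count.
// The caller must ensure the array has room for one more element.
static size_t sorted_insert(void *keys, size_t n, size_t size,
                            const void *key, const struct key_ops *ops) {
    char *base = keys;
    size_t i = 0;
    while (i < n && ops->lt(base + i * size, key)) i += 1;   // first element >= key
    if (i < n && ops->eq(base + i * size, key)) return n;    // already in the set
    memmove(base + (i + 1) * size, base + i * size, (n - i) * size);  // open a gap
    memcpy(base + i * size, key, size);
    return n + 1;
}

int main(void) {
    const struct key_ops int_ops = { int_eq, int_lt };
    int set[8];
    size_t n = 0;
    int input[] = { 5, 2, 9, 2, 7 };
    for (size_t k = 0; k < sizeof input / sizeof input[0]; k += 1)
        n = sorted_insert(set, n, sizeof set[0], &input[k], &int_ops);
    for (size_t k = 0; k < n; k += 1) printf("%d ", set[k]);   // prints 2 5 7 9
    printf("\n");
    return 0;
}
\end{lstlisting}
The explicit element size and the untyped @const void*@ parameters are the kind of @void*@ plumbing that \CFA's type-checked polymorphism lets the programmer avoid writing by hand.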
     
};
\end{lstlisting}

\subsection{Dynamic Generic Types}
     
Cyclone also provides capabilities for polymorphic functions and existential types~\citep{Grossman06}, similar in concept to \CFA's @forall@ functions and generic types. Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is a tedious and potentially error-prone process. Furthermore, Cyclone's polymorphic functions and types may only abstract over types with the same layout and calling convention as @void*@, in practice only pointer types and @int@; in \CFA terms, all Cyclone polymorphism must be dtype-static. This design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, but is more restrictive than \CFA's general model.

Apple's Objective-C \citep{obj-c-book} is another industrially successful set of extensions to C. The Objective-C language model is a fairly radical departure from C, adding object-orientation and message-passing. Objective-C implements variadic functions using the C @va_arg@ mechanism. It did not support type-checked generics until Xcode~7 \citep{xcode7}, and historically relied instead on less-efficient and more error-prone runtime checking of object types. The GObject framework \citep{GObject} also adds object-orientation with runtime type-checking and reference-counting garbage-collection to C. These are much more intrusive feature additions than those provided by \CFA, and reference-counting also adds runtime overhead. The Vala programming language \citep{Vala} compiles to GObject-based C. As a modernization path for existing C code-bases, it therefore adds the burden of learning a separate language syntax to the aforementioned demerits of GObject. Java \citep{Java8} has had generic types and variadic functions since Java~5. Like \CFA's generic types, Java's are type-checked at compilation and type-erased at runtime. However, in Java each object carries its own table of method pointers, while \CFA passes the method pointers separately so as to maintain a C-compatible struct layout. Java variadic functions are simply syntactic sugar for an array of a single type, and are therefore less useful than \CFA's heterogeneously-typed variadic functions. Java is also a garbage-collected, object-oriented language, with the associated resource usage and C-interoperability burdens.

D \citep{D}, Go \citep{Go}, and Rust \citep{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, namely \emph{interfaces} in D and Go and \emph{traits} in Rust. However, each language represents a dramatic departure from C in terms of language model, and none has the same level of compatibility with C as \CFA. D and Go are garbage-collected languages and impose the associated runtime overhead. Data transfer between the managed Go runtime and the unmanaged C runtime must be accounted for, which complicates the foreign-function interface between Go and C. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. D restricts garbage collection to its own heap by default. Rust is not garbage-collected, and thus has a lighter-weight runtime that interoperates more easily with C. Rust also has much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker provides strong safety guarantees but is complex and difficult to learn, and it imposes a distinctly idiomatic programming style on Rust. \CFA has more modest safety features, so C code is significantly easier to port to \CFA while maintaining the idiomatic style of the original source.

\section{Conclusion \& Future Work}