generic_types.tex [ae6cc8b:115a868]

File:

: 1 edited

doc/generic_types/generic_types.tex (modified) (9 diffs)


doc/generic_types/generic_types.tex

-                      rae6cc8b
+                      r115a868
 \newcommand{\CCseventeen}{\rm C\kern-.1em\hbox{+\kern-.25em+}17\xspace} % C++17 symbolic name
 \newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name
+\newcommand{\CS}{C\raisebox{-0.7ex}{\Large$^\sharp$}\xspace}
+\newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}}
 \newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
 …
 stringstyle=\tt,                                                                                % use typewriter font
 tabsize=4,                                                                                              % 4 space tabbing
 xleftmargin=\parindent,                                                                 % indent code to paragraph indentation
+xleftmargin=\parindentlnth,                                                             % indent code to paragraph indentation
 %mathescape=true,                                                                               % LaTeX math escape in CFA code $...$
 escapechar=\$,                                                                                  % LaTeX escape in CFA code
 …
 \maketitle
+\section{Introduction \& Background}
+\CFA\footnote{Pronounced ``C-for-all'', and written \CFA or Cforall.} is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. Four key design goals were set out in the original design of \CFA~\citep{Bilson03}:
+\begin{enumerate}
+\item The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler.
+\item Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler.
+\item \CFA code must be at least as portable as standard C code.
+\item Extensions introduced by \CFA must be translated in the most efficient way possible.
+\end{enumerate}
+These goals ensure existing C code-bases can be converted to \CFA incrementally and with minimal effort, and C programmers can productively generate \CFA code without training beyond the features they wish to employ. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance.
+\CFA has been previously extended with polymorphic functions and name overloading (including operator overloading) by \citet{Bilson03}, and deterministically-executed constructors and destructors by \citet{Schluntz17}. This paper builds on those contributions, identifying shortcomings in existing approaches to generic and variadic data types in C-like languages and presenting a design of generic and variadic types as an extension of the \CFA language that avoids those shortcomings. In particular, the solution we present is both reusable and type-checked, as well as conforming to the design goals of \CFA and ergonomically using existing C abstractions. We have empirically compared our new design to both standard C and \CC; the results show that this design is \TODO{awesome, I hope}.
+\section{Introduction and Background}
+The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
+The \citet{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \CS 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail. The top 3 rankings over the past 30 years are:
+\lstDeleteShortInline@
+\begin{center}
+\setlength{\tabcolsep}{10pt}
+\begin{tabular}{@{}r|c|c|c|c|c|c|c@{}}
+                & 2017  & 2012  & 2007  & 2002  & 1997  & 1992  & 1987          \\
+\hline
+Java    & 1             & 1             & 1             & 3             & 13    & -             & -                     \\
+\hline
+\Textbf{C}      & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}& \Textbf{1}    \\
+\hline
+\CC             & 3             & 3             & 3             & 3             & 2             & 2             & 4                     \\
+\end{tabular}
+\end{center}
+\lstMakeShortInline@
+Love it or hate it, C is extremely popular, highly used, and one of the few systems languages.
+In many cases, \CC is used solely as a better C.
+Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive.
+\CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. The four key design goals for \CFA~\citep{Bilson03} are:
+(1) The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler;
+(2) Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler;
+(3) \CFA code must be at least as portable as standard C code;
+(4) Extensions introduced by \CFA must be translated in the most efficient way possible.
+These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance.
+This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types avoiding those shortcomings. Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance.
 \subsection{Polymorphic Functions}
 \label{sec:poly-fns}
 \CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions; such functions are written using a @forall@ clause (which gives the language its name):
+\CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions where functions are generalized using a @forall@ clause (giving the language its name):
 \begin{lstlisting}
 `forall( otype T )` T identity( T val ) { return val; }
 int forty_two = identity( 42 );                         $\C{// T is bound to int, forty\_two == 42}$
 \end{lstlisting}
+The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@ that encode sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''.
+Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C separate compilation.
+Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. For instance, @twice@ can be defined using the \CFA syntax for operator overloading:
+\begin{lstlisting}
+forall( otype T | { T `?`+`?`(T, T); } )        $\C{// ? denotes operands}$
+  T twice( T x ) { return x + x; }                      $\C{// (2)}$
+The @identity@ function above can be applied to any complete \emph{object type} (or @otype@). The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \emph{data type} (or @dtype@).
+Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead is similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate} compilation, preventing code bloat.
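The implicit-parameter scheme described above can be sketched by hand in plain C. This is only an illustration under stated assumptions, not the translator's actual output: the record `otype_info` and the names `identity_poly`, `identity_int`, and `copy_int` are hypothetical, and only the copy constructor is shown out of the operations an @otype@ carries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what an otype parameter carries: size, alignment,
   and a copy constructor (the other operations are omitted for brevity). */
typedef struct {
    size_t size;
    size_t align;                               /* carried along, unused in this sketch */
    void (*copy)(void *dst, const void *src);   /* copy constructor */
} otype_info;

/* Hand-lowered stand-in for: forall( otype T ) T identity( T val ).
   The polymorphic argument and result are passed by address. */
void identity_poly(const otype_info *T, const void *val, void *ret) {
    assert(T != NULL && val != NULL && ret != NULL);
    T->copy(ret, val);                          /* copy-initialize the result */
}

/* Copy constructor supplied when T is bound to int. */
static void copy_int(void *dst, const void *src) {
    memcpy(dst, src, sizeof(int));
}

/* Call site with T bound to int: the extra arguments are filled in here. */
int identity_int(int val) {
    static const otype_info int_info = { sizeof(int), _Alignof(int), copy_int };
    int ret;
    identity_poly(&int_info, &val, &ret);
    return ret;
}
```

Because `identity_poly` has one compiled body that every binding of `T` shares, it can live in a separately compiled translation unit; the per-call cost is the extra arguments and the indirect call through `copy`.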
+Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
+\begin{lstlisting}
+forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? denotes operands}$
 int val = twice( twice( 3.7 ) );
 \end{lstlisting}
+which works for any type @T@ with an addition operator defined. The translator accomplishes this polymorphism by creating a wrapper function for calling @+@ with @T@ bound to @double@, then providing this function to the first call of @twice@. It then has the option of using the same @twice@ again and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type in its type analysis. The first approach has a late conversion from integer to floating-point on the final assignment, while the second has an eager conversion to integer. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach.
+Monomorphic specializations of polymorphic functions can satisfy polymorphic type-assertions.
+% \begin{lstlisting}
+% forall(otype T `| { T twice(T); }`)           $\C{// type assertion}$
+% T four_times(T x) { return twice( twice(x) ); }
+% double twice(double d) { return d * 2.0; }    $\C{// (1)}$
+% double magic = four_times(10.5);                      $\C{// T bound to double, uses (1) to satisfy type assertion}$
+% \end{lstlisting}
+\begin{lstlisting}
+forall( otype T `| { int ?<?( T, T ); }` )      $\C{// type assertion}$
+  void qsort( const T * arr, size_t size );
+forall( otype T `| { int ?<?( T, T ); }` )      $\C{// type assertion}$
+  T * bsearch( T key, const T * arr, size_t size );
+double vals[10] = { /* 10 floating-point values */ };
+qsort( vals, 10 );                                                      $\C{// sort array}$
+double * val = bsearch( 5.0, vals, 10 );        $\C{// binary search sorted array for key}$
+\end{lstlisting}
+@qsort@ and @bsearch@ can only be called with arguments for which there exists a function named @<@ taking two arguments of the same type and returning an @int@ value.
+Here, the built-in monomorphic specialization of @<@ for type @double@ is passed as an additional implicit parameter to the calls of @qsort@ and @bsearch@.
+Crucial to the design of a new programming language are the libraries to access thousands of external features.
+\CFA inherits a massive compatible library-base, where other programming languages have to rewrite or provide fragile inter-language communication with C.
+A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@, shown here searching a floating-point array:
+which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type~\cite{Ada} in its type analysis. The first approach has a late conversion from @int@ to @double@ on the final assignment, while the second has an eager conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition.
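The two resolution options for `twice( twice( 3.7 ) )` can be mimicked in plain C to show why the late conversion loses less information. The sketch below is an assumption-laden illustration, not generated code: `add_fn`, `twice_poly`, `add_double`, `twice_twice_late`, and `twice_twice_eager` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowered form of the assertion T ?+?(T, T): operands and
   result are passed by address because T is unknown to the callee. */
typedef void (*add_fn)(const void *lhs, const void *rhs, void *result);

/* Hand-written stand-in for the polymorphic twice: computes x + x via add. */
void twice_poly(add_fn add, const void *x, void *result) {
    assert(add != NULL);
    add(x, x, result);
}

/* Wrapper adapting built-in double addition to the generic signature. */
static void add_double(const void *lhs, const void *rhs, void *result) {
    *(double *)result = *(const double *)lhs + *(const double *)rhs;
}

/* First approach: T is bound to double in both calls and the conversion
   to int happens once, on the final result. */
int twice_twice_late(double x) {
    double inner, outer;
    twice_poly(add_double, &x, &inner);
    twice_poly(add_double, &inner, &outer);
    return (int)outer;
}

/* Second approach: the inner result is converted to int eagerly, so the
   outer addition works on an already truncated value. */
int twice_twice_eager(double x) {
    double inner;
    twice_poly(add_double, &x, &inner);
    int truncated = (int)inner;
    return truncated + truncated;
}
```

For the argument 3.7 both versions happen to agree, but for an argument such as 3.8 the eager version drops the fraction of the intermediate 7.6 before doubling, which is the information loss the first approach avoids.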
+Crucial to the design of a new programming language are the libraries to access thousands of external software features.
+\CFA inherits a massive compatible library-base, where other programming languages must rewrite or provide fragile inter-language communication with C.
+A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted floating-point array:
 \begin{lstlisting}
 void * bsearch( const void * key, const void * base, size_t nmemb, size_t size,
 …
 int comp( const void * t1, const void * t2 ) { return *(double *)t1 < *(double *)t2 ? -1 :
                                 *(double *)t2 < *(double *)t1 ? 1 : 0; }
+double vals[10] = { /* 10 floating-point values */ };
 double key = 5.0;
 double * val = (double *)bsearch( &key, vals, size, sizeof(vals[0]), comp );
 \end{lstlisting}
+but wrapped by a type-safe \CFA overloaded function:
+double * val = (double *)bsearch( &key, vals, 10, sizeof(vals[0]), comp );      $\C{// search sorted array}$
+\end{lstlisting}
+which can be augmented simply with generalized, type-safe, \CFA-overloaded wrappers:
 \begin{lstlisting}
 forall( otype T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) {
         int comp( const void * t1, const void * t2 ) { /* as above with double changed to T */ }
+        return (T *)bsearch( &key, arr, size, sizeof(T), comp );
+}
+        return (T *)bsearch( &key, arr, size, sizeof(T), comp ); }
 forall( otype T | { int ?<?( T, T ); } ) unsigned int bsearch( T key, const T * arr, size_t size ) {
         T *result = bsearch( key, arr, size );  $\C{// call first version}$
+        return result ? result - arr : size;            $\C{// pointer subtraction includes sizeof(T)}$
+}
+        return result ? result - arr : size; }  $\C{// pointer subtraction includes sizeof(T)}$
 double * val = bsearch( 5.0, vals, 10 );        $\C{// selection based on return type}$
 int posn = bsearch( 5.0, vals, 10 );
 \end{lstlisting}
 The nested routine @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result.
 As well, an alternate kind of return is made available, position versus pointer to found element.
+As well, an alternate kind of return is made available: position versus pointer to found element.
 \CC's type-system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a templated @bsearch@.
+Call-site inferencing and nested functions provide a localized form of inheritance. For example, @qsort@ only sorts in ascending order using @<@. However, it is trivial to locally change this behaviour:
+\begin{lstlisting}
+{   int ?<?( double x, double y ) { return x `>` y; }   $\C{// override behaviour}$
+\CFA has replacement libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions, all without rewriting the actual computations.
+For example, it is possible to write a type-safe \CFA wrapper @malloc@ based on the C @malloc@:
+\begin{lstlisting}
+forall( dtype T | sized(T) ) T * malloc( void ) { return (T *)(void *)malloc( (size_t)sizeof(T) ); }
+int * ip = malloc();                                            $\C{// select type and size from left-hand side}$
+double * dp = malloc();
+struct S {...} * sp = malloc();
+\end{lstlisting}
+where the return type supplies the type/size of the allocation, which is impossible in most type systems.
+Call-site inferencing and nested functions provide a localized form of inheritance. For example, the \CFA @qsort@ only sorts in ascending order using @<@. However, it is trivial to locally change this behaviour:
+\begin{lstlisting}
+forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ }
+{       int ?<?( double x, double y ) { return x `>` y; }       $\C{// locally override behaviour}$
         qsort( vals, size );                                    $\C{// descending sort}$
+}
 \end{lstlisting}
 Within the block, the nested version of @<@ performs @>@ and this local version overrides the built-in @<@ so it is passed to @qsort@.
 Hence, programmers can easily form new local environments to maximize reuse of existing functions and types.
 Finally, variables may be overloaded:
+Hence, programmers can easily form local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types.
+Finally, \CFA allows variable overloading:
 \lstDeleteShortInline@
 \par\smallskip
 …
 \smallskip\par\noindent
 Hence, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@.
+As well, restricted constant overloading is allowed for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
+In addition, several operations are defined in terms of the values @0@ and @1@.
+For example,
+\begin{lstlisting}
+int x;
+if (x)        // if (x != 0)
+        x++;    //   x += 1;
+\end{lstlisting}
+Every if statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
+Due to these rewrite rules, the values @0@ and @1@ have the types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts.
+The types @zero_t@ and @one_t@ have special built-in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal.
 \subsection{Traits}
+\CFA provides \emph{traits} to name a group of type assertions:
+% \begin{lstlisting}
+% trait has_magnitude(otype T) {
+%     _Bool ?<?(T, T);                                          $\C{// comparison operator for T}$
+%     T -?(T);                                                          $\C{// negation operator for T}$
+%     void ?{}(T*, zero_t);                                     $\C{// constructor from 0 literal}$
+% };
+% forall(otype M | has_magnitude(M))
+% M abs( M m ) {
+%     M zero = { 0 };                                                   $\C{// uses zero\_t constructor from trait}$
+%     return m < zero ? -m : m;
+% }
+% forall(otype M | has_magnitude(M))
+% M max_magnitude( M a, M b ) {
+%     return abs(a) < abs(b) ? b : a;
+% }
+% \end{lstlisting}
+\CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
 \begin{lstlisting}
 trait summable( otype T ) {
         void ?{}(T*, zero_t);                                   $\C{// constructor from 0 literal}$
+        void ?{}( T *, zero_t );                                $\C{// constructor from 0 literal}$
         T ?+?( T, T );                                                  $\C{// assortment of additions}$
         T ?+=?( T *, T );
         T ++?( T * );
+        T ?++( T * );
+};
+forall( otype T | summable( T ) )
+  T sum( T a[$\,$], size_t size ) {
+        T total = { 0 };                                                $\C{// instantiate T from 0}$
+        T ?++( T * ); };
+forall( otype T `| summable( T )` ) T sum( T a[$\,$], size_t size ) {  // use trait
+        `T` total = { `0` };                                    $\C{// instantiate T from 0 by calling its constructor}$
         for ( unsigned int i = 0; i < size; i += 1 )
+                total += a[i];                                          $\C{// select appropriate +}$
+        return total;
+}
+\end{lstlisting}
+The trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
+                total `+=` a[i];                                        $\C{// select appropriate +}$
+        return total; }
+\end{lstlisting}
 In fact, the set of operators is incomplete, \eg no assignment, but @otype@ is syntactic sugar for the following implicit trait:
 \begin{lstlisting}
+trait otype( dtype T | sized(T) ) {
+        // sized is a compiler-provided pseudo-trait for types with known size and alignment
+trait otype( dtype T | sized(T) ) {  // sized is a pseudo-trait for types with known size and alignment
         void ?{}( T * );                                                $\C{// default constructor}$
         void ?{}( T *, T );                                             $\C{// copy constructor}$
         void ?=?( T *, T );                                             $\C{// assignment operator}$
+        void ^?{}( T * );                                               $\C{// destructor}$
+};
+\end{lstlisting}
+Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
+\begin{lstlisting}
+void abs( size_t _sizeof_M, size_t _alignof_M,
+                void (*_ctor_M)(void*), void (*_copy_M)(void*, void*),
+                void (*_assign_M)(void*, void*), void (*_dtor_M)(void*),
+                _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*),
+        void (*_ctor_M_zero)(void*, int),
+                void* m, void* _rtn ) {                         $\C{// polymorphic parameter and return passed as void*}$
+                                                                                        $\C{// M zero = { 0 };}$
+        void* zero = alloca(_sizeof_M);                 $\C{// stack allocate zero temporary}$
+        _ctor_M_zero(zero, 0);                                  $\C{// initialize using zero\_t constructor}$
+                                                                                        $\C{// return m < zero ? -m : m;}$
+        void *_tmp = alloca(_sizeof_M);
+        _copy_M( _rtn,                                                  $\C{// copy-initialize return value}$
+                _lt_M( m, zero ) ?                                      $\C{// check condition}$
+                 (_neg_M(m, _tmp), _tmp) :                      $\C{// negate m}$
+                 m);
+        _dtor_M(_tmp); _dtor_M(zero);                   $\C{// destroy temporaries}$
+}
+\end{lstlisting}
+Semantically, traits are simply named lists of type assertions, but they may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form of structural inheritance, similar to implementation of an interface in Go, as opposed to the nominal inheritance model of Java and \CC. Nominal inheritance can be simulated with traits using marker variables or functions:
+        void ^?{}( T * ); };                                    $\C{// destructor}$
+\end{lstlisting}
+Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted.
+% As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
+% \begin{lstlisting}
+% void abs( size_t _sizeof_M, size_t _alignof_M,
+%               void (*_ctor_M)(void*), void (*_copy_M)(void*, void*),
+%               void (*_assign_M)(void*, void*), void (*_dtor_M)(void*),
+%               _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*),
+%       void (*_ctor_M_zero)(void*, int),
+%               void* m, void* _rtn ) {                         $\C{// polymorphic parameter and return passed as void*}$
+%                                                                                       $\C{// M zero = { 0 };}$
+%       void* zero = alloca(_sizeof_M);                 $\C{// stack allocate zero temporary}$
+%       _ctor_M_zero(zero, 0);                                  $\C{// initialize using zero\_t constructor}$
+%                                                                                       $\C{// return m < zero ? -m : m;}$
+%       void *_tmp = alloca(_sizeof_M);
+%       _copy_M( _rtn,                                                  $\C{// copy-initialize return value}$
+%               _lt_M( m, zero ) ?                                      $\C{// check condition}$
+%                (_neg_M(m, _tmp), _tmp) :                      $\C{// negate m}$
+%                m);
+%       _dtor_M(_tmp); _dtor_M(zero);                   $\C{// destroy temporaries}$
+% }
+% \end{lstlisting}
+Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC.
+Nominal inheritance can be simulated with traits using marker variables or functions:
 \begin{lstlisting}
 trait nominal(otype T) {
     T is_nominal;
 };
 int is_nominal;                                                         $\C{// int now satisfies the nominal trait}$
 \end{lstlisting}
 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
+Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
 \begin{lstlisting}
 trait pointer_like(otype Ptr, otype El) {
     lvalue El *?(Ptr);                                          $\C{// Ptr can be dereferenced into a modifiable value of type El}$
+}
 struct list {
     int value;
     list *next;                                                         $\C{// may omit "struct" on type names}$
+    list *next;                                                         $\C{// may omit "struct" on type names as in \CC}$
 };
 typedef list *list_iterator;
 …
 One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to creating data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures. A second approach is to use @void*@-based polymorphism. This approach is taken by the C standard library functions @qsort@ and @bsearch@, and does allow the use of common code for common functionality. However, basing all polymorphism on @void*@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requires a number of extra function parameters, and also adds pointer indirection and dynamic allocation to algorithms and data structures that would not otherwise require them. A third approach to generic code is to use preprocessor macros to generate it -- this approach does allow the generated code to be both generic and type-checked, though any errors produced may be difficult to interpret. Furthermore, writing and invoking C code as preprocessor macros is unnatural and somewhat inflexible.
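The second and third approaches can be contrasted in a few lines of standard C. This is a minimal sketch for illustration only; the helper names `cmp_int`, `sort_ints`, `DEFINE_MAX`, `max_int`, and `max_double` are hypothetical and do not come from the paper.

```c
#include <assert.h>
#include <stdlib.h>

/* Approach 2: void*-based polymorphism, as used by the standard qsort.
   The compiler cannot check that this comparator matches the array type. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

void sort_ints(int *arr, size_t n) {
    assert(arr != NULL || n == 0);
    qsort(arr, n, sizeof arr[0], cmp_int);   /* element size passed by hand */
}

/* Approach 3: a preprocessor macro stamps out one type-checked function
   per element type; errors surface inside the expansion. */
#define DEFINE_MAX(T, NAME)                   \
    static T NAME(const T *arr, size_t n) {   \
        T best = arr[0];                      \
        for (size_t i = 1; i < n; i += 1)     \
            if (best < arr[i]) best = arr[i]; \
        return best;                          \
    }

DEFINE_MAX(int, max_int)        /* requires n >= 1 */
DEFINE_MAX(double, max_double)  /* requires n >= 1 */
```

The @void*@ version needs an explicit element size and a cast-laden comparator, while the macro version is type-checked but must be instantiated by name for every element type.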
 Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. The authors have chosen to implement generic types as well, with some care taken that the generic types design for \CFA integrates efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there is not extra overhead for the use of a generic type.
+Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. \CFA implements generic types as well, taking care that their design integrates efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, a generic type has no extra overhead, as with \CC templates.
 A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name:
 …
 \end{lstlisting}
 \CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Dynamic generic types vary in their in-memory layout depending on their type parameters, while concrete generic types have a fixed memory layout regardless of type parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation.
 \CFA generic types may also specify constraints on their argument type to be checked by the compiler. For example, consider the following declaration of a sorted set type, which ensures that the set key supports comparison and tests for equality:
+\CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Concrete generic types have a fixed memory layout regardless of type parameters, while dynamic generic types vary in their in-memory layout depending on their type parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation.
+\CFA generic types may also specify constraints on their argument type to be checked by the compiler. For example, consider the following declaration of a sorted set-type, which ensures that the set key supports equality and relational comparison:
 \begin{lstlisting}
 forall(otype Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); })
 struct sorted_set;
+  struct sorted_set;
 \end{lstlisting}
 …
 };
 \end{lstlisting}
 \subsection{Dynamic Generic Types}
 …
 Cyclone also provides capabilities for polymorphic functions and existential types~\citep{Grossman06}, similar in concept to \CFA's @forall@ functions and generic types. Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, a tedious and potentially error-prone process. Furthermore, Cyclone's polymorphic functions and types are restricted in that they may only abstract over types with the same layout and calling convention as @void*@, in practice only pointer types and @int@ -- in \CFA terms, all Cyclone polymorphism must be dtype-static. This design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, but is more restrictive than \CFA's more general model.
+\TODO{Talk about GObject, other object-oriented frameworks for C (Objective-C)?}
 Go \citep{Go} and Rust \citep{Rust} are both modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in Go and \emph{traits} in Rust. However, both languages represent dramatic departures from C in terms of language model, and neither has the same level of compatibility with C as \CFA. Go is a garbage-collected language, imposing the associated runtime overhead, and complicating foreign-function calls with the necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. It also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source.
+Apple's Objective-C \citep{obj-c-book} is another industrially successful set of extensions to C. The Objective-C language model is a fairly radical departure from C, adding object-orientation and message-passing. Objective-C implements variadic functions using the C @va_arg@ mechanism, and did not support type-checked generics until recently \citep{xcode7}, historically using less-efficient and more error-prone runtime checking of object types instead. The GObject framework \citep{GObject} also adds object-orientation with runtime type-checking and reference-counting garbage-collection to C; these are much more intrusive feature additions than those provided by \CFA, in addition to the runtime overhead of reference-counting. The Vala programming language \citep{Vala} compiles to GObject-based C, and so adds the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code-bases. Java \citep{Java8} has had generic types and variadic functions since Java~5; Java's generic types are type-checked at compilation and type-erased at runtime, similar to \CFA's, though in Java each object carries its own table of method pointers, while \CFA passes the method pointers separately so as to maintain a C-compatible struct layout. Java variadic functions are simply syntactic sugar for an array of a single type, and therefore less useful than \CFA's heterogeneously-typed variadic functions. Java is also a garbage-collected, object-oriented language, with the associated resource usage and C-interoperability burdens.
+D \citep{D}, Go \citep{Go}, and Rust \citep{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go and \emph{traits} in Rust. However, each language represents a dramatic departure from C in terms of language model, and none has the same level of compatibility with C as \CFA. D and Go are garbage-collected languages, imposing the associated runtime overhead. The necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime complicates the foreign-function interface between Go and C. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. D restricts garbage collection to its own heap by default, while Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. Rust also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source.
 \section{Conclusion \& Future Work}
