Timestamp:
Apr 5, 2017, 9:38:25 AM (5 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, resolv-new, with_gc
Children:
8f5bf6d
Parents:
3195953
Message:

update first 4 pages

File:
1 edited

  • doc/generic_types/generic_types.tex

    r3195953 r4fc45ff  
    2828\newcommand{\CCseventeen}{\rm C\kern-.1em\hbox{+\kern-.25em+}17\xspace} % C++17 symbolic name
    2929\newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name
     30\newcommand{\CS}{C\raisebox{-0.7ex}{\Large$^\sharp$}\xspace}
     31\newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}}
    3032
    3133\newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
     
    124126\maketitle
    125127
    126 \section{Introduction \& Background}
    127 
    128 \CFA\footnote{Pronounced ``C-for-all'', and written \CFA or Cforall.} is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. Four key design goals were set out in the original design of \CFA~\citep{Bilson03}:
    129 \begin{enumerate}
    130 \item The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler.
    131 \item Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler.
    132 \item \CFA code must be at least as portable as standard C code.
    133 \item Extensions introduced by \CFA must be translated in the most efficient way possible.
    134 \end{enumerate}
     128
     129\section{Introduction and Background}
     130
     131The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installed base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
     132TIOBE~\cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \CS 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail. The top 3 rankings over the past 30 years are:
     133\lstDeleteShortInline@
     134\begin{center}
     135\setlength{\tabcolsep}{10pt}
     136\begin{tabular}{@{}r|c|c|c|c|c|c|c@{}}
     137                & 2017  & 2012  & 2007  & 2002  & 1997  & 1992  & 1987          \\
     138\hline
     139Java    & 1             & 1             & 1             & 3             & 13    & -             & -                     \\
     140\hline
     141\Textbf{C}      & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}& \Textbf{1}    \\
     142\hline
     143\CC             & 3             & 3             & 3             & 3             & 2             & 2             & 4                     \\
     144\end{tabular}
     145\end{center}
     146\lstMakeShortInline@
     147Love it or hate it, C is extremely popular, widely used, and one of the few systems languages.
     148In many cases, \CC is used solely as a better C.
     149Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive.
     150
     151\CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. Four key design goals were set out in the original design of \CFA~\citep{Bilson03}:
     152(1) The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler;
     153(2) Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler;
     154(3) \CFA code must be at least as portable as standard C code;
     155(4) Extensions introduced by \CFA must be translated in the most efficient way possible.
    135156These goals ensure existing C code-bases can be converted to \CFA incrementally and with minimal effort, and C programmers can productively generate \CFA code without training beyond the features they wish to employ. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)--(3). Ultimately, a compiler is necessary for advanced features and optimal performance.
    136157
    137 \CFA has been previously extended with polymorphic functions and name overloading (including operator overloading) by \citet{Bilson03}, and deterministically-executed constructors and destructors by \citet{Schluntz17}. This paper builds on those contributions, identifying shortcomings in existing approaches to generic and variadic data types in C-like languages and presenting a design of generic and variadic types as as extension of the \CFA language that avoids those shortcomings. Particularly, the solution we present is both reusable and type-checked, as well as conforming to the design goals of \CFA and ergonomically using existing C abstractions. We have empirically compared our new design to both standard C and \CC; the results show that this design is \TODO{awesome, I hope}.
     158\CFA has been previously extended with polymorphic functions and name overloading (including operator overloading) by \citet{Bilson03}, and deterministically-executed constructors and destructors by \citet{Schluntz17}. This paper builds on those contributions, identifying shortcomings in existing approaches to generic and variadic data types in C-like languages and presenting a design for generic and variadic types avoiding those shortcomings. Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance.
     159
    138160
    139161\subsection{Polymorphic Functions}
    140162\label{sec:poly-fns}
    141163
    142 \CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions; such functions are written using a @forall@ clause (which gives the language its name):
     164\CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions where functions are generalized using a @forall@ clause (giving the language its name):
    143165\begin{lstlisting}
    144166`forall( otype T )` T identity( T val ) { return val; }
    145167int forty_two = identity( 42 );                         $\C{// T is bound to int, forty\_two == 42}$
    146168\end{lstlisting}
    147 The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@ that encode sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''.
    148 
    149 Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C separate compilation.
    150 
    151 Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. For instance, @twice@ can be defined using the \CFA syntax for operator overloading:
    152 \begin{lstlisting}
    153 forall( otype T | { T `?`+`?`(T, T); } )        $\C{// ? denotes operands}$
    154   T twice( T x ) { return x + x; }                      $\C{// (2)}$
     169The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''.
     170
     171Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead is similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C \emph{separate} compilation.
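To make the implicit-parameter scheme concrete, the following is a minimal hand-written C sketch of how @identity@ might be lowered, assuming polymorphic arguments and return values travel through untyped pointers. The names @identity_poly@ and @copy_int@ and the parameter order are hypothetical, not actual \CFA translator output, and only the copy constructor is shown among the lifecycle operations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lowering of: forall( otype T ) T identity( T val ) { return val; }
   T becomes implicit arguments: its size, its alignment, and its lifecycle
   operations. Only the copy constructor is shown; assignment, default
   constructor, and destructor would be passed the same way. */
void identity_poly( size_t sizeof_T, size_t alignof_T,
		void (*copy_T)( void * dst, const void * src ),	/* copy constructor for T */
		void * ret, const void * val ) {		/* return slot and argument, untyped */
	assert( sizeof_T > 0 && alignof_T > 0 );	/* a complete object type has both */
	copy_T( ret, val );	/* return val; copy-initializes the caller's return slot */
}

/* Copy constructor for int, supplied by the caller when T is bound to int. */
static void copy_int( void * dst, const void * src ) {
	memcpy( dst, src, sizeof(int) );
}
```

Because @identity_poly@ never names a concrete type, it can be compiled once in its own translation unit and called for every binding of @T@; in this sketch the per-call cost is the extra arguments plus the indirect call through @copy_T@.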
     172
     173Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to supply further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
     174\begin{lstlisting}
     175forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? denotes operands}$
    155176int val = twice( twice( 3.7 ) );
    156177\end{lstlisting}
    157 which works for any type @T@ with an addition operator defined. The translator accomplishes this polymorphism by creating a wrapper function for calling @+@ with @T@ bound to @double@, then providing this function to the first call of @twice@. It then has the option of using the same @twice@ again and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type in its type analysis. The first approach has a late conversion from integer to floating-point on the final assignment, while the second has an eager conversion to integer. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach.
     178which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type in its type analysis. The first approach has a late conversion from @double@ to @int@ on the final assignment, while the second has an eager conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition.
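The wrapper-passing translation just described can be sketched in C as follows, assuming polymorphic values are passed through untyped pointers. The names @twice_poly@, @add_fn@, and @add_double@ are hypothetical and the otype lifecycle parameters are omitted; the assertion @T ?+?(T, T)@ becomes a function-pointer argument, and @add_double@ plays the role of the wrapper around built-in @double@ addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowering of:
     forall( otype T | { T ?+?( T, T ); } ) T twice( T x ) { return x + x; }
   The assertion becomes a function-pointer parameter; values are passed
   through untyped pointers. otype lifecycle parameters are omitted. */
typedef void (*add_fn)( void * ret, const void * lhs, const void * rhs );

void twice_poly( add_fn add, void * ret, const void * x ) {
	assert( add != NULL );
	add( ret, x, x );	/* return x + x; */
}

/* Wrapper adapting the built-in double + to the untyped calling convention. */
static void add_double( void * ret, const void * lhs, const void * rhs ) {
	*(double *)ret = *(const double *)lhs + *(const double *)rhs;
}
```

Under the first approach, @int val = twice( twice( 3.7 ) );@ amounts to two calls of @twice_poly@ with @add_double@, followed by a single conversion of the @double@ result (about 14.8) to @int@, giving 14.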
    158179
    159180Monomorphic specializations of polymorphic functions can satisfy polymorphic type-assertions.
     
    165186% \end{lstlisting}
    166187\begin{lstlisting}
    167 forall( otype T `| { int ?<?( T, T ); }` )      $\C{// type assertion}$
    168   void qsort( const T * arr, size_t size );
    169 forall( otype T `| { int ?<?( T, T ); }` )      $\C{// type assertion}$
    170   T * bsearch( T key, const T * arr, size_t size );
     188forall( otype T `| { int ?<?( T, T ); }` ) void qsort( T * arr, size_t size );
     189forall( otype T `| { int ?<?( T, T ); }` ) T * bsearch( T key, const T * arr, size_t size );
    171190double vals[10] = { /* 10 floating-point values */ };
    172191qsort( vals, 10 );                                                      $\C{// sort array}$
    173192double * val = bsearch( 5.0, vals, 10 );        $\C{// binary search sorted array for key}$
    174193\end{lstlisting}
    175 @qsort@ and @bsearch@ can only be called with arguments for which there exists a function named @<@ taking two arguments of the same type and returning an @int@ value.
    176 Here, the built-in monomorphic specialization of @<@ for type @double@ is passed as an additional implicit parameter to the calls of @qsort@ and @bsearch@.
     194@qsort@ and @bsearch@ work for any type @T@ with a matching @<@ operator, and the built-in monomorphic specialization of @<@ for type @double@ is passed as an implicit parameter to the calls of @qsort@ and @bsearch@.
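One way to picture the implicit @<@ parameter is a C sketch in which sorting and searching receive the element size and a less-than function as ordinary arguments. The names @sort_poly@, @bsearch_poly@, @lt_fn@, and @lt_double@ are hypothetical; the sort is a simple insertion sort chosen for brevity, not the algorithm of any C or \CFA library. Only the @<@ assertion is used, so two elements are treated as equal when neither is less than the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowering of the assertion { int ?<?( T, T ); }. */
typedef int (*lt_fn)( const void * lhs, const void * rhs );

/* Exchange two elements of n bytes without a temporary buffer. */
static void swap_bytes( unsigned char * a, unsigned char * b, size_t n ) {
	for ( size_t i = 0; i < n; i += 1 ) {
		unsigned char t = a[i]; a[i] = b[i]; b[i] = t;
	}
}

/* Insertion sort using only the < assertion and the element size. */
void sort_poly( size_t sizeof_T, lt_fn lt, void * arr, size_t size ) {
	assert( lt != NULL && sizeof_T > 0 );
	unsigned char * a = arr;
	for ( size_t i = 1; i < size; i += 1 )
		for ( size_t j = i; j > 0 && lt( a + j * sizeof_T, a + (j - 1) * sizeof_T ); j -= 1 )
			swap_bytes( a + j * sizeof_T, a + (j - 1) * sizeof_T, sizeof_T );
}

/* Binary search of a sorted array; returns NULL if key is absent. */
void * bsearch_poly( size_t sizeof_T, lt_fn lt, const void * key, const void * arr, size_t size ) {
	const unsigned char * a = arr;
	size_t lo = 0, hi = size;
	while ( lo < hi ) {
		size_t mid = lo + (hi - lo) / 2;
		const void * elem = a + mid * sizeof_T;
		if ( lt( elem, key ) ) lo = mid + 1;		/* key is to the right */
		else if ( lt( key, elem ) ) hi = mid;		/* key is to the left */
		else return (void *)elem;			/* neither is less: equal */
	}
	return NULL;
}

/* Built-in < for double, adapted to the untyped convention. */
static int lt_double( const void * lhs, const void * rhs ) {
	return *(const double *)lhs < *(const double *)rhs;
}
```

The @double@ calls from the listing then correspond to passing @sizeof(double)@ and @lt_double@ alongside the array, which is the role the implicit parameter plays.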
    177195
    178196Crucial to the design of a new programming language are the libraries to access thousands of external features.
     
    187205double * val = (double *)bsearch( &key, vals, size, sizeof(vals[0]), comp );
    188206\end{lstlisting}
    189 but providing a type-safe \CFA overloaded wrapper.
     207which can be augmented simply with a generalized, type-safe, \CFA-overloaded wrapper:
    190208\begin{lstlisting}
    191209forall( otype T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) {
     
    201219\end{lstlisting}
    202220The nested routine @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result.
    203 As well, an alternate kind of return is made available, position versus pointer to found element.
     221As well, an alternate kind of return is made available: the position of the found element versus a pointer to it.
    204222\CC's type-system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a templated @bsearch@.
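For comparison, standard C has no nested functions (GCC provides them only as an extension), so a C-only version of the bridging that @comp@ performs must fix the element type. The sketch below wraps the C library @bsearch@ for @double@; the names @comp_double@, @bsearch_double@, and @bsearch_double_pos@ are hypothetical, and returning @size@ for a missing key is an assumption of this sketch. Because C has no overloading, the pointer and position versions need distinct names, whereas \CFA can distinguish them by return type.

```c
#include <assert.h>
#include <stdlib.h>

/* Untyped comparator required by C bsearch: negative, zero, or positive. */
static int comp_double( const void * t1, const void * t2 ) {
	double a = *(const double *)t1, b = *(const double *)t2;
	return a < b ? -1 : b < a ? 1 : 0;
}

/* Pointer-returning wrapper: hides the void * interface and the result cast. */
double * bsearch_double( double key, const double * arr, size_t size ) {
	assert( arr != NULL );	/* bsearch requires a valid pointer even when size is 0 */
	return (double *)bsearch( &key, arr, size, sizeof(arr[0]), comp_double );
}

/* Position-returning wrapper: index of key, or size if key is not found. */
size_t bsearch_double_pos( double key, const double * arr, size_t size ) {
	const double * res = bsearch_double( key, arr, size );
	return res != NULL ? (size_t)(res - arr) : size;
}
```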
    205223
     
    211229\end{lstlisting}
    212230Within the block, the nested version of @<@ performs @>@ and this local version overrides the built-in @<@ so it is passed to @qsort@.
    213 Hence, programmers can easily form new local environments to maximize reuse of existing functions and types.
     231Hence, programmers can easily form local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types.
    214232
    215233Finally, \CFA allows variable overloading:
     
    233251Hence, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@.
    234252
     253
    235254\subsection{Traits}
    236255
    237 \CFA provides \emph{traits} to name a group of type assertions:
    238 % \begin{lstlisting}
    239 % trait has_magnitude(otype T) {
    240 %     _Bool ?<?(T, T);                                          $\C{// comparison operator for T}$
    241 %     T -?(T);                                                          $\C{// negation operator for T}$
    242 %     void ?{}(T*, zero_t);                                     $\C{// constructor from 0 literal}$
    243 % };
    244 % forall(otype M | has_magnitude(M))
    245 % M abs( M m ) {
    246 %     M zero = { 0 };                                                   $\C{// uses zero\_t constructor from trait}$
    247 %     return m < zero ? -m : m;
    248 % }
    249 % forall(otype M | has_magnitude(M))
    250 % M max_magnitude( M a, M b ) {
    251 %     return abs(a) < abs(b) ? b : a;
    252 % }
    253 % \end{lstlisting}
     256\CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
    254257\begin{lstlisting}
    255258trait summable( otype T ) {
     
    262265forall( otype T | summable( T ) )
    263266  T sum( T a[$\,$], size_t size ) {
    264         `T` total = { `0` };                                    $\C{// instantiate T from 0}$
     267        `T` total = { `0` };                                    $\C{// instantiate T from 0 by calling its constructor}$
    265268        for ( unsigned int i = 0; i < size; i += 1 )
    266269                total `+=` a[i];                                        $\C{// select appropriate +}$
     
    268271}
    269272\end{lstlisting}
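A trait can be pictured as a bundle of the operations its assertions contribute. The C sketch below, assuming polymorphic values are passed through untyped pointers, hands @summable@ to a lowered @sum@ as a struct of function pointers. The struct @summable_ops@, its members, @sum_poly@, and the @int@ instance are hypothetical names; only the two operations @sum@ uses (construction from @0@ and @+=@) are included because the full trait body is elided above, and whether the translator groups assertions into a struct or passes them separately is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dictionary for trait summable( otype T ), limited to the
   operations used by sum: construction from 0 and compound addition. */
typedef struct {
	size_t size;						/* sizeof(T), from otype */
	void (*ctor_zero)( void * obj );			/* T total = { 0 }; */
	void (*plus_assign)( void * lhs, const void * rhs );	/* lhs += rhs */
} summable_ops;

/* Lowered sum: the result is built directly in the caller's return slot. */
void sum_poly( const summable_ops * ops, void * total, const void * arr, size_t size ) {
	const unsigned char * a = arr;
	assert( ops != NULL );
	ops->ctor_zero( total );
	for ( size_t i = 0; i < size; i += 1 )
		ops->plus_assign( total, a + i * ops->size );	/* total += a[i] */
}

/* Operations for int, as the built-in specializations would supply them. */
static void int_zero( void * obj ) { *(int *)obj = 0; }
static void int_plus_assign( void * lhs, const void * rhs ) { *(int *)lhs += *(const int *)rhs; }
static const summable_ops int_summable = { sizeof(int), int_zero, int_plus_assign };
```

Supporting another type in this sketch only requires another @summable_ops@ value; @sum_poly@ itself is compiled once.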
    270 The trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration.
    271273
    272274In fact, the set of operators is incomplete, \eg no assignment, but @otype@ is syntactic sugar for the following implicit trait:
    273275\begin{lstlisting}
    274 trait otype( dtype T | sized(T) ) {
    275         // sized is a compiler-provided pseudo-trait for types with known size and alignment}
     276trait otype( dtype T | sized(T) ) {  // sized is a pseudo-trait for types with known size and alignment
    276277        void ?{}( T * );                                                $\C{// default constructor}$
    277278        void ?{}( T *, T );                                             $\C{// copy constructor}$
     
    280281};
    281282\end{lstlisting}
    282 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
    283 \begin{lstlisting}
    284 void abs( size_t _sizeof_M, size_t _alignof_M,
    285                 void (*_ctor_M)(void*), void (*_copy_M)(void*, void*),
    286                 void (*_assign_M)(void*, void*), void (*_dtor_M)(void*),
    287                 _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*),
    288         void (*_ctor_M_zero)(void*, int),
    289                 void* m, void* _rtn ) {                         $\C{// polymorphic parameter and return passed as void*}$
    290                                                                                         $\C{// M zero = { 0 };}$
    291         void* zero = alloca(_sizeof_M);                 $\C{// stack allocate zero temporary}$
    292         _ctor_M_zero(zero, 0);                                  $\C{// initialize using zero\_t constructor}$
    293                                                                                         $\C{// return m < zero ? -m : m;}$
    294         void *_tmp = alloca(_sizeof_M);
    295         _copy_M( _rtn,                                                  $\C{// copy-initialize return value}$
    296                 _lt_M( m, zero ) ?                                      $\C{// check condition}$
    297                  (_neg_M(m, _tmp), _tmp) :                      $\C{// negate m}$
    298                  m);
    299         _dtor_M(_tmp); _dtor_M(zero);                   $\C{// destroy temporaries}$
    300 }
    301 \end{lstlisting}
    302 
    303 Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC. Nominal inheritance can be simulated with traits using marker variables or functions:
     283Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted.
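As an illustration of treating a polymorphic variable like a complete type, the C sketch below lowers a hypothetical @forall( otype T ) void swap( T * a, T * b ) { T tmp = *a; *a = *b; *b = tmp; }@, which is not a function from this paper. The local @tmp@ is stack-allocated from the runtime size, copy-initialized, assigned from, and destroyed through the otype operations. A C99 variable-length array of @max_align_t@ is swapped in for the non-standard @alloca@ builtin so the storage is suitably aligned for any fundamental type; the struct @otype_ops@ and all operation names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical otype operations for T, passed as implicit parameters. */
typedef struct {
	size_t size;						/* sizeof(T) */
	void (*copy)( void * dst, const void * src );		/* copy constructor */
	void (*assign)( void * dst, const void * src );		/* assignment */
	void (*dtor)( void * obj );				/* destructor */
} otype_ops;

/* Lowered swap: the local tmp has a size known only at run time. */
void swap_poly( const otype_ops * ops, void * a, void * b ) {
	assert( ops != NULL && ops->size > 0 );
	/* automatic storage for tmp, rounded up to whole max_align_t units */
	max_align_t storage[(ops->size + sizeof(max_align_t) - 1) / sizeof(max_align_t)];
	void * tmp = storage;
	ops->copy( tmp, a );	/* T tmp = *a; */
	ops->assign( a, b );	/* *a = *b; */
	ops->assign( b, tmp );	/* *b = tmp; */
	ops->dtor( tmp );	/* tmp goes out of scope */
}

/* For int, copy construction and assignment coincide and destruction is a no-op. */
static void int_copy( void * dst, const void * src ) { memcpy( dst, src, sizeof(int) ); }
static void int_dtor( void * obj ) { (void)obj; }
static const otype_ops int_ops = { sizeof(int), int_copy, int_copy, int_dtor };
```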
     284% As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
     285% \begin{lstlisting}
     286% void abs( size_t _sizeof_M, size_t _alignof_M,
     287%               void (*_ctor_M)(void*), void (*_copy_M)(void*, void*),
     288%               void (*_assign_M)(void*, void*), void (*_dtor_M)(void*),
     289%               _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*),
     290%       void (*_ctor_M_zero)(void*, int),
     291%               void* m, void* _rtn ) {                         $\C{// polymorphic parameter and return passed as void*}$
     292%                                                                                       $\C{// M zero = { 0 };}$
     293%       void* zero = alloca(_sizeof_M);                 $\C{// stack allocate zero temporary}$
     294%       _ctor_M_zero(zero, 0);                                  $\C{// initialize using zero\_t constructor}$
     295%                                                                                       $\C{// return m < zero ? -m : m;}$
     296%       void *_tmp = alloca(_sizeof_M);
     297%       _copy_M( _rtn,                                                  $\C{// copy-initialize return value}$
     298%               _lt_M( m, zero ) ?                                      $\C{// check condition}$
     299%                (_neg_M(m, _tmp), _tmp) :                      $\C{// negate m}$
     300%                m);
     301%       _dtor_M(_tmp); _dtor_M(zero);                   $\C{// destroy temporaries}$
     302% }
     303% \end{lstlisting}
     304
     305Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC.
     306
     307Nominal inheritance can be simulated with traits using marker variables or functions:
    304308\begin{lstlisting}
    305309trait nominal(otype T) {
    306310    T is_nominal;
    307311};
    308 
    309312int is_nominal;                                                         $\C{// int now satisfies the nominal trait}$
    310313\end{lstlisting}
    311314
    312 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
     315Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
    313316\begin{lstlisting}
    314317trait pointer_like(otype Ptr, otype El) {
    315318    lvalue El *?(Ptr);                                          $\C{// Ptr can be dereferenced into a modifiable value of type El}$
    316319}
    317 
    318320struct list {
    319321    int value;
    320322    list *next;                                                         $\C{// may omit "struct" on type names as in \CC}$
    321323};
    322 
    323324typedef list *list_iterator;
    324325