Changeset 3db60cb for doc/generic_types
- Timestamp:
- Apr 6, 2017, 3:39:39 PM (8 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 3a48e283, b0fedd4
- Parents:
- 03bb816 (diff), 115a868c (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - Location:
- doc/generic_types
- Files:
-
- 11 added
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
TabularUnified doc/generic_types/acmart.cls ¶
r03bb816 r3db60cb 354 354 \let\@footnotemark@nolink\@footnotemark 355 355 \let\@footnotetext@nolink\@footnotetext 356 \RequirePackage[bookmarksnumbered ]{hyperref}356 \RequirePackage[bookmarksnumbered,breaklinks]{hyperref} 357 357 \urlstyle{rm} 358 358 \ifcase\ACM@format@nr -
TabularUnified doc/generic_types/generic_types.tex ¶
r03bb816 r3db60cb 28 28 \newcommand{\CCseventeen}{\rm C\kern-.1em\hbox{+\kern-.25em+}17\xspace} % C++17 symbolic name 29 29 \newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name 30 31 \newcommand{\TODO}[1]{\textbf{TODO}: #1} % TODO included 30 \newcommand{\CS}{C\raisebox{-0.7ex}{\Large$^\sharp$}\xspace} 31 \newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}} 32 33 \newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included 32 34 %\newcommand{\TODO}[1]{} % TODO elided 33 35 \newcommand{\eg}{\textit{e}.\textit{g}.,\xspace} … … 49 51 stringstyle=\tt, % use typewriter font 50 52 tabsize=4, % 4 space tabbing 51 xleftmargin=\parindent ,% indent code to paragraph indentation53 xleftmargin=\parindentlnth, % indent code to paragraph indentation 52 54 %mathescape=true, % LaTeX math escape in CFA code $...$ 53 55 escapechar=\$, % LaTeX escape in CFA code … … 90 92 } 91 93 92 \terms{generic, tuple, types}93 \keywords{generic types, tuple types, polymorphic functions, C, Cforall}94 \terms{generic, tuple, variadic, types} 95 \keywords{generic types, tuple types, variadic types, polymorphic functions, C, Cforall} 94 96 95 97 \begin{CCSXML} … … 118 120 119 121 \begin{abstract} 120 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA project is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers. Prior projects have attempted similar goals but failed to honour C programming-style; for instance, adding object-oriented or functional programming with garbage collection is a non-starter for many C developers. Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and engineers. This paper describes only two \CFA extensions, generic and tuple types, and how they are implemented in accordance with these principles.122 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA project is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers. Prior projects have attempted similar goals but failed to honour C programming-style; for instance, adding object-oriented or functional programming with garbage collection is a non-starter for many C developers. Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and engineers. This paper describes two \CFA extensions, generic and tuple types, details how their design avoids shortcomings of similar features in C and other C-like languages, and presents experimental results validating the design. 121 123 \end{abstract} 122 124 … … 124 126 \maketitle 125 127 126 \section{Introduction \& Background} 127 128 \CFA\footnote{Pronounced ``C-for-all'', and written \CFA or Cforall.} is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. Four key design goals were set out in the original design of \CFA~\citep{Bilson03}: 129 \begin{enumerate} 130 \item The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler. 131 \item Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler. 132 \item \CFA code must be at least as portable as standard C code. 133 \item Extensions introduced by \CFA must be translated in the most efficient way possible. 134 \end{enumerate} 135 These goals ensure existing C code-bases can be converted to \CFA incrementally and with minimal effort, and C programmers can productively generate \CFA code without training beyond the features they wish to employ. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance. 136 137 \CFA has been previously extended with polymorphic functions and name overloading (including operator overloading) by \citet{Bilson03}, and deterministically-executed constructors and destructors by \citet{Schluntz17}. This paper describes how generic and tuple types are designed and implemented in \CFA in accordance with both the backward compatibility goals and existing features described above. 128 129 \section{Introduction and Background} 130 131 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. 132 The \citet{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \CS 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail. The top 3 rankings over the past 30 years are: 133 \lstDeleteShortInline@ 134 \begin{center} 135 \setlength{\tabcolsep}{10pt} 136 \begin{tabular}{@{}r|c|c|c|c|c|c|c@{}} 137 & 2017 & 2012 & 2007 & 2002 & 1997 & 1992 & 1987 \\ 138 \hline 139 Java & 1 & 1 & 1 & 3 & 13 & - & - \\ 140 \hline 141 \Textbf{C} & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}& \Textbf{1} \\ 142 \hline 143 \CC & 3 & 3 & 3 & 3 & 2 & 2 & 4 \\ 144 \end{tabular} 145 \end{center} 146 \lstMakeShortInline@ 147 Love it or hate it, C is extremely popular, highly used, and one of the few system's languages. 148 In many cases, \CC is often used solely as a better C. 149 Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. 150 151 \CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. The four key design goals for \CFA~\citep{Bilson03} are: 152 (1) The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler; 153 (2) Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler; 154 (3) \CFA code must be at least as portable as standard C code; 155 (4) Extensions introduced by \CFA must be translated in the most efficient way possible. 156 These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used. In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance. 157 158 This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types avoiding those shortcomings. Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance. 138 159 139 160 … … 141 162 \label{sec:poly-fns} 142 163 143 \CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions ; such functions are written using a @forall@ clause (which givesthe language its name):164 \CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions where functions are generalized using a @forall@ clause (giving the language its name): 144 165 \begin{lstlisting} 145 166 `forall( otype T )` T identity( T val ) { return val; } 146 167 int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$ 147 168 \end{lstlisting} 148 The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@ that encode sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''. 149 150 Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C separate compilation. 151 152 Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. For instance, @twice@ can be defined using the \CFA syntax for operator overloading: 153 \begin{lstlisting} 154 forall( otype T | { T `?`+`?`(T, T); } ) $\C{// ? denotes operands}$ 155 T twice( T x ) { return x + x; } $\C{// (2)}$ 169 The @identity@ function above can be applied to any complete \emph{object type} (or @otype@). The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \emph{data type} (or @dtype@). 170 171 Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead is similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate} compilation, preventing code bloat. 172 173 Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading: 174 \begin{lstlisting} 175 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? denotes operands}$ 156 176 int val = twice( twice( 3.7 ) ); 157 177 \end{lstlisting} 158 which works for any type @T@ with an addition operator defined. The translator accomplishes this polymorphism by creating a wrapper function for calling @+@ with @T@ bound to @double@, then providing this function to the first call of @twice@. It then has the option of using the same @twice@ again and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type in its type analysis. The first approach has a late conversion from integer to floating-point on the final assignment, while the second has an eager conversion to integer. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach. 159 160 Monomorphic specializations of polymorphic functions can satisfy polymorphic type-assertions. 161 % \begin{lstlisting} 162 % forall(otype T `| { T twice(T); }`) $\C{// type assertion}$ 163 % T four_times(T x) { return twice( twice(x) ); } 164 % double twice(double d) { return d * 2.0; } $\C{// (1)}$ 165 % double magic = four_times(10.5); $\C{// T bound to double, uses (1) to satisfy type assertion}$ 166 % \end{lstlisting} 167 \begin{lstlisting} 168 forall( otype T `| { int ?<?( T, T ); }` ) $\C{// type assertion}$ 169 void qsort( const T * arr, size_t size ); 170 forall( otype T `| { int ?<?( T, T ); }` ) $\C{// type assertion}$ 171 T * bsearch( T key, const T * arr, size_t size ); 172 double vals[10] = { /* 10 floating-point values */ }; 173 qsort( vals, 10 ); $\C{// sort array}$ 174 double * val = bsearch( 5.0, vals, 10 ); $\C{// binary search sorted array for key}$ 175 \end{lstlisting} 176 @qsort@ and @bsearch@ can only be called with arguments for which there exists a function named @<@ taking two arguments of the same type and returning an @int@ value. 177 Here, the built-in monomorphic specialization of @<@ for type @double@ is passed as an additional implicit parameter to the calls of @qsort@ and @bsearch@. 178 179 Crucial to the design of a new programming language are the libraries to access thousands of external features. 180 \CFA inherits a massive compatible library-base, where other programming languages have to rewrite or provide fragile inter-language communication with C. 181 A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@, shown here searching a floating-point array: 178 which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type~\cite{Ada} in its type analysis. The first approach has a late conversion from @int@ to @double@ on the final assignment, while the second has an eager conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition. 179 180 Crucial to the design of a new programming language are the libraries to access thousands of external software features. 181 \CFA inherits a massive compatible library-base, where other programming languages must rewrite or provide fragile inter-language communication with C. 182 A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted floating-point array: 182 183 \begin{lstlisting} 183 184 void * bsearch( const void * key, const void * base, size_t nmemb, size_t size, … … 185 186 int comp( const void * t1, const void * t2 ) { return *(double *)t1 < *(double *)t2 ? -1 : 186 187 *(double *)t2 < *(double *)t1 ? 1 : 0; } 188 double vals[10] = { /* 10 floating-point values */ }; 187 189 double key = 5.0; 188 double * val = (double *)bsearch( &key, vals, size, sizeof(vals[0]), comp );189 \end{lstlisting} 190 but providing a type-safe \CFA overloaded wrapper. 190 double * val = (double *)bsearch( &key, vals, 10, sizeof(vals[0]), comp ); $\C{// search sorted array}$ 191 \end{lstlisting} 192 which can be augmented simply with a generalized, type-safe, \CFA-overloaded wrappers: 191 193 \begin{lstlisting} 192 194 forall( otype T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) { 193 195 int comp( const void * t1, const void * t2 ) { /* as above with double changed to T */ } 194 return (T *)bsearch( &key, arr, size, sizeof(T), comp ); 195 } 196 return (T *)bsearch( &key, arr, size, sizeof(T), comp ); } 196 197 forall( otype T | { int ?<?( T, T ); } ) unsigned int bsearch( T key, const T * arr, size_t size ) { 197 198 T *result = bsearch( key, arr, size ); $\C{// call first version}$ 198 return result ? result - arr : size; $\C{// pointer subtraction includes sizeof(T)}$ 199 } 199 return result ? result - arr : size; } $\C{// pointer subtraction includes sizeof(T)}$ 200 200 double * val = bsearch( 5.0, vals, 10 ); $\C{// selection based on return type}$ 201 201 int posn = bsearch( 5.0, vals, 10 ); 202 202 \end{lstlisting} 203 203 The nested routine @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result. 204 As well, an alternate kind of return is made available ,position versus pointer to found element.204 As well, an alternate kind of return is made available: position versus pointer to found element. 205 205 \CC's type-system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a templated @bsearch@. 206 206 207 Call-site inferencing and nested functions provide a localized form of inheritance. For example, @qsort@ only sorts in ascending order using @<@. However, it is trivial to locally change this behaviour: 208 \begin{lstlisting} 209 { int ?<?( double x, double y ) { return x `>` y; } $\C{// override behaviour}$ 207 \CFA has replacement libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions, all without rewriting the actual computations. 208 For example, it is possible to write a type-safe \CFA wrapper @malloc@ based on the C @malloc@: 209 \begin{lstlisting} 210 forall( dtype T | sized(T) ) T * malloc( void ) { return (T *)(void *)malloc( (size_t)sizeof(T) ); } 211 int * ip = malloc(); $\C{// select type and size from left-hand side}$ 212 double * dp = malloc(); 213 struct S {...} * sp = malloc(); 214 \end{lstlisting} 215 where the return type supplies the type/size of the allocation, which is impossible in most type systems. 216 217 Call-site inferencing and nested functions provide a localized form of inheritance. For example, the \CFA @qsort@ only sorts in ascending order using @<@. However, it is trivial to locally change this behaviour: 218 \begin{lstlisting} 219 forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ } 220 { int ?<?( double x, double y ) { return x `>` y; } $\C{// locally override behaviour}$ 210 221 qsort( vals, size ); $\C{// descending sort}$ 211 222 } 212 223 \end{lstlisting} 213 224 Within the block, the nested version of @<@ performs @>@ and this local version overrides the built-in @<@ so it is passed to @qsort@. 214 Hence, programmers can easily form new local environments to maximize reuse ofexisting functions and types.215 216 Finally, variables may be overloaded:225 Hence, programmers can easily form a local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types. 226 227 Finally, \CFA allows variable overloading: 217 228 \lstDeleteShortInline@ 218 229 \par\smallskip … … 233 244 \smallskip\par\noindent 234 245 Hence, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@. 246 As well, restricted constant overloading is allowed for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context. 247 In addition, several operations are defined in terms values @0@ and @1@. 248 For example, 249 \begin{lstlisting} 250 int x; 251 if (x) // if (x != 0) 252 x++; // x += 1; 253 \end{lstlisting} 254 Every if statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result. 255 Due to these rewrite rules, the values @0@ and @1@ have the types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts. 256 The types @zero_t@ and @one_t@ have special built in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal. 257 235 258 236 259 \subsection{Traits} 237 260 238 \CFA provides \emph{traits} to name a group of type assertions: 239 % \begin{lstlisting} 240 % trait has_magnitude(otype T) { 241 % _Bool ?<?(T, T); $\C{// comparison operator for T}$ 242 % T -?(T); $\C{// negation operator for T}$ 243 % void ?{}(T*, zero_t); $\C{// constructor from 0 literal}$ 244 % }; 245 % forall(otype M | has_magnitude(M)) 246 % M abs( M m ) { 247 % M zero = { 0 }; $\C{// uses zero\_t constructor from trait}$ 248 % return m < zero ? -m : m; 249 % } 250 % forall(otype M | has_magnitude(M)) 251 % M max_magnitude( M a, M b ) { 252 % return abs(a) < abs(b) ? b : a; 253 % } 254 % \end{lstlisting} 261 \CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration: 255 262 \begin{lstlisting} 256 263 trait summable( otype T ) { 257 void ?{}( T*, zero_t);$\C{// constructor from 0 literal}$264 void ?{}( T *, zero_t ); $\C{// constructor from 0 literal}$ 258 265 T ?+?( T, T ); $\C{// assortment of additions}$ 259 266 T ?+=?( T *, T ); 260 267 T ++?( T * ); 261 T ?++( T * ); 262 }; 263 forall( otype T | summable( T ) ) 264 T sum( T a[$\,$], size_t size ) { 265 T total = { 0 }; $\C{// instantiate T from 0}$ 268 T ?++( T * ); }; 269 forall( otype T `| summable( T )` ) T sum( T a[$\,$], size_t size ) { // use trait 270 `T` total = { `0` }; $\C{// instantiate T from 0 by calling its constructor}$ 266 271 for ( unsigned int i = 0; i < size; i += 1 ) 267 total += a[i]; $\C{// select appropriate +}$ 268 return total; 269 } 270 \end{lstlisting} 271 The trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration. 272 total `+=` a[i]; $\C{// select appropriate +}$ 273 return total; } 274 \end{lstlisting} 272 275 273 276 In fact, the set of operators is incomplete, \eg no assignment, but @otype@ is syntactic sugar for the following implicit trait: 274 277 \begin{lstlisting} 275 trait otype( dtype T | sized(T) ) { 276 // sized is a compiler-provided pseudo-trait for types with known size and alignment} 278 trait otype( dtype T | sized(T) ) { // sized is a pseudo-trait for types with known size and alignment 277 279 void ?{}( T * ); $\C{// default constructor}$ 278 280 void ?{}( T *, T ); $\C{// copy constructor}$ 279 281 void ?=?( T *, T ); $\C{// assignment operator}$ 280 void ^?{}( T * ); $\C{// destructor}$ 281 }; 282 \end{lstlisting} 283 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}: 284 \begin{lstlisting} 285 void abs( size_t _sizeof_M, size_t _alignof_M, 286 void (*_ctor_M)(void*), void (*_copy_M)(void*, void*), 287 void (*_assign_M)(void*, void*), void (*_dtor_M)(void*), 288 _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*), 289 void (*_ctor_M_zero)(void*, int), 290 void* m, void* _rtn ) { $\C{// polymorphic parameter and return passed as void*}$ 291 $\C{// M zero = { 0 };}$ 292 void* zero = alloca(_sizeof_M); $\C{// stack allocate zero temporary}$ 293 _ctor_M_zero(zero, 0); $\C{// initialize using zero\_t constructor}$ 294 $\C{// return m < zero ? -m : m;}$ 295 void *_tmp = alloca(_sizeof_M); 296 _copy_M( _rtn, $\C{// copy-initialize return value}$ 297 _lt_M( m, zero ) ? $\C{// check condition}$ 298 (_neg_M(m, _tmp), _tmp) : $\C{// negate m}$ 299 m); 300 _dtor_M(_tmp); _dtor_M(zero); $\C{// destroy temporaries}$ 301 } 302 \end{lstlisting} 303 304 Semantically, traits are simply a named lists of type assertions, but they may be used for many of the same purposes that interfaces in Java or abstract base classes in \CC are used for. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy; this can be considered a form of structural inheritance, similar to implementation of an interface in Go, as opposed to the nominal inheritance model of Java and \CC. Nominal inheritance can be simulated with traits using marker variables or functions: 282 void ^?{}( T * ); }; $\C{// destructor}$ 283 \end{lstlisting} 284 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted. 285 % As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}: 286 % \begin{lstlisting} 287 % void abs( size_t _sizeof_M, size_t _alignof_M, 288 % void (*_ctor_M)(void*), void (*_copy_M)(void*, void*), 289 % void (*_assign_M)(void*, void*), void (*_dtor_M)(void*), 290 % _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*), 291 % void (*_ctor_M_zero)(void*, int), 292 % void* m, void* _rtn ) { $\C{// polymorphic parameter and return passed as void*}$ 293 % $\C{// M zero = { 0 };}$ 294 % void* zero = alloca(_sizeof_M); $\C{// stack allocate zero temporary}$ 295 % _ctor_M_zero(zero, 0); $\C{// initialize using zero\_t constructor}$ 296 % $\C{// return m < zero ? -m : m;}$ 297 % void *_tmp = alloca(_sizeof_M); 298 % _copy_M( _rtn, $\C{// copy-initialize return value}$ 299 % _lt_M( m, zero ) ? $\C{// check condition}$ 300 % (_neg_M(m, _tmp), _tmp) : $\C{// negate m}$ 301 % m); 302 % _dtor_M(_tmp); _dtor_M(zero); $\C{// destroy temporaries}$ 303 % } 304 % \end{lstlisting} 305 306 Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC. 307 308 Nominal inheritance can be simulated with traits using marker variables or functions: 305 309 \begin{lstlisting} 306 310 trait nominal(otype T) { 307 311 T is_nominal; 308 312 }; 309 310 313 int is_nominal; $\C{// int now satisfies the nominal trait}$ 311 314 \end{lstlisting} 312 315 313 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship amongmultiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:316 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems: 314 317 \begin{lstlisting} 315 318 trait pointer_like(otype Ptr, otype El) { 316 319 lvalue El *?(Ptr); $\C{// Ptr can be dereferenced into a modifiable value of type El}$ 317 320 } 318 319 321 struct list { 320 322 int value; 321 list *next; $\C{// may omit "struct" on type names }$323 list *next; $\C{// may omit "struct" on type names as in \CC}$ 322 324 }; 323 324 325 typedef list *list_iterator; 325 326 … … 334 335 One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to create data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures. A second approach is to use @void*@-based polymorphism. This approach is taken by the C standard library functions @qsort@ and @bsearch@, and does allow the use of common code for common functionality. However, basing all polymorphism on @void*@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requires a number of extra function parameters, and also adds pointer indirection and dynamic allocation to algorithms and data structures that would not otherwise require them. A third approach to generic code is to use pre-processor macros to generate it -- this approach does allow the generated code to be both generic and type-checked, though any errors produced may be difficult to interpret. Furthermore, writing and invoking C code as preprocessor macros is unnatural and somewhat inflexible. 335 336 336 Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. The authors have chosen to implement generic types as well, with some care taken that the generic types design for \CFA integrates efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there is not extra overhead for the use of a generic type.337 Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. \CFA implements generic types with some care taken that the generic types design for \CFA integrates efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there is no extra overhead for the use of a generic type, as for \CC templates. 337 338 338 339 A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name: … … 359 360 \end{lstlisting} 360 361 361 \CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Dynamic generic types vary in their in-memory layout depending on their type parameters, while concrete generic types have a fixed memory layout regardless oftype parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation.362 363 \CFA generic types may also specify constraints on their argument type to be checked by the compiler. For example, consider the following declaration of a sorted set type, which ensures that the set key supports comparison and tests for equality:362 \CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Concrete generic types have a fixed memory layout regardless of type parameters, while dynamic generic types vary in their in-memory layout depending on their type parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation. 363 364 \CFA generic types may also specify constraints on their argument type to be checked by the compiler. For example, consider the following declaration of a sorted set-type, which ensures that the set key supports equality and relational comparison: 364 365 \begin{lstlisting} 365 366 forall(otype Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); }) 366 struct sorted_set;367 struct sorted_set; 367 368 \end{lstlisting} 368 369 … … 384 385 }; 385 386 \end{lstlisting} 387 386 388 387 389 \subsection{Dynamic Generic Types} … … 860 862 Cyclone also provides capabilities for polymorphic functions and existential types~\citep{Grossman06}, similar in concept to \CFA's @forall@ functions and generic types. Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, a tedious and potentially error-prone process. Furthermore, Cyclone's polymorphic functions and types are restricted in that they may only abstract over types with the same layout and calling convention as @void*@, in practice only pointer types and @int@ - in \CFA terms, all Cyclone polymorphism must be dtype-static. This design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, but is more restrictive than \CFA's more general model. 861 863 862 Go \citep{Go} and Rust \citep{Rust} are both modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in Go and \emph{traits} in Rust. However, both languages represent dramatic departures from C in terms of language model, and neither has the same level of compatibility with C as \CFA. Go is a garbage-collected language, imposing the associated runtime overhead, and complicating foreign-function calls with the necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. It also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source. 864 Apple's Objective-C \citep{obj-c-book} is another industrially successful set of extensions to C. The Objective-C language model is a fairly radical departure from C, adding object-orientation and message-passing. Objective-C implements variadic functions using the C @va_arg@ mechanism, and did not support type-checked generics until recently \citep{xcode7}, historically using less-efficient and more error-prone runtime checking of object types instead. The GObject framework \citep{GObject} also adds object-orientation with runtime type-checking and reference-counting garbage-collection to C; these are much more intrusive feature additions than those provided by \CFA, in addition to the runtime overhead of reference-counting. The Vala programming language \citep{Vala} compiles to GObject-based C, and so adds the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code-bases. Java \citep{Java8} has had generic types and variadic functions since Java~5; Java's generic types are type-checked at compilation and type-erased at runtime, similar to \CFA's, though in Java each object carries its own table of method pointers, while \CFA passes the method pointers separately so as to maintain a C-compatible struct layout. Java variadic functions are simply syntactic sugar for an array of a single type, and therefore less useful than \CFA's heterogeneously-typed variadic functions. Java is also a garbage-collected, object-oriented language, with the associated resource usage and C-interoperability burdens. 865 866 D \citep{D}, Go \citep{Go}, and Rust \citep{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go and \emph{traits} in Rust. However, each language represents dramatic departures from C in terms of language model, and none has the same level of compatibility with C as \CFA. D and Go are garbage-collected languages, imposing the associated runtime overhead. The necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime complicates foreign-function interface between Go and C. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. D restricts garbage collection to its own heap by default, while Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. Rust also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source. 863 867 864 868 \section{Conclusion \& Future Work} 865 869 866 In conclusion, the authors' design for generic types and tuples imposes minimal runtime overhead while still supporting a full range of C features, including separately-compiled modules. There is ongoing work on a wide range of \CFA feature extensions, including reference types, exceptions, and concurrent programming primitives. In addition to this work, there are some interesting future directions the polymorphism design could take. Notably, \CC template functions trade compile time and code bloat for optimal runtime of individual instantiations of polymorphic functions. \CFA polymorphic functions, by contrast, use an approach that is essentially dynamic virtual dispatch. The runtime overhead of this approach is low, but not as low as \CC template functions, and it may be beneficial to provide a mechanism for particularly performance-sensitive code to close this gap. Further research is needed, but two promising approaches are to allow an annotation on polymorphic function call sites that tells the translator to create a template-specialization of the function (provided the code is visible in the current translation unit) or placing an annotation on polymorphic function definitions that instantiates a version of the polymorphic function specialized to some set of types. These approaches are not mutually exclusive, and would allow these performance optimizations to be applied only where most useful to increase performance, without suffering the code bloat or loss of generality of a template expansion approach where it is unnecessary. 870 There is ongoing work on a wide range of \CFA feature extensions, including reference types, exceptions, and concurrent programming primitives. In addition to this work, there are some interesting future directions the polymorphism design could take. Notably, \CC template functions trade compile time and code bloat for optimal runtime of individual instantiations of polymorphic functions. \CFA polymorphic functions, by contrast, use an approach that is essentially dynamic virtual dispatch. The runtime overhead of this approach is low, but not as low as \CC template functions, and it may be beneficial to provide a mechanism for particularly performance-sensitive code to close this gap. Further research is needed, but two promising approaches are to allow an annotation on polymorphic function call sites that tells the translator to create a template-specialization of the function (provided the code is visible in the current translation unit) or placing an annotation on polymorphic function definitions that instantiates a version of the polymorphic function specialized to some set of types. These approaches are not mutually exclusive, and would allow these performance optimizations to be applied only where most useful to increase performance, without suffering the code bloat or loss of generality of a template expansion approach where it is unnecessary. 871 872 In conclusion, the authors' design for generic types and tuples, unlike those available in existing work, is both reusable and type-checked, while still supporting a full range of C features, including separately-compiled modules. We have experimentally validated the performance of our design against both \CC and standard C, showing it is \TODO{shiny, cap'n}. 867 873 868 874 \begin{acks}
Note: See TracChangeset
for help on using the changeset viewer.