Changeset 4fc45ff
- Timestamp:
- Apr 5, 2017, 9:38:25 AM (8 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- 8f5bf6d
- Parents:
- 3195953
- Location:
- doc
- Files:
-
- 2 edited
doc/bibliography/cfa.bib
r3195953 r4fc45ff 6549 6549 } 6550 6550 6551 @unpublished{TIOBE, 6552 contributer = {pabuhr@plg}, 6553 author = {TIOBE Index}, 6554 title = {}, 6555 year = {March 2017}, 6556 note = {\url{http://www.tiobe.com/tiobe_index}}, 6557 } 6558 6551 6559 @misc{Bumbulis90, 6552 6560 keywords = {parameter inference, ForceN}, … … 6555 6563 title = {Towards Making Signatures First-Class}, 6556 6564 howpublished= {personal communication}, 6557 month = sep, year = 1990, 6565 month = sep, 6566 year = 1990, 6558 6567 note = {} 6559 6568 } -
doc/generic_types/generic_types.tex
r3195953 r4fc45ff 28 28 \newcommand{\CCseventeen}{\rm C\kern-.1em\hbox{+\kern-.25em+}17\xspace} % C++17 symbolic name 29 29 \newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name 30 \newcommand{\CS}{C\raisebox{-0.7ex}{\Large$^\sharp$}\xspace} 31 \newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}} 30 32 31 33 \newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included … … 124 126 \maketitle 125 127 126 \section{Introduction \& Background} 127 128 \CFA\footnote{Pronounced ``C-for-all'', and written \CFA or Cforall.} is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. Four key design goals were set out in the original design of \CFA~\citep{Bilson03}: 129 \begin{enumerate} 130 \item The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler. 131 \item Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler. 132 \item \CFA code must be at least as portable as standard C code. 133 \item Extensions introduced by \CFA must be translated in the most efficient way possible. 134 \end{enumerate} 128 129 \section{Introduction and Background} 130 131 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. 132 TIOBE~\cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \CS 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail. 
The top 3 rankings over the past 30 years are: 133 \lstDeleteShortInline@ 134 \begin{center} 135 \setlength{\tabcolsep}{10pt} 136 \begin{tabular}{@{}r|c|c|c|c|c|c|c@{}} 137 & 2017 & 2012 & 2007 & 2002 & 1997 & 1992 & 1987 \\ 138 \hline 139 Java & 1 & 1 & 1 & 3 & 13 & - & - \\ 140 \hline 141 \Textbf{C} & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}& \Textbf{1} \\ 142 \hline 143 \CC & 3 & 3 & 3 & 3 & 2 & 2 & 4 \\ 144 \end{tabular} 145 \end{center} 146 \lstMakeShortInline@ 147 Love it or hate it, C is extremely popular, highly used, and one of the few systems languages. 148 In many cases, \CC is used solely as a better C. 149 Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. 150 151 \CFA (pronounced ``C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar programming model for programmers. Four key design goals were set out in the original design of \CFA~\citep{Bilson03}: 152 (1) The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler; 153 (2) Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler; 154 (3) \CFA code must be at least as portable as standard C code; 155 (4) Extensions introduced by \CFA must be translated in the most efficient way possible. 135 156 These goals ensure existing C code-bases can be converted to \CFA incrementally and with minimal effort, and C programmers can productively generate \CFA code without training beyond the features they wish to employ.
In its current implementation, \CFA is compiled by translating it to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance. 136 157 137 \CFA has been previously extended with polymorphic functions and name overloading (including operator overloading) by \citet{Bilson03}, and deterministically-executed constructors and destructors by \citet{Schluntz17}. This paper builds on those contributions, identifying shortcomings in existing approaches to generic and variadic data types in C-like languages and presenting a design of generic and variadic types as as extension of the \CFA language that avoids those shortcomings. Particularly, the solution we present is both reusable and type-checked, as well as conforming to the design goals of \CFA and ergonomically using existing C abstractions. We have empirically compared our new design to both standard C and \CC; the results show that this design is \TODO{awesome, I hope}. 158 \CFA has been previously extended with polymorphic functions and name overloading (including operator overloading) by \citet{Bilson03}, and deterministically-executed constructors and destructors by \citet{Schluntz17}. This paper builds on those contributions, identifying shortcomings in existing approaches to generic and variadic data types in C-like languages and presenting a design for generic and variadic types avoiding those shortcomings. Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance. 
159 138 160 139 161 \subsection{Polymorphic Functions} 140 162 \label{sec:poly-fns} 141 163 142 \CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions; such functions are written using a @forall@ clause (which gives the language its name): 164 \CFA's polymorphism was originally formalized by \citet{Ditchfield92}, and first implemented by \citet{Bilson03}. The signature feature of \CFA is parametric-polymorphic functions where functions are generalized using a @forall@ clause (giving the language its name): 143 165 \begin{lstlisting} 144 166 `forall( otype T )` T identity( T val ) { return val; } 145 167 int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$ 146 168 \end{lstlisting} 147 The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@ that encode sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''. 148 149 Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C separate compilation.
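The implicit-parameter translation described for @identity@ can be pictured in plain C. The following is only a sketch under stated assumptions, not the translator's actual output: the names `otype_ops`, `int_otype`, `copy_int` and `identity_int` are hypothetical, and of the operations the text lists only the copy constructor is carried, since that is all `identity` needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the implicit parameters passed for an otype T.
   The text also lists assignment, default constructor and destructor;
   they are omitted here for brevity. */
typedef struct {
    size_t size;                                /* sizeof(T) */
    size_t align;                               /* alignment of T */
    void (*copy)(void *dst, const void *src);   /* copy constructor */
} otype_ops;

/* Monomorphic copy constructor for int. */
static void copy_int(void *dst, const void *src) {
    *(int *)dst = *(const int *)src;
}

static const otype_ops int_otype = { sizeof(int), _Alignof(int), copy_int };

/* Sketch of: forall( otype T ) T identity( T val ) { return val; }
   The polymorphic argument and the return slot are passed as void *. */
static void identity(const otype_ops *T, const void *val, void *rtn) {
    assert(T->size > 0);    /* an otype is a complete object type */
    T->copy(rtn, val);      /* copy-initialize the return value */
}

/* What a call such as identity( 42 ) might expand to at the call site. */
static int identity_int(int val) {
    int rtn;
    identity(&int_otype, &val, &rtn);
    return rtn;
}
```

Because `identity` is an ordinary C function taking a table of operations, it can be compiled once in its own translation unit, which is the separate-compilation property the text contrasts with \CC templates.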
150 151 Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. For instance, @twice@ can be defined using the \CFA syntax for operator overloading: 152 \begin{lstlisting} 153 forall( otype T | { T `?`+`?`(T, T); } ) $\C{// ? denotes operands}$ 154 T twice( T x ) { return x + x; } $\C{// (2)}$ 169 The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''. 170 171 Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead is similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C \emph{separate} compilation. 172 173 Since bare polymorphic-types provide only a narrow set of available operations, \CFA provides a \emph{type assertion} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading: 174 \begin{lstlisting} 175 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? 
denotes operands}$ 155 176 int val = twice( twice( 3.7 ) ); 156 177 \end{lstlisting} 157 which works for any type @T@ with an addition operator defined. The translator accomplishes this polymorphism by creating a wrapper function for calling @+@ with @T@ bound to @double@, then providing this function to the first call of @twice@. It then has the option of using the same @twice@ again and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type in its type analysis. The first approach has a late conversion from integer to floating-point on the final assignment, while the second has an eager conversion to integer. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach. 178 which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type in its type analysis. The first approach has a late conversion from @int@ to @double@ on the final assignment, while the second has an eager conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition. 179 159 180 Monomorphic specializations of polymorphic functions can satisfy polymorphic type-assertions.
… … 165 186 % \end{lstlisting} 166 187 \begin{lstlisting} 167 forall( otype T `| { int ?<?( T, T ); }` ) $\C{// type assertion}$ 168 void qsort( const T * arr, size_t size ); 169 forall( otype T `| { int ?<?( T, T ); }` ) $\C{// type assertion}$ 170 T * bsearch( T key, const T * arr, size_t size ); 188 forall( otype T `| { int ?<?( T, T ); }` ) void qsort( const T * arr, size_t size ); 189 forall( otype T `| { int ?<?( T, T ); }` ) T * bsearch( T key, const T * arr, size_t size ); 171 190 double vals[10] = { /* 10 floating-point values */ }; 172 191 qsort( vals, 10 ); $\C{// sort array}$ 173 192 double * val = bsearch( 5.0, vals, 10 ); $\C{// binary search sorted array for key}$ 174 193 \end{lstlisting} 175 @qsort@ and @bsearch@ can only be called with arguments for which there exists a function named @<@ taking two arguments of the same type and returning an @int@ value. 176 Here, the built-in monomorphic specialization of @<@ for type @double@ is passed as an additional implicit parameter to the calls of @qsort@ and @bsearch@. 194 @qsort@ and @bsearch@ work for any type @T@ with a matching @<@ operator, and the built-in monomorphic specialization of @<@ for type @double@ is passed as an implicit parameter to the calls of @qsort@ and @bsearch@. 177 195 178 196 Crucial to the design of a new programming language are the libraries to access thousands of external features. … … 187 205 double * val = (double *)bsearch( &key, vals, size, sizeof(vals[0]), comp ); 188 206 \end{lstlisting} 189 but providing a type-safe \CFA overloaded wrapper. 207 which can be augmented simply with a generalized, type-safe, \CFA-overloaded wrapper: 190 208 \begin{lstlisting} 191 209 forall( otype T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) { … … 201 219 \end{lstlisting} 202 220 The nested routine @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result. 
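For readers without a \CFA translator, the adapter idea can be approximated in standard C for one fixed element type. This is a hedged sketch, not the code elided from the diff: `comp_double`, `bsearch_double` and `bsearch_double_pos` are hypothetical names, a file-scope comparator stands in for the nested routine @comp@, and returning `size` when the key is absent is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Adapter from the typed < on double to the untyped (void *) comparator
   that the C library bsearch expects. */
static int comp_double(const void *t1, const void *t2) {
    double a = *(const double *)t1, b = *(const double *)t2;
    return a < b ? -1 : b < a ? 1 : 0;
}

/* Pointer-returning wrapper: hides sizeof, the comparator, and the cast. */
static double *bsearch_double(double key, const double *arr, size_t size) {
    assert(arr != NULL || size == 0);
    return (double *)bsearch(&key, arr, size, sizeof(arr[0]), comp_double);
}

/* Position-returning wrapper; yields size when the key is not found
   (an assumption of this sketch). */
static size_t bsearch_double_pos(double key, const double *arr, size_t size) {
    double *result = bsearch_double(key, arr, size);
    return result ? (size_t)(result - arr) : size;
}
```

Standard C needs two distinct names for the pointer and position variants, whereas the \CFA versions share the name @bsearch@ and are selected by return type.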
203 As well, an alternate kind of return is made available, position versus pointer to found element. 221 As well, an alternate kind of return is made available: position versus pointer to found element. 204 222 \CC's type-system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a templated @bsearch@. 205 223 … … 211 229 \end{lstlisting} 212 230 Within the block, the nested version of @<@ performs @>@ and this local version overrides the built-in @<@ so it is passed to @qsort@. 213 Hence, programmers can easily form new local environments to maximize reuse of existing functions and types. 231 Hence, programmers can easily form local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types. 214 232 215 233 Finally, \CFA allows variable overloading: … … 233 251 Hence, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@. 234 252 253 235 254 \subsection{Traits} 236 255 237 \CFA provides \emph{traits} to name a group of type assertions: 238 % \begin{lstlisting} 239 % trait has_magnitude(otype T) { 240 % _Bool ?<?(T, T); $\C{// comparison operator for T}$ 241 % T -?(T); $\C{// negation operator for T}$ 242 % void ?{}(T*, zero_t); $\C{// constructor from 0 literal}$ 243 % }; 244 % forall(otype M | has_magnitude(M)) 245 % M abs( M m ) { 246 % M zero = { 0 }; $\C{// uses zero\_t constructor from trait}$ 247 % return m < zero ? -m : m; 248 % } 249 % forall(otype M | has_magnitude(M)) 250 % M max_magnitude( M a, M b ) { 251 % return abs(a) < abs(b) ? 
b : a; 252 % } 253 % \end{lstlisting} 256 \CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration: 254 257 \begin{lstlisting} 255 258 trait summable( otype T ) { … … 262 265 forall( otype T | summable( T ) ) 263 266 T sum( T a[$\,$], size_t size ) { 264 `T` total = { `0` }; $\C{// instantiate T from 0}$ 267 `T` total = { `0` }; $\C{// instantiate T from 0 by calling its constructor}$ 265 268 for ( unsigned int i = 0; i < size; i += 1 ) 266 269 total `+=` a[i]; $\C{// select appropriate +}$ … … 268 271 } 269 272 \end{lstlisting} 270 The trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration. 271 272 274 In fact, the set of operators is incomplete, \eg no assignment, but @otype@ is syntactic sugar for the following implicit trait: 273 275 \begin{lstlisting} 274 trait otype( dtype T | sized(T) ) { 275 // sized is a compiler-provided pseudo-trait for types with known size and alignment} 276 trait otype( dtype T | sized(T) ) { // sized is a pseudo-trait for types with known size and alignment 276 277 void ?{}( T * ); $\C{// default constructor}$ 277 278 void ?{}( T *, T ); $\C{// copy constructor}$ … … 280 281 }; 281 282 \end{lstlisting} 282 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted.
As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}: 283 \begin{lstlisting} 284 void abs( size_t _sizeof_M, size_t _alignof_M, 285 void (*_ctor_M)(void*), void (*_copy_M)(void*, void*), 286 void (*_assign_M)(void*, void*), void (*_dtor_M)(void*), 287 _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*), 288 void (*_ctor_M_zero)(void*, int), 289 void* m, void* _rtn ) { $\C{// polymorphic parameter and return passed as void*}$ 290 $\C{// M zero = { 0 };}$ 291 void* zero = alloca(_sizeof_M); $\C{// stack allocate zero temporary}$ 292 _ctor_M_zero(zero, 0); $\C{// initialize using zero\_t constructor}$ 293 $\C{// return m < zero ? -m : m;}$ 294 void *_tmp = alloca(_sizeof_M); 295 _copy_M( _rtn, $\C{// copy-initialize return value}$ 296 _lt_M( m, zero ) ? $\C{// check condition}$ 297 (_neg_M(m, _tmp), _tmp) : $\C{// negate m}$ 298 m); 299 _dtor_M(_tmp); _dtor_M(zero); $\C{// destroy temporaries}$ 300 } 301 \end{lstlisting} 302 303 Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC. Nominal inheritance can be simulated with traits using marker variables or functions: 283 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted. 
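To make the treatment of polymorphic variables concrete, here is a hedged C sketch of how a function like @sum@ might be lowered. It is not the translator's output, and because the body of the @summable@ trait is elided in this diff, the `summable_ops` record and its two members are assumptions chosen to cover only what the loop uses. As a simplification, the running total is built directly in the caller-provided return slot, so no temporary, copy, or destructor call appears.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of operations a summable T must supply. */
typedef struct {
    size_t size;                                      /* sizeof(T) */
    void (*ctor_zero)(void *self);                    /* T total = { 0 } */
    void (*add_assign)(void *self, const void *rhs);  /* total += rhs */
} summable_ops;

static void int_zero(void *self) { *(int *)self = 0; }
static void int_add(void *self, const void *rhs) {
    *(int *)self += *(const int *)rhs;
}
static const summable_ops int_summable = { sizeof(int), int_zero, int_add };

static void double_zero(void *self) { *(double *)self = 0.0; }
static void double_add(void *self, const void *rhs) {
    *(double *)self += *(const double *)rhs;
}
static const summable_ops double_summable =
    { sizeof(double), double_zero, double_add };

/* Sketch of: forall( otype T | summable( T ) ) T sum( T a[], size_t size ) */
static void sum(const summable_ops *T, const void *a, size_t size, void *rtn) {
    assert(T->size > 0);
    const unsigned char *elem = a;          /* step through a[] by sizeof(T) */
    T->ctor_zero(rtn);                      /* instantiate total from 0 */
    for (size_t i = 0; i < size; i += 1)
        T->add_assign(rtn, elem + i * T->size);   /* total += a[i] */
}
```

A call passes the table matching the element type, for example `sum(&int_summable, xs, n, &result)` for an `int` array `xs` of length `n`.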
284 % As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}: 285 % \begin{lstlisting} 286 % void abs( size_t _sizeof_M, size_t _alignof_M, 287 % void (*_ctor_M)(void*), void (*_copy_M)(void*, void*), 288 % void (*_assign_M)(void*, void*), void (*_dtor_M)(void*), 289 % _Bool (*_lt_M)(void*, void*), void (*_neg_M)(void*, void*), 290 % void (*_ctor_M_zero)(void*, int), 291 % void* m, void* _rtn ) { $\C{// polymorphic parameter and return passed as void*}$ 292 % $\C{// M zero = { 0 };}$ 293 % void* zero = alloca(_sizeof_M); $\C{// stack allocate zero temporary}$ 294 % _ctor_M_zero(zero, 0); $\C{// initialize using zero\_t constructor}$ 295 % $\C{// return m < zero ? -m : m;}$ 296 % void *_tmp = alloca(_sizeof_M); 297 % _copy_M( _rtn, $\C{// copy-initialize return value}$ 298 % _lt_M( m, zero ) ? $\C{// check condition}$ 299 % (_neg_M(m, _tmp), _tmp) : $\C{// negate m}$ 300 % m); 301 % _dtor_M(_tmp); _dtor_M(zero); $\C{// destroy temporaries}$ 302 % } 303 % \end{lstlisting} 304 305 Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC. 
306 307 Nominal inheritance can be simulated with traits using marker variables or functions: 304 308 \begin{lstlisting} 305 309 trait nominal(otype T) { 306 310 T is_nominal; 307 311 }; 308 309 312 int is_nominal; $\C{// int now satisfies the nominal trait}$ 310 313 \end{lstlisting} 311 314 312 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems: 315 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems: 313 316 \begin{lstlisting} 314 317 trait pointer_like(otype Ptr, otype El) { 315 318 lvalue El *?(Ptr); $\C{// Ptr can be dereferenced into a modifiable value of type El}$ 316 319 } 317 318 320 struct list { 319 321 int value; 320 322 list *next; $\C{// may omit "struct" on type names as in \CC}$ 321 323 }; 322 323 324 typedef list *list_iterator; 324 325