- Timestamp:
- Mar 2, 2018, 6:50:13 AM (7 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- c9b3a41
- Parents:
- 507e7a2
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/papers/general/Paper.tex
r507e7a2 rab3251e 199 199 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. 200 200 This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. 201 The TIOBE~\cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \Csharp 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail.201 The TIOBE~\cite{TIOBE} ranks the top 5 most \emph{popular} programming languages as: Java 15\%, \Textbf{C 12\%}, \Textbf{\CC 5.5\%}, Python 5\%, \Csharp 4.5\% = 42\%, where the next 50 languages are less than 4\% each with a long tail. 202 202 The top 3 rankings over the past 30 years are: 203 203 \begin{center} … … 205 205 \lstDeleteShortInline@% 206 206 \begin{tabular}{@{}rccccccc@{}} 207 & 201 7 & 2012 & 2007 & 2002 & 1997 & 1992 & 1987\\ \hline208 Java & 1 & 1 & 1 & 1 & 12 & - & -\\209 \Textbf{C} & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}\\210 \CC & 3 & 3 & 3 & 3 & 2 & 2 & 4\\207 & 2018 & 2013 & 2008 & 2003 & 1998 & 1993 & 1988 \\ \hline 208 Java & 1 & 2 & 1 & 1 & 18 & - & - \\ 209 \Textbf{C}& \Textbf{2} & \Textbf{1} & \Textbf{2} & \Textbf{2} & \Textbf{1} & \Textbf{1} & \Textbf{1} \\ 210 \CC & 3 & 4 & 3 & 3 & 2 & 2 & 5 \\ 211 211 \end{tabular} 212 212 \lstMakeShortInline@% … … 227 227 \CFA is currently implemented as a source-to-source translator from \CFA to the gcc-dialect of C~\cite{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by gcc, meeting goals (1)--(3). 228 228 Ultimately, a compiler is necessary for advanced features and optimal performance. 229 230 This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types avoiding those shortcomings. 229 All of the features discussed in this paper are working, unless a feature states it is a future feature for completion. 230 231 232 \section{Polymorphic Functions} 233 234 \CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}. 235 Shortcomings are identified in existing approaches to generic and variadic data types in C-like languages and how these shortcomings are avoided in \CFA. 231 236 Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. 232 The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance. 233 234 \section{Polymorphic Functions} 235 236 \CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}. 237 The new constructs are empirically compared with C and \CC approaches via performance experiments in Section~\ref{sec:eval}. 238 237 239 238 240 \subsection{Name Overloading} 239 241 \label{s:NameOverloading} 242 243 \begin{quote} 244 There are only two hard things in Computer Science: cache invalidation and \emph{naming things} -- Phil Karlton 245 \end{quote} 240 246 C already has a limited form of ad-hoc polymorphism in the form of its basic arithmetic operators, which apply to a variety of different types using identical syntax. 241 247 \CFA extends the built-in operator overloading by allowing users to define overloads for any function, not just operators, and even any variable; … … 245 251 246 252 \begin{cfa} 247 int max(int a, int b) { return a < b ? b : a; } // (1) 248 double max(double a, double b) { return a < b ? b : a; } // (2) 249 250 int max = INT_MAX; // (3) 251 double max = DBL_MAX; // (4) 252 253 max(7, -max); $\C{// uses (1) and (3), by matching int from constant 7}$ 254 max(max, 3.14); $\C{// uses (2) and (4), by matching double from constant 3.14}$ 255 256 //max(max, -max); $\C{// ERROR: ambiguous}$ 257 int m = max(max, -max); $\C{// uses (1) once and (3) twice, by matching return type}$ 258 \end{cfa} 259 260 \Celeven did add @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading. 253 int max = 2147483647; $\C[3.75in]{// (1)}$ 254 double max = 1.7976931348623157E+308; $\C{// (2)}$ 255 int max( int a, int b ) { return a < b ? b : a; } $\C{// (3)}$ 256 double max( double a, double b ) { return a < b ? b : a; } $\C{// (4)}\CRT$ 257 max( 7, -max ); $\C{// uses (3) and (1), by matching int from constant 7}$ 258 max( max, 3.14 ); $\C{// uses (4) and (2), by matching double from constant 3.14}$ 259 max( max, -max ); $\C{// ERROR: ambiguous}$ 260 int m = max( max, -max ); $\C{// uses (3) and (1) twice, by matching return type}$ 261 \end{cfa} 262 \CFA maximizes the ability to reuse names to aggressively address the naming problem. 263 In some cases, hundreds of names can be reduced to tens, resulting in a significant cognitive reduction for a programmer. 264 In the above, the name @max@ has a consistent meaning, and a programmer only needs to remember the single concept: maximum. 265 To prevent significant ambiguities, \CFA uses the return type in selecting overloads, \eg in the assignment to @m@, the compiler use @m@'s type to unambiguously select the most appropriate call to function @max@ (as does Ada). 266 As is shown later, there are a number of situations where \CFA takes advantage of available type information to disambiguate, where other programming languages generate ambiguities. 267 268 \Celeven added @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading. 261 269 The macro wrapping the generic expression imposes some limitations; as an example, it could not implement the example above, because the variables @max@ would collide with the functions @max@. 262 270 Ergonomic limitations of @_Generic@ include the necessity to put a fixed list of supported types in a single place and manually dispatch to appropriate overloads, as well as possible namespace pollution from the functions dispatched to, which must all have distinct names. 263 Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA does implement@_Generic@ for backwards-compatibility purposes. \TODO{actually implement that}271 Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA implements @_Generic@ for backwards-compatibility purposes. \TODO{actually implement that} 264 272 265 273 % http://fanf.livejournal.com/144696.html … … 288 296 For example, the function @twice@ can be defined using the \CFA syntax for operator overloading: 289 297 \begin{cfa} 290 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x +x; } $\C{// ? denotes operands}$298 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x `+` x; } $\C{// ? denotes operands}$ 291 299 int val = twice( twice( 3.7 ) ); 292 300 \end{cfa} … … 343 351 \begin{cfa} 344 352 forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ } 345 { int ?<?( double x, double y ) { return x `>` y; } $\C{// locally override behaviour}$ 353 { 354 int ?<?( double x, double y ) { return x `>` y; } $\C{// locally override behaviour}$ 346 355 qsort( vals, size ); $\C{// descending sort}$ 347 356 } … … 414 423 \section{Generic Types} 415 424 416 One of the known shortcomings of standard C is that it does not providereusable type-safe abstractions for generic data structures and algorithms.425 A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms. 417 426 Broadly speaking, there are three approaches to implement abstract data-structures in C. 418 427 One approach is to write bespoke data-structures for each context in which they are needed. 419 428 While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures. 420 A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@ ; an approach which does allow reuse of code forcommon functionality.421 However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that would not otherwise be needed.429 A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@, which allows reuse of code with common functionality. 430 However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is not otherwise needed. 422 431 A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type-checked, but errors may be difficult to interpret. 423 432 Furthermore, writing and using preprocessor macros can be unnatural and inflexible. … … 434 443 }; 435 444 forall( otype T ) T value( pair( const char *, T ) p ) { return p.second; } 436 forall( dtype F, otype T ) T value _p( pair( F *, T * ) p ) { return *p.second; }445 forall( dtype F, otype T ) T value( pair( F *, T * ) p ) { return *p.second; } 437 446 438 447 pair( const char *, int ) p = { "magic", 42 }; 439 int magic= value( p );448 int i = value( p ); 440 449 pair( void *, int * ) q = { 0, &p.second }; 441 magic = value_p( q );450 i = value( q ); 442 451 double d = 1.0; 443 452 pair( double *, double * ) r = { &d, &d }; 444 d = value _p( r );453 d = value( r ); 445 454 \end{cfa} 446 455 … … 589 598 [ double ] foo$\(_2\)$( int ); 590 599 void bar( int, double, double ); 591 bar( foo( 3 ), foo( 3 ) );600 `bar`( foo( 3 ), foo( 3 ) ); 592 601 \end{cfa} 593 602 The type-resolver only has the tuple return-types to resolve the call to @bar@ as the @foo@ parameters are identical, which involves unifying the possible @foo@ functions with @bar@'s parameter list. … … 835 844 Since @sum@\(_0\) does not accept any arguments, it is not a valid candidate function for the call @sum(10, 20, 30)@. 836 845 In order to call @sum@\(_1\), @10@ is matched with @x@, and the argument resolution moves on to the argument pack @rest@, which consumes the remainder of the argument list and @Params@ is bound to @[20, 30]@. 837 The process continues un itl @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion.846 The process continues until @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion. 838 847 Effectively, this algorithm traces as @sum(10, 20, 30)@ $\rightarrow$ @10 + sum(20, 30)@ $\rightarrow$ @10 + (20 + sum(30))@ $\rightarrow$ @10 + (20 + (30 + sum()))@ $\rightarrow$ @10 + (20 + (30 + 0))@. 839 848 … … 1161 1170 @case@ clauses are made disjoint by the @break@ statement. 1162 1171 While the ability to fall through \emph{is} a useful form of control flow, it does not match well with programmer intuition, resulting in many errors from missing @break@ statements. 1163 For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through: 1164 \begin{cquote} 1172 For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through (see Figure~\ref{f:ChooseSwitchStatements}). 1173 Collectively, these enhancements reduce programmer burden and increase readability and safety. 1174 1175 \begin{figure} 1176 \centering 1165 1177 \lstDeleteShortInline@% 1166 \begin{tabular}{@{}l@{\hspace{ \parindentlnth}}l@{}}1167 \multicolumn{1}{c@{\hspace{ \parindentlnth}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\1178 \begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}} 1179 \multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{\CFA}} & \multicolumn{1}{c}{\textbf{C}} \\ 1168 1180 \begin{cfa} 1169 1181 `choose` ( day ) { … … 1199 1211 \end{tabular} 1200 1212 \lstMakeShortInline@% 1201 \end{cquote} 1202 Collectively, these enhancements reduce programmer burden and increase readability and safety. 1213 \caption{\lstinline|choose| versus \lstinline|switch| Statements} 1214 \label{f:ChooseSwitchStatements} 1215 \end{figure} 1203 1216 1204 1217 \begin{comment} … … 1368 1381 \begin{cquote} 1369 1382 \lstDeleteShortInline@% 1370 \begin{tabular}{@{}l@{\hspace{ \parindentlnth}}l@{}}1371 \multicolumn{1}{c@{\hspace{ \parindentlnth}}}{\textbf{Resumption}} & \multicolumn{1}{c}{\textbf{Recovery}} \\1383 \begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}} 1384 \multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{Resumption}} & \multicolumn{1}{c}{\textbf{Termination}} \\ 1372 1385 \begin{cfa} 1373 1386 `exception R { int fix; };` … … 2067 2080 In many cases, the interface is an inline wrapper providing overloading during compilation but zero cost at runtime. 2068 2081 The following sections give a glimpse of the interface reduction to many C libraries. 2069 In many cases, @signed@/@unsigned@ @char@ and @short@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results.2082 In many cases, @signed@/@unsigned@ @char@, @short@, and @_Complex@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results. 2070 2083 2071 2084 … … 2099 2112 \begin{cfa} 2100 2113 MIN 2114 2101 2115 MAX 2102 M_PI 2103 M_E 2116 2117 PI 2118 E 2104 2119 \end{cfa} 2105 2120 & 2106 2121 \begin{cfa} 2107 2122 SCHAR_MIN, CHAR_MIN, SHRT_MIN, INT_MIN, LONG_MIN, LLONG_MIN, 2123 FLT_MIN, DBL_MIN, LDBL_MIN 2108 2124 SCHAR_MAX, UCHAR_MAX, SHRT_MAX, INT_MAX, LONG_MAX, LLONG_MAX, 2109 M_PI, M_PIl, M_CPI, M_CPIl, 2110 M_E, M_El, M_CE, M_CEl 2125 FLT_MAX, DBL_MAX, LDBL_MAX 2126 M_PI, M_PIl 2127 M_E, M_El 2111 2128 \end{cfa} 2112 2129 \end{tabular} … … 2158 2175 While \Celeven has type-generic math~\cite[\S~7.25]{C11} in @tgmath.h@ to provide a similar mechanism, these macros are limited, matching a routine name with a single set of floating type(s). 2159 2176 For example, it is impossible to overload @atan@ for both one and two arguments; 2160 instead the names @atan@ and @atan2@ are required .2177 instead the names @atan@ and @atan2@ are required (see Section~\ref{s:NameOverloading}). 2161 2178 The key observation is that only a restricted set of type-generic macros are provided for a limited set of routine names, which do not generalize across the type system, as in \CFA. 2162 2179 … … 2446 2463 \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt] 2447 2464 int main( int argc, char * argv[] ) { 2448 FILE * out = fopen( "cfa-out.txt", "w" ); 2449 int maxi = 0, vali = 42; 2450 stack(int) si, ti; 2451 2452 REPEAT_TIMED( "push_int", N, push( &si, vali ); ) 2453 TIMED( "copy_int", ti = si; ) 2454 TIMED( "clear_int", clear( &si ); ) 2455 REPEAT_TIMED( "pop_int", N, 2456 int xi = pop( &ti ); if ( xi > maxi ) { maxi = xi; } ) 2457 REPEAT_TIMED( "print_int", N/2, print( out, vali, ":", vali, "\n" ); ) 2458 2459 pair(_Bool, char) maxp = { (_Bool)0, '\0' }, valp = { (_Bool)1, 'a' }; 2460 stack(pair(_Bool, char)) sp, tp; 2461 2462 REPEAT_TIMED( "push_pair", N, push( &sp, valp ); ) 2463 TIMED( "copy_pair", tp = sp; ) 2464 TIMED( "clear_pair", clear( &sp ); ) 2465 REPEAT_TIMED( "pop_pair", N, 2466 pair(_Bool, char) xp = pop( &tp ); if ( xp > maxp ) { maxp = xp; } ) 2467 REPEAT_TIMED( "print_pair", N/2, print( out, valp, ":", valp, "\n" ); ) 2468 fclose(out); 2465 ofstream out = { "cfa-out.txt" }; 2466 int max = 0, val = 42; 2467 stack( int ) si, t; 2468 2469 REPEAT_TIMED( "push_int", N, push( si, val ); ) 2470 TIMED( "copy_int", t = si; ) 2471 TIMED( "clear_int", clear( si ); ) 2472 REPEAT_TIMED( "pop_int", N, int x = pop( t ); max = max( x, max ); ) 2473 REPEAT_TIMED( "print_int", N/2, out | val | ':' | val | endl; ) 2474 2475 pair( _Bool, char ) max = { (_Bool)0, '\0' }, val = { (_Bool)1, 'a' }; 2476 stack( pair( _Bool, char ) ) s, t; 2477 2478 REPEAT_TIMED( "push_pair", N, push( s, val ); ) 2479 TIMED( "copy_pair", t = s; ) 2480 TIMED( "clear_pair", clear( s ); ) 2481 REPEAT_TIMED( "pop_pair", N, pair(_Bool, char) x = pop( t ); max = max( x, max ); ) 2482 REPEAT_TIMED( "print_pair", N/2, out | val | ':' | val | endl; ) 2469 2483 } 2470 2484 \end{cfa} … … 2640 2654 \CFA 2641 2655 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2656 forall(otype T) struct stack_node; 2657 forall(otype T) struct stack { 2658 stack_node(T) * head; 2659 }; 2642 2660 forall(otype T) struct stack_node { 2643 2661 T value; 2644 2662 stack_node(T) * next; 2645 2663 }; 2646 forall(otype T) void ?{}( stack(T) * s) { (&s->head){ 0 }; }2647 forall(otype T) void ?{}( stack(T) * s, stack(T) t) {2648 stack_node(T) ** crnt = &s ->head;2664 forall(otype T) void ?{}( stack(T) & s ) { (s.head){ 0 }; } 2665 forall(otype T) void ?{}( stack(T) & s, stack(T) t ) { 2666 stack_node(T) ** crnt = &s.head; 2649 2667 for ( stack_node(T) * next = t.head; next; next = next->next ) { 2650 *crnt = ((stack_node(T) *)malloc()){ next->value }; /***/2668 *crnt = malloc(){ next->value }; 2651 2669 stack_node(T) * acrnt = *crnt; 2652 2670 crnt = &acrnt->next; … … 2654 2672 *crnt = 0; 2655 2673 } 2656 forall(otype T) stack(T) ?=?( stack(T) * s, stack(T) t) {2657 if ( s ->head == t.head ) return *s;2658 clear( s);2674 forall(otype T) stack(T) ?=?( stack(T) & s, stack(T) t ) { 2675 if ( s.head == t.head ) return s; 2676 clear( s ); 2659 2677 s{ t }; 2660 return *s;2661 } 2662 forall(otype T) void ^?{}( stack(T) * s) { clear(s); }2663 forall(otype T) _Bool empty( const stack(T) * s) { return s->head == 0; }2664 forall(otype T) void push( stack(T) * s, T value) {2665 s ->head = ((stack_node(T) *)malloc()){ value, s->head }; /***/2666 } 2667 forall(otype T) T pop( stack(T) * s) {2668 stack_node(T) * n = s ->head;2669 s ->head = n->next;2678 return s; 2679 } 2680 forall(otype T) void ^?{}( stack(T) & s) { clear( s ); } 2681 forall(otype T) _Bool empty( const stack(T) & s ) { return s.head == 0; } 2682 forall(otype T) void push( stack(T) & s, T value ) { 2683 s.head = malloc(){ value, s.head }; 2684 } 2685 forall(otype T) T pop( stack(T) & s ) { 2686 stack_node(T) * n = s.head; 2687 s.head = n->next; 2670 2688 T x = n->value; 2671 2689 ^n{}; 2672 free( n);2690 free( n ); 2673 2691 return x; 2674 2692 } 2675 forall(otype T) void clear( stack(T) * s) {2676 for ( stack_node(T) * next = s ->head; next; ) {2693 forall(otype T) void clear( stack(T) & s ) { 2694 for ( stack_node(T) * next = s.head; next; ) { 2677 2695 stack_node(T) * crnt = next; 2678 2696 next = crnt->next; 2679 delete( crnt);2697 delete( crnt ); 2680 2698 } 2681 s ->head = 0;2699 s.head = 0; 2682 2700 } 2683 2701 \end{cfa} … … 2828 2846 2829 2847 \begin{comment} 2830 2831 2848 \subsubsection{bench.h} 2832 2849 (\texttt{bench.hpp} is similar.)
Note: See TracChangeset
for help on using the changeset viewer.