Changeset ab3251e for doc/papers/general


Ignore:
Timestamp:
Mar 2, 2018, 6:50:13 AM (7 years ago)
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
c9b3a41
Parents:
507e7a2
Message:

updates, change CFA evaluation code

File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/papers/general/Paper.tex

    r507e7a2 rab3251e  
    199199The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects.
    200200This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
    201 The TIOBE~\cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \Csharp 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail.
     201The TIOBE~\cite{TIOBE} ranks the top 5 most \emph{popular} programming languages as: Java 15\%, \Textbf{C 12\%}, \Textbf{\CC 5.5\%}, Python 5\%, \Csharp 4.5\% = 42\%, where the next 50 languages are less than 4\% each with a long tail.
    202202The top 3 rankings over the past 30 years are:
    203203\begin{center}
     
    205205\lstDeleteShortInline@%
    206206\begin{tabular}{@{}rccccccc@{}}
    207                 & 2017  & 2012  & 2007  & 2002  & 1997  & 1992  & 1987          \\ \hline
    208 Java    & 1             & 1             & 1             & 1             & 12    & -             & -                     \\
    209 \Textbf{C}      & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}    \\
    210 \CC             & 3             & 3             & 3             & 3             & 2             & 2             & 4                     \\
     207                & 2018  & 2013  & 2008  & 2003  & 1998  & 1993  & 1988  \\ \hline
     208Java    & 1             & 2             & 1             & 1             & 18    & -             & -             \\
     209\Textbf{C}& \Textbf{2} & \Textbf{1} & \Textbf{2} & \Textbf{2} & \Textbf{1} & \Textbf{1} & \Textbf{1} \\
     210\CC             & 3             & 4             & 3             & 3             & 2             & 2             & 5             \\
    211211\end{tabular}
    212212\lstMakeShortInline@%
     
    227227\CFA is currently implemented as a source-to-source translator from \CFA to the gcc-dialect of C~\cite{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by gcc, meeting goals (1)--(3).
    228228Ultimately, a compiler is necessary for advanced features and optimal performance.
    229 
    230 This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types avoiding those shortcomings.
     229All of the features discussed in this paper are working, unless a feature states it is a future feature for completion.
     230
     231
     232\section{Polymorphic Functions}
     233
     234\CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}.
     235Shortcomings are identified in existing approaches to generic and variadic data types in C-like languages and how these shortcomings are avoided in \CFA.
    231236Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions.
    232 The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance.
    233 
    234 \section{Polymorphic Functions}
    235 
    236 \CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}.
     237The new constructs are empirically compared with C and \CC approaches via performance experiments in Section~\ref{sec:eval}.
     238
    237239
    238240\subsection{Name Overloading}
    239 
     241\label{s:NameOverloading}
     242
     243\begin{quote}
     244There are only two hard things in Computer Science: cache invalidation and \emph{naming things} -- Phil Karlton
     245\end{quote}
    240246C already has a limited form of ad-hoc polymorphism in the form of its basic arithmetic operators, which apply to a variety of different types using identical syntax.
    241247\CFA extends the built-in operator overloading by allowing users to define overloads for any function, not just operators, and even any variable;
     
    245251
    246252\begin{cfa}
    247 int max(int a, int b) { return a < b ? b : a; }  // (1)
    248 double max(double a, double b) { return a < b ? b : a; }  // (2)
    249 
    250 int max = INT_MAX;     // (3)
    251 double max = DBL_MAX;  // (4)
    252 
    253 max(7, -max);   $\C{// uses (1) and (3), by matching int from constant 7}$
    254 max(max, 3.14); $\C{// uses (2) and (4), by matching double from constant 3.14}$
    255 
    256 //max(max, -max);  $\C{// ERROR: ambiguous}$
    257 int m = max(max, -max); $\C{// uses (1) once and (3) twice, by matching return type}$
    258 \end{cfa}
    259 
    260 \Celeven did add @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading.
     253int max = 2147483647;                                           $\C[3.75in]{// (1)}$
     254double max = 1.7976931348623157E+308;   $\C{// (2)}$
     255int max( int a, int b ) { return a < b ? b : a; }  $\C{// (3)}$
     256double max( double a, double b ) { return a < b ? b : a; }  $\C{// (4)}\CRT$
     257max( 7, -max );                                                         $\C{// uses (3) and (1), by matching int from constant 7}$
     258max( max, 3.14 );                                                       $\C{// uses (4) and (2), by matching double from constant 3.14}$
     259max( max, -max );                                                       $\C{// ERROR: ambiguous}$
     260int m = max( max, -max );                                       $\C{// uses (3) and (1) twice, by matching return type}$
     261\end{cfa}
     262\CFA maximizes the ability to reuse names to aggressively address the naming problem.
     263In some cases, hundreds of names can be reduced to tens, resulting in a significant cognitive reduction for a programmer.
     264In the above, the name @max@ has a consistent meaning, and a programmer only needs to remember the single concept: maximum.
     265To prevent significant ambiguities, \CFA uses the return type in selecting overloads, \eg in the assignment to @m@, the compiler use @m@'s type to unambiguously select the most appropriate call to function @max@ (as does Ada).
     266As is shown later, there are a number of situations where \CFA takes advantage of available type information to disambiguate, where other programming languages generate ambiguities.
     267
     268\Celeven added @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading.
    261269The macro wrapping the generic expression imposes some limitations; as an example, it could not implement the example above, because the variables @max@ would collide with the functions @max@.
    262270Ergonomic limitations of @_Generic@ include the necessity to put a fixed list of supported types in a single place and manually dispatch to appropriate overloads, as well as possible namespace pollution from the functions dispatched to, which must all have distinct names.
    263 Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA does implement @_Generic@ for backwards-compatibility purposes. \TODO{actually implement that}
     271Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA implements @_Generic@ for backwards-compatibility purposes. \TODO{actually implement that}
    264272
    265273% http://fanf.livejournal.com/144696.html
     
    288296For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
    289297\begin{cfa}
    290 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? denotes operands}$
     298forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x `+` x; }       $\C{// ? denotes operands}$
    291299int val = twice( twice( 3.7 ) );
    292300\end{cfa}
     
    343351\begin{cfa}
    344352forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ }
    345 {       int ?<?( double x, double y ) { return x `>` y; }       $\C{// locally override behaviour}$
     353{
     354        int ?<?( double x, double y ) { return x `>` y; }       $\C{// locally override behaviour}$
    346355        qsort( vals, size );                                    $\C{// descending sort}$
    347356}
     
    414423\section{Generic Types}
    415424
    416 One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms.
     425A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
    417426Broadly speaking, there are three approaches to implement abstract data-structures in C.
    418427One approach is to write bespoke data-structures for each context in which they are needed.
    419428While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures.
    420 A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@; an approach which does allow reuse of code for common functionality.
    421 However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that would not otherwise be needed.
     429A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@, which allows reuse of code with common functionality.
     430However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is not otherwise needed.
    422431A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type-checked, but errors may be difficult to interpret.
    423432Furthermore, writing and using preprocessor macros can be unnatural and inflexible.
     
    434443};
    435444forall( otype T ) T value( pair( const char *, T ) p ) { return p.second; }
    436 forall( dtype F, otype T ) T value_p( pair( F *, T * ) p ) { return * p.second; }
     445forall( dtype F, otype T ) T value( pair( F *, T * ) p ) { return *p.second; }
    437446
    438447pair( const char *, int ) p = { "magic", 42 };
    439 int magic = value( p );
     448int i = value( p );
    440449pair( void *, int * ) q = { 0, &p.second };
    441 magic = value_p( q );
     450i = value( q );
    442451double d = 1.0;
    443452pair( double *, double * ) r = { &d, &d };
    444 d = value_p( r );
     453d = value( r );
    445454\end{cfa}
    446455
     
    589598[ double ] foo$\(_2\)$( int );
    590599void bar( int, double, double );
    591 bar( foo( 3 ), foo( 3 ) );
     600`bar`( foo( 3 ), foo( 3 ) );
    592601\end{cfa}
    593602The type-resolver only has the tuple return-types to resolve the call to @bar@ as the @foo@ parameters are identical, which involves unifying the possible @foo@ functions with @bar@'s parameter list.
     
    835844Since @sum@\(_0\) does not accept any arguments, it is not a valid candidate function for the call @sum(10, 20, 30)@.
    836845In order to call @sum@\(_1\), @10@ is matched with @x@, and the argument resolution moves on to the argument pack @rest@, which consumes the remainder of the argument list and @Params@ is bound to @[20, 30]@.
    837 The process continues unitl @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion.
     846The process continues until @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion.
    838847Effectively, this algorithm traces as @sum(10, 20, 30)@ $\rightarrow$ @10 + sum(20, 30)@ $\rightarrow$ @10 + (20 + sum(30))@ $\rightarrow$ @10 + (20 + (30 + sum()))@ $\rightarrow$ @10 + (20 + (30 + 0))@.
    839848
     
    11611170@case@ clauses are made disjoint by the @break@ statement.
    11621171While the ability to fall through \emph{is} a useful form of control flow, it does not match well with programmer intuition, resulting in many errors from missing @break@ statements.
    1163 For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through:
    1164 \begin{cquote}
     1172For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through (see Figure~\ref{f:ChooseSwitchStatements}).
     1173Collectively, these enhancements reduce programmer burden and increase readability and safety.
     1174
     1175\begin{figure}
     1176\centering
    11651177\lstDeleteShortInline@%
    1166 \begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}}
    1167 \multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{\CFA}}     & \multicolumn{1}{c}{\textbf{C}}        \\
     1178\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
     1179\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{\CFA}}    & \multicolumn{1}{c}{\textbf{C}}        \\
    11681180\begin{cfa}
    11691181`choose` ( day ) {
     
    11991211\end{tabular}
    12001212\lstMakeShortInline@%
    1201 \end{cquote}
    1202 Collectively, these enhancements reduce programmer burden and increase readability and safety.
     1213\caption{\lstinline|choose| versus \lstinline|switch| Statements}
     1214\label{f:ChooseSwitchStatements}
     1215\end{figure}
    12031216
    12041217\begin{comment}
     
    13681381\begin{cquote}
    13691382\lstDeleteShortInline@%
    1370 \begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}}
    1371 \multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{Resumption}}       & \multicolumn{1}{c}{\textbf{Recovery}} \\
     1383\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
     1384\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{Resumption}}      & \multicolumn{1}{c}{\textbf{Termination}}      \\
    13721385\begin{cfa}
    13731386`exception R { int fix; };`
     
    20672080In many cases, the interface is an inline wrapper providing overloading during compilation but zero cost at runtime.
    20682081The following sections give a glimpse of the interface reduction to many C libraries.
    2069 In many cases, @signed@/@unsigned@ @char@ and @short@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results.
     2082In many cases, @signed@/@unsigned@ @char@, @short@, and @_Complex@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results.
    20702083
    20712084
     
    20992112\begin{cfa}
    21002113MIN
     2114
    21012115MAX
    2102 M_PI
    2103 M_E
     2116
     2117PI
     2118E
    21042119\end{cfa}
    21052120&
    21062121\begin{cfa}
    21072122SCHAR_MIN, CHAR_MIN, SHRT_MIN, INT_MIN, LONG_MIN, LLONG_MIN,
     2123                FLT_MIN, DBL_MIN, LDBL_MIN
    21082124SCHAR_MAX, UCHAR_MAX, SHRT_MAX, INT_MAX, LONG_MAX, LLONG_MAX,
    2109 M_PI, M_PIl, M_CPI, M_CPIl,
    2110 M_E, M_El, M_CE, M_CEl
     2125                 FLT_MAX, DBL_MAX, LDBL_MAX
     2126M_PI, M_PIl
     2127M_E, M_El
    21112128\end{cfa}
    21122129\end{tabular}
     
    21582175While \Celeven has type-generic math~\cite[\S~7.25]{C11} in @tgmath.h@ to provide a similar mechanism, these macros are limited, matching a routine name with a single set of floating type(s).
    21592176For example, it is impossible to overload @atan@ for both one and two arguments;
    2160 instead the names @atan@ and @atan2@ are required.
     2177instead the names @atan@ and @atan2@ are required (see Section~\ref{s:NameOverloading}).
    21612178The key observation is that only a restricted set of type-generic macros are provided for a limited set of routine names, which do not generalize across the type system, as in \CFA.
    21622179
     
    24462463\begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt]
    24472464int main( int argc, char * argv[] ) {
    2448         FILE * out = fopen( "cfa-out.txt", "w" );
    2449         int maxi = 0, vali = 42;
    2450         stack(int) si, ti;
    2451 
    2452         REPEAT_TIMED( "push_int", N, push( &si, vali ); )
    2453         TIMED( "copy_int", ti = si; )
    2454         TIMED( "clear_int", clear( &si ); )
    2455         REPEAT_TIMED( "pop_int", N,
    2456                 int xi = pop( &ti ); if ( xi > maxi ) { maxi = xi; } )
    2457         REPEAT_TIMED( "print_int", N/2, print( out, vali, ":", vali, "\n" ); )
    2458 
    2459         pair(_Bool, char) maxp = { (_Bool)0, '\0' }, valp = { (_Bool)1, 'a' };
    2460         stack(pair(_Bool, char)) sp, tp;
    2461 
    2462         REPEAT_TIMED( "push_pair", N, push( &sp, valp ); )
    2463         TIMED( "copy_pair", tp = sp; )
    2464         TIMED( "clear_pair", clear( &sp ); )
    2465         REPEAT_TIMED( "pop_pair", N,
    2466                 pair(_Bool, char) xp = pop( &tp ); if ( xp > maxp ) { maxp = xp; } )
    2467         REPEAT_TIMED( "print_pair", N/2, print( out, valp, ":", valp, "\n" ); )
    2468         fclose(out);
     2465        ofstream out = { "cfa-out.txt" };
     2466        int max = 0, val = 42;
     2467        stack( int ) si, t;
     2468
     2469        REPEAT_TIMED( "push_int", N, push( si, val ); )
     2470        TIMED( "copy_int", t = si; )
     2471        TIMED( "clear_int", clear( si ); )
     2472        REPEAT_TIMED( "pop_int", N, int x = pop( t ); max = max( x, max ); )
     2473        REPEAT_TIMED( "print_int", N/2, out | val | ':' | val | endl; )
     2474
     2475        pair( _Bool, char ) max = { (_Bool)0, '\0' }, val = { (_Bool)1, 'a' };
     2476        stack( pair( _Bool, char ) ) s, t;
     2477
     2478        REPEAT_TIMED( "push_pair", N, push( s, val ); )
     2479        TIMED( "copy_pair", t = s; )
     2480        TIMED( "clear_pair", clear( s ); )
     2481        REPEAT_TIMED( "pop_pair", N, pair(_Bool, char) x = pop( t ); max = max( x, max ); )
     2482        REPEAT_TIMED( "print_pair", N/2, out | val | ':' | val | endl; )
    24692483}
    24702484\end{cfa}
     
    26402654\CFA
    26412655\begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
     2656forall(otype T) struct stack_node;
     2657forall(otype T) struct stack {
     2658        stack_node(T) * head;
     2659};
    26422660forall(otype T) struct stack_node {
    26432661        T value;
    26442662        stack_node(T) * next;
    26452663};
    2646 forall(otype T) void ?{}(stack(T) * s) { (&s->head){ 0 }; }
    2647 forall(otype T) void ?{}(stack(T) * s, stack(T) t) {
    2648         stack_node(T) ** crnt = &s->head;
     2664forall(otype T) void ?{}( stack(T) & s ) { (s.head){ 0 }; }
     2665forall(otype T) void ?{}( stack(T) & s, stack(T) t ) {
     2666        stack_node(T) ** crnt = &s.head;
    26492667        for ( stack_node(T) * next = t.head; next; next = next->next ) {
    2650                 *crnt = ((stack_node(T) *)malloc()){ next->value }; /***/
     2668                *crnt = malloc(){ next->value };
    26512669                stack_node(T) * acrnt = *crnt;
    26522670                crnt = &acrnt->next;
     
    26542672        *crnt = 0;
    26552673}
    2656 forall(otype T) stack(T) ?=?(stack(T) * s, stack(T) t) {
    2657         if ( s->head == t.head ) return *s;
    2658         clear(s);
     2674forall(otype T) stack(T) ?=?( stack(T) & s, stack(T) t ) {
     2675        if ( s.head == t.head ) return s;
     2676        clear( s );
    26592677        s{ t };
    2660         return *s;
    2661 }
    2662 forall(otype T) void ^?{}(stack(T) * s) { clear(s); }
    2663 forall(otype T) _Bool empty(const stack(T) * s) { return s->head == 0; }
    2664 forall(otype T) void push(stack(T) * s, T value) {
    2665         s->head = ((stack_node(T) *)malloc()){ value, s->head }; /***/
    2666 }
    2667 forall(otype T) T pop(stack(T) * s) {
    2668         stack_node(T) * n = s->head;
    2669         s->head = n->next;
     2678        return s;
     2679}
     2680forall(otype T) void ^?{}( stack(T) & s) { clear( s ); }
     2681forall(otype T) _Bool empty( const stack(T) & s ) { return s.head == 0; }
     2682forall(otype T) void push( stack(T) & s, T value ) {
     2683        s.head = malloc(){ value, s.head };
     2684}
     2685forall(otype T) T pop( stack(T) & s ) {
     2686        stack_node(T) * n = s.head;
     2687        s.head = n->next;
    26702688        T x = n->value;
    26712689        ^n{};
    2672         free(n);
     2690        free( n );
    26732691        return x;
    26742692}
    2675 forall(otype T) void clear(stack(T) * s) {
    2676         for ( stack_node(T) * next = s->head; next; ) {
     2693forall(otype T) void clear( stack(T) & s ) {
     2694        for ( stack_node(T) * next = s.head; next; ) {
    26772695                stack_node(T) * crnt = next;
    26782696                next = crnt->next;
    2679                 delete(crnt);
     2697                delete( crnt );
    26802698        }
    2681         s->head = 0;
     2699        s.head = 0;
    26822700}
    26832701\end{cfa}
     
    28282846
    28292847\begin{comment}
    2830 
    28312848\subsubsection{bench.h}
    28322849(\texttt{bench.hpp} is similar.)
Note: See TracChangeset for help on using the changeset viewer.