-                      r507e7a2
+                      rab3251e
 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects.
 This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more.
 The TIOBE~\cite{TIOBE} ranks the top 5 most popular programming languages as: Java 16\%, \Textbf{C 7\%}, \Textbf{\CC 5\%}, \Csharp 4\%, Python 4\% = 36\%, where the next 50 languages are less than 3\% each with a long tail.
+The TIOBE index~\cite{TIOBE} ranks the top 5 most \emph{popular} programming languages as: Java 15\%, \Textbf{C 12\%}, \Textbf{\CC 5.5\%}, Python 5\%, \Csharp 4.5\% = 42\%, where the next 50 languages are less than 4\% each with a long tail.
 The top 3 rankings over the past 30 years are:
 \begin{center}
 …
 \lstDeleteShortInline@%
 \begin{tabular}{@{}rccccccc@{}}
                 & 2017  & 2012  & 2007  & 2002  & 1997  & 1992  & 1987          \\ \hline
 Java    & 1             & 1             & 1             & 1             & 12    & -             & -                     \\
 \Textbf{C}      & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}    \\
 \CC             & 3             & 3             & 3             & 3             & 2             & 2             & 4                     \\
+                & 2018  & 2013  & 2008  & 2003  & 1998  & 1993  & 1988  \\ \hline
+Java    & 1             & 2             & 1             & 1             & 18    & -             & -             \\
+\Textbf{C}& \Textbf{2} & \Textbf{1} & \Textbf{2} & \Textbf{2} & \Textbf{1} & \Textbf{1} & \Textbf{1} \\
+\CC             & 3             & 4             & 3             & 3             & 2             & 2             & 5             \\
 \end{tabular}
 \lstMakeShortInline@%
 …
 \CFA is currently implemented as a source-to-source translator from \CFA to the gcc-dialect of C~\cite{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by gcc, meeting goals (1)--(3).
 Ultimately, a compiler is necessary for advanced features and optimal performance.
+This paper identifies shortcomings in existing approaches to generic and variadic data types in C-like languages and presents a design for generic and variadic types avoiding those shortcomings.
+All features discussed in this paper are working, unless a feature is explicitly identified as future work.
+\section{Polymorphic Functions}
+\CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}.
+Shortcomings are identified in existing approaches to generic and variadic data types in C-like languages, and it is shown how \CFA avoids these shortcomings.
 Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions.
+The new constructs are empirically compared with both standard C and \CC; the results show the new design is comparable in performance.
+\section{Polymorphic Functions}
+\CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}.
+The new constructs are empirically compared with C and \CC approaches via performance experiments in Section~\ref{sec:eval}.
 \subsection{Name Overloading}
+\label{s:NameOverloading}
+\begin{quote}
+There are only two hard things in Computer Science: cache invalidation and \emph{naming things} -- Phil Karlton
+\end{quote}
 C already has a limited form of ad-hoc polymorphism in the form of its basic arithmetic operators, which apply to a variety of different types using identical syntax.
 \CFA extends the built-in operator overloading by allowing users to define overloads for any function, not just operators, and even any variable;
 …
 \begin{cfa}
+int max(int a, int b) { return a < b ? b : a; }  // (1)
+double max(double a, double b) { return a < b ? b : a; }  // (2)
+int max = INT_MAX;     // (3)
+double max = DBL_MAX;  // (4)
+max(7, -max);   $\C{// uses (1) and (3), by matching int from constant 7}$
+max(max, 3.14); $\C{// uses (2) and (4), by matching double from constant 3.14}$
+//max(max, -max);  $\C{// ERROR: ambiguous}$
+int m = max(max, -max); $\C{// uses (1) once and (3) twice, by matching return type}$
+\end{cfa}
+\Celeven did add @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading.
+int max = 2147483647;                                           $\C[3.75in]{// (1)}$
+double max = 1.7976931348623157E+308;   $\C{// (2)}$
+int max( int a, int b ) { return a < b ? b : a; }  $\C{// (3)}$
+double max( double a, double b ) { return a < b ? b : a; }  $\C{// (4)}\CRT$
+max( 7, -max );                                                         $\C{// uses (3) and (1), by matching int from constant 7}$
+max( max, 3.14 );                                                       $\C{// uses (4) and (2), by matching double from constant 3.14}$
+max( max, -max );                                                       $\C{// ERROR: ambiguous}$
+int m = max( max, -max );                                       $\C{// uses (3) and (1) twice, by matching return type}$
+\end{cfa}
+\CFA maximizes the ability to reuse names to aggressively address the naming problem.
+In some cases, hundreds of names can be reduced to tens, significantly reducing the cognitive load on a programmer.
+In the above, the name @max@ has a consistent meaning, and a programmer only needs to remember the single concept: maximum.
+To prevent significant ambiguities, \CFA uses the return type in selecting overloads, \eg in the assignment to @m@, the compiler uses @m@'s type to unambiguously select the most appropriate call to function @max@ (as does Ada).
+As is shown later, there are a number of situations where \CFA takes advantage of available type information to disambiguate, where other programming languages generate ambiguities.
+\Celeven added @_Generic@ expressions, which can be used in preprocessor macros to provide a form of ad-hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading.
 The macro wrapping the generic expression imposes some limitations; as an example, it could not implement the example above, because the variables @max@ would collide with the functions @max@.
 Ergonomic limitations of @_Generic@ include the necessity to put a fixed list of supported types in a single place and manually dispatch to appropriate overloads, as well as possible namespace pollution from the functions dispatched to, which must all have distinct names.
 Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA does implement @_Generic@ for backwards-compatibility purposes. \TODO{actually implement that}
+Though name-overloading removes a major use-case for @_Generic@ expressions, \CFA implements @_Generic@ for backwards-compatibility purposes. \TODO{actually implement that}
 % http://fanf.livejournal.com/144696.html
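For comparison, the following is a minimal sketch of the \Celeven @_Generic@ approach in plain C; it is not taken from the \CFA sources, and the helper names @max_int@ and @max_double@ are hypothetical.
It illustrates the ergonomic limitations described above: each overload needs its own distinct name, and the supported types are listed in one place and dispatched manually.

```c
/* Each "overload" must have a distinct name (hypothetical names, for illustration only). */
static int max_int(int a, int b) { return a < b ? b : a; }
static double max_double(double a, double b) { return a < b ? b : a; }

/* A fixed list of supported types, dispatched manually.
   This sketch selects on the type of the first argument only. */
#define max(a, b) _Generic((a), \
        int: max_int, \
        double: max_double)((a), (b))
```

With these definitions, @max( 7, 3 )@ selects @max_int@ and @max( 2.5, 3.14 )@ selects @max_double@; supporting another type means editing the macro and adding another uniquely named function.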
 …
 For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
 \begin{cfa}
 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? denotes operands}$
+forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x `+` x; }       $\C{// ? denotes operands}$
 int val = twice( twice( 3.7 ) );
 \end{cfa}
 …
 \begin{cfa}
 forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ }
+{       int ?<?( double x, double y ) { return x `>` y; }       $\C{// locally override behaviour}$
+{
+        int ?<?( double x, double y ) { return x `>` y; }       $\C{// locally override behaviour}$
         qsort( vals, size );                                    $\C{// descending sort}$
+}
 …
 \section{Generic Types}
 One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms.
+A significant shortcoming of standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms.
 Broadly speaking, there are three approaches to implement abstract data-structures in C.
 One approach is to write bespoke data-structures for each context in which they are needed.
 While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures.
 A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@; an approach which does allow reuse of code for common functionality.
 However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that would not otherwise be needed.
+A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@, which allows reuse of code with common functionality.
+However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is not otherwise needed.
 A third approach to generic code is to use preprocessor macros, which allow the generated code to be both generic and type-checked, but errors may be difficult to interpret.
 Furthermore, writing and using preprocessor macros can be unnatural and inflexible.
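As an illustration of the second approach, the following plain-C sketch sorts an array of @double@ with the standard-library @qsort@; the names @cmp_double@ and @sort_doubles@ are hypothetical and chosen only for this example.
The call needs an extra element-size argument and a comparator that works through untyped pointers, and the compiler cannot check that the comparator matches the array's element type.

```c
#include <stdlib.h>

/* The comparator receives untyped pointers; the casts are not checked by the compiler. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);   /* negative, zero, or positive, as qsort expects */
}

/* Hypothetical wrapper: element size and comparator are passed as extra arguments. */
static void sort_doubles(double *vals, size_t n) {
    qsort(vals, n, sizeof vals[0], cmp_double);
}
```

Passing @cmp_double@ together with an array of @int@ would still compile, which is the loss of type checking described above.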
 …
 };
 forall( otype T ) T value( pair( const char *, T ) p ) { return p.second; }
 forall( dtype F, otype T ) T value_p( pair( F *, T * ) p ) { return * p.second; }
+forall( dtype F, otype T ) T value( pair( F *, T * ) p ) { return *p.second; }
 pair( const char *, int ) p = { "magic", 42 };
 int magic = value( p );
+int i = value( p );
 pair( void *, int * ) q = { 0, &p.second };
 magic = value_p( q );
+i = value( q );
 double d = 1.0;
 pair( double *, double * ) r = { &d, &d };
 d = value_p( r );
+d = value( r );
 \end{cfa}
 …
 [ double ] foo$\(_2\)$( int );
 void bar( int, double, double );
 bar( foo( 3 ), foo( 3 ) );
+`bar`( foo( 3 ), foo( 3 ) );
 \end{cfa}
 The type-resolver only has the tuple return-types to resolve the call to @bar@ as the @foo@ parameters are identical, which involves unifying the possible @foo@ functions with @bar@'s parameter list.
 …
 Since @sum@\(_0\) does not accept any arguments, it is not a valid candidate function for the call @sum(10, 20, 30)@.
 In order to call @sum@\(_1\), @10@ is matched with @x@, and the argument resolution moves on to the argument pack @rest@, which consumes the remainder of the argument list and @Params@ is bound to @[20, 30]@.
 The process continues unitl @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion.
+The process continues until @Params@ is bound to @[]@, requiring an assertion @int sum()@, which matches @sum@\(_0\) and terminates the recursion.
 Effectively, this algorithm traces as @sum(10, 20, 30)@ $\rightarrow$ @10 + sum(20, 30)@ $\rightarrow$ @10 + (20 + sum(30))@ $\rightarrow$ @10 + (20 + (30 + sum()))@ $\rightarrow$ @10 + (20 + (30 + 0))@.
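The shape of this recursion can be imitated in plain C as a rough sketch, assuming the argument pack is modelled as a runtime array rather than a compile-time tuple; the name @sum_rec@ is hypothetical.
The empty case plays the role of @sum@\(_0\) and the recursive case plays the role of @sum@\(_1\).

```c
#include <stddef.h>

/* Runtime analogue of the variadic sum: 'rest' stands in for the argument pack. */
static int sum_rec(const int *rest, size_t n) {
    if (n == 0) return 0;                        /* empty pack: corresponds to sum_0 */
    return rest[0] + sum_rec(rest + 1, n - 1);   /* x + sum(rest): corresponds to sum_1 */
}
```

Unlike the \CFA version, this sketch handles only one element type and resolves nothing at compile time.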
 …
 @case@ clauses are made disjoint by the @break@ statement.
 While the ability to fall through \emph{is} a useful form of control flow, it does not match well with programmer intuition, resulting in many errors from missing @break@ statements.
+For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through:
+\begin{cquote}
+For backwards compatibility, \CFA provides a \emph{new} control structure, @choose@, which mimics @switch@, but reverses the meaning of fall through (see Figure~\ref{f:ChooseSwitchStatements}).
+Collectively, these enhancements reduce programmer burden and increase readability and safety.
+\begin{figure}
+\centering
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}}
 \multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{\CFA}}     & \multicolumn{1}{c}{\textbf{C}}        \\
+\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
+\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{\CFA}}    & \multicolumn{1}{c}{\textbf{C}}        \\
 \begin{cfa}
 `choose` ( day ) {
 …
 \end{tabular}
 \lstMakeShortInline@%
+\end{cquote}
+Collectively, these enhancements reduce programmer burden and increase readability and safety.
+\caption{\lstinline|choose| versus \lstinline|switch| Statements}
+\label{f:ChooseSwitchStatements}
+\end{figure}
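The missing-@break@ error mentioned above can be reproduced in plain C with the following sketch; the function names and the values added in each clause are hypothetical and serve only to make the executed clauses visible.

```c
/* Every clause ends with break, so exactly one clause body runs. */
static int with_break(int day) {
    int n = 0;
    switch (day) {
      case 1: n += 1; break;
      case 2: n += 10; break;
      default: n += 100;
    }
    return n;
}

/* The break after case 1 is forgotten, so control continues into case 2. */
static int missing_break(int day) {
    int n = 0;
    switch (day) {
      case 1: n += 1;   /* falls through */
      case 2: n += 10; break;
      default: n += 100;
    }
    return n;
}
```

For @day == 1@, @with_break@ returns 1 while @missing_break@ returns 11, because the second clause also executes.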
 \begin{comment}
 …
 \begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}}
 \multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{Resumption}}       & \multicolumn{1}{c}{\textbf{Recovery}} \\
+\begin{tabular}{@{}l@{\hspace{2\parindentlnth}}l@{}}
+\multicolumn{1}{c@{\hspace{2\parindentlnth}}}{\textbf{Resumption}}      & \multicolumn{1}{c}{\textbf{Termination}}      \\
 \begin{cfa}
 `exception R { int fix; };`
 …
 In many cases, the interface is an inline wrapper providing overloading during compilation but zero cost at runtime.
 The following sections give a glimpse of the interface reduction to many C libraries.
 In many cases, @signed@/@unsigned@ @char@ and @short@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results.
+In many cases, @signed@/@unsigned@ @char@, @short@, and @_Complex@ routines are available (but not shown) to ensure expression computations remain in a single type, as conversions can distort results.
 …
 \begin{cfa}
 MIN
 MAX
+M_PI
+M_E
+PI
+E
 \end{cfa}
+&
 \begin{cfa}
 SCHAR_MIN, CHAR_MIN, SHRT_MIN, INT_MIN, LONG_MIN, LLONG_MIN,
+                FLT_MIN, DBL_MIN, LDBL_MIN
 SCHAR_MAX, UCHAR_MAX, SHRT_MAX, INT_MAX, LONG_MAX, LLONG_MAX,
+M_PI, M_PIl, M_CPI, M_CPIl,
+M_E, M_El, M_CE, M_CEl
+                 FLT_MAX, DBL_MAX, LDBL_MAX
+M_PI, M_PIl
+M_E, M_El
 \end{cfa}
 \end{tabular}
 …
 While \Celeven has type-generic math~\cite[\S~7.25]{C11} in @tgmath.h@ to provide a similar mechanism, these macros are limited, matching a routine name with a single set of floating type(s).
 For example, it is impossible to overload @atan@ for both one and two arguments;
 instead the names @atan@ and @atan2@ are required.
+instead the names @atan@ and @atan2@ are required (see Section~\ref{s:NameOverloading}).
 The key observation is that only a restricted set of type-generic macros is provided for a limited set of routine names, which does not generalize across the type system, as in \CFA.
 …
 \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt]
 int main( int argc, char * argv[] ) {
+        FILE * out = fopen( "cfa-out.txt", "w" );
+        int maxi = 0, vali = 42;
+        stack(int) si, ti;
+        REPEAT_TIMED( "push_int", N, push( &si, vali ); )
+        TIMED( "copy_int", ti = si; )
+        TIMED( "clear_int", clear( &si ); )
+        REPEAT_TIMED( "pop_int", N,
+                int xi = pop( &ti ); if ( xi > maxi ) { maxi = xi; } )
+        REPEAT_TIMED( "print_int", N/2, print( out, vali, ":", vali, "\n" ); )
+        pair(_Bool, char) maxp = { (_Bool)0, '\0' }, valp = { (_Bool)1, 'a' };
+        stack(pair(_Bool, char)) sp, tp;
+        REPEAT_TIMED( "push_pair", N, push( &sp, valp ); )
+        TIMED( "copy_pair", tp = sp; )
+        TIMED( "clear_pair", clear( &sp ); )
+        REPEAT_TIMED( "pop_pair", N,
+                pair(_Bool, char) xp = pop( &tp ); if ( xp > maxp ) { maxp = xp; } )
+        REPEAT_TIMED( "print_pair", N/2, print( out, valp, ":", valp, "\n" ); )
+        fclose(out);
+        ofstream out = { "cfa-out.txt" };
+        int max = 0, val = 42;
+        stack( int ) si, t;
+        REPEAT_TIMED( "push_int", N, push( si, val ); )
+        TIMED( "copy_int", t = si; )
+        TIMED( "clear_int", clear( si ); )
+        REPEAT_TIMED( "pop_int", N, int x = pop( t ); max = max( x, max ); )
+        REPEAT_TIMED( "print_int", N/2, out | val | ':' | val | endl; )
+        pair( _Bool, char ) max = { (_Bool)0, '\0' }, val = { (_Bool)1, 'a' };
+        stack( pair( _Bool, char ) ) s, t;
+        REPEAT_TIMED( "push_pair", N, push( s, val ); )
+        TIMED( "copy_pair", t = s; )
+        TIMED( "clear_pair", clear( s ); )
+        REPEAT_TIMED( "pop_pair", N, pair(_Bool, char) x = pop( t ); max = max( x, max ); )
+        REPEAT_TIMED( "print_pair", N/2, out | val | ':' | val | endl; )
+}
 \end{cfa}
 …
 \CFA
 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]
+forall(otype T) struct stack_node;
+forall(otype T) struct stack {
+        stack_node(T) * head;
+};
 forall(otype T) struct stack_node {
         T value;
         stack_node(T) * next;
 };
 forall(otype T) void ?{}(stack(T) * s) { (&s->head){ 0 }; }
 forall(otype T) void ?{}(stack(T) * s, stack(T) t) {
         stack_node(T) ** crnt = &s->head;
+forall(otype T) void ?{}( stack(T) & s ) { (s.head){ 0 }; }
+forall(otype T) void ?{}( stack(T) & s, stack(T) t ) {
+        stack_node(T) ** crnt = &s.head;
         for ( stack_node(T) * next = t.head; next; next = next->next ) {
                 *crnt = ((stack_node(T) *)malloc()){ next->value }; /***/
+                *crnt = malloc(){ next->value };
                 stack_node(T) * acrnt = *crnt;
                 crnt = &acrnt->next;
 …
         *crnt = 0;
+}
 forall(otype T) stack(T) ?=?(stack(T) * s, stack(T) t) {
         if ( s->head == t.head ) return *s;
         clear(s);
+forall(otype T) stack(T) ?=?( stack(T) & s, stack(T) t ) {
+        if ( s.head == t.head ) return s;
+        clear( s );
         s{ t };
         return *s;
+}
 forall(otype T) void ^?{}(stack(T) * s) { clear(s); }
 forall(otype T) _Bool empty(const stack(T) * s) { return s->head == 0; }
 forall(otype T) void push(stack(T) * s, T value) {
         s->head = ((stack_node(T) *)malloc()){ value, s->head }; /***/
+}
 forall(otype T) T pop(stack(T) * s) {
         stack_node(T) * n = s->head;
         s->head = n->next;
+        return s;
+}
+forall(otype T) void ^?{}( stack(T) & s) { clear( s ); }
+forall(otype T) _Bool empty( const stack(T) & s ) { return s.head == 0; }
+forall(otype T) void push( stack(T) & s, T value ) {
+        s.head = malloc(){ value, s.head };
+}
+forall(otype T) T pop( stack(T) & s ) {
+        stack_node(T) * n = s.head;
+        s.head = n->next;
         T x = n->value;
         ^n{};
         free(n);
+        free( n );
         return x;
+}
 forall(otype T) void clear(stack(T) * s) {
         for ( stack_node(T) * next = s->head; next; ) {
+forall(otype T) void clear( stack(T) & s ) {
+        for ( stack_node(T) * next = s.head; next; ) {
                 stack_node(T) * crnt = next;
                 next = crnt->next;
                 delete(crnt);
+                delete( crnt );
+        }
         s->head = 0;
+        s.head = 0;
+}
 \end{cfa}
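For contrast with the generic \CFA stack, a @void *@-based stack in plain C might look like the following sketch.
It is not the benchmark code used in the paper; the names @push_stack@, @pop_stack@, and @stack_empty@ are hypothetical, and the stack stores pointers to caller-owned values rather than copies.

```c
#include <stdlib.h>

struct stack_node {
    void *value;                 /* untyped payload; the caller must remember its type */
    struct stack_node *next;
};
struct stack { struct stack_node *head; };

static int stack_empty(const struct stack *s) { return s->head == NULL; }

static void push_stack(struct stack *s, void *value) {
    struct stack_node *n = malloc(sizeof *n);
    if (n == NULL) abort();      /* simplistic error handling for the sketch */
    n->value = value;
    n->next = s->head;
    s->head = n;
}

/* Precondition: the stack is not empty. */
static void *pop_stack(struct stack *s) {
    struct stack_node *n = s->head;
    void *value = n->value;
    s->head = n->next;
    free(n);
    return value;
}
```

Every use of @pop_stack@ needs a cast back to the element type, and nothing prevents pushing values of different types onto the same stack.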
 …
 \begin{comment}
 \subsubsection{bench.h}
 (\texttt{bench.hpp} is similar.)

Changeset ab3251e for doc/papers/general

doc/papers/general/Paper.tex
