# Changeset d16f9fd for doc/papers

Timestamp:
Aug 10, 2018, 8:41:42 AM (4 years ago)
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, no_list, persistent-indexer, pthread-emulation, qualifiedEnum
Children:
bd56b07
Parents:
581743f
Message:

changes and corrections to match SPE proofs

File:
1 edited

 r581743f \documentclass[AMA,STIX1COL]{WileyNJD-v2} \setlength\typewidth{170mm} \setlength\textwidth{170mm} \articletype{RESEARCH ARTICLE}% \received{26 April 2016} \revised{6 June 2016} \accepted{6 June 2016} \received{12 March 2018} \revised{8 May 2018} \accepted{28 June 2018} \setlength\typewidth{168mm} \setlength\textwidth{168mm} \raggedbottom } \title{\texorpdfstring{\protect\CFA : Adding Modern Programming Language Features to C}{Cforall : Adding Modern Programming Language Features to C}} \title{\texorpdfstring{\protect\CFA : Adding modern programming language features to C}{Cforall : Adding modern programming language features to C}} \author[1]{Aaron Moss} \author[1]{Robert Schluntz} \author[1]{Peter A. Buhr*} \author[1]{Peter A. Buhr} \author[]{\textcolor{blue}{Q1 AUTHOR NAMES CORRECT}} \authormark{MOSS \textsc{et al}} \address[1]{\orgdiv{Cheriton School of Computer Science}, \orgname{University of Waterloo}, \orgaddress{\state{Waterloo, ON}, \country{Canada}}} \corres{*Peter A. Buhr, Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada. \email{pabuhr{\char\@}uwaterloo.ca}} \address[1]{\orgdiv{Cheriton School of Computer Science}, \orgname{University of Waterloo}, \orgaddress{\state{Waterloo, Ontario}, \country{Canada}}} \corres{Peter A. Buhr, Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada. \email{pabuhr{\char\@}uwaterloo.ca}} \fundingInfo{Natural Sciences and Engineering Research Council of Canada} \abstract[Summary]{ The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from hobby projects to commercial operating-systems. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. 
Nevertheless, C, first standardized almost thirty years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA project (pronounced ``C-for-all'') is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers. Prior projects have attempted similar goals but failed to honour C programming-style; for instance, adding object-oriented or functional programming with garbage collection is a non-starter for many C developers. Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and programmers. This paper presents a quick tour of \CFA features showing how their design avoids shortcomings of similar features in C and other C-like languages. The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from hobby projects to commercial operating systems. This installation base and the programmers producing it represent a massive software engineering investment spanning decades and likely to continue for decades more. Nevertheless, C, which was first standardized almost 30 \textcolor{blue}{CHANGE ``40'' TO ``30''} years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA project (pronounced ``C for all'') is to create an extension of C that provides modern safety and productivity features while still ensuring strong backward compatibility with C and its programmers. 
Prior projects have attempted similar goals but failed to honor the C programming style; for instance, adding object-oriented or functional programming with garbage collection is a nonstarter for many C developers. Specifically, \CFA is designed to have an orthogonal feature set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and programmers. This paper presents a quick tour of \CFA features, showing how their design avoids shortcomings of similar features in C and other C-like languages. Experimental results are presented to validate several of the new features. }% \keywords{generic types, tuple types, variadic types, polymorphic functions, C, Cforall} \keywords{C, Cforall, generic types, polymorphic functions, tuple types, variadic types} \begin{document} \linenumbers                                            % comment out to turn off line numbering %\linenumbers                                            % comment out to turn off line numbering \maketitle \section{Introduction} The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from hobby projects to commercial operating-systems. This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. The TIOBE index~\cite{TIOBE} ranks the top 5 most \emph{popular} programming languages as: Java 15\%, \Textbf{C 12\%}, \Textbf{\CC 5.5\%}, Python 5\%, \Csharp 4.5\% = 42\%, where the next 50 languages are less than 4\% each, with a long tail. 
The top 3 rankings over the past 30 years are: The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from hobby projects to commercial operating systems. This installation base and the programmers producing it represent a massive software engineering investment spanning decades and likely to continue for decades more. The TIOBE index~\cite{TIOBE} \textcolor{blue}{CHANGE TIOBE'' TO The TIOBE index''} ranks the top five most \emph{popular} programming languages as Java 15\%, \Textbf{C 12\%}, \Textbf{\CC 5.5\%}, and Python 5\%, \Csharp 4.5\% = 42\%, where the next 50 languages are less than 4\% each with a long tail. The top three rankings over the past 30 years are as follows. \newpage \textcolor{blue}{MOVE TABLE HERE} \begin{center} \setlength{\tabcolsep}{10pt} \lstDeleteShortInline@% \begin{tabular}{@{}rccccccc@{}} & 2018  & 2013  & 2008  & 2003  & 1998  & 1993  & 1988  \\ \hline Java    & 1             & 2             & 1             & 1             & 18    & -             & -             \\ \fontsize{9bp}{11bp}\selectfont \lstDeleteShortInline@% \begin{tabular}{@{}cccccccc@{}} & 2018  & 2013  & 2008  & 2003  & 1998  & 1993  & 1988  \\ Java    & 1             & 2             & 1             & 1             & 18    & --    & --    \\ \Textbf{C}& \Textbf{2} & \Textbf{1} & \Textbf{2} & \Textbf{2} & \Textbf{1} & \Textbf{1} & \Textbf{1} \\ \CC             & 3             & 4             & 3             & 3             & 2             & 2             & 5             \\ \lstMakeShortInline@% \end{center} Love it or hate it, C is extremely popular, highly used, and one of the few systems languages. In many cases, \CC is often used solely as a better C. Nevertheless, C, first standardized almost forty years ago~\cite{ANSI89:C}, lacks many features that make programming in more modern languages safer and more productive. 
\CFA (pronounced C-for-all'', and written \CFA or Cforall) is an evolutionary extension of the C programming language that adds modern language-features to C, while maintaining source and runtime compatibility in the familiar C programming model. The four key design goals for \CFA~\cite{Bilson03} are: (1) The behaviour of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler; (2) Standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler; (3) \CFA code must be at least as portable as standard C code; (4) Extensions introduced by \CFA must be translated in the most efficient way possible. These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used. \CC is used similarly, but has the disadvantages of multiple legacy design-choices that cannot be updated, and active divergence of the language model from C, requiring significant effort and training to incrementally add \CC to a C-based project. All languages features discussed in this paper are working, except some advanced exception-handling features. Not discussed in this paper are the integrated concurrency-constructs and user-level threading-library~\cite{Delisle18}. Nevertheless, C, which was first standardized almost 30 \textcolor{blue}{CHANGE 40'' TO 30''} years ago~\cite{ANSI89:C}, lacks many features that make programming in more modern languages safer and more productive. \CFA (pronounced C for all'' and written \CFA or Cforall) is an evolutionary extension of the C programming language that adds modern language features to C, while maintaining source and runtime compatibility in the familiar C programming model. 
The four key design goals for \CFA~\cite{Bilson03} are as follows: (1) the behavior of standard C code must remain the same when translated by a \CFA compiler as when translated by a C compiler; (2) the standard C code must be as fast and as small when translated by a \CFA compiler as when translated by a C compiler; (3) the \CFA code must be at least as portable as standard C code; (4) extensions introduced by \CFA must be translated in the most efficient way possible. These goals ensure that the existing C code bases can be converted into \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used. \CC is used similarly but has the disadvantages of multiple legacy design choices that cannot be updated and active divergence of the language model from C, requiring significant effort and training to incrementally add \CC to a C-based project. All language features discussed in this paper are working, except some advanced exception-handling features. Not discussed in this paper are the integrated concurrency constructs and user-level threading library~\cite{Delisle18}. \CFA is an \emph{open-source} project implemented as a source-to-source translator from \CFA to the gcc-dialect of C~\cite{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by gcc, meeting goals (1)--(3). % @plg2[9]% cd cfa-cc/src; cloc ArgTweak CodeGen CodeTools Common Concurrency ControlStruct Designators GenPoly InitTweak MakeLibCfa.cc MakeLibCfa.h Parser ResolvExpr SymTab SynTree Tuples driver prelude main.cc % SUM:                           223           8203           8263          46479 % ------------------------------------------------------------------------------- The \CFA translator is 200+ files and 46,000+ lines of code written in C/\CC. 
A translator versus a compiler makes it easier and faster to generate and debug C object-code rather than intermediate, assembler or machine code; The \CFA translator is 200+ files and 46\,000+ lines of code written in C/\CC. A translator versus a compiler makes it easier and faster to generate and debug the C object code rather than the intermediate, assembler, or machine code; ultimately, a compiler is necessary for advanced features and optimal performance. % The translator design is based on the \emph{visitor pattern}, allowing multiple passes over the abstract code-tree, which works well for incrementally adding new feature through additional visitor passes. Two key translator components are expression analysis, determining expression validity and what operations are required for its implementation, and code generation, dealing with multiple forms of overloading, polymorphism, and multiple return values by converting them into C code for a C compiler that supports none of these features. Details of these components are available in Bilson~\cite{Bilson03} Chapters 2 and 3, and form the base for the current \CFA translator. Details of these components are available in chapters 2 and 3 in the work of Bilson~\cite{Bilson03} and form the base for the current \CFA translator. % @plg2[8]% cd cfa-cc/src; cloc libcfa % ------------------------------------------------------------------------------- % SUM:                           100           1895           2785          11763 % ------------------------------------------------------------------------------- The \CFA runtime system is 100+ files and 11,000+ lines of code, written in \CFA. The \CFA runtime system is 100+ files and 11\,000+ lines of code, written in \CFA. Currently, the \CFA runtime is the largest \emph{user} of \CFA providing a vehicle to test the language features and implementation. 
% @plg2[6]% cd cfa-cc/src; cloc tests examples benchmark \vspace*{-6pt} \section{Polymorphic Functions} \CFA introduces both ad-hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92}, and first implemented by Bilson~\cite{Bilson03}. Shortcomings are identified in existing approaches to generic and variadic data types in C-like languages and how these shortcomings are avoided in \CFA. Specifically, the solution is both reusable and type-checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. \CFA introduces both ad hoc and parametric polymorphism to C, with a design originally formalized by Ditchfield~\cite{Ditchfield92} and first implemented by Bilson~\cite{Bilson03}. Shortcomings are identified in the existing approaches to generic and variadic data types in C-like languages and how these shortcomings are avoided in \CFA. Specifically, the solution is both reusable and type checked, as well as conforming to the design goals of \CFA with ergonomic use of existing C abstractions. The new constructs are empirically compared with C and \CC approaches via performance experiments in Section~\ref{sec:eval}. \subsection{Name Overloading} \vspace*{-6pt} \subsection{Name overloading} \label{s:NameOverloading} \begin{quote} There are only two hard things in Computer Science: cache invalidation and \emph{naming things} -- Phil Karlton There are only two hard things in Computer Science: cache invalidation and \emph{naming things}.''---Phil Karlton \end{quote} \vspace{-9pt} C already has a limited form of ad-hoc polymorphism in its basic arithmetic operators, which apply to a variety of different types using identical syntax. C already has a limited form of ad hoc polymorphism in its basic arithmetic operators, which apply to a variety of different types using identical syntax. 
\CFA extends the built-in operator overloading by allowing users to define overloads for any function, not just operators, and even any variable; Section~\ref{sec:libraries} includes a number of examples of how this overloading simplifies \CFA programming relative to C. Code generation for these overloaded functions and variables is implemented by the usual approach of mangling the identifier names to include a representation of their type, while \CFA decides which overload to apply based on the same usual arithmetic conversions'' used in C to disambiguate operator overloads. As an example: \textcolor{blue}{REMOVE We have the following as an example''} \newpage \textcolor{blue}{UPDATE FOLLOWING PROGRAM EXAMPLE WITH ADJUSTED COMMENTS TO FIT PAGE WIDTH.} \begin{cfa} int max = 2147483647;                                           $\C[4in]{// (1)}$ int max( int a, int b ) { return a < b ? b : a; }  $\C{// (3)}$ double max( double a, double b ) { return a < b ? b : a; }  $\C{// (4)}\CRT$ max( 7, -max );                                         $\C{// uses (3) and (1), by matching int from constant 7}$ max( 7, -max );                                         $\C[3in]{// uses (3) and (1), by matching int from constant 7}$ max( max, 3.14 );                                       $\C{// uses (4) and (2), by matching double from constant 3.14}$ max( max, -max );                                       $\C{// ERROR, ambiguous}$ int m = max( max, -max );                       $\C{// uses (3) and (1) twice, by matching return type}$ int m = max( max, -max );                       $\C{// uses (3) and (1) twice, by matching return type}\CRT$ \end{cfa} As is shown later, there are a number of situations where \CFA takes advantage of available type information to disambiguate, where other programming languages generate ambiguities. 
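For comparison, plain C11 has no name overloading, so a similar effect needs distinctly named functions and a hand-written dispatch macro. The following is a minimal sketch of that approach, not part of \CFA; the names @max_int@, @max_double@, and @MAX@ are hypothetical and chosen only for illustration.

```c
/* Hypothetical, distinctly named "overloads": C has no name overloading. */
int max_int(int a, int b) { return a < b ? b : a; }
double max_double(double a, double b) { return a < b ? b : a; }

/* Manual dispatch on the type of (a) + (b): int + int selects max_int,
   and any mix of int and double promotes to double and selects max_double.
   Other operand types are not listed and fail to compile. */
#define MAX(a, b) _Generic((a) + (b), int: max_int, double: max_double)((a), (b))
```

With these definitions, @MAX(7, -3)@ calls @max_int@ and @MAX(2, 3.14)@ calls @max_double@. The supported types must be listed by hand in one place, and a variable cannot share the name of the dispatch macro, so the overloaded variable @max@ from the \CFA example has no counterpart here.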
\Celeven added @_Generic@ expressions~\cite[\S~6.5.1.1]{C11}, which is used with preprocessor macros to provide ad-hoc polymorphism; \Celeven added @_Generic@ expressions (see section~6.5.1.1 of the ISO/IEC 9899~\cite{C11}), which is used with preprocessor macros to provide ad hoc polymorphism; however, this polymorphism is both functionally and ergonomically inferior to \CFA name overloading. The macro wrapping the generic expression imposes some limitations; \eg, it cannot implement the example above, because the variables @max@ are ambiguous with the functions @max@. The macro wrapping the generic expression imposes some limitations, for instance, it cannot implement the example above, because the variables @max@ are ambiguous with the functions @max@. Ergonomic limitations of @_Generic@ include the necessity to put a fixed list of supported types in a single place and manually dispatch to appropriate overloads, as well as possible namespace pollution from the dispatch functions, which must all have distinct names. \CFA supports @_Generic@ expressions for backwards compatibility, but it is an unnecessary mechanism. \TODO{actually implement that} \CFA supports @_Generic@ expressions for backward compatibility, but it is an unnecessary mechanism. % http://fanf.livejournal.com/144696.html \subsection{\texorpdfstring{\protect\lstinline{forall} Functions}{forall Functions}} \vspace*{-10pt} \subsection{\texorpdfstring{\protect\lstinline{forall} functions}{forall functions}} \label{sec:poly-fns} The signature feature of \CFA is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96} with functions generalized using a @forall@ clause (giving the language its name): The signature feature of \CFA is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96} with functions generalized using a @forall@ clause (giving the language its name). 
\textcolor{blue}{REMOVE as follows''} \begin{cfa} forall( otype T ) T identity( T val ) { return val; } This @identity@ function can be applied to any complete \newterm{object type} (or @otype@). The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \newterm{data type} (or @dtype@). In \CFA, the polymorphic runtime-cost is spread over each polymorphic call, because more arguments are passed to polymorphic functions; the experiments in Section~\ref{sec:eval} show this overhead is similar to \CC virtual-function calls. A design advantage is that, unlike \CC template-functions, \CFA polymorphic-functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat. Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \newterm{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading: The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor, and destructor. If this extra information is not needed, for instance, for a pointer, the type parameter can be declared as a \newterm{data type} (or @dtype@). 
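The implicit parameters described above can be pictured with a hand-written C function. This is only a sketch under stated assumptions: the parameter order, the @assign_fn@ type, and the names @identity_poly@ and @assign_double@ are hypothetical, and the code actually generated by the \CFA translator may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lowering of:
       forall( otype T ) T identity( T val ) { return val; }
   The type variable T becomes explicit size, alignment, and operator
   parameters; the layout here is an assumption, not translator output. */
typedef void (*assign_fn)(void *dst, const void *src);

void identity_poly(size_t size, size_t align, assign_fn assign,
                   void *ret, const void *val) {
    (void)size;          /* would be needed to allocate temporaries of type T */
    (void)align;         /* likewise for placing those temporaries */
    assign(ret, val);    /* copy through the type's own assignment operator */
}

/* Assignment operator supplied by the caller when T is bound to double. */
void assign_double(void *dst, const void *src) {
    memcpy(dst, src, sizeof(double));
}
```

A call with @T@ bound to @double@ would pass @sizeof(double)@, @_Alignof(double)@, and @assign_double@ along with pointers to the result and the argument. One compiled body can then serve every type, which is what keeps this style compatible with separate compilation.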
In \CFA, the polymorphic runtime cost is spread over each polymorphic call, because more arguments are passed to polymorphic functions; the experiments in Section~\ref{sec:eval} show this overhead is similar to \CC virtual function calls. A design advantage is that, unlike \CC template functions, \CFA polymorphic functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat. Since bare polymorphic types provide a restricted set of available operations, \CFA provides a \newterm{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type variable. For example, the function @twice@ can be defined using the \CFA syntax for operator overloading. \textcolor{blue}{REMOVE as follows''} \begin{cfa} forall( otype T | { T ?+?(T, T); } ) T twice( T x ) { return x + x; }  $\C{// ? denotes operands}$ int val = twice( twice( 3.7 ) );  $\C{// val == 14}$ \end{cfa} which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type~\cite{Cormack81,Baker82,Ada} in its type analysis. The first approach has a late conversion from @double@ to @int@ on the final assignment, while the second has an early conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition. This works for any type @T@ with a matching addition operator. 
The polymorphism is achieved by creating a wrapper function for calling @+@ with the @T@ bound to @double@ and then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result into @int@ on assignment or creating another @twice@ with the type parameter @T@ bound to @int@ because \CFA uses the return type~\cite{Cormack81,Baker82,Ada} in its type analysis. The first approach has a late conversion from @double@ to @int@ on the final assignment, whereas the second has an early conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information; hence, it selects the first approach, which corresponds with C programmer intuition. Crucial to the design of a new programming language are the libraries to access thousands of external software features. Like \CC, \CFA inherits a massive compatible library-base, where other programming languages must rewrite or provide fragile inter-language communication with C. A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted float array: Like \CC, \CFA inherits a massive compatible library base, where other programming languages must rewrite or provide fragile interlanguage communication with C. A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted float array. \textcolor{blue}{REMOVE as follows''} \begin{cfa} void * bsearch( const void * key, const void * base, size_t nmemb, size_t size, double * val = (double *)bsearch( &key, vals, 10, sizeof(vals[0]), comp ); $\C{// search sorted array}$ \end{cfa} which can be augmented simply with generalized, type-safe, \CFA-overloaded wrappers: This can be augmented simply with generalized, type-safe, \CFA-overloaded wrappers. \begin{cfa} forall( otype T | { int ? y; } $\C{// locally override behaviour}$ int ? 
y; } $\C{// locally override behavior}$ qsort( vals, 10 );                                                      $\C{// descending sort}$ } \end{cfa} The local version of @??@ overriding the built-in @??@. The following shows one example where \CFA \emph{extends} an existing standard C interface to reduce complexity and provide safety. C/\Celeven provide a number of complex and overlapping storage-management operation to support the following capabilities: \begin{description}%[topsep=3pt,itemsep=2pt,parsep=0pt] In addition, there are polymorphic functions, like @min@ and @max@, that work on any type with operator @??@. The following shows one example where \CFA \textcolor{blue}{ADD SPACE} \emph{extends} an existing standard C interface to reduce complexity and provide safety. C/\Celeven provide a number of complex and overlapping storage-management operations to support the following capabilities. \begin{list}{}{\itemsep=0pt\parsep=0pt\labelwidth=0pt\leftmargin\parindent\itemindent-\leftmargin\let\makelabel\descriptionlabel} \item[fill] an allocation with a specified character. \item[resize] an existing allocation to decrease or increase its size. In either case, new storage may or may not be allocated and, if there is a new allocation, as much data from the existing allocation is copied. In either case, new storage may or may not be allocated, and if there is a new allocation, as much data from the existing allocation are copied. For an increase in storage size, new storage after the copied data may be filled. \newpage \item[align] an allocation on a specified memory boundary, \eg, an address multiple of 64 or 128 for cache-line purposes. allocation with a specified number of elements. An array may be filled, resized, or aligned. \end{description} Table~\ref{t:StorageManagementOperations} shows the capabilities provided by C/\Celeven allocation-functions and how all the capabilities can be combined into two \CFA functions. 
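The overlap among the C allocation routines shows up when the four capabilities are combined by hand. The sketch below uses only standard C11 library calls; the helper names @zeroed_array@, @grow_filled@, and @aligned_array@ are hypothetical and do not belong to C or \CFA.

```c
#include <stdlib.h>
#include <string.h>

/* array + fill: calloc can only fill with zero. */
int *zeroed_array(size_t dim) {
    return calloc(dim, sizeof(int));
}

/* resize + fill: realloc copies the old data, but any new storage after
   the copied data must be filled by hand. */
int *grow_filled(int *old, size_t old_dim, size_t new_dim, unsigned char fill) {
    int *p = realloc(old, new_dim * sizeof(int));
    if (p != NULL && new_dim > old_dim)
        memset(p + old_dim, fill, (new_dim - old_dim) * sizeof(int));
    return p;
}

/* align + array: C11 aligned_alloc; the byte count is rounded up to a
   multiple of the alignment, as C11 requires. */
int *aligned_array(size_t dim, size_t align) {
    size_t bytes = dim * sizeof(int);
    bytes = (bytes + align - 1) / align * align;
    return aligned_alloc(align, bytes);
}
```

Each helper must repeat the element size and pick a different library routine, and nothing ties a later @realloc@ to the fill or alignment requested at allocation time.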
\CFA storage-management functions extend the C equivalents by overloading, providing shallow type-safety, and removing the need to specify the base allocation-size. Figure~\ref{f:StorageAllocation} contrasts \CFA and C storage-allocation performing the same operations with the same type safety. \end{list} Table~\ref{t:StorageManagementOperations} shows the capabilities provided by C/\Celeven allocation functions and how all the capabilities can be combined into two \CFA functions. \CFA storage-management functions extend the C equivalents by overloading, providing shallow type safety, and removing the need to specify the base allocation size. Figure~\ref{f:StorageAllocation} contrasts \CFA and C storage allocation performing the same operations with the same type safety. \begin{table} \caption{Storage-Management Operations} \caption{Storage-management operations} \label{t:StorageManagementOperations} \centering \lstDeleteShortInline@% \lstMakeShortInline~% \begin{tabular}{@{}r|r|l|l|l|l@{}} \multicolumn{1}{c}{}&           & \multicolumn{1}{c|}{fill}     & resize        & align & array \\ \hline \begin{tabular}{@{}rrllll@{}} \multicolumn{1}{c}{}&           & \multicolumn{1}{c}{fill}      & resize        & align & array \\ C               & ~malloc~                      & no                    & no            & no            & no    \\ & ~calloc~                      & yes (0 only)  & no            & no            & yes   \\ & ~memalign~            & no                    & no            & yes           & no    \\ & ~posix_memalign~      & no                    & no            & yes           & no    \\ \hline C11             & ~aligned_alloc~       & no                    & no            & yes           & no    \\ \hline \CFA    & ~alloc~                       & yes/copy              & no/yes        & no            & yes   \\ & ~align_alloc~         & yes                   & no            & yes           & yes   \\ \begin{figure} \centering 
\fontsize{9bp}{11bp}\selectfont \begin{cfa}[aboveskip=0pt,xleftmargin=0pt] size_t  dim = 10;                                                       $\C{// array dimension}$ \end{tabular} \lstMakeShortInline@% \caption{\CFA versus C Storage-Allocation} \caption{\CFA versus C storage allocation} \label{f:StorageAllocation} \end{figure} Variadic @new@ (see Section~\ref{sec:variadic-tuples}) cannot support the same overloading because extra parameters are for initialization. Hence, there are @new@ and @anew@ functions for single and array variables, and the fill value is the arguments to the constructor, \eg: Hence, there are @new@ and @anew@ functions for single and array variables, and the fill value is the arguments to the constructor. \begin{cfa} struct S { int i, j; }; S * as = anew( dim, 2, 3 );                                     $\C{// each array element initialized to 2, 3}$ \end{cfa} Note, \CC can only initialize array elements via the default constructor. Finally, the \CFA memory-allocator has \newterm{sticky properties} for dynamic storage: fill and alignment are remembered with an object's storage in the heap. Note that \CC can only initialize array elements via the default constructor. Finally, the \CFA memory allocator has \newterm{sticky properties} for dynamic storage: fill and alignment are remembered with an object's storage in the heap. When a @realloc@ is performed, the sticky properties are respected, so that new storage is correctly aligned and initialized with the fill character. \label{s:IOLibrary} The goal of \CFA I/O is to simplify the common cases, while fully supporting polymorphism and user defined types in a consistent way. The goal of \CFA I/O is to simplify the common cases, while fully supporting polymorphism and user-defined types in a consistent way. The approach combines ideas from \CC and Python. The \CFA header file for the I/O library is @fstream@. 
\lstMakeShortInline@% \end{cquote} The \CFA form has half the characters of the \CC form, and is similar to Python I/O with respect to implicit separators. The \CFA form has half the characters of the \CC form and is similar to Python I/O with respect to implicit separators. Similar simplification occurs for tuple I/O, which prints all tuple values separated by \lstinline[showspaces=true]@, @''. \begin{cfa} \lstMakeShortInline@% \end{cquote} There is a weak similarity between the \CFA logical-or operator and the Shell pipe-operator for moving data, where data flows in the correct direction for input but the opposite direction for output. There is a weak similarity between the \CFA logical-or operator and the Shell pipe operator for moving data, where data flow in the correct direction for input but in the opposite direction for output. \begin{comment} The implicit separator character (space/blank) is a separator not a terminator. \end{itemize} \end{comment} There are functions to set and get the separator string, and manipulators to toggle separation on and off in the middle of output. \subsection{Multi-precision Integers} There are functions to set and get the separator string and manipulators to toggle separation on and off in the middle of output. \subsection{Multiprecision integers} \label{s:MultiPrecisionIntegers} \CFA has an interface to the GMP multi-precision signed-integers~\cite{GMP}, similar to the \CC interface provided by GMP. The \CFA interface wraps GMP functions into operator functions to make programming with multi-precision integers identical to using fixed-sized integers. The \CFA type name for multi-precision signed-integers is @Int@ and the header file is @gmp@. Figure~\ref{f:GMPInterface} shows a multi-precision factorial-program contrasting the GMP interface in \CFA and C. 
\begin{figure} \CFA has an interface to the \textcolor{blue}{Q3 CHANGE GMP multiprecision'' TO GNU multiple precision (GMP)''} signed integers~\cite{GMP}, similar to the \CC interface provided by GMP. The \CFA interface wraps GMP functions into operator functions to make programming with multiprecision integers identical to using fixed-sized integers. The \CFA type name for multiprecision signed integers is @Int@ and the header file is @gmp@. Figure~\ref{f:GMPInterface} shows a multiprecision factorial program contrasting the GMP interface in \CFA and C. \begin{figure}[b] \centering \fontsize{9bp}{11bp}\selectfont \lstDeleteShortInline@% \begin{tabular}{@{}l@{\hspace{3\parindentlnth}}l@{}} \end{tabular} \lstMakeShortInline@% \caption{GMP Interface \CFA versus C} \caption{GMP interface \CFA versus C} \label{f:GMPInterface} \end{figure} \vspace{-4pt} \section{Polymorphism Evaluation} \label{sec:eval} % Though \CFA provides significant added functionality over C, these features have a low runtime penalty. % In fact, it is shown that \CFA's generic programming can enable faster runtime execution than idiomatic @void *@-based C code. The experiment is a set of generic-stack micro-benchmarks~\cite{CFAStackEvaluation} in C, \CFA, and \CC (see implementations in Appendix~\ref{sec:BenchmarkStackImplementations}). The experiment is a set of generic-stack microbenchmarks~\cite{CFAStackEvaluation} in C, \CFA, and \CC (see implementations in Appendix~\ref{sec:BenchmarkStackImplementations}). Since all these languages share a subset essentially comprising standard C, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code. A more illustrative comparison measures the costs of idiomatic usage of each language's features. Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked-list. 
Figure~\ref{fig:BenchmarkTest} shows the \CFA benchmark tests for a generic stack based on a singly linked list. The benchmark test is similar for the other languages. The experiment uses element types @int@ and @pair(short, char)@, and pushes $N=40M$ elements on a generic stack, copies the stack, clears one of the stacks, and finds the maximum value in the other stack. \begin{figure} \fontsize{9bp}{11bp}\selectfont \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt] int main() { } \end{cfa} \caption{\protect\CFA Benchmark Test} \caption{\protect\CFA benchmark test} \label{fig:BenchmarkTest} \vspace*{-10pt} \end{figure} The structure of each benchmark implemented is: C with @void *@-based polymorphism, \CFA with parametric polymorphism, \CC with templates, and \CC using only class inheritance for polymorphism, called \CCV. The structure of each benchmark implemented is C with @void *@-based polymorphism, \CFA with parametric polymorphism, \CC with templates, and \CC using only class inheritance for polymorphism, called \CCV. The \CCV variant illustrates an alternative object-oriented idiom where all objects inherit from a base @object@ class, mimicking a Java-like interface; hence runtime checks are necessary to safely down-cast objects. The most notable difference among the implementations is in memory layout of generic types: \CFA and \CC inline the stack and pair elements into corresponding list and pair nodes, while C and \CCV lack such a capability and instead must store generic objects via pointers to separately-allocated objects. Note, the C benchmark uses unchecked casts as C has no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically. hence, runtime checks are necessary to safely downcast objects. 
The most notable difference among the implementations is in memory layout of generic types: \CFA and \CC inline the stack and pair elements into corresponding list and pair nodes, whereas C and \CCV lack such capability and, instead, must store generic objects via pointers to separately allocated objects. Note that the C benchmark uses unchecked casts as C has no runtime mechanism to perform such checks, whereas \CFA and \CC provide type safety statically. Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents. The graph plots the median of 5 consecutive runs of each program, with an initial warm-up run omitted. All code is compiled at \texttt{-O2} by gcc or g++ 6.4.0, with all \CC code compiled as \CCfourteen. The benchmarks are run on an Ubuntu 16.04 workstation with 16 GB of RAM and a 6-core AMD FX-6300 CPU with 3.5 GHz maximum clock frequency. The graph plots the median of five consecutive runs of each program, with an initial warm-up run omitted. All code is compiled at \texttt{-O2} by gcc or g++ 6.4.0, with all \CC code compiled as \CCfourteen. \textcolor{blue}{CHANGE ``\CC{}fourteen'' TO ``\CCfourteen''} The benchmarks are run on an Ubuntu 16.04 workstation with 16 GB of RAM and a 6-core AMD FX-6300 CPU with 3.5 GHz \textcolor{blue}{REMOVE ``of''} maximum clock frequency.
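The layout difference described above (elements inlined into nodes versus nodes holding pointers to separately allocated elements) can be sketched in C++ as follows. This is a minimal illustration and not the benchmark code from the appendix; the names @InlineNode@, @BoxedNode@, @push_inline@, @push_boxed_int@, @drain_inline@, and @drain_boxed_int@ are hypothetical, and the allocation counter exists only to make the extra allocation per element visible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node types for illustration; not the paper's benchmark code.

// Inlined layout: the element is stored directly inside the list node.
template <typename T>
struct InlineNode {
    T value;
    InlineNode* next;
};

// Pointer-based layout: the node stores only a pointer to a separately
// allocated element, as a void*-style generic container must.
struct BoxedNode {
    void* value;
    BoxedNode* next;
};

// Push onto an inline stack: one heap allocation per element.
template <typename T>
InlineNode<T>* push_inline(InlineNode<T>* head, const T& v, std::size_t& allocs) {
    ++allocs;
    return new InlineNode<T>{v, head};
}

// Push onto a boxed stack: two heap allocations per element
// (one for the element, one for the node).
BoxedNode* push_boxed_int(BoxedNode* head, int v, std::size_t& allocs) {
    allocs += 2;
    return new BoxedNode{new int(v), head};
}

// Sum and free an inline stack of ints.
long drain_inline(InlineNode<int>* head) {
    long sum = 0;
    while (head) {
        InlineNode<int>* next = head->next;
        sum += head->value;
        delete head;
        head = next;
    }
    return sum;
}

// Sum and free a boxed stack of ints; each read needs an extra pointer
// indirection and an unchecked cast from void*.
long drain_boxed_int(BoxedNode* head) {
    long sum = 0;
    while (head) {
        BoxedNode* next = head->next;
        assert(head->value != nullptr);
        int* elem = static_cast<int*>(head->value);
        sum += *elem;
        delete elem;
        delete head;
        head = next;
    }
    return sum;
}
```

Under these assumptions, the boxed stack performs twice as many allocations for the same number of elements and follows an extra pointer on every access. This is consistent with, though not a measurement of, the memory and timing differences reported for the C and \CCV variants.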
\begin{figure} \centering \input{timing} \caption{Benchmark Timing Results (smaller is better)} \resizebox{0.7\textwidth}{!}{\input{timing}} \caption{Benchmark timing results (smaller is better)} \label{fig:eval} \vspace*{-10pt} \end{figure} \begin{table} \vspace*{-10pt} \caption{Properties of benchmark code} \label{tab:eval} \centering \vspace*{-4pt} \newcommand{\CT}[1]{\multicolumn{1}{c}{#1}} \begin{tabular}{rrrrr} & \CT{C}        & \CT{\CFA}     & \CT{\CC}      & \CT{\CCV}             \\ \hline maximum memory usage (MB)                       & 10,001        & 2,502         & 2,503         & 11,253                \\ \begin{tabular}{lrrrr} & \CT{C}        & \CT{\CFA}     & \CT{\CC}      & \CT{\CCV}             \\ maximum memory usage (MB)                       & 10\,001       & 2\,502        & 2\,503        & 11\,253               \\ source code size (lines)                        & 201           & 191           & 125           & 294                   \\ redundant type annotations (lines)      & 27            & 0                     & 2                     & 16                    \\ binary size (KB)                                        & 14            & 257           & 14            & 37                    \\ \end{tabular} \vspace*{-16pt} \end{table} The C and \CCV variants are generally the slowest with the largest memory footprint, because of their less-efficient memory layout and the pointer-indirection necessary to implement generic types; The C and \CCV variants are generally the slowest with the largest memory footprint, due to their less-efficient memory layout and the pointer indirection necessary to implement generic types; this inefficiency is exacerbated by the second level of generic types in the pair benchmarks. 
By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair because of equivalent storage layout, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead. \CCV is slower than C largely due to the cost of runtime type-checking of down-casts (implemented with @dynamic_cast@); By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair because of the equivalent storage layout, with the inlined libraries (\ie no separate compilation) and greater maturity of the \CC compiler contributing to its lead. \CCV is slower than C largely due to the cost of runtime type checking of downcasts (implemented with @dynamic_cast@). The outlier for \CFA, pop @pair@, results from the complexity of the generated-C polymorphic code. The gcc compiler is unable to optimize some dead code and condense nested calls; Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries. \CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 39 and 42 lines, respectively. \CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. 
The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, although it should be noted that \CFA and \CC have prewritten data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 39 and 42 lines, respectively. The difference between the \CFA and \CC line counts is primarily declaration duplication to implement separate compilation; a header-only \CFA library would be similar in length to the \CC version. On the other hand, C does not have a generic collections-library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers. \CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @char@, @short@, and @int@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library; On the other hand, C does not have a generic collections library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers. \CCV does not use the \CC standard template library by construction and, in fact, includes the definition of @object@ and wrapper classes for @char@, @short@, and @int@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library; with their omission, the \CCV line count is similar to C. We justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose. 
Line-count is a fairly rough measure of code complexity; another important factor is how much type information the programmer must specify manually, especially where that information is not compiler-checked. Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointer arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts, \eg @object@ to @integer@ when popping a stack. Line count is a fairly rough measure of code complexity; another important factor is how much type information the programmer must specify manually, especially where that information is not compiler checked. Such unchecked type information produces a heavier documentation burden and increased potential for runtime bugs and is much less common in \CFA than C, with its manually specified function pointer arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts, \eg @object@ to @integer@ when popping a stack. To quantify this manual typing, the ``redundant type annotations'' line in Table~\ref{tab:eval} counts the number of lines on which the type of a known variable is respecified, either as a format specifier, explicit downcast, type-specific function, or by name in a @sizeof@, struct literal, or @new@ expression. The \CC benchmark uses two redundant type annotations to create new stack nodes, while the C and \CCV benchmarks have several such annotations spread throughout their code. The \CC benchmark uses two redundant type annotations to create new stack nodes, whereas the C and \CCV benchmarks have several such annotations spread throughout their code. The \CFA benchmark is able to eliminate all redundant type annotations through use of the polymorphic @alloc@ function discussed in Section~\ref{sec:libraries}.
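To make the notion of a redundant type annotation concrete, the following C++ sketch contrasts an object-style stack, where the caller must respecify the element type in a downcast when popping, with a template stack, where the element type is carried statically. The types @object@, @integer@, and @character@ echo the names used in the text, but their definitions here, along with @object_stack@, @typed_stack@, and @pop_int@, are assumptions made for illustration and are not taken from the benchmark source.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the base class and wrapper classes.
struct object { virtual ~object() = default; };
struct integer : object { int value; explicit integer(int v) : value(v) {} };
struct character : object { char value; explicit character(char v) : value(v) {} };

// Object-style stack: elements are known only as object.
struct object_stack {
    std::vector<std::unique_ptr<object>> items;
    void push(std::unique_ptr<object> o) { items.push_back(std::move(o)); }
    std::unique_ptr<object> pop() {
        assert(!items.empty());
        std::unique_ptr<object> top = std::move(items.back());
        items.pop_back();
        return top;
    }
};

// The caller restates the element type in the cast (a redundant annotation);
// the check happens at run time and throws std::bad_cast on a mismatch.
int pop_int(object_stack& s) {
    std::unique_ptr<object> top = s.pop();
    return dynamic_cast<integer&>(*top).value;
}

// Template-style stack: the element type is fixed at compile time,
// so pop needs no cast and no runtime check.
template <typename T>
struct typed_stack {
    std::vector<T> items;
    void push(const T& v) { items.push_back(v); }
    T pop() {
        assert(!items.empty());
        T top = items.back();
        items.pop_back();
        return top;
    }
};
```

In this sketch, pushing a @character@ and then calling @pop_int@ compiles without complaint and fails only at run time, whereas the analogous mistake with @typed_stack<int>@ is rejected by the compiler.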
We conjecture these results scale across most generic data-types as the underlying polymorphism implementation is constant. We conjecture that these results scale across most generic data types as the underlying polymorphism implementation is constant. \vspace*{-8pt} \section{Related Work} \label{s:RelatedWork} \CC provides three disjoint polymorphic extensions to C: overloading, inheritance, and templates. The overloading is restricted because resolution does not use the return type, inheritance requires learning object-oriented programming and coping with a restricted nominal-inheritance hierarchy, templates cannot be separately compiled resulting in compilation/code bloat and poor error messages, and determining how these mechanisms interact and which to use is confusing. In contrast, \CFA has a single facility for polymorphic code supporting type-safe separate-compilation of polymorphic functions and generic (opaque) types, which uniformly leverage the C procedural paradigm. In contrast, \CFA has a single facility for polymorphic code supporting type-safe separate compilation of polymorphic functions and generic (opaque) types, which uniformly leverage the C procedural paradigm. The key mechanism to support separate compilation is \CFA's \emph{explicit} use of assumed type properties. Until \CC concepts~\cite{C++Concepts} are standardized (anticipated for \CCtwenty), \CC provides no way to specify the requirements of a generic function beyond compilation errors during template expansion; Until \CC concepts~\cite{C++Concepts} are standardized (anticipated for \CCtwenty), \CC provides no way of specifying the requirements of a generic function beyond compilation errors during template expansion; furthermore, \CC concepts are restricted to template polymorphism. Cyclone~\cite{Grossman06} also provides capabilities for polymorphic functions and existential types, similar to \CFA's @forall@ functions and generic types.
Cyclone existential types can include function pointers in a construct similar to a virtual function-table, but these pointers must be explicitly initialized at some point in the code, a tedious and potentially error-prone process. Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is a tedious and potentially error-prone process. Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as @void *@, \ie only pointer types and @int@. In \CFA terms, all Cyclone polymorphism must be dtype-static. While the Cyclone design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, it is more restrictive than \CFA's general model. Smith and Volpano~\cite{Smith98} present Polymorphic C, an ML dialect with polymorphic functions, C-like syntax, and pointer types; it lacks many of C's features, however, most notably structure types, and so is not a practical C replacement. Smith and Volpano~\cite{Smith98} present Polymorphic C, an ML dialect with polymorphic functions, C-like syntax, and pointer types; it lacks many of C's features, most notably structure types, and hence, is not a practical C replacement. Objective-C~\cite{obj-c-book} is an industrially successful extension to C. However, Objective-C is a radical departure from C, using an object-oriented model with message-passing. However, Objective-C is a radical departure from C, using an object-oriented model with message passing. Objective-C did not support type-checked generics until recently \cite{xcode7}, historically using less-efficient runtime checking of object types. 
The GObject~\cite{GObject} framework also adds object-oriented programming with runtime type-checking and reference-counting garbage-collection to C; these features are more intrusive additions than those provided by \CFA, in addition to the runtime overhead of reference-counting. Vala~\cite{Vala} compiles to GObject-based C, adding the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code-bases. Java~\cite{Java8} included generic types in Java~5, which are type-checked at compilation and type-erased at runtime, similar to \CFA's. However, in Java, each object carries its own table of method pointers, while \CFA passes the method pointers separately to maintain a C-compatible layout. The GObject~\cite{GObject} framework also adds object-oriented programming with runtime type-checking and reference-counting garbage collection to C; these features are more intrusive additions than those provided by \CFA, in addition to the runtime overhead of reference counting. Vala~\cite{Vala} compiles to GObject-based C, adding the burden of learning a separate language syntax to the aforementioned demerits of GObject as a modernization path for existing C code bases. Java~\cite{Java8} included generic types in Java~5, which are type checked at compilation and type erased at runtime, similar to \CFA's. However, in Java, each object carries its own table of method pointers, whereas \CFA passes the method pointers separately to maintain a C-compatible layout. Java is also a garbage-collected, object-oriented language, with the associated resource usage and C-interoperability burdens. D~\cite{D}, Go, and Rust~\cite{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go and \emph{traits} in Rust. 
D~\cite{D}, Go, and Rust~\cite{Rust} are modern compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go, and \emph{traits} in Rust. However, each language represents a significant departure from C in terms of language model, and none has the same level of compatibility with C as \CFA. D and Go are garbage-collected languages, imposing the associated runtime overhead. The necessity of accounting for data transfer between managed runtimes and the unmanaged C runtime complicates foreign-function interfaces to C. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. D restricts garbage collection to its own heap by default, while Rust is not garbage-collected, and thus has a lighter-weight runtime more interoperable with C. D restricts garbage collection to its own heap by default, whereas Rust is not garbage collected and, thus, has a lighter-weight runtime more interoperable with C. Rust also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker provides strong safety guarantees but is complex and difficult to learn and imposes a distinctly idiomatic programming style. On the other hand, Rust's borrow checker provides strong safety guarantees but is complex and difficult to learn and imposes a distinctly idiomatic programming style. \CFA, with its more modest safety features, allows direct ports of C code while maintaining the idiomatic style of the original source. \subsection{Tuples/Variadics} \vspace*{-18pt} \subsection{Tuples/variadics} \vspace*{-5pt} Many programming languages have some form of tuple construct and/or variadic functions, \eg SETL, C, KW-C, \CC, D, Go, Java, ML, and Scala. SETL~\cite{SETL} is a high-level mathematical programming language, with tuples being one of the primary data types. 
Tuples in SETL allow subscripting, dynamic expansion, and multiple assignment. C provides variadic functions through @va_list@ objects, but the programmer is responsible for managing the number of arguments and their types, so the mechanism is type unsafe. C provides variadic functions through @va_list@ objects, but the programmer is responsible for managing the number of arguments and their types; thus, the mechanism is type unsafe. KW-C~\cite{Buhr94a}, a predecessor of \CFA, introduced tuples to C as an extension of the C syntax, taking much of its inspiration from SETL. The main contributions of that work were adding MRVF, tuple mass and multiple assignment, and record-member access. \CCeleven introduced @std::tuple@ as a library variadic template structure. \CCeleven introduced @std::tuple@ as a library variadic-template structure. Tuples are a generalization of @std::pair@, in that they allow for arbitrary length, fixed-size aggregation of heterogeneous values. Operations include @std::get@ to extract values, @std::tie@ to create a tuple of references used for assignment, and lexicographic comparisons. \CCseventeen proposes \emph{structured bindings}~\cite{Sutter15} to eliminate pre-declaring variables and use of @std::tie@ for binding the results. This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism. \CCseventeen \textcolor{blue}{CHANGE \CC{}seventeen TO \CCseventeen''} proposes \emph{structured bindings}~\cite{Sutter15} to eliminate predeclaring variables and the use of @std::tie@ for binding the results. This extension requires the use of @auto@ to infer the types of the new variables; hence, complicated expressions with a nonobvious type must be documented with some other mechanism. Furthermore, structured bindings are not a full replacement for @std::tie@, as it always declares new variables. 
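The contrast between @std::tie@ and structured bindings drawn above can be illustrated with a short C++17 sketch. The function @lookup@ and the helpers @use_tie@ and @use_bindings@ are hypothetical names introduced only for this example; the point is that @std::tie@ assigns into variables that already exist (and can therefore reuse them), whereas a structured binding always declares new variables whose types are inferred with @auto@.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical function returning multiple values as a tuple.
std::tuple<int, std::string> lookup(int key) {
    return std::make_tuple(key * 2, std::string("id") + std::to_string(key));
}

// std::tie: variables are predeclared with explicit types, and the same
// variables can be assigned again by a later call.
int use_tie() {
    int count = 0;
    std::string name;
    std::tie(count, name) = lookup(3);         // count = 6, name = "id3"
    std::tie(count, std::ignore) = lookup(5);  // reuses count; name is untouched
    assert(count == 10);
    return count + static_cast<int>(name.size());
}

// Structured bindings: no predeclaration, but each binding introduces new
// names, so a second call cannot rebind count and name.
int use_bindings() {
    auto [count, name] = lookup(3);
    auto [count2, name2] = lookup(5);
    (void)count;
    (void)name2;
    return count2 + static_cast<int>(name.size());
}
```

Both helpers compute the same value, but only the @std::tie@ version states the variable types in the source, which is the documentation concern raised in the text for expressions with nonobvious types.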
Like \CC, D provides tuples through a library variadic-template structure. Go does not have tuples but supports MRVF. Java's variadic functions appear similar to C's but are type-safe using homogeneous arrays, which are less useful than \CFA's heterogeneously-typed variadic functions. Java's variadic functions appear similar to C's but are type safe using homogeneous arrays, which are less useful than \CFA's heterogeneously typed variadic functions. Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML~\cite{sml}, Haskell, and Scala~\cite{Scala}, which decompose tuples using pattern matching. \vspace*{-18pt} \subsection{C Extensions} \CC is the best known C-based language, and is similar to \CFA in that both are extensions to C with source and runtime backwards compatibility. Specific difference between \CFA and \CC have been identified in prior sections, with a final observation that \CFA has equal or fewer tokens to express the same notion in many cases. \vspace*{-5pt} \CC is the best known C-based language and is similar to \CFA in that both are extensions to C with source and runtime backward compatibility. Specific differences between \CFA and \CC have been identified in prior sections, with a final observation that \CFA has equal or fewer tokens to express the same notion in many cases. The key difference in design philosophies is that \CFA is easier for C programmers to understand by maintaining a procedural paradigm and avoiding complex interactions among extensions. \CC, on the other hand, has multiple overlapping features (such as the three forms of polymorphism), many of which have complex interactions with its object-oriented design. As a result, \CC has a steep learning curve for even experienced C programmers, especially when attempting to maintain performance equivalent to C legacy-code. There are several other C extension-languages with less usage and even more dramatic changes than \CC. 
Objective-C and Cyclone are two other extensions to C with different design goals than \CFA, as discussed above. As a result, \CC has a steep learning curve for even experienced C programmers, especially when attempting to maintain performance equivalent to C legacy code. There are several other C extension languages with less usage and even more dramatic changes than \CC. \mbox{Objective-C} and Cyclone are two other extensions to C with different design goals than \CFA, as discussed above. Other languages extend C with more focused features. $\mu$\CC~\cite{uC++book}, CUDA~\cite{Nickolls08}, ispc~\cite{Pharr12}, and Sierra~\cite{Leissa14} add concurrent or data-parallel primitives to C or \CC; data-parallel features have not yet been added to \CFA, but are easily incorporated within its design, while concurrency primitives similar to those in $\mu$\CC have already been added~\cite{Delisle18}. Finally, CCured~\cite{Necula02} and Ironclad \CC~\cite{DeLozier13} attempt to provide a more memory-safe C by annotating pointer types with garbage collection information; type-checked polymorphism in \CFA covers several of C's memory-safety issues, but more aggressive approaches such as annotating all pointer types with their nullability or requiring runtime garbage collection are contradictory to \CFA's backwards compatibility goals. data-parallel features have not yet been added to \CFA, but are easily incorporated within its design, whereas concurrency primitives similar to those in $\mu$\CC have already been added~\cite{Delisle18}. Finally, CCured~\cite{Necula02} and Ironclad \CC~\cite{DeLozier13} attempt to provide a more memory-safe C by annotating pointer types with garbage collection information; type-checked polymorphism in \CFA covers several of C's memory-safety issues, but more aggressive approaches such as annotating all pointer types with their nullability or requiring runtime garbage collection are contradictory to \CFA's backward compatibility goals. 
\section{Conclusion and Future Work} The goal of \CFA is to provide an evolutionary pathway for large C development-environments to be more productive and safer, while respecting the talent and skill of C programmers. While other programming languages purport to be a better C, they are in fact new and interesting languages in their own right, but not C extensions. The purpose of this paper is to introduce \CFA, and showcase language features that illustrate the \CFA type-system and approaches taken to achieve the goal of evolutionary C extension. The contributions are a powerful type-system using parametric polymorphism and overloading, generic types, tuples, advanced control structures, and extended declarations, which all have complex interactions. The goal of \CFA is to provide an evolutionary pathway for large C development environments to be more productive and safer, while respecting the talent and skill of C programmers. While other programming languages purport to be a better C, they are, in fact, new and interesting languages in their own right, but not C extensions. The purpose of this paper is to introduce \CFA, and showcase language features that illustrate the \CFA type system and approaches taken to achieve the goal of evolutionary C extension. The contributions are a powerful type system using parametric polymorphism and overloading, generic types, tuples, advanced control structures, and extended declarations, which all have complex interactions. The work is a challenging design, engineering, and implementation exercise. On the surface, the project may appear as a rehash of similar mechanisms in \CC. However, every \CFA feature is different than its \CC counterpart, often with extended functionality, better integration with C and its programmers, and always supporting separate compilation. All of these new features are being used by the \CFA development-team to build the \CFA runtime-system. 
All of these new features are being used by the \CFA development team to build the \CFA runtime system. Finally, we demonstrate that \CFA performance for some idiomatic cases is better than C and close to \CC, showing the design is practically applicable. While all examples in the paper compile and run, there are ongoing efforts to reduce compilation time, provide better debugging, and add more libraries; when this work is complete in early 2019, a public beta release will be available at \url{https://github.com/cforall/cforall}. There is also new work on a number of \CFA features, including arrays with size, runtime type-information, virtual functions, user-defined conversions, and modules. While \CFA polymorphic functions use dynamic virtual-dispatch with low runtime overhead (see Section~\ref{sec:eval}), it is not as low as \CC template-inlining. Hence it may be beneficial to provide a mechanism for performance-sensitive code. Two promising approaches are an @inline@ annotation at polymorphic function call sites to create a template-specialization of the function (provided the code is visible) or placing an @inline@ annotation on polymorphic function-definitions to instantiate a specialized version for some set of types (\CC template specialization). These approaches are not mutually exclusive and allow performance optimizations to be applied only when necessary, without suffering global code-bloat. In general, we believe separate compilation, producing smaller code, works well with loaded hardware-caches, which may offset the benefit of larger inlined-code. There is also new work on a number of \CFA features, including arrays with size, runtime type information, virtual functions, user-defined conversions, and modules. While \CFA polymorphic functions use dynamic virtual dispatch with low runtime overhead (see Section~\ref{sec:eval}), it is not as low as \CC template inlining. Hence, it may be beneficial to provide a mechanism for performance-sensitive code. 
Two promising approaches are an @inline@ annotation at polymorphic function call sites to create a template specialization of the function (provided the code is visible) or placing an @inline@ annotation on polymorphic function definitions to instantiate a specialized version for some set of types (\CC template specialization). These approaches are not mutually exclusive and allow performance optimizations to be applied only when necessary, without suffering global code bloat. In general, we believe separate compilation, producing smaller code, works well with loaded hardware caches, which may offset the benefit of larger inlined code. \section{Acknowledgments} The authors would like to recognize the design assistance of Glen Ditchfield, Richard Bilson, Thierry Delisle, Andrew Beach and Brice Dobry on the features described in this paper, and thank Magnus Madsen for feedback on the writing. Funding for this project has been provided by Huawei Ltd.\ (\url{http://www.huawei.com}), and Aaron Moss and Peter Buhr are partially funded by the Natural Sciences and Engineering Research Council of Canada. The authors would like to recognize the design assistance of Glen Ditchfield, Richard Bilson, Thierry Delisle, Andrew Beach, and Brice Dobry on the features described in this paper and thank Magnus Madsen for feedback on the writing. Funding for this project was provided by Huawei Ltd (\url{http://www.huawei.com}), and Aaron Moss and Peter Buhr were partially funded by the Natural Sciences and Engineering Research Council of Canada. {% \enlargethispage{1000pt} \subsection{\CFA} \label{s:CforallStack} \newpage \subsection{\CC}