- Timestamp:
- Feb 19, 2018, 10:44:05 AM
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
- Children:
- b060aba
- Parents:
- 1280f95
- File:
- 1 edited
doc/papers/general/Paper.tex
r1280f95 r9fd06ae 158 158 159 159 160 \title{ Generic and Tuple Types with Efficient Dynamic Layout in \protect\CFA}160 \title{\protect\CFA : Adding Modern Programming Language Features to C} 161 161 162 162 \author{Aaron Moss, Robert Schluntz, Peter Buhr} … … 184 184 The C programming language is a foundational technology for modern computing with millions of lines of code implementing everything from commercial operating-systems to hobby projects. 185 185 This installation base and the programmers producing it represent a massive software-engineering investment spanning decades and likely to continue for decades more. 186 Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. 186 Nevertheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. 187 187 188 The goal of the \CFA project is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C and its programmers. 188 189 Prior projects have attempted similar goals but failed to honour C programming-style; for instance, adding object-oriented or functional programming with garbage collection is a non-starter for many C developers. 189 Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and engineers. 190 This paper describes two \CFA extensions, generic and tuple types, details how their design avoids shortcomings of similar features in C and other C-like languages, and presents experimental results validating the design. 190 Specifically, \CFA is designed to have an orthogonal feature-set based closely on the C programming paradigm, so that \CFA features can be added \emph{incrementally} to existing C code-bases, and C programmers can learn \CFA extensions on an as-needed basis, preserving investment in existing code and programmers. 191 This paper presents a quick tour of \CFA features showing how their design avoids shortcomings of similar features in C and other C-like languages. 192 Finally, experimental results are presented to validate several of the new features. 191 193 \end{abstract} 192 194 … … 222 224 \CC is used similarly, but has the disadvantages of multiple legacy design-choices that cannot be updated and active divergence of the language model from C, requiring significant effort and training to incrementally add \CC to a C-based project. 223 225 224 \CFA is currently implemented as a source-to-source translator from \CFA to the GCC-dialect of C~\cite{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)--(3).226 \CFA is currently implemented as a source-to-source translator from \CFA to the gcc-dialect of C~\cite{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by gcc, meeting goals (1)--(3). 225 227 Ultimately, a compiler is necessary for advanced features and optimal performance. 
226 228 … … 268 270 269 271 The signature feature of \CFA is parametric-polymorphic functions~\cite{forceone:impl,Cormack90,Duggan96} with functions generalized using a @forall@ clause (giving the language its name): 270 \begin{ lstlisting}272 \begin{cfa} 271 273 `forall( otype T )` T identity( T val ) { return val; } 272 274 int forty_two = identity( 42 ); $\C{// T is bound to int, forty\_two == 42}$ 273 \end{ lstlisting}275 \end{cfa} 274 276 The @identity@ function above can be applied to any complete \newterm{object type} (or @otype@). 275 277 The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type. … … 283 285 Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \newterm{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable. 284 286 For example, the function @twice@ can be defined using the \CFA syntax for operator overloading: 285 \begin{ lstlisting}287 \begin{cfa} 286 288 forall( otype T `| { T ?+?(T, T); }` ) T twice( T x ) { return x + x; } $\C{// ? denotes operands}$ 287 289 int val = twice( twice( 3.7 ) ); 288 \end{ lstlisting}290 \end{cfa} 289 291 which works for any type @T@ with a matching addition operator. 290 292 The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. … … 296 298 Like \CC, \CFA inherits a massive compatible library-base, where other programming languages must rewrite or provide fragile inter-language communication with C. 297 299 A simple example is leveraging the existing type-unsafe (@void *@) C @bsearch@ to binary search a sorted float array: 298 \begin{ lstlisting}300 \begin{cfa} 299 301 void * bsearch( const void * key, const void * base, size_t nmemb, size_t size, 300 302 int (* compar)( const void *, const void * )); … … 305 307 double key = 5.0, vals[10] = { /* 10 sorted float values */ }; 306 308 double * val = (double *)bsearch( &key, vals, 10, sizeof(vals[0]), comp ); $\C{// search sorted array}$ 307 \end{ lstlisting}309 \end{cfa} 308 310 which can be augmented simply with a generalized, type-safe, \CFA-overloaded wrappers: 309 \begin{ lstlisting}311 \begin{cfa} 310 312 forall( otype T | { int ?<?( T, T ); } ) T * bsearch( T key, const T * arr, size_t size ) { 311 313 int comp( const void * t1, const void * t2 ) { /* as above with double changed to T */ } … … 318 320 double * val = bsearch( 5.0, vals, 10 ); $\C{// selection based on return type}$ 319 321 int posn = bsearch( 5.0, vals, 10 ); 320 \end{ lstlisting}322 \end{cfa} 321 323 The nested function @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result. 322 324 Providing a hidden @comp@ function in \CC is awkward as lambdas do not use C calling-conventions and template declarations cannot appear at block scope. … … 326 328 \CFA has replacement libraries condensing hundreds of existing C functions into tens of \CFA overloaded functions, all without rewriting the actual computations (see Section~\ref{sec:libraries}). 
327 329 For example, it is possible to write a type-safe \CFA wrapper @malloc@ based on the C @malloc@: 328 \begin{ lstlisting}330 \begin{cfa} 329 331 forall( dtype T | sized(T) ) T * malloc( void ) { return (T *)malloc( sizeof(T) ); } 330 332 int * ip = malloc(); $\C{// select type and size from left-hand side}$ 331 333 double * dp = malloc(); 332 334 struct S {...} * sp = malloc(); 333 \end{ lstlisting}335 \end{cfa} 334 336 where the return type supplies the type/size of the allocation, which is impossible in most type systems. 335 337 … … 337 339 For example, the \CFA @qsort@ only sorts in ascending order using @<@. 338 340 However, it is trivial to locally change this behaviour: 339 \begin{ lstlisting}341 \begin{cfa} 340 342 forall( otype T | { int ?<?( T, T ); } ) void qsort( const T * arr, size_t size ) { /* use C qsort */ } 341 343 { int ?<?( double x, double y ) { return x `>` y; } $\C{// locally override behaviour}$ 342 344 qsort( vals, size ); $\C{// descending sort}$ 343 345 } 344 \end{ lstlisting}346 \end{cfa} 345 347 Within the block, the nested version of @?<?@ performs @?>?@ and this local version overrides the built-in @?<?@ so it is passed to @qsort@. 346 348 Hence, programmers can easily form local environments, adding and modifying appropriate functions, to maximize reuse of other existing functions and types. 347 349 348 %% Redundant with Section~\ref{sec:libraries} %%349 350 % Finally, \CFA allows variable overloading:351 % \begin{lstlisting}352 % short int MAX = ...; int MAX = ...; double MAX = ...;353 % short int s = MAX; int i = MAX; double d = MAX; $\C{// select correct MAX}$354 % \end{lstlisting}355 % Here, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@.356 350 357 351 \subsection{Traits} 358 352 359 353 \CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration: 360 \begin{ lstlisting}354 \begin{cfa} 361 355 trait `summable`( otype T ) { 362 356 void ?{}( T *, zero_t ); $\C{// constructor from 0 literal}$ … … 370 364 for ( unsigned int i = 0; i < size; i += 1 ) total `+=` a[i]; $\C{// select appropriate +}$ 371 365 return total; } 372 \end{ lstlisting}366 \end{cfa} 373 367 374 368 In fact, the set of @summable@ trait operators is incomplete, as it is missing assignment for type @T@, but @otype@ is syntactic sugar for the following implicit trait: 375 \begin{ lstlisting}369 \begin{cfa} 376 370 trait otype( dtype T | sized(T) ) { // sized is a pseudo-trait for types with known size and alignment 377 371 void ?{}( T * ); $\C{// default constructor}$ … … 379 373 void ?=?( T *, T ); $\C{// assignment operator}$ 380 374 void ^?{}( T * ); }; $\C{// destructor}$ 381 \end{ lstlisting}375 \end{cfa} 382 376 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted. 
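As a minimal sketch, an illustrative polymorphic @swap@ needs only these implicitly provided @otype@ operations, with no further assertions:
\begin{cfa}
forall( otype T ) void swap( T * a, T * b ) {
	T tmp = *a;		// copy constructor from the implicit otype trait
	*a = *b;		// assignment operator from the implicit otype trait
	*b = tmp;		// assignment; tmp is implicitly destructed at block exit
}
int x = 1, y = 2;
swap( &x, &y );		// T bound to int
\end{cfa}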
383 377 … … 392 386 393 387 % Nominal inheritance can be simulated with traits using marker variables or functions: 394 % \begin{ lstlisting}388 % \begin{cfa} 395 389 % trait nominal(otype T) { 396 390 % T is_nominal; 397 391 % }; 398 392 % int is_nominal; $\C{// int now satisfies the nominal trait}$ 399 % \end{ lstlisting}393 % \end{cfa} 400 394 % 401 395 % Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems: 402 % \begin{ lstlisting}396 % \begin{cfa} 403 397 % trait pointer_like(otype Ptr, otype El) { 404 398 % lvalue El *?(Ptr); $\C{// Ptr can be dereferenced into a modifiable value of type El}$ … … 411 405 % 412 406 % lvalue int *?( list_iterator it ) { return it->value; } 413 % \end{ lstlisting}407 % \end{cfa} 414 408 % In the example above, @(list_iterator, int)@ satisfies @pointer_like@ by the user-defined dereference function, and @(list_iterator, list)@ also satisfies @pointer_like@ by the built-in dereference operator for pointers. Given a declaration @list_iterator it@, @*it@ can be either an @int@ or a @list@, with the meaning disambiguated by context (\eg @int x = *it;@ interprets @*it@ as an @int@, while @(*it).value = 42;@ interprets @*it@ as a @list@). 415 409 % While a nominal-inheritance system with associated types could model one of those two relationships by making @El@ an associated type of @Ptr@ in the @pointer_like@ implementation, few such systems could model both relationships simultaneously. … … 432 426 433 427 A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name: 434 \begin{ lstlisting}428 \begin{cfa} 435 429 forall( otype R, otype S ) struct pair { 436 430 R first; … … 447 441 pair( double *, double * ) r = { &d, &d }; 448 442 d = value_p( r ); 449 \end{ lstlisting}443 \end{cfa} 450 444 451 445 \CFA classifies generic types as either \newterm{concrete} or \newterm{dynamic}. … … 456 450 \CFA generic types also allow checked argument-constraints. 457 451 For example, the following declaration of a sorted set-type ensures the set key supports equality and relational comparison: 458 \begin{ lstlisting}452 \begin{cfa} 459 453 forall( otype Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); } ) struct sorted_set; 460 \end{ lstlisting}454 \end{cfa} 461 455 462 456 … … 467 461 A function declaration that accepts or returns a concrete generic-type produces a declaration for the instantiated structure in the same scope, which all callers may reuse. 468 462 For example, the concrete instantiation for @pair( const char *, int )@ is: 469 \begin{ lstlisting}463 \begin{cfa} 470 464 struct _pair_conc1 { 471 465 const char * first; 472 466 int second; 473 467 }; 474 \end{ lstlisting}468 \end{cfa} 475 469 476 470 A concrete generic-type with dtype-static parameters is also expanded to a structure type, but this type is used for all matching instantiations. 
477 471 In the above example, the @pair( F *, T * )@ parameter to @value_p@ is such a type; its expansion is below and it is used as the type of the variables @q@ and @r@ as well, with casts for member access where appropriate: 478 \begin{ lstlisting}472 \begin{cfa} 479 473 struct _pair_conc0 { 480 474 void * first; 481 475 void * second; 482 476 }; 483 \end{ lstlisting}477 \end{cfa} 484 478 485 479 … … 518 512 The reuse of dtype-static structure instantiations enables useful programming patterns at zero runtime cost. 519 513 The most important such pattern is using @forall(dtype T) T *@ as a type-checked replacement for @void *@, \eg creating a lexicographic comparison for pairs of pointers used by @bsearch@ or @qsort@: 520 \begin{ lstlisting}514 \begin{cfa} 521 515 forall(dtype T) int lexcmp( pair( T *, T * ) * a, pair( T *, T * ) * b, int (* cmp)( T *, T * ) ) { 522 516 return cmp( a->first, b->first ) ? : cmp( a->second, b->second ); 523 517 } 524 \end{ lstlisting}518 \end{cfa} 525 519 Since @pair(T *, T * )@ is a concrete type, there are no implicit parameters passed to @lexcmp@, so the generated code is identical to a function written in standard C using @void *@, yet the \CFA version is type-checked to ensure the fields of both pairs and the arguments to the comparison function match in type. 526 520 527 521 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \newterm{tag-structures}. 528 522 Sometimes information is only used for type-checking and can be omitted at runtime, \eg: 529 \begin{ lstlisting}523 \begin{cfa} 530 524 forall(dtype Unit) struct scalar { unsigned long value; }; 531 525 struct metres {}; … … 540 534 scalar(litres) two_pools = swimming_pool + swimming_pool; 541 535 marathon + swimming_pool; $\C{// compilation ERROR}$ 542 \end{ lstlisting}536 \end{cfa} 543 537 @scalar@ is a dtype-static type, so all uses have a single structure definition, containing @unsigned long@, and can share the same implementations of common functions like @?+?@. 544 538 These implementations may even be separately compiled, unlike \CC template functions. … … 552 546 however, many operations have multiple outcomes, some exceptional. 553 547 Consider C's @div@ and @remquo@ functions, which return the quotient and remainder for a division of integer and float values, respectively. 554 \begin{ lstlisting}548 \begin{cfa} 555 549 typedef struct { int quo, rem; } div_t; $\C{// from include stdlib.h}$ 556 550 div_t div( int num, int den ); … … 559 553 int q; 560 554 double r = remquo( 13.5, 5.2, &q ); $\C{// return remainder, alias quotient}$ 561 \end{ lstlisting}555 \end{cfa} 562 556 @div@ aggregates the quotient/remainder in a structure, while @remquo@ aliases a parameter to an argument. 563 557 Both approaches are awkward. 564 558 Alternatively, a programming language can directly support returning multiple values, \eg in \CFA: 565 \begin{ lstlisting}559 \begin{cfa} 566 560 [ int, int ] div( int num, int den ); $\C{// return two integers}$ 567 561 [ double, double ] div( double num, double den ); $\C{// return two doubles}$ … … 570 564 [ q, r ] = div( 13, 5 ); $\C{// select appropriate div and q, r}$ 571 565 [ q, r ] = div( 13.5, 5.2 ); $\C{// assign into tuple}$ 572 \end{ lstlisting}566 \end{cfa} 573 567 Clearly, this approach is straightforward to understand and use; 574 568 therefore, why do few programming languages support this obvious feature or provide it awkwardly? 
… … 584 578 585 579 However, functions also use \newterm{composition} (nested calls), with the direct consequence that MRVFs must also support composition to be orthogonal with single-returning-value functions (SRVF), \eg: 586 \begin{ lstlisting}580 \begin{cfa} 587 581 printf( "%d %d\n", div( 13, 5 ) ); $\C{// return values seperated into arguments}$ 588 \end{ lstlisting}582 \end{cfa} 589 583 Here, the values returned by @div@ are composed with the call to @printf@ by flattening the tuple into separate arguments. 590 584 However, the \CFA type-system must support significantly more complex composition: 591 \begin{ lstlisting}585 \begin{cfa} 592 586 [ int, int ] foo$\(_1\)$( int ); $\C{// overloaded foo functions}$ 593 587 [ double ] foo$\(_2\)$( int ); 594 588 void bar( int, double, double ); 595 589 bar( foo( 3 ), foo( 3 ) ); 596 \end{ lstlisting}590 \end{cfa} 597 591 The type-resolver only has the tuple return-types to resolve the call to @bar@ as the @foo@ parameters are identical, which involves unifying the possible @foo@ functions with @bar@'s parameter list. 598 592 No combination of @foo@s are an exact match with @bar@'s parameters, so the resolver applies C conversions. … … 604 598 An important observation from function composition is that new variable names are not required to initialize parameters from an MRVF. 605 599 \CFA also allows declaration of tuple variables that can be initialized from an MRVF, since it can be awkward to declare multiple variables of different types, \eg: 606 \begin{ lstlisting}600 \begin{cfa} 607 601 [ int, int ] qr = div( 13, 5 ); $\C{// tuple-variable declaration and initialization}$ 608 602 [ double, double ] qr = div( 13.5, 5.2 ); 609 \end{ lstlisting}603 \end{cfa} 610 604 where the tuple variable-name serves the same purpose as the parameter name(s). 611 605 Tuple variables can be composed of any types, except for array types, since array sizes are generally unknown in C. 
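For example (an illustrative sketch), a tuple variable may mix component types and be initialized directly from a tuple expression:
\begin{cfa}
[ int, double ] mixed = [ 42, 3.14 ];		// tuple variable with mixed component types
double sum = mixed.0 + mixed.1;		// access components by position
\end{cfa}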
612 606 613 607 One way to access the tuple-variable components is with assignment or composition: 614 \begin{ lstlisting}608 \begin{cfa} 615 609 [ q, r ] = qr; $\C{// access tuple-variable components}$ 616 610 printf( "%d %d\n", qr ); 617 \end{ lstlisting}611 \end{cfa} 618 612 \CFA also supports \newterm{tuple indexing} to access single components of a tuple expression: 619 \begin{ lstlisting}613 \begin{cfa} 620 614 [int, int] * p = &qr; $\C{// tuple pointer}$ 621 615 int rem = qr`.1`; $\C{// access remainder}$ … … 624 618 bar( qr`.1`, qr ); $\C{// pass remainder and quotient/remainder}$ 625 619 rem = [div( 13, 5 ), 42]`.0.1`; $\C{// access 2nd component of 1st component of tuple expression}$ 626 \end{ lstlisting}620 \end{cfa} 627 621 628 622 … … 635 629 %\par\smallskip 636 630 %\begin{tabular}{@{}l@{\hspace{1.5\parindent}}||@{\hspace{1.5\parindent}}l@{}} 637 \begin{ lstlisting}631 \begin{cfa} 638 632 int f( int, int ); 639 633 int g( [int, int] ); … … 644 638 g( y, 10 ); $\C{// structure}$ 645 639 h( x, y ); $\C{// flatten and structure}$ 646 \end{ lstlisting}647 %\end{ lstlisting}640 \end{cfa} 641 %\end{cfa} 648 642 %& 649 %\begin{ lstlisting}643 %\begin{cfa} 650 644 %\end{tabular} 651 645 %\smallskip\par\noindent … … 664 658 %\par\smallskip 665 659 %\begin{tabular}{@{}l@{\hspace{1.5\parindent}}||@{\hspace{1.5\parindent}}l@{}} 666 \begin{ lstlisting}660 \begin{cfa} 667 661 int x = 10; 668 662 double y = 3.5; … … 672 666 z = 10; $\C{// mass assignment}$ 673 667 [y, x] = 3.14; $\C{// mass assignment}$ 674 \end{ lstlisting}675 %\end{ lstlisting}668 \end{cfa} 669 %\end{cfa} 676 670 %& 677 %\begin{ lstlisting}671 %\begin{cfa} 678 672 %\end{tabular} 679 673 %\smallskip\par\noindent … … 686 680 Finally, tuple assignment is an expression where the result type is the type of the left-hand side of the assignment, just like all other assignment expressions in C. 687 681 This example shows mass, multiple, and cascading assignment used in one expression: 688 \begin{ lstlisting}682 \begin{cfa} 689 683 void f( [int, int] ); 690 684 f( [x, y] = z = 1.5 ); $\C{// assignments in parameter list}$ 691 \end{ lstlisting}685 \end{cfa} 692 686 693 687 … … 696 690 It is also possible to access multiple fields from a single expression using a \newterm{member-access}. 697 691 The result is a single tuple-valued expression whose type is the tuple of the types of the members, \eg: 698 \begin{ lstlisting}692 \begin{cfa} 699 693 struct S { int x; double y; char * z; } s; 700 694 s.[x, y, z] = 0; 701 \end{ lstlisting}695 \end{cfa} 702 696 Here, the mass assignment sets all members of @s@ to zero. 703 697 Since tuple-index expressions are a form of member-access expression, it is possible to use tuple-index expressions in conjunction with member tuple expressions to manually restructure a tuple (\eg rearrange, drop, and duplicate components). 
… … 705 699 %\par\smallskip 706 700 %\begin{tabular}{@{}l@{\hspace{1.5\parindent}}||@{\hspace{1.5\parindent}}l@{}} 707 \begin{ lstlisting}701 \begin{cfa} 708 702 [int, int, long, double] x; 709 703 void f( double, long ); … … 711 705 f( x.[0, 3] ); $\C{// drop: f(x.0, x.3)}$ 712 706 [int, int, int] y = x.[2, 0, 2]; $\C{// duplicate: [y.0, y.1, y.2] = [x.2, x.0.x.2]}$ 713 \end{ lstlisting}714 %\end{ lstlisting}707 \end{cfa} 708 %\end{cfa} 715 709 %& 716 %\begin{ lstlisting}710 %\begin{cfa} 717 711 %\end{tabular} 718 712 %\smallskip\par\noindent 719 713 %\lstMakeShortInline@% 720 714 It is also possible for a member access to contain other member accesses, \eg: 721 \begin{ lstlisting}715 \begin{cfa} 722 716 struct A { double i; int j; }; 723 717 struct B { int * k; short l; }; 724 718 struct C { int x; A y; B z; } v; 725 719 v.[x, y.[i, j], z.k]; $\C{// [v.x, [v.y.i, v.y.j], v.z.k]}$ 726 \end{ lstlisting}720 \end{cfa} 727 721 728 722 … … 733 727 In \CFA, the cast operator has a secondary use as type ascription. 734 728 That is, a cast can be used to select the type of an expression when it is ambiguous, as in the call to an overloaded function: 735 \begin{ lstlisting}729 \begin{cfa} 736 730 int f(); // (1) 737 731 double f(); // (2) … … 739 733 f(); // ambiguous - (1),(2) both equally viable 740 734 (int)f(); // choose (2) 741 \end{ lstlisting}735 \end{cfa} 742 736 743 737 Since casting is a fundamental operation in \CFA, casts should be given a meaningful interpretation in the context of tuples. 744 738 Taking a look at standard C provides some guidance with respect to the way casts should work with tuples: 745 \begin{ lstlisting}739 \begin{cfa} 746 740 int f(); 747 741 void g(); … … 749 743 (void)f(); // (1) 750 744 (int)g(); // (2) 751 \end{ lstlisting}745 \end{cfa} 752 746 In C, (1) is a valid cast, which calls @f@ and discards its result. 753 747 On the other hand, (2) is invalid, because @g@ does not produce a result, so requesting an @int@ to materialize from nothing is nonsensical. … … 759 753 760 754 For example, in 761 \begin{ lstlisting}755 \begin{cfa} 762 756 [int, int, int] f(); 763 757 [int, [int, int], int] g(); … … 768 762 ([int, int, int, int])g(); $\C{// (4)}$ 769 763 ([int, [int, int, int]])g(); $\C{// (5)}$ 770 \end{ lstlisting}764 \end{cfa} 771 765 772 766 (1) discards the last element of the return value and converts the second element to @double@. … … 786 780 Tuples also integrate with \CFA polymorphism as a kind of generic type. 787 781 Due to the implicit flattening and structuring conversions involved in argument passing, @otype@ and @dtype@ parameters are restricted to matching only with non-tuple types, \eg: 788 \begin{ lstlisting}782 \begin{cfa} 789 783 forall(otype T, dtype U) void f( T x, U * y ); 790 784 f( [5, "hello"] ); 791 \end{ lstlisting}785 \end{cfa} 792 786 where @[5, "hello"]@ is flattened, giving argument list @5, "hello"@, and @T@ binds to @int@ and @U@ binds to @const char@. 793 787 Tuples, however, may contain polymorphic components. 794 788 For example, a plus operator can be written to add two triples together. 795 \begin{ lstlisting}789 \begin{cfa} 796 790 forall(otype T | { T ?+?( T, T ); }) [T, T, T] ?+?( [T, T, T] x, [T, T, T] y ) { 797 791 return [x.0 + y.0, x.1 + y.1, x.2 + y.2]; … … 800 794 int i1, i2, i3; 801 795 [i1, i2, i3] = x + ([10, 20, 30]); 802 \end{ lstlisting}796 \end{cfa} 803 797 804 798 Flattening and restructuring conversions are also applied to tuple types in polymorphic type assertions. 
805 \begin{ lstlisting}799 \begin{cfa} 806 800 int f( [int, double], double ); 807 801 forall(otype T, otype U | { T f( T, U, U ); }) void g( T, U ); 808 802 g( 5, 10.21 ); 809 \end{ lstlisting}803 \end{cfa} 810 804 Hence, function parameter and return lists are flattened for the purposes of type unification allowing the example to pass expression resolution. 811 805 This relaxation is possible by extending the thunk scheme described by Bilson~\cite{Bilson03}. 812 806 Whenever a candidate's parameter structure does not exactly match the formal parameter's structure, a thunk is generated to specialize calls to the actual function: 813 \begin{ lstlisting}807 \begin{cfa} 814 808 int _thunk( int _p0, double _p1, double _p2 ) { return f( [_p0, _p1], _p2 ); } 815 \end{ lstlisting}809 \end{cfa} 816 810 so the thunk provides flattening and structuring conversions to inferred functions, improving the compatibility of tuples and polymorphism. 817 These thunks take advantage of GCCC nested-functions to produce closures that have the usual function-pointer signature.811 These thunks take advantage of gcc C nested-functions to produce closures that have the usual function-pointer signature. 818 812 819 813 … … 830 824 Unlike variadic templates, @ttype@ polymorphic functions can be separately compiled. 831 825 For example, a generalized @sum@ function written using @ttype@: 832 \begin{ lstlisting}826 \begin{cfa} 833 827 int sum$\(_0\)$() { return 0; } 834 828 forall(ttype Params | { int sum( Params ); } ) int sum$\(_1\)$( int x, Params rest ) { … … 836 830 } 837 831 sum( 10, 20, 30 ); 838 \end{ lstlisting}832 \end{cfa} 839 833 Since @sum@\(_0\) does not accept any arguments, it is not a valid candidate function for the call @sum(10, 20, 30)@. 840 834 In order to call @sum@\(_1\), @10@ is matched with @x@, and the argument resolution moves on to the argument pack @rest@, which consumes the remainder of the argument list and @Params@ is bound to @[20, 30]@. … … 843 837 844 838 It is reasonable to take the @sum@ function a step further to enforce a minimum number of arguments: 845 \begin{ lstlisting}839 \begin{cfa} 846 840 int sum( int x, int y ) { return x + y; } 847 841 forall(ttype Params | { int sum( int, Params ); } ) int sum( int x, int y, Params rest ) { 848 842 return sum( x + y, rest ); 849 843 } 850 \end{ lstlisting}844 \end{cfa} 851 845 One more step permits the summation of any summable type with all arguments of the same type: 852 \begin{ lstlisting}846 \begin{cfa} 853 847 trait summable(otype T) { 854 848 T ?+?( T, T ); … … 860 854 return sum( x + y, rest ); 861 855 } 862 \end{ lstlisting}856 \end{cfa} 863 857 Unlike C variadic functions, it is unnecessary to hard code the number and expected types. 864 858 Furthermore, this code is extendable for any user-defined type with a @?+?@ operator. … … 866 860 867 861 It is also possible to write a type-safe variadic print function to replace @printf@: 868 \begin{ lstlisting}862 \begin{cfa} 869 863 struct S { int x, y; }; 870 864 forall(otype T, ttype Params | { void print(T); void print(Params); }) void print(T arg, Params rest) { … … 875 869 void print( S s ) { print( "{ ", s.x, ",", s.y, " }" ); } 876 870 print( "s = ", (S){ 1, 2 }, "\n" ); 877 \end{ lstlisting}871 \end{cfa} 878 872 This example showcases a variadic-template-like decomposition of the provided argument list. 879 873 The individual @print@ functions allow printing a single element of a type. 
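As a usage sketch, extending @print@ to a new element type requires only another single-element overload, which the recursive variadic case then selects automatically (assuming the basic @char *@ and @int@ overloads shown above):
\begin{cfa}
void print( double d ) { printf( "%g", d ); }		// additional single-element overload
print( "result = ", 3.5, ", s = ", (S){ 1, 2 }, "\n" );	// each element dispatched to its own print
\end{cfa}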
… … 883 877 Finally, it is possible to use @ttype@ polymorphism to provide arbitrary argument forwarding functions. 884 878 For example, it is possible to write @new@ as a library function: 885 \begin{ lstlisting}879 \begin{cfa} 886 880 forall( otype R, otype S ) void ?{}( pair(R, S) *, R, S ); 887 881 forall( dtype T, ttype Params | sized(T) | { void ?{}( T *, Params ); } ) T * new( Params p ) { … … 889 883 } 890 884 pair( int, char ) * x = new( 42, '!' ); 891 \end{ lstlisting}885 \end{cfa} 892 886 The @new@ function provides the combination of type-safe @malloc@ with a \CFA constructor call, making it impossible to forget constructing dynamically allocated objects. 893 887 This function provides the type-safety of @new@ in \CC, without the need to specify the allocated type again, thanks to return-type inference. … … 898 892 Tuples are implemented in the \CFA translator via a transformation into \newterm{generic types}. 899 893 For each $N$, the first time an $N$-tuple is seen in a scope a generic type with $N$ type parameters is generated, \eg: 900 \begin{ lstlisting}894 \begin{cfa} 901 895 [int, int] f() { 902 896 [double, double] x; 903 897 [int, double, int] y; 904 898 } 905 \end{ lstlisting}899 \end{cfa} 906 900 is transformed into: 907 \begin{ lstlisting}901 \begin{cfa} 908 902 forall(dtype T0, dtype T1 | sized(T0) | sized(T1)) struct _tuple2 { 909 903 T0 field_0; $\C{// generated before the first 2-tuple}$ … … 919 913 _tuple3(int, double, int) y; 920 914 } 921 \end{ lstlisting}915 \end{cfa} 922 916 \begin{sloppypar} 923 917 Tuple expressions are then simply converted directly into compound literals, \eg @[5, 'x', 1.24]@ becomes @(_tuple3(int, char, double)){ 5, 'x', 1.24 }@. … … 926 920 \begin{comment} 927 921 Since tuples are essentially structures, tuple indexing expressions are just field accesses: 928 \begin{ lstlisting}922 \begin{cfa} 929 923 void f(int, [double, char]); 930 924 [int, double] x; … … 933 927 printf("%d %g\n", x); 934 928 f(x, 'z'); 935 \end{ lstlisting}929 \end{cfa} 936 930 Is transformed into: 937 \begin{ lstlisting}931 \begin{cfa} 938 932 void f(int, _tuple2(double, char)); 939 933 _tuple2(int, double) x; … … 942 936 printf("%d %g\n", x.field_0, x.field_1); 943 937 f(x.field_0, (_tuple2){ x.field_1, 'z' }); 944 \end{ lstlisting}938 \end{cfa} 945 939 Note that due to flattening, @x@ used in the argument position is converted into the list of its fields. 946 940 In the call to @f@, the second and third argument components are structured into a tuple argument. … … 949 943 Expressions that may contain side effects are made into \newterm{unique expressions} before being expanded by the flattening conversion. 950 944 Each unique expression is assigned an identifier and is guaranteed to be executed exactly once: 951 \begin{ lstlisting}945 \begin{cfa} 952 946 void g(int, double); 953 947 [int, double] h(); 954 948 g(h()); 955 \end{ lstlisting}949 \end{cfa} 956 950 Internally, this expression is converted to two variables and an expression: 957 \begin{ lstlisting}951 \begin{cfa} 958 952 void g(int, double); 959 953 [int, double] h(); … … 965 959 (_unq0_finished_ ? _unq0 : (_unq0 = f(), _unq0_finished_ = 1, _unq0)).1, 966 960 ); 967 \end{ lstlisting}961 \end{cfa} 968 962 Since argument evaluation order is not specified by the C programming language, this scheme is built to work regardless of evaluation order. 969 963 The first time a unique expression is executed, the actual expression is evaluated and the accompanying boolean is set to true. 
… … 1359 1353 1360 1354 1361 % \subsection{Exception Handling ???} 1355 \subsection{Exception Handling} 1356 1357 \CFA provides two forms of exception handling: \newterm{resumption} (fix-up) and \newterm{recovery}. 1358 Both mechanisms provide dynamic call to a handler using dynamic name-lookup, where fix-up has dynamic return and recovery has static return from the handler. 1359 \begin{cquote} 1360 \lstDeleteShortInline@% 1361 \begin{tabular}{@{}l@{\hspace{\parindentlnth}}l@{}} 1362 \multicolumn{1}{c@{\hspace{\parindentlnth}}}{\textbf{Resumption}} & \multicolumn{1}{c}{\textbf{Recovery}} \\ 1363 \begin{cfa} 1364 _Exception E { int fix; }; 1365 void f() { 1366 ... _Resume E; 1367 // control returns here after handler 1368 try { 1369 f(); 1370 } catchResume( E e ) { 1371 ... e.fix = ...; // return correction to raise 1372 } // dynamic return to _Resume 1373 \end{cfa} 1374 & 1375 \begin{cfa} 1376 _Exception E {}; 1377 void f() { 1378 ... _Throw E; 1379 // control does NOT return here after handler 1380 try { 1381 f(); 1382 } catch( E e ) { 1383 ... // recover and continue 1384 } // static return to next statement 1385 \end{cfa} 1386 \end{tabular} 1387 \lstMakeShortInline@% 1388 \end{cquote} 1362 1389 1363 1390 … … 1367 1394 An important part of this subjective feel is maintaining C's procedural paradigm, as opposed to the object-oriented paradigm of other systems languages such as \CC and Rust. 1368 1395 Maintaining this procedural paradigm means that C coding-patterns remain not only functional but idiomatic in \CFA, reducing the mental burden of retraining C programmers and switching between C and \CFA development. 1369 Nonetheless, some features of object-oriented languages are undeniably conv ienient but are independent of object-oriented programming;1396 Nonetheless, some features of object-oriented languages are undeniably convenient but are independent of object-oriented programming; 1370 1397 \CFA adapts these features to a procedural paradigm. 1371 1398 … … 1535 1562 as well, parameter names are optional, \eg: 1536 1563 \begin{cfa} 1537 [ int x ] f ( );$\C{// returning int with no parameters}$1564 [ int x ] f ( /* void */ ); $\C{// returning int with no parameters}$ 1538 1565 [ int x ] f (...); $\C{// returning int with unknown parameters}$ 1539 1566 [ * int ] g ( int y ); $\C{// returning pointer to int with int parameter}$ 1540 [ ] h ( int, char );$\C{// returning no result with int and char parameters}$1567 [ void ] h ( int, char ); $\C{// returning no result with int and char parameters}$ 1541 1568 [ * int, int ] j ( int ); $\C{// returning pointer to int and int, with int parameter}$ 1542 1569 \end{cfa} … … 1693 1720 \subsection{Constructors and Destructors} 1694 1721 1695 One of the strengths (and weaknesses) of C is control over memory management, allowing resource release to be more consistent and precisely timed than possible with garbage-collected memory-management.1696 However, this manual approach is often verbose, furthermoreit is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory.1722 One of the strengths (and weaknesses) of C is memory-management control, allowing resource release to be precisely specified versus unknown release with garbage-collected memory-management. 1723 However, this manual approach is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory. 
1697 1724 \CC addresses these issues using Resource Aquisition Is Initialization (RAII), implemented by means of \newterm{constructor} and \newterm{destructor} functions; 1698 1725 \CFA adopts constructors and destructors (and @finally@) to facilitate RAII. 1699 While constructors and destructors are a common feature of object-oriented programming-languages, they are independent capabilities allowing \CFA to retaina procedural paradigm.1726 While constructors and destructors are a common feature of object-oriented programming-languages, they are an independent capability allowing \CFA to adopt them while retaining a procedural paradigm. 1700 1727 Specifically, \CFA constructors and destructors are denotes by name and first parameter-type versus name and nesting in an aggregate type. 1701 1702 In \CFA, a constructor is named @?{}@ and a destructor is named @^?{}@; 1703 like other \CFA operators, these names represent the syntax used to call the constructor or destructor, \eg @x{ ... };@ or @^x{...};@.1728 Constructor calls seamlessly integrate with existing C initialization syntax, providing a simple and familiar syntax to C programmers and allowing constructor calls to be inserted into legacy C code with minimal code changes. 1729 1730 In \CFA, a constructor is named @?{}@ and a destructor is named @^?{}@. 1704 1731 The name @{}@ comes from the syntax for the initializer: @struct S { int i, j; } s = `{` 2, 3 `}`@. 1732 The symbol \lstinline+^+ is used because it was the last remaining binary operator that could be used in a unary context. 1733 Like other \CFA operators, these names represent the syntax used to call the constructor or destructor, \eg @?{}(x, ...)@ or @^{}(x, ...)@. 1705 1734 The constructor and destructor have return type @void@ and a first parameter of reference to the object type to be constructed or destructs. 1706 1735 While the first parameter is informally called the @this@ parameter, as in object-oriented languages, any variable name may be used. 1707 Both constructors and destructors allow additional parametes after the @this@ parameter for specifying values for initialization/de-initialization\footnote{Destruction parameters are useful for specifying storage-management actions, such as de-initialize but not de-allocate.}. 1736 Both constructors and destructors allow additional parametes after the @this@ parameter for specifying values for initialization/de-initialization\footnote{ 1737 Destruction parameters are useful for specifying storage-management actions, such as de-initialize but not deallocate.}. 1708 1738 \begin{cfa} 1709 1739 struct VLA { 1710 1740 int len, * data; 1711 1741 }; 1712 void ?{}( VLA& vla ) with ( vla ) { $\C{// default constructor}$ 1713 len = 10; data = calloc( len ); 1714 } 1715 void ^?{}( VLA& vla ) { $\C{// destructor}$ 1716 free( vla.data ); 1717 } 1718 { VLA x; `?{}(x);` $\C{// compiler generated constructor call}$ 1719 // ... use x 1720 `^?{}(x);` } $\C{// compiler generated desturctor call}$ 1721 \end{cfa} 1722 @VLA@ is an example of a \newterm{managed type}\footnote{A managed type affects the runtime environment versus being self-contained.} in \CFA: a type requiring a non-trivial constructor or destructor, or with a field of a managed type. 1723 A managed types is implicitly constructed upon allocation, and destructed upon deallocation to ensure proper interaction of runtime resources, in this case the @data@ array in the heap. 
1724 The exact details of the placement of these implicit constructor and destructor calls are omitted here for brevity, the interested reader should consult \cite{Schluntz17}. 1725 1726 Constructor calls seamlessly integrate with existing C initialization syntax, providing a simple and familiar syntax to C programmers and allowing constructor calls to be inserted into legacy C code with minimal code changes. 1727 As such, \CFA also provides syntax for \newterm{initialization} and \newterm{copy}: 1728 \begin{cfa} 1729 void ?{}( VLA & vla, int size, int fill ); $\C{// initialization}$ 1730 void ?{}( VLA & vla, VLA other ); $\C{// copy}$ 1731 VLA y = { 20, 0xdeadbeef }, // initialization 1732 z = y; // copy 1733 \end{cfa} 1734 1735 Copy constructors have exactly two parameters, the second of which has the same type as the base type of the @this@ parameter; appropriate care is taken in the implementation to avoid recursive calls to the copy constructor when initializing this second parameter. 1736 Other constructor calls look just like C initializers, except rather than using field-by-field initialization (as in C), an initialization which matches a defined constructor will call the constructor instead. 1737 1738 In addition to initialization syntax, \CFA provides two ways to explicitly call constructors and destructors. 1739 Explicit calls to constructors double as a placement syntax, useful for construction of member fields in user-defined constructors and reuse of large storage allocations. 1740 While the existing function-call syntax works for explicit calls to constructors and destructors, \CFA also provides a more concise \newterm{operator syntax} for both: 1741 1742 \begin{cfa} 1743 VLA a, b; 1744 a{}; $\C{// default construct}$ 1745 b{ a }; $\C{// copy construct}$ 1746 ^a{}; $\C{// destruct}$ 1747 a{ 5, 0xFFFFFFFF }; $\C{// explicit constructor call}$ 1742 void ?{}( VLA & vla ) with ( vla ) { $\C{// default constructor}$ 1743 len = 10; data = alloc( len ); 1744 } 1745 void ^?{}( VLA & vla ) with ( vla ) { $\C{// destructor}$ 1746 free( data ); 1747 } 1748 { 1749 VLA x; $\C{// implicit: ?\{\}( x );}$ 1750 } $\C{// implicit: ?\^{}\{\}( x );}$ 1751 \end{cfa} 1752 @VLA@ is a \newterm{managed type}\footnote{ 1753 A managed type affects the runtime environment versus a self-contained type.}: a type requiring a non-trivial constructor or destructor, or with a field of a managed type. 1754 A managed types is implicitly constructed upon allocation and destructed upon deallocation to ensure proper interaction with runtime resources, in this case the @data@ array in the heap. 1755 For details of the placement of implicit constructor and destructor calls among complex executable statements see~\cite[\S~2.2]{Schluntz17}. 1756 1757 \CFA also provides syntax for \newterm{initialization} and \newterm{copy}: 1758 \begin{cfa} 1759 void ?{}( VLA & vla, int size, char fill ) with ( vla ) { $\C{// initialization}$ 1760 len = size; data = alloc( len, fill ); 1761 } 1762 void ?{}( VLA & vla, VLA other ) { $\C{// copy}$ 1763 vla.len = other.len; vla.data = other.data; 1764 } 1765 \end{cfa} 1766 An initialization constructor-call has the same syntax as a C initializer, except the initialization values are passed as arguments to a matching constructor (number and type of paremeters). 1767 \begin{cfa} 1768 VLA va = `{` 20, 0 `}`, * arr = alloc()`{` 5, 0 `}`; 1769 \end{cfa} 1770 Note, the use of a \newterm{constructor expression} to initialize the storage from the dynamic storage-allocation. 
1771 Like \CC, the copy constructor has two parameters, the second of which is a value parameter with the same type as the first parameter; 1772 appropriate care is taken to not recursively call the copy constructor when initializing the second parameter. 1773 1774 \CFA constructors may be explicitly call, like Java, and destructors may be explicitly called, like \CC. 1775 Explicit calls to constructors double as a \CC-style \emph{placement syntax}, useful for construction of member fields in user-defined constructors and reuse of existing storage allocations. 1776 While existing call syntax works for explicit calls to constructors and destructors, \CFA also provides a more concise \newterm{operator syntax} for both: 1777 \begin{cfa} 1778 { 1779 VLA x, y = { 20, 0x01 }, z = y; 1780 // implicit: ?{}( x ); ?{}( y, 20, 0x01 ); ?{}( z, y ); z points to y 1781 ^x{}; $\C{// deallocate x}$ 1782 x{}; $\C{// reallocate x}$ 1783 z{ 5, 0xff }; $\C{// reallocate z, not pointing to y}$ 1784 ^y{}; $\C{// deallocate y}$ 1785 y{ x }; $\C{// reallocate y, points to x}$ 1786 x{}; $\C{// reallocate x, not pointing to y}$ 1787 // implicit: ^?{}(z); ^?{}(y); ^?{}(x); 1788 } 1748 1789 \end{cfa} 1749 1790 1750 1791 To provide a uniform type interface for @otype@ polymorphism, the \CFA compiler automatically generates a default constructor, copy constructor, assignment operator, and destructor for all types. 1751 These default functions can be overridden by user-generated versions of them. 1752 For compatibility with the standard behaviour of C, the default constructor and destructor for all basic, pointer, and reference types do nothing, while the copy constructor and assignment operator are bitwise copies; if default zero-initialization is desired, the default constructors can be overridden. 1792 These default functions can be overridden by user-generated versions. 1793 For compatibility with the standard behaviour of C, the default constructor and destructor for all basic, pointer, and reference types do nothing, while the copy constructor and assignment operator are bitwise copies; 1794 if default zero-initialization is desired, the default constructors can be overridden. 1753 1795 For user-generated types, the four functions are also automatically generated. 1754 1796 @enum@ types are handled the same as their underlying integral type, and unions are also bitwise copied and no-op initialized and destructed. … … 1756 1798 For @struct@ types, each of the four functions are implicitly defined to call their corresponding functions on each member of the struct. 1757 1799 To better simulate the behaviour of C initializers, a set of \newterm{field constructors} is also generated for structures. 1758 A constructor is generated for each non-empty prefix of a structure's member-list which copy-constructs the members passed as parameters and default-constructs the remaining members. 1759 To allow users to limit the set of constructors available for a type, when a user declares any constructor or destructor, the corresponding generated function and all field constructors for that type are hidden from expression resolution; similarly, the generated default constructor is hidden upon declaration of any constructor. 1800 A constructor is generated for each non-empty prefix of a structure's member-list to copy-construct the members passed as parameters and default-construct the remaining members. 
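As a sketch of the generated interface, a structure with two members implicitly gains a field constructor for each non-empty prefix of its member list:
\begin{cfa}
struct W { int i; double d; };
// conceptually generated field constructors:
//   void ?{}( W & w, int i );		// copy-construct i, default-construct d
//   void ?{}( W & w, int i, double d );	// copy-construct both members
W w1 = { 3 };		// matches the one-parameter field constructor
W w2 = { 3, 4.5 };	// matches the two-parameter field constructor
\end{cfa}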
1801 To allow users to limit the set of constructors available for a type, when a user declares any constructor or destructor, the corresponding generated function and all field constructors for that type are hidden from expression resolution; 1802 similarly, the generated default constructor is hidden upon declaration of any constructor. 1760 1803 These semantics closely mirror the rule for implicit declaration of constructors in \CC\cite[p.~186]{ANSI98:C++}. 1761 1804 1762 In rare situations user programmers may not wish to have constructors and destructors called; in these cases, \CFA provides an ``escape hatch'' to not call them.1763 I f a variable is initialized using the syntax \lstinline|S x @= {}| it will be an \newterm{unmanaged object}, and will not have constructors or destructors called.1764 Any C initializer can be the right-hand side of an \lstinline|@=| initializer, \eg 1765 In addition to the expressive power, \lstinline|@=| provides a simple path for migrating legacy C code to \CFA, by providing a mechanism to incrementally convert initializers; the \CFA design team decided to introduce a new syntax for this escape hatch because we believe that our RAII implementation will handle the vast majority of code in a desirable way, and we wished to maintain familiar syntax for this common case.1805 In some circumstance programmers may not wish to have constructor and destructor calls. 1806 In these cases, \CFA provides the initialization syntax \lstinline|S x @= {}|, and the object becomes unmanaged, so constructors and destructors calls are not generated. 1807 Any C initializer can be the right-hand side of an \lstinline|@=| initializer, \eg \lstinline|VLA a @= { 0, 0x0 }|, with the usual C initialization semantics. 1808 The point of \lstinline|@=| is to provide a migration path from legacy C code to \CFA, by providing a mechanism to incrementally convert to implicit initialization. 1766 1809 1767 1810 … … 1840 1883 1841 1884 1842 \subsection{Default Parameters}1885 % \subsection{Default Parameters} 1843 1886 1844 1887 … … 2218 2261 A separator does not appear before a C string starting with the characters: \lstinline[mathescape=off,basicstyle=\tt]@([{=$@ 2219 2262 \item 2220 A sep erator does not appear after a C string ending with the characters: \lstinline[basicstyle=\tt]@,.;!?)]}%@2263 A separator does not appear after a C string ending with the characters: \lstinline[basicstyle=\tt]@,.;!?)]}%@ 2221 2264 \item 2222 2265 {\lstset{language=CFA,deletedelim=**[is][]{`}{`}} 2223 A sep erator does not appear before or after a C string begining/ending with the quote or whitespace characters: \lstinline[basicstyle=\tt,showspaces=true]@`'": \t\v\f\r\n@2266 A separator does not appear before or after a C string beginning/ending with the quote or whitespace characters: \lstinline[basicstyle=\tt,showspaces=true]@`'": \t\v\f\r\n@ 2224 2267 }% 2225 2268 \item … … 2284 2327 2285 2328 \begin{figure} 2286 \begin{ lstlisting}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt]2329 \begin{cfa}[xleftmargin=3\parindentlnth,aboveskip=0pt,belowskip=0pt] 2287 2330 int main( int argc, char * argv[] ) { 2288 2331 FILE * out = fopen( "cfa-out.txt", "w" ); … … 2308 2351 fclose(out); 2309 2352 } 2310 \end{ lstlisting}2353 \end{cfa} 2311 2354 \caption{\protect\CFA Benchmark Test} 2312 2355 \label{fig:BenchmarkTest} … … 2322 2365 Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents. 
2323 2366 The graph plots the median of 5 consecutive runs of each program, with an initial warm-up run omitted. 2324 All code is compiled at \texttt{-O2} by GCC or G++ 6.2.0, with all \CC code compiled as \CCfourteen.2367 All code is compiled at \texttt{-O2} by gcc or g++ 6.2.0, with all \CC code compiled as \CCfourteen. 2325 2368 The benchmarks are run on an Ubuntu 16.04 workstation with 16 GB of RAM and a 6-core AMD FX-6300 CPU with 3.5 GHz maximum clock frequency. 2326 2369 … … 2351 2394 \CCV is slower than C largely due to the cost of runtime type-checking of down-casts (implemented with @dynamic_cast@); 2352 2395 There are two outliers in the graph for \CFA: all prints and pop of @pair@. 2353 Both of these cases result from the complexity of the C-generated polymorphic code, so that the GCCcompiler is unable to optimize some dead code and condense nested calls.2396 Both of these cases result from the complexity of the C-generated polymorphic code, so that the gcc compiler is unable to optimize some dead code and condense nested calls. 2354 2397 A compiler designed for \CFA could easily perform these optimizations. 2355 2398 Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries. … … 2479 2522 \smallskip\noindent 2480 2523 \CFA 2481 \begin{ lstlisting}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]2524 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2482 2525 forall(otype T) struct stack_node { 2483 2526 T value; … … 2521 2564 s->head = 0; 2522 2565 } 2523 \end{ lstlisting}2566 \end{cfa} 2524 2567 2525 2568 \medskip\noindent 2526 2569 \CC 2527 \begin{ lstlisting}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]2570 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2528 2571 template<typename T> class stack { 2529 2572 struct node { … … 2576 2619 } 2577 2620 }; 2578 \end{ lstlisting}2621 \end{cfa} 2579 2622 2580 2623 \medskip\noindent 2581 2624 C 2582 \begin{ lstlisting}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]2625 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2583 2626 struct stack_node { 2584 2627 void * value; … … 2617 2660 s->head = NULL; 2618 2661 } 2619 \end{ lstlisting}2662 \end{cfa} 2620 2663 2621 2664 \medskip\noindent 2622 2665 \CCV 2623 \begin{ lstlisting}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt]2666 \begin{cfa}[xleftmargin=2\parindentlnth,aboveskip=0pt,belowskip=0pt] 2624 2667 stack::node::node( const object & v, node * n ) : value( v.new_copy() ), next( n ) {} 2625 2668 void stack::copy(const stack & o) { … … 2664 2707 head = nullptr; 2665 2708 } 2666 \end{ lstlisting}2709 \end{cfa} 2667 2710 2668 2711