Changeset bede27b for doc/papers/general


Timestamp:
Feb 9, 2018, 4:39:52 PM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
298ed08
Parents:
381fdee (diff), a722c7a (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the (diff) links above to see all the changes relative to each parent.
Message:

fix conflicts

Location:
doc/papers/general
Files:
1 added
2 edited

  • doc/papers/general/Makefile

    r381fdee rbede27b  
    1818
    1919FIGURES = ${addsuffix .tex, \
     20Cdecl \
    2021}
    2122
  • doc/papers/general/Paper.tex

    r381fdee rbede27b  
    22
    33\usepackage{fullpage}
     4\usepackage{epic,eepic}
    45\usepackage{xspace,calc,comment}
    56\usepackage{upquote}                                                                    % switch curled `'" to straight
     
    3637%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    3738
    38 \newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}}
     39\newcommand{\Textbf}[2][red]{{\color{#1}{\textbf{#2}}}}
    3940\newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
    4041%\newcommand{\TODO}[1]{} % TODO elided
     
    101102\makeatother
    102103
     104\newenvironment{cquote}{%
     105        \list{}{\lstset{resetmargins=true,aboveskip=0pt,belowskip=0pt}\topsep=4pt\parsep=0pt\leftmargin=\parindent\rightmargin\leftmargin}%
     106        \item\relax
     107}{%
     108        \endlist
     109}% cquote
     110
    103111% CFA programming language, based on ANSI C (with some gcc additions)
    104112\lstdefinelanguage{CFA}[ANSI]{C}{
     
    226234int forty_two = identity( 42 );                         $\C{// T is bound to int, forty\_two == 42}$
    227235\end{lstlisting}
    228 The @identity@ function above can be applied to any complete \emph{object type} (or @otype@).
     236The @identity@ function above can be applied to any complete \newterm{object type} (or @otype@).
    229237The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type.
    230238The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor.
    231 If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \emph{data type} (or @dtype@).
     239If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \newterm{data type} (or @dtype@).
    232240
    233241In \CFA, the polymorphism runtime-cost is spread over each polymorphic call, due to passing more arguments to polymorphic functions;
     
    235243A design advantage is that, unlike \CC template-functions, \CFA polymorphic-functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat.
    236244
    237 Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \emph{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable.
     245Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \newterm{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable.
    238246For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
    239247\begin{lstlisting}
     
    315323\subsection{Traits}
    316324
    317 \CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
     325\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
    318326\begin{lstlisting}
    319327trait summable( otype T ) {
     
    339347Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted.
    340348
    341 In summation, the \CFA type-system uses \emph{nominal typing} for concrete types, matching with the C type-system, and \emph{structural typing} for polymorphic types.
     349In summation, the \CFA type-system uses \newterm{nominal typing} for concrete types, matching with the C type-system, and \newterm{structural typing} for polymorphic types.
    342350Hence, trait names play no part in type equivalence;
    343351the names are simply macros for a list of polymorphic assertions, which are expanded at usage sites.
     
    384392Furthermore, writing and using preprocessor macros can be unnatural and inflexible.
    385393
    386 \CC, Java, and other languages use \emph{generic types} to produce type-safe abstract data-types.
     394\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
    387395\CFA also implements generic types that integrate efficiently and naturally with the existing polymorphic functions, while retaining backwards compatibility with C and providing separate compilation.
    388396However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates.
     
    405413\end{lstlisting}
    406414
    407 \CFA classifies generic types as either \emph{concrete} or \emph{dynamic}.
     415\CFA classifies generic types as either \newterm{concrete} or \newterm{dynamic}.
    408416Concrete types have a fixed memory layout regardless of type parameters, while dynamic types vary in memory layout depending on their type parameters.
    409 A type may have polymorphic parameters but still be concrete, called \emph{dtype-static}.
     417A type may have polymorphic parameters but still be concrete, called \newterm{dtype-static}.
    410418Polymorphic pointers are an example of dtype-static types, \eg @forall(dtype T) T *@ is a polymorphic type, but for any @T@, @T *@  is a fixed-sized pointer, and therefore, can be represented by a @void *@ in code generation.
    411419
     
    444452Though \CFA implements concrete generic-types efficiently, it also has a fully general system for dynamic generic types.
    445453As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller.
    446 Dynamic generic-types also have an \emph{offset array} containing structure-member offsets.
     454Dynamic generic-types also have an \newterm{offset array} containing structure-member offsets.
    447455A dynamic generic-union needs no such offset array, as all members are at offset 0, but size and alignment are still necessary.
    448456Access to members of a dynamic structure is provided at runtime via base-displacement addressing with the structure pointer and the member offset (similar to the @offsetof@ macro), moving a compile-time offset calculation to runtime.
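
The base-displacement access described above can be sketched in plain C. This is only an illustration, not the code the \CFA translator emits: the names `pair_int_double`, `pair_int_double_offsets`, `get_member`, and `read_second` are hypothetical, and the offset array is built with `offsetof` for one concrete instance rather than computed by a layout function.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* offsetof, size_t, NULL */
#include <string.h>   /* memcpy */

/* Hypothetical concrete instance standing in for a generic pair(int, double). */
struct pair_int_double {
    int first;
    double second;
};

/* Offset array for the instance above (assumed name, for illustration only). */
static const size_t pair_int_double_offsets[2] = {
    offsetof(struct pair_int_double, first),
    offsetof(struct pair_int_double, second),
};

/* Read member `index` of a structure whose layout is only known at runtime:
 * add the member offset to the base address, then copy `size` bytes out. */
void get_member(const void *base, const size_t *offsets, size_t index,
                void *out, size_t size) {
    memcpy(out, (const char *)base + offsets[index], size);
}

/* Convenience wrapper: fetch the second member through the offset array. */
double read_second(const struct pair_int_double *p) {
    assert(p != NULL);
    double d;
    get_member(p, pair_int_double_offsets, 1, &d, sizeof d);
    return d;
}
```

A caller holding only a `void *` plus the offset array can use `get_member` without seeing the structure definition, which is the runtime cost the paragraph refers to.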
     
    457465For instance, modularity is generally provided in C by including an opaque forward-declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately-compiled @.c@ file.
    458466\CFA supports this pattern for generic types, but the caller does not know the actual layout or size of the dynamic generic-type, and only holds it by a pointer.
    459 The \CFA translator automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
     467The \CFA translator automatically generates \newterm{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
    460468These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all @sized@ parameters to the generic structure (un@sized@ parameters are forbidden from being used in a context that affects layout).
    461469Results of these layout functions are cached so that they are only computed once per type per function. %, as in the example below for @pair@.
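
As a rough C sketch of what such a layout function might look like for a two-member generic structure: the name `pair_layout`, its parameter order, and the helper `round_up` are assumptions made for illustration, not the translator's actual output. It follows the usual C rule of placing each member at the next offset that satisfies its alignment, and it omits the caching mentioned above.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* size_t */

/* Round `n` up to the next multiple of `align`; `align` must be a power of two. */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical layout function for a generic struct with members of type T1, T2.
 * Outputs: total size, overall alignment, and the caller-allocated offset array.
 * Inputs: size and alignment of each sized type parameter. */
void pair_layout(size_t *size, size_t *align, size_t offsets[2],
                 size_t size_T1, size_t align_T1,
                 size_t size_T2, size_t align_T2) {
    offsets[0] = 0;                                  /* first member at the base */
    offsets[1] = round_up(size_T1, align_T2);        /* pad up to T2's alignment */
    *align = align_T1 > align_T2 ? align_T1 : align_T2;
    *size = round_up(offsets[1] + size_T2, *align);  /* tail padding */
}
```

For a 1-byte first member followed by an 8-byte, 8-aligned second member, this yields offsets 0 and 8 and a total size of 16, matching what a typical C compiler produces for `struct { char c; double d; }` on a platform with 8-byte `double` alignment.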
     
    481489Since @pair(T *, T * )@ is a concrete type, there are no implicit parameters passed to @lexcmp@, so the generated code is identical to a function written in standard C using @void *@, yet the \CFA version is type-checked to ensure the fields of both pairs and the arguments to the comparison function match in type.
    482490
    483 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \emph{tag-structures}.
     491Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \newterm{tag-structures}.
    484492Sometimes information is only used for type-checking and can be omitted at runtime, \eg:
    485493\begin{lstlisting}
     
    537545The addition of multiple-return-value functions (MRVF) is useless without a syntax for accepting multiple values at the call-site.
    538546The simplest mechanism for capturing the return values is variable assignment, allowing the values to be retrieved directly.
    539 As such, \CFA allows assigning multiple values from a function into multiple variables, using a square-bracketed list of lvalue expressions (as above), called a \emph{tuple}.
    540 
    541 However, functions also use \emph{composition} (nested calls), with the direct consequence that MRVFs must also support composition to be orthogonal with single-returning-value functions (SRVF), \eg:
     547As such, \CFA allows assigning multiple values from a function into multiple variables, using a square-bracketed list of lvalue expressions (as above), called a \newterm{tuple}.
     548
     549However, functions also use \newterm{composition} (nested calls), with the direct consequence that MRVFs must also support composition to be orthogonal with single-returning-value functions (SRVF), \eg:
    542550\begin{lstlisting}
    543551printf( "%d %d\n", div( 13, 5 ) );                      $\C{// return values separated into arguments}$
     
    572580printf( "%d %d\n", qr );
    573581\end{lstlisting}
    574 \CFA also supports \emph{tuple indexing} to access single components of a tuple expression:
     582\CFA also supports \newterm{tuple indexing} to access single components of a tuple expression:
    575583\begin{lstlisting}
    576584[int, int] * p = &qr;                                           $\C{// tuple pointer}$
     
    615623\subsection{Tuple Assignment}
    616624
    617 An assignment where the left side is a tuple type is called \emph{tuple assignment}.
    618 There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called \emph{multiple} and \emph{mass assignment}, respectively.
     625An assignment where the left side is a tuple type is called \newterm{tuple assignment}.
     626There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called \newterm{multiple} and \newterm{mass assignment}, respectively.
    619627%\lstDeleteShortInline@%
    620628%\par\smallskip
     
    650658\subsection{Member Access}
    651659
    652 It is also possible to access multiple fields from a single expression using a \emph{member-access}.
     660It is also possible to access multiple fields from a single expression using a \newterm{member-access}.
    653661The result is a single tuple-valued expression whose type is the tuple of the types of the members, \eg:
    654662\begin{lstlisting}
     
    780788Matching against a @ttype@ parameter consumes all remaining argument components and packages them into a tuple, binding to the resulting tuple of types.
    781789In a given parameter list, there must be at most one @ttype@ parameter that occurs last, which matches normal variadic semantics, with a strong feeling of similarity to \CCeleven variadic templates.
    782 As such, @ttype@ variables are also called \emph{argument packs}.
     790As such, @ttype@ variables are also called \newterm{argument packs}.
    783791
    784792Like variadic templates, the main way to manipulate @ttype@ polymorphic functions is via recursion.
     
    852860\subsection{Implementation}
    853861
    854 Tuples are implemented in the \CFA translator via a transformation into \emph{generic types}.
     862Tuples are implemented in the \CFA translator via a transformation into \newterm{generic types}.
    855863For each $N$, the first time an $N$-tuple is seen in a scope a generic type with $N$ type parameters is generated, \eg:
    856864\begin{lstlisting}
     
    903911Similarly, tuple member expressions are recursively expanded into a list of member access expressions.
    904912
    905 Expressions that may contain side effects are made into \emph{unique expressions} before being expanded by the flattening conversion.
     913Expressions that may contain side effects are made into \newterm{unique expressions} before being expanded by the flattening conversion.
    906914Each unique expression is assigned an identifier and is guaranteed to be executed exactly once:
    907915\begin{lstlisting}
     
    10521060\label{s:WithClauseStatement}
    10531061
    1054 Grouping heterogenous data into \newterm{aggregate}s is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers:
    1055 \begin{cfa}
    1056 struct S {                                                              $\C{// aggregate}$
    1057         char c;                                                         $\C{// fields}$
     1062Grouping heterogenous data into \newterm{aggregate}s (structure/union) is a common programming practice, and an aggregate can be further organized into more complex structures, such as arrays and containers:
     1063\begin{cfa}
     1064struct S {                                                                      $\C{// aggregate}$
     1065        char c;                                                                 $\C{// fields}$
    10581066        int i;
    10591067        double d;
     
    10611069S s, as[10];
    10621070\end{cfa}
    1063 However, routines manipulating aggregates have repeition of the aggregate name to access its containing fields:
     1071However, routines manipulating aggregates must repeat the aggregate name to access its containing fields:
    10641072\begin{cfa}
    10651073void f( S s ) {
    1066         `s.`c; `s.`i; `s.`d;                            $\C{// access containing fields}$
     1074        `s.`c; `s.`i; `s.`d;                                    $\C{// access containing fields}$
    10671075}
    10681076\end{cfa}
     
    10701078\begin{C++}
    10711079class C {
    1072         char c;                                                         $\C{// fields}$
     1080        char c;                                                                 $\C{// fields}$
    10731081        int i;
    10741082        double d;
    1075         int mem() {                                                     $\C{// implicit "this" parameter}$
    1076                 `this->`c; `this->`i; `this->`d;$\C{// access containing fields}$
     1083        int mem() {                                                             $\C{// implicit "this" parameter}$
     1084                `this->`c; `this->`i; `this->`d;        $\C{// access containing fields}$
    10771085        }
    10781086}
    10791087\end{C++}
    1080 Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of nested lexical-scoping.
     1088Nesting of member routines in a \lstinline[language=C++]@class@ allows eliding \lstinline[language=C++]@this->@ because of lexical scoping.
     1089However, for other aggregate parameters, qualification is necessary:
     1090\begin{cfa}
     1091struct T { double m, n; };
     1092int C::mem( T & t ) {                                           $\C{// multiple aggregate parameters}$
     1093        c; i; d;                                                                $\C{\color{red}// this-\textgreater.c, this-\textgreater.i, this-\textgreater.d}$
     1094        `t.`m; `t.`n;                                                   $\C{// must qualify}$
     1095}
     1096\end{cfa}
    10811097
    10821098% In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided.
    1083 % In any programming language, some functions have a naturally close relationship with a particular data type. 
    1084 % Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
    1085 % Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type. 
    1086 % When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code. 
    1087 % 
     1099% In any programming language, some functions have a naturally close relationship with a particular data type.
     1100% Object-oriented programming allows this close relationship to be codified in the language by making such functions \newterm{class methods} of their related data type.
     1101% Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
     1102% When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
     1103%
    10881104% \TODO{Fill out section. Be sure to mention arbitrary expressions in with-blocks, recent change driven by Thierry to prioritize field name over parameters.}
    10891105
    1090 \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elided aggregate qualification to fields by opening a scope containing field identifiers.
    1091 Hence, the qualified fields become variables, and making it easier to optimizing field references in a block.
    1092 \begin{cfa}
    1093 void f( S s ) `with s` {                                $\C{// with clause}$
    1094         c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
     1106To simplify the programmer experience, \CFA provides a @with@ clause/statement (see Pascal~\cite[\S~4.F]{Pascal}) to elide aggregate qualification to fields by opening a scope containing the field identifiers.
     1107Hence, the qualified fields become variables with the side-effect that it is easier to optimize field references in a block.
     1108\begin{cfa}
     1109void f( S s ) `with( s )` {                                     $\C{// with clause}$
     1110        c; i; d;                                                                $\C{\color{red}// s.c, s.i, s.d}$
    10951111}
    10961112\end{cfa}
    10971113and the equivalence for object-style programming is:
    10981114\begin{cfa}
    1099 int mem( S & this ) `with this` {               $\C{// with clause}$
    1100         c; i; d;                                                        $\C{\color{red}// this.c, this.i, this.d}$
    1101 }
    1102 \end{cfa}
    1103 The key generality over the object-oriented approach is that one aggregate parameter \lstinline[language=C++]@this@ is not treated specially over other aggregate parameters:
    1104 \begin{cfa}
    1105 struct T { double m, n; };
    1106 int mem( S & s, T & t ) `with s, t` {   $\C{// multiple aggregate parameters}$
    1107         c; i; d;                                                        $\C{\color{red}// s.c, s.i, s.d}$
    1108         m; n;                                                           $\C{\color{red}// t.m, t.n}$
    1109 }
    1110 \end{cfa}
    1111 The equivalent object-oriented style is:
    1112 \begin{cfa}
    1113 int S::mem( T & t ) {                                   $\C{// multiple aggregate parameters}$
    1114         c; i; d;                                                        $\C{\color{red}// this-\textgreater.c, this-\textgreater.i, this-\textgreater.d}$
    1115         `t.`m; `t.`n;
     1115int mem( S & this ) `with( this )` {            $\C{// with clause}$
     1116        c; i; d;                                                                $\C{\color{red}// this.c, this.i, this.d}$
     1117}
     1118\end{cfa}
     1119with the generality of opening multiple aggregate-parameters:
     1120\begin{cfa}
     1121int mem( S & s, T & t ) `with( s, t )` {        $\C{// multiple aggregate parameters}$
     1122        c; i; d;                                                                $\C{\color{red}// s.c, s.i, s.d}$
     1123        m; n;                                                                   $\C{\color{red}// t.m, t.n}$
     1124}
     1125\end{cfa}
     1126
     1127In detail, the @with@ clause/statement has the form:
     1128\begin{cfa}
     1129$\emph{with-statement}$:
     1130        'with' '(' $\emph{expression-list}$ ')' $\emph{compound-statement}$
     1131\end{cfa}
     1132and may appear as the body of a routine or nested within a routine body.
     1133Each expression in the expression-list provides a type and object.
     1134The type must be an aggregate type.
     1135(Enumerations are already opened.)
     1136The object is the implicit qualifier for the open structure-fields.
     1137
     1138All expressions in the expression list are open in ``parallel'' within the compound statement.
     1139This semantic is different from Pascal, which nests the openings.
     1140The difference between parallel and nesting occurs for fields with the same name but different type:
     1141\begin{cfa}
     1142struct S { int i; int j; double m; } s, w;
     1143struct T { int i; int k; int m; } t, w;
     1144with( s, t ) {
     1145        j + k;                                                                  $\C{// unambiguous, s.j + t.k}$
     1146        m = 5.0;                                                                $\C{// unambiguous, s.m = 5.0}$
     1147        m = 1;                                                                  $\C{// unambiguous, t.m = 1}$
     1148        int a = s.i + m;                                                $\C{// unambiguous, a = s.i + t.m}$
     1149        int b = s.i + t.i;                                              $\C{// unambiguous, qualification}$
     1150        sout | (double)m | endl;                                $\C{// unambiguous, cast}$
     1151        i;                                                                              $\C{// ambiguous}$
     1152}
     1153\end{cfa}
     1154\CFA's ability to overload variables means uses of fields with the same name can be automatically disambiguated, eliminating most qualification.
     1155Qualification or a cast is used to disambiguate.
     1156A cast may be necessary to disambiguate between the overloaded variables in a @with@ expression:
     1157\begin{cfa}
     1158with( w ) { ... }                                                       $\C{// ambiguous, same name and no context}$
     1159with( (S)w ) { ... }                                            $\C{// unambiguous}$
     1160\end{cfa}
     1161
     1162\begin{cfa}
     1163struct S { int i, j; } sv;
     1164with( sv ) {
     1165        S & sr = sv;
     1166        with( sr ) {
     1167                S * sp = &sv;
     1168                with( *sp ) {
     1169                        i = 3; j = 4;                                   $\C{\color{red}// sp-{\textgreater}i, sp-{\textgreater}j}$
     1170                }
     1171                i = 3; j = 4;                                           $\C{\color{red}// sr.i, sr.j}$
     1172        }
     1173        i = 3; j = 4;                                                   $\C{\color{red}// sv.i, sv.j}$
    11161174}
    11171175\end{cfa}
     
    11221180        struct S1 { ... } s1;
    11231181        struct S2 { ... } s2;
    1124         `with s1` {                                             $\C{// with statement}$
     1182        `with( s1 )` {                                                  $\C{// with statement}$
    11251183                // access fields of s1 without qualification
    1126                 `with s2` {                                     $\C{// nesting}$
     1184                `with( s2 )` {                                          $\C{// nesting}$
    11271185                        // access fields of s1 and s2 without qualification
    11281186                }
    11291187        }
    1130         `with s1, s2` {
     1188        `with( s1, s2 )` {
    11311189                // access unambiguous fields of s1 and s2 without qualification
    11321190        }
     
    11341192\end{cfa}
    11351193
    1136 When opening multiple structures, fields with the same name and type are ambiguous and must be fully qualified.
    1137 For fields with the same name but different type, context/cast can be used to disambiguate.
    1138 \begin{cfa}
    1139 struct S { int i; int j; double m; } a, c;
    1140 struct T { int i; int k; int m } b, c;
    1141 `with a, b` {
    1142         j + k;                                                  $\C{// unambiguous, unique names define unique types}$
    1143         i;                                                              $\C{// ambiguous, same name and type}$
    1144         a.i + b.i;                                              $\C{// unambiguous, qualification defines unique names}$
    1145         m;                                                              $\C{// ambiguous, same name and no context to define unique type}$
    1146         m = 5.0;                                                $\C{// unambiguous, same name and context defines unique type}$
    1147         m = 1;                                                  $\C{// unambiguous, same name and context defines unique type}$
    1148 }
    1149 `with c` { ... }                                        $\C{// ambiguous, same name and no context}$
    1150 `with (S)c` { ... }                                     $\C{// unambiguous, same name and cast defines unique type}$
    1151 \end{cfa}
    1152 
    1153 The components in the "with" clause
    1154 
    1155   with a, b, c { ... }
    1156 
    1157 serve 2 purposes: each component provides a type and object. The type must be a
    1158 structure type. Enumerations are already opened, and I think a union is opened
    1159 to some extent, too. (Or is that just unnamed unions?) The object is the target
    1160 that the naked structure-fields apply to. The components are open in "parallel"
    1161 at the scope of the "with" clause/statement, so opening "a" does not affect
    1162 opening "b", etc. This semantic is different from Pascal, which nests the
    1163 openings.
    1164 
    1165 Having said the above, it seems reasonable to allow a "with" component to be an
    1166 expression. The type is the static expression-type and the object is the result
    1167 of the expression. Again, the type must be an aggregate. Expressions require
    1168 parenthesis around the components.
    1169 
    1170   with( a, b, c ) { ... }
    1171 
    1172 Does this now make sense?
    1173 
    1174 Having written more CFA code, it is becoming clear to me that I *really* want
    1175 the "with" to be implemented because I hate having to type all those object
    1176 names for fields. It's a great way to drive people away from the language.
    1177 
    11781194
    11791195\subsection{Exception Handling ???}
     
    11821198\section{Declarations}
    11831199
    1184 It is important to the design team that \CFA subjectively ``feel like'' C to user programmers. 
    1185 An important part of this subjective feel is maintaining C's procedural programming paradigm, as opposed to the object-oriented paradigm of other systems languages such as \CC and Rust. 
    1186 Maintaining this procedural paradigm means that coding patterns that work in C will remain not only functional but idiomatic in \CFA, reducing the mental burden of retraining C programmers and switching between C and \CFA development. 
     1200It is important to the design team that \CFA subjectively ``feel like'' C to user programmers.
     1201An important part of this subjective feel is maintaining C's procedural programming paradigm, as opposed to the object-oriented paradigm of other systems languages such as \CC and Rust.
     1202Maintaining this procedural paradigm means that coding patterns that work in C will remain not only functional but idiomatic in \CFA, reducing the mental burden of retraining C programmers and switching between C and \CFA development.
    11871203Nonetheless, some features of object-oriented languages are undeniably convenient, and the \CFA design team has attempted to adapt them to a procedural paradigm so as to incorporate their benefits into \CFA; two of these features are resource management and name scoping.
    11881204
     
    11901206\subsection{Alternative Declaration Syntax}
    11911207
     1208\newcommand{\R}[1]{\Textbf{#1}}
     1209\newcommand{\B}[1]{{\Textbf[blue]{#1}}}
     1210\newcommand{\G}[1]{{\Textbf[OliveGreen]{#1}}}
     1211
     1212C declaration syntax is notoriously confusing and error prone.
     1213For example, many C programmers are confused by a declaration as simple as:
     1214\begin{cquote}
     1215\lstDeleteShortInline@%
     1216\begin{tabular}{@{}ll@{}}
     1217\begin{cfa}
     1218int * x[5]
     1219\end{cfa}
     1220&
     1221\raisebox{-0.75\totalheight}{\input{Cdecl}}
     1222\end{tabular}
     1223\lstMakeShortInline@%
     1224\end{cquote}
     1225Is this an array of 5 pointers to integers or a pointer to an array of 5 integers?
     1226If there is any doubt, it implies productivity and safety issues even for basic programs.
     1227Another example of confusion results from the fact that a routine name and its parameters are embedded within the return type, mimicking the way the return value is used at the routine's call site.
     1228For example, a routine returning a pointer to an array of integers is defined and used in the following way:
     1229\begin{cfa}
     1230int `(*`f`())[`5`]` {...};                              $\C{// definition}$
     1231 ... `(*`f`())[`3`]` += 1;                              $\C{// usage}$
     1232\end{cfa}
     1233Essentially, the return type is wrapped around the routine name in successive layers (like an onion).
     1234While attempting to make the two contexts consistent is a laudable goal, it has not worked out in practice.
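The two readings discussed above can be checked mechanically in plain C.
The following is a minimal sketch, not part of the original text; the names @storage@, @decl_sizes_ok@, and @bump@ are hypothetical and exist only for illustration.
It assumes nothing beyond standard C: @sizeof@ distinguishes an array of pointers from a pointer to an array, and the routine @f@ has the same shape as the onion-style declaration shown earlier.

```c
#include <stddef.h>

int * x[5];                 /* array of 5 pointers to int */
int (* y)[5];               /* pointer to an array of 5 ints */

static int storage[5];      /* hypothetical backing array returned by f */

/* routine returning a pointer to an array of 5 ints (same shape as the text's f) */
int (* f( void ))[5] { return &storage; }

/* returns 1 when the object sizes agree with the readings in the comments above */
int decl_sizes_ok( void ) {
	return sizeof( x ) / sizeof( x[0] ) == 5        /* x holds five elements ... */
		&& sizeof( x[0] ) == sizeof( int * )        /* ... each of which is a pointer */
		&& sizeof( *y ) == 5 * sizeof( int );       /* y points at a whole 5-int array */
}

/* usage mirrors the definition: dereference the call, then subscript */
int bump( void ) {
	(*f())[3] += 1;
	return storage[3];
}
```

Here @int * x[5]@ is the array of 5 pointers, because the subscript binds more tightly than the dereference; the parentheses in @int (* y)[5]@ are what reverse the reading.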
     1235
     1236\CFA provides its own type, variable and routine declarations, using a different syntax.
     1237The new declarations place qualifiers to the left of the base type, while C declarations place qualifiers to the right of the base type.
     1238In the following example, \R{red} is the base type and \B{blue} is qualifiers.
     1239The \CFA declarations move the qualifiers to the left of the base type, \ie move the blue to the left of the red, while the qualifiers have the same meaning but are ordered left to right to specify a variable's type.
     1240\begin{cquote}
     1241\lstDeleteShortInline@%
     1242\lstset{moredelim=**[is][\color{blue}]{+}{+}}
     1243\begin{tabular}{@{}l@{\hspace{3em}}l@{}}
     1244\multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}}        & \multicolumn{1}{c}{\textbf{C}}        \\
     1245\begin{cfa}
     1246+[5] *+ `int` x1;
     1247+* [5]+ `int` x2;
     1248+[* [5] int]+ f`( int p )`;
     1249\end{cfa}
     1250&
     1251\begin{cfa}
     1252`int` +*+ x1 +[5]+;
     1253`int` +(*+x2+)[5]+;
     1254+int (*+f`( int p )`+)[5]+;
     1255\end{cfa}
     1256\end{tabular}
     1257\lstMakeShortInline@%
     1258\end{cquote}
     1259The only exception is bit field specification, which always appears to the right of the base type.
     1260% Specifically, the character @*@ is used to indicate a pointer, square brackets @[@\,@]@ are used to represent an array or function return value, and parentheses @()@ are used to indicate a routine parameter.
     1261However, unlike C, \CFA type declaration tokens are distributed across all variables in the declaration list.
     1262For instance, variables @x@ and @y@ of type pointer to integer are defined in \CFA as follows:
     1263\begin{cquote}
     1264\lstDeleteShortInline@%
     1265\begin{tabular}{@{}l@{\hspace{3em}}l@{}}
     1266\multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}}        & \multicolumn{1}{c}{\textbf{C}}        \\
     1267\begin{cfa}
     1268`*` int x, y;
     1269\end{cfa}
     1270&
     1271\begin{cfa}
     1272int `*`x, `*`y;
     1273\end{cfa}
     1274\end{tabular}
     1275\lstMakeShortInline@%
     1276\end{cquote}
     1277The downside of this semantics is the need to separate regular and pointer declarations:
     1278\begin{cquote}
     1279\lstDeleteShortInline@%
     1280\begin{tabular}{@{}l@{\hspace{3em}}l@{}}
     1281\multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}}        & \multicolumn{1}{c}{\textbf{C}}        \\
     1282\begin{cfa}
     1283`*` int x;
     1284int y;
     1285\end{cfa}
     1286&
     1287\begin{cfa}
     1288int `*`x, y;
     1289
     1290\end{cfa}
     1291\end{tabular}
     1292\lstMakeShortInline@%
     1293\end{cquote}
     1294which is arguably a safety benefit.
     1295Other examples are:
     1296\begin{cquote}
     1297\lstDeleteShortInline@%
     1298\begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}}
     1299\multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}}        & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\
     1300\begin{cfa}
     1301[ 5 ] int z;
     1302[ 5 ] * char w;
     1303* [ 5 ] double v;
     1304struct s {
     1305        int f0:3;
     1306        * int f1;
     1307        [ 5 ] * int f2;
     1308};
     1309\end{cfa}
     1310&
     1311\begin{cfa}
     1312int z[ 5 ];
     1313char * w[ 5 ];
     1314double (* v)[ 5 ];
     1315struct s {
     1316        int f0:3;
     1317        int * f1;
     1318        int * f2[ 5 ];
     1319};
     1320\end{cfa}
     1321&
     1322\begin{cfa}
     1323// array of 5 integers
     1324// array of 5 pointers to char
     1325// pointer to array of 5 doubles
     1326
     1327// common bit field syntax
     1328
     1329
     1330
     1331\end{cfa}
     1332\end{tabular}
     1333\lstMakeShortInline@%
     1334\end{cquote}
     1335
     1336All type qualifiers, \eg @const@, @volatile@, etc., are used in the normal way with the new declarations and also appear left to right, \eg:
     1337\begin{cquote}
     1338\lstDeleteShortInline@%
     1339\begin{tabular}{@{}l@{\hspace{1em}}l@{\hspace{1em}}l@{}}
     1340\multicolumn{1}{c@{\hspace{1em}}}{\textbf{\CFA}}        & \multicolumn{1}{c@{\hspace{1em}}}{\textbf{C}} \\
     1341\begin{cfa}
     1342const * const int x;
     1343const * [ 5 ] const int y;
     1344\end{cfa}
     1345&
     1346\begin{cfa}
     1347int const * const x;
     1348const int (* const y)[ 5 ];
     1349\end{cfa}
     1350&
     1351\begin{cfa}
     1352// const pointer to const integer
     1353// const pointer to array of 5 const integers
     1354\end{cfa}
     1355\end{tabular}
     1356\lstMakeShortInline@%
     1357\end{cquote}
     1358All declaration qualifiers, \eg @extern@, @static@, etc., are used in the normal way with the new declarations but can only appear at the start of a \CFA routine declaration,\footnote{\label{StorageClassSpecifier}
     1359The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature.~\cite[\S~6.11.5(1)]{C11}} \eg:
     1360\begin{cquote}
     1361\lstDeleteShortInline@%
     1362\begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}}
     1363\multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}}        & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{C}} \\
     1364\begin{cfa}
     1365extern [ 5 ] int x;
     1366static * const int y;
     1367\end{cfa}
     1368&
     1369\begin{cfa}
     1370int extern x[ 5 ];
     1371const int static * y;
     1372\end{cfa}
     1373&
     1374\begin{cfa}
     1375// externally visible array of 5 integers
     1376// internally visible pointer to constant int
     1377\end{cfa}
     1378\end{tabular}
     1379\lstMakeShortInline@%
     1380\end{cquote}
     1381
     1382The new declaration syntax can be used in other contexts where types are required, \eg casts and the pseudo-routine @sizeof@:
     1383\begin{cquote}
     1384\lstDeleteShortInline@%
     1385\begin{tabular}{@{}l@{\hspace{3em}}l@{}}
     1386\multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}}        & \multicolumn{1}{c}{\textbf{C}}        \\
     1387\begin{cfa}
     1388y = (* int)x;
     1389i = sizeof([ 5 ] * int);
     1390\end{cfa}
     1391&
     1392\begin{cfa}
     1393y = (int *)x;
     1394i = sizeof(int * [ 5 ]);
     1395\end{cfa}
     1396\end{tabular}
     1397\lstMakeShortInline@%
     1398\end{cquote}
     1399
     1400Finally, new \CFA declarations may appear together with C declarations in the same program block, but cannot be mixed within a specific declaration.
     1401Therefore, a programmer has the option of either continuing to use traditional C declarations or taking advantage of the new style.
     1402Clearly, both styles need to be supported for some time due to existing C-style header-files, particularly for UNIX-like systems.
     1403
    11921404
    11931405\subsection{References}
    11941406
    1195 All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
    1196 The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
    1197 Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses.
    1198 Within a lexical scope, lvalue exressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while y is used in its value interpretation.
    1199 Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose.
    1200 In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
     1407All variables in C have an \newterm{address}, a \newterm{value}, and a \newterm{type};
     1408at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
     1409The C type-system does not always track the relationship between a value and its address;
     1410a value that does not have a corresponding address is called a \newterm{rvalue} (for ``right-hand value''), while a value that does have an address is called a \newterm{lvalue} (for ``left-hand value'').
     1411For example, in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
     1412In imperative programming, the address of a value is used for both reading and writing (mutating) a value.
     1413
     1414Within a lexical scope, lvalue expressions have an \newterm{address interpretation} for writing a value or a \newterm{value interpretation} to read a value.
     1415For example, in @x = y@, @x@ has an address interpretation, while @y@ has a value interpretation.
     1416Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \newterm{pointer types} to serve a similar purpose.
     1417In C, for any type @T@ there is a pointer type @T *@, the value of which is the address of a value of type @T@.
     1418A pointer rvalue can be explicitly \newterm{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
    12011419
    12021420\begin{cfa}
    12031421int x = 1, y = 2, * p1, * p2, ** p3;
    1204 p1 = &x;  $\C{// p1 points to x}$
    1205 p2 = &y;  $\C{// p2 points to y}$
    1206 p3 = &p1;  $\C{// p3 points to p1}$
    1207 \end{cfa}
    1208 
    1209 Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
    1210 It would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as @*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);@, for both brevity and clarity.
    1211 However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
     1422p1 = &x;                                                                $\C{// p1 points to x}$
     1423p2 = &y;                                                                $\C{// p2 points to y}$
     1424p3 = &p1;                                                               $\C{// p3 points to p1}$
     1425*p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);
     1426\end{cfa}
     1427
     1428Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
     1429For both brevity and clarity, it would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as the assignment to @*p2@ above.
     1430However, since C defines a number of forms of \newterm{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
    12121431To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below:
    12131432
    12141433\begin{cfa}
    1215 inx x = 1, y = 2, & r1, & r2, && r3;
     1434int x = 1, y = 2, & r1, & r2, && r3;
    12161435&r1 = &x;  $\C{// r1 points to x}$
    12171436&r2 = &y;  $\C{// r2 points to y}$
     
    12201439\end{cfa}
    12211440
    1222 Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example. 
    1223 Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value. 
     1441Except for auto-dereferencing by the compiler, this reference example is exactly the same as the previous pointer example.
     1442Hence, a reference behaves like a variable name -- an lvalue expression which is interpreted as a value, but also has the type system track the address of that value.
    12241443One way to conceptualize a reference is via a rewrite rule, where the compiler inserts a dereference operator before the reference variable for each reference qualifier in the reference variable declaration, so the previous example implicitly acts like:
    12251444
     
    12281447\end{cfa}
    12291448
    1230 References in \CFA are similar to those in \CC, but with a couple important improvements, both of which can be seen in the example above.
    1231 Firstly, \CFA does not forbid references to references, unlike \CC.
    1232 This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
    1233 
    1234 Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable.
    1235 This allows \CFA references to be default-initialized (to a null pointer), and also to point to different addresses throughout their lifetime.
    1236 This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
    1237 In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address.
    1238 In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there.
    1239 The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above.
    1240 This rebinding can occur to an arbitrary depth of reference nesting; $n$ address-of operators applied to a reference nested $m$ times will produce an lvalue pointer nested $n$ times if $n \le m$ (note that $n = m+1$ is simply the usual C rvalue address-of operator applied to the $n = m$ case).
    1241 The explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x@ or @(&(&`*`)`*`)r3 = &(&`*`)r1@ or even @(&`*`)r2 = (&`*`)`*`r3@ for @&r2 = &r3@.
    1242 
    1243 Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.
     1449References in \CFA are similar to those in \CC, but with a couple important improvements, both of which can be seen in the example above.
     1450Firstly, \CFA does not forbid references to references, unlike \CC.
     1451This provides a much more orthogonal design for library implementors, obviating the need for workarounds such as @std::reference_wrapper@.
     1452
     1453Secondly, unlike the references in \CC which always point to a fixed address, \CFA references are rebindable.
     1454This allows \CFA references to be default-initialized (\eg to a null pointer), and also to point to different addresses throughout their lifetime.
     1455This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
     1456
     1457In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address.
     1458In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there.
     1459The result of this rule is that any reference can be rebound using the existing pointer assignment semantics by assigning a compatible pointer into the address of the reference, \eg @&r1 = &x;@ above.
     1460This rebinding can occur to an arbitrary depth of reference nesting; loosely speaking, nested address-of operators will produce an lvalue nested pointer up to as deep as the reference they're applied to.
     1461These explicit address-of operators can be thought of as ``cancelling out'' the implicit dereference operators, \eg @(&`*`)r1 = &x@ or @(&(&`*`)`*`)r3 = &(&`*`)r1@ or even @(&`*`)r2 = (&`*`)`*`r3@ for @&r2 = &r3@.
     1462More precisely:
     1463\begin{itemize}
     1464        \item
     1465        if @R@ is an rvalue of type {@T &@$_1 \cdots$@ &@$_r$} where $r \ge 1$ references (@&@ symbols) then @&R@ has type {@T `*`&@$_{\color{red}2} \cdots$@ &@$_{\color{red}r}$}, \\ \ie @T@ pointer with $r-1$ references (@&@ symbols).
     1466       
     1467        \item
     1468        if @L@ is an lvalue of type {@T &@$_1 \cdots$@ &@$_l$} where $l \ge 0$ references (@&@ symbols) then @&L@ has type {@T `*`&@$_{\color{red}1} \cdots$@ &@$_{\color{red}l}$}, \\ \ie @T@ pointer with $l$ references (@&@ symbols).
     1469\end{itemize}
     1470Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.
     1471
    12441472By analogy to pointers, \CFA references also allow cv-qualifiers:
    12451473
     
    12541482\end{cfa}
    12551483
    1256 Given that a reference is meant to represent a lvalue, \CFA provides some syntactic shortcuts when initializing references.
    1257 There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding.
    1258 In each of these contexts, the address-of operator on the target lvalue may (in fact, must) be elided.
    1259 The syntactic motivation for this is clearest when considering overloaded operator-assignment, \eg @int ?+=?(int &, int)@; given @int x, y@, the expected call syntax is @x += y@, not @&x += y@.
    1260 
    1261 This initialization of references from lvalues rather than pointers can be considered a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; similarly, use of a the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion.
    1262 \CFA includes one more reference conversion, an ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary.
    1263 When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary.
    1264 This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value.
    1265 \CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \emph{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
     1484Given that a reference is meant to represent a lvalue, \CFA provides some syntactic shortcuts when initializing references.
     1485There are three initialization contexts in \CFA: declaration initialization, argument/parameter binding, and return/temporary binding.
     1486In each of these contexts, the address-of operator on the target lvalue may (in fact, must) be elided.
     1487The syntactic motivation for this is clearest when considering overloaded operator-assignment, \eg @int ?+=?(int &, int)@; given @int x, y@, the expected call syntax is @x += y@, not @&x += y@.
     1488
     1489More generally, this initialization of references from lvalues rather than pointers is an instance of a ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; this conversion can actually be used in any context in \CFA where an implicit conversion would be allowed.
     1490Similarly, use of the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion, and \CFA also includes a qualifier-adding ``reference-to-reference'' conversion, analogous to the @T *@ to @const T *@ conversion in standard C.
     1491The final reference conversion included in \CFA is ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary.
     1492When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary.
     1493This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value.
     1494\CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \newterm{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
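The hidden temporary behind the rvalue-to-reference conversion can be approximated in plain C.
The following is a minimal sketch under stated assumptions, not the \CFA compiler's actual translation: the reference parameter is modelled as a pointer (consistent with the earlier remark that references and pointers share a representation), and the routine names @increment@ and @call_with_rvalue@ are hypothetical.
The C99 compound literal plays the role of the implicit temporary, since it denotes an unnamed object whose lifetime is the enclosing block.

```c
/* a reference parameter modelled as a pointer (assumption for this sketch) */
int increment( int * r ) {
	*r += 1;
	return *r;
}

/* returns 1 when both ways of passing the rvalue 41 yield 42 */
int call_with_rvalue( void ) {
	/* explicit temporary: what a C programmer writes by hand */
	int tmp = 41;
	int a = increment( &tmp );

	/* C99 compound literal: an unnamed int object with block lifetime,
	   standing in for the hidden temporary described in the text */
	int b = increment( &(int){ 41 } );

	return a == 42 && b == 42;
}
```

In \CFA the corresponding call would pass the rvalue directly, with the temporary introduced implicitly rather than spelled out at the call site.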
     1495
    12661496
    12671497\subsection{Constructors and Destructors}
    12681498
    1269 One of the strengths of C is the control over memory management it gives programmers, allowing resource release to be more consistent and precisely timed than is possible with garbage-collected memory management.
    1270 However, this manual approach to memory management is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory.
    1271 \CC is well-known for an approach to manual memory management that addresses both these issues, Resource Allocation Is Initialization (RAII), implemented by means of special \emph{constructor} and \emph{destructor} functions; we have implemented a similar feature in \CFA.
    1272 
    1273 \TODO{Fill out section. Mention field-constructors and at-equal escape hatch to C-style initialization. Probably pull some text from Rob's thesis for first draft.}
    1274 
     1499One of the strengths of C is the control over memory management it gives programmers, allowing resource release to be more consistent and precisely timed than is possible with garbage-collected memory management.
     1500However, this manual approach to memory management is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory.
     1501\CC is well-known for an approach to manual memory management that addresses both these issues, Resource Acquisition Is Initialization (RAII), implemented by means of special \newterm{constructor} and \newterm{destructor} functions; we have implemented a similar feature in \CFA.
     1502While RAII is a common feature of object-oriented programming languages, its inclusion in \CFA does not violate the design principle that \CFA retain the same procedural paradigm as C.
     1503In particular, \CFA does not implement class-based encapsulation: neither the constructor nor any other function has privileged access to the implementation details of a type, except through the translation-unit-scope method of opaque structs provided by C.
     1504
     1505In \CFA, a constructor is a function named @?{}@, while a destructor is a function named @^?{}@; like other \CFA operators, these names represent the syntax used to call the constructor or destructor, \eg @x{ ... };@ or @^x{};@.
     1506Every constructor and destructor must have a return type of @void@, and its first parameter must have a reference type whose base type is the type of the object the function constructs or destructs.
     1507This first parameter is informally called the @this@ parameter, as in many object-oriented languages, though a programmer may give it an arbitrary name.
     1508Destructors must have exactly one parameter, while constructors allow passing of zero or more additional arguments along with the @this@ parameter.
     1509
     1510\begin{cfa}
     1511struct Array {
     1512        int * data;
     1513        int len;
     1514};
     1515
     1516void ?{}( Array& arr ) {
     1517        arr.len = 10;
     1518        arr.data = calloc( arr.len, sizeof(int) );
     1519}
     1520
     1521void ^?{}( Array& arr ) {
     1522        free( arr.data );
     1523}
     1524
     1525{
     1526        Array x;
     1527        `?{}(x);`       $\C{// implicitly compiler-generated}$
     1528        // ... use x
     1529        `^?{}(x);`      $\C{// implicitly compiler-generated}$
     1530}
     1531\end{cfa}
     1532
     1533In the example above, a \newterm{default constructor} (\ie one with no parameters besides the @this@ parameter) and destructor are defined for the @Array@ struct, a dynamic array of @int@.
     1534@Array@ is an example of a \newterm{managed type} in \CFA, a type with a non-trivial constructor or destructor, or with a field of a managed type.
     1535As in the example, all instances of managed types are implicitly constructed upon allocation, and destructed upon deallocation; this ensures proper initialization and cleanup of resources contained in managed types, in this case the @data@ array on the heap.
     1536The exact details of the placement of these implicit constructor and destructor calls are omitted here for brevity; the interested reader should consult \cite{Schluntz17}.
     1537
     1538Constructor calls are intended to seamlessly integrate with existing C initialization syntax, providing a simple and familiar syntax to veteran C programmers and allowing constructor calls to be inserted into legacy C code with minimal code changes.
     1539As such, \CFA also provides syntax for \newterm{copy initialization} and \newterm{initialization parameters}:
     1540
     1541\begin{cfa}
     1542void ?{}( Array& arr, Array other );
     1543
     1544void ?{}( Array& arr, int size, int fill );
     1545
     1546Array y = { 20, 0xDEADBEEF }, z = y;
     1547\end{cfa}
     1548
     1549Copy constructors have exactly two parameters, the second of which has the same type as the base type of the @this@ parameter; appropriate care is taken in the implementation to avoid recursive calls to the copy constructor when initializing this second parameter.
     1550Other constructor calls look just like C initializers, except rather than using field-by-field initialization (as in C), an initialization which matches a defined constructor will call the constructor instead.
     1551
     1552In addition to initialization syntax, \CFA provides two ways to explicitly call constructors and destructors.
     1553Explicit calls to constructors double as a placement syntax, useful for construction of member fields in user-defined constructors and reuse of large storage allocations.
     1554While the existing function-call syntax works for explicit calls to constructors and destructors, \CFA also provides a more concise \newterm{operator syntax} for both:
     1555
     1556\begin{cfa}
     1557Array a, b;
     1558a{};                            $\C{// default construct}$
     1559b{ a };                         $\C{// copy construct}$
     1560^a{};                           $\C{// destruct}$
     1561a{ 5, 0xFFFFFFFF };     $\C{// explicit constructor call}$
     1562\end{cfa}
     1563
     1564To provide a uniform type interface for @otype@ polymorphism, the \CFA compiler automatically generates a default constructor, copy constructor, assignment operator, and destructor for all types.
     1565These default functions can be overridden by user-generated versions of them.
     1566For compatibility with the standard behaviour of C, the default constructor and destructor for all basic, pointer, and reference types do nothing, while the copy constructor and assignment operator are bitwise copies; if default zero-initialization is desired, the default constructors can be overridden.
     1567For user-generated types, the four functions are also automatically generated.
     1568@enum@ types are handled the same as their underlying integral type, and unions are also bitwise copied and no-op initialized and destructed.
     1569For compatibility with C, a copy constructor from the first union member type is also defined.
     1570For @struct@ types, each of the four functions is implicitly defined to call its corresponding function on each member of the struct.
     1571To better simulate the behaviour of C initializers, a set of \newterm{field constructors} is also generated for structures.
     1572A constructor is generated for each non-empty prefix of a structure's member-list which copy-constructs the members passed as parameters and default-constructs the remaining members.
     1573To allow users to limit the set of constructors available for a type, when a user declares any constructor or destructor, the corresponding generated function and all field constructors for that type are hidden from expression resolution; similarly, the generated default constructor is hidden upon declaration of any constructor.
     1574These semantics closely mirror the rule for implicit declaration of constructors in \CC\cite[p.~186]{ANSI98:C++}.
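The field-constructor rule can be pictured with a hand-written C emulation.
This is only an illustrative sketch, not the code the \CFA compiler generates: the structure @Point@ and the routine names @Point_ctor_x@ and @Point_ctor_xy@ are hypothetical.
It assumes a structure with two basic-type members, so there is one constructor per non-empty prefix of the member list, and default construction of a basic type is a no-op as stated above.

```c
struct Point {
	int x;
	double y;
};

/* prefix (x): copy-construct x; y is default-constructed,
   which is a no-op for a basic type, so y is left untouched */
void Point_ctor_x( struct Point * self, int x ) {
	self->x = x;
}

/* prefix (x, y): copy-construct both members */
void Point_ctor_xy( struct Point * self, int x, double y ) {
	self->x = x;
	self->y = y;
}
```

Calling @Point_ctor_x@ on existing storage overwrites only @x@, which also illustrates why an explicit constructor call can double as a placement syntax.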
     1575
     1576In rare situations user programmers may not wish to have constructors and destructors called; in these cases, \CFA provides an ``escape hatch'' to not call them.
     1577If a variable is initialized using the syntax \lstinline|S x @= {}| it will be an \newterm{unmanaged object}, and will not have constructors or destructors called.
     1578Any C initializer can be the right-hand side of an \lstinline|@=| initializer, \eg  \lstinline|Array a @= { 0, 0x0 }|, with the usual C initialization semantics.
     1579In addition to the expressive power, \lstinline|@=| provides a simple path for migrating legacy C code to \CFA, by providing a mechanism to incrementally convert initializers; the \CFA design team decided to introduce a new syntax for this escape hatch because we believe that our RAII implementation will handle the vast majority of code in a desirable way, and we wished to maintain familiar syntax for this common case.
    12751580
    12761581\subsection{Default Parameters}
     
    13361641        TIMED( "copy_int", ti = si; )
    13371642        TIMED( "clear_int", clear( &si ); )
    1338         REPEAT_TIMED( "pop_int", N, 
     1643        REPEAT_TIMED( "pop_int", N,
    13391644                int xi = pop( &ti ); if ( xi > maxi ) { maxi = xi; } )
    13401645        REPEAT_TIMED( "print_int", N/2, print( out, vali, ":", vali, "\n" ); )
     
    13461651        TIMED( "copy_pair", tp = sp; )
    13471652        TIMED( "clear_pair", clear( &sp ); )
    1348         REPEAT_TIMED( "pop_pair", N, 
     1653        REPEAT_TIMED( "pop_pair", N,
    13491654                pair(_Bool, char) xp = pop( &tp ); if ( xp > maxp ) { maxp = xp; } )
    13501655        REPEAT_TIMED( "print_pair", N/2, print( out, valp, ":", valp, "\n" ); )
     
    13631668Note, the C benchmark uses unchecked casts as there is no runtime mechanism to perform such checks, while \CFA and \CC provide type-safety statically.
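The unchecked casts mentioned above can be illustrated with a minimal @void *@-based stack in C. This is a sketch under assumed names (@vstack@, @vstack_push@, @vstack_pop@, and the @int@ wrappers), not the benchmark's actual code: the caller must cast the result of @vstack_pop@ back to the correct element type, and nothing verifies that the cast is right.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical untyped stack: every element is stored as a void pointer. */
struct vnode { void *value; struct vnode *next; };
struct vstack { struct vnode *head; };

void vstack_push(struct vstack *s, void *value) {
    struct vnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->value = value;
    n->next = s->head;
    s->head = n;
}

void *vstack_pop(struct vstack *s) {
    assert(s->head != NULL);          /* popping an empty stack is a caller error */
    struct vnode *n = s->head;
    void *value = n->value;
    s->head = n->next;
    free(n);
    return value;
}

/* Typed wrappers: the element type exists only by convention. */
void vstack_push_int(struct vstack *s, int v) {
    int *p = malloc(sizeof *p);
    assert(p != NULL);
    *p = v;
    vstack_push(s, p);
}

int vstack_pop_int(struct vstack *s) {
    int *p = (int *)vstack_pop(s);    /* unchecked cast: nothing confirms this is an int */
    int v = *p;
    free(p);
    return v;
}
```

If a caller pushed a @char *@ and then called @vstack_pop_int@, the program would compile without complaint, which is the lack of checking the note refers to.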
    13641669
    1365 Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents. 
     1670Figure~\ref{fig:eval} and Table~\ref{tab:eval} show the results of running the benchmark in Figure~\ref{fig:BenchmarkTest} and its C, \CC, and \CCV equivalents.
    13661671The graph plots the median of 5 consecutive runs of each program, with an initial warm-up run omitted.
    13671672All code is compiled at \texttt{-O2} by GCC or G++ 6.2.0, with all \CC code compiled as \CCfourteen.
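The summary statistic described above, the median of the timed runs with the warm-up run discarded, could be computed along the following lines. This is only a sketch: the function name @median_after_warmup@, the bound @MAX_RUNS@, and the convention that the first array element is the warm-up run are assumptions, not details of the paper's benchmark harness.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_RUNS = 16 };               /* assumed upper bound on timed runs */

/* Three-way comparison of two doubles for qsort. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Returns the median of runs[1..n-1], treating runs[0] as the warm-up run. */
double median_after_warmup(const double *runs, size_t n) {
    assert(n >= 2 && n - 1 <= MAX_RUNS);
    double kept[MAX_RUNS];
    size_t m = n - 1;                 /* number of runs kept after dropping the warm-up */
    memcpy(kept, runs + 1, m * sizeof *kept);
    qsort(kept, m, sizeof *kept, cmp_double);
    /* Middle element for an odd count, mean of the two middle elements otherwise. */
    return (m % 2) ? kept[m / 2] : (kept[m / 2 - 1] + kept[m / 2]) / 2.0;
}
```

With one warm-up run followed by five timed runs, the function is called with @n@ equal to 6 and returns the third-smallest of the five timed values.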
     
    13971702Finally, the binary size for \CFA is larger because of static linking with the \CFA libraries.
    13981703
    1399 \CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 73 and 54 lines, respectively. 
     1704\CFA is also competitive in terms of source code size, measured as a proxy for programmer effort. The line counts in Table~\ref{tab:eval} include implementations of @pair@ and @stack@ types for all four languages for purposes of direct comparison, though it should be noted that \CFA and \CC have pre-written data structures in their standard libraries that programmers would generally use instead. Use of these standard library types has minimal impact on the performance benchmarks, but shrinks the \CFA and \CC benchmarks to 73 and 54 lines, respectively.
    14001705On the other hand, C does not have a generic collections-library in its standard distribution, resulting in frequent reimplementation of such collection types by C programmers.
    1401 \CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @bool@, @char@, @int@, and @const char *@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library; 
     1706\CCV does not use the \CC standard template library by construction, and in fact includes the definition of @object@ and wrapper classes for @bool@, @char@, @int@, and @const char *@ in its line count, which inflates this count somewhat, as an actual object-oriented language would include these in the standard library;
    14021707with their omission, the \CCV line count is similar to C.
    14031708We justify the given line count by noting that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or wrapper types, which may be similarly verbose.
     
    14921797In addition, there are interesting future directions for the polymorphism design.
    14931798Notably, \CC template functions trade compile time and code bloat for optimal runtime of individual instantiations of polymorphic functions.
    1494 \CFA polymorphic functions use dynamic virtual-dispatch; 
     1799\CFA polymorphic functions use dynamic virtual-dispatch;
    14951800the runtime overhead of this approach is low, but not as low as inlining, and it may be beneficial to provide a mechanism for performance-sensitive code.
    14961801Two promising approaches are an @inline@ annotation at polymorphic function call sites to create a template-specialization of the function (provided the code is visible) or placing an @inline@ annotation on polymorphic function-definitions to instantiate a specialized version for some set of types (\CC template specialization).
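The trade-off between dynamic dispatch and specialization can be sketched in C as an analogy; this is not the code the \CFA translator produces, and all names here (@max_generic@, @less_int@, @max_int@) are hypothetical. The generic routine receives the element size and ordering function at run time, while the specialized routine fixes the element type so the comparison can be compiled inline.

```c
#include <assert.h>
#include <stddef.h>

/* Generic version: element size and ordering are supplied at run time,
   so every comparison is an indirect call through a function pointer. */
const void *max_generic(const void *base, size_t n, size_t size,
                        int (*less)(const void *, const void *)) {
    assert(n > 0);
    const char *best = base;
    for (size_t i = 1; i < n; i++) {
        const char *cur = (const char *)base + i * size;
        if (less(best, cur)) best = cur;
    }
    return best;
}

/* Ordering function for int elements, passed to max_generic. */
int less_int(const void *a, const void *b) {
    return *(const int *)a < *(const int *)b;
}

/* Specialized version: the element type is fixed, so the comparison is a
   plain integer compare that the compiler can inline. */
int max_int(const int *base, size_t n) {
    assert(n > 0);
    int best = base[0];
    for (size_t i = 1; i < n; i++) {
        if (best < base[i]) best = base[i];
    }
    return best;
}
```

Both routines compute the same result for an @int@ array; the specialized one gives up generality and adds one more copy of the loop to the binary in exchange for avoiding the indirect calls.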