Timestamp: Feb 12, 2018, 3:49:04 PM
Author: Rob Schluntz <rschlunt@…>
Branches: ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children: 54c9000
Parents: 1dcd52a3, ff2d1139
Note: this is a merge changeset; the changes displayed below correspond to the merge itself.
Message: Merge branch 'master' of plg.uwaterloo.ca:/u/cforall/software/cfa/cfa-cc
File: 1 edited

```diff
--- a/doc/papers/general/Paper.tex (r1dcd52a3)
+++ b/doc/papers/general/Paper.tex (r7a052e34)
@@ -102,4 +102,11 @@
 \makeatother
 
+\newenvironment{cquote}{%
+        \list{}{\lstset{resetmargins=true,aboveskip=0pt,belowskip=0pt}\topsep=4pt\parsep=0pt\leftmargin=\parindent\rightmargin\leftmargin}%
+        \item\relax
+}{%
+        \endlist
+}% cquote
+
 % CFA programming language, based on ANSI C (with some gcc additions)
 \lstdefinelanguage{CFA}[ANSI]{C}{
```
     
```diff
@@ -227,8 +234,8 @@
 int forty_two = identity( 42 );                         $\C{// T is bound to int, forty\_two == 42}$
 \end{lstlisting}
-The @identity@ function above can be applied to any complete \emph{object type} (or @otype@).
+The @identity@ function above can be applied to any complete \newterm{object type} (or @otype@).
 The type variable @T@ is transformed into a set of additional implicit parameters encoding sufficient information about @T@ to create and return a variable of that type.
 The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor.
-If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \emph{data type} (or @dtype@).
+If this extra information is not needed, \eg for a pointer, the type parameter can be declared as a \newterm{data type} (or @dtype@).
 
 In \CFA, the polymorphism runtime-cost is spread over each polymorphic call, due to passing more arguments to polymorphic functions;
```
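The context lines above say that an `otype` parameter is lowered into implicit parameters carrying the type's size, alignment, and lifetime operations. Here is a minimal C sketch of that idea under a deliberately simplified assumption: only the size, the alignment, and an assignment function are passed. The real translation, per the text, also passes a constructor, copy constructor, and destructor. The names `identity_poly`, `assign_fn`, `int_assign`, and `call_identity_int` are hypothetical. This is not the code the CFA translator generates.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for a type's assignment operator: *dst = *src. */
typedef void (*assign_fn)(void *dst, const void *src);

/* Hypothetical lowering of `forall( otype T ) T identity( T x )`.
   The caller supplies T's size, alignment, and assignment operator,
   plus storage for the result. Size and alignment are unused in this
   simplified body but are passed to mirror the implicit parameters. */
static void identity_poly(size_t size, size_t align, assign_fn assign,
                          void *ret, const void *x) {
    (void)size;
    (void)align;
    assign(ret, x);   /* copy the argument into the caller's result slot */
}

/* Assignment operator for the concrete type int. */
static void int_assign(void *dst, const void *src) {
    memcpy(dst, src, sizeof(int));
}

/* Concrete call site corresponding to `int forty_two = identity( 42 );`. */
static int call_identity_int(int v) {
    int result;
    identity_poly(sizeof(int), _Alignof(int), int_assign, &result, &v);
    return result;
}
```

In this sketch, only the concrete call site knows that `T` is `int`. The polymorphic body works purely through the passed-in size, alignment, and function pointer.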
     
```diff
@@ -236,5 +243,5 @@
 A design advantage is that, unlike \CC template-functions, \CFA polymorphic-functions are compatible with C \emph{separate compilation}, preventing compilation and code bloat.
 
-Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \emph{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable.
+Since bare polymorphic-types provide a restricted set of available operations, \CFA provides a \newterm{type assertion}~\cite[pp.~37-44]{Alphard} mechanism to provide further type information, where type assertions may be variable or function declarations that depend on a polymorphic type-variable.
 For example, the function @twice@ can be defined using the \CFA syntax for operator overloading:
 \begin{lstlisting}
```
     
```diff
@@ -303,18 +310,8 @@
 \end{lstlisting}
 Here, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@.
-As well, restricted constant overloading is allowed for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
-In addition, several operations are defined in terms values @0@ and @1@, \eg:
-\begin{lstlisting}
-int x;
-if (x) x++                                                                      $\C{// if (x != 0) x += 1;}$
-\end{lstlisting}
-Every @if@ and iteration statement in C compares the condition with @0@, and every increment and decrement operator is semantically equivalent to adding or subtracting the value @1@ and storing the result.
-Due to these rewrite rules, the values @0@ and @1@ have the types @zero_t@ and @one_t@ in \CFA, which allows overloading various operations for new types that seamlessly connect to all special @0@ and @1@ contexts.
-The types @zero_t@ and @one_t@ have special built in implicit conversions to the various integral types, and a conversion to pointer types for @0@, which allows standard C code involving @0@ and @1@ to work as normal.
-
 
 \subsection{Traits}
 
-\CFA provides \emph{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
+\CFA provides \newterm{traits} to name a group of type assertions, where the trait name allows specifying the same set of assertions in multiple locations, preventing repetition mistakes at each function declaration:
 \begin{lstlisting}
 trait summable( otype T ) {
```
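The paragraph deleted in the hunk above rests on two C rewrite rules. A condition is compared against 0, and `++` adds 1 and stores the result. The following small C sketch checks those equivalences and also shows the dual role of 0 as a null pointer constant. The function names are illustrative only.

```c
/* `if (x) x++;` exactly as written in the removed example. */
static int bump_if_nonzero(int x) {
    if (x) x++;
    return x;
}

/* The same logic with the implicit comparison and increment spelled out. */
static int bump_if_nonzero_explicit(int x) {
    if (x != 0) x += 1;
    return x;
}

/* In a pointer context, the literal 0 is a null pointer constant. */
static int is_null(const int *p) {
    return p == 0;   /* equivalent to p == NULL */
}
```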
     
```diff
@@ -340,5 +337,5 @@
 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete type: stack-allocatable, default or copy-initialized, assigned, and deleted.
 
-In summation, the \CFA type-system uses \emph{nominal typing} for concrete types, matching with the C type-system, and \emph{structural typing} for polymorphic types.
+In summation, the \CFA type-system uses \newterm{nominal typing} for concrete types, matching with the C type-system, and \newterm{structural typing} for polymorphic types.
 Hence, trait names play no part in type equivalence;
 the names are simply macros for a list of polymorphic assertions, which are expanded at usage sites.
```
     
```diff
@@ -385,5 +382,5 @@
 Furthermore, writing and using preprocessor macros can be unnatural and inflexible.
 
-\CC, Java, and other languages use \emph{generic types} to produce type-safe abstract data-types.
+\CC, Java, and other languages use \newterm{generic types} to produce type-safe abstract data-types.
 \CFA also implements generic types that integrate efficiently and naturally with the existing polymorphic functions, while retaining backwards compatibility with C and providing separate compilation.
 However, for known concrete parameters, the generic-type definition can be inlined, like \CC templates.
```
     
```diff
@@ -406,7 +403,7 @@
 \end{lstlisting}
 
-\CFA classifies generic types as either \emph{concrete} or \emph{dynamic}.
+\CFA classifies generic types as either \newterm{concrete} or \newterm{dynamic}.
 Concrete types have a fixed memory layout regardless of type parameters, while dynamic types vary in memory layout depending on their type parameters.
-A type may have polymorphic parameters but still be concrete, called \emph{dtype-static}.
+A type may have polymorphic parameters but still be concrete, called \newterm{dtype-static}.
 Polymorphic pointers are an example of dtype-static types, \eg @forall(dtype T) T *@ is a polymorphic type, but for any @T@, @T *@  is a fixed-sized pointer, and therefore, can be represented by a @void *@ in code generation.
 
```
     
```diff
@@ -445,5 +442,5 @@
 Though \CFA implements concrete generic-types efficiently, it also has a fully general system for dynamic generic types.
 As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller.
-Dynamic generic-types also have an \emph{offset array} containing structure-member offsets.
+Dynamic generic-types also have an \newterm{offset array} containing structure-member offsets.
 A dynamic generic-union needs no such offset array, as all members are at offset 0, but size and alignment are still necessary.
 Access to members of a dynamic structure is provided at runtime via base-displacement addressing with the structure pointer and the member offset (similar to the @offsetof@ macro), moving a compile-time offset calculation to runtime.
```
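The hunk above describes member access in a dynamic generic structure as base-displacement addressing. The structure pointer is combined with a member offset that is read from an offset array at runtime, similar to `offsetof`. Below is a minimal C sketch. It assumes that a concrete struct stands in for an instantiated generic type and that the offset array has already been filled in. The names `pair_int_double`, `member_at`, and `read_second` are hypothetical.

```c
#include <stddef.h>

/* Concrete stand-in for an instantiation such as pair(int, double). */
struct pair_int_double {
    int first;
    double second;
};

/* Base-displacement access: structure address plus a member offset
   taken from an array at runtime instead of fixed by the compiler. */
static void *member_at(void *base, const size_t offsets[], size_t index) {
    return (char *)base + offsets[index];
}

/* Read the second member through the offset array. */
static double read_second(struct pair_int_double *p) {
    const size_t offsets[2] = {
        offsetof(struct pair_int_double, first),
        offsetof(struct pair_int_double, second),
    };
    return *(double *)member_at(p, offsets, 1);
}
```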
     
```diff
@@ -458,5 +455,5 @@
 For instance, modularity is generally provided in C by including an opaque forward-declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately-compiled @.c@ file.
 \CFA supports this pattern for generic types, but the caller does not know the actual layout or size of the dynamic generic-type, and only holds it by a pointer.
-The \CFA translator automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
+The \CFA translator automatically generates \newterm{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
 These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all @sized@ parameters to the generic structure (un@sized@ parameters are forbidden from being used in a context that affects layout).
 Results of these layout functions are cached so that they are only computed once per type per function. %, as in the example below for @pair@.
```
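The layout functions described above take three kinds of arguments: pointers to size and alignment outputs, a caller-allocated offset array, and the size and alignment of each `sized` parameter. The following minimal C sketch handles a two-member generic structure. It assumes members are laid out in declaration order using the padding rules that common C ABIs apply. The name `layout_pair` and its parameter order are hypothetical and do not reflect the translator's actual output. The caching mentioned in the text is omitted.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `align` (align must be nonzero). */
static size_t round_up(size_t offset, size_t align) {
    return (offset + align - 1) / align * align;
}

/* Hypothetical layout function for a generic
   `forall( otype T, otype U ) struct pair { T first; U second; };`.
   Results are written through caller-provided pointers and into a
   caller-allocated offset array. */
static void layout_pair(size_t *size, size_t *align, size_t offsets[2],
                        size_t size_T, size_t align_T,
                        size_t size_U, size_t align_U) {
    size_t off = 0;
    *align = align_T > align_U ? align_T : align_U;  /* strictest member alignment */
    offsets[0] = off;                                  /* first member at offset 0 */
    off += size_T;
    off = round_up(off, align_U);                      /* padding before second member */
    offsets[1] = off;
    off += size_U;
    *size = round_up(off, *align);                     /* trailing padding */
}
```

Under these assumptions, calling `layout_pair` with the sizes and alignments of `char` and `int` reproduces the layout a typical compiler picks for `struct { char first; int second; }`: offsets 0 and 4, and a total size of 8 on common platforms.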
     
```diff
@@ -482,5 +479,5 @@
 Since @pair(T *, T * )@ is a concrete type, there are no implicit parameters passed to @lexcmp@, so the generated code is identical to a function written in standard C using @void *@, yet the \CFA version is type-checked to ensure the fields of both pairs and the arguments to the comparison function match in type.
 
-Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \emph{tag-structures}.
+Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \newterm{tag-structures}.
 Sometimes information is only used for type-checking and can be omitted at runtime, \eg:
 \begin{lstlisting}
```
     
```diff
@@ -538,7 +535,7 @@
 The addition of multiple-return-value functions (MRVF) are useless without a syntax for accepting multiple values at the call-site.
 The simplest mechanism for capturing the return values is variable assignment, allowing the values to be retrieved directly.
-As such, \CFA allows assigning multiple values from a function into multiple variables, using a square-bracketed list of lvalue expressions (as above), called a \emph{tuple}.
-
-However, functions also use \emph{composition} (nested calls), with the direct consequence that MRVFs must also support composition to be orthogonal with single-returning-value functions (SRVF), \eg:
+As such, \CFA allows assigning multiple values from a function into multiple variables, using a square-bracketed list of lvalue expressions (as above), called a \newterm{tuple}.
+
+However, functions also use \newterm{composition} (nested calls), with the direct consequence that MRVFs must also support composition to be orthogonal with single-returning-value functions (SRVF), \eg:
 \begin{lstlisting}
 printf( "%d %d\n", div( 13, 5 ) );                      $\C{// return values seperated into arguments}$
```
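Compare the composed call `printf( "%d %d\n", div( 13, 5 ) )` above with standard C. There, `div` already returns two values, but they come packaged in a `div_t` structure. Its `quot` and `rem` members must be unpacked by hand before they can be passed as separate arguments. A small C sketch follows. `format_div` is an illustrative helper and is not part of the paper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Standard C: div returns a single div_t, so each member is named
   explicitly instead of being spread into printf-style arguments. */
static int format_div(char *buf, size_t n, int num, int den) {
    div_t qr = div(num, den);
    return snprintf(buf, n, "%d %d\n", qr.quot, qr.rem);
}
```

With the arguments 13 and 5, the buffer receives `"2 3\n"`.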
     
```diff
@@ -573,5 +570,5 @@
 printf( "%d %d\n", qr );
 \end{lstlisting}
-\CFA also supports \emph{tuple indexing} to access single components of a tuple expression:
+\CFA also supports \newterm{tuple indexing} to access single components of a tuple expression:
 \begin{lstlisting}
 [int, int] * p = &qr;                                           $\C{// tuple pointer}$
```
     
```diff
@@ -616,6 +613,6 @@
 \subsection{Tuple Assignment}
 
-An assignment where the left side is a tuple type is called \emph{tuple assignment}.
-There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called \emph{multiple} and \emph{mass assignment}, respectively.
+An assignment where the left side is a tuple type is called \newterm{tuple assignment}.
+There are two kinds of tuple assignment depending on whether the right side of the assignment operator has a tuple type or a non-tuple type, called \newterm{multiple} and \newterm{mass assignment}, respectively.
 %\lstDeleteShortInline@%
 %\par\smallskip
```
     
```diff
@@ -651,5 +648,5 @@
 \subsection{Member Access}
 
-It is also possible to access multiple fields from a single expression using a \emph{member-access}.
+It is also possible to access multiple fields from a single expression using a \newterm{member-access}.
 The result is a single tuple-valued expression whose type is the tuple of the types of the members, \eg:
 \begin{lstlisting}
```
     
```diff
@@ -781,5 +778,5 @@
 Matching against a @ttype@ parameter consumes all remaining argument components and packages them into a tuple, binding to the resulting tuple of types.
 In a given parameter list, there must be at most one @ttype@ parameter that occurs last, which matches normal variadic semantics, with a strong feeling of similarity to \CCeleven variadic templates.
-As such, @ttype@ variables are also called \emph{argument packs}.
+As such, @ttype@ variables are also called \newterm{argument packs}.
 
 Like variadic templates, the main way to manipulate @ttype@ polymorphic functions is via recursion.
```
     
```diff
@@ -853,5 +850,5 @@
 \subsection{Implementation}
 
-Tuples are implemented in the \CFA translator via a transformation into \emph{generic types}.
+Tuples are implemented in the \CFA translator via a transformation into \newterm{generic types}.
 For each $N$, the first time an $N$-tuple is seen in a scope a generic type with $N$ type parameters is generated, \eg:
 \begin{lstlisting}
```
     
```diff
@@ -904,5 +901,5 @@
 Similarly, tuple member expressions are recursively expanded into a list of member access expressions.
 
-Expressions that may contain side effects are made into \emph{unique expressions} before being expanded by the flattening conversion.
+Expressions that may contain side effects are made into \newterm{unique expressions} before being expanded by the flattening conversion.
 Each unique expression is assigned an identifier and is guaranteed to be executed exactly once:
 \begin{lstlisting}
```
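The rule above says that side-effecting expressions become unique expressions, each evaluated exactly once. This matters because flattening a tuple naively would duplicate the expression. The C sketch below shows the difference. It assumes a hypothetical lowering of a `[int, int]` tuple to a plain struct. The names `pair_ii`, `field_0`, `field_1`, `next_pair`, and the `calls` counter are illustrative and are not the translator's generated code.

```c
/* Hypothetical lowering of a [int, int] tuple to a plain struct. */
struct pair_ii { int field_0, field_1; };

static int calls = 0;   /* counts evaluations of the side-effecting call */

/* A side-effecting function returning a two-component tuple. */
static struct pair_ii next_pair(void) {
    calls++;
    struct pair_ii r = { calls, calls * 10 };
    return r;
}

/* Naive flattening of `[a, b] = next_pair();`: one call per component,
   so the side effect runs twice and the components come from different calls. */
static void assign_naive(int *a, int *b) {
    *a = next_pair().field_0;
    *b = next_pair().field_1;
}

/* Unique-expression flattening: evaluate once into a temporary,
   then assign each component from that temporary. */
static void assign_unique(int *a, int *b) {
    struct pair_ii tmp = next_pair();
    *a = tmp.field_0;
    *b = tmp.field_1;
}
```

Starting from `calls == 0`, `assign_unique` makes one call and yields the consistent pair (1, 10). `assign_naive` makes two calls and mixes components into (1, 20).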
     
```diff
@@ -1047,6 +1044,30 @@
 The implicit targets of the current @continue@ and @break@, \ie the closest enclosing loop or @switch@, change as certain constructs are added or removed.
 
-\TODO{choose and fallthrough here as well?}
-
+\subsection{\texorpdfstring{Enhanced \LstKeywordStyle{switch} Statement}{Enhanced switch Statement}}
+
+\CFA also fixes a number of ergonomic defecits in the @switch@ statements of standard C.
+C can specify a number of equivalent cases by using the default ``fall-through'' semantics of @case@ clauses, \eg @case 1: case 2: case 3:@ -- this syntax is cluttered, however, so \CFA includes a more concise list syntax, @case 1, 2, 3:@.
+For contiguous ranges, \CFA provides an even more concise range syntax as well, @case 1~3:@; lists of ranges are also allowed in case selectors.
+
+Forgotten @break@ statements at the end of @switch@ cases are a persistent sort of programmer error in C, and the @break@ statements themselves introduce visual clutter and an un-C-like keyword-based block delimiter.
+\CFA addresses this error by introducing a @choose@ statement, which works identically to a @switch@ except that its default end-of-case behaviour is to break rather than to fall through for all non-empty cases.
+Since empty cases like @case 7:@ in @case 7: case 11:@ still have fall-through semantics and explicit @break@ is still allowed at the end of a @choose@ case, many idiomatic uses of @switch@ in standard C can be converted to @choose@ statements by simply changing the keyword.
+Where fall-through is desired for a non-empty case, it can be specified with the new @fallthrough@ statement, making @choose@ equivalently powerful to @switch@, but more concise in the common case where most non-empty cases end with a @break@ statement, as in the example below:
+
+\begin{cfa}
+choose( i ) {
+        case 2:
+                printf("even ");
+                fallthrough;
+        case 3: case 5: case 7:
+                printf("small prime\n");
+        case 4,6,8,9:
+                printf("small composite\n");
+        case 13~19:
+                printf("teen\n");
+        default:
+                printf("something else\n");
+}
+\end{cfa}
 
 \subsection{\texorpdfstring{\LstKeywordStyle{with} Clause / Statement}{with Clause / Statement}}
```
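To make the `choose` semantics concrete, here is a sketch of how the `choose` example in the hunk above could be written with a standard C `switch`. Every non-empty case gets an explicit `break` except the one that falls through. The list and range selectors are spelled out as individual labels. (GCC also accepts `case 13 ... 19:` as a non-standard extension.) To keep the result checkable, the sketch appends to a static buffer instead of calling `printf`. `classify` is a hypothetical name.

```c
#include <string.h>

/* Standard C version of the choose example. Returns a pointer to a
   static buffer, so each call overwrites the previous result. */
static const char *classify(int i) {
    static char out[32];
    out[0] = '\0';
    switch (i) {
    case 2:
        strcat(out, "even ");
        /* fall through: what CFA spells `fallthrough;` */
    case 3: case 5: case 7:
        strcat(out, "small prime\n");
        break;                                /* implicit at the end of a CFA choose case */
    case 4: case 6: case 8: case 9:           /* CFA: case 4,6,8,9: */
        strcat(out, "small composite\n");
        break;
    case 13: case 14: case 15: case 16:       /* CFA: case 13~19: */
    case 17: case 18: case 19:
        strcat(out, "teen\n");
        break;
    default:
        strcat(out, "something else\n");
        break;
    }
    return out;
}
```

For example, `classify(2)` produces `"even small prime\n"`, while `classify(9)` produces only `"small composite\n"`.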
     
```diff
@@ -1091,5 +1112,5 @@
 % In object-oriented programming, there is an implicit first parameter, often names @self@ or @this@, which is elided.
 % In any programming language, some functions have a naturally close relationship with a particular data type.
-% Object-oriented programming allows this close relationship to be codified in the language by making such functions \emph{class methods} of their related data type.
+% Object-oriented programming allows this close relationship to be codified in the language by making such functions \newterm{class methods} of their related data type.
 % Class methods have certain privileges with respect to their associated data type, notably un-prefixed access to the fields of that data type.
 % When writing C functions in an object-oriented style, this un-prefixed access is swiftly missed, as access to fields of a @Foo* f@ requires an extra three characters @f->@ every time, which disrupts coding flow and clutters the produced code.
```
     
```diff
@@ -1205,5 +1226,5 @@
 C declaration syntax is notoriously confusing and error prone.
 For example, many C programmers are confused by a declaration as simple as:
-\begin{flushleft}
+\begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}ll@{}}
```
     
```diff
@@ -1215,7 +1236,7 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{flushleft}
+\end{cquote}
 Is this an array of 5 pointers to integers or a pointer to an array of 5 integers?
-The fact this declaration is unclear to many C programmers means there are productivity and safety issues even for basic programs.
+If there is any doubt, it implies productivity and safety issues even for basic programs.
 Another example of confusion results from the fact that a routine name and its parameters are embedded within the return type, mimicking the way the return value is used at the routine's call site.
 For example, a routine returning a pointer to an array of integers is defined and used in the following way:
```
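On the question posed in the hunk above: in C, `int *a[5]` declares an array of five pointers to `int`, because `[]` binds more tightly than `*`. Declaring a single pointer to an array of five `int` needs parentheses: `int (*p)[5]`. The small C sketch below makes the difference visible through `sizeof` and through how each object is indexed. The variable and function names are illustrative.

```c
#include <stddef.h>

static int values[5] = { 1, 2, 3, 4, 5 };

/* Array of 5 pointers to int: [] binds tighter than *. */
static int *ptrs[5];

/* Pointer to an array of 5 ints: the parentheses make * bind first. */
static int (*arr_ptr)[5] = &values;

/* Sum through the array of pointers, pointing each slot at values[k]. */
static int sum_via_ptrs(void) {
    int s = 0;
    for (size_t k = 0; k < 5; k++) {
        ptrs[k] = &values[k];
        s += *ptrs[k];
    }
    return s;
}

/* Sum through the single pointer to the whole array. */
static int sum_via_arr_ptr(void) {
    int s = 0;
    for (size_t k = 0; k < 5; k++)
        s += (*arr_ptr)[k];
    return s;
}
```

`sizeof ptrs` is five pointers' worth of storage, whereas `*arr_ptr` designates the whole array, so `sizeof *arr_ptr` is five `int`s.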
     
```diff
@@ -1231,5 +1252,5 @@
 In the following example, \R{red} is the base type and \B{blue} is qualifiers.
 The \CFA declarations move the qualifiers to the left of the base type, \ie move the blue to the left of the red, while the qualifiers have the same meaning but are ordered left to right to specify a variable's type.
-\begin{quote}
+\begin{cquote}
 \lstDeleteShortInline@%
 \lstset{moredelim=**[is][\color{blue}]{+}{+}}
```
     
```diff
@@ -1249,10 +1270,10 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{quote}
+\end{cquote}
 The only exception is bit field specification, which always appear to the right of the base type.
-% Specifically, the character ©*© is used to indicate a pointer, square brackets ©[©\,©]© are used to represent an array or function return value, and parentheses ©()© are used to indicate a routine parameter.
+% Specifically, the character @*@ is used to indicate a pointer, square brackets @[@\,@]@ are used to represent an array or function return value, and parentheses @()@ are used to indicate a routine parameter.
 However, unlike C, \CFA type declaration tokens are distributed across all variables in the declaration list.
-For instance, variables ©x© and ©y© of type pointer to integer are defined in \CFA as follows:
-\begin{quote}
+For instance, variables @x@ and @y@ of type pointer to integer are defined in \CFA as follows:
+\begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{3em}}l@{}}
```
     
```diff
@@ -1267,7 +1288,7 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{quote}
+\end{cquote}
 The downside of this semantics is the need to separate regular and pointer declarations:
-\begin{quote}
+\begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{3em}}l@{}}
```
     
```diff
@@ -1284,8 +1305,8 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{quote}
+\end{cquote}
 which is prescribing a safety benefit.
 Other examples are:
-\begin{quote}
+\begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}}
```
     
```diff
@@ -1325,8 +1346,8 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{quote}
-
-All type qualifiers, \eg ©const©, ©volatile©, etc., are used in the normal way with the new declarations and also appear left to right, \eg:
-\begin{quote}
+\end{cquote}
+
+All type qualifiers, \eg @const@, @volatile@, etc., are used in the normal way with the new declarations and also appear left to right, \eg:
+\begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{1em}}l@{\hspace{1em}}l@{}}
```
     
```diff
@@ -1348,8 +1369,8 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{quote}
-All declaration qualifiers, \eg ©extern©, ©static©, etc., are used in the normal way with the new declarations but can only appear at the start of a \CFA routine declaration,\footnote{\label{StorageClassSpecifier}
+\end{cquote}
+All declaration qualifiers, \eg @extern@, @static@, etc., are used in the normal way with the new declarations but can only appear at the start of a \CFA routine declaration,\footnote{\label{StorageClassSpecifier}
 The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature.~\cite[\S~6.11.5(1)]{C11}} \eg:
-\begin{quote}
+\begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{3em}}l@{\hspace{2em}}l@{}}
```
     
```diff
@@ -1371,43 +1392,49 @@
 \end{tabular}
 \lstMakeShortInline@%
-\end{quote}
-
-The new declaration syntax can be used in other contexts where types are required, \eg casts and the pseudo-routine ©sizeof©:
-\begin{quote}
+\end{cquote}
+
+The new declaration syntax can be used in other contexts where types are required, \eg casts and the pseudo-routine @sizeof@:
+\begin{cquote}
 \lstDeleteShortInline@%
 \begin{tabular}{@{}l@{\hspace{3em}}l@{}}
 \multicolumn{1}{c@{\hspace{3em}}}{\textbf{\CFA}}        & \multicolumn{1}{c}{\textbf{C}}        \\
 \begin{cfa}
-y = (`* int`)x;
-i = sizeof(`[ 5 ] * int`);
+y = (* int)x;
+i = sizeof([ 5 ] * int);
 \end{cfa}
 &
 \begin{cfa}
-y = (`int *`)x;
-i = sizeof(`int * [ 5 ]`);
+y = (int *)x;
+i = sizeof(int * [ 5 ]);
 \end{cfa}
 \end{tabular}
 \lstMakeShortInline@%
-\end{quote}
+\end{cquote}
 
 Finally, new \CFA declarations may appear together with C declarations in the same program block, but cannot be mixed within a specific declaration.
 Therefore, a programmer has the option of either continuing to use traditional C declarations or take advantage of the new style.
-Clearly, both styles need to be supported for some time due to existing C-style header-files, particularly for UNIX systems.
+Clearly, both styles need to be supported for some time due to existing C-style header-files, particularly for UNIX-like systems.
 
 
 \subsection{References}
 
-All variables in C have an \emph{address}, a \emph{value}, and a \emph{type}; at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
-The C type system does not always track the relationship between a value and its address; a value that does not have a corresponding address is called a \emph{rvalue} (for ``right-hand value''), while a value that does have an address is called a \emph{lvalue} (for ``left-hand value''); in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
-Which address a value is located at is sometimes significant; the imperative programming paradigm of C relies on the mutation of values at specific addresses.
-Within a lexical scope, lvalue exressions can be used in either their \emph{address interpretation} to determine where a mutated value should be stored or in their \emph{value interpretation} to refer to their stored value; in @x = y;@ in @{ int x, y = 7; x = y; }@, @x@ is used in its address interpretation, while y is used in its value interpretation.
-Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \emph{pointer types} to serve a similar purpose.
-In C, for any type @T@ there is a pointer type @T*@, the value of which is the address of a value of type @T@; a pointer rvalue can be explicitly \emph{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
+All variables in C have an \newterm{address}, a \newterm{value}, and a \newterm{type};
+at the position in the program's memory denoted by the address, there exists a sequence of bits (the value), with the length and semantic meaning of this bit sequence defined by the type.
+The C type-system does not always track the relationship between a value and its address;
+a value that does not have a corresponding address is called a \newterm{rvalue} (for ``right-hand value''), while a value that does have an address is called a \newterm{lvalue} (for ``left-hand value'').
+For example, in @int x; x = 42;@ the variable expression @x@ on the left-hand-side of the assignment is a lvalue, while the constant expression @42@ on the right-hand-side of the assignment is a rvalue.
+Despite the nomenclature of ``left-hand'' and ``right-hand'', an expression's classification as lvalue or rvalue is entirely dependent on whether it has an address or not; in imperative programming, the address of a value is used for both reading and writing (mutating) a value, and as such lvalues can be converted to rvalues and read from, but rvalues cannot be mutated because they lack a location to store the updated value.
+
+Within a lexical scope, lvalue expressions have an \newterm{address interpretation} for writing a value or a \newterm{value interpretation} to read a value.
+For example, in @x = y@, @x@ has an address interpretation, while @y@ has a value interpretation.
+Though this duality of interpretation is useful, C lacks a direct mechanism to pass lvalues between contexts, instead relying on \newterm{pointer types} to serve a similar purpose.
+In C, for any type @T@ there is a pointer type @T *@, the value of which is the address of a value of type @T@.
+A pointer rvalue can be explicitly \newterm{dereferenced} to the pointed-to lvalue with the dereference operator @*?@, while the rvalue representing the address of a lvalue can be obtained with the address-of operator @&?@.
 
 \begin{cfa}
 int x = 1, y = 2, * p1, * p2, ** p3;
-p1 = &x;  $\C{// p1 points to x}$
-p2 = &y;  $\C{// p2 points to y}$
-p3 = &p1;  $\C{// p3 points to p1}$
+p1 = &x;                                                                $\C{// p1 points to x}$
+p2 = &y;                                                                $\C{// p2 points to y}$
+p3 = &p1;                                                               $\C{// p3 points to p1}$
 *p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);
 \end{cfa}
```
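Apart from the `$\C{...}$` comment macros, the pointer example in the hunk above is plain C. The sketch below wraps it in a function (the name `pointer_example` is illustrative) to show that every read and write of `x` and `y` goes through an explicit dereference. With `x == 1` and `y == 2`, the update computes `((1 + 2) * (1 - 1)) / (1 - 15)`, which is `0 / -14`, so `y` becomes 0.

```c
/* Runnable C version of the pointer example; returns the final y
   and stores the final x through x_out. */
static int pointer_example(int *x_out) {
    int x = 1, y = 2, *p1, *p2, **p3;
    p1 = &x;    /* p1 points to x */
    p2 = &y;    /* p2 points to y */
    p3 = &p1;   /* p3 points to p1 */
    /* *p2 on the left is used for its address; every other operand is read:
       ((1 + 2) * (1 - 1)) / (1 - 15) == 0 / -14 == 0 */
    *p2 = ((*p1 + *p2) * (**p3 - *p1)) / (**p3 - 15);
    *x_out = x;
    return y;
}
```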
     
```diff
@@ -1415,9 +1442,9 @@
 Unfortunately, the dereference and address-of operators introduce a great deal of syntactic noise when dealing with pointed-to values rather than pointers, as well as the potential for subtle bugs.
 For both brevity and clarity, it would be desirable to have the compiler figure out how to elide the dereference operators in a complex expression such as the assignment to @*p2@ above.
-However, since C defines a number of forms of \emph{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
+However, since C defines a number of forms of \newterm{pointer arithmetic}, two similar expressions involving pointers to arithmetic types (\eg @*p1 + x@ and @p1 + x@) may each have well-defined but distinct semantics, introducing the possibility that a user programmer may write one when they mean the other, and precluding any simple algorithm for elision of dereference operators.
 To solve these problems, \CFA introduces reference types @T&@; a @T&@ has exactly the same value as a @T*@, but where the @T*@ takes the address interpretation by default, a @T&@ takes the value interpretation by default, as below:
 
 \begin{cfa}
-inx x = 1, y = 2, & r1, & r2, && r3;
+int x = 1, y = 2, & r1, & r2, && r3;
 &r1 = &x;  $\C{// r1 points to x}$
 &r2 = &y;  $\C{// r2 points to y}$
```
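The pointer-arithmetic ambiguity mentioned above can be seen directly in C. With `p1` pointing into an array, `*p1 + x` adds `x` to the pointed-to value, while `p1 + x` advances the pointer by `x` elements. A small C sketch with illustrative names:

```c
static int arr[3] = { 10, 20, 30 };

/* *p1 + x: dereference first, then ordinary integer addition (arr[0] + x). */
static int deref_then_add(int x) {
    int *p1 = arr;
    return *p1 + x;
}

/* p1 + x: pointer arithmetic, moving x elements ahead (then read arr[x]). */
static int add_then_deref(int x) {
    int *p1 = arr;
    return *(p1 + x);
}
```

Both expressions are well-typed for any small integer `x`, yet `deref_then_add(1)` gives 11 while `add_then_deref(1)` gives 20. This is why a compiler cannot simply guess where a dereference was intended.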
     
```diff
@@ -1441,4 +1468,5 @@
 This allows \CFA references to be default-initialized (\eg to a null pointer), and also to point to different addresses throughout their lifetime.
 This rebinding is accomplished without adding any new syntax to \CFA, but simply by extending the existing semantics of the address-of operator in C.
+
 In C, the address of a lvalue is always a rvalue, as in general that address is not stored anywhere in memory, and does not itself have an address.
 In \CFA, the address of a @T&@ is a lvalue @T*@, as the address of the underlying @T@ is stored in the reference, and can thus be mutated there.
```
     
        if @L@ is an lvalue of type {@T &@$_1 \cdots$@ &@$_l$} where $l \ge 0$ references (@&@ symbols) then @&L@ has type {@T `*`&@$_{\color{red}1} \cdots$@ &@$_{\color{red}l}$}, \\ \ie @T@ pointer with $l$ references (@&@ symbols).
\end{itemize}
Since pointers and references share the same internal representation, code using either is equally performant; in fact the \CFA compiler converts references to pointers internally, and the choice between them in user code can be made based solely on convenience.

By analogy to pointers, \CFA references also allow cv-qualifiers such as @const@:

\begin{cfa}
     

More generally, this initialization of references from lvalues rather than pointers is an instance of an ``lvalue-to-reference'' conversion rather than an elision of the address-of operator; this conversion can be used in any context in \CFA where an implicit conversion would be allowed.
Similarly, use of the value pointed to by a reference in an rvalue context can be thought of as a ``reference-to-rvalue'' conversion, and \CFA also includes a qualifier-adding ``reference-to-reference'' conversion, analogous to the @T *@ to @const T *@ conversion in standard C.
The final reference conversion included in \CFA is the ``rvalue-to-reference'' conversion, implemented by means of an implicit temporary.
When an rvalue is used to initialize a reference, it is instead used to initialize a hidden temporary value with the same lexical scope as the reference, and the reference is initialized to the address of this temporary.
This allows complex values to be succinctly and efficiently passed to functions, without the syntactic overhead of explicit definition of a temporary variable or the runtime cost of pass-by-value.
\CC allows a similar binding, but only for @const@ references; the more general semantics of \CFA are an attempt to avoid the \newterm{const hell} problem, in which addition of a @const@ qualifier to one reference requires a cascading chain of added qualifiers.
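
Standard C offers only a partial analogue of rvalue-to-reference conversion: a C99 compound literal is an unnamed object whose lifetime is the enclosing block, so its address can be passed to a function expecting a pointer. The sketch below (type and function names are hypothetical) spells out explicitly what a \CFA reference parameter would do implicitly:

\begin{cfa}
// Hypothetical type and functions for illustration.
typedef struct { double re, im; } Complex;

// Takes a pointer so the struct is not copied on each call.
double norm2(const Complex * c) {
	return c->re * c->re + c->im * c->im;
}

double norm2_of_literal(void) {
	// The C99 compound literal is an unnamed object living until the end of
	// the enclosing block; its address is passed explicitly here, whereas a
	// CFA reference parameter would bind the rvalue to an implicit temporary.
	return norm2(&(Complex){ 3.0, 4.0 });
}
\end{cfa}
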


\subsection{Constructors and Destructors}
     
One of the strengths of C is the control over memory management it gives programmers, allowing resource release to be more consistent and precisely timed than is possible with garbage-collected memory management.
However, this manual approach to memory management is often verbose, and it is useful to manage resources other than memory (\eg file handles) using the same mechanism as memory.
\CC is well-known for an approach to manual memory management that addresses both these issues, Resource Acquisition Is Initialization (RAII), implemented by means of special \newterm{constructor} and \newterm{destructor} functions; we have implemented a similar feature in \CFA.
While RAII is a common feature of object-oriented programming languages, its inclusion in \CFA does not violate the design principle that \CFA retain the same procedural paradigm as C.
In particular, \CFA does not implement class-based encapsulation: neither the constructor nor any other function has privileged access to the implementation details of a type, except through the translation-unit-scope method of opaque structs provided by C.
     
\end{cfa}

In the example above, a \newterm{default constructor} (\ie one with no parameters besides the @this@ parameter) and destructor are defined for the @Array@ struct, a dynamic array of @int@.
@Array@ is an example of a \newterm{managed type} in \CFA, a type with a non-trivial constructor or destructor, or with a field of a managed type.
As in the example, all instances of managed types are implicitly constructed upon allocation, and destructed upon deallocation; this ensures proper initialization and cleanup of resources contained in managed types, in this case the @data@ array on the heap.
The exact details of the placement of these implicit constructor and destructor calls are omitted here for brevity; the interested reader should consult \cite{Schluntz17}.
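
For comparison, the following C sketch writes out by hand the constructor and destructor calls that \CFA inserts implicitly for a managed type. Only the @data@ field comes from the example above; the @len@ field, the initial size of 10, and all function names are assumptions made for this illustration:

\begin{cfa}
#include <stdlib.h>

// Hypothetical C rendering of a managed dynamic array of int.
typedef struct { int * data; int len; } Array;

void Array_ctor(Array * this) {           // role of the default constructor
	this->data = calloc(10, sizeof(int)); // zero-filled heap storage
	this->len = this->data ? 10 : 0;
}
void Array_dtor(Array * this) {           // role of the destructor
	free(this->data);
	this->data = NULL;
	this->len = 0;
}

int sum_first_three(void) {
	Array a;
	Array_ctor(&a);                       // CFA: implicit, upon allocation
	int s = -1;                           // -1 signals allocation failure
	if (a.len >= 3) {
		a.data[0] = 1; a.data[1] = 2; a.data[2] = 3;
		s = a.data[0] + a.data[1] + a.data[2];
	}
	Array_dtor(&a);                       // CFA: implicit, upon deallocation
	return s;
}
\end{cfa}

Forgetting either call in C leaks or leaves the array uninitialized; in \CFA the compiler places both calls automatically.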

Constructor calls are intended to seamlessly integrate with existing C initialization syntax, providing a simple and familiar syntax to veteran C programmers and allowing constructor calls to be inserted into legacy C code with minimal code changes.
As such, \CFA also provides syntax for \newterm{copy initialization} and \newterm{initialization parameters}:

\begin{cfa}
     
In addition to initialization syntax, \CFA provides two ways to explicitly call constructors and destructors.
Explicit calls to constructors double as a placement syntax, useful for construction of member fields in user-defined constructors and reuse of large storage allocations.
While the existing function-call syntax works for explicit calls to constructors and destructors, \CFA also provides a more concise \newterm{operator syntax} for both:

\begin{cfa}
     
For compatibility with C, a copy constructor from the first union member type is also defined.
For @struct@ types, each of the four functions is implicitly defined to call its corresponding function on each member of the struct.
To better simulate the behaviour of C initializers, a set of \newterm{field constructors} is also generated for structures.
A constructor is generated for each non-empty prefix of a structure's member list; it copy-constructs the members passed as parameters and default-constructs the remaining members.
To allow users to limit the set of constructors available for a type, when a user declares any constructor or destructor, the corresponding generated function and all field constructors for that type are hidden from expression resolution; similarly, the generated default constructor is hidden upon declaration of any constructor.
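
The C initializer behaviour that field constructors simulate can be seen in plain C. For a hypothetical structure with members @a@, @b@, and @c@, each brace initializer supplies a prefix of the members and zero-initializes the rest. \CFA would generate one field constructor per prefix (taking @a@; @a, b@; and @a, b, c@), but it default-constructs the members not passed rather than zeroing them:

\begin{cfa}
// Hypothetical structure; in C, members omitted from a brace initializer
// are zero-initialized, giving one initializer form per member prefix.
struct S { int a, b, c; };

struct S make_prefix_1(void) { struct S s = { 1 };       return s; } // a
struct S make_prefix_2(void) { struct S s = { 1, 2 };    return s; } // a, b
struct S make_prefix_3(void) { struct S s = { 1, 2, 3 }; return s; } // a, b, c
\end{cfa}
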
     
    15601589
    15611590In rare situations user programmers may not wish to have constructors and destructors called; in these cases, \CFA provides an ``escape hatch'' to not call them.
If a variable is initialized using the syntax \lstinline|S x @= {}|, it will be an \newterm{unmanaged object} and will not have constructors or destructors called.
Any C initializer can be the right-hand side of an \lstinline|@=| initializer, \eg \lstinline|Array a @= { 0, 0x0 }|, with the usual C initialization semantics.
In addition to the expressive power, \lstinline|@=| provides a simple path for migrating legacy C code to \CFA, by providing a mechanism to incrementally convert initializers; the \CFA design team decided to introduce a new syntax for this escape hatch because we believe that our RAII implementation will handle the vast majority of code in a desirable way, and we wished to maintain familiar syntax for this common case.
     
\section{Literals}

C already includes limited polymorphism for literals: @0@ can be either an integer or a pointer literal, depending on context, while the syntactic forms of literals of the various integer and floating-point types are very similar, differing from each other only in suffix.
In keeping with the general \CFA approach of adding features while respecting ``the C way'' of doing things, we have extended both C's polymorphic zero and typed literal syntax to interoperate with user-defined types, while maintaining backwards-compatible semantics.

\subsection{0/1}

In C, @0@ has the special property that it is the only ``false'' value; by the standard, any value that compares equal to @0@ is false, while any value that compares unequal to @0@ is true.
As such, an expression @x@ in any boolean context (such as the condition of an @if@ or @while@ statement, the operands of @&&@ and @||@, or the condition of the ternary operator) can be rewritten as @x != 0@ without changing its semantics.
The operator overloading feature of \CFA provides a natural means to implement this truth value comparison for arbitrary types, but the C type system is not precise enough to distinguish an equality comparison with @0@ from an equality comparison with an arbitrary integer or pointer.
To provide this precision, \CFA introduces a new type @zero_t@ as the type of literal @0@ (somewhat analogous to @nullptr_t@ and @nullptr@ in \CCeleven); @zero_t@ can only take the value @0@, but has implicit conversions to the integer and pointer types so that standard C code involving @0@ continues to work properly.
With this addition, the \CFA compiler rewrites @if (x)@ and similar expressions to @if ((x) != 0)@ or the appropriate analogue, and any type @T@ can be made ``truthy'' by defining an operator overload @int ?!=?(T, zero_t)@.
\CC makes types truthy by adding a conversion to @bool@; prior to the addition of explicit conversion operators in \CCeleven, this approach had the pitfall of making truthy types transitively convertible to any numeric type; our design for \CFA avoids this issue.
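
The following C sketch checks this equivalence for built-in types and shows what the rewrite amounts to for a user-defined type. The @Rational@ type and the function @rational_ne_zero@, which stands in for a \CFA overload @int ?!=?(Rational, zero_t)@, are hypothetical:

\begin{cfa}
#include <stddef.h>

// For built-in types, C already treats x in a boolean context as x != 0.
int same_truth_int(int x)          { return (x ? 1 : 0) == (x != 0); }
int same_truth_double(double d)    { return (d ? 1 : 0) == (d != 0); }
int same_truth_ptr(const void * p) { return (p ? 1 : 0) == (p != 0); }

// Hypothetical user type: a rational is "false" when its numerator is zero.
typedef struct { long num, den; } Rational;

// Stands in for the CFA overload  int ?!=?(Rational, zero_t).
int rational_ne_zero(Rational r) { return r.num != 0; }

// What  if (rs[i])  means once rewritten to  if ((rs[i]) != 0).
size_t count_truthy(const Rational * rs, size_t n) {
	size_t count = 0;
	for (size_t i = 0; i < n; i += 1)
		if (rational_ne_zero(rs[i])) count += 1;
	return count;
}
\end{cfa}
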

\CFA also includes a special type for @1@, @one_t@; like @zero_t@, @one_t@ has built-in implicit conversions to the various integral types so that @1@ maintains its expected semantics in legacy code.
The addition of @one_t@ allows generic algorithms to handle the unit value uniformly for types where that is meaningful.
\TODO{Make this sentence true} In particular, polymorphic functions in the \CFA prelude define @++x@ and @x++@ in terms of @x += 1@, allowing users to idiomatically define all forms of increment for a type @T@ by defining the single function @T& ?+=?(T&, one_t)@; analogous overloads for the decrement operators are present as well.
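
Since this prelude support is still being completed, the following is only a C sketch of the intended pattern. With a hypothetical @Rational@ type and hypothetical function names, both increment forms are derived from a single add-one function that plays the role of @?+=?(T &, one_t)@:

\begin{cfa}
// Hypothetical user type and functions; in CFA the single user-written
// function would be  Rational & ?+=?(Rational &, one_t)  and the prelude
// would derive ++x and x++ from it.
typedef struct { long num, den; } Rational;

Rational * rational_add_one(Rational * x) {   // x += 1
	x->num += x->den;
	return x;
}
Rational * rational_pre_inc(Rational * x) {   // ++x, defined via x += 1
	return rational_add_one(x);
}
Rational rational_post_inc(Rational * x) {    // x++, defined via x += 1
	Rational old = *x;
	rational_add_one(x);
	return old;
}
\end{cfa}
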

\subsection{Units}
     
\end{cfa}
}%

\section{Evaluation}
     
Finally, we demonstrate that \CFA performance for some idiomatic cases is better than C and close to \CC, showing the design is practically applicable.

There is ongoing work on a wide range of \CFA feature extensions, including arrays with size, exceptions, concurrent primitives, modules, and user-defined conversions.
(While all examples in the paper compile and run, a public beta-release of \CFA will take another 8--12 months to finalize these additional extensions.)
In addition, there are interesting future directions for the polymorphism design.