Changeset e869e434


Timestamp:
Apr 12, 2017, 3:57:53 PM
Author:
Aaron Moss <a3moss@…>
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-env, no_list, persistent-indexer, resolv-new, with_gc
Children:
ff178ee
Parents:
b14dd03, 0eb18557
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Message:

Merge branch 'master' of plg.uwaterloo.ca:software/cfa/cfa-cc

Location:
doc
Files:
9 edited

  • doc/LaTeXmacros/common.tex

    rb14dd03 re869e434  
     1
    12%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -*- Mode: Latex -*- %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    23%%
     
    1112%% Created On       : Sat Apr  9 10:06:17 2016
    1213%% Last Modified By : Peter A. Buhr
    13 %% Last Modified On : Wed Apr  5 23:19:42 2017
    14 %% Update Count     : 255
     14%% Last Modified On : Wed Apr 12 11:32:26 2017
     15%% Update Count     : 257
    1516%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    1617
     
    4445\newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20} % C++20 symbolic name
    4546\newcommand{\Celeven}{C11\xspace}               % C11 symbolic name
    46 \newcommand{\Csharp}{C\raisebox{0.4ex}{\#}\xspace}      % C# symbolic name
     47\newcommand{\Csharp}{C\raisebox{-0.65ex}{\large$^\sharp$}\xspace}       % C# symbolic name
    4748
    4849%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  • doc/generic_types/generic_types.tex

    rb14dd03 re869e434  
    1919\newcommand{\C}[2][\@empty]{\ifx#1\@empty\else\global\setlength{\columnposn}{#1}\global\columnposn=\columnposn\fi\hfill\makebox[\textwidth-\columnposn][l]{\lst@commentstyle{#2}}}
    2020\newcommand{\CRT}{\global\columnposn=\gcolumnposn}
     21
     22\newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
     23%\newcommand{\TODO}[1]{} % TODO elided
     24% Latin abbreviation
     25\newcommand{\abbrevFont}{\textit}       % set empty for no italics
     26\newcommand*{\eg}{%
     27        \@ifnextchar{,}{\abbrevFont{e}.\abbrevFont{g}.}%
     28                {\@ifnextchar{:}{\abbrevFont{e}.\abbrevFont{g}.}%
     29                        {\abbrevFont{e}.\abbrevFont{g}.,\xspace}}%
     30}%
     31\newcommand*{\ie}{%
     32        \@ifnextchar{,}{\abbrevFont{i}.\abbrevFont{e}.}%
     33                {\@ifnextchar{:}{\abbrevFont{i}.\abbrevFont{e}.}%
     34                        {\abbrevFont{i}.\abbrevFont{e}.,\xspace}}%
     35}%
     36\newcommand*{\etc}{%
     37        \@ifnextchar{.}{\abbrevFont{etc}}%
     38        {\abbrevFont{etc}.\xspace}%
     39}%
     40\newcommand{\etal}{%
     41        \@ifnextchar{.}{\abbrevFont{et~al}}%
     42                {\abbrevFont{et al}.\xspace}%
     43}%
     44% \newcommand{\eg}{\textit{e}.\textit{g}.,\xspace}
     45% \newcommand{\ie}{\textit{i}.\textit{e}.,\xspace}
     46% \newcommand{\etc}{\textit{etc}.,\xspace}
    2147\makeatother
    2248
     
    3056\newcommand{\CS}{C\raisebox{-0.7ex}{\Large$^\sharp$}\xspace}
    3157\newcommand{\Textbf}[1]{{\color{red}\textbf{#1}}}
    32 
    33 \newcommand{\TODO}[1]{\textbf{TODO}: {\itshape #1}} % TODO included
    34 %\newcommand{\TODO}[1]{} % TODO elided
    35 \newcommand{\eg}{\textit{e}.\textit{g}.,\xspace}
    36 \newcommand{\ie}{\textit{i}.\textit{e}.,\xspace}
    37 \newcommand{\etc}{\textit{etc}.,\xspace}
    3858
    3959% CFA programming language, based on ANSI C (with some gcc additions)
     
    137157                & 2017  & 2012  & 2007  & 2002  & 1997  & 1992  & 1987          \\
    138158\hline
    139 Java    & 1             & 1             & 1             & 3             & 13    & -             & -                     \\
     159Java    & 1             & 1             & 1             & 1             & 12    & -             & -                     \\
    140160\hline
    141 \Textbf{C}      & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}& \Textbf{1}    \\
     161\Textbf{C}      & \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{2}& \Textbf{1}& \Textbf{1}& \Textbf{1}    \\
    142162\hline
    143163\CC             & 3             & 3             & 3             & 3             & 2             & 2             & 4                     \\
     
    155175(4) Extensions introduced by \CFA must be translated in the most efficient way possible.
    156176These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used.
    157 We claim \CC is diverging from C, and hence, incremental additions of language features require significant effort and training, while suffering from historically poor design choices.
     177Unfortunately, \CC is actively diverging from C, so incremental additions require significant effort and training, coupled with multiple legacy design-choices that cannot be updated.
    158178
    159179\CFA is currently implemented as a source-to-source translator from \CFA to the GCC-dialect of C~\citep{GCCExtensions}, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). Ultimately, a compiler is necessary for advanced features and optimal performance.
     
    179199int val = twice( twice( 3.7 ) );
    180200\end{lstlisting}
    181 which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type (as in~\cite{Ada}) in its type analysis. The first approach has a late conversion from @int@ to @double@ on the final assignment, while the second has an eager conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition.
     201which works for any type @T@ with a matching addition operator. The polymorphism is achieved by creating a wrapper function for calling @+@ with @T@ bound to @double@, then passing this function to the first call of @twice@. There is now the option of using the same @twice@ and converting the result to @int@ on assignment, or creating another @twice@ with type parameter @T@ bound to @int@ because \CFA uses the return type, as in~\cite{Ada}, in its type analysis.
     202The first approach has a late conversion from @double@ to @int@ on the final assignment, while the second has an eager conversion to @int@. \CFA minimizes the number of conversions and their potential to lose information, so it selects the first approach, which corresponds with C-programmer intuition.
    182203
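The wrapper-function mechanism described above can be pictured with a plain-C sketch. It assumes a dictionary-passing translation: twice receives the size of T and a pointer to the + operation for T, and it handles values only through untyped pointers. The names twice_poly, add_double, and twice_twice are hypothetical, and this is not the code the CFA translator actually emits. Both calls bind T to double, so the only conversion to int is the late one on the final assignment, which matches the first approach.

```c
#include <assert.h>
#include <stddef.h>

/* untyped form of the assertion T ?+?( T, T ): writes a + b into ret */
typedef void (*add_fn)( void * ret, const void * a, const void * b );

/* hypothetical translation of:
   forall( otype T | { T ?+?( T, T ); } ) T twice( T x ) { return x + x; }
   sizeof_T is the implicit size parameter; this body needs no temporaries, so it is unused */
static void twice_poly( size_t sizeof_T, add_fn add, void * ret, const void * x ) {
	(void)sizeof_T;
	add( ret, x, x );                       /* return x + x */
}

/* wrapper function for + with T bound to double */
static void add_double( void * ret, const void * a, const void * b ) {
	*(double *)ret = *(const double *)a + *(const double *)b;
}

/* int val = twice( twice( v ) ); both calls use T = double */
int twice_twice( double v ) {
	double inner, outer;
	twice_poly( sizeof(double), add_double, &inner, &v );
	twice_poly( sizeof(double), add_double, &outer, &inner );
	return (int)outer;                      /* single late conversion from double to int */
}
```

With v = 3.7, the double result is about 14.8 and the assignment truncates it to 14.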
    183204Crucial to the design of a new programming language are the libraries to access thousands of external software features.
     
    186207\begin{lstlisting}
    187208void * bsearch( const void * key, const void * base, size_t nmemb, size_t size,
    188                                 int (* compar)(const void *, const void *));
     209                                int (* compar)( const void *, const void * ));
    189210int comp( const void * t1, const void * t2 ) { return *(double *)t1 < *(double *)t2 ? -1 :
    190211                                *(double *)t2 < *(double *)t1 ? 1 : 0; }
     
    204225int posn = bsearch( 5.0, vals, 10 );
    205226\end{lstlisting}
    206 The nested routine @comp@ (impossible in \CC as lambdas do not use C calling conventions) provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result.
     227The nested function @comp@ provides the hidden interface from typed \CFA to untyped (@void *@) C, plus the cast of the result.
     228Providing a hidden @comp@ function in \CC is awkward as lambdas do not use C calling-conventions and template declarations cannot appear at block scope.
    207229As well, an alternate kind of return is made available: position versus pointer to found element.
    208230\CC's type-system cannot disambiguate between the two versions of @bsearch@ because it does not use the return type in overload resolution, nor can \CC separately compile a templated @bsearch@.
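For comparison, the following plain-C sketch shows the work a CFA user is spared: the untyped comparator from the listing above is handed to the standard bsearch, and two typed wrappers are written by hand. Because C has no overloading, and certainly none on return type, the pointer-returning and position-returning versions need distinct names. bsearch_ptr_double and bsearch_pos_double are hypothetical names, and returning n for a missing key is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* comparator from the text: the untyped interface required by bsearch */
static int comp( const void * t1, const void * t2 ) {
	return *(const double *)t1 < *(const double *)t2 ? -1 :
		*(const double *)t2 < *(const double *)t1 ? 1 : 0;
}

/* hypothetical typed wrapper: pointer to the found element, or NULL */
double * bsearch_ptr_double( double key, double * vals, size_t n ) {
	return (double *)bsearch( &key, vals, n, sizeof(double), comp );
}

/* hypothetical typed wrapper: position of the found element, or n if absent */
size_t bsearch_pos_double( double key, double * vals, size_t n ) {
	double * res = bsearch_ptr_double( key, vals, n );
	return res != NULL ? (size_t)( res - vals ) : n;
}
```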
     
    248270Hence, the single name @MAX@ replaces all the C type-specific names: @SHRT_MAX@, @INT_MAX@, @DBL_MAX@.
    249271As well, restricted constant overloading is allowed for the values @0@ and @1@, which have special status in C, \eg the value @0@ is both an integer and a pointer literal, so its meaning depends on context.
    250 In addition, several operations are defined in terms values @0@ and @1@.
    251 For example,
     272In addition, several operations are defined in terms of the values @0@ and @1@, \eg:
    252273\begin{lstlisting}
    253274int x;
     
    275296        return total; }
    276297\end{lstlisting}
    277 A trait name plays no part in type equivalence; it is solely a macro for a list of assertions.
    278 Traits may overlap assertions without conflict, and therefore, do not form a hierarchy.
    279 
    280 In fact, the set of operators is incomplete, \eg no assignment, but @otype@ is syntactic sugar for the following implicit trait:
     298
     299In fact, the set of trait operators is incomplete, as there is no assignment requirement for type @T@, but @otype@ is syntactic sugar for the following implicit trait:
    281300\begin{lstlisting}
    282301trait otype( dtype T | sized(T) ) {  // sized is a pseudo-trait for types with known size and alignment
     
    308327% \end{lstlisting}
    309328
    310 Traits may be used for many of the same purposes as interfaces in Java or abstract base classes in \CC. Unlike Java interfaces or \CC base classes, \CFA types do not explicitly state any inheritance relationship to traits they satisfy, which is a form of structural inheritance, similar to the implementation of an interface in Go~\citep{Go}, as opposed to the nominal inheritance model of Java and \CC.
    311 
    312 Nominal inheritance can be simulated with traits using marker variables or functions:
    313 \begin{lstlisting}
    314 trait nominal(otype T) {
    315     T is_nominal;
     329In summary, the \CFA type-system uses \emph{nominal typing} for concrete types, matching the C type-system, and \emph{structural typing} for polymorphic types.
     330Hence, trait names play no part in type equivalence;
     331the names are simply macros for a list of polymorphic assertions, which are expanded at usage sites.
     332Nevertheless, trait names form a logical subtype-hierarchy with @dtype@ at the top, where traits often contain overlapping assertions, \eg operator @+@.
     333Traits are used like interfaces in Java or abstract base-classes in \CC, but without the nominal inheritance-relationships.
     334Instead, each polymorphic function (or generic type) defines the structural type needed for its execution (polymorphic type-key), and this key is fulfilled at each call site from the lexical environment, which is similar to Go~\citep{Go} interfaces.
     335Hence, new lexical scopes and nested functions are used extensively to create local subtypes, as in the @qsort@ example, without having to manage a nominal-inheritance hierarchy.
     336(Nominal inheritance can be approximated with traits using marker variables or functions, as is done in Go.)
     337
     338% Nominal inheritance can be simulated with traits using marker variables or functions:
     339% \begin{lstlisting}
     340% trait nominal(otype T) {
     341%     T is_nominal;
     342% };
     343% int is_nominal;                                                               $\C{// int now satisfies the nominal trait}$
     344% \end{lstlisting}
     345%
     346% Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
     347% \begin{lstlisting}
     348% trait pointer_like(otype Ptr, otype El) {
     349%     lvalue El *?(Ptr);                                                $\C{// Ptr can be dereferenced into a modifiable value of type El}$
     350% }
     351% struct list {
     352%     int value;
     353%     list *next;                                                               $\C{// may omit "struct" on type names as in \CC}$
     354% };
     355% typedef list *list_iterator;
     356%
     357% lvalue int *?( list_iterator it ) { return it->value; }
     358% \end{lstlisting}
     359% In the example above, @(list_iterator, int)@ satisfies @pointer_like@ by the user-defined dereference function, and @(list_iterator, list)@ also satisfies @pointer_like@ by the built-in dereference operator for pointers. Given a declaration @list_iterator it@, @*it@ can be either an @int@ or a @list@, with the meaning disambiguated by context (\eg @int x = *it;@ interprets @*it@ as an @int@, while @(*it).value = 42;@ interprets @*it@ as a @list@).
     360% While a nominal-inheritance system with associated types could model one of those two relationships by making @El@ an associated type of @Ptr@ in the @pointer_like@ implementation, few such systems could model both relationships simultaneously.
     361
     362
     363\section{Generic Types}
     364
     365One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to create data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures.
     366A second approach is to use @void *@--based polymorphism, \eg the C standard-library functions @bsearch@ and @qsort@, which allows common code to be shared for common functionality. However, basing all polymorphism on @void *@ eliminates the type-checker's ability to ensure that argument types are properly matched, and often requires extra function parameters, pointer indirection, and dynamic allocation that would not otherwise be needed.
     367A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type-checked, but errors may be difficult to interpret. Furthermore, writing and using preprocessor macros can be unnatural and inflexible.
     368
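The second and third approaches can be contrasted with a small plain-C sketch: a single void *-based pair that holds every member behind an untyped pointer, and a hypothetical DECLARE_PAIR macro that stamps out a separately type-checked structure and accessor for each instantiation. Neither is CFA code; both only illustrate the trade-offs described above.

```c
#include <assert.h>

/* void *-based polymorphism: one definition for all member types, but every member
   is held by pointer and the type-checker cannot verify the casts at use sites */
struct vpair {
	void * first;
	void * second;
};

/* preprocessor macro: one expansion per instantiation, fully type-checked, but any
   error is reported against the expanded text rather than the macro definition */
#define DECLARE_PAIR( name, R, S ) \
	struct name { R first; S second; }; \
	static inline S name##_second( struct name p ) { return p.second; }

DECLARE_PAIR( pair_str_int, const char *, int )
```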
     369Other languages use \emph{generic types}, \eg \CC and Java, to produce type-safe abstract data-types. \CFA also implements generic types that integrate efficiently and naturally with the existing polymorphic functions, while retaining backwards compatibility with C and providing separate compilation. However, for known concrete parameters, the generic type can be inlined, like \CC templates.
     370
     371A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name:
     372\begin{lstlisting}
     373forall( otype R, otype S ) struct pair {
     374        R first;
     375        S second;
    316376};
    317 int is_nominal;                                                         $\C{// int now satisfies the nominal trait}$
    318 \end{lstlisting}
    319 
    320 Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship \emph{among} multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems:
    321 \begin{lstlisting}
    322 trait pointer_like(otype Ptr, otype El) {
    323     lvalue El *?(Ptr);                                          $\C{// Ptr can be dereferenced into a modifiable value of type El}$
    324 }
    325 struct list {
    326     int value;
    327     list *next;                                                         $\C{// may omit "struct" on type names as in \CC}$
    328 };
    329 typedef list *list_iterator;
    330 
    331 lvalue int *?( list_iterator it ) { return it->value; }
    332 \end{lstlisting}
    333 
    334 In the example above, @(list_iterator, int)@ satisfies @pointer_like@ by the user-defined dereference function, and @(list_iterator, list)@ also satisfies @pointer_like@ by the built-in dereference operator for pointers. Given a declaration @list_iterator it@, @*it@ can be either an @int@ or a @list@, with the meaning disambiguated by context (\eg @int x = *it;@ interprets @*it@ as an @int@, while @(*it).value = 42;@ interprets @*it@ as a @list@).
    335 While a nominal-inheritance system with associated types could model one of those two relationships by making @El@ an associated type of @Ptr@ in the @pointer_like@ implementation, few such systems could model both relationships simultaneously.
    336 
    337 \section{Generic Types}
    338 
    339 One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to create data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures. A second approach is to use @void*@-based polymorphism. This approach is taken by the C standard library functions @qsort@ and @bsearch@, and does allow the use of common code for common functionality. However, basing all polymorphism on @void*@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requires a number of extra function parameters, and also adds pointer indirection and dynamic allocation to algorithms and data structures that would not otherwise require them. A third approach to generic code is to use pre-processor macros to generate it -- this approach does allow the generated code to be both generic and type-checked, though any errors produced may be difficult to interpret. Furthermore, writing and invoking C code as preprocessor macros is unnatural and somewhat inflexible.
    340 
    341 Other C-like languages such as \CC and Java use \emph{generic types} to produce type-safe abstract data types. \CFA implements generic types with some care taken that the generic types design for \CFA integrates efficiently and naturally with the existing polymorphic functions in \CFA while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there is no extra overhead for the use of a generic type, as for \CC templates.
    342 
    343 A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name:
    344 \begin{lstlisting}
    345 forall(otype R, otype S) struct pair {
    346     R first;
    347     S second;
    348 };
    349 
    350 forall(otype T)
    351 T value( pair(const char*, T) p ) { return p.second; }
    352 
    353 forall(dtype F, otype T)
    354 T value_p( pair(F*, T*) p ) { return *p.second; }
    355 
    356 pair(const char*, int) p = { "magic", 42 };
     377forall( otype T ) T value( pair( const char *, T ) p ) { return p.second; }
     378forall( dtype F, otype T ) T value_p( pair( F *, T * ) p ) { return *p.second; }
     379
     380pair( const char *, int ) p = { "magic", 42 };
    357381int magic = value( p );
    358 
    359 pair(void*, int*) q = { 0, &p.second };
     382pair( void *, int * ) q = { 0, &p.second };
    360383magic = value_p( q );
    361384double d = 1.0;
    362 pair(double*, double*) r = { &d, &d };
     385pair( double *, double * ) r = { &d, &d };
    363386d = value_p( r );
    364387\end{lstlisting}
    365388
    366 \CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Concrete generic types have a fixed memory layout regardless of type parameters, while dynamic generic types vary in their in-memory layout depending on their type parameters. A type may have polymorphic parameters but still be concrete; in \CFA such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ has exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation.
    367 
    368 \CFA generic types may also specify constraints on their argument type to be checked by the compiler. For example, consider the following declaration of a sorted set-type, which ensures that the set key supports equality and relational comparison:
    369 \begin{lstlisting}
    370 forall(otype Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); })
    371   struct sorted_set;
    372 \end{lstlisting}
    373 
    374 \subsection{Concrete Generic Types}
    375 
    376 The \CFA translator instantiates concrete generic types by template-expanding them to fresh struct types; concrete generic types can therefore be used with zero runtime overhead. To enable inter-operation among equivalent instantiations of a generic type, the translator saves the set of instantiations currently in scope and reuses the generated struct declarations where appropriate. For example, a function declaration that accepts or returns a concrete generic type produces a declaration for the instantiated struct in the same scope, which all callers that can see that declaration may reuse. As an example of the expansion, the concrete instantiation for @pair(const char*, int)@ looks like this:
     389\CFA classifies generic types as either \emph{concrete} or \emph{dynamic}. Concrete types have a fixed memory layout regardless of type parameters, while dynamic types vary in memory layout depending on their type parameters. A type may have polymorphic parameters but still be concrete; such types are called \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types, \eg @forall(dtype T) T *@ is a polymorphic type, but for any @T@, @T *@ is a fixed-sized pointer, and therefore, can be represented by a @void *@ in code generation.
     390
     391\CFA generic types also allow checked argument-constraints. For example, the following declaration of a sorted set-type ensures the set key supports equality and relational comparison:
     392\begin{lstlisting}
     393forall( otype Key | { _Bool ?==?(Key, Key); _Bool ?<?(Key, Key); } ) struct sorted_set;
     394\end{lstlisting}
     395
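A checked constraint such as the one on Key lets any code that manipulates the set rely on == and < existing for the key type. The plain-C sketch below approximates this by passing the two operations explicitly as function pointers, alongside the key size, in the same spirit as the implicit assertion parameters of polymorphic functions. The names sorted_set_sketch, sorted_set_insert, int_eq, and int_lt are hypothetical, and the fixed-capacity sorted array is an assumption chosen for brevity, not the design of any CFA library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* untyped forms of the two operations the Key constraint demands */
typedef bool (*key_eq_fn)( const void * a, const void * b );
typedef bool (*key_lt_fn)( const void * a, const void * b );

/* hypothetical sorted set: keys stored contiguously in ascending order, fixed capacity */
struct sorted_set_sketch {
	char * keys;                            /* storage for cap keys of sizeof_Key bytes each */
	size_t size, cap, sizeof_Key;
};

/* insert key, keeping keys sorted and unique; returns false only if the set is full */
bool sorted_set_insert( struct sorted_set_sketch * s, const void * key, key_eq_fn eq, key_lt_fn lt ) {
	size_t n = s->sizeof_Key, i = 0;
	while ( i < s->size && lt( s->keys + i * n, key ) ) i += 1;     /* first key not less than key */
	if ( i < s->size && eq( s->keys + i * n, key ) ) return true;   /* already present */
	if ( s->size == s->cap ) return false;
	memmove( s->keys + ( i + 1 ) * n, s->keys + i * n, ( s->size - i ) * n );  /* open a gap */
	memcpy( s->keys + i * n, key, n );
	s->size += 1;
	return true;
}

/* operations for Key = int, supplied by the caller */
static bool int_eq( const void * a, const void * b ) { return *(const int *)a == *(const int *)b; }
static bool int_lt( const void * a, const void * b ) { return *(const int *)a < *(const int *)b; }
```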
     396
     397\subsection{Concrete Generic-Types}
     398
     399The \CFA translator template-expands concrete generic-types into new structure types, affording maximal inlining. To enable inter-operation among equivalent instantiations of a generic type, the translator saves the set of instantiations currently in scope and reuses the generated structure declarations where appropriate. For example, a function declaration that accepts or returns a concrete generic-type produces a declaration for the instantiated struct in the same scope, which all callers may reuse. Concretely, the instantiation for @pair( const char *, int )@ is:
    377400\begin{lstlisting}
    378401struct _pair_conc1 {
    379         const char* first;
     402        const char * first;
    380403        int second;
    381404};
    382405\end{lstlisting}
    383406
    384 A concrete generic type with dtype-static parameters is also expanded to a struct type, but this struct type is used for all matching instantiations. In the example above, the @pair(F*, T*)@ parameter to @value_p@ is such a type; its expansion looks something like this, and is used as the type of the variables @q@ and @r@ as well, with casts for member access where appropriate:
     407A concrete generic-type with dtype-static parameters is also expanded to a structure type, but this type is used for all matching instantiations. In the above example, the @pair( F *, T * )@ parameter to @value_p@ is such a type; its expansion is below and it is used as the type of the variables @q@ and @r@ as well, with casts for member access where appropriate:
    385408\begin{lstlisting}
    386409struct _pair_conc0 {
    387         void* first;
    388         void* second;
     410        void * first;
     411        void * second;
    389412};
    390413\end{lstlisting}
    391414
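Sharing one expansion among all dtype-static instantiations can be sketched in plain C. The two structures below are the expansions shown above; value_p is a hypothetical translation of the polymorphic value_p, assuming the size of the otype parameter T arrives as an implicit argument and the result is copied into caller-provided storage. The same function and the same structure type then serve both q (T = int) and r (T = double); only the call sites differ. This calling convention is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* expansion of pair( const char *, int ) */
struct _pair_conc1 {
	const char * first;
	int second;
};

/* single expansion shared by every dtype-static instantiation pair( F *, T * ) */
struct _pair_conc0 {
	void * first;
	void * second;
};

/* hypothetical translation of:
   forall( dtype F, otype T ) T value_p( pair( F *, T * ) p ) { return *p.second; }
   T is known only through its size, so the result is copied into ret */
void value_p( size_t sizeof_T, void * ret, struct _pair_conc0 p ) {
	memcpy( ret, p.second, sizeof_T );
}
```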
    392415
    393 \subsection{Dynamic Generic Types}
    394 
    395 Though \CFA implements concrete generic types efficiently, it also has a fully general system for computing with dynamic generic types. As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller. Dynamic generic structs also have implicit size and alignment parameters, and also an \emph{offset array} which contains the offsets of each member of the struct\footnote{Dynamic generic unions need no such offset array, as all members are at offset 0; the size and alignment parameters are still provided for dynamic unions, however.}. Access to members\footnote{The \lstinline@offsetof@ macro is implemented similarly.} of a dynamic generic struct is provided by adding the corresponding member of the offset array to the struct pointer at runtime, essentially moving a compile-time offset calculation to runtime where necessary.
    396 
    397 These offset arrays are statically generated where possible. If a dynamic generic type is declared to be passed or returned by value from a polymorphic function, the translator can safely assume that the generic type is complete (that is, has a known layout) at any call-site, and the offset array is passed from the caller; if the generic type is concrete at the call site the elements of this offset array can even be statically generated using the C @offsetof@ macro. As an example, @p.second@ in the @value@ function above is implemented as @*(p + _offsetof_pair[1])@, where @p@ is a @void*@, and @_offsetof_pair@ is the offset array passed in to @value@ for @pair(const char*, T)@. The offset array @_offsetof_pair@ is generated at the call site as @size_t _offsetof_pair[] = { offsetof(_pair_conc1, first), offsetof(_pair_conc1, second) };@.
    398 
    399 In some cases the offset arrays cannot be statically generated. For instance, modularity is generally provided in C by including an opaque forward-declaration of a struct and associated accessor and mutator routines in a header file, with the actual implementations in a separately-compiled \texttt{.c} file. \CFA supports this pattern for generic types, and in this instance the caller does not know the actual layout or size of the dynamic generic type, and only holds it by pointer. The \CFA translator automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed in to a function from that function's caller. These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all @sized@ parameters to the generic struct (un-@sized@ parameters are forbidden from the language from being used in a context that affects layout). Results of these layout functions are cached so that they are only computed once per type per function.%, as in the example below for @pair@.
     416\subsection{Dynamic Generic-Types}
     417
     418Though \CFA implements concrete generic-types efficiently, it also has a fully general system for dynamic generic types.
     419As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller.
     420Dynamic generic-types also have an \emph{offset array} containing structure member-offsets.
     421A dynamic generic-union needs no such offset array, as all members are at offset 0, but size and alignment are still necessary.
     422Access to members of a dynamic structure is provided at runtime via base-displacement addressing with the structure pointer and the member offset (similar to the @offsetof@ macro), moving a compile-time offset calculation to runtime.
     423
     424The offset arrays are statically generated where possible.
     425If a dynamic generic-type is declared to be passed or returned by value from a polymorphic function, the translator can safely assume the generic type is complete (\ie has a known layout) at any call-site, and the offset array is passed from the caller;
     426if the generic type is concrete at the call site, the elements of this offset array can even be statically generated using the C @offsetof@ macro.
     427As an example, @p.second@ in the @value@ function above is implemented as @*(p + _offsetof_pair[1])@, where @p@ is a @void *@, and @_offsetof_pair@ is the offset array passed into @value@ for @pair( const char *, T )@.
     428The offset array @_offsetof_pair@ is generated at the call site as @size_t _offsetof_pair[] = { offsetof(_pair_conc1, first), offsetof(_pair_conc1, second) }@.
     429
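The offset-array access can be made concrete with a plain-C sketch of the value function from the earlier example. It assumes a calling convention in which the translated value receives the size of T, the offset array for pair( const char *, T ), storage for the result, and the pair as an untyped pointer; the names value and call_value are hypothetical. Because arithmetic on void * is a GCC extension, the sketch adds the offset through a char * instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* expansion of pair( const char *, int ) at the call site */
struct _pair_conc1 {
	const char * first;
	int second;
};

/* hypothetical translation of: forall( otype T ) T value( pair( const char *, T ) p ) { return p.second; }
   p arrives as an untyped pointer and the member offsets arrive as an offset array */
void value( size_t sizeof_T, const size_t * _offsetof_pair, void * ret, const void * p ) {
	/* p.second  ==>  *(p + _offsetof_pair[1]) */
	memcpy( ret, (const char *)p + _offsetof_pair[1], sizeof_T );
}

/* call site where T = int is concrete: the offset array is built with offsetof */
int call_value( void ) {
	struct _pair_conc1 p = { "magic", 42 };
	size_t _offsetof_pair[] = { offsetof( struct _pair_conc1, first ), offsetof( struct _pair_conc1, second ) };
	int magic;
	value( sizeof(int), _offsetof_pair, &magic, &p );
	return magic;
}
```

When T is concrete at the call site, as in call_value, every entry of the offset array is a compile-time constant, so only the base-displacement addition remains at runtime.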
     430In some cases the offset arrays cannot be statically generated. For instance, modularity is generally provided in C by including an opaque forward-declaration of a structure and associated accessor and mutator functions in a header file, with the actual implementations in a separately-compiled @.c@ file.
     431\CFA supports this pattern for generic types, but the caller does not know the actual layout or size of the dynamic generic-type, and only holds it by a pointer.
     432The \CFA translator automatically generates \emph{layout functions} for cases where the size, alignment, and offset array of a generic struct cannot be passed into a function from that function's caller.
     433These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all @sized@ parameters to the generic structure (un@sized@ parameters are forbidden from being used in a context that affects layout).
     434Results of these layout functions are cached so that they are only computed once per type per function. %, as in the example below for @pair@.
    400435% \begin{lstlisting}
    401436% static inline void _layoutof_pair(size_t* _szeof_pair, size_t* _alignof_pair, size_t* _offsetof_pair,
     
    403438%     *_szeof_pair = 0; // default values
    404439%     *_alignof_pair = 1;
    405 
     440%
    406441%       // add offset, size, and alignment of first field
    407442%     _offsetof_pair[0] = *_szeof_pair;
    408443%     *_szeof_pair += _szeof_R;
    409444%     if ( *_alignof_pair < _alignof_R ) *_alignof_pair = _alignof_R;
    410 
     445%
    411446%       // padding, offset, size, and alignment of second field
    412447%     if ( *_szeof_pair & (_alignof_S - 1) )
     
    415450%     *_szeof_pair += _szeof_S;
    416451%     if ( *_alignof_pair < _alignof_S ) *_alignof_pair = _alignof_S;
    417 
     452%
    418453%       // pad to struct alignment
    419454%     if ( *_szeof_pair & (*_alignof_pair - 1) )
     
    421456% }
    422457% \end{lstlisting}
    423 
    424 Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature. For instance, a function that strips duplicate values from an unsorted @vector(T)@ would likely have a pointer to the vector as its only explicit parameter, but use some sort of @set(T)@ internally to test for duplicate values. This function could acquire the layout for @set(T)@ by calling its layout function with the layout of @T@ implicitly passed into the function.
    425 
    426 Whether a type is concrete, dtype-static, or dynamic is decided based solely on the type parameters and @forall@ clause on the struct declaration. This design allows opaque forward declarations of generic types like @forall(otype T) struct Box;@ -- like in C, all uses of @Box(T)@ can be in a separately compiled translation unit, and callers from other translation units know the proper calling conventions to use. If the definition of a struct type was included in the decision of whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static (\eg @forall(otype T) struct unique_ptr { T* p };@ does not depend on @T@ for its layout, but the existence of an @otype@ parameter means that it \emph{could}.), but preserving separate compilation (and the associated C compatibility) in the existing design is judged to be an appropriate trade-off.
     458Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature.
     459For instance, a function that strips duplicate values from an unsorted @vector(T)@ would likely have a pointer to the vector as its only explicit parameter, but use some sort of @set(T)@ internally to test for duplicate values.
     460This function could acquire the layout for @set(T)@ by calling its layout function with the layout of @T@ implicitly passed into the function.
     461
     462Whether a type is concrete, dtype-static, or dynamic is decided based solely on the type parameters and @forall@ clause on a declaration.
     463This design allows opaque forward declarations of generic types, \eg @forall(otype T) struct Box@; as in C, all uses of @Box(T)@ can be separately compiled, and callers from other translation units know the proper calling conventions to use.
     464If the definition of a structure type were included in deciding whether a generic type is dynamic or concrete, some further types could be recognized as dtype-static (\eg @forall(otype T) struct unique_ptr { T * p }@ does not depend on @T@ for its layout, but the existence of an @otype@ parameter means that it \emph{could}); however, preserving separate compilation (and the associated C compatibility) in the existing design is judged to be an appropriate trade-off.
     465
    427466
    428467\subsection{Applications}
    429468\label{sec:generic-apps}
    430469
    431 The reuse of dtype-static struct instantiations enables some useful programming patterns at zero runtime cost. The most important such pattern is using @forall(dtype T) T*@ as a type-checked replacement for @void*@, as in this example, which takes a @qsort@ or @bsearch@-compatible comparison routine and creates a similar lexicographic comparison for pairs of pointers:
    432 \begin{lstlisting}
    433 forall(dtype T)
    434 int lexcmp( pair(T*, T*)* a, pair(T*, T*)* b, int (*cmp)(T*, T*) ) {
    435         int c = cmp(a->first, b->first);
    436         if ( c == 0 ) c = cmp(a->second, b->second);
    437         return c;
    438 }
    439 \end{lstlisting}
    440 Since @pair(T*, T*)@ is a concrete type, there are no added implicit parameters to @lexcmp@, so the code generated by \CFA is effectively identical to a version of this function written in standard C using @void*@, yet the \CFA version is type-checked to ensure that the fields of both pairs and the arguments to the comparison function match in type.
    441 
    442 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost ``tag'' structs. Sometimes a particular bit of information is only useful for type-checking, and can be omitted at runtime. Tag structs can be used to provide this information to the compiler without further runtime overhead, as in the following example:
     470The reuse of dtype-static structure instantiations enables useful programming patterns at zero runtime cost. The most important such pattern is using @forall(dtype T) T *@ as a type-checked replacement for @void *@, \eg taking an existing pointer-comparison function, such as one written for @bsearch@ or @qsort@, and creating a lexicographic comparison for pairs of pointers:
     471\begin{lstlisting}
     472forall(dtype T) int lexcmp( pair( T *, T * ) * a, pair( T *, T * ) * b, int (* cmp)( T *, T * ) ) {
     473        return cmp( a->first, b->first ) ? : cmp( a->second, b->second );
     474}
     475\end{lstlisting}
     476%       int c = cmp( a->first, b->first );
     477%       if ( c == 0 ) c = cmp( a->second, b->second );
     478%       return c;
     479Since @pair( T *, T * )@ is a concrete type, no implicit parameters are passed to @lexcmp@, so the generated code is identical to a function written in standard C using @void *@, yet the \CFA version is type-checked to ensure the fields of both pairs and the arguments to the comparison function match in type.
     480
     481Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \emph{tag structures}.
     482Sometimes information is only used for type-checking and can be omitted at runtime, \eg:
    443483\begin{lstlisting}
    444484forall(dtype Unit) struct scalar { unsigned long value; };
    445 
    446485struct metres {};
    447486struct litres {};
    448487
    449 forall(dtype U)
    450 scalar(U) ?+?(scalar(U) a, scalar(U) b) {
     488forall(dtype U) scalar(U) ?+?( scalar(U) a, scalar(U) b ) {
    451489        return (scalar(U)){ a.value + b.value };
    452490}
    453 
    454491scalar(metres) half_marathon = { 21093 };
    455492scalar(litres) swimming_pool = { 2500000 };
    456 
    457493scalar(metres) marathon = half_marathon + half_marathon;
    458494scalar(litres) two_pools = swimming_pool + swimming_pool;
    459 marathon + swimming_pool; // ERROR -- caught by compiler
    460 \end{lstlisting}
    461 @scalar@ is a dtype-static type, so all uses of it use a single struct definition, containing only a single @unsigned long@, and can share the same implementations of common routines like @?+?@ -- these implementations may even be separately compiled, unlike \CC template functions. However, the \CFA type-checker ensures that matching types are used by all calls to @?+?@, preventing nonsensical computations like adding the length of a marathon to the volume of an olympic pool.
     495marathon + swimming_pool;                       $\C{// compilation ERROR}$
     496\end{lstlisting}
     497@scalar@ is a dtype-static type, so all its instantiations share a single structure definition, containing only an @unsigned long@ field, and can share the same implementations of common functions like @?+?@.
     498These implementations may even be separately compiled, unlike \CC template functions.
     499However, the \CFA type-checker ensures matching types are used by all calls to @?+?@, preventing nonsensical computations like adding a length to a volume.
    462500
    463501\section{Tuples}
     
    466504The @pair(R, S)@ generic type used as an example in the previous section can be considered a special case of a more general \emph{tuple} data structure. The authors have implemented tuples in \CFA, with a design particularly motivated by two use cases: \emph{multiple-return-value functions} and \emph{variadic functions}.
    467505
    468 In standard C, functions can return at most one value. This restriction results in code that emulates functions with multiple return values by \emph{aggregation} or by \emph{aliasing}. In the former situation, the function designer creates a record type that combines all of the return values into a single type. Unfortunately, the designer must come up with a name for the return type and for each of its fields. Unnecessary naming is a common programming language issue, introducing verbosity and a complication of the user's mental model. As such, this technique is effective when used sparingly, but can quickly get out of hand if many functions need to return different combinations of types. In the latter approach, the designer simulates multiple return values by passing the additional return values as pointer parameters. The pointer parameters are assigned inside of the routine body to emulate a return. Using this approach, the caller is directly responsible for allocating storage for the additional temporary return values. This responsibility complicates the call site with a sequence of variable declarations leading up to the call. Also, while a disciplined use of @const@ can give clues about whether a pointer parameter is going to be used as an out parameter, it is not immediately obvious from only the routine signature whether the callee expects such a parameter to be initialized before the call. Furthermore, while many C routines that accept pointers are designed so that it is safe to pass @NULL@ as a parameter, there are many C routines that are not null-safe. On a related note, C does not provide a standard mechanism to state that a parameter is going to be used as an additional return value, which makes the job of ensuring that a value is returned more difficult for the compiler.
     506In standard C, functions can return at most one value. This restriction results in code that emulates functions with multiple return values by \emph{aggregation} or by \emph{aliasing}. In the former situation, the function designer creates a record type that combines all of the return values into a single type. Unfortunately, the designer must come up with a name for the return type and for each of its fields. Unnecessary naming is a common programming-language issue, introducing verbosity and complicating the user's mental model. As such, this technique is effective when used sparingly, but can quickly get out of hand if many functions need to return different combinations of types. In the latter approach, the designer simulates multiple return values by passing the additional return values as pointer parameters. The pointer parameters are assigned inside the function body to emulate a return. Using this approach, the caller is directly responsible for allocating storage for the additional temporary return values, which complicates the call site with a sequence of variable declarations leading up to the call. Also, while a disciplined use of @const@ can give clues about whether a pointer parameter is going to be used as an out parameter, it is not immediately obvious from the function signature alone whether the callee expects such a parameter to be initialized before the call. Furthermore, while many C functions that accept pointers are designed so that it is safe to pass @NULL@ as a parameter, many C functions are not null-safe. Finally, C does not provide a standard mechanism to state that a parameter is going to be used as an additional return value, which makes it more difficult for the compiler to ensure that a value is returned.
    469507
    470508C does provide a mechanism for variadic functions through manipulation of @va_list@ objects, but it is notoriously type-unsafe. A variadic function is one that contains at least one parameter, followed by @...@ as the last token in the parameter list. In particular, some form of \emph{argument descriptor} is needed to inform the function of the number of arguments and their types, commonly a format string or counter parameter. It is important to note that both of these mechanisms are inherently redundant, because they require the user to specify information that the compiler knows explicitly. This required repetition is error prone, because it is easy for the user to add or remove arguments without updating the argument descriptor. In addition, C requires the programmer to hard code all of the possible expected types. As a result, it is cumbersome to write a variadic function that is open to extension. For example, consider a simple function that sums $N$ @int@s:
     
    475513  int ret = 0;
    476514  while(N) {
    477     ret += va_arg(args, int);  // must specify type
    478     N--;
     515        ret += va_arg(args, int);  // must specify type
     516        N--;
    479517  }
    480518  va_end(args);
     
    489527In practice, compilers can provide warnings to help mitigate some of the problems. For example, GCC provides the @format@ attribute to specify that a function uses a format string, which allows the compiler to perform some checks related to the standard format specifiers. Unfortunately, this attribute does not permit extensions to the format string syntax, so a programmer cannot extend it to warn for mismatches with custom types.
    490528
     529
    491530\subsection{Tuple Expressions}
    492531
     
    495534\CFA allows declaration of \emph{tuple variables}, variables of tuple type. For example:
    496535\begin{lstlisting}
    497 [int, char] most_frequent(const char*);
     536[int, char] most_frequent(const char * );
    498537
    499538const char* str = "hello, world!";
     
    739778Unlike C, it is not necessary to hard code the expected type. This code is naturally open to extension, in that any user-defined type with a @?+?@ operator is automatically able to be used with the @sum@ function. That is to say, the programmer who writes @sum@ does not need full program knowledge of every possible data type, unlike what is necessary to write an equivalent function using the standard C mechanisms. Summing arbitrary heterogeneous lists is possible with similar code by adding the appropriate type variables and addition operators.
    740779
    741 It is also possible to write a type-safe variadic print routine which can replace @printf@:
     780It is also possible to write a type-safe variadic print function which can replace @printf@:
    742781\begin{lstlisting}
    743782struct S { int x, y; };
     
    754793print("s = ", (S){ 1, 2 }, "\n");
    755794\end{lstlisting}
    756 This example routine showcases a variadic-template-like decomposition of the provided argument list. The individual @print@ routines allow printing a single element of a type. The polymorphic @print@ allows printing any list of types, as long as each individual type has a @print@ function. The individual print functions can be used to build up more complicated @print@ routines, such as for @S@, which is something that cannot be done with @printf@ in C.
     795This example function showcases a variadic-template-like decomposition of the provided argument list. The individual @print@ functions allow printing a single element of a type. The polymorphic @print@ allows printing any list of types, as long as each individual type has a @print@ function. The individual print functions can be used to build up more complicated @print@ functions, such as for @S@, which is something that cannot be done with @printf@ in C.
    757796
    758797It is also possible to use @ttype@ polymorphism to provide arbitrary argument forwarding functions. For example, it is possible to write @new@ as a library function:
     
    793832  forall(dtype T0, dtype T1, dtype T2 | sized(T0) | sized(T1) | sized(T2))
    794833  struct _tuple3 {  // generated before the first 3-tuple
    795     T0 field_0;
    796     T1 field_1;
    797     T2 field_2;
     834        T0 field_0;
     835        T1 field_1;
     836        T2 field_2;
    798837  };
    799838  _tuple3_(int, double, int) y;
  • doc/rob_thesis/cfa-format.tex

    rb14dd03 re869e434  
    131131  style=defaultStyle
    132132}
    133 \lstMakeShortInline[basewidth=0.5em,breaklines=true]@  % single-character for \lstinline
     133\lstMakeShortInline[basewidth=0.5em,breaklines=true,basicstyle=\normalsize\ttfamily\color{basicCol}]@  % single-character for \lstinline
    134134
    135135\lstnewenvironment{cfacode}[1][]{
  • doc/rob_thesis/conclusions.tex

    rb14dd03 re869e434  
    4545
    4646A caveat of this approach is that the @cleanup@ attribute only permits a name that refers to a function that consumes a single argument of type @T *@ for a variable of type @T@.
    47 This means that any destructor that consumes multiple arguments (e.g., because it is polymorphic) or any destructor that is a function pointer (e.g., because it is an assertion parameter) must be called through a local thunk.
     47This means that any destructor that consumes multiple arguments (\eg, because it is polymorphic) or any destructor that is a function pointer (\eg, because it is an assertion parameter) must be called through a local thunk.
    4848For example,
    4949\begin{cfacode}
  • doc/rob_thesis/ctordtor.tex

    rb14dd03 re869e434  
    77
    88Since \CFA is a true systems language, it does not provide a garbage collector.
    9 As well, \CFA is not an object-oriented programming language, i.e., structures cannot have routine members.
     9As well, \CFA is not an object-oriented programming language, \ie, structures cannot have routine members.
    1010Nevertheless, one important goal is to reduce programming complexity and increase safety.
    1111To that end, \CFA provides support for implicit pre/post-execution of routines for objects, via constructors and destructors.
     
    3030Next, @x@ is assigned the value of @y@.
    3131In the last line, @z@ is implicitly initialized to 0 since it is marked @static@.
    32 The key difference between assignment and initialization being that assignment occurs on a live object (i.e., an object that contains data).
     32The key difference between assignment and initialization is that assignment occurs on a live object (\ie, an object that contains data).
    3333It is important to note that this means @x@ could have been used uninitialized prior to being assigned, while @y@ could not be used uninitialized.
    3434Use of uninitialized variables yields undefined behaviour, which is a common source of errors in C programs.
     
    7979
    8080In \CFA, a constructor is a function with the name @?{}@.
    81 Like other operators in \CFA, the name represents the syntax used to call the constructor, e.g., @struct S = { ... };@.
     81Like other operators in \CFA, the name represents the syntax used to call the constructor, \eg, @struct S s = { ... };@.
    8282Every constructor must have a return type of @void@ and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming-languages (however, a programmer can give it an arbitrary name).
    8383The @this@ parameter must have a pointer type, whose base type is the type of object that the function constructs.
     
    114114In other words, a default constructor is a constructor that takes a single argument: the @this@ parameter.
    115115
    116 In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! and it take only one argument.
    117 A destructor for the @Array@ type can be defined as such.
     116In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! and it takes only one argument.
     117A destructor for the @Array@ type can be defined as:
    118118\begin{cfacode}
    119119void ^?{}(Array * arr) {
     
    167167}
    168168\end{cfacode}
     169
    169170In \CFA, constructors are called implicitly in initialization contexts.
    170171\begin{cfacode}
    171172Array x, y = { 20, 0xdeadbeef }, z = y;
    172173\end{cfacode}
    173 
    174 In \CFA, constructor calls look just like C initializers, which allows them to be inserted into legacy C code with minimal code changes, and also provides a very simple syntax that veteran C programmers are familiar with.
    175 One downside of reusing C initialization syntax is that it isn't possible to determine whether an object is constructed just by looking at its declaration, since that requires knowledge of whether the type is managed at that point.
     174Constructor calls look just like C initializers, which allows them to be inserted into legacy C code with minimal code changes, and also provides a very simple syntax that veteran C programmers are familiar with.
     175One downside of reusing C initialization syntax is that it is not possible to determine whether an object is constructed just by looking at its declaration, since that requires knowledge of whether the type is managed at that point in the program.
    176176
    177177This example generates the following code
     
    246246\end{cfacode}
    247247Finally, constructors and destructors support \emph{operator syntax}.
    248 Like other operators in \CFA, the function name mirrors the use-case, in that the first $N$ arguments fill in the place of the question mark.
     248Like other operators in \CFA, the function name mirrors the use-case, in that the question marks are placeholders for the first $N$ arguments.
    249249This syntactic form is similar to the new initialization syntax in \CCeleven, except that it is used in expression contexts, rather than declaration contexts.
    250250\begin{cfacode}
     
    272272Like other operators, the function name @?{}@ matches its operator syntax.
    273273For example, @(&x){}@ calls the default constructor on the variable @x@, and produces @&x@ as a result.
    274 A key example for this capability is the use of constructor expressions to initialize the result of a call to standard C routine @malloc@.
     274A key example for this capability is the use of constructor expressions to initialize the result of a call to @malloc@.
    275275\begin{cfacode}
    276276struct X { ... };
    277277void ?{}(X *, double);
    278 X * x = malloc(sizeof(X)){ 1.5 };
     278X * x = malloc(){ 1.5 };
    279279\end{cfacode}
    280280In this example, @malloc@ dynamically allocates storage and initializes it using a constructor, all before assigning it into the variable @x@.
    281281If this extension is not present, constructing dynamically allocated objects is much more cumbersome, requiring separate initialization of the pointer and initialization of the pointed-to memory.
    282282\begin{cfacode}
    283 X * x = malloc(sizeof(X));
     283X * x = malloc();
    284284x{ 1.5 };
    285285\end{cfacode}
     
    291291struct X *_tmp_ctor;
    292292struct X *x = ?{}(  // construct result of malloc
    293   _tmp_ctor=malloc(sizeof(struct X)), // store result of malloc
     293  _tmp_ctor=malloc_T(sizeof(struct X), _Alignof(struct X)), // store result of malloc
    294294  1.5
    295295), _tmp_ctor; // produce constructed result of malloc
     
    297297It should be noted that this technique is not exclusive to @malloc@, and allows a user to write a custom allocator that can be idiomatically used in much the same way as a constructed @malloc@ call.
    298298
    299 It is also possible to use operator syntax with destructors.
    300 Unlike constructors, operator syntax with destructors is a statement and thus does not produce a value, since the destructed object is invalidated by the use of a destructor.
    301 For example, \lstinline!^(&x){}! calls the destructor on the variable @x@.
     299Operator syntax can also be used with destructors, \eg \lstinline!^(&x){}! calls the destructor on the variable @x@. However, since a destructor invalidates its argument, operator syntax with destructors is a statement and does not produce a value.
    302300
    303301\subsection{Function Generation}
     
    376374The field constructors are constructors that consume a prefix of the structure's member-list.
    377375That is, $N$ constructors are built of the form @void ?{}(S *, T$_{\text{M}_0}$)@, @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)@, ..., @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)@, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise.
    378 The addition of field constructors allows structures in \CFA to be used naturally in the same ways as used in C (i.e., to initialize any prefix of the structure), e.g., @A a0 = { b }, a1 = { b, c }@.
     376The addition of field constructors allows structures in \CFA to be used naturally in the same ways as used in C (\ie, to initialize any prefix of the structure), \eg, @A a0 = { b }, a1 = { b, c }@.
    379377Extending the previous example, the following constructors are implicitly generated for @A@.
    380378\begin{cfacode}
     
    429427
    430428\subsection{Using Constructors and Destructors}
    431 Implicitly generated constructor and destructor calls ignore the outermost type qualifiers, e.g. @const@ and @volatile@, on a type by way of a cast on the first argument to the function.
     429Implicitly generated constructor and destructor calls ignore the outermost type qualifiers, \eg @const@ and @volatile@, on a type by way of a cast on the first argument to the function.
    432430For example,
    433431\begin{cfacode}
     
    448446Here, @&s@ and @&s2@ are cast to unqualified pointer types.
    449447This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects.
    450 This applies only to implicitly generated constructor calls.
     448This rule applies only to implicitly generated constructor calls.
    451449Hence, explicitly re-initializing qualified objects with a constructor requires an explicit cast.
    452450
     
    489487Instead, @a2->x@ is initialized to @0@ as if it were a C object, because of the explicit initializer.
    490488
    491 In addition to freedom, \ateq provides a simple path to migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually.
     489In addition to freedom, \ateq provides a simple path for migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually.
    492490It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants are established on an unmanaged object.
    493491It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary.
     
    503501  {
    504502    void ?{}(S * s, int i) { s->x = i*2; } // locally hide autogen constructors
    505     S s4;  // error
    506     S s5 = { 3 };  // okay
    507     S s6 = { 4, 5 };  // error
     503    S s4;  // error, no default constructor
     504    S s5 = { 3 };  // okay, local constructor
     505    S s6 = { 4, 5 };  // error, no field constructor
    508506    S s7 = s5; // okay
    509507  }
     
    513511In this example, the inner scope declares a constructor from @int@ to @S@, which hides the default constructor and field constructors until the end of the scope.
    514512
    515 When defining a constructor or destructor for a struct @S@, any members that are not explicitly constructed or destructed are implicitly constructed or destructed automatically.
     513When defining a constructor or destructor for a structure @S@, any members that are not explicitly constructed or destructed are implicitly constructed or destructed automatically.
    516514If an explicit call is present, then that call is taken in preference to any implicitly generated call.
    517515A consequence of this rule is that it is possible, unlike \CC, to precisely control the order of construction and destruction of sub-objects on a per-constructor basis, whereas in \CC sub-object initialization and destruction is always performed based on the declaration order.
     
    597595In practice, however, there could be many objects that can be constructed from a given @int@ (or, indeed, any arbitrary parameter list), and thus a complete solution to this problem would require fully exploring all possibilities.
    598596
    599 More precisely, constructor calls cannot have a nesting depth greater than the number of array components in the type of the initialized object, plus one.
     597More precisely, constructor calls cannot have a nesting depth greater than the number of array dimensions in the type of the initialized object, plus one.
    600598For example,
    601599\begin{cfacode}
     
    609607  { {14 }, { 15 } }   // a2[1]
    610608};
    611 A a3[4] = {
    612   { { 11 }, { 12 } },  // error
     609A a3[4] = { // 1 dimension => max depth 2
     610  { { 11 }, { 12 } },  // error, three levels deep
    613611  { 80 }, { 90 }, { 100 }
    614612}
     
    622620\label{sub:implicit_dtor}
    623621Destructors are automatically called at the end of the block in which the object is declared.
    624 In addition to this, destructors are automatically called when statements manipulate control flow to leave a block in which the object is declared, e.g., with return, break, continue, and goto statements.
     622In addition to this, destructors are automatically called when statements manipulate control flow to leave a block in which the object is declared, \eg, with return, break, continue, and goto statements.
    625623The example below demonstrates a simple routine with multiple return statements.
    626624\begin{cfacode}
     
    747745Exempt from these rules are intrinsic and built-in functions.
    748746It should be noted that unmanaged objects are subject to copy constructor calls when passed as arguments to a function or when returned from a function, since they are not the \emph{target} of the copy constructor call.
    749 That is, since the parameter is not marked as an unmanaged object using \ateq, it will be copy constructed if it is returned by value or passed as an argument to another function, so to guarantee consistent behaviour, unmanaged objects must be copy constructed when passed as arguments.
    750 This is an important detail to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed.
     747That is, since the parameter is not marked as an unmanaged object using \ateq, it is copy constructed if it is returned by value or passed as an argument to another function, so to guarantee consistent behaviour, unmanaged objects must be copy constructed when passed as arguments.
     748These semantics are important to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed.
    751749\begin{cfacode}
    752750struct A;
     
    763761identity(z);  // copy construct z into x
    764762\end{cfacode}
    765 Note that @z@ is copy constructed into a temporary variable to be passed as an argument, which is also destructed after the call.
     763Note that unmanaged argument @z@ is logically copy constructed into managed parameter @x@; however, the translator must copy construct into a temporary variable to be passed as an argument, which is also destructed after the call.
     764A compiler could bypass the argument temporaries, since it controls the calling conventions and knows exactly where the called function's parameters live.
    766765
    767766This generates the following
     
    859858This transformation provides @f@ with the address of the return variable so that it can be constructed into directly.
    860859It is worth pointing out that this kind of signature rewriting already occurs in polymorphic functions that return by value, as discussed in \cite{Bilson03}.
    861 A key difference in this case is that every function would need to be rewritten like this, since types can switch between managed and unmanaged at different scope levels, e.g.
     860A key difference in this case is that every function would need to be rewritten like this, since types can switch between managed and unmanaged at different scope levels, \eg
    862861\begin{cfacode}
    863862struct A { int v; };
     
    874873Furthermore, it is not possible to overload C functions, so using @extern "C"@ to declare functions is of limited use.
    875874
    876 It would be possible to regain some control by adding an attribute to structs that specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type.
    877 Ideally, structs should be manageable by default, since otherwise the default case becomes more verbose.
     875It would be possible to regain some control by adding an attribute to structures that specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type.
     876Ideally, structures should be manageable by default, since otherwise the default case becomes more verbose.
    878877This means that in general, function signatures would have to be rewritten, and in a select few cases the signatures would not be rewritten.
    879878\begin{cfacode}
     
    886885C h();  // rewritten void h(C *);
    887886\end{cfacode}
    888 An alternative is to instead make the attribute \emph{identifiable}, which states that objects of this type use the @this@ parameter as an identity.
     887An alternative is to make the attribute \emph{identifiable}, which states that objects of this type use the @this@ parameter as an identity.
    889888This approach targets the visible problem more directly, in that only types marked as identifiable would need to have the return value moved into the parameter list, and every other type could remain the same.
    890889Furthermore, no restrictions would need to be placed on whether objects can be constructed.
     
    10151014
    10161015\subsection{Global Initialization}
    1017 In standard C, global variables can only be initialized to compile-time constant expressions.
    1018 This places strict limitations on the programmer's ability to control the default values of objects.
     1016In standard C, global variables can only be initialized to compile-time constant expressions, which places strict limitations on the programmer's ability to control the default values of objects.
    10191017In \CFA, constructors and destructors are guaranteed to be run on global objects, allowing arbitrary code to be run before and after the execution of the main routine.
    10201018By default, objects within a translation unit are constructed in declaration order, and destructed in the reverse order.
    10211019The default order of construction of objects amongst translation units is unspecified.
    1022 It is, however, guaranteed that any global objects in the standard library are initialized prior to the initialization of any object in the user program.
     1020It is, however, guaranteed that any global objects in the standard library are initialized prior to the initialization of any object in a user program.
    10231021
    10241022This feature is implemented in the \CFA translator by grouping every global constructor call into a function with the GCC attribute \emph{constructor}, which performs most of the heavy lifting \cite[6.31.1]{GCCExtensions}.
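A minimal plain-C sketch of the idea follows, assuming GCC or Clang (the @constructor@ and @destructor@ attributes are compiler extensions, not ISO C). The global objects @log_level@ and @table@ and the helper functions are hypothetical stand-ins for user-defined globals whose initial values need run-time code. All construction calls are grouped into one function that runs before @main@, in declaration order, and a matching function runs the destructors in reverse order after @main@ returns.

```c
#include <stdlib.h>

/* hypothetical "global objects" whose initial values need run-time code */
int log_level;
int *table;

/* hypothetical constructor: allocate n ints and fill them with squares */
static void table_ctor(int **t, int n) {
    *t = malloc((size_t)n * sizeof **t);
    if (*t != NULL)
        for (int i = 0; i < n; i++) (*t)[i] = i * i;
}

/* hypothetical destructor: release the storage */
static void table_dtor(int **t) {
    free(*t);
    *t = NULL;
}

/* every global constructor call of this translation unit, in declaration order */
__attribute__((constructor))
static void global_init(void) {
    const char *lvl = getenv("LOG_LEVEL");   // not a compile-time constant
    log_level = lvl != NULL ? atoi(lvl) : 1;
    table_ctor(&table, 16);
}

/* destructor calls in reverse declaration order, run after main returns */
__attribute__((destructor))
static void global_fini(void) {
    table_dtor(&table);
    /* log_level needs no cleanup */
}
```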
     
    10531051%   https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html#C_002b_002b-Attributes
    10541052% suggestion: implement this in CFA by picking objects with a specified priority and pulling them into their own init functions (could even group them by priority level -> map<int, list<ObjectDecl*>>) and pull init_priority forward into constructor and destructor attributes with the same priority level
    1055 GCC provides an attribute @init_priority@, which allows specifying the relative priority for initialization of global objects on a per-object basis in \CC.
     1053GCC provides an attribute @init_priority@ in \CC, which allows specifying the relative priority for initialization of global objects on a per-object basis.
    10561054A similar attribute can be implemented in \CFA by pulling marked objects into global constructor/destructor-attribute functions with the specified priority.
    10571055For example,
     
    10761074In standard C, it is possible to mark variables that are local to a function with the @static@ storage class.
    10771075Unlike normal local variables, a @static@ local variable is defined to live for the entire duration of the program, so that each call to the function has access to the same variable with the same address and value as it had in the previous call to the function.
    1078 Much like global variables, in C @static@ variables can only be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it at compile-time.
     1076Much like global variables, @static@ variables can only be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it at compile-time.
    10791077
    10801078Yet again, this rule is too restrictive for a language with constructors and destructors.
    1081 Instead, \CFA modifies the definition of a @static@ local variable so that objects are guaranteed to be live from the time control flow reaches their declaration, until the end of the program, since the initializer expression is not necessarily a compile-time constant, but can depend on the current execution state of the function.
    1082 Since standard C does not allow access to a @static@ local variable before the first time control flow reaches the declaration, this restriction does not preclude any valid C code.
     1079Since the initializer expression is not necessarily a compile-time constant and can depend on the current execution state of the function, \CFA modifies the definition of a @static@ local variable so that objects are guaranteed to be live from the time control flow reaches their declaration, until the end of the program.
     1080Since standard C does not allow access to a @static@ local variable before the first time control flow reaches the declaration, this change does not preclude any valid C code.
    10831081Local objects with @static@ storage class are only implicitly constructed and destructed once for the duration of the program.
    10841082The object is constructed when its declaration is reached for the first time.
     
    10901088Since the parameter to @atexit@ is a parameter-less function, some additional tweaking is required.
    10911089First, the @static@ variable must be hoisted up to global scope and uniquely renamed to prevent name clashes with other global objects.
    1092 Second, a function is built which calls the destructor for the newly hoisted variable.
     1090If the variable's type is a local structure, that structure may also need to be hoisted.
     1091Second, a function is built that calls the destructor for the newly hoisted variable.
    10931092Finally, the newly generated function is registered with @atexit@, instead of registering the destructor directly.
    10941093Since @atexit@ calls functions in the reverse order in which they are registered, @static@ local variables are guaranteed to be destructed in the reverse order that they are constructed, which may differ between multiple executions of the same program.
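The following plain-C sketch approximates that lowering for a function @f@ whose local @static@ object is constructed from its argument. The type @S@, its constructor and destructor, and the names @f_s_hoisted@, @f_s_initialized@, and @f_s_dtor_wrapper@ are hypothetical, and the sketch ignores thread safety.

```c
#include <stdbool.h>
#include <stdlib.h>

/* a structure type with a hypothetical constructor and destructor */
typedef struct { int *data; } S;

int S_ctor_calls = 0;  // counts constructions, for illustration only

static void S_ctor(S *s, int n) {
    s->data = malloc(sizeof *s->data);
    if (s->data != NULL) *s->data = n;
    S_ctor_calls++;
}
static void S_dtor(S *s) { free(s->data); }

/* step 1: the static local of f is hoisted to file scope and renamed */
static S f_s_hoisted;
static bool f_s_initialized = false;  // has control reached the declaration?

/* step 2: a parameter-less function that destroys the hoisted object */
static void f_s_dtor_wrapper(void) { S_dtor(&f_s_hoisted); }

/* conceptually, the original f declares a local static S constructed from n */
int f(int n) {
    if (!f_s_initialized) {            // first time control reaches the declaration
        S_ctor(&f_s_hoisted, n);       // initializer depends on run-time state
        f_s_initialized = true;
        atexit(f_s_dtor_wrapper);      // step 3: register the wrapper, not S_dtor
    }
    return f_s_hoisted.data != NULL ? *f_s_hoisted.data : -1;
}
```

Calling @f(7)@ and then @f(42)@ returns @7@ both times, since the object is constructed only on the first call.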
     
    11561155void f(T);
    11571156\end{cfacode}
    1158 This allows easily specifying constraints that are common to all complete object types very simply.
    1159 
    1160 Now that \CFA has constructors and destructors, more of a complete object's behaviour can be specified by than was previously possible.
     1157This allows constraints that are common to all complete object-types to be specified very simply.
     1158
     1159Now that \CFA has constructors and destructors, more of a complete object's behaviour can be specified than was previously possible.
    11611160As such, @otype@ has been augmented to include assertions for a default constructor, copy constructor, and destructor.
    11621161That is, the previous example is now equivalent to
    11631162\begin{cfacode}
    1164 forall(dtype T | sized(T) | { T ?=?(T *, T); void ?{}(T *); void ?{}(T *, T); void ^?{}(T *); })
     1163forall(dtype T | sized(T) |
     1164  { T ?=?(T *, T); void ?{}(T *); void ?{}(T *, T); void ^?{}(T *); })
    11651165void f(T);
    11661166\end{cfacode}
    1167 This allows @f@'s body to create and destroy objects of type @T@, and pass objects of type @T@ as arguments to other functions, following the normal \CFA rules.
    1168 A point of note here is that objects can be missing default constructors (and eventually other functions through deleted functions), so it is important for \CFA programmers to think carefully about the operations needed by their function, as to not over-constrain the acceptable parameter types.
     1167These additions allow @f@'s body to create and destroy objects of type @T@, and pass objects of type @T@ as arguments to other functions, following the normal \CFA rules.
     1168A point of note here is that objects can be missing default constructors (and eventually other functions through deleted functions), so it is important for \CFA programmers to think carefully about the operations needed by their function, so as not to over-constrain the acceptable parameter types and prevent potential reuse.
  • doc/rob_thesis/intro.tex

    rb14dd03 re869e434  
    1616Therefore, these design principles must be kept in mind throughout the design and development of new language features.
    1717In order to appeal to existing C programmers, great care must be taken to ensure that new features naturally feel like C.
     18These goals ensure existing C code-bases can be converted to \CFA incrementally with minimal effort, and C programmers can productively generate \CFA code without training beyond the features being used.
     19Unfortunately, \CC is actively diverging from C, so incremental additions require significant effort and training, coupled with multiple legacy design-choices that cannot be updated.
     20
    1821The remainder of this section describes some of the important new features that currently exist in \CFA, to give the reader the necessary context in which the new features presented in this thesis must dovetail.
    1922
     
    5356\end{cfacode}
    5457Compound literals create an unnamed object, and result in an lvalue, so it is legal to assign a value into a compound literal or to take its address \cite[p.~86]{C11}.
    55 Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and is never an lvalue.
     58Syntactically, compound literals look like a cast operator followed by a brace-enclosed initializer, but semantically are different from a C cast, which only applies basic conversions and coercions and is never an lvalue.
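A small C11 sketch of these properties; the names @point@, @point_sum@, and @demo_compound_literals@ are illustrative only.

```c
struct point { int x, y; };

/* returns the sum of the coordinates of a point passed by address */
int point_sum(const struct point *p) { return p->x + p->y; }

int demo_compound_literals(void) {
    // a compound literal is an unnamed lvalue object, so its address can be taken
    int *arr = (int[]){ 1, 2, 3 };
    arr[1] = 20;                       // the unnamed array is modifiable

    // assigning into a member of a compound literal is also legal
    (struct point){ 0, 0 }.x = 5;

    // by contrast, (int)3.5 = 1 is rejected, since a cast is never an lvalue

    // passing the address of a compound literal as an argument
    return arr[0] + arr[1] + arr[2] + point_sum(&(struct point){ 3, 4 });
}
```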
    5659
    5760\subsection{Overloading}
     
    5962Overloading is the ability to specify multiple entities with the same name.
    6063The most common form of overloading is function overloading, wherein multiple functions can be defined with the same name, but with different signatures.
    61 Like in \CC, \CFA allows overloading based both on the number of parameters and on the types of parameters.
      64C provides a small amount of built-in overloading, \eg @+@ is overloaded for the basic arithmetic types.
     65Like in \CC, \CFA allows user-defined overloading based both on the number of parameters and on the types of parameters.
    6266  \begin{cfacode}
    6367  void f(void);  // (1)
     
    9296There are times when a function should logically return multiple values.
    9397Since a function in standard C can only return a single value, a programmer must either take in additional return values by address, or the function's designer must create a wrapper structure to package multiple return-values.
     98For example, the first approach:
    9499\begin{cfacode}
    95100int f(int * ret) {        // returns a value through parameter ret
     
    101106int res1 = g(&res2);      // explicitly pass storage
    102107\end{cfacode}
    103 The former solution is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all.
    104 The latter approach:
     108is awkward because it requires the caller to explicitly allocate memory for $n$ result variables, even if they are only temporary values used as a subexpression, or even not used at all.
     109The second approach:
    105110\begin{cfacode}
    106111struct A {
     
    113118... res3.x ... res3.y ... // use result values
    114119\end{cfacode}
    115 requires the caller to either learn the field names of the structure or learn the names of helper routines to access the individual return values.
    116 Both solutions are syntactically unnatural.
     120is awkward because the caller has to either learn the field names of the structure or learn the names of helper routines to access the individual return values.
     121Both approaches are syntactically unnatural.
    117122
    118123In \CFA, it is possible to directly declare a function returning multiple values.
     
    165170  \begin{cfacode}
    166171  struct A { int i; };
    167   int ?+?(A x, A y);
     172  int ?+?(A x, A y);    // '?'s represent operands
    168173  bool ?<?(A x, A y);
    169174  \end{cfacode}
    170175Notably, the only difference is syntax.
    171176Most of the operators supported by \CC for operator overloading are also supported in \CFA.
    172 Of notable exception are the logical operators (e.g. @||@), the sequence operator (i.e. @,@), and the member-access operators (e.g. @.@ and \lstinline{->}).
     177Notable exceptions are the logical operators (\eg @||@), the sequence operator (\ie @,@), and the member-access operators (\eg @.@ and \lstinline{->}).
    173178
    174179Finally, \CFA also permits overloading variable identifiers.
     
    243248  template<typename T>
    244249  T sum(T *arr, int n) {
    245     T t;
     250    T t = T();  // value initialize => 0 for arithmetic T
    246251    for (; n > 0; n--) t += arr[n-1];
    247252    return t;
     
    261266  \end{cfacode}
    262267The first thing to note here is that immediately following the declaration of @otype T@ is a list of \emph{type assertions} that specify restrictions on acceptable choices of @T@.
    263 In particular, the assertions above specify that there must be a an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@.
     268In particular, the assertions above specify that there must be an assignment from \zero to @T@ and an addition assignment operator from @T@ to @T@.
    264269The existence of an assignment operator from @T@ to @T@ and the ability to create an object of type @T@ are assumed implicitly by declaring @T@ with the @otype@ type-class.
    265270In addition to @otype@, there are currently two other type-classes.
     
    281286A major difference between the approaches of \CC and \CFA to polymorphism is that the set of assumed properties for a type is \emph{explicit} in \CFA.
    282287One of the major limiting factors of \CC's approach is that templates cannot be separately compiled.
    283 In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled.
     288In contrast, the explicit nature of assertions allows \CFA's polymorphic functions to be separately compiled, as the function prototype states all necessary requirements separate from the implementation.
     289For example, the prototype for the previous sum function is
     290  \begin{cfacode}
     291  forall(otype T | **R**{ T ?=?(T *, zero_t); T ?+=?(T *, T); }**R**)
     292  T sum(T *arr, int n);
     293  \end{cfacode}
     294With this prototype, a caller in another translation unit knows all of the constraints on @T@, and thus knows all of the operations that need to be made available to @sum@.
    284295
    285296In \CFA, a set of assertions can be factored into a \emph{trait}.
     
    296307This capability allows specifying the same set of assertions in multiple locations, without the repetition and likelihood of mistakes that come with manually writing them out for each function declaration.
    297308
    298 An interesting application of return-type resolution and polymorphism is with type-safe @malloc@.
     309An interesting application of return-type resolution and polymorphism is a type-safe version of @malloc@.
    299310\begin{cfacode}
    300311forall(dtype T | sized(T))
     
    316327
    317328In object-oriented programming languages, type invariants are typically established in a constructor and maintained throughout the object's lifetime.
    318 These assertions are typically achieved through a combination of access control modifiers and a restricted interface.
     329These assertions are typically achieved through a combination of access-control modifiers and a restricted interface.
    319330Typically, data which requires the maintenance of an invariant is hidden from external sources using the \emph{private} modifier, which restricts reads and writes to a select set of trusted routines, including member functions.
    320331It is these trusted routines that perform all modifications to internal data in a way that is consistent with the invariant, by ensuring that the invariant holds true at the end of the routine call.
     
    388399In other languages, a hybrid situation exists where resources escape the allocation block, but ownership is precisely controlled by the language.
    389400This pattern requires a strict interface and protocol for a data structure, consisting of a pre-initialization and a post-termination call, and all intervening access is done via interface routines.
    390 This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it takes care of a significant portion of resource management cases.
     401This kind of encapsulation is popular in object-oriented programming languages, and like the stack, it takes care of a significant portion of resource-management cases.
    391402
    392403For example, \CC directly supports this pattern through class types and an idiom known as RAII \footnote{Resource Acquisition is Initialization} by means of constructors and destructors.
     
    399410In the context of \CFA, a non-trivial constructor is either a user defined constructor or an auto-generated constructor that calls a non-trivial constructor.
    400411
    401 For the remaining resource ownership cases, programmer must follow a brittle, explicit protocol for freeing resources or an implicit protocol implemented via the programming language.
     412For the remaining resource-ownership cases, a programmer must follow a brittle, explicit protocol for freeing resources or an implicit protocol enforced by the programming language.
    402413
    403414In garbage collected languages, such as Java, resources are largely managed by the garbage collector.
    404 Still, garbage collectors are typically focus only on memory management.
     415Still, garbage collectors typically focus only on memory management.
    405416There are many kinds of resources that the garbage collector does not understand, such as sockets, open files, and database connections.
    406417In particular, Java supports \emph{finalizers}, which are similar to destructors.
    407 Sadly, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious.
     418Unfortunately, finalizers are only guaranteed to be called before an object is reclaimed by the garbage collector \cite[p.~373]{Java8}, which may not happen if memory use is not contentious.
    408419Due to operating-system resource-limits, this is unacceptable for many long-running programs.
    409420Instead, the paradigm in Java requires programmers to manually keep track of all resources \emph{except} memory, leading many novices and experts alike to forget to close files, etc.
     
    450461\end{javacode}
    451462Variables declared as part of a try-with-resources statement must conform to the @AutoClosable@ interface, and the compiler implicitly calls @close@ on each of the variables at the end of the block.
    452 Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null, therefore, the cleanup for these variables at the end is appropriately guarded and conditionally executed to prevent null-pointer exceptions.
     463Depending on when the exception is raised, both @out@ and @log@ are null, @log@ is null, or both are non-null; therefore, the cleanup for these variables at the end is automatically guarded and conditionally executed to prevent null-pointer exceptions.
    453464
    454465While Rust \cite{Rust} does not enforce the use of a garbage collector, it does provide a manual memory management environment, with a strict ownership model that automatically frees allocated memory and prevents common memory management errors.
     
    486497There is no runtime cost imposed on these restrictions, since they are enforced at compile-time.
    487498
    488 Rust provides RAII through the @Drop@ trait, allowing arbitrary code to execute when the object goes out of scope, allowing Rust programs to automatically clean up auxiliary resources much like a \CC program.
     499Rust provides RAII through the @Drop@ trait, allowing arbitrary code to execute when the object goes out of scope, providing automatic cleanup of auxiliary resources, much like a \CC program.
    489500\begin{rustcode}
    490501struct S {
     
    493504
    494505impl Drop for S {  // RAII for S
    495   fn drop(&mut self) {
     506  fn drop(&mut self) {  // destructor
    496507    println!("dropped {}", self.name);
    497508  }
     
    558569tuple<int, int, int> triple(10, 20, 30);
    559570auto & [t1, t2, t3] = triple;
    560 t2 = 0; // changes triple
     571t2 = 0; // changes middle element of triple
    561572
    562573struct S { int x; double y; };
     
    564575auto [x, y] = s; // unpack s
    565576\end{cppcode}
    566 Structured bindings allow unpacking any struct with all public non-static data members into fresh local variables.
     577Structured bindings allow unpacking any structure with all public non-static data members into fresh local variables.
    567578The use of @&@ allows declaring new variables as references, which is something that cannot be done with @std::tie@, since \CC references do not support rebinding.
    568579This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism.
    570581Furthermore, structured bindings are not a full replacement for @std::tie@, as they always declare new variables.
    570581
    571 Like \CC, D provides tuples through a library variadic template struct.
     582Like \CC, D provides tuples through a library variadic-template structure.
    572583In D, it is possible to name the fields of a tuple type, which creates a distinct type.
    573584% http://dlang.org/phobos/std_typecons.html
     
    600611\end{smlcode}
    601612Here, the function @binco@ appears to take 2 arguments, but it actually takes a single argument which is implicitly decomposed via pattern matching.
    602 Tuples are a foundational tool in SML, allowing the creation of arbitrarily complex structured data types.
     613Tuples are a foundational tool in SML, allowing the creation of arbitrarily-complex structured data-types.
    603614
    604615Scala, like \CC, provides tuple types through the standard library \cite{Scala}.
     
    653664Since the variadic arguments are untyped, it is up to the function to interpret any data that is passed in.
    654665Additionally, the interface to manipulate @va_list@ objects is essentially limited to advancing to the next argument, without any built-in facility to determine when the last argument is read.
    655 This requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types.
     666This limitation requires the use of an \emph{argument descriptor} to pass information to the function about the structure of the argument list, including the number of arguments and their types.
    656667The format string in @printf@ is one such example of an argument descriptor.
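A simpler descriptor than a format string is a leading argument count, as in the hypothetical function @sum_ints@ sketched below. If the count is wrong, or a value that is not an @int@ is passed, the behaviour is undefined, which is the weakness described above.

```c
#include <stdarg.h>

/* Sums `count` int arguments. `count` acts as a minimal argument descriptor:
   va_arg cannot detect the last argument, so the caller must state how many
   there are, and the element type (int) is fixed by convention. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```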
    657668\begin{cfacode}
  • doc/rob_thesis/tuples.tex

    rb14dd03 re869e434  
    7070Furthermore, while many C routines that accept pointers are designed so that it is safe to pass @NULL@ as a parameter, there are many C routines that are not null-safe.
    7171On a related note, C does not provide a standard mechanism to state that a parameter is going to be used as an additional return value, which makes the job of ensuring that a value is returned more difficult for the compiler.
    72 There is a subtle bug in the previous example, in that @ret_ch@ is never assigned for a string that does not contain any letters, which can lead to undefined behaviour.
     72Interestingly, there is a subtle bug in the previous example, in that @ret_ch@ is never assigned for a string that does not contain any letters, which can lead to undefined behaviour.
     73In this particular case, it turns out that the frequency return value also doubles as an error code, where a frequency of 0 means the character return value should be ignored.
     74Still, not every routine with multiple return values should be required to return an error code, and error codes are easily ignored, so this is not a satisfying solution.
    7375As with the previous approach, this technique can simulate multiple return values, but in practice it is verbose and error prone.
    7476
     
    8486  char freqs [26] = { 0 };
    8587  int ret_freq = 0;
    86   char ret_ch = 'a';
     88  char ret_ch = 'a';  // arbitrary default value for consistent results
    8789  for (int i = 0; str[i] != '\0'; ++i) {
    8890    if (isalpha(str[i])) {        // only count letters
     
    98100}
    99101\end{cfacode}
    100 This approach provides the benefits of compile-time checking for appropriate return statements as in aggregation, but without the required verbosity of declaring a new named type, which precludes the bug seen with out parameters.
     102This approach provides the benefits of compile-time checking for appropriate return statements as in aggregation, but without the required verbosity of declaring a new named type, which precludes the bug seen with out-parameters.
    101103
    102104The addition of multiple-return-value functions necessitates a syntax for accepting multiple values at the call-site.
     
    208210For the call to @g@, the values @y@ and @10@ are structured into a single argument of type @[int, int]@ to match the type of the parameter of @g@.
    209211Finally, in the call to @h@, @y@ is flattened to yield an argument list of length 3, of which the first component of @x@ is passed as the first parameter of @h@, and the second component of @x@ and @y@ are structured into the second argument of type @[int, int]@.
    210 The flexible structure of tuples permits a simple and expressive function call syntax to work seamlessly with both single- and multiple-return-value functions, and with any number of arguments of arbitrarily complex structure.
     212The flexible structure of tuples permits a simple and expressive function-call syntax to work seamlessly with both single- and multiple-return-value functions, and with any number of arguments of arbitrarily complex structure.
    211213
    212214In \KWC \cite{Buhr94a,Till89}, a precursor to \CFA, there were 4 tuple coercions: opening, closing, flattening, and structuring.
    213215Opening coerces a tuple value into a tuple of values, while closing converts a tuple of values into a single tuple value.
    214 Flattening coerces a nested tuple into a flat tuple, i.e. it takes a tuple with tuple components and expands it into a tuple with only non-tuple components.
    215 Structuring moves in the opposite direction, i.e. it takes a flat tuple value and provides structure by introducing nested tuple components.
     216Flattening coerces a nested tuple into a flat tuple, \ie it takes a tuple with tuple components and expands it into a tuple with only non-tuple components.
     217Structuring moves in the opposite direction, \ie it takes a flat tuple value and provides structure by introducing nested tuple components.
    216218
    217219In \CFA, the design has been simplified to require only the two conversions previously described, which trigger only in function call and return situations.
     
    258260A mass assignment assigns the value $R$ to each $L_i$.
    259261For a mass assignment to be valid, @?=?(&$L_i$, $R$)@ must be a well-typed expression.
    260 These semantics differ from C cascading assignment (e.g. @a=b=c@) in that conversions are applied to $R$ in each individual assignment, which prevents data loss from the chain of conversions that can happen during a cascading assignment.
     262These semantics differ from C cascading assignment (\eg @a=b=c@) in that conversions are applied to $R$ in each individual assignment, which prevents data loss from the chain of conversions that can happen during a cascading assignment.
    261263For example, @[y, x] = 3.14@ performs the assignments @y = 3.14@ and @x = 3.14@, which results in the value @3.14@ in @y@ and the value @3@ in @x@.
    262264On the other hand, the C cascading assignment @y = x = 3.14@ performs the assignments @x = 3.14@ and @y = x@, which results in the value @3@ in @x@, and as a result the value @3@ in @y@ as well.
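The difference can be checked in plain C. The helper names @cascade@ and @mass@ are hypothetical, and @mass@ simply writes out by hand the two independent assignments that the \CFA mass assignment @[y, x] = 3.14@ is described as performing.

```c
/* C cascading assignment: y receives the value of x after conversion to int */
void cascade(double *y, int *x) {
    *y = *x = 3.14;   // x = 3.14 truncates to 3, then y = x gives 3.0
}

/* hand-written equivalent of [y, x] = 3.14: convert 3.14 separately per target */
void mass(double *y, int *x) {
    *y = 3.14;        // y keeps 3.14
    *x = 3.14;        // only the int target is truncated, to 3
}
```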
     
    274276These semantics allow cascading tuple assignment to work out naturally in any context where a tuple is permitted.
    275277These semantics are a change from the original tuple design in \KWC \cite{Till89}, wherein tuple assignment was a statement that allows cascading assignments as a special case.
    276 Restricting tuple assignment to statements was an attempt to to fix what was seen as a problem with assignment, wherein it can be used in many different locations, such as in function-call argument position.
     278Restricting tuple assignment to statements was an attempt to fix what was seen as a problem with side-effects, wherein assignment can be used in many different locations, such as in function-call argument position.
    277279While permitting assignment as an expression does introduce the potential for subtle complexities, it is impossible to remove assignment expressions from \CFA without affecting backwards compatibility.
    278280Furthermore, there are situations where permitting assignment as an expression improves readability by keeping code succinct and reducing repetition, and complicating the definition of tuple assignment puts a greater cognitive burden on the user.
     
    289291\end{cfacode}
    290292The tuple expression begins with a mass assignment of @1.5@ into @[b, d]@, which assigns @1.5@ into @b@, which is truncated to @1@, and @1.5@ into @d@, producing the tuple @[1, 1.5]@ as a result.
    291 That tuple is used as the right side of the multiple assignment (i.e., @[c, a] = [1, 1.5]@) that assigns @1@ into @c@ and @1.5@ into @a@, which is truncated to @1@, producing the result @[1, 1]@.
     293That tuple is used as the right side of the multiple assignment (\ie, @[c, a] = [1, 1.5]@) that assigns @1@ into @c@ and @1.5@ into @a@, which is truncated to @1@, producing the result @[1, 1]@.
    292294Finally, the tuple @[1, 1]@ is used as an expression in the call to @f@.
    293295
     
    307309In this example, @x@ is initialized by the multiple constructor calls @?{}(&x.0, 3)@ and @?{}(&x.1, 6.28)@, while @y@ is initialized by two default constructor calls @?{}(&y.0)@ and @?{}(&y.1)@.
    308310@z@ is initialized by mass copy constructor calls @?{}(&z.0, x.0)@ and @?{}(&z.1, x.0)@.
    309 Finally, @x@, @y@, and @z@ are destructed, i.e. the calls @^?{}(&x.0)@, @^?{}(&x.1)@, @^?{}(&y.0)@, @^?{}(&y.1)@, @^?{}(&z.0)@, and @^?{}(&z.1)@.
     311Finally, @x@, @y@, and @z@ are destructed, \ie the calls @^?{}(&x.0)@, @^?{}(&x.1)@, @^?{}(&y.0)@, @^?{}(&y.1)@, @^?{}(&z.0)@, and @^?{}(&z.1)@.
    310312
    311313It is possible to define constructors and assignment functions for tuple types that provide new semantics, if the existing semantics do not fit the needs of an application.
     
    340342Then the type of @a.[x, y, z]@ is @[T_x, T_y, T_z]@.
    341343
    342 Since tuple index expressions are a form of member-access expression, it is possible to use tuple-index expressions in conjunction with member tuple expressions to manually restructure a tuple (e.g., rearrange components, drop components, duplicate components, etc.).
     344Since tuple index expressions are a form of member-access expression, it is possible to use tuple-index expressions in conjunction with member tuple expressions to manually restructure a tuple (\eg, rearrange components, drop components, duplicate components, etc.).
    343345\begin{cfacode}
    344346[int, int, long, double] x;
     
    392394
    393395As for @z.y@, one interpretation is to extend the meaning of member tuple expressions.
    394 That is, currently the tuple must occur as the member, i.e. to the right of the dot.
     396That is, currently the tuple must occur as the member, \ie to the right of the dot.
    395397Allowing tuples to the left of the dot could distribute the member across the elements of the tuple, in much the same way that member tuple expressions distribute the aggregate across the member tuple.
    396398In this example, @z.y@ expands to @[z.0.y, z.1.y]@, allowing what is effectively a very limited compile-time field-sections map operation, where the argument must be a tuple containing only aggregates having a member named @y@.
     
    450452
    451453struct A { int x; };
    452 (struct A)f();  // invalid
     454(struct A)f();  // invalid, int cannot be converted to A
    453455\end{cfacode}
    454456In C, line 4 is a valid cast, which calls @f@ and discards its result.
     
    466468  [int, [int, int], int] g();
    467469
    468   ([int, double])f();           // (1)
    469   ([int, int, int])g();         // (2)
    470   ([void, [int, int]])g();      // (3)
    471   ([int, int, int, int])g();    // (4)
    472   ([int, [int, int, int]])g();  // (5)
     470  ([int, double])f();           // (1) valid
     471  ([int, int, int])g();         // (2) valid
     472  ([void, [int, int]])g();      // (3) valid
     473  ([int, int, int, int])g();    // (4) invalid
     474  ([int, [int, int, int]])g();  // (5) invalid
    473475\end{cfacode}
    474476
     
    477479If @g@ is free of side effects, this is equivalent to @[(int)(g().0), (int)(g().1.0), (int)(g().2)]@.
    478480Since @void@ is effectively a 0-element tuple, (3) discards the first and third return values, which is effectively equivalent to @[(int)(g().1.0), (int)(g().1.1)]@.
    479 
    480481% will this always hold true? probably, as constructors should give all of the conversion power we need. if casts become function calls, what would they look like? would need a way to specify the target type, which seems awkward. Also, C++ basically only has this because classes are closed to extension, while we don't have that problem (can have floating constructors for any type).
    481482Note that a cast is not a function call in \CFA, so flattening and structuring conversions do not occur for cast expressions.
     
    534535\end{cfacode}
    535536
    536 Until this point, it has been assumed that assertion arguments must match the parameter type exactly, modulo polymorphic specialization (i.e., no implicit conversions are applied to assertion arguments).
     537Until this point, it has been assumed that assertion arguments must match the parameter type exactly, modulo polymorphic specialization (\ie, no implicit conversions are applied to assertion arguments).
    537538This decision presents a conflict with the flexibility of tuples.
    538539\subsection{Assertion Inference}
     
    566567}
    567568\end{cfacode}
    568 Is transformed into
     569is transformed into
    569570\begin{cfacode}
    570571forall(dtype T0, dtype T1 | sized(T0) | sized(T1))
    571 struct _tuple2 {  // generated before the first 2-tuple
     572struct _tuple2_ {  // generated before the first 2-tuple
    572573  T0 field_0;
    573574  T1 field_1;
     
    576577  _tuple2_(double, double) x;
    577578  forall(dtype T0, dtype T1, dtype T2 | sized(T0) | sized(T1) | sized(T2))
    578   struct _tuple3 {  // generated before the first 3-tuple
     579  struct _tuple3_ {  // generated before the first 3-tuple
    579580    T0 field_0;
    580581    T1 field_1;
     
    589590[5, 'x', 1.24];
    590591\end{cfacode}
    591 Becomes
     592becomes
    592593\begin{cfacode}
    593594(_tuple3_(int, char, double)){ 5, 'x', 1.24 };
     
    603604f(x, 'z');
    604605\end{cfacode}
    605 Is transformed into
     606is transformed into
    606607\begin{cfacode}
    607608void f(int, _tuple2_(double, char));
     
    650651It is possible that lazy evaluation could be exposed to the user through a lazy keyword with little additional effort.
    651652
    652 Tuple member expressions are recursively expanded into a list of member access expressions.
     653Tuple-member expressions are recursively expanded into a list of member-access expressions.
    653654\begin{cfacode}
    654655[int, [double, int, double], int] x;
    655656x.[0, 1.[0, 2]];
    656657\end{cfacode}
    657 which becomes
     658becomes
    658659\begin{cfacode}
    659660[x.0, [x.1.0, x.1.2]];
     
    670671[x, y, z] = 1.5;            // mass assignment
    671672\end{cfacode}
    672 Generates the following
     673generates the following
    673674\begin{cfacode}
    674675// [x, y, z] = 1.5;
     
    711712});
    712713\end{cfacode}
    713 A variable is generated to store the value produced by a statement expression, since its fields may need to be constructed with a non-trivial constructor and it may need to be referred to multiple time, e.g., in a unique expression.
     714A variable is generated to store the value produced by a statement expression, since its fields may need to be constructed with a non-trivial constructor and it may need to be referred to multiple times, \eg, in a unique expression.
    714715$N$ LHS variables are generated and constructed using the address of the tuple components, and a single RHS variable is generated to store the value of the RHS without any loss of precision.
    715716A nested statement expression is generated that performs the individual assignments and constructs the return value using the results of the individual assignments.
     
    720721[x, y, z] = [f(), 3];       // multiple assignment
    721722\end{cfacode}
    722 Generates
     723generates the following
    723724\begin{cfacode}
    724725// [x, y, z] = [f(), 3];
  • doc/rob_thesis/variadic.tex

    rb14dd03 re869e434  
    1212In addition, C requires the programmer to hard code all of the possible expected types.
    1313As a result, it is cumbersome to write a function that is open to extension.
    14 For example, a simple function which sums $N$ @int@s,
     14For example, a simple function to sum $N$ @int@s,
    1515\begin{cfacode}
    1616int sum(int N, ...) {
     
    2727sum(3, 10, 20, 30);  // need to keep counter in sync
    2828\end{cfacode}
    29 The @va_list@ type is a special C data type that abstracts variadic argument manipulation.
     29The @va_list@ type is a special C data type that abstracts variadic-argument manipulation.
    3030The @va_start@ macro initializes a @va_list@, given the last named parameter.
    3131Each use of the @va_arg@ macro allows access to the next variadic argument, given a type.
     
    3434In the case where the provided type is not compatible with the argument's actual type after default argument promotions, or if too many arguments are accessed, the behaviour is undefined \cite[p.~81]{C11}.
    3535Furthermore, there is no way to perform the necessary error checks in the @sum@ function at run-time, since type information is not carried into the function body.
    36 Since they rely on programmer convention rather than compile-time checks, variadic functions are generally unsafe.
     36Since they rely on programmer convention rather than compile-time checks, variadic functions are unsafe.
    3737
    3838In practice, compilers can provide warnings to help mitigate some of the problems.
    39 For example, GCC provides the @format@ attribute to specify that a function uses a format string, which allows the compiler to perform some checks related to the standard format specifiers.
    40 Unfortunately, this approach does not permit extensions to the format string syntax, so a programmer cannot extend the attribute to warn for mismatches with custom types.
     39For example, GCC provides the @format@ attribute to specify that a function uses a format string, which allows the compiler to perform some checks related to the standard format-specifiers.
     40Unfortunately, this approach does not permit extensions to the format-string syntax, so a programmer cannot extend the attribute to warn for mismatches with custom types.
    4141
    4242As a result, C's variadic functions are a deficient language feature.
     
    7979Similarly, in order to pass 0 variadic arguments, an explicit empty tuple must be passed into the argument list, otherwise the exact matching rule would not have an argument to bind against.
    8080
    81 It should be otherwise noted that the addition of an exact matching rule only affects the outcome for polymorphic type binding when tuples are involved.
     81It should also be noted that the addition of an exact matching rule only affects the outcome for polymorphic type-binding when tuples are involved.
    8282For non-tuple arguments, exact matching and flattening and structuring are equivalent.
    83 For tuple arguments to a function without polymorphic formal parameters, flattening and structuring work whenever an exact match would have worked, since the tuple is flattened and implicitly restructured to its original structure.
     83For tuple arguments to a function without polymorphic formal-parameters, flattening and structuring work whenever an exact match would have worked, since the tuple is flattened and implicitly restructured to its original structure.
    8484Thus there is nothing to be gained from permitting the exact matching rule to take effect when a function does not contain polymorphism and none of the arguments are tuples.
    8585
     
    161161  return x+y;
    162162}
    163 forall(otype T1, otype T2, otype T3, ttype Params, otype R
     163forall(otype T1, otype T2, otype T3, otype R, ttype Params
    164164  | summable(T1, T2, T3)
    165165  | { R sum(T3, Params); })
     
    184184\CFA does not need an ellipsis in either case, since the type class @ttype@ is only used for variadics.
    185185An alternative design is to use an ellipsis combined with an existing type class.
    186 This approach was not taken because the largest benefit of the ellipsis token in \CC is the ability to expand a parameter pack within an expression, e.g., in fold expressions, which requires compile-time knowledge of the structure of the parameter pack, which is not available in \CFA.
     186This approach was not taken because the largest benefit of the ellipsis token in \CC is the ability to expand a parameter pack within an expression, \eg, in fold expressions, which requires compile-time knowledge of the parameter pack's structure, and this knowledge is not available in \CFA.
    187187\begin{cppcode}
    188188template<typename... Args>
     
    224224Array * x = new(1, 2, 3);
    225225\end{cfacode}
    226 The @new@ function provides the combination of type-safe @malloc@ with a constructor call, so that it becomes impossible to forget to construct dynamically allocated objects.
     226In the call to @new@, @Array@ is selected to match @T@, and @Params@ is expanded to match @[int, int, int]@. To satisfy the assertions, a constructor with an interface compatible with @void ?{}(Array *, int, int, int)@ must exist in the current scope.
     227
     228The @new@ function provides the combination of type-safe @malloc@ with a constructor call, so that it becomes impossible to forget to construct dynamically-allocated objects.
    227229This approach provides the type-safety of @new@ in \CC, without the need to specify the allocated type, thanks to return-type inference.
    228 
    229 In the call to @new@, @Array@ is selected to match @T@, and @Params@ is expanded to match @[int, int, int, int]@. To satisfy the assertions, a constructor with an interface compatible with @void ?{}(Array *, int, int, int)@ must exist in the current scope.
    230230
    231231\section{Implementation}
     
    240240}
    241241\end{cfacode}
    242 Generates the following
     242generates the following
    243243\begin{cfacode}
    244244void *malloc(long unsigned int _sizeof_T, long unsigned int _alignof_T);
     
    267267\end{cfacode}
    268268The constructor for @T@ is called indirectly through the adapter function on the result of @malloc@ and the parameter pack.
    269 The variable that was allocated and constructed is then returned from @new@.
     269The variable that is allocated and constructed is then returned from @new@.
    270270
    271271A call to @new@
     
    337337}
    338338\end{cfacode}
    339 Generates
     339generates the following
    340340\begin{cfacode}
    341341void print_variadic(
     
    382382print("x = ", 123, ".\n");
    383383\end{cfacode}
    384 Generates the following
     384generates the following
    385385\begin{cfacode}
    386386void print_string(const char *x){
  • doc/user/user.tex

    rb14dd03 re869e434  
    1111%% Created On       : Wed Apr  6 14:53:29 2016
    1212%% Last Modified By : Peter A. Buhr
    13 %% Last Modified On : Wed Apr  5 23:19:40 2017
    14 %% Update Count     : 1412
     13%% Last Modified On : Wed Apr 12 12:18:58 2017
     14%% Update Count     : 1415
    1515%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    1616
     
    6464% Names used in the document.
    6565\newcommand{\Version}{\input{../../version}}
    66 \newcommand{\CS}{C\raisebox{-0.9ex}{\large$^\sharp$}\xspace}
    67 
    6866\newcommand{\Textbf}[2][red]{{\color{#1}{\textbf{#2}}}}
    6967\newcommand{\Emph}[2][red]{{\color{#1}\textbf{\emph{#2}}}}
     
    195193For system programming, where direct access to hardware and dealing with real-time issues is a requirement, C is usually the language of choice.
    196194As well, there are millions of lines of C legacy code, forming the base for many software development projects (especially on UNIX systems).
    197 The TIOBE index (\url{http://www.tiobe.com/tiobe_index}) for March 2016 shows programming-language popularity, with \Index*{Java} 20.5\%, C 14.5\%, \Index*[C++]{\CC} 6.7\%, \CS 4.3\%, \Index*{Python} 4.3\%, and all other programming languages below 3\%.
     195The TIOBE index (\url{http://www.tiobe.com/tiobe_index}) for March 2016 shows programming-language popularity, with \Index*{Java} 20.5\%, C 14.5\%, \Index*[C++]{\CC} 6.7\%, \Csharp 4.3\%, \Index*{Python} 4.3\%, and all other programming languages below 3\%.
    198196As well, for 30 years, C has been the number 1 and 2 most popular programming language:
    199197\begin{center}