Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • doc/generic_types/generic_types.tex

    r5995116 r8396044  
    2929\newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name
    3030
    31 \newcommand{\TODO}[1]{\textbf{TODO}: #1} % TODO included
    32 %\newcommand{\TODO}[1]{} % TODO elided
     31\newcommand{\TODO}{\textbf{TODO}}
    3332\newcommand{\eg}{\textit{e}.\textit{g}.,\xspace}
    3433\newcommand{\ie}{\textit{i}.\textit{e}.,\xspace}
     
    124123\maketitle
    125124
     125
    126126\section{Introduction \& Background}
    127127
     
    146146int forty_two = identity( 42 );                         $\C{// T is bound to int, forty\_two == 42}$
    147147\end{lstlisting}
    148 The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@ that encode sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, \eg for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''.
     148The @identity@ function above can be applied to any complete object-type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@ that encode sufficient information about @T@ to create and return a variable of that type. The \CFA implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, e.g., for a pointer, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''.
    149149
    150150Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC virtual function calls. An advantage of this design is that, unlike \CC template functions, \CFA @forall@ functions are compatible with C separate compilation.
     
    214214Hence, programmers can easily form new local environments to maximize reuse of existing functions and types.
    215215
    216 Finally, variables may be overloaded:
     216Finally, \CFA allows variable overloading:
    217217\lstDeleteShortInline@
    218218\par\smallskip
     
    263263forall( otype T | summable( T ) )
    264264  T sum( T a[$\,$], size_t size ) {
    265         T total = { 0 };                                                $\C{// instantiate T from 0}$
     265        `T` total = { `0` };                                    $\C{// instantiate T from 0}$
    266266        for ( unsigned int i = 0; i < size; i += 1 )
    267                 total += a[i];                                          $\C{// select appropriate +}$
     267                total `+=` a[i];                                        $\C{// select appropriate +}$
    268268        return total;
    269269}
     
    281281};
    282282\end{lstlisting}
    283 Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity)\TODO{fix example, maybe elide, it's likely too long with the more complicated function}:
     283Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @sum@ function produces generated code something like the following (simplified for clarity and brevity) \TODO{} fix example, maybe elide, it's likely too long with the more complicated function:
    284284\begin{lstlisting}
    285285void abs( size_t _sizeof_M, size_t _alignof_M,
     
    387387\subsection{Dynamic Generic Types}
    388388
    389 Though \CFA implements concrete generic types efficiently, it also has a fully general system for computing with dynamic generic types. As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller. Dynamic generic structs also have implicit size and alignment parameters, and also an \emph{offset array} which contains the offsets of each member of the struct\footnote{Dynamic generic unions need no such offset array, as all members are at offset 0; the size and alignment parameters are still provided for dynamic unions, however.}. Access to members\footnote{The \lstinline@offsetof@ macro is implemented similarly.} of a dynamic generic struct is provided by adding the corresponding member of the offset array to the struct pointer at runtime, essentially moving a compile-time offset calculation to runtime where necessary.
     389Though \CFA implements concrete generic types efficiently, it also has a fully general system for computing with dynamic generic types. As mentioned in Section~\ref{sec:poly-fns}, @otype@ function parameters (in fact all @sized@ polymorphic parameters) come with implicit size and alignment parameters provided by the caller. Dynamic generic structs have the same size and alignment parameter, and also an \emph{offset array} which contains the offsets of each member of the struct\footnote{Dynamic generic unions need no such offset array, as all members are at offset 0; the size and alignment parameters are still provided for dynamic unions, however.}. Access to members\footnote{The \lstinline@offsetof@ macro is implemented similarly.} of a dynamic generic struct is provided by adding the corresponding member of the offset array to the struct pointer at runtime, essentially moving a compile-time offset calculation to runtime where necessary.
    390390
    391391These offset arrays are statically generated where possible. If a dynamic generic type is declared to be passed or returned by value from a polymorphic function, the translator can safely assume that the generic type is complete (that is, has a known layout) at any call-site, and the offset array is passed from the caller; if the generic type is concrete at the call site the elements of this offset array can even be statically generated using the C @offsetof@ macro. As an example, @p.second@ in the @value@ function above is implemented as @*(p + _offsetof_pair[1])@, where @p@ is a @void*@, and @_offsetof_pair@ is the offset array passed in to @value@ for @pair(const char*, T)@. The offset array @_offsetof_pair@ is generated at the call site as @size_t _offsetof_pair[] = { offsetof(_pair_conc1, first), offsetof(_pair_conc1, second) };@.
     
    521521In \CFA, each of these calls is valid. In the call to @f@, @x@ is implicitly flattened so that the components of @x@ are passed as the two arguments to @f@. For the call to @g@, the values @y@ and @10@ are structured into a single argument of type @[int, int]@ to match the type of the parameter of @g@. Finally, in the call to @h@, @y@ is flattened to yield an argument list of length 3, of which the first component of @x@ is passed as the first parameter of @h@, and the second component of @x@ and @y@ are structured into the second argument of type @[int, int]@. The flexible structure of tuples permits a simple and expressive function call syntax to work seamlessly with both single- and multiple-return-value functions, and with any number of arguments of arbitrarily complex structure.
    522522
    523 % In {K-W C} \citep{Buhr94a,Till89}, a precursor to \CFA, there were 4 tuple coercions: opening, closing, flattening, and structuring. Opening coerces a tuple value into a tuple of values, while closing converts a tuple of values into a single tuple value. Flattening coerces a nested tuple into a flat tuple, \ie it takes a tuple with tuple components and expands it into a tuple with only non-tuple components. Structuring moves in the opposite direction, \ie it takes a flat tuple value and provides structure by introducing nested tuple components.
     523In {K-W C}~\citep{Buhr94a,Till89}, a precursor to \CFA, there were 4 tuple coercions: opening, closing, flattening, and structuring. Opening coerces a tuple value into a tuple of values, while closing converts a tuple of values into a single tuple value. Flattening coerces a nested tuple into a flat tuple, \ie it takes a tuple with tuple components and expands it into a tuple with only non-tuple components. Structuring moves in the opposite direction, \ie it takes a flat tuple value and provides structure by introducing nested tuple components.
    524524
    525525In \CFA, the design has been simplified to require only the two conversions previously described, which trigger only in function call and return situations. Specifically, the expression resolution algorithm examines all of the possible alternatives for an expression to determine the best match. In resolving a function call expression, each combination of function value and list of argument alternatives is examined. Given a particular argument list and function value, the list of argument alternatives is flattened to produce a list of non-tuple valued expressions. Then the flattened list of expressions is compared with each value in the function's parameter list. If the parameter's type is not a tuple type, then the current argument value is unified with the parameter type, and on success the next argument and parameter are examined. If the parameter's type is a tuple type, then the structuring conversion takes effect, recursively applying the parameter matching algorithm using the tuple's component types as the parameter list types. Assuming a successful unification, eventually the algorithm gets to the end of the tuple type, which causes all of the matching expressions to be consumed and structured into a tuple expression. For example, in
     
    630630  [int, [int, int], int] g();
    631631
    632   ([int, double])f();           $\C{// (1)}$
    633   ([int, int, int])g();         $\C{// (2)}$
    634   ([void, [int, int]])g();      $\C{// (3)}$
    635   ([int, int, int, int])g();    $\C{// (4)}$
    636   ([int, [int, int, int]])g();  $\C{// (5)}$
     632  ([int, double])f();           // (1)
     633  ([int, int, int])g();         // (2)
     634  ([void, [int, int]])g();      // (3)
     635  ([int, int, int, int])g();    // (4)
     636  ([int, [int, int, int]])g();  // (5)
    637637\end{lstlisting}
    638638
     
    767767In the call to @new@, @Pair(double, char)@ is selected to match @T@, and @Params@ is expanded to match @[double, char]@. The constructor (1) may be specialized to  satisfy the assertion for a constructor with an interface compatible with @void ?{}(Pair(int, char) *, int, char)@.
    768768
    769 \TODO{Check if we actually can use ttype parameters on generic types (if they set the complete flag, it should work, or nearly so).}
     769\TODO{} Check if we actually can use ttype parameters on generic types (if they set the complete flag, it should work, or nearly so).
    770770
    771771\subsection{Implementation}
     
    850850The various kinds of tuple assignment, constructors, and destructors generate GNU C statement expressions. A variable is generated to store the value produced by a statement expression, since its fields may need to be constructed with a non-trivial constructor and it may need to be referred to multiple time, \eg in a unique expression. The use of statement expressions allows the translator to arbitrarily generate additional temporary variables as needed, but binds the implementation to a non-standard extension of the C language. However, there are other places where the \CFA translator makes use of GNU C extensions, such as its use of nested functions, so this restriction is not new.
    851851
    852 \section{Evaluation}
    853 
    854 \TODO{Magnus suggests we need some graphs, it's kind of a done thing that the reviewers will be looking for. Also, we've made some unsubstantiated claims about the runtime performance of \CFA, which some micro-benchmarks could help with. I'm thinking a simple stack push and pop, with an idiomatic \lstinline@void*@, \CFA, \CC template and \CC virtual inheritance versions (the void* and virtual inheritance versions likely need to be linked lists, or clumsy in their API -- possibly both versions) to test generics, and variadic print to test tuples. We measure SLOC, runtime performance, executable size (making sure to include benchmarks for multiple types in the executable), and possibly manually count the number of places where the programmer must provide un-type-checked type information. Appendices don't count against our page limit, so we might want to include the source code for the benchmarks (or at least the relevant implementation details) in one.}
    855 
    856852\section{Related Work}
    857853
     
    860856Cyclone also provides capabilities for polymorphic functions and existential types~\citep{Grossman06}, similar in concept to \CFA's @forall@ functions and generic types. Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, a tedious and potentially error-prone process. Furthermore, Cyclone's polymorphic functions and types are restricted in that they may only abstract over types with the same layout and calling convention as @void*@, in practice only pointer types and @int@ - in \CFA terms, all Cyclone polymorphism must be dtype-static. This design provides the efficiency benefits discussed in Section~\ref{sec:generic-apps} for dtype-static polymorphism, but is more restrictive than \CFA's more general model.
    861857
    862 Go \citep{Go} and Rust \citep{Rust} are both modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in Go and \emph{traits} in Rust. However, both languages represent dramatic departures from C in terms of language model, and neither has the same level of compatibility with C as \CFA. Go is a garbage-collected language, imposing the associated runtime overhead, and complicating foreign-function calls with the necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. It also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source.
     858Go and Rust are both modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in Go and \emph{traits} in Rust. However, both languages represent dramatic departures from C in terms of language model, and neither has the same level of compatibility with C as \CFA. Go is a garbage-collected language, imposing the associated runtime overhead, and complicating foreign-function calls with the necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. It also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source.
    863859
    864860\section{Conclusion \& Future Work}
     
    867863
    868864\begin{acks}
    869 The authors would like to thank Magnus Madsen for valuable editorial feedback.
    870 
    871865This work is supported in part by a corporate partnership with \grantsponsor{Huawei}{Huawei Ltd.}{http://www.huawei.com}\ and the first author's \grantsponsor{NSERC-PGS}{NSERC PGS D}{http://www.nserc-crsng.gc.ca/Students-Etudiants/PG-CS/BellandPostgrad-BelletSuperieures_eng.asp} scholarship.
    872866\end{acks}
Note: See TracChangeset for help on using the changeset viewer.