1 edited


  • doc/generic_types/generic_types.tex

    rad4d035 rcceab8a  
    66\usepackage{upquote}                                                                    % switch curled `'" to straight
    77\usepackage{listings}                                                                   % format program code
    5455\newcommand{\CCseventeen}{\rm C\kern-.1em\hbox{+\kern-.25em+}17\xspace} % C++17 symbolic name
    5556\newcommand{\CCtwenty}{\rm C\kern-.1em\hbox{+\kern-.25em+}20\xspace} % C++20 symbolic name
     57\newcommand{\CCV}{\rm C\kern-.1em\hbox{+\kern-.25em+}obj\xspace} % C++ virtual symbolic name
    766768\subsection{Variadic Tuples}
    768771To define variadic functions, \CFA adds a new kind of type parameter, @ttype@ (tuple type).
    931 \TODO{Magnus suggests we need some graphs, it's kind of a done thing that the reviewers will be looking for. Also, we've made some unsubstantiated claims about the runtime performance of \CFA, which some micro-benchmarks could help with. I'm thinking a simple stack push and pop, with an idiomatic \lstinline@void*@, \CFA, \CC template and \CC virtual inheritance versions (the void* and virtual inheritance versions likely need to be linked lists, or clumsy in their API -- possibly both versions) to test generics, and variadic print to test tuples. We measure SLOC, runtime performance, executable size (making sure to include benchmarks for multiple types in the executable), and possibly manually count the number of places where the programmer must provide un-type-checked type information. Appendices don't count against our page limit, so we might want to include the source code for the benchmarks (or at least the relevant implementation details) in one.}
     934Though \CFA provides significant added functionality over C, these added features do not impose a significant runtime penalty. In fact, \CFA's features for generic programming can enable runtime execution that is faster than idiomatic @void*@-based C code. We have produced a set of generic-code-based micro-benchmarks to demonstrate these claims, source code for which may be found in \TODO{Appendix A}. These benchmarks test a generic stack based on a singly-linked-list, a generic pair data structure, and a variadic @print@ routine similar to that shown in Section~\ref{sec:variadic-tuples}. Each benchmark has been implemented in C with @void*@-based polymorphism, \CFA with the features discussed in this paper, \CC with templates, and \CC using only class inheritance for polymorphism (``\CCV''). The intention of these benchmarks is to represent the costs of idiomatic use of each language's features, rather than the strict maximal performance obtainable by code written in each language -- as all the languages considered have a shared subset comprising most of standard C, a set of maximal-performance benchmarks would presumably show very little runtime variance, and would differ primarily in length and clarity of source code. Particularly, in the \CCV variant of the benchmark all objects inherit from a base @object@ class, explicitly implement interfaces defined as abstract base classes, and must do runtime checks in generic code to safely down-cast objects; this is not an idiomatic programming pattern for \CC, but is meant to represent the design of a simple object-oriented programming language. The most notable difference between the implementations is  memory layout; the \CFA and \CC variants inline the stack and pair elements into their corresponding list and pair nodes, while the C and \CCV versions are forced by their lack of a generic type capability to store generic objects via pointers to separately-allocated objects. For more idiomatic language use, the C and \CFA variants used \texttt{cstdio.h} for printing, while the \CC and \CCV variants used \texttt{iostream}, though preliminary experiments showed this distinction to make little runtime difference. For consistency in testing, all implementations used the C @rand()@ function for random number generation.
     939\caption{Timing Results for benchmarks}
     944\caption{Properties of benchmark code}
     947                                                        &       C               &       \CFA    &       \CC             &       \CCV    \\ \hline
     948maximum memory usage (MB)       &       10001   &       2501    &       2503    &       11253   \\
     949source code size (lines)        &       301             &       224             &       188             &       437             \\
     950binary size (KB)                        &       18.46   &       234.22  &       18.42   &       42.10   \\
     954The results of running the benchmarks can be seen in Figure~\ref{fig:eval} and Table~\ref{tab:eval}; each result records the time taken by a single function call, repeated $N = 40,000,000$ times where appropriate. The five functions are $N$ stack pushes of randomly generated elements, deep copy of an $N$ element stack, clearing all nodes of an $N$ element stack, $N/2$ variadic @print@ calls each containing two constant strings and two stack elements \TODO{right now $N$ fresh elements: FIX}, and $N$ stack pops, keeping a running record of the maximum element to ensure that the object copies are not optimized out. These five functions are run first for a stack of integers, and second for a stack of generic pairs of a boolean and a @char@. \TODO{} The data shown is the median of 5 consecutive runs of each program, with an initial warm-up run omitted. All code was compiled at \texttt{-O2} by GCC or G++ 6.2.0, with all \CC code compiled as \CCfourteen. The benchmarks were run on an Ubuntu 16.04 workstation with 16 GB of RAM and a 6-core AMD FX-6300 CPU with 3.5 GHz maximum clock frequency. The C and \CCV variants are generally the slowest and most memory-hungry, due to their less-efficient memory layout and the pointer-indirection necessary to implement generic types in these languages; this problem is exacerbated by the second level of generic types in the pair-based benchmarks. By contrast, the \CFA and \CC variants run in roughly equivalent time for both the integer and pair of boolean and char tests, which makes sense given that an integer is actually larger than the pair in both languages.
     956The \CC code is the shortest largely due to its use of header-only libraries, as template code cannot be separately compiled, the \CFA line count would shrink to \TODO{} if it used a header-only approach instead of the more idiomatic separate compilation. \CFA and \CC also have the advantage of a more extensive standard library; as part of the standard library neither language's generic @pair@ type is included in the line count, while this type must be written by the user programmer in both C and \CCV. The definition of @object@ and wrapper classes for @bool@, @char@, @int@, and @const char*@ are included in the line count for \CCV, which somewhat inflates its line count, as an actual object-oriented language would include these in the standard library and with their omission the \CCV line count is similar to C; we justify the given line count by the fact that many object-oriented languages do not allow implementing new interfaces on library types without subclassing or boilerplate-filled wrapper types, which may be similarly verbose. Raw line-count, however, is a fairly rough measure of code complexity; another important factor is how much type information the programmer must manually specify, especially where that information is not checked by the compiler. Such un-checked type information produces a heavier documentation burden and increased potential for runtime bugs, and is much less common in \CFA than C, with its manually specified function pointers arguments and format codes, or \CCV, with its extensive use of un-type-checked downcasts (\eg @object@ to @integer@ when popping a stack, or @object@ to @printable@ when printing the elements of a @pair@) \TODO{Actually calculate this; I want to put a distinctive comment in the source code and grep for it}.
    934958\section{Related Work}
    937 \subsection{Generics}
    939960\CC is the existing language it is most natural to compare \CFA to, as they are both more modern extensions to C with backwards source compatibility. The most fundamental difference in approach between \CC and \CFA is their approach to this C compatibility. \CC does provide fairly strong source backwards compatibility with C, but is a dramatically more complex language than C, and imposes a steep learning curve to use many of its extension features. For instance, in a break from general C practice, template code is typically written in header files, with a variety of subtle restrictions implied on its use by this choice, while the other polymorphism mechanism made available by \CC, class inheritance, requires programmers to learn an entirely new object-oriented programming paradigm; the interaction between templates and inheritance is also quite complex. \CFA, by contrast, has a single facility for polymorphic code, one which supports separate compilation and the existing procedural paradigm of C code. A major difference between the approaches of \CC and \CFA to polymorphism is that the set of assumed properties for a type is \emph{explicit} in \CFA. One of the major limiting factors of \CC's approach is that templates cannot be separately compiled, and, until concepts~\citep{C++Concepts} are standardized (currently anticipated for \CCtwenty), \CC provides no way to specify the requirements of a generic function in code beyond compilation errors for template expansion failures. By contrast, the explicit nature of assertions in \CFA allows polymorphic functions to be separately compiled, and for their requirements to be checked by the compiler; similarly, \CFA generic types may be opaque, unlike \CC template classes.
    945966D \citep{D}, Go \citep{Go}, and Rust \citep{Rust} are modern, compiled languages with abstraction features similar to \CFA traits, \emph{interfaces} in D and Go and \emph{traits} in Rust. However, each language represents dramatic departures from C in terms of language model, and none has the same level of compatibility with C as \CFA. D and Go are garbage-collected languages, imposing the associated runtime overhead. The necessity of accounting for data transfer between the managed Go runtime and the unmanaged C runtime complicates foreign-function interface between Go and C. Furthermore, while generic types and functions are available in Go, they are limited to a small fixed set provided by the compiler, with no language facility to define more. D restricts garbage collection to its own heap by default, while Rust is not garbage-collected, and thus has a lighter-weight runtime that is more easily interoperable with C. Rust also possesses much more powerful abstraction capabilities for writing generic code than Go. On the other hand, Rust's borrow-checker, while it does provide strong safety guarantees, is complex and difficult to learn, and imposes a distinctly idiomatic programming style on Rust. \CFA, with its more modest safety features, is significantly easier to port C code to, while maintaining the idiomatic style of the original source.
    948 \subsection{Tuples/Variadics}
    950 Many programming languages have some form of tuple construct and/or variadic functions, \eg SETL, C, KW-C, \CC, D, Go, Java, ML, and Scala.
    951 SETL~\cite{SETL} is a high-level mathematical programming language, with tuples being one of the primary data types.
    952 Tuples in SETL allow subscripting, dynamic expansion, and multiple assignment.
    953 KW-C~\cite{Buhr94a}, a predecessor of \CFA, introduced tuples to C as an extension of the C syntax, taking much of its inspiration from SETL.
    954 The main contributions of that work were adding MRVF, tuple mass and multiple assignment, and record-field access.
    955 \CCeleven introduced @std::tuple@ as a library variadic template structure.
    956 Tuples are a generalization of @std::pair@, in that they allow for arbitrary length, fixed-size aggregation of heterogeneous values.
    957 Operations include @std::get<N>@ to extract vales, @std::tie@ to create a tuple of references used for assignment, and lexicographic comparisons.
    958 \CCseventeen proposes \emph{structured bindings}~\cite{Sutter15} to eliminate pre-declaring variables and use of @std::tie@ for binding the results.
    959 This extension requires the use of @auto@ to infer the types of the new variables, so complicated expressions with a non-obvious type must be documented with some other mechanism.
    960 Furthermore, structured bindings are not a full replacement for @std::tie@, as it always declares new variables.
    961 Like \CC, D provides tuples through a library variadic-template structure.
    962 Go does not have tuple but supports MRVF.
    963 Java's variadic functions appears similar to C's but type-safe using arrays.
    964 Tuples are a fundamental abstraction in most functional programming languages, such as Standard ML~\cite{sml} and Scala~\cite{Scala}, which decompose tuples using pattern matching.
    967968\section{Conclusion \& Future Work}
Note: See TracChangeset for help on using the changeset viewer.