Changeset 3ed64ff

Timestamp:
Mar 28, 2017, 10:06:18 PM
Branches:
aaron-thesis, arm-eh, cleanup-dtors, deferred_resn, demangler, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, new-env, no_list, persistent-indexer, pthread-emulation, qualifiedEnum, resolv-new, with_gc
Children:
95448f1e
Parents:
c1fb1f2f
Message:

Initial editing pass on generics/tuples paper

File:
1 edited

 rc1fb1f2f \lstdefinelanguage{CFA}[ANSI]{C}{ morekeywords={_Alignas,_Alignof,__alignof,__alignof__,asm,__asm,__asm__,_At,_Atomic,__attribute,__attribute__,auto, _Bool,catch,catchResume,choose,_Complex,__complex,__complex__,__const,__const__,disable,dtype,enable,__extension__, _Bool,bool,catch,catchResume,choose,_Complex,__complex,__complex__,__const,__const__,disable,dtype,enable,__extension__, fallthrough,fallthru,finally,forall,ftype,_Generic,_Imaginary,inline,__label__,lvalue,_Noreturn,one_t,otype,restrict,size_t,sized,_Static_assert, _Thread_local,throw,throwResume,trait,try,typeof,__typeof,__typeof__,zero_t}, _Thread_local,throw,throwResume,trait,try,ttype,typeof,__typeof,__typeof__,zero_t}, }% \begin{abstract} The C programming language is a foundational technology for modern computing, with millions of lines of code implementing everything from commercial operating systems to hobby projects. This installed base of code and the programmers who produced it represent a massive software engineering investment spanning decades. Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA{} project is to create an extension of C which provides modern safety and productivity features while still providing strong backwards compatibility with C. Particularly, \CFA{} is designed to have an orthogonal feature set based closely on the C programming paradigm, so that \CFA{} features can be added incrementally to existing C codebases, and C programmers can learn \CFA{} extensions on an as-needed basis, preserving investment in existing engineers and code. This paper describes how generic and tuple types are implemented in \CFA{} in accordance with these principles. The C programming language is a foundational technology for modern computing, with millions of lines of code implementing everything from commercial operating systems to hobby projects. 
This installed base of code and the programmers who produced it represent a massive software engineering investment spanning decades. Nonetheless, C, first standardized over thirty years ago, lacks many features that make programming in more modern languages safer and more productive. The goal of the \CFA{} project is to create an extension of C that provides modern safety and productivity features while still ensuring strong backwards compatibility with C. Particularly, \CFA{} is designed to have an orthogonal feature set based closely on the C programming paradigm, so that \CFA{} features can be added incrementally to existing C code-bases, and C programmers can learn \CFA{} extensions on an as-needed basis, preserving investment in existing engineers and code. This paper describes how generic and tuple types are implemented in \CFA{} in accordance with these principles. \end{abstract} \section{Introduction \& Background} \CFA{}\footnote{Pronounced ``C-for-all'', and written \CFA{} or Cforall.} is an evolutionary extension of the C programming language which aims to add modern language features to C while maintaining both source compatibility with C and a familiar mental model for programmers. Four key design goals were set out in the original design of \CFA{} \citep{Bilson03}: \CFA{}\footnote{Pronounced ``C-for-all'', and written \CFA{} or Cforall.} is an evolutionary extension of the C programming language that aims to add modern language features to C while maintaining both source compatibility with C and a familiar mental model for programmers. Four key design goals were set out in the original design of \CFA{} \citep{Bilson03}: \begin{enumerate} \item The behaviour of standard C code must remain the same when translated by a \CFA{} compiler as when translated by a C compiler. \item Extensions introduced by \CFA{} must be translated in the most efficient way possible. 
\end{enumerate} The purpose of these goals is to ensure that existing C codebases can be converted to \CFA{} incrementally and with minimal effort, and that programmers who already know C can productively produce \CFA{} code without extensive training beyond the extension features they wish to employ. In its current implementation, \CFA{} is compiled by translating it to GCC-dialect C, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). The purpose of these goals is to ensure that existing C code-bases can be converted to \CFA{} incrementally and with minimal effort, and that programmers who already know C can productively produce \CFA{} code without training in \CFA{} beyond the extension features they wish to employ. In its current implementation, \CFA{} is compiled by translating it to GCC-dialect C, allowing it to leverage the portability and code optimizations provided by GCC, meeting goals (1)-(3). \CFA{} has been previously extended with polymorphic functions and name overloading (including operator overloading) \citep{Bilson03}, and deterministically-executed constructors and destructors \citep{Schluntz17}. This paper describes how generic and tuple types are designed and implemented in \CFA{} in accordance with both the backward compatibility goals and existing features described above. int forty_two = identity(42); // T is bound to int, forty_two == 42 \end{lstlisting} The @identity@ function above can be applied to any complete object type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@, which encode sufficient information about @T@ to create and return a variable of that type. The \CFA{} implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. 
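The implicit-parameter lowering described for @identity@ can be approximated by hand in plain C. The following is a minimal sketch, not the translator's actual output. It assumes @identity@ is declared as a @forall(otype T)@ function that returns its argument. The parameter names `sizeof_T`, `alignof_T` and `copy_T`, the helper `copy_int`, and the call-site wrapper `identity_int` are all hypothetical. Only the copy constructor is passed, because this particular body needs nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-lowered sketch of an assumed declaration:
 *     forall(otype T) T identity(T x) { return x; }
 * The type variable T becomes explicit parameters (hypothetical names);
 * the polymorphic argument and return value are passed as void*. */
static void identity(size_t sizeof_T, size_t alignof_T,
                     void (*copy_T)(void *dst, const void *src),
                     const void *x, void *rtn) {
    (void)sizeof_T;   /* unused here; needed when a temporary of type T is allocated */
    (void)alignof_T;
    assert(x != NULL && rtn != NULL);
    copy_T(rtn, x);   /* copy-construct the return value from the argument */
}

/* Copy constructor for int, standing in for what the compiler would supply. */
static void copy_int(void *dst, const void *src) {
    memcpy(dst, src, sizeof(int));
}

/* Call site with T bound to int: the caller fills in the implicit arguments. */
static int identity_int(int x) {
    int result;
    identity(sizeof(int), _Alignof(int), copy_int, &x, &result);
    return result;
}
```

The extra arguments at the call site are where the per-call cost of this style of polymorphism comes from. In exchange, `identity` is compiled once and can live in a separately compiled translation unit.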
If this extra information is not needed, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''. The @identity@ function above can be applied to any complete object type (or ``@otype@''). The type variable @T@ is transformed into a set of additional implicit parameters to @identity@, that encode sufficient information about @T@ to create and return a variable of that type. The \CFA{} implementation passes the size and alignment of the type represented by an @otype@ parameter, as well as an assignment operator, constructor, copy constructor and destructor. If this extra information is not needed, the type parameter can be declared as @dtype T@, where @dtype@ is short for ``data type''. Here, the runtime cost of polymorphism is spread over each polymorphic call, due to passing more arguments to polymorphic functions; preliminary experiments have shown this overhead to be similar to \CC{} virtual function calls. An advantage of this design is that, unlike \CC{} template functions, \CFA{} @forall@ functions are compatible with separate compilation. \end{lstlisting} This version of @twice@ works for any type @S@ that has an addition operator defined for it, and it could have been used to satisfy the type assertion on @four_times@. The translator accomplishes this by creating a wrapper function calling @twice // (2)@ with @S@ bound to @double@, then providing this wrapper function to @four_times@\footnote{\lstinline@twice // (2)@ could also have had a type parameter named \lstinline@T@; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline@T@ of \lstinline@four_times@.}. 
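The wrapper-function technique can be illustrated with a hand-written C approximation. This is a sketch under stated assumptions rather than the code the translator emits. It assumes that @four_times@ applies its @twice@ assertion two times in succession, and that the polymorphic @twice@ adds its argument to itself. Every identifier below (`twice_poly`, `twice_double_wrapper`, `add_double`, `four_times_double`, and the `sizeof_*` parameters) is hypothetical. Heap allocation stands in for the stack allocation a real translation would likely use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*add_fn)(const void *a, const void *b, void *rtn);
typedef void (*twice_fn)(const void *x, void *rtn);

/* Addition operator for double, passed where S is bound to double. */
static void add_double(const void *a, const void *b, void *rtn) {
    *(double *)rtn = *(const double *)a + *(const double *)b;
}

/* Polymorphic twice: works for any S with an addition operator,
 * which arrives as an extra (implicit) parameter. */
static void twice_poly(size_t sizeof_S, add_fn add_S,
                       const void *x, void *rtn) {
    (void)sizeof_S;      /* not needed by this body */
    add_S(x, x, rtn);    /* rtn = x + x */
}

/* Wrapper: binds S to double so the result has the simpler
 * signature that four_times expects for its twice assertion. */
static void twice_double_wrapper(const void *x, void *rtn) {
    twice_poly(sizeof(double), add_double, x, rtn);
}

/* four_times receives the function satisfying its assertion as a parameter. */
static void four_times(size_t sizeof_T, twice_fn twice_T,
                       const void *x, void *rtn) {
    void *tmp = malloc(sizeof_T);   /* temporary of polymorphic type T */
    assert(tmp != NULL);
    twice_T(x, tmp);                /* tmp = twice(x) */
    twice_T(tmp, rtn);              /* rtn = twice(tmp) */
    free(tmp);
}

/* Call site with T bound to double. */
static double four_times_double(double x) {
    double result;
    four_times(sizeof(double), twice_double_wrapper, &x, &result);
    return result;
}
```

The wrapper exists only to adapt signatures: `four_times` knows nothing about addition, so the addition operator has to be captured before the function pointer is handed over.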
The translator accomplishes this polymorphism by creating a wrapper function calling @twice // (2)@ with @S@ bound to @double@, then providing this wrapper function to @four_times@\footnote{\lstinline@twice // (2)@ could also have had a type parameter named \lstinline@T@; \CFA{} specifies renaming of the type parameters, which would avoid the name conflict with the type variable \lstinline@T@ of \lstinline@four_times@.}. \subsection{Traits} }; \end{lstlisting} Given this information, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @abs@ function above would produce generated code something like the following (simplified for clarity and brevity): Given the information provided for an @otype@, variables of polymorphic type can be treated as if they were a complete struct type -- they can be stack-allocated using the @alloca@ compiler builtin, default or copy-initialized, assigned, and deleted. As an example, the @abs@ function above produces generated code something like the following (simplified for clarity and brevity): \begin{lstlisting} void abs( size_t _sizeof_M, size_t _alignof_M, void* m, void* _rtn ) {  // polymorphic parameter and return passed as void* // M zero = { 0 }; void* zero = alloca(_sizeof_M);  // stack allocate 0 temporary void* zero = alloca(_sizeof_M);  // stack allocate zero temporary _ctor_M_zero(zero, 0);  // initialize using zero_t constructor // return m < zero ? 
-m : m; int is_nominal;  // int now satisfies the nominal trait { char is_nominal; // char satisfies the nominal trait } // char no longer satisfies the nominal trait here \end{lstlisting} Traits, however, are significantly more powerful than nominal-inheritance interfaces; firstly, due to the scoping rules of the declarations that satisfy a trait's type assertions, a type may not satisfy a trait everywhere that the type is declared, as with @char@ and the @nominal@ trait above. Secondly, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems: \end{lstlisting} Traits, however, are significantly more powerful than nominal-inheritance interfaces; most notably, traits may be used to declare a relationship among multiple types, a property that may be difficult or impossible to represent in nominal-inheritance type systems: \begin{lstlisting} trait pointer_like(otype Ptr, otype El) { \end{lstlisting} In the example above, @(list_iterator, int)@ satisfies @pointer_like@ by the user-defined dereference function, and @(list_iterator, list)@ also satisfies @pointer_like@ by the built-in dereference operator for pointers. Given a declaration @list_iterator it@, @*it@ can be either an @int@ or a @list@, with the meaning disambiguated by context (\eg, @int x = *it;@ interprets @*it@ as an @int@, while @(*it).value = 42;@ interprets @*it@ as a @list@). In the example above, @(list_iterator, int)@ satisfies @pointer_like@ by the user-defined dereference function, and @(list_iterator, list)@ also satisfies @pointer_like@ by the built-in dereference operator for pointers. Given a declaration @list_iterator it@, @*it@ can be either an @int@ or a @list@, with the meaning disambiguated by context (\eg{} @int x = *it;@ interprets @*it@ as an @int@, while @(*it).value = 42;@ interprets @*it@ as a @list@). 
While a nominal-inheritance system with associated types could model one of those two relationships by making @El@ an associated type of @Ptr@ in the @pointer_like@ implementation, few such systems could model both relationships simultaneously. \section{Generic Types} One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to create data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for data structures more complicated than a singly-linked list. A second approach is to use @void*@-based polymorphism. This approach is taken by the C standard library functions @qsort@ and @bsearch@, and does allow the use of common code for common functionality. However, basing all polymorphism on @void*@ eliminates the type-checker's ability to ensure that argument types are properly matched, as well as adding pointer indirection and dynamic allocation to algorithms and data structures which would not otherwise require them. A third approach to generic code is to use pre-processor macros to generate it -- this approach does allow the generated code to be both generic and type-checked, though any errors produced may be difficult to read. Furthermore, writing and invoking C code as preprocessor macros is unnatural and somewhat inflexible. Other C-like languages such as \CC{} and Java use \emph{generic types} to produce type-safe abstract data types. 
The \CFA{} team has chosen to implement generic types as well, with the constraints that the generic types design for \CFA{} must integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there should not be extra overhead for the use of a generic type. One of the known shortcomings of standard C is that it does not provide reusable type-safe abstractions for generic data structures and algorithms. Broadly speaking, there are three approaches to create data structures in C. One approach is to write bespoke data structures for each context in which they are needed. While this approach is flexible and supports integration with the C type-checker and tooling, it is also tedious and error-prone, especially for more complex data structures. A second approach is to use @void*@-based polymorphism. This approach is taken by the C standard library functions @qsort@ and @bsearch@, and does allow the use of common code for common functionality. However, basing all polymorphism on @void*@ eliminates the type-checker's ability to ensure that argument types are properly matched, often requires a number of extra function parameters, and also adds pointer indirection and dynamic allocation to algorithms and data structures that would not otherwise require them. A third approach to generic code is to use pre-processor macros to generate it -- this approach does allow the generated code to be both generic and type-checked, though any errors produced may be difficult to interpret. Furthermore, writing and invoking C code as preprocessor macros is unnatural and somewhat inflexible. Other C-like languages such as \CC{} and Java use \emph{generic types} to produce type-safe abstract data types. 
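The trade-offs between the second and third approaches can be seen in a few lines of standard C. The sketch below is illustrative only: `cmp_int`, `sort_ints`, `DEFINE_MAX`, `max_int` and `max_double` are hypothetical names, and only `qsort` comes from the C standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Approach 2: void*-based polymorphism. The comparator receives untyped
 * pointers, so the compiler cannot check that it matches the element type. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids overflow from x - y */
}

static void sort_ints(int *v, size_t n) {
    /* Element size is passed by hand, one of the extra parameters
     * this style of interface requires. */
    qsort(v, n, sizeof v[0], cmp_int);
}

/* Approach 3: preprocessor macros. Each expansion is fully type-checked,
 * but errors are reported against the expanded text. */
#define DEFINE_MAX(T, NAME)                      \
    static T NAME(const T *v, size_t n) {        \
        assert(n > 0);                           \
        T best = v[0];                           \
        for (size_t i = 1; i < n; i++) {         \
            if (v[i] > best) best = v[i];        \
        }                                        \
        return best;                             \
    }

DEFINE_MAX(int, max_int)
DEFINE_MAX(double, max_double)
```

Passing `cmp_int` to `qsort` over an array of `double` would still compile, which is the loss of type checking the text describes. Each `DEFINE_MAX` expansion, by contrast, is checked against its concrete element type, but has to be instantiated by hand for every type.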
The authors have chosen to implement generic types as well, with some care taken that the generic types design for \CFA{} integrates efficiently and naturally with the existing polymorphic functions in \CFA{} while retaining backwards compatibility with C; maintaining separate compilation is a particularly important constraint on the design. However, where the concrete parameters of the generic type are known, there is not extra overhead for the use of a generic type. A generic type can be declared by placing a @forall@ specifier on a @struct@ or @union@ declaration, and instantiated using a parenthesized list of types after the type name: \end{lstlisting} \CFA{} classifies generic types as either \emph{concrete} or \emph{dynamic}. Dynamic generic types vary in their in-memory layout depending on their type parameters, while concrete generic types have a fixed memory layout regardless of type parameters. A type may have polymorphic parameters but still be concrete; \CFA{} refers to such types as \emph{dtype-static}. Polymorphic pointers are an example of dtype-static types -- @forall(dtype T) T*@ is a polymorphic type, but for any @T@ chosen, @T*@ will have exactly the same in-memory representation as a @void*@, and can therefore be represented by a @void*@ in code generation. \CFA{} generic types may also specify constraints on their argument type that will be checked by the compiler. For example, consider the following declaration of a sorted set type, which will ensure that the set key supports comparison and tests for equality: \begin{lstlisting} forall(otype Key | { bool ?==?(Key, Key); bool ?