- Timestamp:
- Feb 27, 2019, 10:26:18 PM (6 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, pthread-emulation, qualifiedEnum
- Children:
- d1b1063
- Parents:
- 4cdfcbd
- Location:
- doc/theses/aaron_moss_PhD/phd
- Files:
-
- 2 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/theses/aaron_moss_PhD/phd/background.tex
r4cdfcbd r4eaefd1 115 115 \subsection{Type Assertions} 116 116 117 Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{Subscript not included in source code }:117 Since bare polymorphic types do not provide a great range of available operations, \CFA{} provides a \emph{type assertion} mechanism to provide further information about a type\footnote{Subscript not included in source code.\label{sub-foot}}: 118 118 119 119 \begin{cfa} … … 129 129 130 130 Monomorphic specializations of polymorphic functions can themselves be used to satisfy type assertions. 131 For instance, !twice! could have been defined like this\foot notemark[5]:131 For instance, !twice! could have been defined like this\footref{sub-foot}: 132 132 133 133 \begin{cfa} -
doc/theses/aaron_moss_PhD/phd/generic-types.tex
r4cdfcbd r4eaefd1 7 7 While this approach is flexible and supports integration with the C type checker and tooling, it is also tedious and error prone, especially for more complex data structures. 8 8 A second approach is to use !void*!-based polymorphism, \eg{} the C standard library functions !bsearch! and !qsort!, which allow for the reuse of common functionality. 9 However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise not needed.9 However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise unnecessary. 10 10 A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type checked, but errors in such code may be difficult to locate and debug. 11 11 Furthermore, writing and using preprocessor macros is unnatural and inflexible. 12 Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*! -and !#define!-based polymorphism, respectively.12 Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*! and !#define!-based polymorphism, respectively. 13 13 14 14 \begin{figure} … … 185 185 \section{Design} 186 186 187 Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism present ed some unique design constraints for this project.188 The guiding principle was to maintain an unsurprising language model for C programmers without compromising runtime efficiency.189 A key insight for this design was that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly.187 Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism present some unique design constraints for \CFA{} generics. 188 The guiding principle is to maintain an unsurprising language model for C programmers without compromising runtime efficiency. 189 A key insight for this design is that C already possesses a handful of built-in generic types (\emph{derived types} in the language of the standard \cite[\S{}6.2.5]{C11}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly. 190 190 191 191 \subsection{Related Work} … … 194 194 The template approach is closely related to the macro-expansion approach to C polymorphism demonstrated in Figure~\ref{macro-generic-fig}, but where the macro-expansion syntax has been given first-class language support. 195 195 Template expansion has the benefit of generating code with near-optimal runtime efficiency, as distinct optimizations can be applied for each instantiation of the template. 196 On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case \cite{Haberman16}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted.196 On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case \cite{Haberman16}, and the costs of increased compilation time and instruction cache pressure cannot be ignored. 197 197 The most significant restriction of the \CC{} template model is that it breaks separate compilation and C's translation-unit-based encapsulation mechanisms. 198 Because a \CC{} template is not actually code, but rather a sort of``recipe'' to generate code, template code must be visible at its call site to be used.198 Because a \CC{} template is not actually code, but rather a ``recipe'' to generate code, template code must be visible at its call site to be used. 199 199 Furthermore, \CC{} template code cannot be type-checked without instantiating it, a time consuming process with no hope of improvement until \CC{} concepts \cite{C++Concepts} are standardized in \CCtwenty{}. 200 C code, by contrast, only needs a !struct! or function declaration to call that function or use (by-pointer) values of that type, a desirable property to maintain for\CFA{}.200 C code, by contrast, only needs a function declaration to call that function or a !struct! declaration to use (by-pointer) values of that type, desirable properties to maintain in \CFA{}. 201 201 202 202 Java \cite{Java8} has another prominent implementation for generic types, introduced in Java~5 and based on a significantly different approach than \CC{}. … … 205 205 To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward. 206 206 207 Cyclone \cite{Grossman06} is another language extending C, and also provides capabilities for polymorphic functions and existential types,similar to \CFA{}'s !forall! functions and generic types.207 Cyclone \cite{Grossman06} extends C and also provides capabilities for polymorphic functions and existential types which are similar to \CFA{}'s !forall! functions and generic types. 208 208 Cyclone existential types can include function pointers in a construct similar to a virtual function table, but these pointers must be explicitly initialized at some point in the code, which is tedious and error-prone compared to \CFA{}'s implicit assertion satisfaction. 209 209 Furthermore, Cyclone's polymorphic functions and types are restricted to abstraction over types with the same layout and calling convention as !void*!, \ie{} only pointer types and !int!. … … 215 215 Haskell \cite{Haskell10} combines ML-style polymorphism with the notion of type classes, similar to \CFA{} traits, but requiring an explicit association with their implementing types, unlike \CFA{}. 216 216 Objective-C \cite{obj-c-book} is an extension to C which has had some industrial success; however, it did not support type-checked generics until recently \cite{xcode7}, and its garbage-collected, message-passing object-oriented model is a radical departure from C. 217 Go \cite{Go}, and Rust \cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits ,\emph{interfaces} in Go and \emph{traits} in Rust.217 Go \cite{Go}, and Rust \cite{Rust} are modern compiled languages with abstraction features similar to \CFA{} traits: \emph{interfaces} in Go and \emph{traits} in Rust. 218 218 Go has implicit interface implementation and uses a ``fat pointer'' construct to pass polymorphic objects to functions, similar in principle to \CFA{}'s implicit forall parameters. 219 219 Go does not, however, allow user code to define generic types, restricting Go programmers to the small set of generic types defined by the compiler. … … 223 223 \subsection{\CFA{} Generics} 224 224 225 The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing the betteraspects of each.226 Like \CC{} template types, generic !struct! s and !union!s in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other.225 The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing useful aspects of each. 226 Like \CC{} template types, generic !struct! and !union! types in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other. 227 227 The fact that the storage layout of any instantiation of a \CFA{} generic type is identical to that of the monomorphic type produced by simple macro replacement of the generic type parameters is important to provide consistent and predictable runtime performance, and to not impose any undue abstraction penalty on generic code. 228 228 As an example, consider the following generic type and function: … … 238 238 239 239 In this example, !with_len! is defined at the same scope as !pair!, but it could be called from any context that can see the definition of !pair! and a declaration of !with_len!. 240 If its return type w as!pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types.241 242 !with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given a n appropriate re-declarationand demangling flags.240 If its return type were !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair! to call it, in accordance with the usual C rules for opaque types. 241 242 !with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given appropriate re-declarations and demangling flags. 243 243 However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation \cite{Schluntz17}) and can be re-used for a wide variety of !pair! instantiations. 244 Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can eliminate any runtime overhead of this polymorphic call.244 Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can in principle eliminate any runtime overhead of this polymorphic call. 245 245 246 246 \CFA{} deliberately does not support \CC{}-style partial specializations of generic types. … … 251 251 252 252 Since \CFA{} polymorphic functions can operate over polymorphic generic types, functions over such types can be partially or completely specialized using the usual overload selection rules. 253 As an example, the !with_len! function above could be an optimization of the following more general function:253 As an example, the following generalization of !with_len! is a semantically-equivalent function which works for any type that has a !len! function declared, making use of both the ad-hoc (overloading) and parametric (!forall!) polymorphism features of \CFA{}: 254 254 255 255 \begin{cfa} … … 260 260 \end{cfa} 261 261 262 \CFA{} generic types also support t he type constraints from!forall! functions.262 \CFA{} generic types also support type constraints, as in !forall! functions. 263 263 For example, the following declaration of a sorted set type ensures that the set key implements equality and relational comparison: 264 264 … … 267 267 \end{cfa} 268 268 269 These constraints are implemented by applying equivalent constraints to the compiler-generated constructors for this type.269 These constraints are enforced by applying equivalent constraints to the compiler-generated constructors for this type. 270 270 271 271 \section{Implementation} \label{generic-impl-sec} 272 272 273 The ability to use generic types in polymorphic contexts means that the \CFA{} implementation in \CFACC{} must support a mechanism for accessing fields of generic types dynamically at runtime. 274 While \CFACC{} could in principle use this same mechanism for accessing fields of all generic types, such an approach would throw away compiler knowledge of static types and impose an unnecessary runtime cost, limiting the utility of the generic type design. 275 Instead, my design for generic type support in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters. 273 The ability to use generic types in polymorphic contexts means that the \CFA{} implementation must support a mechanism for accessing fields of generic types dynamically. 274 While \CFACC{} could in principle use this same mechanism for accessing fields of generic types in monomorphic contexts as well, such an approach would throw away compiler knowledge of static types and impose an unnecessary runtime cost. 275 Instead, my design for generic types in \CFACC{} distinguishes between \emph{concrete} generic types that have a fixed memory layout regardless of type parameters and \emph{dynamic} generic types that may vary in memory layout depending on their type parameters. 276 276 277 A \emph{dtype-static} type has polymorphic parameters but is still concrete. 277 278 Polymorphic pointers are an example of dtype-static types; given some type variable !T!, !T! is a polymorphic type, but !T*! has a fixed size and can therefore be represented by a !void*! in code generation. … … 283 284 \begin{cfa} 284 285 //dynamic, layout varies based on T 285 forall(otype T) T value ( pair(const char*, T) p ) { return p.second; }286 forall(otype T) T value$\(_1\)$( pair(const char*, T) p ) { return p.second; } 286 287 287 288 // dtype-static, F* and T* are concrete but recursively polymorphic 288 forall(dtype F, otype T) T value ( pair(F*, T*) ) { return *p.second; }289 forall(dtype F, otype T) T value$\(_2\)$( pair(F*, T*) ) { return *p.second; } 289 290 290 291 pair(const char*, int) p = {"magic", 42}; $\C[2.5in]{// concrete}$ 291 292 int i = value(p); 292 pair(void*, int*) q = {0, & p.second}; $\C[2.5in]{// concrete}$293 pair(void*, int*) q = {0, &i}; $\C[2.5in]{// concrete}$ 293 294 i = value(q); 294 295 double d = 1.0; … … 299 300 \subsection{Concrete Generic Types} 300 301 301 The \CFACC{} translator template 302 The \CFACC{} translator template-expands concrete generic types into new structure types, affording maximal inlining. 302 303 To enable interoperation among equivalent instantiations of a generic type, \CFACC{} saves the set of instantiations currently in scope and reuses the generated structure declarations where appropriate. 303 304 In particular, tuple types are implemented as a single compiler-generated generic type definition per tuple arity, and can be instantiated and reused according to the usual rules for generic types. 304 305 A function declaration that accepts or returns a concrete generic type produces a declaration for the instantiated structure in the same scope, which all callers may reuse. 305 As an example, the concrete instantiation for !pair(const char*, int)! is\footnote{ This omits the field name mangling performed by \CFACC{} for overloading purposes.\label{mangle-foot}}306 As an example, the concrete instantiation for !pair(const char*, int)! is\footnote{Field name mangling for overloading purposes is omitted.\label{mangle-foot}}: 306 307 307 308 \begin{cfa} … … 310 311 311 312 A concrete generic type with dtype-static parameters is also expanded to a structure type, but this type is used for all matching instantiations. 312 In the example above, the !pair(F*, T*)! parameter to !value! is such a type; its expansion is below\footref{mangle-foot}, and it is used as the type of the variables !q! and !r! as well, with casts for member access where appropriate .313 In the example above, the !pair(F*, T*)! parameter to !value! is such a type; its expansion is below\footref{mangle-foot}, and it is used as the type of the variables !q! and !r! as well, with casts for member access where appropriate: 313 314 314 315 \begin{cfa} … … 322 323 The design for generic types presented here adds an \emph{offset array} containing structure-member offsets for dynamic generic !struct! types. 323 324 A dynamic generic !union! needs no such offset array, as all members are at offset 0, but size and alignment are still necessary. 324 Access to members of a dynamic structure is provided at runtime via base displacement addressingthe structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime.325 Access to members of a dynamic structure is provided at runtime via base-displacement addressing of the structure pointer and the member offset (similar to the !offsetof! macro), moving a compile-time offset calculation to runtime. 325 326 326 327 The offset arrays are statically generated where possible. 327 328 If a dynamic generic type is passed or returned by value from a polymorphic function, \CFACC{} can safely assume that the generic type is complete (\ie{} has a known layout) at any call site, and the offset array is passed from the caller; if the generic type is concrete at the call site, the elements of this offset array can even be statically generated using the C !offsetof! macro. 328 As an example, the body of the second !value! function above is implemented as329 As an example, the body of !value!$_2$ above is implemented as: 329 330 330 331 \begin{cfa} … … 333 334 334 335 Here, !_assign_T! is passed in as an implicit parameter from !otype T! and takes two !T*! (!void*! in the generated code), a destination and a source, and !_retval! is the pointer to a caller-allocated buffer for the return value, the usual \CFA{} method to handle dynamically-sized return types. 335 !_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as 336 !_offsetof_pair! is the offset array passed into !value!; this array is statically generated at the call site as: 336 337 337 338 \begin{cfa} … … 347 348 These layout functions take as arguments pointers to size and alignment variables and a caller-allocated array of member offsets, as well as the size and alignment of all !sized! parameters to the generic structure. 348 349 Un-!sized! parameters are not passed because they are forbidden from being used in a context that affects layout by C's usual rules about incomplete types. 349 Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller willnot know how large to allocate the array of member offsets.350 Similarly, the layout function can only safely be called from a context where the generic type definition is visible, because otherwise the caller does not know how large to allocate the array of member offsets. 350 351 351 352 The C standard does not specify a memory layout for structs, but the POSIX ABI for x86 \cite{POSIX08} does; this memory layout is common for C implementations, but is a platform-specific issue for porting \CFA{}. … … 356 357 forall(dtype T1, dtype T2, ... | sized(T1) | sized(T2) | ...) 357 358 void layout(size_t* size, size_t* align, size_t* offsets) { 358 // initialize values359 359 *size = 0; *align = 1; 360 360 // set up members … … 374 374 \end{cfa} 375 375 376 Results of layout 376 Results of layout-function calls are cached so that they are only computed once per type per function. 377 377 Layout functions also allow generic types to be used in a function definition without reflecting them in the function signature, an important implementation-hiding constraint of the design. 378 378 For instance, a function that strips duplicate values from an unsorted !list(T)! likely has a reference to the list as its only explicit parameter, but uses some sort of !set(T)! internally to test for duplicate values. … … 380 380 381 381 Whether a type is concrete, dtype-static, or dynamic is decided solely on the basis of the type arguments and !forall! clause type parameters. 382 This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of !Box(T)! can be separately compiled, and callers from other translation units know the proper calling conventions to use.383 In an alternate design where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout.384 However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and we judged preserving separate compilation (and the associated C compatibility) in the implemented design to be an acceptable trade-off.382 This design allows opaque forward declarations of generic types, \eg{} !forall(otype T) struct Box;! like in C, all uses of !Box(T)! can be separately compiled, and callers from other translation units know the proper calling conventions. 383 In an alternate design, where the definition of a structure type is included in deciding whether a generic type is dynamic or concrete, some further types may be recognized as dtype-static --- \eg{} !Box! could be defined with a body !{ T* p; }!, and would thus not depend on !T! for its layout. 384 However, the existence of an !otype! parameter !T! means that !Box! \emph{could} depend on !T! for its layout if this definition is not visible, and preserving separate compilation (and the associated C compatibility) is a more important design metric. 385 385 386 386 \subsection{Applications of Dtype-static Types} \label{dtype-static-sec} … … 401 401 Another useful pattern enabled by reused dtype-static type instantiations is zero-cost \emph{tag structures}. 402 402 Sometimes, information is only used for type checking and can be omitted at runtime. 403 In the example below, !scalar! is a dtype-static type; hence, all uses have a single structure definition containing !unsigned long! and can share the same implementations of common functions like !?+?!.403 In the example below, !scalar! is a dtype-static type; hence, all uses have a single structure definition containing !unsigned long! and can share the same implementations of common functions, like !?+?!. 404 404 These implementations may even be separately compiled, unlike \CC{} template functions. 405 405 However, the \CFA{} type checker ensures matching types are used by all calls to !?+?!, preventing nonsensical computations like adding a length to a volume. … … 422 422 \section{Performance Experiments} \label{generic-performance-sec} 423 423 424 To validate the practicality of this generic type design I have conducted microbenchmark-based testsagainst a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}.425 Since all these languages are compiled with the same compiler backend and share a subset essentially comprising standard C, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code.424 To validate the practicality of this generic type design, microbenchmark-based tests were conducted against a number of comparable code designs in C and \CC{}, first published in \cite{Moss18}. 425 Since all these languages are all C-based and compiled with the same compiler backend, maximal-performance benchmarks should show little runtime variance, differing only in length and clarity of source code. 426 426 A more illustrative comparison measures the costs of idiomatic usage of each language's features. 427 427 The code below shows the \CFA{} benchmark tests for a generic stack based on a singly-linked list; the test suite is equivalent for the other other languages. … … 429 429 430 430 \begin{cfa} 431 #define N 4000000 431 432 int main() { 432 433 int max = 0, val = 42;
Note: See TracChangeset
for help on using the changeset viewer.