- Timestamp:
- Sep 25, 2018, 5:33:02 PM (6 years ago)
- Branches:
- ADT, aaron-thesis, arm-eh, ast-experimental, cleanup-dtors, deferred_resn, enum, forall-pointer-decay, jacob/cs343-translation, jenkins-sandbox, master, new-ast, new-ast-unique-expr, no_list, persistent-indexer, pthread-emulation, qualifiedEnum
- Children:
- 9ad5ee1
- Parents:
- 461eed2 (diff), a32346b (diff)
Note: this is a merge changeset, the changes displayed below correspond to the merge itself.
Use the(diff)
links above to see all the changes relative to each parent. - Location:
- doc
- Files:
-
- 5 added
- 6 edited
Legend:
- Unmodified
- Added
- Removed
-
doc/proposals/virtual.txt
r461eed2 r34a6b2e 87 87 first polymorphic parameter). 88 88 89 Once a function in a trait has been marked as virtual it defines a new 90 function that takes in that trait's reference and then dynamically calls the 91 underlying type implementation. Hence a trait reference becomes a kind of 92 abstract type, cannot be directly instantiated but can still be used.89 Instances of a trait are created by wrapping an existing instance of a type 90 that implements that trait. This wrapper includes all the function pointers 91 and other values required to preform the dynamic look-up. These are chosen by 92 the normal look-up rules at the point of abstraction. 93 93 94 94 One of the limitations of this design is that it does not support double … … 98 98 is also restricted, initially forbidden, see extension. 99 99 100 Ownership of the underlying structure is also a bit of a trick. Considering 101 the use cases for trait object, it is probably best to have the underlying 102 object be heap allocated and owned by the trait object. 103 100 104 Extension: Multi-parameter Virtual Traits: 101 105 … … 157 161 context, for instance if the cast occurs on the right hand side of an 158 162 assignment. 163 164 Function look-up follows the same rules as relaxed (behavioural) inheritance. 165 Traits can be upcast and down cast without losing information unless the 166 trait is cast down to a structure. Here there are two options. 167 168 Abstraction Time Binding: The more efficient and consistant with other parts 169 of CFA. Only the trait types use dynamic look-up, if converveted back into a 170 structure the normal static look-up rules find the function at compile time. 171 Casting down to a structure type can then result in the loss of a set of 172 bindings. 173 Construction Time Binding: For more consistant handling of the virtual 174 structs, they are always considered wrapped. Functions are bound to the 175 instance the moment it is constructed and remain unchanged throughout its 176 lifetime, so down casting does not lose information. 177 178 (We will have to decide between one of these two.) 159 179 160 180 Extension: Multiple Parents … … 205 225 206 226 We have so far been silent on how the vtable is created, stored and accessed. 207 208 Creation happens at compile time. Function pointers are found by using the 209 same best match rules as elsewhere (additional rules for defaults from the 210 parent may or may not be required). For strict virtual this must happen at the 211 global scope and forbidding static functions, to ensure that a single unique 212 vtable is created. Similarly, there may have to be stricter matching rules 213 for the functions that go into the vtable, possibly requiring an exact match. 214 Relaxed virtual could relax both restrictions, if we allow different vtable 215 at different conversion (struct to trait reference) sites. If it is allowed 216 local functions being bound to a vtable could cause issues when they go out 217 of scope, however this should follow the lifetime rules most C programs 218 already follow implicitly. 219 220 Most vtables should be stored statically, the only exception being some of 221 the relaxed vtables that could have local function pointers. These may be able 222 to be stack allocated. All vtables should be immutable and require no manual 223 cleanup. 227 The vtables for the two types might be handled slightly differently and then 228 there is also the hierarchy data for virtual casts. 229 230 The hierarchy data is simple conceptually. A single (exactly one copy) pointer 231 for each type can act as the identity for it. The value of the pointer is 232 its parent type, with the root pointer being NULL. Additional meta-data 233 can accompany the parent pointer, such as a string name or the vtable fields. 234 235 They types of each vtable can be constructed from the definitions of the 236 traits (or internal nodes). The stand alone/base vtable is the same for both 237 kinds of inheritance. It may be argumented differently however (include parent 238 /this pointer in hierachal inheritance). 239 240 Creation of the actual vtable is tricky. For classical single implementation 241 semantics we would assemble the functions and create one vtable at compile 242 time. However, not only does this not give CFA-like behaviour, it is 243 impossible generally because types can satify assertions in different ways at 244 different times and stop satifying them. A special set of harder rules could 245 be used, instead we have decided to try creating multiple vtables for each 246 type. The different vtables will all implement the same type but not always 247 in the same way. 248 249 Storage has some issues from creation. If the contents of every vtable could 250 be determained at compile time they could all be created and stored 251 statically. However since thunks can be constructed on the stack and become 252 the best match, that isn't always possible. Those will have to be stored in 253 dynamic memory. Which means that all vtables must be stored dynamically or 254 there must be a way to determain which ones to free when the trait object is 255 destroyed. 224 256 225 257 Access has two main options: -
doc/theses/aaron_moss_PhD/phd/.gitignore
r461eed2 r34a6b2e 1 1 templates/ 2 code/a.out 2 3 thesis.pdf 3 4 thesis.aux -
doc/theses/aaron_moss_PhD/phd/Makefile
r461eed2 r34a6b2e 17 17 introduction \ 18 18 background \ 19 generic-types \ 19 20 type-environment \ 20 21 resolution-heuristics \ -
doc/theses/aaron_moss_PhD/phd/cfa-macros.tex
r461eed2 r34a6b2e 43 43 tabsize=5, % N space tabbing 44 44 xleftmargin=\parindentlnth, % indent code to paragraph indentation 45 %mathescape=true, % LaTeX math escape in CFA code $...$46 45 escapechar=\$, % LaTeX escape in CFA code 47 46 keepspaces=true, % -
doc/theses/aaron_moss_PhD/phd/frontpgs.tex
r461eed2 r34a6b2e 160 160 % L I S T O F F I G U R E S 161 161 % ----------------------------- 162 %\addcontentsline{toc}{chapter}{List of Figures}163 %\listoffigures164 %\cleardoublepage165 %\phantomsection % allows hyperref to link to the correct page162 \addcontentsline{toc}{chapter}{List of Figures} 163 \listoffigures 164 \cleardoublepage 165 \phantomsection % allows hyperref to link to the correct page 166 166 167 167 % GLOSSARIES (Lists of definitions, abbreviations, symbols, etc. provided by the glossaries-extra package) -
doc/theses/aaron_moss_PhD/phd/generic-types.tex
r461eed2 r34a6b2e 2 2 \label{generic-chap} 3 3 4 Talk about generic types. Pull from Moss~\etal\cite{Moss18}. 4 A significant shortcoming in standard C is the lack of reusable type-safe abstractions for generic data structures and algorithms. 5 Broadly speaking, there are three approaches to implement abstract data structures in C. 6 One approach is to write bespoke data structures for each context in which they are needed. 7 While this approach is flexible and supports integration with the C type checker and tooling, it is also tedious and error prone, especially for more complex data structures. 8 A second approach is to use !void*!-based polymorphism, \eg{} the C standard library functions !bsearch! and !qsort!, which allow for the reuse of common functionality. 9 However, basing all polymorphism on !void*! eliminates the type checker's ability to ensure that argument types are properly matched, often requiring a number of extra function parameters, pointer indirection, and dynamic allocation that is otherwise not needed. 10 A third approach to generic code is to use preprocessor macros, which does allow the generated code to be both generic and type checked, but errors in such code may be difficult to locate and debug. 11 Furthermore, writing and using preprocessor macros is unnatural and inflexible. 12 Figure~\ref{bespoke-generic-fig} demonstrates the bespoke approach for a simple linked list with !insert! and !head! operations, while Figure~\ref{void-generic-fig} and Figure~\ref{macro-generic-fig} show the same example using !void*!- and !#define!-based polymorphism, respectively. 13 14 \begin{figure} 15 \begin{cfa} 16 struct int_list { int value; struct int_list* next; }; 17 18 void int_list_insert( struct int_list** ls, int x ) { 19 struct int_list* node = malloc(sizeof(struct int_list)); 20 node->value = x; node->next = *ls; 21 *ls = node; 22 } 23 24 int int_list_head( const struct int_list* ls ) { return ls->value; } 25 26 $\C[\textwidth]{// all code must be duplicated for every generic instantiation}$ 27 28 struct string_list { const char* value; struct string_list* next; }; 29 30 void string_list_insert( struct string_list** ls, const char* x ) { 31 struct string_list* node = malloc(sizeof(struct string_list)); 32 node->value = x; node->next = *ls; 33 *ls = node; 34 } 35 36 const char* string_list_head( const struct string_list* ls ) 37 { return ls->value; } 38 39 $\C[\textwidth]{// use is efficient and idiomatic}$ 40 41 int main() { 42 struct int_list* il = NULL; 43 int_list_insert( &il, 42 ); 44 printf("%d\n", int_list_head(il)); 45 46 struct string_list* sl = NULL; 47 string_list_insert( &sl, "hello" ); 48 printf("%s\n", string_list_head(sl)); 49 } 50 \end{cfa} 51 52 \caption{Bespoke code for linked list implementation.} \label{bespoke-generic-fig} 53 \end{figure} 54 55 \begin{figure} 56 \begin{cfa} 57 // single code implementation 58 59 struct list { void* value; struct list* next; }; 60 61 $\C[\textwidth]{// internal memory management requires helper functions}$ 62 63 void list_insert( struct list** ls, void* x, void* (*copy)(void*) ) { 64 struct list* node = malloc(sizeof(struct list)); 65 node->value = copy(x); node->next = *ls; 66 *ls = node; 67 } 68 69 void* list_head( const struct list* ls ) { return ls->value; } 70 71 $\C[\textwidth]{// helpers duplicated per type}$ 72 73 void* int_copy(void* x) { 74 int* n = malloc(sizeof(int)); 75 *n = *(int*)x; 76 return n; 77 } 78 79 void* string_copy(void* x) { return strdup((const char*)x); } 80 81 int main() { 82 struct list* il = NULL; 83 int i = 42; 84 list_insert( &il, &i, int_copy ); 85 printf("%d\n", *(int*)list_head(il)); $\C[2in]{// unsafe type cast}$ 86 87 struct list* sl = NULL; 88 list_insert( &sl, "hello", string_copy ); 89 printf("%s\n", (char*)list_head(sl)); $\C[2in]{// unsafe type cast}$ 90 } 91 \end{cfa} 92 93 \caption{\lstinline{void*}-polymorphic code for linked list implementation.} \label{void-generic-fig} 94 \end{figure} 95 96 \begin{figure} 97 \begin{cfa} 98 $\C[\textwidth]{// code is nested in macros}$ 99 100 #define list(N) N ## _list 101 102 #define list_insert(N) N ## _list_insert 103 104 #define list_head(N) N ## _list_head 105 106 #define define_list(N, T) $\C[0.25in]{ \textbackslash }$ 107 struct list(N) { T value; struct list(N)* next; }; $\C[0.25in]{ \textbackslash }$ 108 $\C[0.25in]{ \textbackslash }$ 109 void list_insert(N)( struct list(N)** ls, T x ) { $\C[0.25in]{ \textbackslash }$ 110 struct list(N)* node = malloc(sizeof(struct list(N))); $\C[0.25in]{ \textbackslash }$ 111 node->value = x; node->next = *ls; $\C[0.25in]{ \textbackslash }$ 112 *ls = node; $\C[0.25in]{ \textbackslash }$ 113 } $\C[0.25in]{ \textbackslash }$ 114 $\C[0.25in]{ \textbackslash }$ 115 T list_head(N)( const struct list(N)* ls ) { return ls->value; } 116 117 define_list(int, int); $\C[3in]{// defines int\_list}$ 118 define_list(string, const char*); $\C[3in]{// defines string\_list}$ 119 120 $\C[\textwidth]{// use is efficient, but syntactically idiosyncratic}$ 121 122 int main() { 123 struct list(int)* il = NULL; $\C[3in]{// does not match compiler-visible name}$ 124 list_insert(int)( &il, 42 ); 125 printf("%d\n", list_head(int)(il)); 126 127 struct list(string)* sl = NULL; 128 list_insert(string)( &sl, "hello" ); 129 printf("%s\n", list_head(string)(sl)); 130 } 131 \end{cfa} 132 133 \caption{Macros for generic linked list implementation.} \label{macro-generic-fig} 134 \end{figure} 135 136 \CC{}, Java, and other languages use \emph{generic types} to produce type-safe abstract data types. 137 Design and implementation of generic types for \CFA{} is the first major contribution of this thesis, a summary of which is published in \cite{Moss18}. 138 \CFA{} generic types integrate efficiently and naturally with the existing polymorphic functions in \CFA{}, while retaining backward compatibility with C in layout and support for separate compilation. 139 A generic type can be declared in \CFA{} by placing a !forall! specifier on a !struct! or !union! declaration, and instantiated using a parenthesized list of types after the generic name. 140 An example comparable to the C polymorphism examples in Figures~\ref{bespoke-generic-fig}, \ref{void-generic-fig}, and \ref{macro-generic-fig} can be seen in Figure~\ref{cfa-generic-fig} \TODO{test this code}. 141 142 \begin{figure} 143 \begin{cfa} 144 forall(otype T) struct list { T value; list(T)* next; }; 145 146 $\C[\textwidth]{// single polymorphic implementation of each function}$ 147 $\C[\textwidth]{// overloading reduces need for namespace prefixes}$ 148 149 forall(otype T) void insert( list(T)** ls, T x ) { 150 list(T)* node = alloc(); $\C{// type-inferring alloc}$ 151 (*node){ x, *ls }; $\C{// concise constructor syntax}$ 152 *ls = node; 153 } 154 155 forall(otype T) T head( const list(T)* ls ) { return ls->value; } 156 157 $\C[\textwidth]{// use is clear and efficient}$ 158 159 int main() { 160 list(int)* il = 0; 161 insert( &il, 42 ); $\C{// inferred polymorphic T}$ 162 printf("%d\n", head(il)); 163 164 list(const char*)* sl = 0; 165 insert( &sl, "hello" ); 166 printf("%s\n", head(sl)); 167 } 168 \end{cfa} 169 170 \caption{\CFA{} generic linked list implementation.} \label{cfa-generic-fig} 171 \end{figure} 172 173 \section{Design} 174 175 Though a number of languages have some implementation of generic types, backward compatibility with both C and existing \CFA{} polymorphism presented some unique design constraints for this project. 176 The guiding principle was to maintain an unsurprising language model for C programmers without compromising runtime efficiency. 177 A key insight for this design was that C already possesses a handful of built-in generic types (\emph{compound types} in the language of the standard\cit{}), notably pointer (!T*!) and array (!T[]!), and that user-definable generics should act similarly. 178 179 \subsection{Related Work} 180 181 One approach to the design of generic types is that taken by \CC{} templates\cit{}. 182 The template approach is closely related to the macro-expansion approach to C polymorphism demonstrated in Figure~\ref{macro-generic-fig}, but where the macro-expansion syntax has been given first-class language support. 183 Template expansion has the benefit of generating code with near-optimal runtime efficiency, as distinct optimizations can be applied for each instantiation of the template. 184 On the other hand, template expansion can also lead to significant code bloat, exponential in the worst case\cit{}, and the costs of increased instruction cache pressure at runtime and wasted developer time when compiling cannot be discounted. 185 The most significant restriction of the \CC{} template model is that it breaks separate compilation and C's translation-unit-based encapsulation mechanisms. 186 Because a \CC{} template is not actually code, but rather a sort of ``recipe'' to generate code, template code must be visible at its call site to be used. 187 C code, by contrast, only needs a !struct! or function declaration to call that function or use (by-pointer) values of that type, a desirable property to maintain for \CFA{}. 188 189 Java\cit{} has another prominent implementation for generic types, based on a significantly different approach than \CC{}. 190 The Java approach has much more in common with the !void*!-polymorphism shown in Figure~\ref{void-generic-fig}; since in Java nearly all data is stored by reference, the Java approach to polymorphic data is to store pointers to arbitrary data and insert type-checked implicit casts at compile-time. 191 This process of \emph{type erasure} has the benefit of allowing a single instantiation of polymorphic code, but relies heavily on Java's object model and garbage collector. 192 To use this model, a more C-like language such as \CFA{} would be required to dynamically allocate internal storage for variables, track their lifetime, and properly clean them up afterward. 193 194 \TODO{Talk about Go, maybe Rust, Swift, etc. as well; specifically mention ``fat pointer'' polymorphism} 195 196 \TODO{Talk about Cyclone as well, and why my generics are more powerful} 197 198 \subsection{\CFA{} Generics} 199 200 The generic types design in \CFA{} draws inspiration from both \CC{} and Java generics, capturing the better aspects of each. 201 Like \CC{} template types, generic !struct!s and !union!s in \CFA{} have macro-expanded storage layouts, but, like Java generics, \CFA{} generic types can be used with separately-compiled polymorphic functions without requiring either the type or function definition to be visible to the other. 202 The fact that the storage layout of any instantiation of a \CFA{} generic type is identical to that of the monomorphic type produced by simple macro replacement of the generic type parameters is important to provide consistent and predictable runtime performance, and to not impose any undue abstraction penalty on generic code. 203 As an example, consider the following generic type and function \TODO{test this}: 204 205 \begin{cfa} 206 forall( otype R, otype S ) struct pair { R first; S second; }; 207 208 pair(const char*, int) with_len( const char* s ) { 209 return (pair(const char*), int){ s, strlen(s) }; 210 } 211 \end{cfa} 212 213 In this example, !with_len! is defined at the same scope as !pair!, but it could be called from any context that can see the definition of !pair! and a declaration of !with_len!. 214 If its return type was !pair(const char*, int)*!, callers of !with_len! would only need the declaration !forall(otype R, otype S) struct pair;! to call it, in accordance with the usual C rules for opaque types. 215 216 !with_len! is itself a monomorphic function, returning a type that is structurally identical to !struct { const char* first; int second; }!, and as such could be called from C given an appropriate redeclaration and demangling flags. 217 However, the definition of !with_len! depends on a polymorphic function call to the !pair! constructor, which only needs to be written once (in this case, implicitly by the compiler according to the usual \CFA{} constructor generation\cite{Moss18}) and can be re-used for a wide variety of !pair! instantiations. 218 Since the parameters to this polymorphic constructor call are all statically known, compiler inlining can eliminate any runtime overhead of this polymorphic call. 219 220 \CFA{} deliberately does not support \CC{}-style partial specializations of generic types. 221 A particularly infamous example in the \CC{} standard library is !vector<bool>!, which is represented as a bitstring rather than the array representation of the other !vector! instantiations. 222 Complications from this inconsistency (chiefly the fact that a single bit is not addressable, unlike an array element) make the \CC{} !vector! unpleasant to use in generic contexts due to the break in its public interface. 223 Rather than attempting to plug leaks in the template specialization abstraction with a detailed method interface, \CFA{} takes the more principled position that two types with an unrelated data layout are in fact unrelated types, and should be handled with different code. 224 Of course, to the degree that distinct types are similar enough to share an interface, the \CFA{} !trait! system allows one to be defined, and objects of types implementing that !trait! to be operated on in the same polymorphic functions. 225 226 Since \CFA{} polymorphic functions can operate over polymorphic generic types, functions over such types can be partially or completely specialized using the usual overload selection rules. 227 As an example, the !with_len! function above could be an optimization of the following more general function: 228 229 \begin{cfa} 230 forall(otype T, otype I | { I len(T); }) 231 pair(T, I) with_len( T s ) { 232 return (pair(T,I)){ s, len(s) }; 233 } 234 \end{cfa} 235 236 \section{Implementation} 237 238 % forall constraints on struct/union constrain default constructor (TODO check with Rob) 5 239 6 240 % TODO discuss layout function algorithm, application to separate compilation 7 % TODO put a static const field in for _n_fields for each generic, describe utility for separate compilation 8 9 % TODO mention impetus for zero_t design10 11 % TODO mention use in tuple-type implementation 241 % TODO put a static const field in for _n_fields for each generic, describe utility for separate compilation (actually no, you need to be able to see the type for it to be sized) 242 243 % mention that tuples are implemented on top of a per-arity generic type 244 245 \section{Performance Experiments} 12 246 13 247 % TODO pull benchmarks from Moss et al. 248 249 \section{Future Work} 250 251 % mention future work adding non-type generic parameters, like ints 252 253 % taking advantage of generic layout functions to provide field assertions in forall qualifiers 254 255 % mention packed generic layouts (significantly more complex layout function, but possible)
Note: See TracChangeset
for help on using the changeset viewer.