source: doc/proposals/vtable.md @ 0727d97

ADTarm-ehast-experimentalcleanup-dtorsenumforall-pointer-decayjacob/cs343-translationjenkins-sandboxnew-astnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since 0727d97 was 18d4dbd, checked in by Andrew Beach <ajbeach@…>, 6 years ago

First round of updates to vtable.md from last review. Just hitting some of the major problem areas.

  • Property mode set to 100644
File size: 20.8 KB
Line 
1Proposal For Use of Virtual Tables
2==================================
3
4This is an adaptation of the earlier virtual proposal, updating it with new
5ideas, re-framing it and laying out more design decisions. It should
6eventually replace the earlier proposal, but not all features and syntax have
7been converted to the new design.
8
9The basic concept of a virtual table (vtable) is the same here as in most
10other languages that use them. They will mostly contain function pointers
11although they should be able to store anything that goes into a trait.
12
13I also include notes on a sample implementation, which primarily exists to show
14there is a reasonable implementation. The code samples for that are in a slight
15pseudo-code to help avoid name mangling and keeps some CFA features while they
16would actually be written in C.
17
18Trait Instances
19---------------
20
21Currently traits are completely abstract. Data types might implement a trait
22but traits are not themselves data types. Which is to say you cannot have an
23instance of a trait. This proposal will change that and allow instances of
24traits to be created from instances of data types that implement the trait.
25
26For example:
27
28    trait combiner(otype T) {
29        void combine(T&, int);
30    };
31
32    struct summation {
33        int sum;
34    };
35
36    void ?{}( struct summation & this ) {
37        this.sum = 0;
38    }
39
40    void combine( struct summation & this, int num ) {
41        this.sum = this.sum + num;
42    }
43
44    trait combiner obj = struct summation{};
45    combine(obj, 5);
46
47As with `struct` (and `union` and `enum`), `trait` might be optional when
48using the trait as a type name. A trait may be used in assertion list as
49before.
50
51For traits to be used this way they should meet two requirements. First they
52should only have a single polymorphic type and each assertion should use that
53type once as a parameter. Extensions may later loosen these requirements.
54
55Also note this applies to the final expanded list of assertions. Consider:
56
57    trait foo(otype T, otype U) {
58        ... functions that use T once ...
59    }
60
61    trait bar(otype S | foo(S, char)) {
62        ... functions that use S once ...
63    }
64
65In this example `bar` may be used as a type but `foo` may not.
66
67When a trait is used as a type it creates a generic object which combines
68the base structure (an instance of `summation` in this case) and the vtable,
69which is currently created and provided by a hidden mechanism.
70
71The generic object type for each trait also implements that trait. This is
72actually the only means by which it can be used. The type of these functions
73look something like this:
74
75    void combine(trait combiner & this, int num);
76
77The main use case for trait objects is that they can be stored. They can be
78passed into functions, but using the trait directly is preferred in this case.
79
80    trait drawable(otype T) {
81        void draw(Surface & to, T & draw);
82        Rect(int) drawArea(T & draw);
83    };
84
85    struct UpdatingSurface {
86        Surface * surface;
87        vector(trait drawable) drawables;
88    };
89
90    void updateSurface(UpdatingSurface & us) {
91        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
92            draw(us.surface, us.drawables[i]);
93        }
94    }
95
96The trait types can also be used in the types of assertions on traits as well.
97In this usage they passed as the underlying object and vtable pair as they
98are stored. The trait types can also be used in that trait's definition, which
99means you can pass two instances of a trait to a single function. However the
100look-up of the one that is not used to look up any functions, until another
101function that uses that object in the generic/look-up location is called.
102
103    trait example(otype T) {
104        bool test(T & this, trait example & that);
105    }
106
107### Explanation Of Restrictions
108
109The two restrictions on traits that can be used as trait objects are:
110
1111.  Only one generic parameter may be defined in the trait's header.
1122.  Each function assertion must have one parameter with the type of the
113    generic parameter. They may or may not return a value of that type.
114
115Elsewhere in this proposal I suggest ways to broaden these requirements.
116A simple example would be if a trait meets requirement 1 but not 2, then
117the assertions that do not satisfy the exactly one parameter requirement can
118be ignored.
119
120However I would like to talk about why these two rules are in place in the
121first place and the problems that any exceptions to these rules must avoid.
122
123The problems appear when the dispatcher function which operates on the
124generic object.
125
126    trait combiner(otype T, otype U) {
127        void combine(T&, U);
128    }
129
130This one is so strange I don't have proper syntax for it but let us say that
131the concrete dispatcher would be typed as
132`void combine(combiner(T) &, combiner(U));`. Does the function that combine
133the two underlying types exist to dispatch too?
134
135Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then
136they could not be. It would have to enforce that all pairs of any types
137that are wrapped in this way. Which would pretty much destroy any chance of
138separate compilation.
139
140Even then it would be more expensive as the wrappers would have to carry ids
141that you use to look up on an <number of types>+1 dimensional table.
142
143The second restriction has a similar issue but makes a bit more sense to
144write out.
145
146    trait Series(otype T) {
147        ... size, iterators, getters ...
148        T join(T const &, T const &);
149    }
150
151With the dispatcher typed as:
152
153    Series join(Series const &, Series const &);
154
155Because these instances are generic and hide the underlying implementation we
156do not know what that implementation is. Unfortunately this also means the
157implementation for the two parameters might not be the same. Once we have
158two different types involved this devolves into the first case.
159
160We could check at run-time that the have the same underlying type, but this
161would likely time and space overhead and there is no clear recovery path.
162
163#### Sample Implementation
164A simple way to implement trait objects is by a pair of pointers. One to the
165underlying object and one to the vtable.
166
167    struct vtable_drawable {
168        void (*draw)(Surface &, void *);
169        Rect(int) (*drawArea)(void *);
170    };
171
172    struct drawable {
173        void * object;
174        vtable_drawable * vtable;
175    };
176
177The functions that run on the trait object would generally be generated using
178the following pattern:
179
180    void draw(Surface & surface, drawable & traitObj) {
181        return traitObj.vtable->draw(surface, traitObj.object);
182    }
183
184There may have to be special cases for things like copy construction, that
185might require a more significant wrapper. On the other hand moving could be
186implemented by moving the pointers without any need to refer to the base
187object.
188
189### Extension: Multiple Trait Parameters
190The base proposal in effect creates another use for the trait syntax that is
191related to the ones currently in the language but is also separate from them.
192The current uses generic functions and generic types, this new use could be
193described as generic objects.
194
195A generic object is of a concrete type and has concrete functions that work on
196it. It is generic in that it is a wrapper for an unknown type. Traits serve
197a similar role here as in generic functions as they limit what the function
198can be generic over.
199
200This combines the use allowing to have a generic type that is a generic
201object. All but one of the trait's parameters is given a concrete type,
202conceptually currying the trait to create a trait with on generic parameter
203that fits the original restrictions. The resulting concrete generic object
204type is different with each set of provided parameters and their values.
205
206Then it just becomes a question of where this is done. Again both examples use
207a basic syntax to show the idea.
208
209    trait iterator(virtual otype T, otype Item) {
210        bool has_next(T const &);
211        Item get_next(T const *);
212    }
213
214    iterator(int) int_it = begin(container_of_ints);
215
216The first option is to do it at the definition of the trait. One parameter
217is selected (here with the `virtual` keyword, but other rules like "the first"
218could also be used) and when an instance of the trait is created all the
219other parameters must be provided.
220
221    trait iterator(otype T, otype Item) {
222        bool has_next(T const &);
223        Item get_next(T const *);
224    }
225
226    iterator(virtual, int) int_it = begin(container_of_ints);
227
228The second option is to skip a parameter as part of the type instance
229definition. One parameter is explicitly skipped (again with the `virtual`
230keyword) and the others have concrete types. The skipped one is the one we
231are generic on.
232
233Incidentally in both examples `container_of_ints` may itself be a generic
234object and `begin` returns a generic iterator with unknown implementation.
235
236These options are not exclusive. Defining a default on the trait allows for
237an object to be created as in the first example. However, whether the
238default is provided or not, the second syntax can be used to pick a
239parameter on instantiation.
240
241Hierarchy
242---------
243
244We would also like to implement hierarchical relations between types.
245
246    AstNode
247    |-ParseNode
248    | |-Declaration
249    | | |-DeclarationWithType
250    | | |-StructureDeclaration
251    | |-Statement
252    | | |-CompoundStatement
253    | |-Expression
254    |-Type
255
256Virtual tables by themselves are not quite enough to implement this system.
257A vtable is just a list of functions and there is no way to check at run-time
258what these functions, we carry that knowledge with the table.
259
260This proposal adds type ids to check for position in the hierarchy and an
261explicate syntax for establishing a hierarchical relation between traits and
262their implementing types. The ids should uniquely identify each type and
263allow retrieval of the type's parent if one exists. By recursion this allows
264the ancestor relation between any two hierarchical types can be checked.
265
266The hierarchy is created with traits as the internal nodes and structures
267as the leaf nodes. The structures may be used normally and the traits can
268be used to create generic objects as in the first section (the same
269restrictions apply). However these type objects store their type id which can
270be recovered to figure out which type they are or at least check to see if
271they fall into a given sub-tree at run-time.
272
273Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
274just an example. But when used it give the name of the parent type or if
275empty it shows that this type is the root of its hierarchy.
276(Also I'm not sure where I got these casing rules.)
277
278    trait ast_node(otype T) virtual() {
279        void print(T & this, ostream & out);
280        void visit(T & this, Visitor & visitor);
281        CodeLocation const & get_code_location(T & this);
282    }
283
284    trait expression_node(otype T) virtual(ast_node) {
285        Type eval_type(T const & this);
286    }
287
288    struct operator_expression virtual(expression_node) {
289        enum operator_kind kind;
290        trait expression_node rands[2];
291    }
292
293    trait statement_node(otype T) virtual(ast_node) {
294        vector(Label) & get_labels(T & this);
295    }
296
297    struct goto_statement virtual(statement_node) {
298        vector(Label) labels;
299        Label target;
300    }
301
302    trait declaration_node(otype T) virtual(ast_node) {
303        string name_of(T const & this);
304        Type type_of(T const & this);
305    }
306
307    struct using_declaration virtual(declaration_node) {
308        string new_type;
309        Type old_type;
310    }
311
312    struct variable_declaration virtual(declaration_node) {
313        string name;
314        Type type;
315    }
316
317### Extension: Structural Inheritance
318An extension would be allow structures to be used as internal nodes on the
319inheritance tree. Its child types would have to implement the same fields.
320
321The weaker restriction would be to convert the fields into field assertions
322(Not implemented yet: `U T.x` means there is a field of type you on the type
323T. Offset unknown and passed in/stored with function pointers.)
324A concrete child would have to declare the same set of fields with the same
325types. This is of a more functional style.
326
327The stronger restriction is that the fields of the parent are a prefix of the
328child's fields. Possibly automatically inserted. This the imperative view and
329may also have less overhead.
330
331### Extension: Unions and Enumerations
332Currently there is no reason unions and enumerations, in the cases they
333do implement the trait, could not be in the hierarchy as leaf nodes.
334
335It does not work with structural induction, but that could just be a compile
336time check that all ancestors are traits or do not add field assertions.
337
338#### Sample Implementation
339The type id may be as little as:
340
341    struct typeid {
342        struct typeid const * const parent;
343    };
344
345Some linker magic would have to be used to ensure exactly one copy of each
346structure for each type exists in memory. There seem to be special once
347sections that support this and it should be easier than generating unique
348ids across compilation units.
349
350The structure could be extended to contain any additional type information.
351
352There are two general designs for vtables with type ids. The first is to put
353the type id at the top of the vtable, this is the most compact and efficient
354solution but only works if we have exactly 1 vtable for each type. The second
355is to put a pointer to the type id in each vtable. This has more overhead but
356allows multiple vtables.
357
358    struct <trait>_vtable {
359        struct typeid const id;
360
361        // Trait dependent list of vtable members.
362    };
363
364    struct <trait>_vtable {
365        struct typeid const * const id;
366
367        // Trait dependent list of vtable members.
368    };
369
370### Virtual Casts
371The generic objects may be cast up and down the hierarchy.
372
373Casting to an ancestor type always succeeds. From one generic type to another
374is just a reinterpretation and could be implicate. Wrapping and unwrapping
375a concrete type will probably use the same syntax as in the first section.
376
377Casting from an ancestor to a descendent requires a check. The underlying
378type may or may not belong to the sub-tree headed by that descendent. For this
379we introduce a new cast operator, which returns the pointer unchanged if the
380check succeeds and null otherwise.
381
382    trait SubType * new_value = (virtual trait SubType *)super_type;
383
384For the following example I am using the as of yet finished exception system.
385
386    trait exception(otype T) virtual() {
387        char const * what(T & this);
388    }
389
390    trait io_error(otype T) virtual(exception) {
391        FILE * which_file(T & this);
392    }
393
394    struct eof_error(otype T) virtual(io_error) {
395        FILE * file;
396    }
397
398    char const * what(eof_error &) {
399        return "Tried to read from an empty file.";
400    }
401
402    FILE * which_file(eof_error & this) {
403        return eof_error.file;
404    }
405
406    bool handleIoError(exception * exc) {
407        io_error * error = (virtual io_error *)exc;
408        if (NULL == error) {
409            return false;
410        }
411        ...
412        return true;
413    }
414
415### Extension: Implicate Virtual Cast Target
416This is a small extension, even in the example above `io_error *` is repeated
417in the cast and the variable being assigned to. Using return type inference
418would allow the second type to be skipped in cases it is clear what type is
419being checked against.
420
421The line then becomes:
422
423    io_error * error = (virtual)exc;
424
425### Extension: Inline vtables
426Since the structures here are usually made to be turned into trait objects
427it might be worth it to have fields in them to store the virtual table
428pointer. This would have to be declared on the trait as an assertion (example:
429`vtable;` or `T.vtable;`), but if it is the trait object could be a single
430pointer.
431
432There are also three options for where the pointer to the vtable. It could be
433anywhere, a fixed location for each trait or always at the front. For the per-
434trait solution an extension to specify what it is (example `vtable[0];`) which
435could also be used to combine it with others. So these options can be combined
436to allow access to all three options.
437
438### Virtual Tables as Types
439Here we consider encoding plus the implementation of functions on it to be a
440type. Which is to say in the type hierarchy structures aren't concrete types
441anymore, instead they are parent types to vtables, which combine the encoding
442and implementation.
443
444Resolution Scope
445----------------
446
447What is the scope of a resolution? When are the functions in a vtable decided
448and how broadly is this applied?
449
450### Type Level:
451Each structure has a single resolution for all of the functions in the
452virtual trait. This is how many languages that implement this or similar
453features do it.
454
455The main thing CFA would need to do it this way is some single point where
456the type declaration, including the functions that satisfy the trait, are
457all defined. Currently there are many points where this can happen, not all
458of them have the same definitions and no way to select one over the other.
459
460Some syntax would have to be added to specify the resolution point. To ensure
461a single instance there may have to be two variants, one forward declaration
462and one to create the instance. With some compiler magic the forward
463declaration maybe enough.
464
465    extern trait combiner(struct summation) vtable;
466    trait combiner(struct summation) vtable;
467
468Or (with the same variants):
469
470    vtable combiner(struct summation);
471
472The extern variant promises that the vtable will exist while the normal one
473is where the resolution actually happens.
474
475### Explicit Resolution Points:
476Slightly looser than the above, there are explicit points where the vtables
477are resolved, but there is no limit on the number of resolution points that
478might be provided. Each time a object is bound to a trait, one of the
479resolutions is selected. This might be the most flexible option.
480
481An syntax would have to be provided as above. There may also be the option
482to name resolution points so that you can choose between them. This also
483could come with the ability to forward declare them.
484
485Especially if they are not named, these resolution points should be able to
486appear in functions, where the scoping rules can be used to select one.
487However this also means that stack-allocated functions can end up in the
488vtable.
489
490    extern trait combiner(struct summation) vtable sum;
491    trait combiner(struct summation) vtable sum;
492
493    extern trait combiner(struct summation) vtable sum default;
494    trait combiner(struct summation) vtable sum default;
495
496The extern difference is the same before. The name (sum in the samples) is
497used at the binding site to say which one is picked. The default keyword can
498be used in only some of the declarations.
499
500    trait combiner fee = (summation_instance, sum);
501    trait combiner foe = summation_instance;
502
503(I am not really happy about this syntax, but it kind of works.)
504The object being bound is required. The name of the vtable is optional if
505there is exactly one vtable name marked with default.
506
507These could also be placed inside functions. In which case both the name and
508the default keyword might be optional. If the name is omitted in an assignment
509the closest vtable is chosen (returning to the global default rule if no
510appropriate local vtable is in scope).
511
512### Site Based Resolution:
513Every place in code where the binding of a vtable to an object occurs has
514its own resolution. Syntax-wise this is the simplest as it should be able
515to use just the existing declarations and the conversion to trait object.
516It also is very close to the current polymorphic resolution rules.
517
518This works as the explicit resolution points except the resolution points
519are implicit and their would be no selection of which resolution to use. The
520closest (current) resolution is always selected.
521
522This could easily lead to an explosion of vtables as it has the most fine
523grained resolution the number of bindings in a single scope (that produces
524the same binding) could be quite high. Merging identical vtables might help
525reduce that.
526
527Vtable Lifetime Issues
528----------------------
529
530Vtables interact badly with the thunk issue. Conceptually vtables are static
531like type/function data they carry, as those decisions are made by the
532resolver at compile time.
533
534Stack allocated functions interact badly with this because they are not
535static. There are several ways to try to resolve this, however without a
536general solution most can only buy time.
537
538Filling in some fields of a static vtable could cause issues on a recursive
539call. And then we are still limited by the lifetime of the stack functions, as
540the vtable with stale pointers is still a problem.
541
542Dynamically allocated vtables introduces memory management overhead and
543requires some way to differentiate between dynamic and statically allocated
544tables. The stale function pointer problem continues unless those becomes
545dynamically allocated as well which gives us the same costs again.
546
547Stack allocating the vtable seems like the best issue. The vtable's lifetime
548is now the limiting factor but it should be effectively the same as the
549shortest lifetime of a function assigned to it. However this still limits the
550lifetime "implicitly" and returns to the original problem with thunks.
Note: See TracBrowser for help on using the repository browser.