source: doc/proposals/vtable.md @ 223a633

ADTarm-ehast-experimentalenumforall-pointer-decayjacob/cs343-translationnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since 223a633 was fbfd97bd, checked in by Andrew Beach <ajbeach@…>, 4 years ago

Addition to the vtable proposal. We can recover the underlying type without a hierarchy.

  • Property mode set to 100644
File size: 27.3 KB
Line 
1Proposal For Use of Virtual Tables
2==================================
3
4The basic concept of a virtual table (vtable) is the same here as in most
5other languages that use them. They will mostly contain function pointers
6although they should be able to store anything that goes into a trait.
7
8I also include notes on a sample implementation, which primarily exists to show
9there is a reasonable implementation. The code samples for that are in a slight
10pseudo-code to help avoid name mangling and keeps some CFA features while they
11would actually be written in C.
12
13Trait Instances
14---------------
15
16Currently traits are completely abstract. Data types might implement a trait
17but traits are not themselves data types. Which is to say you cannot have an
18instance of a trait. This proposal will change that and allow instances of
19traits to be created from instances of data types that implement the trait.
20
21For example:
22
23    trait combiner(otype T) {
24        void combine(T&, int);
25    };
26
27    struct summation {
28        int sum;
29    };
30
31    void ?{}( struct summation & this ) {
32        this.sum = 0;
33    }
34
35    void combine( struct summation & this, int num ) {
36        this.sum = this.sum + num;
37    }
38
39    trait combiner obj = struct summation{};
40    combine(obj, 5);
41
42As with `struct` (and `union` and `enum`), `trait` might be optional when
43using the trait as a type name. A trait may be used in assertion list as
44before.
45
46For traits to be used this way they should meet two requirements. First they
47should only have a single polymorphic type and each assertion should use that
48type once as a parameter. Extensions may later loosen these requirements.
49
50Also note this applies to the final expanded list of assertions. Consider:
51
52    trait foo(otype T, otype U) {
53        ... functions that use T once ...
54    }
55
56    trait bar(otype S | foo(S, char)) {
57        ... functions that use S once ...
58    }
59
60In this example `bar` may be used as a type but `foo` may not.
61
62When a trait is used as a type it creates a generic object which combines
63the base structure (an instance of `summation` in this case) and the vtable,
64which is currently created and provided by a hidden mechanism.
65
66The generic object type for each trait also implements that trait. This is
67actually the only means by which it can be used. The type of these functions
68look something like this:
69
70    void combine(trait combiner & this, int num);
71
72The main use case for trait objects is that they can be stored. They can be
73passed into functions, but using the trait directly is preferred in this case.
74
75    trait drawable(otype T) {
76        void draw(Surface & to, T & draw);
77        Rect(int) drawArea(T & draw);
78    };
79
80    struct UpdatingSurface {
81        Surface * surface;
82        vector(trait drawable) drawables;
83    };
84
85    void updateSurface(UpdatingSurface & us) {
86        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
87            draw(us.surface, us.drawables[i]);
88        }
89    }
90
91With a more complete widget trait you could, for example, construct a UI tool
92kit that can declare containers that hold widgets without knowing about the
93widget types. Making it reasonable to extend the tool kit.
94
95The trait types can also be used in the types of assertions on traits as well.
96In this usage they passed as the underlying object and vtable pair as they
97are stored. The trait types can also be used in that trait's definition, which
98means you can pass two instances of a trait to a single function. However the
99look-up of the one that is not used to look up any functions, until another
100function that uses that object in the generic/look-up location is called.
101
102    trait example(otype T) {
103        bool test(T & this, trait example & that);
104    }
105
106### Explanation Of Restrictions
107
108The two restrictions on traits that can be used as trait objects are:
109
1101.  Only one generic parameter may be defined in the trait's header.
1112.  Each function assertion must have one parameter with the type of the
112    generic parameter. They may or may not return a value of that type.
113
114Elsewhere in this proposal I suggest ways to broaden these requirements.
115A simple example would be if a trait meets requirement 1 but not 2, then
116the assertions that do not satisfy the exactly one parameter requirement can
117be ignored.
118
119However I would like to talk about why these two rules are in place in the
120first place and the problems that any exceptions to these rules must avoid.
121
122The problems appear when the dispatcher function which operates on the
123generic object.
124
125    trait combiner(otype T, otype U) {
126        void combine(T&, U);
127    }
128
129This one is so strange I don't have proper syntax for it but let us say that
130the concrete dispatcher would be typed as
131`void combine(combiner(T) &, combiner(U));`. Does the function that combine
132the two underlying types exist to dispatch too?
133
134Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then
135they could not be. It would have to enforce that all pairs of any types
136that are wrapped in this way. Which would pretty much destroy any chance of
137separate compilation.
138
139Even then it would be more expensive as the wrappers would have to carry ids
140that you use to look up on an <number of types>+1 dimensional table.
141
142The second restriction has a similar issue but makes a bit more sense to
143write out.
144
145    trait Series(otype T) {
146        ... size, iterators, getters ...
147        T join(T const &, T const &);
148    }
149
150With the dispatcher typed as:
151
152    Series join(Series const &, Series const &);
153
154Because these instances are generic and hide the underlying implementation we
155do not know what that implementation is. Unfortunately this also means the
156implementation for the two parameters might not be the same. Once we have
157two different types involved this devolves into the first case.
158
159We could check at run-time that the have the same underlying type, but this
160would likely time and space overhead and there is no clear recovery path.
161
162#### Sample Implementation
163A simple way to implement trait objects is by a pair of pointers. One to the
164underlying object and one to the vtable.
165
166    struct vtable_drawable {
167        void (*draw)(Surface &, void *);
168        Rect(int) (*drawArea)(void *);
169    };
170
171    struct drawable {
172        void * object;
173        vtable_drawable * vtable;
174    };
175
176The functions that run on the trait object would generally be generated using
177the following pattern:
178
179    void draw(Surface & surface, drawable & traitObj) {
180        return traitObj.vtable->draw(surface, traitObj.object);
181    }
182
183There may have to be special cases for things like copy construction, that
184might require a more significant wrapper. On the other hand moving could be
185implemented by moving the pointers without any need to refer to the base
186object.
187
188### Extension: Multiple Trait Parameters
189The base proposal in effect creates another use for the trait syntax that is
190related to the ones currently in the language but is also separate from them.
191The current uses generic functions and generic types, this new use could be
192described as generic objects.
193
194A generic object is of a concrete type and has concrete functions that work on
195it. It is generic in that it is a wrapper for an unknown type. Traits serve
196a similar role here as in generic functions as they limit what the function
197can be generic over.
198
199This combines the use allowing to have a generic type that is a generic
200object. All but one of the trait's parameters is given a concrete type,
201conceptually currying the trait to create a trait with on generic parameter
202that fits the original restrictions. The resulting concrete generic object
203type is different with each set of provided parameters and their values.
204
205Then it just becomes a question of where this is done. Again both examples use
206a basic syntax to show the idea.
207
208    trait iterator(virtual otype T, otype Item) {
209        bool has_next(T const &);
210        Item get_next(T const *);
211    }
212
213    iterator(int) int_it = begin(container_of_ints);
214
215The first option is to do it at the definition of the trait. One parameter
216is selected (here with the `virtual` keyword, but other rules like "the first"
217could also be used) and when an instance of the trait is created all the
218other parameters must be provided.
219
220    trait iterator(otype T, otype Item) {
221        bool has_next(T const &);
222        Item get_next(T &);
223    }
224
225    iterator(virtual, int) int_it = begin(container_of_ints);
226
227The second option is to skip a parameter as part of the type instance
228definition. One parameter is explicitly skipped (again with the `virtual`
229keyword) and the others have concrete types. The skipped one is the one we
230are generic on.
231
232Incidentally in both examples `container_of_ints` may itself be a generic
233object and `begin` returns a generic iterator with unknown implementation.
234
235These options are not exclusive. Defining a default on the trait allows for
236an object to be created as in the first example. However, whether the
237default is provided or not, the second syntax can be used to pick a
238parameter on instantiation.
239
240### Extension: Object Access
241This requires that the resolution scope (see below) is at the type level or
242has explicate points with names. These are the tables and table names used
243here.
244
245The system already knows where to find the virtual table and the object. If
246the tables have particular identities, or on the user side names, then it is
247meaningful to check if a binding virtual table is the same* as another. The
248main use of this is virtual table declarations also give the type they bind
249and if a binding table matches a known table then the underlyind object in the
250trait object must be of that type.
251
252* By identity, by value would work and in some senses be more flexiable. But
253  it would be slower and refering to further away functions would be harder.
254
255This gives one of the main new features of the hierarchical use of virtual
256tables (see below); the ability to recover the underlying object. Or a pointer
257of the approprate type it which both reflects the implementation and gives a
258convenent way to encode the boolean/conditional aspect of the operation which
259is that a different virtual table might be in use.
260
261There are two general ways to reperent this; a cast or a field access. The
262cast is traditional and would definitely fit if a single pointer repersents
263a trait object with the virtual table as part of the object. However for a
264double pointer field access might be more approprate. By this system though
265it is not the type that is used as the identifier but the virtual table. If
266there is one table per type than it becomes equivilant again. Otherwise the
267table has to be used as the identifier and the type is just a result of that
268which seems important for syntax.
269
270Hierarchy
271---------
272
273We would also like to implement hierarchical relations between types.
274
275    ast_node
276    |-expression_node
277    | |-operator_expression
278    |
279    |-statement_node
280    | |-goto_statement
281    |
282    |-declaration_node
283      |-using_declaration
284      |-variable_declaration
285
286Virtual tables by themselves are not quite enough to implement this system.
287A vtable is just a list of functions and there is no way to check at run-time
288what these functions, we carry that knowledge with the table.
289
290This proposal adds type ids to check for position in the hierarchy and an
291explicate syntax for establishing a hierarchical relation between traits and
292their implementing types. The ids should uniquely identify each type and
293allow retrieval of the type's parent if one exists. By recursion this allows
294the ancestor relation between any two hierarchical types can be checked.
295
296The hierarchy is created with traits as the internal nodes and structures
297as the leaf nodes. The structures may be used normally and the traits can
298be used to create generic objects as in the first section (the same
299restrictions apply). However these type objects store their type id which can
300be recovered to figure out which type they are or at least check to see if
301they fall into a given sub-tree at run-time.
302
303Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
304just an example. But when used it give the name of the parent type or if
305empty it shows that this type is the root of its hierarchy.
306(Also I'm not sure where I got these casing rules.)
307
308    trait ast_node(otype T) virtual() {
309        void print(T & this, ostream & out);
310        void visit(T & this, Visitor & visitor);
311        CodeLocation const & get_code_location(T & this);
312    }
313
314    trait expression_node(otype T) virtual(ast_node) {
315        Type eval_type(T const & this);
316    }
317
318    struct operator_expression virtual(expression_node) {
319        enum operator_kind kind;
320        trait expression_node rands[2];
321    }
322
323    trait statement_node(otype T) virtual(ast_node) {
324        vector(Label) & get_labels(T & this);
325    }
326
327    struct goto_statement virtual(statement_node) {
328        vector(Label) labels;
329        Label target;
330    }
331
332    trait declaration_node(otype T) virtual(ast_node) {
333        string name_of(T const & this);
334        Type type_of(T const & this);
335    }
336
337    struct using_declaration virtual(declaration_node) {
338        string new_type;
339        Type old_type;
340    }
341
342    struct variable_declaration virtual(declaration_node) {
343        string name;
344        Type type;
345    }
346
347This system does not support multiple inheritance. The system could be
348extended to support it or a limited form (ex. you may have multiple parents
349but they may not have a common ancestor). However this proposal focuses just
350on using hierachy as organization. Other uses for reusable/genaric code or
351shared interfaces is left for other features of the language.
352
353### Extension: Structural Inheritance
354An extension would be allow structures to be used as internal nodes on the
355inheritance tree. Its child types would have to implement the same fields.
356
357The weaker restriction would be to convert the fields into field assertions
358(Not implemented yet: `U T.x` means there is a field of type you on the type
359T. Offset unknown and passed in/stored with function pointers.)
360A concrete child would have to declare the same set of fields with the same
361types. This is of a more functional style.
362
363The stronger restriction is that the fields of the parent are a prefix of the
364child's fields. Possibly automatically inserted. This the imperative view and
365may also have less overhead.
366
367### Extension: Unions and Enumerations
368Currently there is no reason unions and enumerations, in the cases they
369do implement the trait, could not be in the hierarchy as leaf nodes.
370
371It does not work with structural induction, but that could just be a compile
372time check that all ancestors are traits or do not add field assertions.
373
374#### Sample Implementation
375The type id may be as little as:
376
377    struct typeid {
378        struct typeid const * const parent;
379    };
380
381Some linker magic would have to be used to ensure exactly one copy of each
382structure for each type exists in memory. There seem to be special once
383sections that support this and it should be easier than generating unique
384ids across compilation units.
385
386The structure could be extended to contain any additional type information.
387
388There are two general designs for vtables with type ids. The first is to put
389the type id at the top of the vtable, this is the most compact and efficient
390solution but only works if we have exactly 1 vtable for each type. The second
391is to put a pointer to the type id in each vtable. This has more overhead but
392allows multiple vtables per type.
393
394    struct <trait>_vtable {
395        struct typeid const id;
396
397        // Trait dependent list of vtable members.
398    };
399
400    struct <trait>_vtable {
401        struct typeid const * const id;
402
403        // Trait dependent list of vtable members.
404    };
405
406One important restriction is that only one instance of each typeid in memory.
407There is a ".gnu.linkonce" feature in the linker that might solve the issue.
408
409### Virtual Casts
410The generic objects may be cast up and down the hierarchy.
411
412Casting to an ancestor type always succeeds. From one generic type to another
413is just a reinterpretation and could be implicate. Wrapping and unwrapping
414a concrete type will probably use the same syntax as in the first section.
415
416Casting from an ancestor to a descendent requires a check. The underlying
417type may or may not belong to the sub-tree headed by that descendent. For this
418we introduce a new cast operator, which returns the pointer unchanged if the
419check succeeds and null otherwise.
420
421    trait SubType * new_value = (virtual trait SubType *)super_type;
422
423For the following example I am using the as of yet finished exception system.
424
425    trait exception(otype T) virtual() {
426        char const * what(T & this);
427    }
428
429    trait io_error(otype T) virtual(exception) {
430        FILE * which_file(T & this);
431    }
432
433    struct eof_error(otype T) virtual(io_error) {
434        FILE * file;
435    }
436
437    char const * what(eof_error &) {
438        return "Tried to read from an empty file.";
439    }
440
441    FILE * which_file(eof_error & this) {
442        return eof_error.file;
443    }
444
445    bool handleIoError(exception * exc) {
446        io_error * error = (virtual io_error *)exc;
447        if (NULL == error) {
448            return false;
449        }
450        ...
451        return true;
452    }
453
454### Extension: Implicate Virtual Cast Target
455This is a small extension, even in the example above `io_error *` is repeated
456in the cast and the variable being assigned to. Using return type inference
457would allow the second type to be skipped in cases it is clear what type is
458being checked against.
459
460The line then becomes:
461
462    io_error * error = (virtual)exc;
463
464#### Sample Implementation
465This cast implementation assumes a type id layout similar to the one given
466above. Also this code is definitely in the underlying C. Functions that give
467this functionality could exist in the standard library but these are meant to
468be produced by code translation of the virtual cast.
469
470    bool is_in_subtree(typeid const * root, typeid const * id) {
471        if (root == id) {
472            return true
473        } else if (NULL == id->parent) {
474            return false;
475        } else {
476            return is_in_subtree(root, id->parent);
477        }
478    }
479
480    void * virtual_cast(typeid const * target, void * value) {
481        return is_in_subtree(target, *(typeid const **)value) ? value : NULL;
482    }
483
484The virtual cast function might have to be wrapped with some casts to make it
485compile without warning.
486
487For the implicate target type we may be able to lean on the type resolution
488system that already exists. If the casting to ancestor type is built into
489the resolution then the impicate target could be decided by picking an
490overload, generated for each hierarchial type (here io_error and its root
491type exception).
492
493    io_error * virtual_cast(exception * value) {
494        return virtual_cast(io_error_typeid, value);
495    }
496
497### Extension: Inline vtables
498Since the structures here are usually made to be turned into trait objects
499it might be worth it to have fields in them to store the virtual table
500pointer. This would have to be declared on the trait as an assertion (example:
501`vtable;` or `T.vtable;`), but if it is the trait object could be a single
502pointer.
503
504There are also three options for where the pointer to the vtable. It could be
505anywhere, a fixed location for each trait or always at the front. For the per-
506trait solution an extension to specify what it is (example `vtable[0];`) which
507could also be used to combine it with others. So these options can be combined
508to allow access to all three options.
509
510The pointer to virtual table field on structures might implicately added (the
511types have to declare they are a child here) or created with a declaration,
512possibly like the one used to create the assertion.
513
514### Virtual Tables as Types
515Here we consider encoding plus the implementation of functions on it to be a
516type. Which is to say in the type hierarchy structures aren't concrete types
517anymore, instead they are parent types to vtables, which combine the encoding
518and implementation.
519
520### Question: Wrapping Structures
521One issue is what to do with concrete types at the base of the type tree.
522When we are working with the concrete type generally it would like them to be
523regular structures with direct calls. On the other hand for interactions with
524other types in the hierarchy it is more convenent for the type already to be
525cast.
526
527Which of these two should we use? Should we support both and if so how do we
528choose which one is being used at any given time.
529
530On a related note I have been using pointers two trait types here, as that
531is how many existing languages handle it. However the generic objects might
532be only one or two pointers wide passing the objects as a whole would not
533be very expensive and all operations on the generic objects probably have
534to be defined anyways.
535
536Resolution Scope
537----------------
538
539What is the scope of a resolution? When are the functions in a vtable decided
540and how broadly is this applied?
541
542### Type Level:
543Each structure has a single resolution for all of the functions in the
544virtual trait. This is how many languages that implement this or similar
545features do it.
546
547The main thing CFA would need to do it this way is some single point where
548the type declaration, including the functions that satisfy the trait, are
549all defined. Currently there are many points where this can happen, not all
550of them have the same definitions and no way to select one over the other.
551
552Some syntax would have to be added to specify the resolution point. To ensure
553a single instance there may have to be two variants, one forward declaration
554and one to create the instance. With some compiler magic the forward
555declaration maybe enough.
556
557    extern trait combiner(struct summation) vtable;
558    trait combiner(struct summation) vtable;
559
560Or (with the same variants):
561
562    vtable combiner(struct summation);
563
564The extern variant promises that the vtable will exist while the normal one
565is where the resolution actually happens.
566
567### Explicit Resolution Points:
568Slightly looser than the above, there are explicit points where the vtables
569are resolved, but there is no limit on the number of resolution points that
570might be provided. Each time a object is bound to a trait, one of the
571resolutions is selected. This might be the most flexible option.
572
573An syntax would have to be provided as above. There may also be the option
574to name resolution points so that you can choose between them. This also
575could come with the ability to forward declare them.
576
577Especially if they are not named, these resolution points should be able to
578appear in functions, where the scoping rules can be used to select one.
579However this also means that stack-allocated functions can end up in the
580vtable.
581
582    extern trait combiner(struct summation) vtable sum;
583    trait combiner(struct summation) vtable sum;
584
585    extern trait combiner(struct summation) vtable sum default;
586    trait combiner(struct summation) vtable sum default;
587
588The extern difference is the same before. The name (sum in the samples) is
589used at the binding site to say which one is picked. The default keyword can
590be used in only some of the declarations.
591
592    trait combiner fee = {summation_instance, sum};
593    trait combiner foe = summation_instance;
594
595(I am not really happy about this syntax, but it kind of works.)
596The object being bound is required. The name of the vtable is optional if
597there is exactly one vtable name marked with default.
598
599These could also be placed inside functions. In which case both the name and
600the default keyword might be optional. If the name is omitted in an assignment
601the closest vtable is chosen (returning to the global default rule if no
602appropriate local vtable is in scope).
603
604### Site Based Resolution:
605Every place in code where the binding of a vtable to an object occurs has
606its own resolution. Syntax-wise this is the simplest as it should be able
607to use just the existing declarations and the conversion to trait object.
608It also is very close to the current polymorphic resolution rules.
609
610This works as the explicit resolution points except the resolution points
611are implicit and their would be no selection of which resolution to use. The
612closest (current) resolution is always selected.
613
614This could easily lead to an explosion of vtables as it has the most fine
615grained resolution the number of bindings in a single scope (that produces
616the same binding) could be quite high. Merging identical vtables might help
617reduce that.
618
619Vtable Lifetime Issues
620----------------------
621
622Vtables interact badly with the thunk issue. Conceptually vtables are static
623like type/function data they carry, as those decisions are made by the
624resolver at compile time.
625
626Stack allocated functions interact badly with this because they are not
627static. There are several ways to try to resolve this, however without a
628general solution most can keep vtables from making the existing thunk problem
629worse, they don't do anything to solve it.
630
631Filling in some fields of a static vtable could cause issues on a recursive
632call. And then we are still limited by the lifetime of the stack functions, as
633the vtable with stale pointers is still a problem.
634
635Dynamically allocated vtables introduces memory management overhead and
636requires some way to differentiate between dynamic and statically allocated
637tables. The stale function pointer problem continues unless those becomes
638dynamically allocated as well which gives us the same costs again.
639
640Stack allocating the vtable seems like the best issue. The vtable's lifetime
641is now the limiting factor but it should be effectively the same as the
642shortest lifetime of a function assigned to it. However this still limits the
643lifetime "implicitly" and returns to the original problem with thunks.
644
645Odds And Ends
646-------------
647
648In addition to the main design there are a few extras that should be
649considered. They are not part of the core design but make the new uses fully
650featured.
651
652### Extension: Parent-Child Assertion
653For hierarchy types in regular traits, generic functions or generic structures
654we may want to be able to check parent-child relationships between two types
655given. For this we might have to add another primitive assertion. It would
656have the following form if declared in code:
657
658    trait is_parent_child(dtype Parent, dtype Child) { <built-in magic> }
659
660This assertion is satified if Parent is an ancestor of Child in a hierarchy.
661In other words Child can be statically cast to Parent. The cast from Parent
662to child would be dynamically checked as usual.
663
664However in this form there are two concerns. The first that Parent will
665usually be consistent for a given use, it will not be a variable. Second is
666that we may also need the assertion functions. To do any casting/conversions
667anyways.
668TODO: Talk about when we wrap a concrete type and how that leads to "may".
669
670To this end it may be better that the parent trait combines the usual
671assertions plus this new primitive assertion. There may or may not be use
672cases for accessing just one half and providing easy access to them may be
673required depending on how that turns out.
674
675    trait Parent(dtype T | interface(T)) virtual(<grand-parent?>) { }
676
677### Extension: sizeof Compatablity
678Trait types are always sized, it may even be a fixed size like how pointers
679have the same size regardless of what they point at. However their contents
680may or may not be of a known size (if the `sized(...)` assertion is used).
681
682Currently there is no way to access this information. If it is needed a
683special syntax would have to be added. Here a special case of `sizeof` is
684used.
685
686    struct line aLine;
687    trait drawable widget = aLine;
688
689    size_t x = sizeof(widget);
690    size_t y = sizeof(trait drawable);
691
692As usual `y`, size of the type, is the size of the local storage used to put
693the value into. The other case `x` checks the saved stored value in the
694virtual table and returns that.
Note: See TracBrowser for help on using the repository browser.