source: doc/proposals/vtable.md @ 881f590

ADTarm-ehast-experimentalcleanup-dtorsenumforall-pointer-decayjacob/cs343-translationjenkins-sandboxnew-astnew-ast-unique-exprpthread-emulationqualifiedEnum
Last change on this file since 881f590 was 881f590, checked in by Andrew Beach <ajbeach@…>, 6 years ago

Moved anything I felt was worth saving from virtual to vtable. Cleared out virtual.txt.

  • Property mode set to 100644
File size: 25.9 KB
Line 
1Proposal For Use of Virtual Tables
2==================================
3
4This is an adaptation of the earlier virtual proposal, updating it with new
5ideas, re-framing it and laying out more design decisions. It should
6eventually replace the earlier proposal, but not all features and syntax have
7been converted to the new design.
8
9The basic concept of a virtual table (vtable) is the same here as in most
10other languages that use them. They will mostly contain function pointers
11although they should be able to store anything that goes into a trait.
12
13I also include notes on a sample implementation, which primarily exists to show
14there is a reasonable implementation. The code samples for that are in a slight
15pseudo-code to help avoid name mangling and keeps some CFA features while they
16would actually be written in C.
17
18Trait Instances
19---------------
20
21Currently traits are completely abstract. Data types might implement a trait
22but traits are not themselves data types. Which is to say you cannot have an
23instance of a trait. This proposal will change that and allow instances of
24traits to be created from instances of data types that implement the trait.
25
26For example:
27
28    trait combiner(otype T) {
29        void combine(T&, int);
30    };
31
32    struct summation {
33        int sum;
34    };
35
36    void ?{}( struct summation & this ) {
37        this.sum = 0;
38    }
39
40    void combine( struct summation & this, int num ) {
41        this.sum = this.sum + num;
42    }
43
44    trait combiner obj = struct summation{};
45    combine(obj, 5);
46
47As with `struct` (and `union` and `enum`), `trait` might be optional when
48using the trait as a type name. A trait may be used in assertion list as
49before.
50
51For traits to be used this way they should meet two requirements. First they
52should only have a single polymorphic type and each assertion should use that
53type once as a parameter. Extensions may later loosen these requirements.
54
55Also note this applies to the final expanded list of assertions. Consider:
56
57    trait foo(otype T, otype U) {
58        ... functions that use T once ...
59    }
60
61    trait bar(otype S | foo(S, char)) {
62        ... functions that use S once ...
63    }
64
65In this example `bar` may be used as a type but `foo` may not.
66
67When a trait is used as a type it creates a generic object which combines
68the base structure (an instance of `summation` in this case) and the vtable,
69which is currently created and provided by a hidden mechanism.
70
71The generic object type for each trait also implements that trait. This is
72actually the only means by which it can be used. The type of these functions
73look something like this:
74
75    void combine(trait combiner & this, int num);
76
77The main use case for trait objects is that they can be stored. They can be
78passed into functions, but using the trait directly is preferred in this case.
79
80    trait drawable(otype T) {
81        void draw(Surface & to, T & draw);
82        Rect(int) drawArea(T & draw);
83    };
84
85    struct UpdatingSurface {
86        Surface * surface;
87        vector(trait drawable) drawables;
88    };
89
90    void updateSurface(UpdatingSurface & us) {
91        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
92            draw(us.surface, us.drawables[i]);
93        }
94    }
95
96With a more complete widget trait you could, for example, construct a UI tool
97kit that can declare containers that hold widgets without knowing about the
98widget types. Making it reasonable to extend the tool kit.
99
100The trait types can also be used in the types of assertions on traits as well.
101In this usage they passed as the underlying object and vtable pair as they
102are stored. The trait types can also be used in that trait's definition, which
103means you can pass two instances of a trait to a single function. However the
104look-up of the one that is not used to look up any functions, until another
105function that uses that object in the generic/look-up location is called.
106
107    trait example(otype T) {
108        bool test(T & this, trait example & that);
109    }
110
111### Explanation Of Restrictions
112
113The two restrictions on traits that can be used as trait objects are:
114
1151.  Only one generic parameter may be defined in the trait's header.
1162.  Each function assertion must have one parameter with the type of the
117    generic parameter. They may or may not return a value of that type.
118
119Elsewhere in this proposal I suggest ways to broaden these requirements.
120A simple example would be if a trait meets requirement 1 but not 2, then
121the assertions that do not satisfy the exactly one parameter requirement can
122be ignored.
123
124However I would like to talk about why these two rules are in place in the
125first place and the problems that any exceptions to these rules must avoid.
126
127The problems appear when the dispatcher function which operates on the
128generic object.
129
130    trait combiner(otype T, otype U) {
131        void combine(T&, U);
132    }
133
134This one is so strange I don't have proper syntax for it but let us say that
135the concrete dispatcher would be typed as
136`void combine(combiner(T) &, combiner(U));`. Does the function that combine
137the two underlying types exist to dispatch too?
138
139Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then
140they could not be. It would have to enforce that all pairs of any types
141that are wrapped in this way. Which would pretty much destroy any chance of
142separate compilation.
143
144Even then it would be more expensive as the wrappers would have to carry ids
145that you use to look up on an <number of types>+1 dimensional table.
146
147The second restriction has a similar issue but makes a bit more sense to
148write out.
149
150    trait Series(otype T) {
151        ... size, iterators, getters ...
152        T join(T const &, T const &);
153    }
154
155With the dispatcher typed as:
156
157    Series join(Series const &, Series const &);
158
159Because these instances are generic and hide the underlying implementation we
160do not know what that implementation is. Unfortunately this also means the
161implementation for the two parameters might not be the same. Once we have
162two different types involved this devolves into the first case.
163
164We could check at run-time that the have the same underlying type, but this
165would likely time and space overhead and there is no clear recovery path.
166
167#### Sample Implementation
168A simple way to implement trait objects is by a pair of pointers. One to the
169underlying object and one to the vtable.
170
171    struct vtable_drawable {
172        void (*draw)(Surface &, void *);
173        Rect(int) (*drawArea)(void *);
174    };
175
176    struct drawable {
177        void * object;
178        vtable_drawable * vtable;
179    };
180
181The functions that run on the trait object would generally be generated using
182the following pattern:
183
184    void draw(Surface & surface, drawable & traitObj) {
185        return traitObj.vtable->draw(surface, traitObj.object);
186    }
187
188There may have to be special cases for things like copy construction, that
189might require a more significant wrapper. On the other hand moving could be
190implemented by moving the pointers without any need to refer to the base
191object.
192
193### Extension: Multiple Trait Parameters
194The base proposal in effect creates another use for the trait syntax that is
195related to the ones currently in the language but is also separate from them.
196The current uses generic functions and generic types, this new use could be
197described as generic objects.
198
199A generic object is of a concrete type and has concrete functions that work on
200it. It is generic in that it is a wrapper for an unknown type. Traits serve
201a similar role here as in generic functions as they limit what the function
202can be generic over.
203
204This combines the use allowing to have a generic type that is a generic
205object. All but one of the trait's parameters is given a concrete type,
206conceptually currying the trait to create a trait with on generic parameter
207that fits the original restrictions. The resulting concrete generic object
208type is different with each set of provided parameters and their values.
209
210Then it just becomes a question of where this is done. Again both examples use
211a basic syntax to show the idea.
212
213    trait iterator(virtual otype T, otype Item) {
214        bool has_next(T const &);
215        Item get_next(T const *);
216    }
217
218    iterator(int) int_it = begin(container_of_ints);
219
220The first option is to do it at the definition of the trait. One parameter
221is selected (here with the `virtual` keyword, but other rules like "the first"
222could also be used) and when an instance of the trait is created all the
223other parameters must be provided.
224
225    trait iterator(otype T, otype Item) {
226        bool has_next(T const &);
227        Item get_next(T const *);
228    }
229
230    iterator(virtual, int) int_it = begin(container_of_ints);
231
232The second option is to skip a parameter as part of the type instance
233definition. One parameter is explicitly skipped (again with the `virtual`
234keyword) and the others have concrete types. The skipped one is the one we
235are generic on.
236
237Incidentally in both examples `container_of_ints` may itself be a generic
238object and `begin` returns a generic iterator with unknown implementation.
239
240These options are not exclusive. Defining a default on the trait allows for
241an object to be created as in the first example. However, whether the
242default is provided or not, the second syntax can be used to pick a
243parameter on instantiation.
244
245Hierarchy
246---------
247
248We would also like to implement hierarchical relations between types.
249
250    ast_node
251    |-expression_node
252    | |-operator_expression
253    |
254    |-statement_node
255    | |-goto_statement
256    |
257    |-declaration_node
258      |-using_declaration
259      |-variable_declaration
260
261Virtual tables by themselves are not quite enough to implement this system.
262A vtable is just a list of functions and there is no way to check at run-time
263what these functions, we carry that knowledge with the table.
264
265This proposal adds type ids to check for position in the hierarchy and an
266explicate syntax for establishing a hierarchical relation between traits and
267their implementing types. The ids should uniquely identify each type and
268allow retrieval of the type's parent if one exists. By recursion this allows
269the ancestor relation between any two hierarchical types can be checked.
270
271The hierarchy is created with traits as the internal nodes and structures
272as the leaf nodes. The structures may be used normally and the traits can
273be used to create generic objects as in the first section (the same
274restrictions apply). However these type objects store their type id which can
275be recovered to figure out which type they are or at least check to see if
276they fall into a given sub-tree at run-time.
277
278Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
279just an example. But when used it give the name of the parent type or if
280empty it shows that this type is the root of its hierarchy.
281(Also I'm not sure where I got these casing rules.)
282
283    trait ast_node(otype T) virtual() {
284        void print(T & this, ostream & out);
285        void visit(T & this, Visitor & visitor);
286        CodeLocation const & get_code_location(T & this);
287    }
288
289    trait expression_node(otype T) virtual(ast_node) {
290        Type eval_type(T const & this);
291    }
292
293    struct operator_expression virtual(expression_node) {
294        enum operator_kind kind;
295        trait expression_node rands[2];
296    }
297
298    trait statement_node(otype T) virtual(ast_node) {
299        vector(Label) & get_labels(T & this);
300    }
301
302    struct goto_statement virtual(statement_node) {
303        vector(Label) labels;
304        Label target;
305    }
306
307    trait declaration_node(otype T) virtual(ast_node) {
308        string name_of(T const & this);
309        Type type_of(T const & this);
310    }
311
312    struct using_declaration virtual(declaration_node) {
313        string new_type;
314        Type old_type;
315    }
316
317    struct variable_declaration virtual(declaration_node) {
318        string name;
319        Type type;
320    }
321
322This system does not support multiple inheritance. The system could be
323extended to support it or a limited form (ex. you may have multiple parents
324but they may not have a common ancestor). However this proposal focuses just
325on using hierachy as organization. Other uses for reusable/genaric code or
326shared interfaces is left for other features of the language.
327
328### Extension: Structural Inheritance
329An extension would be allow structures to be used as internal nodes on the
330inheritance tree. Its child types would have to implement the same fields.
331
332The weaker restriction would be to convert the fields into field assertions
333(Not implemented yet: `U T.x` means there is a field of type you on the type
334T. Offset unknown and passed in/stored with function pointers.)
335A concrete child would have to declare the same set of fields with the same
336types. This is of a more functional style.
337
338The stronger restriction is that the fields of the parent are a prefix of the
339child's fields. Possibly automatically inserted. This the imperative view and
340may also have less overhead.
341
342### Extension: Unions and Enumerations
343Currently there is no reason unions and enumerations, in the cases they
344do implement the trait, could not be in the hierarchy as leaf nodes.
345
346It does not work with structural induction, but that could just be a compile
347time check that all ancestors are traits or do not add field assertions.
348
349#### Sample Implementation
350The type id may be as little as:
351
352    struct typeid {
353        struct typeid const * const parent;
354    };
355
356Some linker magic would have to be used to ensure exactly one copy of each
357structure for each type exists in memory. There seem to be special once
358sections that support this and it should be easier than generating unique
359ids across compilation units.
360
361The structure could be extended to contain any additional type information.
362
363There are two general designs for vtables with type ids. The first is to put
364the type id at the top of the vtable, this is the most compact and efficient
365solution but only works if we have exactly 1 vtable for each type. The second
366is to put a pointer to the type id in each vtable. This has more overhead but
367allows multiple vtables per type.
368
369    struct <trait>_vtable {
370        struct typeid const id;
371
372        // Trait dependent list of vtable members.
373    };
374
375    struct <trait>_vtable {
376        struct typeid const * const id;
377
378        // Trait dependent list of vtable members.
379    };
380
381One important restriction is that only one instance of each typeid in memory.
382There is a ".gnu.linkonce" feature in the linker that might solve the issue.
383
384### Virtual Casts
385The generic objects may be cast up and down the hierarchy.
386
387Casting to an ancestor type always succeeds. From one generic type to another
388is just a reinterpretation and could be implicate. Wrapping and unwrapping
389a concrete type will probably use the same syntax as in the first section.
390
391Casting from an ancestor to a descendent requires a check. The underlying
392type may or may not belong to the sub-tree headed by that descendent. For this
393we introduce a new cast operator, which returns the pointer unchanged if the
394check succeeds and null otherwise.
395
396    trait SubType * new_value = (virtual trait SubType *)super_type;
397
398For the following example I am using the as of yet finished exception system.
399
400    trait exception(otype T) virtual() {
401        char const * what(T & this);
402    }
403
404    trait io_error(otype T) virtual(exception) {
405        FILE * which_file(T & this);
406    }
407
408    struct eof_error(otype T) virtual(io_error) {
409        FILE * file;
410    }
411
412    char const * what(eof_error &) {
413        return "Tried to read from an empty file.";
414    }
415
416    FILE * which_file(eof_error & this) {
417        return eof_error.file;
418    }
419
420    bool handleIoError(exception * exc) {
421        io_error * error = (virtual io_error *)exc;
422        if (NULL == error) {
423            return false;
424        }
425        ...
426        return true;
427    }
428
429### Extension: Implicate Virtual Cast Target
430This is a small extension, even in the example above `io_error *` is repeated
431in the cast and the variable being assigned to. Using return type inference
432would allow the second type to be skipped in cases it is clear what type is
433being checked against.
434
435The line then becomes:
436
437    io_error * error = (virtual)exc;
438
439#### Sample Implementation
440This cast implementation assumes a type id layout similar to the one given
441above. Also this code is definitely in the underlying C. Functions that give
442this functionality could exist in the standard library but these are meant to
443be produced by code translation of the virtual cast.
444
445    bool is_in_subtree(typeid const * root, typeid const * id) {
446        if (root == id) {
447            return true
448        } else if (NULL == id->parent) {
449            return false;
450        } else {
451            return is_in_subtree(root, id->parent);
452        }
453    }
454
455    void * virtual_cast(typeid const * target, void * value) {
456        return is_in_subtree(target, *(typeid const **)value) ? value : NULL;
457    }
458
459The virtual cast function might have to be wrapped with some casts to make it
460compile without warning.
461
462For the implicate target type we may be able to lean on the type resolution
463system that already exists. If the casting to ancestor type is built into
464the resolution then the impicate target could be decided by picking an
465overload, generated for each hierarchial type (here io_error and its root
466type exception).
467
468    io_error * virtual_cast(exception * value) {
469        return virtual_cast(io_error_typeid, value);
470    }
471
472### Extension: Inline vtables
473Since the structures here are usually made to be turned into trait objects
474it might be worth it to have fields in them to store the virtual table
475pointer. This would have to be declared on the trait as an assertion (example:
476`vtable;` or `T.vtable;`), but if it is the trait object could be a single
477pointer.
478
479There are also three options for where the pointer to the vtable. It could be
480anywhere, a fixed location for each trait or always at the front. For the per-
481trait solution an extension to specify what it is (example `vtable[0];`) which
482could also be used to combine it with others. So these options can be combined
483to allow access to all three options.
484
485The pointer to virtual table field on structures might implicately added (the
486types have to declare they are a child here) or created with a declaration,
487possibly like the one used to create the assertion.
488
489### Virtual Tables as Types
490Here we consider encoding plus the implementation of functions on it to be a
491type. Which is to say in the type hierarchy structures aren't concrete types
492anymore, instead they are parent types to vtables, which combine the encoding
493and implementation.
494
495### Question: Wrapping Structures
496One issue is what to do with concrete types at the base of the type tree.
497When we are working with the concrete type generally it would like them to be
498regular structures with direct calls. On the other hand for interactions with
499other types in the hierarchy it is more convenent for the type already to be
500cast.
501
502Which of these two should we use? Should we support both and if so how do we
503choose which one is being used at any given time.
504
505On a related note I have been using pointers two trait types here, as that
506is how many existing languages handle it. However the generic objects might
507be only one or two pointers wide passing the objects as a whole would not
508be very expensive and all operations on the generic objects probably have
509to be defined anyways.
510
511Resolution Scope
512----------------
513
514What is the scope of a resolution? When are the functions in a vtable decided
515and how broadly is this applied?
516
517### Type Level:
518Each structure has a single resolution for all of the functions in the
519virtual trait. This is how many languages that implement this or similar
520features do it.
521
522The main thing CFA would need to do it this way is some single point where
523the type declaration, including the functions that satisfy the trait, are
524all defined. Currently there are many points where this can happen, not all
525of them have the same definitions and no way to select one over the other.
526
527Some syntax would have to be added to specify the resolution point. To ensure
528a single instance there may have to be two variants, one forward declaration
529and one to create the instance. With some compiler magic the forward
530declaration maybe enough.
531
532    extern trait combiner(struct summation) vtable;
533    trait combiner(struct summation) vtable;
534
535Or (with the same variants):
536
537    vtable combiner(struct summation);
538
539The extern variant promises that the vtable will exist while the normal one
540is where the resolution actually happens.
541
542### Explicit Resolution Points:
543Slightly looser than the above, there are explicit points where the vtables
544are resolved, but there is no limit on the number of resolution points that
545might be provided. Each time a object is bound to a trait, one of the
546resolutions is selected. This might be the most flexible option.
547
548An syntax would have to be provided as above. There may also be the option
549to name resolution points so that you can choose between them. This also
550could come with the ability to forward declare them.
551
552Especially if they are not named, these resolution points should be able to
553appear in functions, where the scoping rules can be used to select one.
554However this also means that stack-allocated functions can end up in the
555vtable.
556
557    extern trait combiner(struct summation) vtable sum;
558    trait combiner(struct summation) vtable sum;
559
560    extern trait combiner(struct summation) vtable sum default;
561    trait combiner(struct summation) vtable sum default;
562
563The extern difference is the same before. The name (sum in the samples) is
564used at the binding site to say which one is picked. The default keyword can
565be used in only some of the declarations.
566
567    trait combiner fee = (summation_instance, sum);
568    trait combiner foe = summation_instance;
569
570(I am not really happy about this syntax, but it kind of works.)
571The object being bound is required. The name of the vtable is optional if
572there is exactly one vtable name marked with default.
573
574These could also be placed inside functions. In which case both the name and
575the default keyword might be optional. If the name is omitted in an assignment
576the closest vtable is chosen (returning to the global default rule if no
577appropriate local vtable is in scope).
578
579### Site Based Resolution:
580Every place in code where the binding of a vtable to an object occurs has
581its own resolution. Syntax-wise this is the simplest as it should be able
582to use just the existing declarations and the conversion to trait object.
583It also is very close to the current polymorphic resolution rules.
584
585This works as the explicit resolution points except the resolution points
586are implicit and their would be no selection of which resolution to use. The
587closest (current) resolution is always selected.
588
589This could easily lead to an explosion of vtables as it has the most fine
590grained resolution the number of bindings in a single scope (that produces
591the same binding) could be quite high. Merging identical vtables might help
592reduce that.
593
594Vtable Lifetime Issues
595----------------------
596
597Vtables interact badly with the thunk issue. Conceptually vtables are static
598like type/function data they carry, as those decisions are made by the
599resolver at compile time.
600
601Stack allocated functions interact badly with this because they are not
602static. There are several ways to try to resolve this, however without a
603general solution most can keep vtables from making the existing thunk problem
604worse, they don't do anything to solve it.
605
606Filling in some fields of a static vtable could cause issues on a recursive
607call. And then we are still limited by the lifetime of the stack functions, as
608the vtable with stale pointers is still a problem.
609
610Dynamically allocated vtables introduces memory management overhead and
611requires some way to differentiate between dynamic and statically allocated
612tables. The stale function pointer problem continues unless those becomes
613dynamically allocated as well which gives us the same costs again.
614
615Stack allocating the vtable seems like the best issue. The vtable's lifetime
616is now the limiting factor but it should be effectively the same as the
617shortest lifetime of a function assigned to it. However this still limits the
618lifetime "implicitly" and returns to the original problem with thunks.
619
620Odds And Ends
621-------------
622
623In addition to the main design there are a few extras that should be
624considered. They are not part of the core design but make the new uses fully
625featured.
626
627### Extension: Parent-Child Assertion
628For hierarchy types in regular traits, generic functions or generic structures
629we may want to be able to check parent-child relationships between two types
630given. For this we might have to add another primitive assertion. It would
631have the following form if declared in code:
632
633    trait is_parent_child(dtype Parent, dtype Child) { <built-in magic> }
634
635This assertion is satified if Parent is an ancestor of Child in a hierarchy.
636In other words Child can be statically cast to Parent. The cast from Parent
637to child would be dynamically checked as usual.
638
639However in this form there are two concerns. The first that Parent will
640usually be consistent for a given use, it will not be a variable. Second is
641that we may also need the assertion functions. To do any casting/conversions
642anyways.
643TODO: Talk about when we wrap a concrete type and how that leads to "may".
644
645To this end it may be better that the parent trait combines the usual
646assertions plus this new primitive assertion. There may or may not be use
647cases for accessing just one half and providing easy access to them may be
648required depending on how that turns out.
649
650    trait Parent(dtype T | interface(T)) virtual(<grand-parent?>) { }
651
652### Extension: sizeof Compatablity
653Trait types are always sized, it may even be a fixed size like how pointers
654have the same size regardless of what they point at. However their contents
655may or may not be of a known size (if the `sized(...)` assertion is used).
656
657Currently there is no way to access this information. If it is needed a
658special syntax would have to be added. Here a special case of `sizeof` is
659used.
660
661    struct line aLine;
662    trait drawable widget = aLine;
663
664    size_t x = sizeof(widget);
665    size_t y = sizeof(trait drawable);
666
667As usual `y`, size of the type, is the size of the local storage used to put
668the value into. The other case `x` checks the saved stored value in the
669virtual table and returns that.
Note: See TracBrowser for help on using the repository browser.