source: doc/proposals/vtable.md @ bce76d1

Last change on this file since bce76d1 was 0f740d6, checked in by Andrew Beach <ajbeach@…>, 5 years ago

Proposal For Use of Virtual Tables
==================================
3
The basic concept of a virtual table (vtable) is the same here as in most
other languages that use them. They will mostly contain function pointers,
although they should be able to store anything that goes into a trait.

I also include notes on a sample implementation, which primarily exists to show
that a reasonable implementation is possible. The code samples for it are
written in a slight pseudo-code that avoids name mangling and keeps some CFA
features, although the real implementation would be written in C.
12
13Trait Instances
14---------------
15
16Currently traits are completely abstract. Data types might implement a trait
17but traits are not themselves data types. Which is to say you cannot have an
18instance of a trait. This proposal will change that and allow instances of
19traits to be created from instances of data types that implement the trait.
20
21For example:
22
23    trait combiner(otype T) {
24        void combine(T&, int);
25    };
26
27    struct summation {
28        int sum;
29    };
30
31    void ?{}( struct summation & this ) {
32        this.sum = 0;
33    }
34
35    void combine( struct summation & this, int num ) {
36        this.sum = this.sum + num;
37    }
38
39    trait combiner obj = struct summation{};
40    combine(obj, 5);
41
As with `struct` (and `union` and `enum`), `trait` might be optional when
using the trait as a type name. A trait may still be used in an assertion
list as before.

For traits to be used this way they should meet two requirements. First, they
should have only a single polymorphic type. Second, each assertion should use
that type exactly once as a parameter. Extensions may later loosen these
requirements.
49
50Also note this applies to the final expanded list of assertions. Consider:
51
52    trait foo(otype T, otype U) {
53        ... functions that use T once ...
54    }
55
56    trait bar(otype S | foo(S, char)) {
57        ... functions that use S once ...
58    }
59
60In this example `bar` may be used as a type but `foo` may not.
61
62When a trait is used as a type it creates a generic object which combines
63the base structure (an instance of `summation` in this case) and the vtable,
64which is currently created and provided by a hidden mechanism.
65
The generic object type for each trait also implements that trait. This is
actually the only means by which it can be used. The types of these functions
look something like this:
69
70    void combine(trait combiner & this, int num);
71
72The main use case for trait objects is that they can be stored. They can be
73passed into functions, but using the trait directly is preferred in this case.
74
    trait drawable(otype T) {
        void draw(Surface & to, T & draw);
        Rect(int) drawArea(T & draw);
    };

    struct UpdatingSurface {
        Surface * surface;
        vector(trait drawable) drawables;
    };

    void updateSurface(UpdatingSurface & us) {
        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
            // `surface` is a pointer but `draw` takes a reference.
            draw(*us.surface, us.drawables[i]);
        }
    }
90
With a more complete widget trait you could, for example, construct a UI tool
kit that can declare containers that hold widgets without knowing about the
widget types, which makes it reasonable to extend the tool kit.

The trait types can also be used in the types of assertions on traits.
In this usage they are passed as the underlying object and vtable pair, just as
they are stored. The trait types can also be used in that trait's own
definition, which means you can pass two instances of a trait to a single
function. However, the vtable of the instance that is not in the generic
(look-up) position is not used to look up any functions until another
function is called that takes that object in the generic position.
101
102    trait example(otype T) {
103        bool test(T & this, trait example & that);
104    }
105
106### Explanation Of Restrictions
107
108The two restrictions on traits that can be used as trait objects are:
109
1.  Only one generic parameter may be defined in the trait's header.
2.  Each function assertion must have exactly one parameter with the type of
    the generic parameter. They may or may not return a value of that type.
113
114Elsewhere in this proposal I suggest ways to broaden these requirements.
115A simple example would be if a trait meets requirement 1 but not 2, then
116the assertions that do not satisfy the exactly one parameter requirement can
117be ignored.
118
However, I would like to talk about why these two rules are in place and the
problems that any exceptions to them must avoid.

The problems appear in the dispatcher function, which operates on the
generic object.
124
125    trait combiner(otype T, otype U) {
126        void combine(T&, U);
127    }
128
This one is so strange I don't have proper syntax for it, but let us say that
the concrete dispatcher would be typed as
`void combine(combiner(T) &, combiner(U));`. Does a function that combines
the two underlying types exist to dispatch to?

Maybe not. If `combiner(T)` wraps a type that works with ints and
`combiner(U)` wraps a char, then no such function may exist. The compiler
would have to enforce that a function exists for all pairs of types that are
wrapped in this way, which would pretty much destroy any chance of separate
compilation.

Even then it would be more expensive, as the wrappers would have to carry ids
that are used to index an <number of types>+1 dimensional table.
141
142The second restriction has a similar issue but makes a bit more sense to
143write out.
144
145    trait Series(otype T) {
146        ... size, iterators, getters ...
147        T join(T const &, T const &);
148    }
149
150With the dispatcher typed as:
151
152    Series join(Series const &, Series const &);
153
154Because these instances are generic and hide the underlying implementation we
155do not know what that implementation is. Unfortunately this also means the
156implementation for the two parameters might not be the same. Once we have
157two different types involved this devolves into the first case.
158
We could check at run-time that they have the same underlying type, but this
would likely add time and space overhead and there is no clear recovery path.
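
To make the cost of such a run-time check concrete, here is a minimal C sketch. It assumes exactly one vtable per concrete type, so that comparing vtable pointers stands in for comparing types. The names `series_vtable`, `series_join` and `int_sum` are hypothetical and only serve the illustration; the only "recovery path" shown is reporting failure to the caller.

```c
#include <stdbool.h>

/* Hypothetical vtable for the Series trait, reduced to the join slot. */
struct series_vtable {
    /* Joins two objects of the same concrete type into `out`. */
    void (*join)(void *out, const void *lhs, const void *rhs);
};

/* Trait object: underlying object plus vtable. */
struct series {
    void *object;
    const struct series_vtable *vtable;
};

/* Dispatcher with the run-time check. Returns false, without calling
 * anything, when the operands wrap different concrete types. */
static bool series_join(struct series *out,
                        const struct series *lhs,
                        const struct series *rhs) {
    if (lhs->vtable != rhs->vtable || out->vtable != lhs->vtable) {
        return false;
    }
    lhs->vtable->join(out->object, lhs->object, rhs->object);
    return true;
}

/* A hypothetical concrete type implementing the trait. */
struct int_sum {
    int total;
};

static void int_sum_join(void *out, const void *lhs, const void *rhs) {
    ((struct int_sum *)out)->total =
        ((const struct int_sum *)lhs)->total +
        ((const struct int_sum *)rhs)->total;
}

static const struct series_vtable int_sum_vtable = { int_sum_join };
```

Every call pays for the pointer comparisons, and every caller has to handle the failure result, which is the overhead and the unclear recovery path mentioned above.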
161
162#### Sample Implementation
163A simple way to implement trait objects is by a pair of pointers. One to the
164underlying object and one to the vtable.
165
166    struct vtable_drawable {
167        void (*draw)(Surface &, void *);
168        Rect(int) (*drawArea)(void *);
169    };
170
171    struct drawable {
172        void * object;
173        vtable_drawable * vtable;
174    };
175
176The functions that run on the trait object would generally be generated using
177the following pattern:
178
179    void draw(Surface & surface, drawable & traitObj) {
180        return traitObj.vtable->draw(surface, traitObj.object);
181    }
182
183There may have to be special cases for things like copy construction, that
184might require a more significant wrapper. On the other hand moving could be
185implemented by moving the pointers without any need to refer to the base
186object.
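
As a rough illustration of the same pattern in plain C, the sketch below applies it to the `combiner` trait and `summation` structure from the first example. The names `combiner_vtable`, `summation_combine` and `combiner_from_summation` are hypothetical; a real translation would use mangled names and would be generated by the compiler.

```c
/* Concrete type from the first example. */
struct summation {
    int sum;
};

/* Hypothetical vtable for the combiner trait: one slot per assertion. */
struct combiner_vtable {
    void (*combine)(void *object, int num);
};

/* The trait object: a pointer to the underlying object plus its vtable. */
struct combiner {
    void *object;
    const struct combiner_vtable *vtable;
};

/* Type-erased adapter with the signature the vtable slot expects. */
static void summation_combine(void *object, int num) {
    struct summation *this = object;
    this->sum += num;
}

static const struct combiner_vtable summation_combiner_vtable = {
    summation_combine,
};

/* Binding: wrap a concrete object in a trait object. */
static struct combiner combiner_from_summation(struct summation *s) {
    struct combiner obj = { s, &summation_combiner_vtable };
    return obj;
}

/* Dispatcher that makes the trait object itself implement the trait. */
static void combine(struct combiner *this, int num) {
    this->vtable->combine(this->object, num);
}
```

Calling `combine` on the result of `combiner_from_summation(&s)` updates `s.sum` through the vtable, without the caller knowing the concrete type.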
187
188### Extension: Multiple Trait Parameters
The base proposal in effect creates another use for the trait syntax that is
related to the ones currently in the language but is also separate from them.
The current uses are generic functions and generic types; this new use could be
described as generic objects.
193
194A generic object is of a concrete type and has concrete functions that work on
195it. It is generic in that it is a wrapper for an unknown type. Traits serve
196a similar role here as in generic functions as they limit what the function
197can be generic over.
198
This extension combines the uses, allowing a generic type that is also a
generic object. All but one of the trait's parameters are given a concrete
type, conceptually currying the trait to create a trait with one generic
parameter that fits the original restrictions. The resulting concrete generic
object type is different for each set of provided parameters and their values.
204
205Then it just becomes a question of where this is done. Again both examples use
206a basic syntax to show the idea.
207
    trait iterator(virtual otype T, otype Item) {
        bool has_next(T const &);
        Item get_next(T &);
    }

    iterator(int) int_it = begin(container_of_ints);
214
215The first option is to do it at the definition of the trait. One parameter
216is selected (here with the `virtual` keyword, but other rules like "the first"
217could also be used) and when an instance of the trait is created all the
218other parameters must be provided.
219
220    trait iterator(otype T, otype Item) {
221        bool has_next(T const &);
222        Item get_next(T &);
223    }
224
225    iterator(virtual, int) int_it = begin(container_of_ints);
226
227The second option is to skip a parameter as part of the type instance
228definition. One parameter is explicitly skipped (again with the `virtual`
229keyword) and the others have concrete types. The skipped one is the one we
230are generic on.
231
232Incidentally in both examples `container_of_ints` may itself be a generic
233object and `begin` returns a generic iterator with unknown implementation.
234
235These options are not exclusive. Defining a default on the trait allows for
236an object to be created as in the first example. However, whether the
237default is provided or not, the second syntax can be used to pick a
238parameter on instantiation.
239
240Hierarchy
241---------
242
243We would also like to implement hierarchical relations between types.
244
245    ast_node
246    |-expression_node
247    | |-operator_expression
248    |
249    |-statement_node
250    | |-goto_statement
251    |
252    |-declaration_node
253      |-using_declaration
254      |-variable_declaration
255
Virtual tables by themselves are not quite enough to implement this system.
A vtable is just a list of functions and there is no way to check at run-time
what these functions are, so we must carry that knowledge with the table.

This proposal adds type ids to check for position in the hierarchy and an
explicit syntax for establishing a hierarchical relation between traits and
their implementing types. The ids should uniquely identify each type and
allow retrieval of the type's parent if one exists. By recursion this allows
the ancestor relation between any two hierarchical types to be checked.
265
The hierarchy is created with traits as the internal nodes and structures
as the leaf nodes. The structures may be used normally and the traits can
be used to create generic objects as in the first section (the same
restrictions apply). However these generic objects store their type id, which
can be recovered to figure out which type they are, or at least to check
whether they fall into a given sub-tree at run-time.

Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
just an example. When used it gives the name of the parent type or, if
empty, shows that this type is the root of its hierarchy.
(Also I'm not sure where I got these casing rules.)
277
278    trait ast_node(otype T) virtual() {
279        void print(T & this, ostream & out);
280        void visit(T & this, Visitor & visitor);
281        CodeLocation const & get_code_location(T & this);
282    }
283
284    trait expression_node(otype T) virtual(ast_node) {
285        Type eval_type(T const & this);
286    }
287
288    struct operator_expression virtual(expression_node) {
289        enum operator_kind kind;
290        trait expression_node rands[2];
291    }
292
293    trait statement_node(otype T) virtual(ast_node) {
294        vector(Label) & get_labels(T & this);
295    }
296
297    struct goto_statement virtual(statement_node) {
298        vector(Label) labels;
299        Label target;
300    }
301
302    trait declaration_node(otype T) virtual(ast_node) {
303        string name_of(T const & this);
304        Type type_of(T const & this);
305    }
306
307    struct using_declaration virtual(declaration_node) {
308        string new_type;
309        Type old_type;
310    }
311
312    struct variable_declaration virtual(declaration_node) {
313        string name;
314        Type type;
315    }
316
This system does not support multiple inheritance. The system could be
extended to support it or a limited form (e.g. you may have multiple parents
but they may not have a common ancestor). However this proposal focuses just
on using hierarchy as organization. Other uses, such as reusable/generic code
or shared interfaces, are left for other features of the language.
322
### Extension: Structural Inheritance
An extension would be to allow structures to be used as internal nodes on the
inheritance tree. Their child types would have to implement the same fields.

The weaker restriction would be to convert the fields into field assertions.
(Not implemented yet: `U T.x` means there is a field of type U on the type
T. Offset unknown and passed in/stored with function pointers.)
A concrete child would have to declare the same set of fields with the same
types. This is of a more functional style.

The stronger restriction is that the fields of the parent are a prefix of the
child's fields, possibly automatically inserted. This is the imperative view
and may also have less overhead.
336
337### Extension: Unions and Enumerations
338Currently there is no reason unions and enumerations, in the cases they
339do implement the trait, could not be in the hierarchy as leaf nodes.
340
This does not work with structural inheritance, but that could be handled by a
compile-time check that all ancestors are traits or do not add field assertions.
343
344#### Sample Implementation
345The type id may be as little as:
346
347    struct typeid {
348        struct typeid const * const parent;
349    };
350
351Some linker magic would have to be used to ensure exactly one copy of each
352structure for each type exists in memory. There seem to be special once
353sections that support this and it should be easier than generating unique
354ids across compilation units.
355
356The structure could be extended to contain any additional type information.
357
358There are two general designs for vtables with type ids. The first is to put
359the type id at the top of the vtable, this is the most compact and efficient
360solution but only works if we have exactly 1 vtable for each type. The second
361is to put a pointer to the type id in each vtable. This has more overhead but
362allows multiple vtables per type.
363
364    struct <trait>_vtable {
365        struct typeid const id;
366
367        // Trait dependent list of vtable members.
368    };
369
370    struct <trait>_vtable {
371        struct typeid const * const id;
372
373        // Trait dependent list of vtable members.
374    };
375
One important restriction is that only one instance of each typeid may exist
in memory. There is a ".gnu.linkonce" feature in the linker that might solve
the issue.
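
The second layout can be sketched in C as follows. This is only an illustration under assumed names (`type_id`, `combiner_vtable`, `same_type` and the two resolutions are hypothetical). It shows why the pointer layout permits several vtables per type: type identity is decided by the address of the single type id, not by the address of the vtable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal type id: just a link to the parent, NULL for a root. */
struct type_id {
    const struct type_id *parent;
};

/* Exactly one id object exists for the type. */
static const struct type_id summation_id = { NULL };

/* Second layout: each vtable holds a pointer to the shared type id. */
struct combiner_vtable {
    const struct type_id *id;
    void (*combine)(void *object, int num);
};

static void add(void *object, int num)      { *(int *)object += num; }
static void subtract(void *object, int num) { *(int *)object -= num; }

/* Two different resolutions for the same type share one type id. */
static const struct combiner_vtable summation_add      = { &summation_id, add };
static const struct combiner_vtable summation_subtract = { &summation_id, subtract };

/* Type identity compares id addresses, so distinct vtables still agree. */
static bool same_type(const struct combiner_vtable *a,
                      const struct combiner_vtable *b) {
    return a->id == b->id;
}
```

With the first layout the id is embedded in the vtable, so two vtables for one type would carry two copies of the id and this address comparison would fail.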
378
379### Virtual Casts
380The generic objects may be cast up and down the hierarchy.
381
Casting to an ancestor type always succeeds. Casting from one generic type to
another is just a reinterpretation and could be implicit. Wrapping and
unwrapping a concrete type will probably use the same syntax as in the first
section.

Casting from an ancestor to a descendant requires a check. The underlying
type may or may not belong to the sub-tree headed by that descendant. For this
we introduce a new cast operator, which returns the pointer unchanged if the
check succeeds and null otherwise.
390
391    trait SubType * new_value = (virtual trait SubType *)super_type;
392
For the following example I am using the as yet unfinished exception system.
394
    trait exception(otype T) virtual() {
        char const * what(T & this);
    }

    trait io_error(otype T) virtual(exception) {
        FILE * which_file(T & this);
    }

    // A leaf structure is concrete, so it takes no type parameter.
    struct eof_error virtual(io_error) {
        FILE * file;
    }

    char const * what(eof_error &) {
        return "Tried to read from an empty file.";
    }

    FILE * which_file(eof_error & this) {
        return this.file;
    }

    bool handleIoError(exception * exc) {
        io_error * error = (virtual io_error *)exc;
        if (NULL == error) {
            return false;
        }
        ...
        return true;
    }
423
### Extension: Implicit Virtual Cast Target
This is a small extension. Even in the example above `io_error *` is repeated
in the cast and in the variable being assigned to. Using return type inference
would allow the second type to be skipped in cases where it is clear what type
is being checked against.
429
430The line then becomes:
431
432    io_error * error = (virtual)exc;
433
434#### Sample Implementation
435This cast implementation assumes a type id layout similar to the one given
436above. Also this code is definitely in the underlying C. Functions that give
437this functionality could exist in the standard library but these are meant to
438be produced by code translation of the virtual cast.
439
    bool is_in_subtree(typeid const * root, typeid const * id) {
        if (root == id) {
            return true;
        } else if (NULL == id->parent) {
            return false;
        } else {
            return is_in_subtree(root, id->parent);
        }
    }

    void * virtual_cast(typeid const * target, void * value) {
        return is_in_subtree(target, *(typeid const **)value) ? value : NULL;
    }
453
454The virtual cast function might have to be wrapped with some casts to make it
455compile without warning.
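
For reference, a self-contained C version of the same idea might look like the sketch below. It assumes that every castable object begins with a pointer to its type id (the `object_header` structure is hypothetical), uses `type_id` rather than `typeid` as the structure name, walks the parent chain with a loop instead of recursion, and adds a hypothetical sibling type `other_error` to show a failing cast. The `eof_error` payload is reduced to an `int` to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>

struct type_id {
    const struct type_id *parent;
};

/* One id per type; the parent links mirror the exception example. */
static const struct type_id exception_id   = { NULL };
static const struct type_id io_error_id    = { &exception_id };
static const struct type_id eof_error_id   = { &io_error_id };
static const struct type_id other_error_id = { &exception_id };

/* Walk the parent chain from `id` looking for `root`. */
static bool is_in_subtree(const struct type_id *root,
                          const struct type_id *id) {
    for (; id != NULL; id = id->parent) {
        if (id == root) {
            return true;
        }
    }
    return false;
}

/* Assumed layout: the object starts with a pointer to its type id. */
struct object_header {
    const struct type_id *id;
};

/* Returns `value` unchanged if its type is in the subtree headed by
 * `target`, and NULL otherwise. */
static void *virtual_cast(const struct type_id *target, void *value) {
    const struct object_header *header = value;
    return is_in_subtree(target, header->id) ? value : NULL;
}

struct eof_error {
    struct object_header header;
    int fd;
};
```

An `eof_error` casts successfully to `io_error` and `exception`, but the cast to the unrelated `other_error` yields NULL.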
456
For the implicit target type we may be able to lean on the type resolution
system that already exists. If casting to an ancestor type is built into
the resolution then the implicit target could be decided by picking an
overload, generated for each hierarchical type (here io_error and its root
type exception).
462
463    io_error * virtual_cast(exception * value) {
464        return virtual_cast(io_error_typeid, value);
465    }
466
### Extension: Inline vtables
Since the structures here are usually made to be turned into trait objects,
it might be worth having fields in them to store the virtual table
pointer. This would have to be declared on the trait as an assertion (example:
`vtable;` or `T.vtable;`), but if it is, the trait object could be a single
pointer.
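
A minimal C sketch of the single-pointer form follows, assuming the "always at the front" placement for the vtable pointer. The trait is reduced to one hypothetical `area` function, and the names `drawable_ref` and `rect` are illustrative only.

```c
/* Hypothetical vtable with a single slot. */
struct drawable_vtable {
    int (*area)(const void *object);
};

/* Assumed convention: the vtable pointer is the first field. */
struct rect {
    const struct drawable_vtable *vtable;
    int width;
    int height;
};

static int rect_area(const void *object) {
    const struct rect *r = object;
    return r->width * r->height;
}

static const struct drawable_vtable rect_vtable = { rect_area };

/* The trait object is one pointer to any structure that starts with a
 * vtable pointer. */
typedef void *drawable_ref;

/* Dispatcher: read the vtable pointer from the front of the object. */
static int area(drawable_ref obj) {
    const struct drawable_vtable *vt =
        *(const struct drawable_vtable *const *)obj;
    return vt->area(obj);
}
```

The trade-off is that every instance of the structure carries the pointer, whether or not it is ever used as a trait object.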
473
There are also three options for where the pointer to the vtable is stored. It
could be anywhere, at a fixed location for each trait, or always at the front.
For the per-trait solution an extension would specify the location (example
`vtable[0];`), which could also be used to combine it with others. So these
options can be combined to allow access to all three.

The pointer to virtual table field on structures might be implicitly added (the
types have to declare they are a child here) or created with a declaration,
possibly like the one used to create the assertion.
483
484### Virtual Tables as Types
485Here we consider encoding plus the implementation of functions on it to be a
486type. Which is to say in the type hierarchy structures aren't concrete types
487anymore, instead they are parent types to vtables, which combine the encoding
488and implementation.
489
490### Question: Wrapping Structures
One issue is what to do with concrete types at the base of the type tree.
When we are working with the concrete type we would generally like them to be
regular structures with direct calls. On the other hand, for interactions with
other types in the hierarchy it is more convenient for the type to already be
cast.

Which of these two should we use? Should we support both and, if so, how do we
choose which one is being used at any given time?

On a related note I have been using pointers to trait types here, as that
is how many existing languages handle it. However, the generic objects might
be only one or two pointers wide, so passing the objects as a whole would not
be very expensive, and all operations on the generic objects probably have
to be defined anyways.
505
506Resolution Scope
507----------------
508
509What is the scope of a resolution? When are the functions in a vtable decided
510and how broadly is this applied?
511
512### Type Level:
513Each structure has a single resolution for all of the functions in the
514virtual trait. This is how many languages that implement this or similar
515features do it.
516
517The main thing CFA would need to do it this way is some single point where
518the type declaration, including the functions that satisfy the trait, are
519all defined. Currently there are many points where this can happen, not all
520of them have the same definitions and no way to select one over the other.
521
Some syntax would have to be added to specify the resolution point. To ensure
a single instance there may have to be two variants, one forward declaration
and one to create the instance. With some compiler magic the forward
declaration may be enough.
526
527    extern trait combiner(struct summation) vtable;
528    trait combiner(struct summation) vtable;
529
530Or (with the same variants):
531
532    vtable combiner(struct summation);
533
534The extern variant promises that the vtable will exist while the normal one
535is where the resolution actually happens.
536
537### Explicit Resolution Points:
Slightly looser than the above, there are explicit points where the vtables
are resolved, but there is no limit on the number of resolution points that
might be provided. Each time an object is bound to a trait, one of the
resolutions is selected. This might be the most flexible option.

A syntax would have to be provided as above. There may also be the option
to name resolution points so that you can choose between them. This also
could come with the ability to forward declare them.

Especially if they are not named, these resolution points should be able to
appear in functions, where the scoping rules can be used to select one.
However this also means that stack-allocated functions can end up in the
vtable.
551
552    extern trait combiner(struct summation) vtable sum;
553    trait combiner(struct summation) vtable sum;
554
555    extern trait combiner(struct summation) vtable sum default;
556    trait combiner(struct summation) vtable sum default;
557
The extern difference is the same as before. The name (sum in the samples) is
used at the binding site to say which one is picked. The default keyword can
be used in only some of the declarations.
561
562    trait combiner fee = (summation_instance, sum);
563    trait combiner foe = summation_instance;
564
565(I am not really happy about this syntax, but it kind of works.)
566The object being bound is required. The name of the vtable is optional if
567there is exactly one vtable name marked with default.
568
569These could also be placed inside functions. In which case both the name and
570the default keyword might be optional. If the name is omitted in an assignment
571the closest vtable is chosen (returning to the global default rule if no
572appropriate local vtable is in scope).
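
One way to picture what named resolution points could lower to in C is sketched below. Everything here is an assumption made for illustration: the second resolution point `twice`, the helper `bind_summation`, and the convention that a NULL name means "name omitted, use the default".

```c
#include <stddef.h>

struct summation {
    int sum;
};

struct combiner_vtable {
    void (*combine)(void *object, int num);
};

struct combiner {
    void *object;
    const struct combiner_vtable *vtable;
};

static void combine_add(void *object, int num) {
    ((struct summation *)object)->sum += num;
}

static void combine_twice(void *object, int num) {
    ((struct summation *)object)->sum += 2 * num;
}

/* trait combiner(struct summation) vtable sum default; */
static const struct combiner_vtable summation_vtable_sum = { combine_add };

/* trait combiner(struct summation) vtable twice;  (hypothetical) */
static const struct combiner_vtable summation_vtable_twice = { combine_twice };

/* Binding site: NULL stands for an omitted name and selects the default. */
static struct combiner bind_summation(struct summation *object,
                                      const struct combiner_vtable *named) {
    struct combiner result = {
        object,
        named != NULL ? named : &summation_vtable_sum,
    };
    return result;
}

static void combine(struct combiner *this, int num) {
    this->vtable->combine(this->object, num);
}
```

Here `bind_summation(&a, &summation_vtable_twice)` plays the role of `(summation_instance, twice)`, and `bind_summation(&b, NULL)` plays the role of a plain `summation_instance` binding. In the real design the choice would be made by the resolver at compile time rather than by a run-time NULL test.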
573
574### Site Based Resolution:
575Every place in code where the binding of a vtable to an object occurs has
576its own resolution. Syntax-wise this is the simplest as it should be able
577to use just the existing declarations and the conversion to trait object.
578It also is very close to the current polymorphic resolution rules.
579
This works like the explicit resolution points except that the resolution
points are implicit and there would be no selection of which resolution to
use. The closest (current) resolution is always selected.

This could easily lead to an explosion of vtables. Because it has the most
fine-grained resolution, the number of bindings in a single scope (that
produce the same binding) could be quite high. Merging identical vtables
might help reduce that.
588
589Vtable Lifetime Issues
590----------------------
591
Vtables interact badly with the thunk issue. Conceptually vtables are static,
like the type/function data they carry, as those decisions are made by the
resolver at compile time.

Stack allocated functions interact badly with this because they are not
static. There are several ways to try to resolve this. However, without a
general solution, the most they can do is keep vtables from making the
existing thunk problem worse; they don't do anything to solve it.

Filling in some fields of a static vtable could cause issues on a recursive
call. And then we are still limited by the lifetime of the stack functions, as
the vtable with stale pointers is still a problem.

Dynamically allocated vtables introduce memory management overhead and
require some way to differentiate between dynamically and statically allocated
tables. The stale function pointer problem continues unless those functions
become dynamically allocated as well, which gives us the same costs again.

Stack allocating the vtable seems like the best option. The vtable's lifetime
is now the limiting factor but it should be effectively the same as the
shortest lifetime of a function assigned to it. However this still limits the
lifetime "implicitly" and returns to the original problem with thunks.
614
615Odds And Ends
616-------------
617
618In addition to the main design there are a few extras that should be
619considered. They are not part of the core design but make the new uses fully
620featured.
621
622### Extension: Parent-Child Assertion
623For hierarchy types in regular traits, generic functions or generic structures
624we may want to be able to check parent-child relationships between two types
625given. For this we might have to add another primitive assertion. It would
626have the following form if declared in code:
627
628    trait is_parent_child(dtype Parent, dtype Child) { <built-in magic> }
629
This assertion is satisfied if Parent is an ancestor of Child in a hierarchy.
In other words Child can be statically cast to Parent. The cast from Parent
to Child would be dynamically checked as usual.

However in this form there are two concerns. The first is that Parent will
usually be constant for a given use; it will not be a variable. The second is
that we may also need the assertion functions in order to do any
casting/conversions anyways.
638TODO: Talk about when we wrap a concrete type and how that leads to "may".
639
640To this end it may be better that the parent trait combines the usual
641assertions plus this new primitive assertion. There may or may not be use
642cases for accessing just one half and providing easy access to them may be
643required depending on how that turns out.
644
645    trait Parent(dtype T | interface(T)) virtual(<grand-parent?>) { }
646
### Extension: sizeof Compatibility
Trait types are always sized; the size may even be fixed, like how pointers
have the same size regardless of what they point at. However their contents
may or may not be of a known size (if the `sized(...)` assertion is used).
651
652Currently there is no way to access this information. If it is needed a
653special syntax would have to be added. Here a special case of `sizeof` is
654used.
655
656    struct line aLine;
657    trait drawable widget = aLine;
658
659    size_t x = sizeof(widget);
660    size_t y = sizeof(trait drawable);
661
As usual `y`, the size of the type, is the size of the local storage used to
hold the value. The other case, `x`, looks up the size stored in the
virtual table and returns that.
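
In C terms, the `x` case could be lowered roughly as in the sketch below. It assumes the vtable gains a `size` slot filled in where the vtable is created; the slot and the helper `drawable_sizeof` are hypothetical, as are the fields of `line`.

```c
#include <stddef.h>

struct drawable_vtable {
    size_t size;  /* hypothetical slot: sizeof the underlying type */
    /* ... the trait's function pointers would follow ... */
};

/* Trait object: always two pointers wide, whatever it wraps. */
struct drawable {
    void *object;
    const struct drawable_vtable *vtable;
};

struct line {
    int x0, y0, x1, y1;
};

/* The size is recorded where the vtable is created. */
static const struct drawable_vtable line_drawable_vtable = {
    sizeof(struct line),
};

/* What `sizeof(widget)` on a trait object might lower to. */
static size_t drawable_sizeof(const struct drawable *widget) {
    return widget->vtable->size;
}
```

`sizeof(struct drawable)` remains an ordinary compile-time constant, while `drawable_sizeof` is a run-time load from the vtable.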