source: doc/proposals/ @ 936d95c

Last change on this file since 936d95c was 936d95c, checked in by Andrew Beach <ajbeach@…>, 4 years ago

The vtable proposal now notes how the assotiated types proposal might help it.

  • Property mode set to 100644
File size: 28.1 KB
1Proposal For Use of Virtual Tables
4The basic concept of a virtual table (vtable) is the same here as in most
5other languages that use them. They will mostly contain function pointers
6although they should be able to store anything that goes into a trait.
8I also include notes on a sample implementation, which primarily exists to show
9there is a reasonable implementation. The code samples for that are in a slight
10pseudo-code to help avoid name mangling and keeps some CFA features while they
11would actually be written in C.
13Trait Instances
16Currently traits are completely abstract. Data types might implement a trait
17but traits are not themselves data types. Which is to say you cannot have an
18instance of a trait. This proposal will change that and allow instances of
19traits to be created from instances of data types that implement the trait.
21For example:
23    trait combiner(otype T) {
24        void combine(T&, int);
25    };
27    struct summation {
28        int sum;
29    };
31    void ?{}( struct summation & this ) {
32        this.sum = 0;
33    }
35    void combine( struct summation & this, int num ) {
36        this.sum = this.sum + num;
37    }
39    trait combiner obj = struct summation{};
40    combine(obj, 5);
42As with `struct` (and `union` and `enum`), `trait` might be optional when
43using the trait as a type name. A trait may be used in assertion list as
46For traits to be used this way they should meet two requirements. First they
47should only have a single polymorphic type and each assertion should use that
48type once as a parameter. Extensions may later loosen these requirements.
50Also note this applies to the final expanded list of assertions. Consider:
52    trait foo(otype T, otype U) {
53        ... functions that use T once ...
54    }
56    trait bar(otype S | foo(S, char)) {
57        ... functions that use S once ...
58    }
60In this example `bar` may be used as a type but `foo` may not.
62When a trait is used as a type it creates a generic object which combines
63the base structure (an instance of `summation` in this case) and the vtable,
64which is currently created and provided by a hidden mechanism.
66The generic object type for each trait also implements that trait. This is
67actually the only means by which it can be used. The type of these functions
68look something like this:
70    void combine(trait combiner & this, int num);
72The main use case for trait objects is that they can be stored. They can be
73passed into functions, but using the trait directly is preferred in this case.
75    trait drawable(otype T) {
76        void draw(Surface & to, T & draw);
77        Rect(int) drawArea(T & draw);
78    };
80    struct UpdatingSurface {
81        Surface * surface;
82        vector(trait drawable) drawables;
83    };
85    void updateSurface(UpdatingSurface & us) {
86        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
87            draw(us.surface, us.drawables[i]);
88        }
89    }
91With a more complete widget trait you could, for example, construct a UI tool
92kit that can declare containers that hold widgets without knowing about the
93widget types. Making it reasonable to extend the tool kit.
95The trait types can also be used in the types of assertions on traits as well.
96In this usage they passed as the underlying object and vtable pair as they
97are stored. The trait types can also be used in that trait's definition, which
98means you can pass two instances of a trait to a single function. However the
99look-up of the one that is not used to look up any functions, until another
100function that uses that object in the generic/look-up location is called.
102    trait example(otype T) {
103        bool test(T & this, trait example & that);
104    }
106### Explanation Of Restrictions
108The two restrictions on traits that can be used as trait objects are:
1101.  Only one generic parameter may be defined in the trait's header.
1112.  Each function assertion must have one parameter with the type of the
112    generic parameter. They may or may not return a value of that type.
114Elsewhere in this proposal I suggest ways to broaden these requirements.
115A simple example would be if a trait meets requirement 1 but not 2, then
116the assertions that do not satisfy the exactly one parameter requirement can
117be ignored.
119However I would like to talk about why these two rules are in place in the
120first place and the problems that any exceptions to these rules must avoid.
122The problems appear when the dispatcher function which operates on the
123generic object.
125    trait combiner(otype T, otype U) {
126        void combine(T&, U);
127    }
129This one is so strange I don't have proper syntax for it but let us say that
130the concrete dispatcher would be typed as
131`void combine(combiner(T) &, combiner(U));`. Does the function that combine
132the two underlying types exist to dispatch too?
134Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then
135they could not be. It would have to enforce that all pairs of any types
136that are wrapped in this way. Which would pretty much destroy any chance of
137separate compilation.
139Even then it would be more expensive as the wrappers would have to carry ids
140that you use to look up on an <number of types>+1 dimensional table.
142The second restriction has a similar issue but makes a bit more sense to
143write out.
145    trait Series(otype T) {
146        ... size, iterators, getters ...
147        T join(T const &, T const &);
148    }
150With the dispatcher typed as:
152    Series join(Series const &, Series const &);
154Because these instances are generic and hide the underlying implementation we
155do not know what that implementation is. Unfortunately this also means the
156implementation for the two parameters might not be the same. Once we have
157two different types involved this devolves into the first case.
159We could check at run-time that the have the same underlying type, but this
160would likely time and space overhead and there is no clear recovery path.
162#### Sample Implementation
163A simple way to implement trait objects is by a pair of pointers. One to the
164underlying object and one to the vtable.
166    struct vtable_drawable {
167        void (*draw)(Surface &, void *);
168        Rect(int) (*drawArea)(void *);
169    };
171    struct drawable {
172        void * object;
173        vtable_drawable * vtable;
174    };
176The functions that run on the trait object would generally be generated using
177the following pattern:
179    void draw(Surface & surface, drawable & traitObj) {
180        return traitObj.vtable->draw(surface, traitObj.object);
181    }
183There may have to be special cases for things like copy construction, that
184might require a more significant wrapper. On the other hand moving could be
185implemented by moving the pointers without any need to refer to the base
188### Extension: Multiple Trait Parameters
189The base proposal in effect creates another use for the trait syntax that is
190related to the ones currently in the language but is also separate from them.
191The current uses generic functions and generic types, this new use could be
192described as generic objects.
194A generic object is of a concrete type and has concrete functions that work on
195it. It is generic in that it is a wrapper for an unknown type. Traits serve
196a similar role here as in generic functions as they limit what the function
197can be generic over.
199This combines the use allowing to have a generic type that is a generic
200object. All but one of the trait's parameters is given a concrete type,
201conceptually currying the trait to create a trait with on generic parameter
202that fits the original restrictions. The resulting concrete generic object
203type is different with each set of provided parameters and their values.
205Then it just becomes a question of where this is done. Again both examples use
206a basic syntax to show the idea.
208    trait iterator(virtual otype T, otype Item) {
209        bool has_next(T const &);
210        Item get_next(T const *);
211    }
213    iterator(int) int_it = begin(container_of_ints);
215The first option is to do it at the definition of the trait. One parameter
216is selected (here with the `virtual` keyword, but other rules like "the first"
217could also be used) and when an instance of the trait is created all the
218other parameters must be provided.
220    trait iterator(otype T, otype Item) {
221        bool has_next(T const &);
222        Item get_next(T &);
223    }
225    iterator(virtual, int) int_it = begin(container_of_ints);
227The second option is to skip a parameter as part of the type instance
228definition. One parameter is explicitly skipped (again with the `virtual`
229keyword) and the others have concrete types. The skipped one is the one we
230are generic on.
232Incidentally in both examples `container_of_ints` may itself be a generic
233object and `begin` returns a generic iterator with unknown implementation.
235These options are not exclusive. Defining a default on the trait allows for
236an object to be created as in the first example. However, whether the
237default is provided or not, the second syntax can be used to pick a
238parameter on instantiation.
240### Extension: Object Access
241This requires that the resolution scope (see below) is at the type level or
242has explicate points with names. These are the tables and table names used
245The system already knows where to find the virtual table and the object. If
246the tables have particular identities, or on the user side names, then it is
247meaningful to check if a binding virtual table is the same* as another. The
248main use of this is virtual table declarations also give the type they bind
249and if a binding table matches a known table then the underlyind object in the
250trait object must be of that type.
252* By identity, by value would work and in some senses be more flexiable. But
253  it would be slower and refering to further away functions would be harder.
255This gives one of the main new features of the hierarchical use of virtual
256tables (see below); the ability to recover the underlying object. Or a pointer
257of the approprate type it which both reflects the implementation and gives a
258convenent way to encode the boolean/conditional aspect of the operation which
259is that a different virtual table might be in use.
261There are two general ways to reperent this; a cast or a field access. The
262cast is traditional and would definitely fit if a single pointer repersents
263a trait object with the virtual table as part of the object. However for a
264double pointer field access might be more approprate. By this system though
265it is not the type that is used as the identifier but the virtual table. If
266there is one table per type than it becomes equivilant again. Otherwise the
267table has to be used as the identifier and the type is just a result of that
268which seems important for syntax.
273We would also like to implement hierarchical relations between types.
275    ast_node
276    |-expression_node
277    | |-operator_expression
278    |
279    |-statement_node
280    | |-goto_statement
281    |
282    |-declaration_node
283      |-using_declaration
284      |-variable_declaration
286Virtual tables by themselves are not quite enough to implement this system.
287A vtable is just a list of functions and there is no way to check at run-time
288what these functions, we carry that knowledge with the table.
290This proposal adds type ids to check for position in the hierarchy and an
291explicate syntax for establishing a hierarchical relation between traits and
292their implementing types. The ids should uniquely identify each type and
293allow retrieval of the type's parent if one exists. By recursion this allows
294the ancestor relation between any two hierarchical types can be checked.
296The hierarchy is created with traits as the internal nodes and structures
297as the leaf nodes. The structures may be used normally and the traits can
298be used to create generic objects as in the first section (the same
299restrictions apply). However these type objects store their type id which can
300be recovered to figure out which type they are or at least check to see if
301they fall into a given sub-tree at run-time.
303Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
304just an example. But when used it give the name of the parent type or if
305empty it shows that this type is the root of its hierarchy.
306(Also I'm not sure where I got these casing rules.)
308    trait ast_node(otype T) virtual() {
309        void print(T & this, ostream & out);
310        void visit(T & this, Visitor & visitor);
311        CodeLocation const & get_code_location(T & this);
312    }
314    trait expression_node(otype T) virtual(ast_node) {
315        Type eval_type(T const & this);
316    }
318    struct operator_expression virtual(expression_node) {
319        enum operator_kind kind;
320        trait expression_node rands[2];
321    }
323    trait statement_node(otype T) virtual(ast_node) {
324        vector(Label) & get_labels(T & this);
325    }
327    struct goto_statement virtual(statement_node) {
328        vector(Label) labels;
329        Label target;
330    }
332    trait declaration_node(otype T) virtual(ast_node) {
333        string name_of(T const & this);
334        Type type_of(T const & this);
335    }
337    struct using_declaration virtual(declaration_node) {
338        string new_type;
339        Type old_type;
340    }
342    struct variable_declaration virtual(declaration_node) {
343        string name;
344        Type type;
345    }
347This system does not support multiple inheritance. The system could be
348extended to support it or a limited form (ex. you may have multiple parents
349but they may not have a common ancestor). However this proposal focuses just
350on using hierachy as organization. Other uses for reusable/genaric code or
351shared interfaces is left for other features of the language.
353### Extension: Structural Inheritance
354An extension would be allow structures to be used as internal nodes on the
355inheritance tree. Its child types would have to implement the same fields.
357The weaker restriction would be to convert the fields into field assertions
358(Not implemented yet: `U T.x` means there is a field of type you on the type
359T. Offset unknown and passed in/stored with function pointers.)
360A concrete child would have to declare the same set of fields with the same
361types. This is of a more functional style.
363The stronger restriction is that the fields of the parent are a prefix of the
364child's fields. Possibly automatically inserted. This the imperative view and
365may also have less overhead.
367### Extension: Unions and Enumerations
368Currently there is no reason unions and enumerations, in the cases they
369do implement the trait, could not be in the hierarchy as leaf nodes.
371It does not work with structural induction, but that could just be a compile
372time check that all ancestors are traits or do not add field assertions.
374#### Sample Implementation
375The type id may be as little as:
377    struct typeid {
378        struct typeid const * const parent;
379    };
381Some linker magic would have to be used to ensure exactly one copy of each
382structure for each type exists in memory. There seem to be special once
383sections that support this and it should be easier than generating unique
384ids across compilation units.
386The structure could be extended to contain any additional type information.
388There are two general designs for vtables with type ids. The first is to put
389the type id at the top of the vtable, this is the most compact and efficient
390solution but only works if we have exactly 1 vtable for each type. The second
391is to put a pointer to the type id in each vtable. This has more overhead but
392allows multiple vtables per type.
394    struct <trait>_vtable {
395        struct typeid const id;
397        // Trait dependent list of vtable members.
398    };
400    struct <trait>_vtable {
401        struct typeid const * const id;
403        // Trait dependent list of vtable members.
404    };
406One important restriction is that only one instance of each typeid in memory.
407There is a ".gnu.linkonce" feature in the linker that might solve the issue.
409### Virtual Casts
410The generic objects may be cast up and down the hierarchy.
412Casting to an ancestor type always succeeds. From one generic type to another
413is just a reinterpretation and could be implicate. Wrapping and unwrapping
414a concrete type will probably use the same syntax as in the first section.
416Casting from an ancestor to a descendent requires a check. The underlying
417type may or may not belong to the sub-tree headed by that descendent. For this
418we introduce a new cast operator, which returns the pointer unchanged if the
419check succeeds and null otherwise.
421    trait SubType * new_value = (virtual trait SubType *)super_type;
423For the following example I am using the as of yet finished exception system.
425    trait exception(otype T) virtual() {
426        char const * what(T & this);
427    }
429    trait io_error(otype T) virtual(exception) {
430        FILE * which_file(T & this);
431    }
433    struct eof_error(otype T) virtual(io_error) {
434        FILE * file;
435    }
437    char const * what(eof_error &) {
438        return "Tried to read from an empty file.";
439    }
441    FILE * which_file(eof_error & this) {
442        return eof_error.file;
443    }
445    bool handleIoError(exception * exc) {
446        io_error * error = (virtual io_error *)exc;
447        if (NULL == error) {
448            return false;
449        }
450        ...
451        return true;
452    }
454### Extension: Implicate Virtual Cast Target
455This is a small extension, even in the example above `io_error *` is repeated
456in the cast and the variable being assigned to. Using return type inference
457would allow the second type to be skipped in cases it is clear what type is
458being checked against.
460The line then becomes:
462    io_error * error = (virtual)exc;
464#### Sample Implementation
465This cast implementation assumes a type id layout similar to the one given
466above. Also this code is definitely in the underlying C. Functions that give
467this functionality could exist in the standard library but these are meant to
468be produced by code translation of the virtual cast.
470    bool is_in_subtree(typeid const * root, typeid const * id) {
471        if (root == id) {
472            return true
473        } else if (NULL == id->parent) {
474            return false;
475        } else {
476            return is_in_subtree(root, id->parent);
477        }
478    }
480    void * virtual_cast(typeid const * target, void * value) {
481        return is_in_subtree(target, *(typeid const **)value) ? value : NULL;
482    }
484The virtual cast function might have to be wrapped with some casts to make it
485compile without warning.
487For the implicate target type we may be able to lean on the type resolution
488system that already exists. If the casting to ancestor type is built into
489the resolution then the impicate target could be decided by picking an
490overload, generated for each hierarchial type (here io_error and its root
491type exception).
493    io_error * virtual_cast(exception * value) {
494        return virtual_cast(io_error_typeid, value);
495    }
497### Extension: Inline vtables
498Since the structures here are usually made to be turned into trait objects
499it might be worth it to have fields in them to store the virtual table
500pointer. This would have to be declared on the trait as an assertion (example:
501`vtable;` or `T.vtable;`), but if it is the trait object could be a single
504There are also three options for where the pointer to the vtable. It could be
505anywhere, a fixed location for each trait or always at the front. For the per-
506trait solution an extension to specify what it is (example `vtable[0];`) which
507could also be used to combine it with others. So these options can be combined
508to allow access to all three options.
510The pointer to virtual table field on structures might implicately added (the
511types have to declare they are a child here) or created with a declaration,
512possibly like the one used to create the assertion.
514### Extension: Associated Types Use
515If the `` proposal is accepted the following trait could
516be added:
518    trait is_virtual(dtype T) {
519        dtype table;
520        // An example assertion:
521        const table & get_virtual_table(T &);
522    }
524There may be more assertions but there has to be at least one way to find
525the (possibly default) virtual table. It is required to construct instances
526of the type.
528Without the assotiated type it would look like this:
530    trait is_virtual(dtype T, dtype table) {
531        const table & get_virtual_table(T &);
532    }
534Which is just a little bit longer to use but becomes more problematic if the
535user has to explicately provide the table's name as it doesn't really have its
536own type name. If it does it is probably mangled.
538### Virtual Tables as Types
539Here we consider encoding plus the implementation of functions on it to be a
540type. Which is to say in the type hierarchy structures aren't concrete types
541anymore, instead they are parent types to vtables, which combine the encoding
542and implementation.
544### Question: Wrapping Structures
545One issue is what to do with concrete types at the base of the type tree.
546When we are working with the concrete type generally it would like them to be
547regular structures with direct calls. On the other hand for interactions with
548other types in the hierarchy it is more convenent for the type already to be
551Which of these two should we use? Should we support both and if so how do we
552choose which one is being used at any given time.
554On a related note I have been using pointers two trait types here, as that
555is how many existing languages handle it. However the generic objects might
556be only one or two pointers wide passing the objects as a whole would not
557be very expensive and all operations on the generic objects probably have
558to be defined anyways.
560Resolution Scope
563What is the scope of a resolution? When are the functions in a vtable decided
564and how broadly is this applied?
566### Type Level:
567Each structure has a single resolution for all of the functions in the
568virtual trait. This is how many languages that implement this or similar
569features do it.
571The main thing CFA would need to do it this way is some single point where
572the type declaration, including the functions that satisfy the trait, are
573all defined. Currently there are many points where this can happen, not all
574of them have the same definitions and no way to select one over the other.
576Some syntax would have to be added to specify the resolution point. To ensure
577a single instance there may have to be two variants, one forward declaration
578and one to create the instance. With some compiler magic the forward
579declaration maybe enough.
581    extern trait combiner(struct summation) vtable;
582    trait combiner(struct summation) vtable;
584Or (with the same variants):
586    vtable combiner(struct summation);
588The extern variant promises that the vtable will exist while the normal one
589is where the resolution actually happens.
591### Explicit Resolution Points:
592Slightly looser than the above, there are explicit points where the vtables
593are resolved, but there is no limit on the number of resolution points that
594might be provided. Each time a object is bound to a trait, one of the
595resolutions is selected. This might be the most flexible option.
597An syntax would have to be provided as above. There may also be the option
598to name resolution points so that you can choose between them. This also
599could come with the ability to forward declare them.
601Especially if they are not named, these resolution points should be able to
602appear in functions, where the scoping rules can be used to select one.
603However this also means that stack-allocated functions can end up in the
606    extern trait combiner(struct summation) vtable sum;
607    trait combiner(struct summation) vtable sum;
609    extern trait combiner(struct summation) vtable sum default;
610    trait combiner(struct summation) vtable sum default;
612The extern difference is the same before. The name (sum in the samples) is
613used at the binding site to say which one is picked. The default keyword can
614be used in only some of the declarations.
616    trait combiner fee = {summation_instance, sum};
617    trait combiner foe = summation_instance;
619(I am not really happy about this syntax, but it kind of works.)
620The object being bound is required. The name of the vtable is optional if
621there is exactly one vtable name marked with default.
623These could also be placed inside functions. In which case both the name and
624the default keyword might be optional. If the name is omitted in an assignment
625the closest vtable is chosen (returning to the global default rule if no
626appropriate local vtable is in scope).
628### Site Based Resolution:
629Every place in code where the binding of a vtable to an object occurs has
630its own resolution. Syntax-wise this is the simplest as it should be able
631to use just the existing declarations and the conversion to trait object.
632It also is very close to the current polymorphic resolution rules.
634This works as the explicit resolution points except the resolution points
635are implicit and their would be no selection of which resolution to use. The
636closest (current) resolution is always selected.
638This could easily lead to an explosion of vtables as it has the most fine
639grained resolution the number of bindings in a single scope (that produces
640the same binding) could be quite high. Merging identical vtables might help
641reduce that.
643Vtable Lifetime Issues
646Vtables interact badly with the thunk issue. Conceptually vtables are static
647like type/function data they carry, as those decisions are made by the
648resolver at compile time.
650Stack allocated functions interact badly with this because they are not
651static. There are several ways to try to resolve this, however without a
652general solution most can keep vtables from making the existing thunk problem
653worse, they don't do anything to solve it.
655Filling in some fields of a static vtable could cause issues on a recursive
656call. And then we are still limited by the lifetime of the stack functions, as
657the vtable with stale pointers is still a problem.
659Dynamically allocated vtables introduces memory management overhead and
660requires some way to differentiate between dynamic and statically allocated
661tables. The stale function pointer problem continues unless those becomes
662dynamically allocated as well which gives us the same costs again.
664Stack allocating the vtable seems like the best issue. The vtable's lifetime
665is now the limiting factor but it should be effectively the same as the
666shortest lifetime of a function assigned to it. However this still limits the
667lifetime "implicitly" and returns to the original problem with thunks.
669Odds And Ends
672In addition to the main design there are a few extras that should be
673considered. They are not part of the core design but make the new uses fully
676### Extension: Parent-Child Assertion
677For hierarchy types in regular traits, generic functions or generic structures
678we may want to be able to check parent-child relationships between two types
679given. For this we might have to add another primitive assertion. It would
680have the following form if declared in code:
682    trait is_parent_child(dtype Parent, dtype Child) { <built-in magic> }
684This assertion is satified if Parent is an ancestor of Child in a hierarchy.
685In other words Child can be statically cast to Parent. The cast from Parent
686to child would be dynamically checked as usual.
688However in this form there are two concerns. The first that Parent will
689usually be consistent for a given use, it will not be a variable. Second is
690that we may also need the assertion functions. To do any casting/conversions
692TODO: Talk about when we wrap a concrete type and how that leads to "may".
694To this end it may be better that the parent trait combines the usual
695assertions plus this new primitive assertion. There may or may not be use
696cases for accessing just one half and providing easy access to them may be
697required depending on how that turns out.
699    trait Parent(dtype T | interface(T)) virtual(<grand-parent?>) { }
701### Extension: sizeof Compatablity
702Trait types are always sized, it may even be a fixed size like how pointers
703have the same size regardless of what they point at. However their contents
704may or may not be of a known size (if the `sized(...)` assertion is used).
706Currently there is no way to access this information. If it is needed a
707special syntax would have to be added. Here a special case of `sizeof` is
710    struct line aLine;
711    trait drawable widget = aLine;
713    size_t x = sizeof(widget);
714    size_t y = sizeof(trait drawable);
716As usual `y`, size of the type, is the size of the local storage used to put
717the value into. The other case `x` checks the saved stored value in the
718virtual table and returns that.
Note: See TracBrowser for help on using the repository browser.