Proposal For Use of Virtual Tables ================================== This is an adaptation of the earlier virtual proposal, updating it with new ideas, re-framing it and laying out more design decisions. It should eventually replace the earlier proposal, but not all features and syntax have been converted to the new design. The basic concept of a virtual table (vtable) is the same here as in most other languages that use them. They will mostly contain function pointers although they should be able to store anything that goes into a trait. I also include notes on a sample implementation, which primarily exists to show there is a reasonable implementation. The code samples for that are in a slight pseudo-code to help avoid name mangling and keeps some CFA features while they would actually be written in C. Trait Instances --------------- Currently traits are completely abstract. Data types might implement a trait but traits are not themselves data types. Which is to say you cannot have an instance of a trait. This proposal will change that and allow instances of traits to be created from instances of data types that implement the trait. For example: trait combiner(otype T) { void combine(T&, int); }; struct summation { int sum; }; void ?{}( struct summation & this ) { this.sum = 0; } void combine( struct summation & this, int num ) { this.sum = this.sum + num; } trait combiner obj = struct summation{}; combine(obj, 5); As with `struct` (and `union` and `enum`), `trait` might be optional when using the trait as a type name. A trait may be used in assertion list as before. For traits to be used this way they should meet two requirements. First they should only have a single polymorphic type and each assertion should use that type once as a parameter. Extensions may later loosen these requirements. Also note this applies to the final expanded list of assertions. Consider: trait foo(otype T, otype U) { ... functions that use T once ... } trait bar(otype S | foo(S, char)) { ... functions that use S once ... } In this example `bar` may be used as a type but `foo` may not. When a trait is used as a type it creates a generic object which combines the base structure (an instance of `summation` in this case) and the vtable, which is currently created and provided by a hidden mechanism. The generic object type for each trait also implements that trait. This is actually the only means by which it can be used. The type of these functions look something like this: void combine(trait combiner & this, int num); The main use case for trait objects is that they can be stored. They can be passed into functions, but using the trait directly is preferred in this case. trait drawable(otype T) { void draw(Surface & to, T & draw); Rect(int) drawArea(T & draw); }; struct UpdatingSurface { Surface * surface; vector(trait drawable) drawables; }; void updateSurface(UpdatingSurface & us) { for (size_t i = 0 ; i < us.drawables.size ; ++i) { draw(us.surface, us.drawables[i]); } } The trait types can also be used in the types of assertions on traits as well. In this usage they passed as the underlying object and vtable pair as they are stored. The trait types can also be used in that trait's definition, which means you can pass two instances of a trait to a single function. However the look-up of the one that is not used to look up any functions, until another function that uses that object in the generic/look-up location is called. trait example(otype T) { bool test(T & this, trait example & that); } ### Explanation Of Restrictions The two restrictions on traits that can be used as trait objects are: 1. Only one generic parameter may be defined in the trait's header. 2. Each function assertion must have one parameter with the type of the generic parameter. They may or may not return a value of that type. Elsewhere in this proposal I suggest ways to broaden these requirements. A simple example would be if a trait meets requirement 1 but not 2, then the assertions that do not satisfy the exactly one parameter requirement can be ignored. However I would like to talk about why these two rules are in place in the first place and the problems that any exceptions to these rules must avoid. The problems appear when the dispatcher function which operates on the generic object. trait combiner(otype T, otype U) { void combine(T&, U); } This one is so strange I don't have proper syntax for it but let us say that the concrete dispatcher would be typed as `void combine(combiner(T) &, combiner(U));`. Does the function that combine the two underlying types exist to dispatch too? Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then they could not be. It would have to enforce that all pairs of any types that are wrapped in this way. Which would pretty much destroy any chance of separate compilation. Even then it would be more expensive as the wrappers would have to carry ids that you use to look up on an +1 dimensional table. The second restriction has a similar issue but makes a bit more sense to write out. trait Series(otype T) { ... size, iterators, getters ... T join(T const &, T const &); } With the dispatcher typed as: Series join(Series const &, Series const &); Because these instances are generic and hide the underlying implementation we do not know what that implementation is. Unfortunately this also means the implementation for the two parameters might not be the same. Once we have two different types involved this devolves into the first case. We could check at run-time that the have the same underlying type, but this would likely time and space overhead and there is no clear recovery path. #### Sample Implementation A simple way to implement trait objects is by a pair of pointers. One to the underlying object and one to the vtable. struct vtable_drawable { void (*draw)(Surface &, void *); Rect(int) (*drawArea)(void *); }; struct drawable { void * object; vtable_drawable * vtable; }; The functions that run on the trait object would generally be generated using the following pattern: void draw(Surface & surface, drawable & traitObj) { return traitObj.vtable->draw(surface, traitObj.object); } There may have to be special cases for things like copy construction, that might require a more significant wrapper. On the other hand moving could be implemented by moving the pointers without any need to refer to the base object. ### Extension: Multiple Trait Parameters The base proposal in effect creates another use for the trait syntax that is related to the ones currently in the language but is also separate from them. The current uses generic functions and generic types, this new use could be described as generic objects. A generic object is of a concrete type and has concrete functions that work on it. It is generic in that it is a wrapper for an unknown type. Traits serve a similar role here as in generic functions as they limit what the function can be generic over. This combines the use allowing to have a generic type that is a generic object. All but one of the trait's parameters is given a concrete type, conceptually currying the trait to create a trait with on generic parameter that fits the original restrictions. The resulting concrete generic object type is different with each set of provided parameters and their values. Then it just becomes a question of where this is done. Again both examples use a basic syntax to show the idea. trait iterator(virtual otype T, otype Item) { bool has_next(T const &); Item get_next(T const *); } iterator(int) int_it = begin(container_of_ints); The first option is to do it at the definition of the trait. One parameter is selected (here with the `virtual` keyword, but other rules like "the first" could also be used) and when an instance of the trait is created all the other parameters must be provided. trait iterator(otype T, otype Item) { bool has_next(T const &); Item get_next(T const *); } iterator(virtual, int) int_it = begin(container_of_ints); The second option is to skip a parameter as part of the type instance definition. One parameter is explicitly skipped (again with the `virtual` keyword) and the others have concrete types. The skipped one is the one we are generic on. Incidentally in both examples `container_of_ints` may itself be a generic object and `begin` returns a generic iterator with unknown implementation. These options are not exclusive. Defining a default on the trait allows for an object to be created as in the first example. However, whether the default is provided or not, the second syntax can be used to pick a parameter on instantiation. Hierarchy --------- We would also like to implement hierarchical relations between types. AstNode |-ParseNode | |-Declaration | | |-DeclarationWithType | | |-StructureDeclaration | |-Statement | | |-CompoundStatement | |-Expression |-Type Virtual tables by themselves are not quite enough to implement this system. A vtable is just a list of functions and there is no way to check at run-time what these functions, we carry that knowledge with the table. This proposal adds type ids to check for position in the hierarchy and an explicate syntax for establishing a hierarchical relation between traits and their implementing types. The ids should uniquely identify each type and allow retrieval of the type's parent if one exists. By recursion this allows the ancestor relation between any two hierarchical types can be checked. The hierarchy is created with traits as the internal nodes and structures as the leaf nodes. The structures may be used normally and the traits can be used to create generic objects as in the first section (the same restrictions apply). However these type objects store their type id which can be recovered to figure out which type they are or at least check to see if they fall into a given sub-tree at run-time. Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is just an example. But when used it give the name of the parent type or if empty it shows that this type is the root of its hierarchy. (Also I'm not sure where I got these casing rules.) trait ast_node(otype T) virtual() { void print(T & this, ostream & out); void visit(T & this, Visitor & visitor); CodeLocation const & get_code_location(T & this); } trait expression_node(otype T) virtual(ast_node) { Type eval_type(T const & this); } struct operator_expression virtual(expression_node) { enum operator_kind kind; trait expression_node rands[2]; } trait statement_node(otype T) virtual(ast_node) { vector(Label) & get_labels(T & this); } struct goto_statement virtual(statement_node) { vector(Label) labels; Label target; } trait declaration_node(otype T) virtual(ast_node) { string name_of(T const & this); Type type_of(T const & this); } struct using_declaration virtual(declaration_node) { string new_type; Type old_type; } struct variable_declaration virtual(declaration_node) { string name; Type type; } ### Extension: Structural Inheritance An extension would be allow structures to be used as internal nodes on the inheritance tree. Its child types would have to implement the same fields. The weaker restriction would be to convert the fields into field assertions (Not implemented yet: `U T.x` means there is a field of type you on the type T. Offset unknown and passed in/stored with function pointers.) A concrete child would have to declare the same set of fields with the same types. This is of a more functional style. The stronger restriction is that the fields of the parent are a prefix of the child's fields. Possibly automatically inserted. This the imperative view and may also have less overhead. ### Extension: Unions and Enumerations Currently there is no reason unions and enumerations, in the cases they do implement the trait, could not be in the hierarchy as leaf nodes. It does not work with structural induction, but that could just be a compile time check that all ancestors are traits or do not add field assertions. #### Sample Implementation The type id may be as little as: struct typeid { struct typeid const * const parent; }; Some linker magic would have to be used to ensure exactly one copy of each structure for each type exists in memory. There seem to be special once sections that support this and it should be easier than generating unique ids across compilation units. The structure could be extended to contain any additional type information. There are two general designs for vtables with type ids. The first is to put the type id at the top of the vtable, this is the most compact and efficient solution but only works if we have exactly 1 vtable for each type. The second is to put a pointer to the type id in each vtable. This has more overhead but allows multiple vtables. struct _vtable { struct typeid const id; // Trait dependent list of vtable members. }; struct _vtable { struct typeid const * const id; // Trait dependent list of vtable members. }; ### Virtual Casts The generic objects may be cast up and down the hierarchy. Casting to an ancestor type always succeeds. From one generic type to another is just a reinterpretation and could be implicate. Wrapping and unwrapping a concrete type will probably use the same syntax as in the first section. Casting from an ancestor to a descendent requires a check. The underlying type may or may not belong to the sub-tree headed by that descendent. For this we introduce a new cast operator, which returns the pointer unchanged if the check succeeds and null otherwise. trait SubType * new_value = (virtual trait SubType *)super_type; For the following example I am using the as of yet finished exception system. trait exception(otype T) virtual() { char const * what(T & this); } trait io_error(otype T) virtual(exception) { FILE * which_file(T & this); } struct eof_error(otype T) virtual(io_error) { FILE * file; } char const * what(eof_error &) { return "Tried to read from an empty file."; } FILE * which_file(eof_error & this) { return eof_error.file; } bool handleIoError(exception * exc) { io_error * error = (virtual io_error *)exc; if (NULL == error) { return false; } ... return true; } ### Extension: Implicate Virtual Cast Target This is a small extension, even in the example above `io_error *` is repeated in the cast and the variable being assigned to. Using return type inference would allow the second type to be skipped in cases it is clear what type is being checked against. The line then becomes: io_error * error = (virtual)exc; ### Extension: Inline vtables Since the structures here are usually made to be turned into trait objects it might be worth it to have fields in them to store the virtual table pointer. This would have to be declared on the trait as an assertion (example: `vtable;` or `T.vtable;`), but if it is the trait object could be a single pointer. There are also three options for where the pointer to the vtable. It could be anywhere, a fixed location for each trait or always at the front. For the per- trait solution an extension to specify what it is (example `vtable[0];`) which could also be used to combine it with others. So these options can be combined to allow access to all three options. ### Virtual Tables as Types Here we consider encoding plus the implementation of functions on it to be a type. Which is to say in the type hierarchy structures aren't concrete types anymore, instead they are parent types to vtables, which combine the encoding and implementation. Resolution Scope ---------------- What is the scope of a resolution? When are the functions in a vtable decided and how broadly is this applied? ### Type Level: Each structure has a single resolution for all of the functions in the virtual trait. This is how many languages that implement this or similar features do it. The main thing CFA would need to do it this way is some single point where the type declaration, including the functions that satisfy the trait, are all defined. Currently there are many points where this can happen, not all of them have the same definitions and no way to select one over the other. Some syntax would have to be added to specify the resolution point. To ensure a single instance there may have to be two variants, one forward declaration and one to create the instance. With some compiler magic the forward declaration maybe enough. extern trait combiner(struct summation) vtable; trait combiner(struct summation) vtable; Or (with the same variants): vtable combiner(struct summation); The extern variant promises that the vtable will exist while the normal one is where the resolution actually happens. ### Explicit Resolution Points: Slightly looser than the above, there are explicit points where the vtables are resolved, but there is no limit on the number of resolution points that might be provided. Each time a object is bound to a trait, one of the resolutions is selected. This might be the most flexible option. An syntax would have to be provided as above. There may also be the option to name resolution points so that you can choose between them. This also could come with the ability to forward declare them. Especially if they are not named, these resolution points should be able to appear in functions, where the scoping rules can be used to select one. However this also means that stack-allocated functions can end up in the vtable. extern trait combiner(struct summation) vtable sum; trait combiner(struct summation) vtable sum; extern trait combiner(struct summation) vtable sum default; trait combiner(struct summation) vtable sum default; The extern difference is the same before. The name (sum in the samples) is used at the binding site to say which one is picked. The default keyword can be used in only some of the declarations. trait combiner fee = (summation_instance, sum); trait combiner foe = summation_instance; (I am not really happy about this syntax, but it kind of works.) The object being bound is required. The name of the vtable is optional if there is exactly one vtable name marked with default. These could also be placed inside functions. In which case both the name and the default keyword might be optional. If the name is omitted in an assignment the closest vtable is chosen (returning to the global default rule if no appropriate local vtable is in scope). ### Site Based Resolution: Every place in code where the binding of a vtable to an object occurs has its own resolution. Syntax-wise this is the simplest as it should be able to use just the existing declarations and the conversion to trait object. It also is very close to the current polymorphic resolution rules. This works as the explicit resolution points except the resolution points are implicit and their would be no selection of which resolution to use. The closest (current) resolution is always selected. This could easily lead to an explosion of vtables as it has the most fine grained resolution the number of bindings in a single scope (that produces the same binding) could be quite high. Merging identical vtables might help reduce that. Vtable Lifetime Issues ---------------------- Vtables interact badly with the thunk issue. Conceptually vtables are static like type/function data they carry, as those decisions are made by the resolver at compile time. Stack allocated functions interact badly with this because they are not static. There are several ways to try to resolve this, however without a general solution most can only buy time. Filling in some fields of a static vtable could cause issues on a recursive call. And then we are still limited by the lifetime of the stack functions, as the vtable with stale pointers is still a problem. Dynamically allocated vtables introduces memory management overhead and requires some way to differentiate between dynamic and statically allocated tables. The stale function pointer problem continues unless those becomes dynamically allocated as well which gives us the same costs again. Stack allocating the vtable seems like the best issue. The vtable's lifetime is now the limiting factor but it should be effectively the same as the shortest lifetime of a function assigned to it. However this still limits the lifetime "implicitly" and returns to the original problem with thunks.