Proposal For Use of Virtual Tables ================================== The basic concept of a virtual table (vtable) is the same here as in most other languages that use them. They will mostly contain function pointers although they should be able to store anything that goes into a trait. I also include notes on a sample implementation, which primarily exists to show there is a reasonable implementation. The code samples for that are in a slight pseudo-code to help avoid name mangling and keeps some CFA features while they would actually be written in C. Trait Instances --------------- Currently traits are completely abstract. Data types might implement a trait but traits are not themselves data types. Which is to say you cannot have an instance of a trait. This proposal will change that and allow instances of traits to be created from instances of data types that implement the trait. For example: trait combiner(otype T) { void combine(T&, int); }; struct summation { int sum; }; void ?{}( struct summation & this ) { this.sum = 0; } void combine( struct summation & this, int num ) { this.sum = this.sum + num; } trait combiner obj = struct summation{}; combine(obj, 5); As with `struct` (and `union` and `enum`), `trait` might be optional when using the trait as a type name. A trait may be used in assertion list as before. For traits to be used this way they should meet two requirements. First they should only have a single polymorphic type and each assertion should use that type once as a parameter. Extensions may later loosen these requirements. Also note this applies to the final expanded list of assertions. Consider: trait foo(otype T, otype U) { ... functions that use T once ... } trait bar(otype S | foo(S, char)) { ... functions that use S once ... } In this example `bar` may be used as a type but `foo` may not. When a trait is used as a type it creates a generic object which combines the base structure (an instance of `summation` in this case) and the vtable, which is currently created and provided by a hidden mechanism. The generic object type for each trait also implements that trait. This is actually the only means by which it can be used. The type of these functions look something like this: void combine(trait combiner & this, int num); The main use case for trait objects is that they can be stored. They can be passed into functions, but using the trait directly is preferred in this case. trait drawable(otype T) { void draw(Surface & to, T & draw); Rect(int) drawArea(T & draw); }; struct UpdatingSurface { Surface * surface; vector(trait drawable) drawables; }; void updateSurface(UpdatingSurface & us) { for (size_t i = 0 ; i < us.drawables.size ; ++i) { draw(us.surface, us.drawables[i]); } } With a more complete widget trait you could, for example, construct a UI tool kit that can declare containers that hold widgets without knowing about the widget types. Making it reasonable to extend the tool kit. The trait types can also be used in the types of assertions on traits as well. In this usage they passed as the underlying object and vtable pair as they are stored. The trait types can also be used in that trait's definition, which means you can pass two instances of a trait to a single function. However the look-up of the one that is not used to look up any functions, until another function that uses that object in the generic/look-up location is called. trait example(otype T) { bool test(T & this, trait example & that); } ### Explanation Of Restrictions The two restrictions on traits that can be used as trait objects are: 1. Only one generic parameter may be defined in the trait's header. 2. Each function assertion must have one parameter with the type of the generic parameter. They may or may not return a value of that type. Elsewhere in this proposal I suggest ways to broaden these requirements. A simple example would be if a trait meets requirement 1 but not 2, then the assertions that do not satisfy the exactly one parameter requirement can be ignored. However I would like to talk about why these two rules are in place in the first place and the problems that any exceptions to these rules must avoid. The problems appear when the dispatcher function which operates on the generic object. trait combiner(otype T, otype U) { void combine(T&, U); } This one is so strange I don't have proper syntax for it but let us say that the concrete dispatcher would be typed as `void combine(combiner(T) &, combiner(U));`. Does the function that combine the two underlying types exist to dispatch too? Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then they could not be. It would have to enforce that all pairs of any types that are wrapped in this way. Which would pretty much destroy any chance of separate compilation. Even then it would be more expensive as the wrappers would have to carry ids that you use to look up on an +1 dimensional table. The second restriction has a similar issue but makes a bit more sense to write out. trait Series(otype T) { ... size, iterators, getters ... T join(T const &, T const &); } With the dispatcher typed as: Series join(Series const &, Series const &); Because these instances are generic and hide the underlying implementation we do not know what that implementation is. Unfortunately this also means the implementation for the two parameters might not be the same. Once we have two different types involved this devolves into the first case. We could check at run-time that the have the same underlying type, but this would likely time and space overhead and there is no clear recovery path. #### Sample Implementation A simple way to implement trait objects is by a pair of pointers. One to the underlying object and one to the vtable. struct vtable_drawable { void (*draw)(Surface &, void *); Rect(int) (*drawArea)(void *); }; struct drawable { void * object; vtable_drawable * vtable; }; The functions that run on the trait object would generally be generated using the following pattern: void draw(Surface & surface, drawable & traitObj) { return traitObj.vtable->draw(surface, traitObj.object); } There may have to be special cases for things like copy construction, that might require a more significant wrapper. On the other hand moving could be implemented by moving the pointers without any need to refer to the base object. ### Extension: Multiple Trait Parameters The base proposal in effect creates another use for the trait syntax that is related to the ones currently in the language but is also separate from them. The current uses generic functions and generic types, this new use could be described as generic objects. A generic object is of a concrete type and has concrete functions that work on it. It is generic in that it is a wrapper for an unknown type. Traits serve a similar role here as in generic functions as they limit what the function can be generic over. This combines the use allowing to have a generic type that is a generic object. All but one of the trait's parameters is given a concrete type, conceptually currying the trait to create a trait with on generic parameter that fits the original restrictions. The resulting concrete generic object type is different with each set of provided parameters and their values. Then it just becomes a question of where this is done. Again both examples use a basic syntax to show the idea. trait iterator(virtual otype T, otype Item) { bool has_next(T const &); Item get_next(T const *); } iterator(int) int_it = begin(container_of_ints); The first option is to do it at the definition of the trait. One parameter is selected (here with the `virtual` keyword, but other rules like "the first" could also be used) and when an instance of the trait is created all the other parameters must be provided. trait iterator(otype T, otype Item) { bool has_next(T const &); Item get_next(T const *); } iterator(virtual, int) int_it = begin(container_of_ints); The second option is to skip a parameter as part of the type instance definition. One parameter is explicitly skipped (again with the `virtual` keyword) and the others have concrete types. The skipped one is the one we are generic on. Incidentally in both examples `container_of_ints` may itself be a generic object and `begin` returns a generic iterator with unknown implementation. These options are not exclusive. Defining a default on the trait allows for an object to be created as in the first example. However, whether the default is provided or not, the second syntax can be used to pick a parameter on instantiation. Hierarchy --------- We would also like to implement hierarchical relations between types. ast_node |-expression_node | |-operator_expression | |-statement_node | |-goto_statement | |-declaration_node |-using_declaration |-variable_declaration Virtual tables by themselves are not quite enough to implement this system. A vtable is just a list of functions and there is no way to check at run-time what these functions, we carry that knowledge with the table. This proposal adds type ids to check for position in the hierarchy and an explicate syntax for establishing a hierarchical relation between traits and their implementing types. The ids should uniquely identify each type and allow retrieval of the type's parent if one exists. By recursion this allows the ancestor relation between any two hierarchical types can be checked. The hierarchy is created with traits as the internal nodes and structures as the leaf nodes. The structures may be used normally and the traits can be used to create generic objects as in the first section (the same restrictions apply). However these type objects store their type id which can be recovered to figure out which type they are or at least check to see if they fall into a given sub-tree at run-time. Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is just an example. But when used it give the name of the parent type or if empty it shows that this type is the root of its hierarchy. (Also I'm not sure where I got these casing rules.) trait ast_node(otype T) virtual() { void print(T & this, ostream & out); void visit(T & this, Visitor & visitor); CodeLocation const & get_code_location(T & this); } trait expression_node(otype T) virtual(ast_node) { Type eval_type(T const & this); } struct operator_expression virtual(expression_node) { enum operator_kind kind; trait expression_node rands[2]; } trait statement_node(otype T) virtual(ast_node) { vector(Label) & get_labels(T & this); } struct goto_statement virtual(statement_node) { vector(Label) labels; Label target; } trait declaration_node(otype T) virtual(ast_node) { string name_of(T const & this); Type type_of(T const & this); } struct using_declaration virtual(declaration_node) { string new_type; Type old_type; } struct variable_declaration virtual(declaration_node) { string name; Type type; } This system does not support multiple inheritance. The system could be extended to support it or a limited form (ex. you may have multiple parents but they may not have a common ancestor). However this proposal focuses just on using hierachy as organization. Other uses for reusable/genaric code or shared interfaces is left for other features of the language. ### Extension: Structural Inheritance An extension would be allow structures to be used as internal nodes on the inheritance tree. Its child types would have to implement the same fields. The weaker restriction would be to convert the fields into field assertions (Not implemented yet: `U T.x` means there is a field of type you on the type T. Offset unknown and passed in/stored with function pointers.) A concrete child would have to declare the same set of fields with the same types. This is of a more functional style. The stronger restriction is that the fields of the parent are a prefix of the child's fields. Possibly automatically inserted. This the imperative view and may also have less overhead. ### Extension: Unions and Enumerations Currently there is no reason unions and enumerations, in the cases they do implement the trait, could not be in the hierarchy as leaf nodes. It does not work with structural induction, but that could just be a compile time check that all ancestors are traits or do not add field assertions. #### Sample Implementation The type id may be as little as: struct typeid { struct typeid const * const parent; }; Some linker magic would have to be used to ensure exactly one copy of each structure for each type exists in memory. There seem to be special once sections that support this and it should be easier than generating unique ids across compilation units. The structure could be extended to contain any additional type information. There are two general designs for vtables with type ids. The first is to put the type id at the top of the vtable, this is the most compact and efficient solution but only works if we have exactly 1 vtable for each type. The second is to put a pointer to the type id in each vtable. This has more overhead but allows multiple vtables per type. struct _vtable { struct typeid const id; // Trait dependent list of vtable members. }; struct _vtable { struct typeid const * const id; // Trait dependent list of vtable members. }; One important restriction is that only one instance of each typeid in memory. There is a ".gnu.linkonce" feature in the linker that might solve the issue. ### Virtual Casts The generic objects may be cast up and down the hierarchy. Casting to an ancestor type always succeeds. From one generic type to another is just a reinterpretation and could be implicate. Wrapping and unwrapping a concrete type will probably use the same syntax as in the first section. Casting from an ancestor to a descendent requires a check. The underlying type may or may not belong to the sub-tree headed by that descendent. For this we introduce a new cast operator, which returns the pointer unchanged if the check succeeds and null otherwise. trait SubType * new_value = (virtual trait SubType *)super_type; For the following example I am using the as of yet finished exception system. trait exception(otype T) virtual() { char const * what(T & this); } trait io_error(otype T) virtual(exception) { FILE * which_file(T & this); } struct eof_error(otype T) virtual(io_error) { FILE * file; } char const * what(eof_error &) { return "Tried to read from an empty file."; } FILE * which_file(eof_error & this) { return eof_error.file; } bool handleIoError(exception * exc) { io_error * error = (virtual io_error *)exc; if (NULL == error) { return false; } ... return true; } ### Extension: Implicate Virtual Cast Target This is a small extension, even in the example above `io_error *` is repeated in the cast and the variable being assigned to. Using return type inference would allow the second type to be skipped in cases it is clear what type is being checked against. The line then becomes: io_error * error = (virtual)exc; #### Sample Implementation This cast implementation assumes a type id layout similar to the one given above. Also this code is definitely in the underlying C. Functions that give this functionality could exist in the standard library but these are meant to be produced by code translation of the virtual cast. bool is_in_subtree(typeid const * root, typeid const * id) { if (root == id) { return true } else if (NULL == id->parent) { return false; } else { return is_in_subtree(root, id->parent); } } void * virtual_cast(typeid const * target, void * value) { return is_in_subtree(target, *(typeid const **)value) ? value : NULL; } The virtual cast function might have to be wrapped with some casts to make it compile without warning. For the implicate target type we may be able to lean on the type resolution system that already exists. If the casting to ancestor type is built into the resolution then the impicate target could be decided by picking an overload, generated for each hierarchial type (here io_error and its root type exception). io_error * virtual_cast(exception * value) { return virtual_cast(io_error_typeid, value); } ### Extension: Inline vtables Since the structures here are usually made to be turned into trait objects it might be worth it to have fields in them to store the virtual table pointer. This would have to be declared on the trait as an assertion (example: `vtable;` or `T.vtable;`), but if it is the trait object could be a single pointer. There are also three options for where the pointer to the vtable. It could be anywhere, a fixed location for each trait or always at the front. For the per- trait solution an extension to specify what it is (example `vtable[0];`) which could also be used to combine it with others. So these options can be combined to allow access to all three options. The pointer to virtual table field on structures might implicately added (the types have to declare they are a child here) or created with a declaration, possibly like the one used to create the assertion. ### Virtual Tables as Types Here we consider encoding plus the implementation of functions on it to be a type. Which is to say in the type hierarchy structures aren't concrete types anymore, instead they are parent types to vtables, which combine the encoding and implementation. ### Question: Wrapping Structures One issue is what to do with concrete types at the base of the type tree. When we are working with the concrete type generally it would like them to be regular structures with direct calls. On the other hand for interactions with other types in the hierarchy it is more convenent for the type already to be cast. Which of these two should we use? Should we support both and if so how do we choose which one is being used at any given time. On a related note I have been using pointers two trait types here, as that is how many existing languages handle it. However the generic objects might be only one or two pointers wide passing the objects as a whole would not be very expensive and all operations on the generic objects probably have to be defined anyways. Resolution Scope ---------------- What is the scope of a resolution? When are the functions in a vtable decided and how broadly is this applied? ### Type Level: Each structure has a single resolution for all of the functions in the virtual trait. This is how many languages that implement this or similar features do it. The main thing CFA would need to do it this way is some single point where the type declaration, including the functions that satisfy the trait, are all defined. Currently there are many points where this can happen, not all of them have the same definitions and no way to select one over the other. Some syntax would have to be added to specify the resolution point. To ensure a single instance there may have to be two variants, one forward declaration and one to create the instance. With some compiler magic the forward declaration maybe enough. extern trait combiner(struct summation) vtable; trait combiner(struct summation) vtable; Or (with the same variants): vtable combiner(struct summation); The extern variant promises that the vtable will exist while the normal one is where the resolution actually happens. ### Explicit Resolution Points: Slightly looser than the above, there are explicit points where the vtables are resolved, but there is no limit on the number of resolution points that might be provided. Each time a object is bound to a trait, one of the resolutions is selected. This might be the most flexible option. An syntax would have to be provided as above. There may also be the option to name resolution points so that you can choose between them. This also could come with the ability to forward declare them. Especially if they are not named, these resolution points should be able to appear in functions, where the scoping rules can be used to select one. However this also means that stack-allocated functions can end up in the vtable. extern trait combiner(struct summation) vtable sum; trait combiner(struct summation) vtable sum; extern trait combiner(struct summation) vtable sum default; trait combiner(struct summation) vtable sum default; The extern difference is the same before. The name (sum in the samples) is used at the binding site to say which one is picked. The default keyword can be used in only some of the declarations. trait combiner fee = (summation_instance, sum); trait combiner foe = summation_instance; (I am not really happy about this syntax, but it kind of works.) The object being bound is required. The name of the vtable is optional if there is exactly one vtable name marked with default. These could also be placed inside functions. In which case both the name and the default keyword might be optional. If the name is omitted in an assignment the closest vtable is chosen (returning to the global default rule if no appropriate local vtable is in scope). ### Site Based Resolution: Every place in code where the binding of a vtable to an object occurs has its own resolution. Syntax-wise this is the simplest as it should be able to use just the existing declarations and the conversion to trait object. It also is very close to the current polymorphic resolution rules. This works as the explicit resolution points except the resolution points are implicit and their would be no selection of which resolution to use. The closest (current) resolution is always selected. This could easily lead to an explosion of vtables as it has the most fine grained resolution the number of bindings in a single scope (that produces the same binding) could be quite high. Merging identical vtables might help reduce that. Vtable Lifetime Issues ---------------------- Vtables interact badly with the thunk issue. Conceptually vtables are static like type/function data they carry, as those decisions are made by the resolver at compile time. Stack allocated functions interact badly with this because they are not static. There are several ways to try to resolve this, however without a general solution most can keep vtables from making the existing thunk problem worse, they don't do anything to solve it. Filling in some fields of a static vtable could cause issues on a recursive call. And then we are still limited by the lifetime of the stack functions, as the vtable with stale pointers is still a problem. Dynamically allocated vtables introduces memory management overhead and requires some way to differentiate between dynamic and statically allocated tables. The stale function pointer problem continues unless those becomes dynamically allocated as well which gives us the same costs again. Stack allocating the vtable seems like the best issue. The vtable's lifetime is now the limiting factor but it should be effectively the same as the shortest lifetime of a function assigned to it. However this still limits the lifetime "implicitly" and returns to the original problem with thunks. Odds And Ends ------------- In addition to the main design there are a few extras that should be considered. They are not part of the core design but make the new uses fully featured. ### Extension: Parent-Child Assertion For hierarchy types in regular traits, generic functions or generic structures we may want to be able to check parent-child relationships between two types given. For this we might have to add another primitive assertion. It would have the following form if declared in code: trait is_parent_child(dtype Parent, dtype Child) { } This assertion is satified if Parent is an ancestor of Child in a hierarchy. In other words Child can be statically cast to Parent. The cast from Parent to child would be dynamically checked as usual. However in this form there are two concerns. The first that Parent will usually be consistent for a given use, it will not be a variable. Second is that we may also need the assertion functions. To do any casting/conversions anyways. TODO: Talk about when we wrap a concrete type and how that leads to "may". To this end it may be better that the parent trait combines the usual assertions plus this new primitive assertion. There may or may not be use cases for accessing just one half and providing easy access to them may be required depending on how that turns out. trait Parent(dtype T | interface(T)) virtual() { } ### Extension: sizeof Compatablity Trait types are always sized, it may even be a fixed size like how pointers have the same size regardless of what they point at. However their contents may or may not be of a known size (if the `sized(...)` assertion is used). Currently there is no way to access this information. If it is needed a special syntax would have to be added. Here a special case of `sizeof` is used. struct line aLine; trait drawable widget = aLine; size_t x = sizeof(widget); size_t y = sizeof(trait drawable); As usual `y`, size of the type, is the size of the local storage used to put the value into. The other case `x` checks the saved stored value in the virtual table and returns that.