Proposal For Use of Virtual Tables
==================================

The basic concept of a virtual table (vtable) is the same here as in most
other languages that use them. They will mostly contain function pointers
although they should be able to store anything that goes into a trait.

I also include notes on a sample implementation, which primarily exists to show
there is a reasonable implementation. The code samples for that are in a slight
pseudo-code to help avoid name mangling and keeps some CFA features while they
would actually be written in C.

Trait Instances
---------------

Currently traits are completely abstract. Data types might implement a trait
but traits are not themselves data types. Which is to say you cannot have an
instance of a trait. This proposal will change that and allow instances of
traits to be created from instances of data types that implement the trait.

For example:

    trait combiner(otype T) {
        void combine(T&, int);
    };

    struct summation {
        int sum;
    };

    void ?{}( struct summation & this ) {
        this.sum = 0;
    }

    void combine( struct summation & this, int num ) {
        this.sum = this.sum + num;
    }

    trait combiner obj = struct summation{};
    combine(obj, 5);

As with `struct` (and `union` and `enum`), `trait` might be optional when
using the trait as a type name. A trait may be used in assertion list as
before.

For traits to be used this way they should meet two requirements. First they
should only have a single polymorphic type and each assertion should use that
type once as a parameter. Extensions may later loosen these requirements.

Also note this applies to the final expanded list of assertions. Consider:

    trait foo(otype T, otype U) {
        ... functions that use T once ...
    }

    trait bar(otype S | foo(S, char)) {
        ... functions that use S once ...
    }

In this example `bar` may be used as a type but `foo` may not.

When a trait is used as a type it creates a generic object which combines
the base structure (an instance of `summation` in this case) and the vtable,
which is currently created and provided by a hidden mechanism.

The generic object type for each trait also implements that trait. This is
actually the only means by which it can be used. The type of these functions
look something like this:

    void combine(trait combiner & this, int num);

The main use case for trait objects is that they can be stored. They can be
passed into functions, but using the trait directly is preferred in this case.

    trait drawable(otype T) {
        void draw(Surface & to, T & draw);
        Rect(int) drawArea(T & draw);
    };

    struct UpdatingSurface {
        Surface * surface;
        vector(trait drawable) drawables;
    };

    void updateSurface(UpdatingSurface & us) {
        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
            draw(us.surface, us.drawables[i]);
        }
    }

With a more complete widget trait you could, for example, construct a UI tool
kit that can declare containers that hold widgets without knowing about the
widget types. Making it reasonable to extend the tool kit.

The trait types can also be used in the types of assertions on traits as well.
In this usage they passed as the underlying object and vtable pair as they
are stored. The trait types can also be used in that trait's definition, which
means you can pass two instances of a trait to a single function. However the
look-up of the one that is not used to look up any functions, until another
function that uses that object in the generic/look-up location is called.

    trait example(otype T) {
        bool test(T & this, trait example & that);
    }

### Explanation Of Restrictions

The two restrictions on traits that can be used as trait objects are:

1.  Only one generic parameter may be defined in the trait's header.
2.  Each function assertion must have one parameter with the type of the
    generic parameter. They may or may not return a value of that type.

Elsewhere in this proposal I suggest ways to broaden these requirements.
A simple example would be if a trait meets requirement 1 but not 2, then
the assertions that do not satisfy the exactly one parameter requirement can
be ignored.

However I would like to talk about why these two rules are in place in the
first place and the problems that any exceptions to these rules must avoid.

The problems appear when the dispatcher function which operates on the
generic object.

    trait combiner(otype T, otype U) {
        void combine(T&, U);
    }

This one is so strange I don't have proper syntax for it but let us say that
the concrete dispatcher would be typed as
`void combine(combiner(T) &, combiner(U));`. Does the function that combine
the two underlying types exist to dispatch too?

Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then
they could not be. It would have to enforce that all pairs of any types
that are wrapped in this way. Which would pretty much destroy any chance of
separate compilation.

Even then it would be more expensive as the wrappers would have to carry ids
that you use to look up on an <number of types>+1 dimensional table.

The second restriction has a similar issue but makes a bit more sense to
write out.

    trait Series(otype T) {
        ... size, iterators, getters ...
        T join(T const &, T const &);
    }

With the dispatcher typed as:

    Series join(Series const &, Series const &);

Because these instances are generic and hide the underlying implementation we
do not know what that implementation is. Unfortunately this also means the
implementation for the two parameters might not be the same. Once we have
two different types involved this devolves into the first case.

We could check at run-time that the have the same underlying type, but this
would likely time and space overhead and there is no clear recovery path.

#### Sample Implementation
A simple way to implement trait objects is by a pair of pointers. One to the
underlying object and one to the vtable.

    struct vtable_drawable {
        void (*draw)(Surface &, void *);
        Rect(int) (*drawArea)(void *);
    };

    struct drawable {
        void * object;
        vtable_drawable * vtable;
    };

The functions that run on the trait object would generally be generated using
the following pattern:

    void draw(Surface & surface, drawable & traitObj) {
        return traitObj.vtable->draw(surface, traitObj.object);
    }

There may have to be special cases for things like copy construction, that
might require a more significant wrapper. On the other hand moving could be
implemented by moving the pointers without any need to refer to the base
object.

### Extension: Multiple Trait Parameters
The base proposal in effect creates another use for the trait syntax that is
related to the ones currently in the language but is also separate from them.
The current uses generic functions and generic types, this new use could be
described as generic objects.

A generic object is of a concrete type and has concrete functions that work on
it. It is generic in that it is a wrapper for an unknown type. Traits serve
a similar role here as in generic functions as they limit what the function
can be generic over.

This combines the use allowing to have a generic type that is a generic
object. All but one of the trait's parameters is given a concrete type,
conceptually currying the trait to create a trait with on generic parameter
that fits the original restrictions. The resulting concrete generic object
type is different with each set of provided parameters and their values.

Then it just becomes a question of where this is done. Again both examples use
a basic syntax to show the idea.

    trait iterator(virtual otype T, otype Item) {
        bool has_next(T const &);
        Item get_next(T const *);
    }

    iterator(int) int_it = begin(container_of_ints);

The first option is to do it at the definition of the trait. One parameter
is selected (here with the `virtual` keyword, but other rules like "the first"
could also be used) and when an instance of the trait is created all the
other parameters must be provided.

    trait iterator(otype T, otype Item) {
        bool has_next(T const &);
        Item get_next(T const *);
    }

    iterator(virtual, int) int_it = begin(container_of_ints);

The second option is to skip a parameter as part of the type instance
definition. One parameter is explicitly skipped (again with the `virtual`
keyword) and the others have concrete types. The skipped one is the one we
are generic on.

Incidentally in both examples `container_of_ints` may itself be a generic
object and `begin` returns a generic iterator with unknown implementation.

These options are not exclusive. Defining a default on the trait allows for
an object to be created as in the first example. However, whether the
default is provided or not, the second syntax can be used to pick a
parameter on instantiation.

Hierarchy
---------

We would also like to implement hierarchical relations between types.

    ast_node
    |-expression_node
    | |-operator_expression
    |
    |-statement_node
    | |-goto_statement
    |
    |-declaration_node
      |-using_declaration
      |-variable_declaration

Virtual tables by themselves are not quite enough to implement this system.
A vtable is just a list of functions and there is no way to check at run-time
what these functions, we carry that knowledge with the table.

This proposal adds type ids to check for position in the hierarchy and an
explicate syntax for establishing a hierarchical relation between traits and
their implementing types. The ids should uniquely identify each type and
allow retrieval of the type's parent if one exists. By recursion this allows
the ancestor relation between any two hierarchical types can be checked.

The hierarchy is created with traits as the internal nodes and structures
as the leaf nodes. The structures may be used normally and the traits can
be used to create generic objects as in the first section (the same
restrictions apply). However these type objects store their type id which can
be recovered to figure out which type they are or at least check to see if
they fall into a given sub-tree at run-time.

Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
just an example. But when used it give the name of the parent type or if
empty it shows that this type is the root of its hierarchy.
(Also I'm not sure where I got these casing rules.)

    trait ast_node(otype T) virtual() {
        void print(T & this, ostream & out);
        void visit(T & this, Visitor & visitor);
        CodeLocation const & get_code_location(T & this);
    }

    trait expression_node(otype T) virtual(ast_node) {
        Type eval_type(T const & this);
    }

    struct operator_expression virtual(expression_node) {
        enum operator_kind kind;
        trait expression_node rands[2];
    }

    trait statement_node(otype T) virtual(ast_node) {
        vector(Label) & get_labels(T & this);
    }

    struct goto_statement virtual(statement_node) {
        vector(Label) labels;
        Label target;
    }

    trait declaration_node(otype T) virtual(ast_node) {
        string name_of(T const & this);
        Type type_of(T const & this);
    }

    struct using_declaration virtual(declaration_node) {
        string new_type;
        Type old_type;
    }

    struct variable_declaration virtual(declaration_node) {
        string name;
        Type type;
    }

This system does not support multiple inheritance. The system could be
extended to support it or a limited form (ex. you may have multiple parents
but they may not have a common ancestor). However this proposal focuses just
on using hierachy as organization. Other uses for reusable/genaric code or
shared interfaces is left for other features of the language.

### Extension: Structural Inheritance
An extension would be allow structures to be used as internal nodes on the
inheritance tree. Its child types would have to implement the same fields.

The weaker restriction would be to convert the fields into field assertions
(Not implemented yet: `U T.x` means there is a field of type you on the type
T. Offset unknown and passed in/stored with function pointers.)
A concrete child would have to declare the same set of fields with the same
types. This is of a more functional style.

The stronger restriction is that the fields of the parent are a prefix of the
child's fields. Possibly automatically inserted. This the imperative view and
may also have less overhead.

### Extension: Unions and Enumerations
Currently there is no reason unions and enumerations, in the cases they
do implement the trait, could not be in the hierarchy as leaf nodes.

It does not work with structural induction, but that could just be a compile
time check that all ancestors are traits or do not add field assertions.

#### Sample Implementation
The type id may be as little as:

    struct typeid {
        struct typeid const * const parent;
    };

Some linker magic would have to be used to ensure exactly one copy of each
structure for each type exists in memory. There seem to be special once
sections that support this and it should be easier than generating unique
ids across compilation units.

The structure could be extended to contain any additional type information.

There are two general designs for vtables with type ids. The first is to put
the type id at the top of the vtable, this is the most compact and efficient
solution but only works if we have exactly 1 vtable for each type. The second
is to put a pointer to the type id in each vtable. This has more overhead but
allows multiple vtables per type.

    struct <trait>_vtable {
        struct typeid const id;

        // Trait dependent list of vtable members.
    };

    struct <trait>_vtable {
        struct typeid const * const id;

        // Trait dependent list of vtable members.
    };

One important restriction is that only one instance of each typeid in memory.
There is a ".gnu.linkonce" feature in the linker that might solve the issue.

### Virtual Casts
The generic objects may be cast up and down the hierarchy.

Casting to an ancestor type always succeeds. From one generic type to another
is just a reinterpretation and could be implicate. Wrapping and unwrapping
a concrete type will probably use the same syntax as in the first section.

Casting from an ancestor to a descendent requires a check. The underlying
type may or may not belong to the sub-tree headed by that descendent. For this
we introduce a new cast operator, which returns the pointer unchanged if the
check succeeds and null otherwise.

    trait SubType * new_value = (virtual trait SubType *)super_type;

For the following example I am using the as of yet finished exception system.

    trait exception(otype T) virtual() {
        char const * what(T & this);
    }

    trait io_error(otype T) virtual(exception) {
        FILE * which_file(T & this);
    }

    struct eof_error(otype T) virtual(io_error) {
        FILE * file;
    }

    char const * what(eof_error &) {
        return "Tried to read from an empty file.";
    }

    FILE * which_file(eof_error & this) {
        return eof_error.file;
    }

    bool handleIoError(exception * exc) {
        io_error * error = (virtual io_error *)exc;
        if (NULL == error) {
            return false;
        }
        ...
        return true;
    }

### Extension: Implicate Virtual Cast Target
This is a small extension, even in the example above `io_error *` is repeated
in the cast and the variable being assigned to. Using return type inference
would allow the second type to be skipped in cases it is clear what type is
being checked against.

The line then becomes:

    io_error * error = (virtual)exc;

#### Sample Implementation
This cast implementation assumes a type id layout similar to the one given
above. Also this code is definitely in the underlying C. Functions that give
this functionality could exist in the standard library but these are meant to
be produced by code translation of the virtual cast.

    bool is_in_subtree(typeid const * root, typeid const * id) {
        if (root == id) {
            return true
        } else if (NULL == id->parent) {
            return false;
        } else {
            return is_in_subtree(root, id->parent);
        }
    }

    void * virtual_cast(typeid const * target, void * value) {
        return is_in_subtree(target, *(typeid const **)value) ? value : NULL;
    }

The virtual cast function might have to be wrapped with some casts to make it
compile without warning.

For the implicate target type we may be able to lean on the type resolution
system that already exists. If the casting to ancestor type is built into
the resolution then the impicate target could be decided by picking an
overload, generated for each hierarchial type (here io_error and its root
type exception).

    io_error * virtual_cast(exception * value) {
        return virtual_cast(io_error_typeid, value);
    }

### Extension: Inline vtables
Since the structures here are usually made to be turned into trait objects
it might be worth it to have fields in them to store the virtual table
pointer. This would have to be declared on the trait as an assertion (example:
`vtable;` or `T.vtable;`), but if it is the trait object could be a single
pointer.

There are also three options for where the pointer to the vtable. It could be
anywhere, a fixed location for each trait or always at the front. For the per-
trait solution an extension to specify what it is (example `vtable[0];`) which
could also be used to combine it with others. So these options can be combined
to allow access to all three options.

The pointer to virtual table field on structures might implicately added (the
types have to declare they are a child here) or created with a declaration,
possibly like the one used to create the assertion.

### Virtual Tables as Types
Here we consider encoding plus the implementation of functions on it to be a
type. Which is to say in the type hierarchy structures aren't concrete types
anymore, instead they are parent types to vtables, which combine the encoding
and implementation.

### Question: Wrapping Structures
One issue is what to do with concrete types at the base of the type tree.
When we are working with the concrete type generally it would like them to be
regular structures with direct calls. On the other hand for interactions with
other types in the hierarchy it is more convenent for the type already to be
cast.

Which of these two should we use? Should we support both and if so how do we
choose which one is being used at any given time.

On a related note I have been using pointers two trait types here, as that
is how many existing languages handle it. However the generic objects might
be only one or two pointers wide passing the objects as a whole would not
be very expensive and all operations on the generic objects probably have
to be defined anyways.

Resolution Scope
----------------

What is the scope of a resolution? When are the functions in a vtable decided
and how broadly is this applied?

### Type Level:
Each structure has a single resolution for all of the functions in the
virtual trait. This is how many languages that implement this or similar
features do it.

The main thing CFA would need to do it this way is some single point where
the type declaration, including the functions that satisfy the trait, are
all defined. Currently there are many points where this can happen, not all
of them have the same definitions and no way to select one over the other.

Some syntax would have to be added to specify the resolution point. To ensure
a single instance there may have to be two variants, one forward declaration
and one to create the instance. With some compiler magic the forward
declaration maybe enough.

    extern trait combiner(struct summation) vtable;
    trait combiner(struct summation) vtable;

Or (with the same variants):

    vtable combiner(struct summation);

The extern variant promises that the vtable will exist while the normal one
is where the resolution actually happens.

### Explicit Resolution Points:
Slightly looser than the above, there are explicit points where the vtables
are resolved, but there is no limit on the number of resolution points that
might be provided. Each time a object is bound to a trait, one of the
resolutions is selected. This might be the most flexible option.

An syntax would have to be provided as above. There may also be the option
to name resolution points so that you can choose between them. This also
could come with the ability to forward declare them.

Especially if they are not named, these resolution points should be able to
appear in functions, where the scoping rules can be used to select one.
However this also means that stack-allocated functions can end up in the
vtable.

    extern trait combiner(struct summation) vtable sum;
    trait combiner(struct summation) vtable sum;

    extern trait combiner(struct summation) vtable sum default;
    trait combiner(struct summation) vtable sum default;

The extern difference is the same before. The name (sum in the samples) is
used at the binding site to say which one is picked. The default keyword can
be used in only some of the declarations.

    trait combiner fee = (summation_instance, sum);
    trait combiner foe = summation_instance;

(I am not really happy about this syntax, but it kind of works.)
The object being bound is required. The name of the vtable is optional if
there is exactly one vtable name marked with default.

These could also be placed inside functions. In which case both the name and
the default keyword might be optional. If the name is omitted in an assignment
the closest vtable is chosen (returning to the global default rule if no
appropriate local vtable is in scope).

### Site Based Resolution:
Every place in code where the binding of a vtable to an object occurs has
its own resolution. Syntax-wise this is the simplest as it should be able
to use just the existing declarations and the conversion to trait object.
It also is very close to the current polymorphic resolution rules.

This works as the explicit resolution points except the resolution points
are implicit and their would be no selection of which resolution to use. The
closest (current) resolution is always selected.

This could easily lead to an explosion of vtables as it has the most fine
grained resolution the number of bindings in a single scope (that produces
the same binding) could be quite high. Merging identical vtables might help
reduce that.

Vtable Lifetime Issues
----------------------

Vtables interact badly with the thunk issue. Conceptually vtables are static
like type/function data they carry, as those decisions are made by the
resolver at compile time.

Stack allocated functions interact badly with this because they are not
static. There are several ways to try to resolve this, however without a
general solution most can keep vtables from making the existing thunk problem
worse, they don't do anything to solve it.

Filling in some fields of a static vtable could cause issues on a recursive
call. And then we are still limited by the lifetime of the stack functions, as
the vtable with stale pointers is still a problem.

Dynamically allocated vtables introduces memory management overhead and
requires some way to differentiate between dynamic and statically allocated
tables. The stale function pointer problem continues unless those becomes
dynamically allocated as well which gives us the same costs again.

Stack allocating the vtable seems like the best issue. The vtable's lifetime
is now the limiting factor but it should be effectively the same as the
shortest lifetime of a function assigned to it. However this still limits the
lifetime "implicitly" and returns to the original problem with thunks.

Odds And Ends
-------------

In addition to the main design there are a few extras that should be
considered. They are not part of the core design but make the new uses fully
featured.

### Extension: Parent-Child Assertion
For hierarchy types in regular traits, generic functions or generic structures
we may want to be able to check parent-child relationships between two types
given. For this we might have to add another primitive assertion. It would
have the following form if declared in code:

    trait is_parent_child(dtype Parent, dtype Child) { <built-in magic> }

This assertion is satified if Parent is an ancestor of Child in a hierarchy.
In other words Child can be statically cast to Parent. The cast from Parent
to child would be dynamically checked as usual.

However in this form there are two concerns. The first that Parent will
usually be consistent for a given use, it will not be a variable. Second is
that we may also need the assertion functions. To do any casting/conversions
anyways.
TODO: Talk about when we wrap a concrete type and how that leads to "may".

To this end it may be better that the parent trait combines the usual
assertions plus this new primitive assertion. There may or may not be use
cases for accessing just one half and providing easy access to them may be
required depending on how that turns out.

    trait Parent(dtype T | interface(T)) virtual(<grand-parent?>) { }

### Extension: sizeof Compatablity
Trait types are always sized, it may even be a fixed size like how pointers
have the same size regardless of what they point at. However their contents
may or may not be of a known size (if the `sized(...)` assertion is used).

Currently there is no way to access this information. If it is needed a
special syntax would have to be added. Here a special case of `sizeof` is
used.

    struct line aLine;
    trait drawable widget = aLine;

    size_t x = sizeof(widget);
    size_t y = sizeof(trait drawable);

As usual `y`, size of the type, is the size of the local storage used to put
the value into. The other case `x` checks the saved stored value in the
virtual table and returns that.