Proposal For Use of Virtual Tables
==================================

This is an adaptation of the earlier virtual proposal, updating it with new
ideas, re-framing it and laying out more design decisions. It should
eventually replace the earlier proposal, but not all features and syntax have
been converted to the new design.

The basic concept of a virtual table (vtable) is the same here as in most
other languages that use them. They will mostly contain function pointers
although they should be able to store anything that goes into a trait.

I also include notes on a sample implementation, which primarily exists to show
there is a reasonable implementation. The code samples for that are in a slight
pseudo-code to help avoid name mangling and keeps some CFA features while they
would actually be written in C.

Trait Instances
---------------

Currently traits are completely abstract. Data types might implement a trait
but traits are not themselves data types. Which is to say you cannot have an
instance of a trait. This proposal will change that and allow instances of
traits to be created from instances of data types that implement the trait.

For example:

    trait combiner(otype T) {
        void combine(T&, int);
    };

    struct summation {
        int sum;
    };

    void ?{}( struct summation & this ) {
        this.sum = 0;
    }

    void combine( struct summation & this, int num ) {
        this.sum = this.sum + num;
    }

    trait combiner obj = struct summation{};
    combine(obj, 5);

As with `struct` (and `union` and `enum`), `trait` might be optional when
using the trait as a type name. A trait may be used in assertion list as
before.

For traits to be used this way they should meet two requirements. First they
should only have a single polymorphic type and each assertion should use that
type once as a parameter. Extensions may later loosen these requirements.

Also note this applies to the final expanded list of assertions. Consider:

    trait foo(otype T, otype U) {
        ... functions that use T once ...
    }

    trait bar(otype S | foo(S, char)) {
        ... functions that use S once ...
    }

In this example `bar` may be used as a type but `foo` may not.

When a trait is used as a type it creates a generic object which combines
the base structure (an instance of `summation` in this case) and the vtable,
which is currently created and provided by a hidden mechanism.

The generic object type for each trait also implements that trait. This is
actually the only means by which it can be used. The type of these functions
look something like this:

    void combine(trait combiner & this, int num);

The main use case for trait objects is that they can be stored. They can be
passed into functions, but using the trait directly is preferred in this case.

    trait drawable(otype T) {
        void draw(Surface & to, T & draw);
        Rect(int) drawArea(T & draw);
    };

    struct UpdatingSurface {
        Surface * surface;
        vector(trait drawable) drawables;
    };

    void updateSurface(UpdatingSurface & us) {
        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
            draw(us.surface, us.drawables[i]);
        }
    }

The trait types can also be used in the types of assertions on traits as well.
In this usage they passed as the underlying object and vtable pair as they
are stored. The trait types can also be used in that trait's definition, which
means you can pass two instances of a trait to a single function. However the
look-up of the one that is not used to look up any functions, until another
function that uses that object in the generic/look-up location is called.

    trait example(otype T) {
        bool test(T & this, trait example & that);
    }

### Explanation Of Restrictions

The two restrictions on traits that can be used as trait objects are:

1.  Only one generic parameter may be defined in the trait's header.
2.  Each function assertion must have one parameter with the type of the
    generic parameter. They may or may not return a value of that type.

Elsewhere in this proposal I suggest ways to broaden these requirements.
A simple example would be if a trait meets requirement 1 but not 2, then
the assertions that do not satisfy the exactly one parameter requirement can
be ignored.

However I would like to talk about why these two rules are in place in the
first place and the problems that any exceptions to these rules must avoid.

The problems appear when the dispatcher function which operates on the
generic object.

    trait combiner(otype T, otype U) {
        void combine(T&, U);
    }

This one is so strange I don't have proper syntax for it but let us say that
the concrete dispatcher would be typed as
`void combine(combiner(T) &, combiner(U));`. Does the function that combine
the two underlying types exist to dispatch too?

Maybe not. If `combiner(T)` works with ints and `combiner(U)` is a char then
they could not be. It would have to enforce that all pairs of any types
that are wrapped in this way. Which would pretty much destroy any chance of
separate compilation.

Even then it would be more expensive as the wrappers would have to carry ids
that you use to look up on an <number of types>+1 dimensional table.

The second restriction has a similar issue but makes a bit more sense to
write out.

    trait Series(otype T) {
        ... size, iterators, getters ...
        T join(T const &, T const &);
    }

With the dispatcher typed as:

    Series join(Series const &, Series const &);

Because these instances are generic and hide the underlying implementation we
do not know what that implementation is. Unfortunately this also means the
implementation for the two parameters might not be the same. Once we have
two different types involved this devolves into the first case.

We could check at run-time that the have the same underlying type, but this
would likely time and space overhead and there is no clear recovery path.

#### Sample Implementation
A simple way to implement trait objects is by a pair of pointers. One to the
underlying object and one to the vtable.

    struct vtable_drawable {
        void (*draw)(Surface &, void *);
        Rect(int) (*drawArea)(void *);
    };

    struct drawable {
        void * object;
        vtable_drawable * vtable;
    };

The functions that run on the trait object would generally be generated using
the following pattern:

    void draw(Surface & surface, drawable & traitObj) {
        return traitObj.vtable->draw(surface, traitObj.object);
    }

There may have to be special cases for things like copy construction, that
might require a more significant wrapper. On the other hand moving could be
implemented by moving the pointers without any need to refer to the base
object.

### Extension: Multiple Trait Parameters
The base proposal in effect creates another use for the trait syntax that is
related to the ones currently in the language but is also separate from them.
The current uses generic functions and generic types, this new use could be
described as generic objects.

A generic object is of a concrete type and has concrete functions that work on
it. It is generic in that it is a wrapper for an unknown type. Traits serve
a similar role here as in generic functions as they limit what the function
can be generic over.

This combines the use allowing to have a generic type that is a generic
object. All but one of the trait's parameters is given a concrete type,
conceptually currying the trait to create a trait with on generic parameter
that fits the original restrictions. The resulting concrete generic object
type is different with each set of provided parameters and their values.

Then it just becomes a question of where this is done. Again both examples use
a basic syntax to show the idea.

    trait iterator(virtual otype T, otype Item) {
        bool has_next(T const &);
        Item get_next(T const *);
    }

    iterator(int) int_it = begin(container_of_ints);

The first option is to do it at the definition of the trait. One parameter
is selected (here with the `virtual` keyword, but other rules like "the first"
could also be used) and when an instance of the trait is created all the
other parameters must be provided.

    trait iterator(otype T, otype Item) {
        bool has_next(T const &);
        Item get_next(T const *);
    }

    iterator(virtual, int) int_it = begin(container_of_ints);

The second option is to skip a parameter as part of the type instance
definition. One parameter is explicitly skipped (again with the `virtual`
keyword) and the others have concrete types. The skipped one is the one we
are generic on.

Incidentally in both examples `container_of_ints` may itself be a generic
object and `begin` returns a generic iterator with unknown implementation.

These options are not exclusive. Defining a default on the trait allows for
an object to be created as in the first example. However, whether the
default is provided or not, the second syntax can be used to pick a
parameter on instantiation.

Hierarchy
---------

We would also like to implement hierarchical relations between types.

    AstNode
    |-ParseNode
    | |-Declaration
    | | |-DeclarationWithType
    | | |-StructureDeclaration
    | |-Statement
    | | |-CompoundStatement
    | |-Expression
    |-Type

Virtual tables by themselves are not quite enough to implement this system.
A vtable is just a list of functions and there is no way to check at run-time
what these functions, we carry that knowledge with the table.

This proposal adds type ids to check for position in the hierarchy and an
explicate syntax for establishing a hierarchical relation between traits and
their implementing types. The ids should uniquely identify each type and
allow retrieval of the type's parent if one exists. By recursion this allows
the ancestor relation between any two hierarchical types can be checked.

The hierarchy is created with traits as the internal nodes and structures
as the leaf nodes. The structures may be used normally and the traits can
be used to create generic objects as in the first section (the same
restrictions apply). However these type objects store their type id which can
be recovered to figure out which type they are or at least check to see if
they fall into a given sub-tree at run-time.

Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
just an example. But when used it give the name of the parent type or if
empty it shows that this type is the root of its hierarchy.
(Also I'm not sure where I got these casing rules.)

    trait ast_node(otype T) virtual() {
        void print(T & this, ostream & out);
        void visit(T & this, Visitor & visitor);
        CodeLocation const & get_code_location(T & this);
    }

    trait expression_node(otype T) virtual(ast_node) {
        Type eval_type(T const & this);
    }

    struct operator_expression virtual(expression_node) {
        enum operator_kind kind;
        trait expression_node rands[2];
    }

    trait statement_node(otype T) virtual(ast_node) {
        vector(Label) & get_labels(T & this);
    }

    struct goto_statement virtual(statement_node) {
        vector(Label) labels;
        Label target;
    }

    trait declaration_node(otype T) virtual(ast_node) {
        string name_of(T const & this);
        Type type_of(T const & this);
    }

    struct using_declaration virtual(declaration_node) {
        string new_type;
        Type old_type;
    }

    struct variable_declaration virtual(declaration_node) {
        string name;
        Type type;
    }

### Extension: Structural Inheritance
An extension would be allow structures to be used as internal nodes on the
inheritance tree. Its child types would have to implement the same fields.

The weaker restriction would be to convert the fields into field assertions
(Not implemented yet: `U T.x` means there is a field of type you on the type
T. Offset unknown and passed in/stored with function pointers.)
A concrete child would have to declare the same set of fields with the same
types. This is of a more functional style.

The stronger restriction is that the fields of the parent are a prefix of the
child's fields. Possibly automatically inserted. This the imperative view and
may also have less overhead.

### Extension: Unions and Enumerations
Currently there is no reason unions and enumerations, in the cases they
do implement the trait, could not be in the hierarchy as leaf nodes.

It does not work with structural induction, but that could just be a compile
time check that all ancestors are traits or do not add field assertions.

#### Sample Implementation
The type id may be as little as:

    struct typeid {
        struct typeid const * const parent;
    };

Some linker magic would have to be used to ensure exactly one copy of each
structure for each type exists in memory. There seem to be special once
sections that support this and it should be easier than generating unique
ids across compilation units.

The structure could be extended to contain any additional type information.

There are two general designs for vtables with type ids. The first is to put
the type id at the top of the vtable, this is the most compact and efficient
solution but only works if we have exactly 1 vtable for each type. The second
is to put a pointer to the type id in each vtable. This has more overhead but
allows multiple vtables.

    struct <trait>_vtable {
        struct typeid const id;

        // Trait dependent list of vtable members.
    };

    struct <trait>_vtable {
        struct typeid const * const id;

        // Trait dependent list of vtable members.
    };

### Virtual Casts
The generic objects may be cast up and down the hierarchy.

Casting to an ancestor type always succeeds. From one generic type to another
is just a reinterpretation and could be implicate. Wrapping and unwrapping
a concrete type will probably use the same syntax as in the first section.

Casting from an ancestor to a descendent requires a check. The underlying
type may or may not belong to the sub-tree headed by that descendent. For this
we introduce a new cast operator, which returns the pointer unchanged if the
check succeeds and null otherwise.

    trait SubType * new_value = (virtual trait SubType *)super_type;

For the following example I am using the as of yet finished exception system.

    trait exception(otype T) virtual() {
        char const * what(T & this);
    }

    trait io_error(otype T) virtual(exception) {
        FILE * which_file(T & this);
    }

    struct eof_error(otype T) virtual(io_error) {
        FILE * file;
    }

    char const * what(eof_error &) {
        return "Tried to read from an empty file.";
    }

    FILE * which_file(eof_error & this) {
        return eof_error.file;
    }

    bool handleIoError(exception * exc) {
        io_error * error = (virtual io_error *)exc;
        if (NULL == error) {
            return false;
        }
        ...
        return true;
    }

### Extension: Implicate Virtual Cast Target
This is a small extension, even in the example above `io_error *` is repeated
in the cast and the variable being assigned to. Using return type inference
would allow the second type to be skipped in cases it is clear what type is
being checked against.

The line then becomes:

    io_error * error = (virtual)exc;

### Extension: Inline vtables
Since the structures here are usually made to be turned into trait objects
it might be worth it to have fields in them to store the virtual table
pointer. This would have to be declared on the trait as an assertion (example:
`vtable;` or `T.vtable;`), but if it is the trait object could be a single
pointer.

There are also three options for where the pointer to the vtable. It could be
anywhere, a fixed location for each trait or always at the front. For the per-
trait solution an extension to specify what it is (example `vtable[0];`) which
could also be used to combine it with others. So these options can be combined
to allow access to all three options.

### Virtual Tables as Types
Here we consider encoding plus the implementation of functions on it to be a
type. Which is to say in the type hierarchy structures aren't concrete types
anymore, instead they are parent types to vtables, which combine the encoding
and implementation.

Resolution Scope
----------------

What is the scope of a resolution? When are the functions in a vtable decided
and how broadly is this applied?

### Type Level:
Each structure has a single resolution for all of the functions in the
virtual trait. This is how many languages that implement this or similar
features do it.

The main thing CFA would need to do it this way is some single point where
the type declaration, including the functions that satisfy the trait, are
all defined. Currently there are many points where this can happen, not all
of them have the same definitions and no way to select one over the other.

Some syntax would have to be added to specify the resolution point. To ensure
a single instance there may have to be two variants, one forward declaration
and one to create the instance. With some compiler magic the forward
declaration maybe enough.

    extern trait combiner(struct summation) vtable;
    trait combiner(struct summation) vtable;

Or (with the same variants):

    vtable combiner(struct summation);

The extern variant promises that the vtable will exist while the normal one
is where the resolution actually happens.

### Explicit Resolution Points:
Slightly looser than the above, there are explicit points where the vtables
are resolved, but there is no limit on the number of resolution points that
might be provided. Each time a object is bound to a trait, one of the
resolutions is selected. This might be the most flexible option.

An syntax would have to be provided as above. There may also be the option
to name resolution points so that you can choose between them. This also
could come with the ability to forward declare them.

Especially if they are not named, these resolution points should be able to
appear in functions, where the scoping rules can be used to select one.
However this also means that stack-allocated functions can end up in the
vtable.

    extern trait combiner(struct summation) vtable sum;
    trait combiner(struct summation) vtable sum;

    extern trait combiner(struct summation) vtable sum default;
    trait combiner(struct summation) vtable sum default;

The extern difference is the same before. The name (sum in the samples) is
used at the binding site to say which one is picked. The default keyword can
be used in only some of the declarations.

    trait combiner fee = (summation_instance, sum);
    trait combiner foe = summation_instance;

(I am not really happy about this syntax, but it kind of works.)
The object being bound is required. The name of the vtable is optional if
there is exactly one vtable name marked with default.

These could also be placed inside functions. In which case both the name and
the default keyword might be optional. If the name is omitted in an assignment
the closest vtable is chosen (returning to the global default rule if no
appropriate local vtable is in scope).

### Site Based Resolution:
Every place in code where the binding of a vtable to an object occurs has
its own resolution. Syntax-wise this is the simplest as it should be able
to use just the existing declarations and the conversion to trait object.
It also is very close to the current polymorphic resolution rules.

This works as the explicit resolution points except the resolution points
are implicit and their would be no selection of which resolution to use. The
closest (current) resolution is always selected.

This could easily lead to an explosion of vtables as it has the most fine
grained resolution the number of bindings in a single scope (that produces
the same binding) could be quite high. Merging identical vtables might help
reduce that.

Vtable Lifetime Issues
----------------------

Vtables interact badly with the thunk issue. Conceptually vtables are static
like type/function data they carry, as those decisions are made by the
resolver at compile time.

Stack allocated functions interact badly with this because they are not
static. There are several ways to try to resolve this, however without a
general solution most can only buy time.

Filling in some fields of a static vtable could cause issues on a recursive
call. And then we are still limited by the lifetime of the stack functions, as
the vtable with stale pointers is still a problem.

Dynamically allocated vtables introduces memory management overhead and
requires some way to differentiate between dynamic and statically allocated
tables. The stale function pointer problem continues unless those becomes
dynamically allocated as well which gives us the same costs again.

Stack allocating the vtable seems like the best issue. The vtable's lifetime
is now the limiting factor but it should be effectively the same as the
shortest lifetime of a function assigned to it. However this still limits the
lifetime "implicitly" and returns to the original problem with thunks.