source: doc/proposals/vtable.md@ bcb311b

Last change on this file since bcb311b was 0f740d6, checked in by Andrew Beach <ajbeach@…>, 6 years ago
Clean-up. Added one line docs for Stmts.

Proposal For Use of Virtual Tables
==================================

The basic concept of a virtual table (vtable) is the same here as in most
other languages that use them. They will mostly contain function pointers,
although they should be able to store anything that goes into a trait.

I also include notes on a sample implementation, which primarily exists to show
that a reasonable implementation is possible. The code samples for it are
written in a light pseudo-code that avoids name mangling and keeps some CFA
features, although they would actually be written in C.

Trait Instances
---------------

Currently traits are completely abstract. Data types might implement a trait
but traits are not themselves data types, which is to say you cannot have an
instance of a trait. This proposal will change that and allow instances of
traits to be created from instances of data types that implement the trait.

For example:

    trait combiner(otype T) {
        void combine(T&, int);
    };

    struct summation {
        int sum;
    };

    void ?{}( struct summation & this ) {
        this.sum = 0;
    }

    void combine( struct summation & this, int num ) {
        this.sum = this.sum + num;
    }

    trait combiner obj = struct summation{};
    combine(obj, 5);

As with `struct` (and `union` and `enum`), `trait` might be optional when
using the trait as a type name. A trait may be used in an assertion list as
before.

For traits to be used this way they should meet two requirements. First, they
should only have a single polymorphic type; second, each assertion should use
that type exactly once as a parameter. Extensions may later loosen these
requirements.

Also note this applies to the final expanded list of assertions. Consider:

    trait foo(otype T, otype U) {
        ... functions that use T once ...
    }

    trait bar(otype S | foo(S, char)) {
        ... functions that use S once ...
    }

In this example `bar` may be used as a type but `foo` may not.

When a trait is used as a type it creates a generic object which combines
the base structure (an instance of `summation` in this case) and the vtable,
which is currently created and provided by a hidden mechanism.

The generic object type for each trait also implements that trait. This is
actually the only means by which it can be used. The types of these functions
look something like this:

    void combine(trait combiner & this, int num);

The main use case for trait objects is that they can be stored. They can be
passed into functions, but using the trait directly is preferred in that case.

    trait drawable(otype T) {
        void draw(Surface & to, T & draw);
        Rect(int) drawArea(T & draw);
    };

    struct UpdatingSurface {
        Surface * surface;
        vector(trait drawable) drawables;
    };

    void updateSurface(UpdatingSurface & us) {
        for (size_t i = 0 ; i < us.drawables.size ; ++i) {
            draw(*us.surface, us.drawables[i]);
        }
    }

With a more complete widget trait you could, for example, construct a UI tool
kit that can declare containers that hold widgets without knowing about the
widget types, making it reasonable to extend the tool kit.
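The storage use case can be sketched in C under the same pair-of-pointers assumption: a plain array of trait objects holds values of two unrelated types, and the loop in the hypothetical `update_surface` only ever goes through the vtable. `struct Surface` here is a stand-in that counts draw calls and covered area instead of rendering anything, and `square`/`rectangle` are made-up widget types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Surface: it only records what was drawn. */
struct Surface {
    int draw_calls;
    int covered;
};

struct drawable_vtable {
    void (*draw)(struct Surface * to, void * self);
};

struct drawable {
    void * object;
    const struct drawable_vtable * vtable;
};

/* Two unrelated widget types; neither knows about the container. */
struct square { int side; };
struct rectangle { int width, height; };

static void square_draw(struct Surface * to, void * self) {
    const struct square * sq = self;
    to->draw_calls += 1;
    to->covered += sq->side * sq->side;
}

static void rectangle_draw(struct Surface * to, void * self) {
    const struct rectangle * r = self;
    to->draw_calls += 1;
    to->covered += r->width * r->height;
}

static const struct drawable_vtable square_vtable = { square_draw };
static const struct drawable_vtable rectangle_vtable = { rectangle_draw };

struct drawable drawable_from_square(struct square * sq) {
    return (struct drawable){ sq, &square_vtable };
}

struct drawable drawable_from_rectangle(struct rectangle * r) {
    return (struct drawable){ r, &rectangle_vtable };
}

/* Counterpart of updateSurface; a plain array stands in for vector(drawable). */
void update_surface(struct Surface * surface, struct drawable * drawables, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        assert(drawables[i].vtable != NULL);
        drawables[i].vtable->draw(surface, drawables[i].object);
    }
}
```

Adding a third widget type only needs a new vtable and binding function; `update_surface` is unchanged.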

Trait types can also be used in the types of assertions on traits. In this
usage they are passed as the underlying object and vtable pair, just as they
are stored. A trait type can also be used in that trait's own definition, which
means you can pass two instances of a trait to a single function. However, only
the instance in the generic (look-up) position is used to look up functions;
the other instance is not used for any look-up until it is passed to another
function in the generic position.

    trait example(otype T) {
        bool test(T & this, trait example & that);
    }

### Explanation Of Restrictions

The two restrictions on traits that can be used as trait objects are:

1. Only one generic parameter may be defined in the trait's header.
2. Each function assertion must have exactly one parameter with the type of
   the generic parameter. They may or may not return a value of that type.

Elsewhere in this proposal I suggest ways to broaden these requirements.
A simple example: if a trait meets requirement 1 but not 2, then the
assertions that do not satisfy the exactly-one-parameter requirement can
be ignored.

However, I would like to talk about why these two rules are in place and the
problems that any exceptions to these rules must avoid.

The problems appear in the dispatcher function, which operates on the generic
object.

    trait combiner(otype T, otype U) {
        void combine(T&, U);
    }

This one is so strange I don't have proper syntax for it, but let us say that
the concrete dispatcher would be typed as
`void combine(combiner(T) &, combiner(U));`. Does a function that combines
the two underlying types exist to dispatch to?

Maybe not. If `combiner(T)` wraps an int and `combiner(U)` wraps a char, there
may be no function that combines those two types. Avoiding that would require
enforcing that such a function exists for every pair of types that are wrapped
in this way, which would pretty much destroy any chance of separate
compilation.

Even then it would be more expensive, as the wrappers would have to carry ids
that are used to look up functions in a `<number of types>+1` dimensional
table.

The second restriction has a similar issue but makes a bit more sense to
write out.

    trait Series(otype T) {
        ... size, iterators, getters ...
        T join(T const &, T const &);
    }

With the dispatcher typed as:

    Series join(Series const &, Series const &);

Because these instances are generic and hide the underlying implementation, we
do not know what that implementation is. Unfortunately this also means the
implementations for the two parameters might not be the same. Once we have
two different types involved, this devolves into the first case.

We could check at run-time that they have the same underlying type, but this
would likely add time and space overhead and there is no clear recovery path.

#### Sample Implementation
A simple way to implement trait objects is by a pair of pointers: one to the
underlying object and one to the vtable.

    struct vtable_drawable {
        void (*draw)(Surface &, void *);
        Rect(int) (*drawArea)(void *);
    };

    struct drawable {
        void * object;
        vtable_drawable * vtable;
    };

The functions that run on the trait object would generally be generated using
the following pattern:

    void draw(Surface & surface, drawable & traitObj) {
        traitObj.vtable->draw(surface, traitObj.object);
    }

There may have to be special cases for things like copy construction, which
might require a more significant wrapper. On the other hand, moving could be
implemented by moving the pointers without any need to refer to the base
object.
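The difference between copying and moving can be sketched in C. This assumes, hypothetically, that the vtable records the underlying object's size and a copy-construct entry, and that a copied trait object owns heap storage for its new underlying object; the proposal does not fix any of that layout. Copying needs new storage plus the type's own copy routine, while moving just transfers the two pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vtable layout with the entries copying needs. */
struct drawable_vtable {
    size_t size;                                 /* size of the underlying object */
    void (*copy)(void * dst, const void * src);  /* copy-construct into dst */
};

struct drawable {
    void * object;
    const struct drawable_vtable * vtable;
};

/* Copy: allocate storage for a new underlying object and let the
 * underlying type copy itself. The caller owns (and frees) the result. */
struct drawable drawable_copy(const struct drawable * src) {
    assert(src != NULL && src->vtable != NULL);
    struct drawable dst = { malloc(src->vtable->size), src->vtable };
    if (dst.object == NULL) {
        abort();
    }
    src->vtable->copy(dst.object, src->object);
    return dst;
}

/* Move: only the two pointers change hands; the base object is untouched. */
struct drawable drawable_move(struct drawable * src) {
    struct drawable dst = *src;
    src->object = NULL;
    src->vtable = NULL;
    return dst;
}

/* A trivially copyable example type. */
struct point { int x, y; };

static void point_copy(void * dst, const void * src) {
    memcpy(dst, src, sizeof(struct point));
}

static const struct drawable_vtable point_vtable = { sizeof(struct point), point_copy };
```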

### Extension: Multiple Trait Parameters
The base proposal in effect creates another use for the trait syntax that is
related to the ones currently in the language but is also separate from them.
The current uses are generic functions and generic types; this new use could be
described as generic objects.

A generic object is of a concrete type and has concrete functions that work on
it. It is generic in that it is a wrapper for an unknown type. Traits serve
a similar role here as in generic functions, as they limit what the function
can be generic over.

This extension combines the two uses, allowing a generic type that is also a
generic object. All but one of the trait's parameters are given a concrete
type, conceptually currying the trait to create a trait with one generic
parameter that fits the original restrictions. The resulting concrete generic
object type is different for each set of provided parameters and their values.

Then it just becomes a question of where this is done. Again, both examples use
a basic syntax to show the idea.

    trait iterator(virtual otype T, otype Item) {
        bool has_next(T const &);
        Item get_next(T &);
    }

    iterator(int) int_it = begin(container_of_ints);

The first option is to do it at the definition of the trait. One parameter
is selected (here with the `virtual` keyword, but other rules like "the first"
could also be used) and when an instance of the trait is created all the
other parameters must be provided.

    trait iterator(otype T, otype Item) {
        bool has_next(T const &);
        Item get_next(T &);
    }

    iterator(virtual, int) int_it = begin(container_of_ints);

The second option is to skip a parameter as part of the type instance
definition. One parameter is explicitly skipped (again with the `virtual`
keyword) and the others have concrete types. The skipped one is the one we
are generic on.

Incidentally, in both examples `container_of_ints` may itself be a generic
object and `begin` returns a generic iterator with an unknown implementation.

These options are not exclusive. Defining a default on the trait allows for
an object to be created as in the first example. However, whether the
default is provided or not, the second syntax can be used to pick a
parameter on instantiation.
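Either syntax ends up with the same kind of object. Here is a C sketch of what `iterator(virtual, int)` might lower to, assuming the pair-of-pointers representation: because `Item` is fixed to `int`, the vtable type is specific to that choice, while `T` stays erased. `iterator_int`, `int_range` and `sum_all` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lowering of iterator(virtual, int): Item is fixed to int,
 * only T is erased, so this vtable type is specific to Item = int. */
struct iterator_int_vtable {
    bool (*has_next)(const void * self);
    int (*get_next)(void * self);
};

struct iterator_int {
    void * object;
    const struct iterator_int_vtable * vtable;
};

/* One concrete T: counts from `next` up to (but not including) `end`. */
struct int_range { int next, end; };

static bool int_range_has_next(const void * self) {
    const struct int_range * r = self;
    return r->next < r->end;
}

static int int_range_get_next(void * self) {
    struct int_range * r = self;
    assert(r->next < r->end);
    return r->next++;
}

static const struct iterator_int_vtable int_range_vtable = {
    int_range_has_next, int_range_get_next,
};

/* Code written against the generic object never sees struct int_range. */
int sum_all(struct iterator_int it) {
    int total = 0;
    while (it.vtable->has_next(it.object)) {
        total += it.vtable->get_next(it.object);
    }
    return total;
}
```

An `iterator(virtual, char)` would get a different vtable type, which is the sense in which each set of provided parameters yields a different concrete generic object type.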

Hierarchy
---------

We would also like to implement hierarchical relations between types.

    ast_node
    |-expression_node
    | |-operator_expression
    |
    |-statement_node
    | |-goto_statement
    |
    |-declaration_node
      |-using_declaration
      |-variable_declaration

Virtual tables by themselves are not quite enough to implement this system.
A vtable is just a list of functions and there is no way to check at run-time
which type these functions belong to unless we carry that knowledge with the
table.

This proposal adds type ids to check for position in the hierarchy and an
explicit syntax for establishing a hierarchical relation between traits and
their implementing types. The ids should uniquely identify each type and
allow retrieval of the type's parent if one exists. By recursion this allows
the ancestor relation between any two hierarchical types to be checked.

The hierarchy is created with traits as the internal nodes and structures
as the leaf nodes. The structures may be used normally and the traits can
be used to create generic objects as in the first section (the same
restrictions apply). However, these generic objects store their type id, which
can be recovered to figure out which type they are, or at least to check
whether they fall into a given sub-tree at run-time.

Here is an example of part of a hierarchy. The `virtual(PARENT)` syntax is
just an example. When used, it gives the name of the parent type or, if
empty, shows that this type is the root of its hierarchy.
(Also I'm not sure where I got these casing rules.)

    trait ast_node(otype T) virtual() {
        void print(T & this, ostream & out);
        void visit(T & this, Visitor & visitor);
        CodeLocation const & get_code_location(T & this);
    }

    trait expression_node(otype T) virtual(ast_node) {
        Type eval_type(T const & this);
    }

    struct operator_expression virtual(expression_node) {
        enum operator_kind kind;
        trait expression_node rands[2];
    }

    trait statement_node(otype T) virtual(ast_node) {
        vector(Label) & get_labels(T & this);
    }

    struct goto_statement virtual(statement_node) {
        vector(Label) labels;
        Label target;
    }

    trait declaration_node(otype T) virtual(ast_node) {
        string name_of(T const & this);
        Type type_of(T const & this);
    }

    struct using_declaration virtual(declaration_node) {
        string new_type;
        Type old_type;
    }

    struct variable_declaration virtual(declaration_node) {
        string name;
        Type type;
    }

This system does not support multiple inheritance. The system could be
extended to support it or a limited form of it (e.g. you may have multiple
parents but they may not have a common ancestor). However, this proposal
focuses just on using the hierarchy as organization. Other uses for
reusable/generic code or shared interfaces are left for other features of the
language.

### Extension: Structural Inheritance
An extension would be to allow structures to be used as internal nodes on the
inheritance tree. Their child types would have to implement the same fields.

The weaker restriction would be to convert the fields into field assertions
(not implemented yet: `U T.x` means there is a field of type U on the type
T, with its offset unknown and passed in or stored with the function pointers).
A concrete child would have to declare the same set of fields with the same
types. This is of a more functional style.

The stronger restriction is that the fields of the parent are a prefix of the
child's fields, possibly inserted automatically. This is the imperative view
and may also have less overhead.
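The two restrictions can be contrasted in C. In this hypothetical sketch an `int line` field stands in for `CodeLocation`. The stronger, prefix form embeds the parent's fields as the child's first member, so code written for the parent reads them at a fixed offset. The weaker, field-assertion form lets the field sit anywhere and passes its offset alongside the object, roughly as the `U T.x` note suggests. All type and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stronger restriction: the parent's fields are a prefix of the child's. */
struct ast_node_fields {
    int line;  /* stand-in for CodeLocation */
};

struct goto_statement {
    struct ast_node_fields base;  /* parent fields come first */
    int target;                   /* stand-in for Label */
};

/* Written once for the parent; works on any child through the prefix. */
int line_of(const struct ast_node_fields * node) {
    assert(node != NULL);
    return node->line;
}

/* Weaker restriction: the field may be anywhere, so its offset is
 * passed in (or would be stored next to the function pointers). */
struct variable_declaration {
    int type;  /* stand-in for Type */
    int line;
};

int line_at(const void * object, size_t line_offset) {
    assert(object != NULL);
    return *(const int *)((const char *)object + line_offset);
}
```

The prefix form needs no extra data at run-time, which is where its lower overhead comes from; the offset form costs one extra value per field but places no constraint on the child's layout.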

### Extension: Unions and Enumerations
Currently there is no reason unions and enumerations, in the cases where they
do implement the trait, could not be in the hierarchy as leaf nodes.

This does not work with structural inheritance, but that could just be a
compile-time check that all ancestors are traits or do not add field
assertions.

#### Sample Implementation
The type id may be as little as:

    struct typeid {
        struct typeid const * const parent;
    };

Some linker magic would have to be used to ensure exactly one copy of each
structure for each type exists in memory. There seem to be special once
sections that support this, and it should be easier than generating unique
ids across compilation units.

The structure could be extended to contain any additional type information.

There are two general designs for vtables with type ids. The first is to put
the type id at the top of the vtable; this is the most compact and efficient
solution but only works if we have exactly one vtable for each type. The second
is to put a pointer to the type id in each vtable. This has more overhead but
allows multiple vtables per type.

    struct <trait>_vtable {
        struct typeid const id;

        // Trait dependent list of vtable members.
    };

    struct <trait>_vtable {
        struct typeid const * const id;

        // Trait dependent list of vtable members.
    };

One important restriction is that there is only one instance of each typeid in
memory. There is a ".gnu.linkonce" feature in the linker that might solve the
issue.
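To make the two layouts concrete, here is a hedged single-file C sketch using part of the `ast_node` hierarchy, which sidesteps the cross-unit uniqueness problem entirely. The id objects, the `label_count` member standing in for the trait-dependent entries, and `hierarchy_depth` are all hypothetical. In the first design the id is embedded in goto_statement's single vtable; in the second, two vtables for the same type share one separately stored id.

```c
#include <assert.h>
#include <stddef.h>

struct typeid {
    struct typeid const * const parent;  /* NULL marks a hierarchy root */
};

/* Ids for the trait (internal) nodes of the example hierarchy. */
static struct typeid const ast_node_id = { NULL };
static struct typeid const statement_node_id = { &ast_node_id };

/* Stand-in for the trait-dependent vtable members. */
typedef int (*label_count_fn)(void * self);

static int no_labels(void * self) {
    (void)self;
    return 0;
}

/* Design 1: the id lives at the top of the vtable, so there must be
 * exactly one vtable for goto_statement; its id is &vtable.id. */
struct statement_vtable_embedded {
    struct typeid const id;
    label_count_fn label_count;
};

static struct statement_vtable_embedded const goto_vtable_embedded = {
    { &statement_node_id }, no_labels,
};

/* Design 2: vtables point at a separately stored id, so several
 * vtables for goto_statement can share one identity. */
struct statement_vtable_pointer {
    struct typeid const * const id;
    label_count_fn label_count;
};

static struct typeid const goto_statement_id = { &statement_node_id };
static struct statement_vtable_pointer const goto_vtable_a = { &goto_statement_id, no_labels };
static struct statement_vtable_pointer const goto_vtable_b = { &goto_statement_id, no_labels };

/* Distance from a type id to the root of its hierarchy. */
int hierarchy_depth(struct typeid const * id) {
    assert(id != NULL);
    int depth = 0;
    while (id->parent != NULL) {
        id = id->parent;
        ++depth;
    }
    return depth;
}
```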

### Virtual Casts
The generic objects may be cast up and down the hierarchy.

Casting to an ancestor type always succeeds. Casting from one generic type to
another is just a reinterpretation and could be implicit. Wrapping and
unwrapping a concrete type will probably use the same syntax as in the first
section.

Casting from an ancestor to a descendant requires a check. The underlying
type may or may not belong to the sub-tree headed by that descendant. For this
we introduce a new cast operator, which returns the pointer unchanged if the
check succeeds and null otherwise.

    trait SubType * new_value = (virtual trait SubType *)super_type;

For the following example I am using the as yet unfinished exception system.

    trait exception(otype T) virtual() {
        char const * what(T & this);
    }

    trait io_error(otype T) virtual(exception) {
        FILE * which_file(T & this);
    }

    struct eof_error virtual(io_error) {
        FILE * file;
    }

    char const * what(eof_error &) {
        return "Tried to read from an empty file.";
    }

    FILE * which_file(eof_error & this) {
        return this.file;
    }

    bool handleIoError(exception * exc) {
        io_error * error = (virtual io_error *)exc;
        if (NULL == error) {
            return false;
        }
        ...
        return true;
    }

### Extension: Implicit Virtual Cast Target
This is a small extension: even in the example above, `io_error *` is repeated
in the cast and the variable being assigned to. Using return type inference
would allow the second type to be skipped in cases where it is clear what type
is being checked against.

The line then becomes:

    io_error * error = (virtual)exc;

#### Sample Implementation
This cast implementation assumes a type id layout similar to the one given
above. Also, this code is definitely in the underlying C. Functions that give
this functionality could exist in the standard library, but these are meant to
be produced by code translation of the virtual cast.

    bool is_in_subtree(struct typeid const * root, struct typeid const * id) {
        if (root == id) {
            return true;
        } else if (NULL == id->parent) {
            return false;
        } else {
            return is_in_subtree(root, id->parent);
        }
    }

    void * virtual_cast(struct typeid const * target, void * value) {
        // As written, this treats value itself as a pointer to a type id.
        return is_in_subtree(target, (struct typeid const *)value) ? value : NULL;
    }

The virtual cast function might have to be wrapped with some casts to make it
compile without warnings.
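For comparison, here is a self-contained C version of the same check that can actually be run. Unlike the sample above, which treats `value` itself as a type id pointer, it assumes, hypothetically, that every hierarchical object starts with a pointer to its vtable and every vtable starts with a pointer to its type id, so the id is loaded through the object. `is_in_subtree` is written as a loop here; the `overflow_error` type and the two header structs are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct typeid {
    struct typeid const * const parent;  /* NULL marks a hierarchy root */
};

/* exception <- io_error <- eof_error, plus a hypothetical sibling of io_error. */
static struct typeid const exception_id = { NULL };
static struct typeid const io_error_id = { &exception_id };
static struct typeid const eof_error_id = { &io_error_id };
static struct typeid const overflow_error_id = { &exception_id };

/* Same logic as is_in_subtree, written as a loop. */
bool is_in_subtree(struct typeid const * root, struct typeid const * id) {
    for (; id != NULL; id = id->parent) {
        if (id == root) {
            return true;
        }
    }
    return false;
}

/* Assumed layout: each vtable starts with a pointer to its type id... */
struct vtable_header {
    struct typeid const * const id;
};

/* ...and each hierarchical object starts with a pointer to its vtable. */
struct object_header {
    struct vtable_header const * vtable;
};

void * virtual_cast(struct typeid const * target, void * value) {
    if (value == NULL) {
        return NULL;
    }
    struct object_header const * header = value;
    assert(header->vtable != NULL);
    return is_in_subtree(target, header->vtable->id) ? value : NULL;
}

static struct vtable_header const eof_error_vtable = { &eof_error_id };
static struct vtable_header const overflow_error_vtable = { &overflow_error_id };

struct eof_error { struct object_header base; int fd; };
struct overflow_error { struct object_header base; long limit; };
```

A cast to an ancestor (`exception_id`) succeeds for both concrete types, while a cast to `io_error_id` returns null for the sibling type, matching the `handleIoError` example.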

For the implicit target type we may be able to lean on the type resolution
system that already exists. If casting to an ancestor type is built into
the resolution, then the implicit target could be decided by picking an
overload, generated for each hierarchical type (here io_error and its root
type exception).

    io_error * virtual_cast(exception * value) {
        return virtual_cast(io_error_typeid, value);
    }

### Extension: Inline vtables
Since the structures here are usually made to be turned into trait objects,
it might be worth it to have fields in them to store the virtual table
pointer. This would have to be declared on the trait as an assertion (example:
`vtable;` or `T.vtable;`), but if it is, the trait object could be a single
pointer.

There are also three options for where the pointer to the vtable goes: it could
be anywhere, at a fixed location for each trait, or always at the front. The
per-trait solution needs an extension to specify where it is (example
`vtable[0];`), which could also be used to combine it with the others, so all
three options could be supported together.

The pointer to virtual table field on structures might be implicitly added (the
types have to declare they are a child here) or created with a declaration,
possibly like the one used to create the assertion.

### Virtual Tables as Types
Here we consider an encoding plus the implementation of functions on it to be
a type. That is to say, in the type hierarchy structures aren't concrete types
anymore; instead they are parent types to vtables, which combine the encoding
and implementation.

### Question: Wrapping Structures
One issue is what to do with concrete types at the base of the type tree.
When we are working with the concrete type, we would generally like it to be a
regular structure with direct calls. On the other hand, for interactions with
other types in the hierarchy it is more convenient for the type to already be
cast.

Which of these two should we use? Should we support both, and if so, how do we
choose which one is being used at any given time?

On a related note, I have been using pointers to trait types here, as that
is how many existing languages handle it. However, the generic objects might
be only one or two pointers wide, so passing the objects as a whole would not
be very expensive, and all operations on the generic objects probably have
to be defined anyway.

Resolution Scope
----------------

What is the scope of a resolution? When are the functions in a vtable decided
and how broadly is this applied?

### Type Level:
Each structure has a single resolution for all of the functions in the
virtual trait. This is how many languages that implement this or similar
features do it.

The main thing CFA would need to do it this way is a single point where
the type declaration, including the functions that satisfy the trait, are
all defined. Currently there are many points where this can happen, not all
of them have the same definitions, and there is no way to select one over the
other.

Some syntax would have to be added to specify the resolution point. To ensure
a single instance there may have to be two variants, one forward declaration
and one to create the instance. With some compiler magic the forward
declaration may be enough.

    extern trait combiner(struct summation) vtable;
    trait combiner(struct summation) vtable;

Or (with the same variants):

    vtable combiner(struct summation);

The extern variant promises that the vtable will exist, while the normal one
is where the resolution actually happens.

### Explicit Resolution Points:
Slightly looser than the above, there are explicit points where the vtables
are resolved, but there is no limit on the number of resolution points that
might be provided. Each time an object is bound to a trait, one of the
resolutions is selected. This might be the most flexible option.

A syntax would have to be provided as above. There may also be the option
to name resolution points so that you can choose between them. This also
could come with the ability to forward declare them.

Especially if they are not named, these resolution points should be able to
appear in functions, where the scoping rules can be used to select one.
However, this also means that stack-allocated functions can end up in the
vtable.

    extern trait combiner(struct summation) vtable sum;
    trait combiner(struct summation) vtable sum;

    extern trait combiner(struct summation) vtable sum default;
    trait combiner(struct summation) vtable sum default;

The extern difference is the same as before. The name (sum in the samples) is
used at the binding site to say which one is picked. The default keyword can
be used in only some of the declarations.

    trait combiner fee = (summation_instance, sum);
    trait combiner foe = summation_instance;

(I am not really happy about this syntax, but it kind of works.)
The object being bound is required. The name of the vtable is optional if
there is exactly one vtable name marked with default.

These could also be placed inside functions, in which case both the name and
the default keyword might be optional. If the name is omitted in an assignment,
the closest vtable is chosen (returning to the global default rule if no
appropriate local vtable is in scope).

### Site Based Resolution:
Every place in code where the binding of a vtable to an object occurs has
its own resolution. Syntax-wise this is the simplest, as it should be able
to use just the existing declarations and the conversion to trait object.
It also is very close to the current polymorphic resolution rules.

This works like the explicit resolution points, except the resolution points
are implicit and there would be no selection of which resolution to use. The
closest (current) resolution is always selected.

This could easily lead to an explosion of vtables: since it has the most
fine-grained resolution, the number of bindings in a single scope (that produce
the same binding) could be quite high. Merging identical vtables might help
reduce that.

Vtable Lifetime Issues
----------------------

Vtables interact badly with the thunk issue. Conceptually vtables are static,
like the type/function data they carry, as those decisions are made by the
resolver at compile time.

Stack-allocated functions interact badly with this because they are not
static. There are several ways to try to resolve this; however, without a
general solution, most of them can only keep vtables from making the existing
thunk problem worse. They don't do anything to solve it.

Filling in some fields of a static vtable could cause issues on a recursive
call. And then we are still limited by the lifetime of the stack functions, as
a vtable with stale pointers is still a problem.

Dynamically allocated vtables introduce memory management overhead and
require some way to differentiate between dynamically and statically allocated
tables. The stale function pointer problem continues unless those functions
become dynamically allocated as well, which gives us the same costs again.

Stack allocating the vtable seems like the best option. The vtable's lifetime
is now the limiting factor, but it should be effectively the same as the
shortest lifetime of a function assigned to it. However, this still limits the
lifetime "implicitly" and returns to the original problem with thunks.

Odds And Ends
-------------

In addition to the main design there are a few extras that should be
considered. They are not part of the core design but make the new uses fully
featured.

### Extension: Parent-Child Assertion
For hierarchy types in regular traits, generic functions or generic structures,
we may want to be able to check parent-child relationships between two given
types. For this we might have to add another primitive assertion. It would
have the following form if declared in code:

    trait is_parent_child(dtype Parent, dtype Child) { <built-in magic> }

This assertion is satisfied if Parent is an ancestor of Child in a hierarchy.
In other words, Child can be statically cast to Parent. The cast from Parent
to Child would be dynamically checked as usual.

However, in this form there are two concerns. The first is that Parent will
usually be consistent for a given use; it will not be a variable. The second
is that we may also need the assertion functions to do any casting or
conversions anyway.
TODO: Talk about when we wrap a concrete type and how that leads to "may".

To this end it may be better that the parent trait combines the usual
assertions plus this new primitive assertion. There may or may not be use
cases for accessing just one half, and providing easy access to them may be
required depending on how that turns out.

    trait Parent(dtype T | interface(T)) virtual(<grand-parent?>) { }

### Extension: sizeof Compatibility
Trait types are always sized; they may even be a fixed size, like how pointers
have the same size regardless of what they point at. However, their contents
may or may not be of a known size (depending on whether the `sized(...)`
assertion is used).

Currently there is no way to access this information. If it is needed, a
special syntax would have to be added. Here a special case of `sizeof` is
used.

    struct line aLine;
    trait drawable widget = aLine;

    size_t x = sizeof(widget);
    size_t y = sizeof(trait drawable);

As usual, `y`, the size of the type, is the size of the local storage used to
hold the value. The other case, `x`, reads the size stored in the virtual
table and returns that.
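One way this could work, sketched in C under the assumption that each vtable records the size of its underlying type: `sizeof` on the trait type gives the size of the fixed pair, while the special case on an object reads the stored size. `sizeof_contents` and the `size` field are hypothetical; a real implementation would translate `sizeof(widget)` into that load.

```c
#include <assert.h>
#include <stddef.h>

struct drawable_vtable {
    size_t size;                /* size of the underlying object */
    void (*draw)(void * self);  /* stand-in for the trait's assertions */
};

/* Fixed-size generic object, whatever it wraps. */
struct drawable {
    void * object;
    const struct drawable_vtable * vtable;
};

/* Special-case sizeof(widget): read the size saved in the vtable. */
size_t sizeof_contents(const struct drawable * widget) {
    assert(widget != NULL && widget->vtable != NULL);
    return widget->vtable->size;
}

struct line { int x0, y0, x1, y1; };

static void line_draw(void * self) {
    (void)self;  /* drawing omitted in this sketch */
}

static const struct drawable_vtable line_vtable = { sizeof(struct line), line_draw };
```

Here `sizeof(struct drawable)` plays the role of `y` and `sizeof_contents(&widget)` the role of `x`.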
