source: doc/theses/fangren_yu_MMath/content1.tex @ 9861ef2

Last change on this file since 9861ef2 was 7c80a86, checked in by Peter A. Buhr <pabuhr@…>, 4 weeks ago
proofread chapter 3
1	\chapter{\CFA Features and Type System Interactions}
2	\label{c:content1}
3
4	This chapter discusses \CFA features introduced over time by multiple people and their interactions with the type system.
5
6
7	\section{Reference Types}
8
9	Reference types were added to \CFA by Robert Schluntz and Aaron Moss~\cite{Moss18}.
10	The \CFA reference type generalizes the \CC reference type (and its equivalent in other modern programming languages) by providing both mutable and immutable forms and cascading referencing and dereferencing.
11	Specifically, \CFA attempts to extend programmer intuition about pointers to references.
12	That is, use a pointer when its primary purpose is manipulating the address of storage, \eg a top/head/tail pointer or link field in a mutable data structure.
13	Here, manipulating the pointer address is the primary operation, while dereferencing the pointer to its value is the secondary operation.
14	For example, \emph{within} a data structure, \eg stack or queue, all operations involve pointer addresses and the pointer may never be dereferenced because the referenced object is opaque.
15	Alternatively, use a reference when its primary purpose is to alias a value, \eg a function parameter that does not copy the argument (performance reason).
16	Here, manipulating the value is the primary operation, while changing the pointer address is the secondary operation.
17	Succinctly, if the address changes often, use a pointer;
18	if the value changes often, use a reference.
19	Java has mutable references but no pointers.
20	\CC has mutable pointers but immutable references;
21	hence, references match with functional programming.
22	However, the consequence is asymmetric semantics between pointers and references.
23	\CFA adopts a uniform policy between pointers and references where mutability is a separate property made at the declaration.
24
25	The following example shows how pointers and references are treated uniformly in \CFA.
26	\begin{cfa}[numbers=left,numberblanklines=false]
27	int x = 1, y = 2, z = 3;$\label{p:refexamples}$
28	int * p1 = &x, ** p2 = &p1, *** p3 = &p2, $\C{// pointers to x}$
29	@&@ r1 = x, @&&@ r2 = r1, @&&&@ r3 = r2; $\C{// references to x}$
30	int * p4 = &z, & r4 = z;
31
32	*p1 = 3; **p2 = 3; ***p3 = 3; $\C{// different ways to change x to 3}$
33	r1 = 3; r2 = 3; r3 = 3; $\C{// change x: implicit dereference *r1, **r2, ***r3}$
34	**p3 = &y; *p3 = &p4; $\C{// change p1, p2}$
35	// cancel implicit dereferences (&*)**r3, (&(&*)*)*r3, &(&*)r4
36	@&@r3 = @&@y; @&&@r3 = @&&@r4; $\C{// change r1, r2}$
37	\end{cfa}
38	Like pointers, references can be cascaded, \ie a reference to a reference, \eg @&& r2@.\footnote{
39	\CC uses \lstinline{&&} for rvalue reference, a feature for move semantics and handling the \lstinline{const} Hell problem.}
40	Usage of a reference variable automatically performs the same number of dereferences as the number of references in its declaration, \eg @r2@ becomes @**r2@.
41	Finally, reassigning a reference's address needs a mechanism to stop the auto-dereferencing, which is accomplished by using the reference operator @&@ to cancel the auto-dereferencing, \eg @&r3 = &y@ resets @r3@'s address to point to @y@.
42	\CFA's reference type (including multi-de/references) is powerful enough to describe the lvalue rules in C by types only.
43	As a result, the \CFA type checker now works on just types without using the notion of lvalue in an expression.
44	(\CFA internals still use lvalue for code generation purposes.)
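The cascading and cancellation rules can be pictured by lowering references to plain C pointers.
The following is only a sketch of one possible lowering, not the translator's actual output: the structure @refs@ and the three functions are hypothetical names introduced for illustration, and the rebinding case follows the listing comment above, where a single @&@ cancels one of the implicit dereferences.

```c
#include <assert.h>
#include <stddef.h>

/* Storage for a hypothetical lowering of the CFA declarations
 *   int x = 1, y = 2;
 *   int & r1 = x, && r2 = r1, &&& r3 = r2;            */
struct refs {
    int x, y;
    int *r1;      /* int & r1   */
    int **r2;     /* int && r2  */
    int ***r3;    /* int &&& r3 */
};

/* Initialization implicitly takes the address of each initializer. */
void refs_bind(struct refs *s) {
    assert(s != NULL);
    s->x = 1;
    s->y = 2;
    s->r1 = &s->x;
    s->r2 = &s->r1;
    s->r3 = &s->r2;
}

/* CFA "r3 = v": three implicit dereferences reach the int. */
void refs_store(struct refs *s, int v) {
    assert(s != NULL);
    ***s->r3 = v;
}

/* CFA "&r3 = &y": one dereference is cancelled, so r1 is rebound to y. */
void refs_rebind_to_y(struct refs *s) {
    assert(s != NULL);
    **s->r3 = &s->y;
}
```

After @refs_rebind_to_y@, a subsequent @refs_store@ writes to @y@ and leaves @x@ untouched, matching the comment ``change r1'' in the listing.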
45
46	The current reference typing rules in \CFA are summarized as follows:
47	\begin{enumerate}
48	\item For a variable $x$ with declared type $T$, the variable-expression $x$ has type reference to $T$, even if $T$ itself is a reference type.
49	\item For an expression $e$ with type $T\ \&_1...\&_n$, \ie $T$ followed by $n$ references, where $T$ is not a reference type, the expression $\&e$ (address of $e$) has type $T *$ followed by $n - 1$ references.
50	\item For an expression $e$ with type $T *\ \&_1...\&_n$, \ie $T *$ followed by $n$ references, the expression $* e$ (dereference $e$) has type $T$ followed by $n + 1$ references.
51	This rule is the reverse of the previous rule, such that address-of and dereference operators are perfect inverses.
52	\item When matching argument and parameter types at a function call, references on the argument type are stripped off to match the number of references on the parameter type.\footnote{
53	\CFA handles the \lstinline{const} Hell problem by allowing rvalue expressions to be converted to reference values by implicitly creating a temporary variable, with some restrictions.
54	As well, there is a warning that the output nature of the reference is lost.
55	Hence, a single function handles \lstinline{const} and non-\lstinline{const} as constness is handled at the call site.}
56	In an assignment context, the left-hand-side operand-type is always reduced to a single reference.
57	\end{enumerate}
58	Under this ruleset, a type parameter is never bound to a reference type in a function-call context.
59	\begin{cfa}
60	forall( T ) void f( T & );
61	int & x;
62	f( x ); // implicit dereference
63	\end{cfa}
64	The call applies an implicit dereference once to @x@ so the call is typed @f( int & )@ with @T = int@, rather than with @T = int &@.
65
66	As for a pointer type, a reference type may have qualifiers, where @const@ is most interesting.
67	\begin{cfa}
68	int x = 3; $\C{// mutable}$
69	const int cx = 5; $\C{// immutable}$
70	int * const cp = &x, $\C{// immutable pointer/reference}$
71	& const cr = x;
72	const int * const ccp = &cx, $\C{// immutable value and pointer/reference}$
73	& const ccr = cx;
74	\end{cfa}
75	\begin{cquote}
76	\setlength{\tabcolsep}{26pt}
77	\begin{tabular}{@{}lll@{}}
78	pointer & reference & \\
79	\begin{cfa}
80	*cp = 7;
81	cp = &x;
82	*ccp = 7;
83	ccp = &cx;
84	\end{cfa}
85	&
86	\begin{cfa}
87	cr = 7;
88	&cr = &x;
89	ccr = 7;
90	&ccr = &cx;
91	\end{cfa}
92	&
93	\begin{cfa}
94	// allowed
95	// error, assignment of read-only variable
96	// error, assignment of read-only location
97	// error, assignment of read-only variable
98	\end{cfa}
99	\end{tabular}
100	\end{cquote}
101	Interestingly, C does not give a warning/error if a @const@ pointer is not initialized, while \CC does.
102	Hence, type @& const@ is similar to a \CC reference, but \CFA does not preclude initialization with a non-variable address.
103	For example, in systems programming, there are cases where an immutable address is initialized to a specific memory location.
104	\begin{cfa}
105	int & const mem_map = *0xe45bbc67@p@; $\C{// hardware mapped registers ('p' for pointer)}$
106	\end{cfa}
107	Finally, qualification is generalized across all pointer/reference declarations.
108	\begin{cfa}
109	const * const * const * const ccccp = ...
110	const & const & const & const ccccr = ...
111	\end{cfa}
112
113	In the initial \CFA reference design, the goal was to make the reference type a \emph{real} data type \vs a restricted \CC reference, which is mostly used for choosing the argument-passing method, \ie by-value or by-reference.
114	However, there is an inherent ambiguity for auto-dereferencing: every argument expression involving a reference variable can potentially mean passing the reference's value or address.
115	Without any restrictions, this ambiguity limits the behaviour of reference types in \CFA polymorphic functions, where a type @T@ can bind to a reference or non-reference type.
116	This ambiguity prevents the type system from treating reference types the same way as other types, even if type variables could be bound to reference types.
117	The reason is that \CFA uses a common \emph{object trait}\label{p:objecttrait} (constructor, destructor and assignment operators) to handle passing dynamic concrete type arguments into polymorphic functions, and the reference types are handled differently in these contexts so they do not satisfy this common interface.
118
119	Moreover, there is also some discrepancy in how the reference types are treated in initialization and assignment expressions.
120	For example, in line 3 of the example code on \VPageref{p:refexamples}:
121	\begin{cfa}
122	int @&@ r1 = x, @&&@ r2 = r1, @&&&@ r3 = r2; $\C{// references to x}$
123	\end{cfa}
124	each initialization expression implicitly has its address taken to match the types, \eg @&x@, because an address is always required and a variable normally returns its value;
125	\CC performs the same implicit address-of when initializing its reference variables.
126	For lines 6 and 9 of the previous example code:
127	\begin{cfa}
128	r1 = 3; r2 = 3; r3 = 3; $\C{// change x: implicit dereference *r1, **r2, ***r3}$
129	@&@r3 = @&@y; @&&@r3 = @&&@r4; $\C{// change r1, r2}$
130	\end{cfa}
131	there are no actual assignment operators defined for reference types that can be overloaded;
132	instead, all reference assignments are handled by semantic actions in the type system.
133	In fact, the reassignment of reference variables is set up internally to use the assignment operators for pointer types.
134	Finally, there is an annoying issue (although purely syntactic) for setting a mutable reference to a specific address like null, @int & r1 = *0p@, which looks like dereferencing a null pointer.
135	Here, the expression is rewritten as @int & r1 = &(*0p)@, like the implicit address-of applied to @x@ above.
136	However, the implicit @&@ needs to be cancelled for an address, which is done with the @*@, \ie @&*@ cancel each other, giving @0p@.
137	Therefore, the dereferencing operation does not actually happen and the expression is translated into directly initializing the reference variable with the address.
138	Note, the same explicit dereference is used in \CC to set a reference variable to null.
139	\begin{c++}
140	int & ip = @*@(int *)nullptr;
141	\end{c++}
142	which is used in certain systems-programming situations.
143
144	When generic types were introduced to \CFA~\cite{Moss19}, some thought was given to allowing reference types as type arguments.
145	\begin{cfa}
146	forall( T ) struct vector { T t; }; $\C{// generic type}$
147	vector( int @&@ ) vec; $\C{// vector of references to ints}$
148	\end{cfa}
149	While it is possible to write a reference type as the argument to a generic type, it is disallowed in assertion checking, if the generic type requires the object trait \see{\VPageref{p:objecttrait}} for the type argument, a fairly common use case.
150	Even if the object trait can be made optional, the current type system often misbehaves by adding undesirable auto-dereference on the referenced-to value rather than the reference variable itself, as intended.
151	Some tweaks are necessary to accommodate reference types in polymorphic contexts and it is unclear what can or cannot be achieved.
152	Currently, there are contexts where a \CFA programmer is forced to use a pointer type, giving up the benefits of auto-dereference operations and better syntax with reference types.
153
154
155	\section{Tuple Types}
156
157	The addition of tuples to \CFA can be traced back to the original design by David Till in \mbox{K-W C}~\cite{Till89,Buhr94a}, a predecessor project of \CFA.
158	The primary purpose of tuples is to eliminate output parameters or the creation of an aggregate type to return multiple values from a function, called a multiple-value-returning (MVR) function.
159	Traditionally, returning multiple values is accomplished via (in/)output parameters or packing the results in a structure.
160	The following examples show these two techniques for a function returning three values.
161	\begin{cquote}
162	\begin{tabular}{@{}l@{\hspace{20pt}}l@{}}
163	\begin{cfa}
164
165	int foo( int &p2, int &p3 ); // in/out parameters
166	int x, y = 3, z = 4;
167	x = foo( y, z ); // return 3 values
168	\end{cfa}
169	&
170	\begin{cfa}
171	struct Ret { int x, y, z; };
172	Ret foo( int p2, int p3 ); // multiple return values
173	Ret ret = { .y = 3, .z = 4 };
174	ret = foo( ret.y, ret.z ); // return 3 values
175	\end{cfa}
176	\end{tabular}
177	\end{cquote}
178	K-W C allows direct return of multiple values into a tuple.
179	\begin{cfa}
180	@[int, int, int]@ foo( int p2, int p3 );
181	@[x, y, z]@ = foo( y, z ); // return 3 values into a tuple
182	\end{cfa}
183	Along with making returning multiple values a first-class feature, tuples were extended to simplify a number of other common contexts that normally require multiple statements and/or additional declarations, all of which reduce coding time and errors.
184	\begin{cfa}
185	[x, y, z] = 3; $\C[2in]{// x = 3; y = 3; z = 3, where types may be different}$
186	[x, y] = [y, x]; $\C{// int tmp = x; x = y; y = tmp;}$
187	void bar( int, int, int );
188	bar( foo( 3, 4 ) ); $\C{// int t0, t1, t2; [t0, t1, t2] = foo( 3, 4 ); bar( t0, t1, t2 );}$
189	x = foo( 3, 4 )@.1@; $\C{// int t0, t1, t2; [t0, t1, t2] = foo( 3, 4 ); x = t1;}\CRT$
190	\end{cfa}
191	For the call to @bar@, the three results (tuple value) from @foo@ are \newterm{flattened} into individual arguments.
192	Flattening is how tuples interact with parameter and subscript lists, and with other tuples, \eg:
193	\begin{cfa}
194	[ [ x, y ], z, [a, b, c] ] = [2, [3, 4], foo( 3, 4) ] $\C{// structured}$
195	[ x, y, z, a, b, c] = [2, 3, 4, foo( 3, 4) ] $\C{// flattened, where foo results are t0, t1, t2}$
196	\end{cfa}
197	Note, in most cases, a tuple is just compile-time syntactic-sugar for a number of individual assignment statements and possibly temporary variables.
198	Only when returning a tuple from a function is there the notion of a tuple value.
199
200	Overloading in the \CFA type-system must support complex composition of tuples and C type conversions using a costing scheme giving lower cost to widening conversions that do not truncate a value.
201	\begin{cfa}
202	[ int, int ] foo$\(_1\)$( int ); $\C{// overloaded foo functions}$
203	[ double ] foo$\(_2\)$( int );
204	void bar( int, double, double );
205	bar( @foo@( 3 ), @foo@( 3 ) );
206	\end{cfa}
207	The type resolver only has the tuple return types to resolve the call to @bar@ as the @foo@ parameters are identical, which involves unifying the flattened @foo@ return values with @bar@'s parameter list.
208	However, no combination of @foo@s is an exact match with @bar@'s parameters;
209	thus, the resolver applies C conversions to obtain a best match.
210	The resulting minimal cost expression is @bar( foo@$_1$@( 3 ), foo@$_2$@( 3 ) )@, where the two possible conversions are (@int@, {\color{red}@int@}, @double@) to (@int@, {\color{red}@double@}, @double@) with a safe (widening) conversion from @int@ to @double@ versus ({\color{red}@double@}, {\color{red}@int@}, {\color{red}@int@}) to ({\color{red}@int@}, {\color{red}@double@}, {\color{red}@double@}) with one unsafe (narrowing) conversion from @double@ to @int@ and two safe conversions from @int@ to @double@.
211	The programming language Go provides a similar but simpler tuple mechanism, as it does not have overloaded functions.
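The cost comparison for the @bar@ call can be sketched in C.
This is a deliberately simplified model and not the \CFA resolver: it assumes only two types, counts one unit per safe or unsafe conversion, and orders candidates by unsafe count first and safe count second.
The names @type@, @cost@, @conversion_cost@ and @cost_less@ are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum type { T_INT, T_DOUBLE };             /* the only types in this sketch */
struct cost { int unsafe; int safe; };     /* conversion counts for one candidate */

/* Count the conversions needed to pass flattened argument types to parameters. */
struct cost conversion_cost(const enum type *args, const enum type *params, size_t n) {
    assert(args != NULL && params != NULL);
    struct cost c = { 0, 0 };
    for (size_t i = 0; i < n; i += 1) {
        if (args[i] == params[i]) continue;            /* exact match: no cost */
        if (args[i] == T_INT && params[i] == T_DOUBLE)
            c.safe += 1;                               /* widening int -> double */
        else
            c.unsafe += 1;                             /* narrowing double -> int */
    }
    return c;
}

/* Assumed ordering: fewer unsafe conversions wins, then fewer safe conversions. */
int cost_less(struct cost a, struct cost b) {
    if (a.unsafe != b.unsafe) return a.unsafe < b.unsafe;
    return a.safe < b.safe;
}
```

Under this model, the flattened types (@int@, @int@, @double@) cost zero unsafe and one safe conversion against @bar@'s parameters, while (@double@, @int@, @int@) costs one unsafe and two safe conversions, so the first candidate is selected.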
212
213	K-W C also supported tuple variables, but with a strong distinction between tuples and tuple values/variables.
214	\begin{quote}
215	Note that tuple variables are not themselves tuples.
216	Tuple variables reference contiguous areas of storage, in which tuple values are stored;
217	tuple variables and tuple values are entities which appear during program execution.
218	Tuples, on the other hand, are compile-time constructs;
219	they are lists of expressions, whose values may not be stored contiguously.~\cite[p.~30]{Till89}
220	\end{quote}
221	Fundamentally, a tuple value/variable is just a structure (contiguous areas) with an alternate (tuple) interface.
222	A tuple value/variable is assignable (like structures), its fields can be accessed using position rather than name qualification, and it can interact with regular tuples.
223	\begin{cfa}
224	[ int, int, int ] t1, t2;
225	t1 = t2; $\C{// tuple assignment}$
226	t1@.1@ = t2@.0@; $\C{// position qualification}$
227	int x, y, z;
228	t1 = [ x, y, z ]; $\C{// interact with regular tuples}$
229	[ x, y, z ] = t1;
230	bar( t2 ); $\C{// bar defined above}$
231	\end{cfa}
232	\VRef[Figure]{f:Nesting} shows the difference in nesting of structures and tuples.
233	The left \CC nested-structure is named so it is not flattened.
234	The middle C/\CC nested-structure is unnamed and flattened, causing an error because @i@ and @j@ are duplicate names.
235	The right \CFA nested tuple cannot be named and is flattened.
236	C allows named nested-structures, but they have issues \see{\VRef{s:inlineSubstructure}}.
237	Note, it is common in C to have an unnamed @union@ so its fields do not require qualification.
238
239	\begin{figure}
240	\setlength{\tabcolsep}{20pt}
241	\begin{tabular}{@{}ll@{\hspace{90pt}}l@{}}
242	\multicolumn{1}{c}{\CC} & \multicolumn{1}{c}{C/\CC} & \multicolumn{1}{c}{tuple} \\
243	\begin{cfa}
244	struct S {
245	struct @T@ { // not flattened
246	int i, j;
247	};
248	int i, j;
249	};
250	\end{cfa}
251	&
252	\begin{cfa}
253	struct S2 {
254	struct ${\color{red}/* unnamed */}$ { // flatten
255	int i, j;
256	};
257	int i, j;
258	};
259	\end{cfa}
260	&
261	\begin{cfa}
262	[
263	[ // flatten
264	1, 2
265	],
266	1, 2
267	]
268	\end{cfa}
269	\end{tabular}
270	\caption{Nesting}
271	\label{f:Nesting}
272	\end{figure}
273
274	The primary issues for tuples in the \CFA type system are polymorphism and conversions.
275	Specifically, does it make sense to have a generic (polymorphic) tuple type, as is possible for a structure?
276	\begin{cfa}
277	forall( T, S ) [ T, S ] GT; // polymorphic tuple type
278	GT( int, char ) @gt@;
279	GT( int, double ) @gt@;
280	@gt@ = [ 3, 'a' ]; // select correct gt
281	@gt@ = [ 3, 3.5 ];
282	\end{cfa}
283	and what is the cost model for C conversions across multiple values?
284	\begin{cfa}
285	gt = [ 'a', 3L ]; // select correct gt
286	\end{cfa}
287
288
289	\section{Tuple Implementation}
290
291	As noted, traditional languages manipulate multiple values by in/out parameters and/or structures.
292	K-W C adopted the structure for tuple values or variables, and as needed, the fields are extracted by field access operations.
293	As well, for the tuple-assignment implementation, the left-hand tuple expression is expanded into assignments of each component, creating temporary variables to avoid unexpected side effects.
294	For example, the tuple value returned from @foo@ is a structure, and its fields are individually assigned to a left-hand tuple, @x@, @y@, @z@, \emph{or} copied directly into a corresponding tuple variable.
295
296	In the second implementation of \CFA tuples by Rodolfo Gabriel Esteves~\cite{Esteves04}, a different strategy is taken to handle MVR functions.
297	The return values are converted to output parameters passed in by pointers.
298	When the return values of an MVR function are directly used in an assignment expression, the addresses of the left-hand operands can be directly passed into the function;
299	composition of MVR functions is handled by creating temporaries for the returns.
300	For example, given a function returning two values:
301	\begin{cfa}
302	[int, int] gives_two() { int r1, r2; ... return [ r1, r2 ]; }
303	int x, y;
304	[x, y] = gives_two();
305	\end{cfa}
306	\VRef[Figure]{f:AlternateTupleImplementation} shows the two implementation approaches.
307	In the left approach, the return statement is rewritten to pack the return values into a structure, which is returned by value, and the structure fields are individually assigned to the left-hand side of the assignment.
308	In the right approach, the return statement is rewritten as direct assignments into the passed-in argument addresses.
309	The right implementation looks more concise and saves unnecessary copying.
310	The downside is indirection within @gives_two@ to access values, unless values get hoisted into registers for some period of time, which is common.
311
312	\begin{figure}
313	\begin{cquote}
314	\setlength{\tabcolsep}{20pt}
315	\begin{tabular}{@{}ll@{}}
316	Till K-W C implementation & Esteves \CFA implementation \\
317	\begin{cfa}
318	struct _tuple2 { int _0; int _1; };
319	struct _tuple2 gives_two() {
320	... struct _tuple2 ret = { r1, r2 };
321	return ret;
322	}
323	int x, y;
324	struct _tuple2 _tmp = gives_two();
325	x = _tmp._0; y = _tmp._1;
326	\end{cfa}
327	&
328	\begin{cfa}
329
330	void gives_two( int * r1, int * r2 ) {
331	... *r1 = ...; *r2 = ...;
332	return;
333	}
334	int x, y;
335
336	gives_two( &x, &y );
337	\end{cfa}
338	\end{tabular}
339	\end{cquote}
340	\caption{Alternate Tuple Implementation}
341	\label{f:AlternateTupleImplementation}
342	\end{figure}
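The out-parameter strategy also covers composition of MVR functions, which the figure does not show.
The sketch below assumes a hypothetical MVR function @foo_out@ returning three values and a hypothetical consumer @sum3@; neither is taken from the \CFA translator.
Direct assignment passes the left-hand addresses, while composition first creates temporaries and then flattens them into the outer call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical MVR function [int, int, int] foo( int, int ) in out-parameter form. */
void foo_out(int p2, int p3, int *r0, int *r1, int *r2) {
    assert(r0 != NULL && r1 != NULL && r2 != NULL);
    *r0 = p2 + p3;    /* arbitrary results, chosen only for illustration */
    *r1 = p2;
    *r2 = p3;
}

/* Hypothetical consumer taking three flattened values. */
int sum3(int a, int b, int c) { return a + b + c; }

/* [x, y, z] = foo( 3, 4 ): the left-hand addresses are passed directly. */
void assign_from_foo(int *x, int *y, int *z) {
    foo_out(3, 4, x, y, z);
}

/* sum3( foo( 3, 4 ) ): temporaries receive the results, then are flattened. */
int sum3_of_foo(void) {
    int t0, t1, t2;
    foo_out(3, 4, &t0, &t1, &t2);
    return sum3(t0, t1, t2);
}
```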
343
344	Interestingly, in the third implementation of \CFA tuples by Robert Schluntz~\cite[\S~3]{Schluntz17}, MVR functions revert to the structure-based approach, which remains in the current version of \CFA.
345	The reason for the reversion was to have a uniform approach for tuple values/variables making tuples first-class types in \CFA, \ie allow tuples with corresponding tuple variables.
346	This extension was possible, because in parallel with Schluntz's work, generic types were added independently by Moss~\cite{Moss19}, and the tuple variables leveraged the same implementation techniques as the generic variables.
347	\PAB{I'm not sure about the connection here. Do you have an example of what you mean?}
348
349	However, after experience gained building the \CFA runtime system, making tuple-types first-class seems to add little benefit.
350	The main reason is that tuple usages are largely unstructured,
351	\begin{cfa}
352	[int, int] foo( int, int ); $\C[2in]{// unstructured function}$
353	typedef [int, int] Pair; $\C{// tuple type}$
354	Pair bar( Pair ); $\C{// structured function}$
355	int x = 3, y = 4;
356	[x, y] = foo( x, y ); $\C{// unstructured call}$
357	Pair ret = [3, 4]; $\C{// tuple variable}$
358	ret = bar( ret ); $\C{// structured call}\CRT$
359	\end{cfa}
360	Basically, creating the tuple-type @Pair@ is largely equivalent to creating a @struct@ type, and creating more types and names defeats the simplicity that tuples are trying to achieve.
361	Furthermore, since operator overloading in \CFA is implemented by treating operators as overloadable functions, tuple types are very rarely used in a structured way.
362	When a tuple-type expression appears in a function call (except assignment expressions, which are handled differently by mass- or multiple-assignment expansions), it is always flattened, and the tuple structure of a function parameter is not considered part of the function signature.
363	For example,
364	\begin{cfa}
365	void f( int, int );
366	void f( @[@ int, int @]@ );
367	f( 3, 4 ); // ambiguous call
368	\end{cfa}
369	the two prototypes for @f@ have the same signature (a function taking two @int@s and returning nothing), and are therefore invalid overloads.
370	Note, the ambiguity error occurs at the call rather than at the second declaration of @f@, because it is possible to have multiple equivalent prototype definitions of a function.
371	Furthermore, ordinary polymorphic type-parameters are not allowed to have tuple types.
372	\begin{cfa}
373	forall( T ) T foo( T );
374	int x, y, z;
375	[x, y, z] = foo( [x, y, z] ); // substitute tuple type for T
376	\end{cfa}
377	Without this restriction, the expression resolution algorithm can create too many argument-parameter matching options.
378	For example, with multiple type parameters,
379	\begin{cfa}
380	forall( T, U ) void f( T, U );
381	f( [1, 2, 3, 4] );
382	\end{cfa}
383	the call to @f@ can be interpreted as @T = [1]@ and @U = [2, 3, 4]@, or @T = [1, 2]@ and @U = [3, 4]@, and so on.
384	The restriction ensures type checking remains tractable and does not take too long to compute.
385	Therefore, tuple types are never present in any fixed-argument function calls, because of the flattening.
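
As a rough count of the matching options, assume (for illustration only, not as a statement of the resolver's actual rules) that each type parameter binds to a non-empty, contiguous run of the flattened argument values.
Splitting $n$ flattened values among $k$ type parameters then means choosing $k - 1$ cut points among the $n - 1$ gaps between values, giving
\[
\binom{n-1}{k-1}
\]
candidate bindings for a single call.
For @f( [1, 2, 3, 4] )@ with $n = 4$ and $k = 2$, this is $\binom{3}{1} = 3$ bindings, while $n = 10$ and $k = 4$ already gives $\binom{9}{3} = 84$, each of which the resolver would have to consider.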
386
387	Finally, a type-safe variadic argument signature was added by Robert Schluntz~\cite[\S~4.1.2]{Schluntz17} using @forall@ and a new tuple parameter-type, denoted by the keyword @ttype@ in Schluntz's implementation, but changed to the ellipsis syntax similar to \CC's template parameter pack.
388	For C variadics, \eg @va_list@, the number and types of the arguments must be conveyed in some way, \eg @printf@ uses a format string indicating the number and types of the arguments.
389	\VRef[Figure]{f:CVariadicMaxFunction} shows an $N$ argument @maxd@ function using the C untyped @va_list@ interface.
390	In the example, the first argument is the number of following arguments, and the following arguments are assumed to be @double@;
391	looping is used to traverse the argument pack from left to right.
392	The @va_list@ interface is walking up the stack (by address) looking at the arguments pushed by the caller.
393	(Magic knowledge is needed for arguments pushed using registers.)
394
395	\begin{figure}
396	\begin{cfa}
397	double maxd( int @count@, @...@ ) { // ellipsis parameter
398	double max = 0;
399	va_list args;
400	va_start( args, count );
401	for ( int i = 0; i < count; i += 1 ) {
402	double num = va_arg( args, double );
403	if ( num > max ) max = num;
404	}
405	va_end(args);
406	return max;
407	}
408	printf( "%g\n", maxd( @4@, 25.0, 27.3, 26.9, 25.7 ) );
409	\end{cfa}
410	\caption{C Variadic Maximum Function}
411	\label{f:CVariadicMaxFunction}
412	\end{figure}
413
414	There are two common patterns for using the variadic functions in \CFA.
415	\begin{enumerate}[leftmargin=*]
416	\item
417	Argument forwarding to another function, \eg:
418	\begin{cfa}
419	forall( T *, TT ... | { @void ?{}( T &, TT );@ } ) // constructor assertion
420	T * new( TT tp ) { return ((T *)malloc())@{@ tp @}@; } // call constructor on storage
421	\end{cfa}
422	Note, the assertion on @T@ requires it to have a constructor @?{}@.
423	The function @new@ calls @malloc@ to obtain storage and then invokes @T@'s constructor passing the tuple pack by flattening it over the constructor's arguments, \eg:
424	\begin{cfa}
425	struct S { int i, j; };
426	void ?{}( S & s, int i, int j ) { s.[ i, j ] = [ i, j ]; } // constructor
427	S * sp = new( 3, 4 ); // flatten [3, 4] into call ?{}( 3, 4 );
428	\end{cfa}
429	\item
430	Structural recursion for processing the argument-pack values one at a time, \eg:
431	\begin{cfa}
432	forall( T | { int ?>?( T, T ); } )
433	T max( T v1, T v2 ) { return v1 > v2 ? v1 : v2; }
434	$\vspace{-10pt}$
435	forall( T, TT ... | { T max( T, T ); T max( TT ); } )
436	T max( T arg, TT args ) { return max( arg, max( args ) ); }
437	\end{cfa}
438	The first non-recursive @max@ function is the polymorphic base-case for the recursion, \ie, find the maximum of two identically typed values with a greater-than (@>@) operator.
439	The second recursive @max@ function takes two parameters, a @T@ and a @TT@ tuple pack, handling all argument lengths greater than two.
440	The recursive function computes the maximum for the first argument and the maximum value of the rest of the tuple pack.
441	The call of @max@ with one argument is the recursive call, where the tuple pack is converted into two arguments by taking the first value (lisp @car@) from the tuple pack as the first argument (flattening) and the remaining pack becomes the second argument (lisp @cdr@).
442	The recursion stops when the argument pack is empty.
443	For example, @max( 2, 3, 4 )@ matches with the recursive function, which performs @return max( 2, max( [3, 4] ) )@ and one more step yields @return max( 2, max( 3, 4 ) )@, so the tuple pack is empty.
444	\end{enumerate}
445
446	As an aside, polymorphic functions are precise with respect to their parameter types, \eg @max@ states all argument values must be the same type, which logically makes sense.
447	However, this precision precludes normal C conversions among the base types, \eg, this mixed-mode call @max( 2h, 2l, 3.0f, 3.0ld )@ fails because the types are not the same.
448	Unfortunately, this failure violates programmer intuition because there are specialized two-argument non-polymorphic versions of @max@ that work, \eg @max( 3, 3.5 )@.
449	Allowing C conversions for polymorphic types will require a significant change to the type resolver.
450
451	Currently in \CFA, variadic polymorphic functions are the only place tuple types are used.
452	And because \CFA compiles polymorphic functions versus template expansion, many wrapper functions are generated to implement both user-defined generic-types and polymorphism with variadics.
453	Fortunately, the only permitted operations on polymorphic function parameters are given by the list of assertion (trait) functions.
454	Nevertheless, this small set of functions eventually needs to be called with flattened tuple arguments.
455	Unfortunately, packing the variadic arguments into a rigid @struct@ type and generating all the required wrapper functions is significant work and largely wasted because most are never called.
456	Interested readers can refer to pages 77-80 of Robert Schluntz's thesis to see how verbose the translator output is to implement a simple variadic call with 3 arguments.
457	As the number of arguments increases, \eg a call with 5 arguments, the translator generates concrete @struct@ types for a 4-tuple and a 3-tuple along with all the polymorphic type data for them.
458	An alternative approach is to put the variadic arguments into an array, along with an offset array to retrieve each individual argument.
459	This method is similar to how the C @va_list@ object is used (and how \CFA accesses polymorphic fields in a generic type), but the \CFA variadics generate the required type information to guarantee type safety.
460	For example, given the following heterogeneous, variadic, typed @print@ and usage.
461	\begin{cquote}
462	\begin{tabular}{@{}ll@{}}
463	\begin{cfa}
464	forall( T, TT ... | { void print( T ); void print( TT ); } )
465	void print( T arg , TT rest ) {
466	print( arg );
467	print( rest );
468	}
469	\end{cfa}
470	&
471	\begin{cfa}
472	void print( int i ) { printf( "%d ", i ); }
473	void print( double d ) { printf( "%g ", d ); }
474	... // other types
475	int i = 3 ; double d = 3.5;
476	print( i, d );
477	\end{cfa}
478	\end{tabular}
479	\end{cquote}
480	it would look like the following using the offset-array implementation approach.
481	\begin{cfa}
482	void print( T arg, char * _data_rest, size_t * _offset_rest ) {
483	print( arg );
484	print( *((typeof rest.0)*) _data_rest, $\C{// first element of rest}$
485	_data_rest + _offset_rest[0], $\C{// remainder of data}$
486	_offset_rest + 1); $\C{// remainder of offset array}$
487	}
488	\end{cfa}
489	where the fixed-arg polymorphism for @T@ can be handled by the standard @void *@-based \CFA polymorphic calling conventions, and the type information can all be deduced at the call site.
490	Note, the variadic @print@ supports heterogeneous types because the polymorphic @T@ is not returned (unlike variadic @max@), so there is no cascade of type relationships.
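The offset-array idea can be tried out in plain C.
The sketch below is an assumption-laden illustration, not generated translator code: the per-element type information is supplied by the caller as an array of hypothetical formatter functions (@format_fn@), the results are written to a buffer rather than printed so they can be checked, and elements are copied out with @memcpy@ because the packed data may be unaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-element formatter: writes one element, returns snprintf's result. */
typedef int (*format_fn)(char *out, size_t cap, const void *elem);

int format_int(char *out, size_t cap, const void *elem) {
    int v;
    memcpy(&v, elem, sizeof v);           /* packed data may be unaligned */
    return snprintf(out, cap, "%d ", v);
}

int format_double(char *out, size_t cap, const void *elem) {
    double v;
    memcpy(&v, elem, sizeof v);
    return snprintf(out, cap, "%g ", v);
}

/* Walk a packed argument array: offsets[i] is the distance from element i to
 * element i + 1, and fmt[i] carries the type information deduced at the call site.
 * Returns the number of characters written to out. */
size_t format_pack(char *out, size_t cap, const char *data,
                   const size_t *offsets, const format_fn *fmt, size_t n) {
    assert(out != NULL && cap > 0);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i += 1) {
        int w = fmt[i](out + used, cap - used, data);
        if (w < 0 || (size_t)w >= cap - used) break;   /* error or truncated output */
        used += (size_t)w;
        data += offsets[i];                            /* step to the next element */
    }
    return used;
}
```

A caller corresponding to @print( i, d )@ would copy @i@ and @d@ into a byte array, set the offsets to @sizeof(int)@ and @sizeof(double)@, and pass the formatters @format_int@ and @format_double@; no @struct@ type or wrapper function is generated per argument combination.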
491
492	Turning tuples into first-class values in \CFA does have a few benefits, namely allowing pointers to tuples and arrays of tuples to exist.
493	However, it seems unlikely that these types have realistic use cases that cannot be achieved without them.
494	And having a pointer-to-tuple type potentially forbids the simple offset-array implementation of variadic polymorphism.
495	For example, in the case where a type assertion requests the pointer type @TT *@ in the above example, it forces the tuple type to be a @struct@, thus incurring a high cost.
496	My conclusion is that tuples should not be structured (first-class), rather they should be unstructured.
497	This agrees with Esteves's original description:
498	\begin{quote}
499	As such, their [tuples] use does not enforce a particular memory layout, and in particular, does not guarantee that the components of a tuple occupy a contiguous region of memory.~\cite[pp.~74--75]{Esteves04}
500	\end{quote}
501	allowing the simplified implementations for MVR and variadic functions.
502
503	Finally, a missing feature for tuples is using an MVR in an initialization context.
504	Currently, this feature is \emph{only} possible when declaring a tuple variable.
505	\begin{cfa}
506	[int, int] ret = gives_two(); $\C{// no constructor call (unstructured)}$
507	Pair ret = gives_two(); $\C{// constructor call (structured)}$
508	\end{cfa}
509	However, this forces the programmer to use a tuple variable and possibly a tuple type to support a constructor, when they actually want separate variables with separate constructors.
510	And as stated previously, tuple variables (structured tuples) are rare in general \CFA programming so far.
511	To address this issue, while retaining the ability to leverage constructors, the following new tuple-like declaration syntax is proposed.
512	\begin{cfa}
513	[ int x, int y ] = gives_two();
514	\end{cfa}
515	where the semantics is:
516	\begin{cfa}
517	T t0, t1;
518	[ t0, t1 ] = gives_two();
519	T x = t0; // constructor call
520	T y = t1; // constructor call
521	\end{cfa}
522	and the implementation performs as much copy elision as possible.
523
524
525	\section{\lstinline{inline} Substructure}
526	\label{s:inlineSubstructure}
527
528	As mentioned \see{\VRef[Figure]{f:Nesting}}, C allows an anonymous aggregate type (@struct@ or @union@) to be embedded (nested) within another one, \eg a tagged union.
529	\begin{cfa}
530	struct S {
531	unsigned int tag;
532	union { $\C{// anonymous nested aggregate}$
533	int x; double y; char z;
534	};
535	} s;
536	\end{cfa}
537	The @union@ field-names are hoisted into the @struct@, so there is direct access, \eg @s.x@;
538	hence, field names must be unique.
539	For a nested anonymous @struct@, both field names and values are hoisted.
540	\begin{cquote}
541	\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
542	original & rewritten \\
543	\begin{cfa}
544	struct S {
545	struct { int i, j; };
546	struct { int k, l; };
547	};
548	\end{cfa}
549	&
550	\begin{cfa}
551	struct S {
552	int i, j;
553	int k, l;
554	};
555	\end{cfa}
556	\end{tabular}
557	\end{cquote}
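The hoisting of anonymous aggregates can be checked with a short C11 program.
This is a minimal sketch; the type names @Tagged@, @Pairs@ and @Flat@ and the two functions are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Tagged {
    unsigned int tag;
    union { int x; double y; char z; };    /* anonymous: x, y, z are hoisted */
};

struct Pairs {                             /* the "original" form */
    struct { int i, j; };
    struct { int k, l; };
};

struct Flat {                              /* the "rewritten" form */
    int i, j;
    int k, l;
};

/* Access a union member directly through the enclosing struct. */
int tagged_value( void ) {
    struct Tagged t;
    t.tag = 1;
    t.x = 42;                              /* no intermediate member name */
    return t.x;
}

/* Fields of both anonymous structs are reachable without qualification. */
int hoisted_sum( void ) {
    struct Pairs p;
    p.i = 1; p.j = 2; p.k = 3; p.l = 4;
    return p.i + p.j + p.k + p.l;
}
```

On typical ABIs, @offsetof( struct Pairs, k )@ equals @offsetof( struct Flat, k )@, which is the sense in which the values as well as the names are hoisted.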
558
559	As an aside, C nested \emph{named} aggregates behave in a (mysterious) way because the nesting is allowed but there is no ability to use qualification to access an inner type, like the \CC type operator `@::@'.
560	\emph{In fact, all named nested aggregates are hoisted to global scope, regardless of the nesting depth.}
561	\begin{cquote}
562	\begin{tabular}{@{}l@{\hspace{35pt}}l@{}}
563	original & rewritten \\
564	\begin{cfa}
565	struct S {
566	struct T {
567	int i, j;
568	};
569	struct U {
570	int k, l;
571	};
572	};
573	\end{cfa}
574	&
575	\begin{cfa}
576	struct T {
577	int i, j;
578	};
579	struct U {
580	int k, l;
581	};
582	struct S {
583	};
584	\end{cfa}
585	\end{tabular}
586	\end{cquote}
587	Hence, the possible accesses are:
588	\begin{cfa}
589	struct S s; // s cannot access any fields
590	struct T t; t.i; t.j;
591	struct U u; u.k; u.l;
592	\end{cfa}
593	and the hoisted type names can clash with global type names.
594	For good reasons, \CC chose to change this semantics:
595	\begin{cquote}
596	\begin{description}[leftmargin=*,topsep=0pt,itemsep=0pt,parsep=0pt]
597	\item[Change:] A struct is a scope in C++, not in C.
598	\item[Rationale:] Class scope is crucial to C++, and a struct is a class.
599	\item[Effect on original feature:] Change to semantics of well-defined feature.
600	\item[Difficulty of converting:] Semantic transformation.
601	\item[How widely used:] C programs use @struct@ extremely frequently, but the change is only noticeable when @struct@, enumeration, or enumerator names are referred to outside the @struct@.
602	The latter is probably rare.
603	\end{description}
604	\hfill ISO/IEC 14882:1998 (\CC Programming Language Standard)~\cite[C.1.2.3.3]{ANSI98:C++}
605	\end{cquote}
606	However, there is no syntax to access from a variable through a type to a field.
607	\begin{cfa}
608	struct S s; @s::T@.i; @s::U@.k;
609	\end{cfa}
610	\CFA chose to adopt the \CC non-compatible change for nested types, since \CC's change has already forced certain coding changes in C libraries that must be parsed by \CC.
611	\CFA also added the ability to access from a variable through a type to a field.
612	\begin{cfa}
613	struct S s; @s.T@.i; @s.U@.k;
614	\end{cfa}
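For comparison, the C hoisting rule for named nested aggregates described earlier in this section can be observed directly.
The following sketch is valid C but is rejected by a \CC compiler, because there @T@ and @U@ are scoped inside @S@.
The member names @t@ and @u@, the file-scope variables and the function @sum_hoisted@ are hypothetical additions; the members are added so that @struct S@ is not empty.

```c
#include <assert.h>

struct S {
    struct T { int i, j; } t;   /* named nested type, with a member of that type */
    struct U { int k, l; } u;
};

/* In C, the tags T and U are visible at file scope, outside of S. */
struct T file_scope_t = { 1, 2 };
struct U file_scope_u = { 3, 4 };

int sum_hoisted( void ) {
    return file_scope_t.i + file_scope_t.j + file_scope_u.k + file_scope_u.l;
}
```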
615
616	% https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
617
618	A polymorphic extension to nested aggregates appears in the Plan-9 C dialect, used in the Bell Labs' Plan-9 research operating system.
619	The feature is called \newterm{unnamed substructures}~\cite[\S~3.3]{Thompson90new}, which continues to be supported by @gcc@ and @clang@ using the extension (@-fplan9-extensions@).
620	The goal is to provide the same effect as a nested aggregate, but with the aggregate type defined elsewhere, which requires that it be named.
621	\begin{cfa}
622	union U { $\C{// unnested named}$
623	int x; double y; char z;
624	} u;
625	struct W {
626	int i; double j; char k;
627	} w;
628	struct S {
629	@struct W;@ $\C{// Plan-9 substructure}$
630	unsigned int tag;
631	@union U;@ $\C{// Plan-9 substructure}$
632	} s;
633	\end{cfa}
634	Note, the position of the substructure is normally unimportant, unless there is some form of memory or @union@ overlay.
635	Like an anonymous nested type, a named nested Plan-9 type has its field names hoisted into @struct S@, so there is direct access, \eg @s.x@ and @s.i@.
636	Hence, the field names must be unique, unlike \CC nested types, but the type names are at a nested scope level, unlike type nesting in C.
637	In addition, a pointer to a structure is automatically converted to a pointer to an anonymous field for assignments and function calls, providing containment inheritance with implicit subtyping, \ie @U@ $\subset$ @S@ and @W@ $\subset$ @S@, \eg:
638	\begin{cfa}
639	void f( union U * u );
640	void g( struct W * );
641	union U * up; struct W * wp; struct S * sp;
642	up = &s; $\C{// assign pointer to U in S}$
643	wp = &s; $\C{// assign pointer to W in S}$
644	f( &s ); $\C{// pass pointer to U in S}$
645	g( &s ); $\C{// pass pointer to W in S}$
646	\end{cfa}
647	Note, there is no value assignment, such as, @w = s@, to copy the @W@ field from @S@.
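The implicit pointer conversions can be understood as taking the address of the embedded member.
The following standard-C sketch makes that step explicit; it assumes the substructures are given the hypothetical member names @w@ and @u@, which the Plan-9 form leaves out, and the helper functions are also hypothetical.

```c
#include <assert.h>

union U { int x; double y; char z; };
struct W { int i; double j; char k; };

struct S {
    struct W w;                 /* Plan-9 form: struct W; */
    unsigned int tag;
    union U u;                  /* Plan-9 form: union U; */
};

static int read_i( const struct W * wp ) { return wp->i; }
static int read_x( const union U * up ) { return up->x; }

int explicit_conversions( void ) {
    struct S s = { .w = { .i = 7 }, .tag = 0, .u = { .x = 35 } };
    const struct W * wp = &s.w; /* what wp = &s means under Plan-9 */
    const union U * up = &s.u;  /* what up = &s means under Plan-9 */
    return read_i( wp ) + read_x( up );
}
```

Because @u@ is not the first member of @S@, the converted pointer differs from @&s@ by the member offset, so the conversion is an address adjustment and not merely a cast.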
648
649	Unfortunately, the Plan-9 designers did not look ahead to other useful features, specifically nested types.
650	This nested type compiles in \CC and \CFA.
651	\begin{cfa}
652	struct R {
653	@struct T;@ $\C[2in]{// forward declaration, conflicts with Plan-9 syntax}$
654	struct S { $\C{// nested types, mutually recursive reference}\CRT$
655	S * sp; T * tp; ...
656	};
657	struct T {
658	S * sp; T * tp; ...
659	};
660	};
661	\end{cfa}
662	Note, the syntax for the forward declaration conflicts with the Plan-9 declaration syntax.
663
664	\CFA extends the Plan-9 substructure by allowing polymorphism for values and pointers, where the extended substructure is denoted using @inline@.
665	\begin{cfa}
666	struct S {
667	@inline@ struct W; $\C{// extended Plan-9 substructure}$
668	unsigned int tag;
669	@inline@ U; $\C{// extended Plan-9 substructure}$
670	} s;
671	\end{cfa}
672	Note, the declaration of @U@ is not prefixed with @union@.
673	Like \CC, \CFA allows optional prefixing of type names with their kind, \eg @struct@, @union@, and @enum@, unless there is ambiguity with variable names in the same scope.
674	In addition, a semi-non-compatible change is made so that Plan-9 syntax means a forward declaration in a nested type.
675	Since the Plan-9 extension is not part of C and rarely used, this change has minimal impact.
676	Hence, all Plan-9 semantics are denoted by the @inline@ qualifier, which is good ``eye-candy'' when reading a structure definition to spot Plan-9 definitions.
677	Finally, the following code shows the value and pointer polymorphism.
678	\begin{cfa}
679	void f( U, U * ); $\C{// value, pointer}$
680	void g( W, W * ); $\C{// value, pointer}$
681	U u, * up; S s, * sp; W w, * wp;
682	u = s; up = sp; $\C{// value, pointer}$
683	w = s; wp = sp; $\C{// value, pointer}$
684	f( s, &s ); $\C{// value, pointer}$
685	g( s, &s ); $\C{// value, pointer}$
686	\end{cfa}
687
688	In general, non-standard C features (@gcc@) do not need any special treatment, as they are directly passed through to the C compiler.
689	However, the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the \CFA type resolver must take this information into account.
690	Therefore, the \CFA resolver must implement the Plan-9 features and insert necessary type conversions into the translated code output.
691	In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C conversions.
692
693	Plan-9 polymorphism can result in duplicate field names.
694	For example, the \newterm{diamond pattern}~\cite[\S~6.1]{Stroustrup89}\cite[\S~4]{Cargill91} can result in nested fields being embedded twice.
695	\begin{cfa}
696	struct A { int x; };
697	struct B { inline A; };
698	struct C { inline A; };
699	struct D {
700	inline B; // B.x
701	inline C; // C.x
702	} d;
703	\end{cfa}
704	Because the @inline@ structures are flattened, the expression @d.x@ is ambiguous, as it can refer to the embedded field either from @B@ or @C@.
705	@gcc@ generates a compile-time error about the duplicate member @x@.
706	The equivalent \CC definition compiles:
707	\begin{c++}
708	struct A { int x; };
709	struct B : public A {};
710	struct C : public A {};
711	struct D : @public B, C@ { // multiple inheritance
712	} d;
713	\end{c++}
714	and again the expression @d.x@ is ambiguous.
715	While \CC does not accept the field-path syntax @d.B.x@ or @d.C.x@, disambiguation is possible with qualified names, @d.B::x@ or @d.C::x@, or with casts, @((B)d).x@ or @((C)d).x@.
716	Like \CC, \CFA compiles the Plan-9 version and provides direct syntax and casts to disambiguate @x@.
717	While ambiguous definitions are allowed, duplicate field names are poor practice and should be avoided if possible.
718	However, when a programmer does not control all code, this problem can occur and a naming workaround should exist.
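The duplication in the diamond pattern is easiest to see when the embedded structures are given explicit member names.
The following standard-C sketch assumes the hypothetical member names @a@, @b@ and @c@; it is not the \CFA @inline@ form, but it shows that @D@ holds two independent copies of @x@ and that a path through @B@ or @C@ selects one of them.

```c
#include <assert.h>

struct A { int x; };
struct B { struct A a; };
struct C { struct A a; };
struct D {
    struct B b;                 /* first copy of x, reached through B */
    struct C c;                 /* second copy of x, reached through C */
};

int diamond_demo( void ) {
    struct D d = { .b = { .a = { .x = 1 } }, .c = { .a = { .x = 2 } } };
    d.b.a.x += 10;              /* changes only the copy inside B */
    return d.b.a.x * 100 + d.c.a.x;
}
```

The update through @d.b@ leaves the copy under @d.c@ unchanged, which is why an unqualified @d.x@ cannot be resolved without naming the path.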
