source: doc/theses/fangren_yu_MMath/content1.tex @ 9c2ac95

Last change on this file since 9c2ac95 was 9c2ac95, checked in by Fangren Yu <f37yu@…>, 4 hours ago

update

  • Property mode set to 100644
File size: 18.7 KB
Line 
1\chapter{Recent Features Introduced to \CFA}
2\label{c:content1}
3
4This chapter discusses some recent additions to the \CFA language and their interactions with the type system.
5
6\section{Reference Types}
7
8Reference types are added to \CFA by Robert Schluntz in conjunction to his work on resource management. \CFA reference type is similar to \CC reference type but with its own added features.
9
10The main difference between \CFA and \CC references is that \CC references are immutable, while \CFA supports reference rebinding operations. In \CC, references are mostly used in function parameters for pass-by-reference semantics, and in class members, which must be initialized during construction. Merely declaring a variable of reference type has little use as it only creates an alias of an existing variable. In contrast, \CFA reference variables can be reassigned (rebinded) and reference to reference is also allowed.
11
12This example is taken from the feature demonstration page of \CFA: \footnote{Currently there are no plans of introducing \CC rvalue references to \CFA. Readers should not confuse the multi-reference declarations with \CC rvalue reference syntax.}
13
14\begin{cfa}
15int x = 1, y = 2, z = 3;
16int * p1 = &x, ** p2 = &p1,  *** p3 = &p2, // pointers to x
17        & r1 = x,  && r2 = r1,   &&& r3 = r2;  // references to x
18int * p4 = &z, & r4 = z;
19
20*p1 = 3; **p2 = 3; ***p3 = 3;                   // change x
21 r1 =  3;       r2 = 3;         r3 = 3;                 // change x: implicit dereference *r1, **r2, ***r3
22**p3 = &y;      *p3 = &p4;                                      // change p1, p2
23// cancel implicit dereferences (&*)**r3, (&(&*)*)*r3, &(&*)r4
24&r3 = &y; &&r3 = &&r4;                                  // change r1, r2
25\end{cfa}
26
27A different syntax is required for reassigning to a reference variable itself, since auto-deferencing is always performed and the expression \texttt{r1} would mean the \texttt{int} variable that \texttt{r1} referenced to instead. Using \CFA's reference types (including multi-references) we can actually describe the "lvalue" rules in C by types only, and the concept of lvalue could have been completely dropped off. However, since the cfa-cc program is not a full compiler but a transpiler from \CFA to C, lvalueness is still used in some places for code generation purposes, while the type checker now works on just types without needing to consider lvalueness of an expression.
28
29The current typing rules used in \CFA can be summarized as follows:
30
31\begin{enumerate}
32    \item For a variable x with declared type T, the variable-expression x has type reference to T, even if T itself is a reference type.
33    \item For an expression e with type $T\&_1...\&_n$ i.e. T followed by n references, where T is not a reference type, the expression \&T (address of T) has type T* followed by n-1 references.
34    \item For an expression e with type T* followed by n references, *T has type T followed by n+1 references. This is the reverse of previous rule, such that address-of and dereference operators are perfect inverses.
35    \item When matching parameter and argument types in a function call context, the number of references on the argument type is always stripped off to match the number of references on the parameter type \footnote{\CFA allows rvalue expressions be converted to reference values by implicitly creating a temporary variable, with some restrictions.}. In an assignment context, the left-hand side operand type is always reduced to a single reference.
36\end{enumerate}
37
38Under this ruleset, in a function call context, a type parameter will never be bound to a reference type. For example given the declarations
39
40\begin{cfa}
41    forall (T) void f (T&);
42
43    int &x;
44\end{cfa}
45
46The call f(x); will apply implicit dereference once to x so the call is typed f(int\&) with T=int, rather than with T=int\&.
47
48While initially the design of reference types in \CFA seeks to make it more like a "real" data type than reference in \CC, which mostly only serves the purpose of choosing argument-passing methods (by-value or by-reference) in function calls, the inherent ambiguity of auto-dereferencing still limits the behavior of reference types in \CFA polymorphic functions. Moreover, there is also some discrepancy of how the reference types are treated in initialization and assignment expressions. In the former case no implicit dereference is applied at all (see line 3 of example code) and in the latter case there is actually no assignment operators defined for reference types; the reassignment of reference variables uses the assignment operators for pointer types instead. There is also an annoying issue (although purely syntactic) that to pass in a null value for reference initialization one has to write \texttt{int \& r1 = *0p;} which looks like dereferencing a null pointer, but the dereferencing operation does not actually happen and the expression is eventually translated into initializing the underlying pointer value to null.
49
50This second point of difference would prevent the type system to treat reference types the same way as other types in many cases even if we allow type variables to be bound to reference types. This is because \CFA uses the common "object" trait (constructor, destructor and assignment operators) to handle passing dynamic concrete type arguments into polymorphic functions, and the reference types are handled differently in these contexts so they do not satisfy this common interface.
51
52When generic types are introduced to \CFA, some thoughts had been put into allowing reference types as type arguments. While it is possible to write a declaration such as \texttt{vector(int\&)} for a container of reference variables, it will be disallowed in assertion checking if the generic type in question requires the object trait for the type argument (a fairly common use case) and even if the object trait can be made as non-required, the current type system often misbehaves by adding undesirable auto-dereference and operate on the referenced-to value rather than the reference variable itself as intended. Some tweaks would have to be made to accommodate reference types in polymorphic contexts and we are still not sure of what can or cannot be achieved. Currently we have to reside on using pointer types and giving up the benefits of auto-dereference operations on reference types.
53
54
55
56\section{Tuple Types}
57
58The addition of tuple types to \CFA can be traced back to the original design by David Till in K-W C, a predecessor project of \CFA. The introduction of tuples was aiming to eliminate the need of having output parameters or defining an aggregate type in order to return multiple values from a function. In the K-W C design, tuples can be thought of as merely a syntactic sugar as it is not allowed to define a variable with tuple type. The usage of tuples are restricted to argument passing and assignments, and the translator implementation converts tuple assignments by expanding the tuple assignment expressions to assignments of each component, creating temporary variables to avoid unexpected side effects when necessary. As in the case of a function returning multiple values (thereafter called MVR functions), a struct type is created for the returning tuple and the values are extracted by field access operations.
59
60In an early implementation of tuples in \CFA made by Rodolfo Gabriel Esteves, a different strategy is taken to handle MVR functions. The return values are converted to output parameters passed in by pointers. When the return values of a MVR function are directly used in an assignment expression, the addresses of the left-hand operands can be directly passed in to the function; composition of MVR functions is handled by creating temporaries for the returns.
61
62Suppose we have a function returning two values as follows:
63
64\begin{cfa}
65    [int, int] gives_two();
66
67    int x,y;
68    [x,y] = gives_two();
69\end{cfa}
70
71The K-W C implementation translates the program to
72
73\begin{cfa}
74    struct _tuple2 { int _0; int _1; }
75    struct _tuple2 gives_two();
76    int x,y;
77    struct _tuple2 _tmp = gives_two();
78    x = _tmp._0; y = _tmp._1;
79\end{cfa}
80
81While the Rodolfo implementation translates it to
82
83\begin{cfa}
84    void gives_two(int *, int *);
85    int x,y;
86    gives_two(&x, &y);
87\end{cfa}
88
89and inside the body of the function \text{gives\_two}, the return statement is rewritten to assignments into the passed-in addresses.
90
91The latter implementation looks much more concise, and in the case of returning values having nontrivial types (e.g. structs), this implementation can also save some unnecessary copying.
92
93Interestingly, in Robert Schluntz's rework of the tuple type, the implementation got reverted back to struct-based, and it remained in the current version of cfa-cc translator. During the same time of his work, generic types were being added into \CFA independently as another feature, and the tuple type was changed to use the same implementation techniques of generic types. Consequently, it made tuples become first-class values in \CFA.
94
95However, upon further investigation, making tuple types first-class has very little benefits in \CFA, mainly because that function call semantics with tuples are designed to be unstructured, and that since operator overloading in \CFA are implemented by treating every overloadable operator as functions, tuple types are very rarely used in a structured way. When a tuple-type expression appears in a function call (except assignment expressions, which are handled differently by mass- or multiple-assignment expansions), it is always flattened, and the tuple structure of function parameter is not considered a part of the function signature, for example
96
97\begin{cfa}
98    void f(int, int);
99    void f([int, int]);
100\end{cfa}
101
102are considered to have the same signature (a function taking two ints and returning nothing), and therefore not valid overloads. Furthermore, ordinary polymorphic type parameters are not allowed to have tuple types in order to restrict the expression resolution algorithm to not create too many argument-parameter matching options, such that the type-checking problem remains tractable and does not take too long to compute. Therefore tuple types are never present in any fixed-argument function calls.
103
104A type-safe variadic argument signature was proposed using \CFA's \texttt{forall} clause and a new tuple parameter type, denoted previously by the \texttt{ttype} keyword and now by the ellipsis syntax similar to \CC's template parameter pack.
105
106The C \texttt{printf} function, for example, can be rewritten using the new variadic argument, in favor of the C untyped \texttt{va\_list} interface as
107
108\begin{cfa}
109    forall (TT...) int printf(char *fmt, TT args);
110\end{cfa}
111
112Note that this is just for illustration purposes, as type assertions are needed to actually achieve type safety, and \CFA's I/O library does not use a format string since argument types are inferred by the type system.
113
114There are two common patterns for using the variadic function in \CFA: one is to forward the arguments to another function
115
116\begin{cfa}
117    forall(T*, TT... | {void ?{}(T &, TT);})
118    T* new (T, TT) { return ((T*)malloc()){TT}; }
119\end{cfa}
120
121and the other is structural recursion which extracts arguments one at a time
122
123\begin{cfa}
124    forall( ostype &, T, Params... | { ostype & ?|?( ostype &, T); ostype & ?|?( ostype &, Params ); } )
125        ostype & ?|?( ostype & os, T arg, Params rest );
126\end{cfa}
127
128The above is the actual implementation of variadic print function in \CFA. \texttt{ostype} represents the output stream, similar to \CC's \texttt{ostream} interface. Note that recursion must be used in order to extract type information of the first argument in the list, as opposed to C \texttt{va\_list} variadics which uses a loop to extract each argument, and generally requires some companion data that provides type information, such as the format string in \texttt{printf}.
129
130Variadic polymorphic functions are somehow currently the only place tuple types are used in functions. And just like the case for user-defined generic types, many wrapper functions need to be generated to implement polymorphism with variadics. However, note that the only permitted operations on polymorphic function parameters are given by the assertion (trait) functions, and those eventually need to be supplied flattened tuple arguments, packing the variadic arguments into a rigid struct type and generating all the required wrapper functions become mostly wasted work. Interested readers can refer to pages 77-80 of Robert Schluntz's thesis to see how verbose the translator output needs to be to implement a simple variadic call with 3 arguments, and it will quickly become even worse if the number of arguments is increased: for a call with 5 arguments the translator would have to generate concrete struct types for a 4-tuple and a 3-tuple along with all the polymorphic type data for them! Instead, a much simpler approach of putting all variadic arguments into a data array and providing an offset array to retrieve each individual argument can be utilized. This method is very similar to how the C \texttt{va\_list} object is used, with \CFA type resolver validating and generating the required type information to guarantee type safety.
131
132The print function example
133
134\begin{cfa}
135    forall(T, Params... | { void print(T); void print(Params ); })
136    void print(T arg , Params rest) {
137        print(arg);
138        print(rest);
139    }
140\end{cfa}
141
142using the offset array approach, can be converted to pseudo-\CFA code (ignoring type assertions for T) as
143
144\begin{cfa}
145    void print(T arg, char* _data_rest, size_t* _offset_rest) {
146        print(arg);
147        print(*((typeof rest.0)*) _data_rest, // first element of rest
148            _data_rest + _offset_rest[0],  // remainder of data
149            _offset_rest + 1);  // remainder of offset array
150    }
151\end{cfa}
152
153where the fixed-arg polymorphism for T can be handled by the standard \texttt{void*}-based \CFA polymorphic calling conventions, and the type information can all be deduced at the call site.
154
155Turning tuples into first-class values in \CFA does have a few benefits, namely allowing pointers to tuples and arrays of tuples to exist. However it seems very unlikely that these types can have realistic use cases that are hard to achieve without them. And indeed having the pointer-to-tuple type to exist at all will potentially forbid the simple offset-array implementation of variadic polymorphism (in case that a type assertion requests the pointer type \texttt{Params*} in the above example, forcing the tuple type to be materialized as a struct), and thus incurring a high cost. Perhaps it is of best interest to keep tuples as non-first-class, as Rodolfo originally describes them to "[does] not enforce a particular memory layout, and in particular, [does] not guarantee that the components of a tuple occupy a contiguous region of memory," and therefore to be able to use the simplified implementations for MVR and variadic functions.
156
157One final topic worth discussing is that the strategy of converting return values to output parameters can be utilized to implement copy elision, which is relevant for \CFA since constructors are introduced to the language. However, the first example given in this section
158
159\begin{cfa}
160    int x,y;
161    [x,y] = gives_two();
162\end{cfa}
163
164actually \textit{cannot} have copy elision applied, since the call to \texttt{gives\_two} appears in an \textit{assignment} context rather than an initialization context, as the variables x and y may be already initialized. Unfortunately \CFA currently does not support declaring variables in tuple form:
165
166\begin{cfa}
167    [int x, int y] = gives_two(); // NOT ALLOWED
168\end{cfa}
169
170It is possible to declare a tuple-typed variable and call MVR functions in initialization context
171
172\begin{cfa}
173    [int, int] ret = gives_two(); // OK
174\end{cfa}
175
176but using the values is more awkward as we cannot give them separate names and have to use \texttt{ret.0} or \texttt{ret.1} to extract the values. If copy elision semantics were to be added to \CFA it would be preferable to allow declaring variables in tuple form to have the benefit of eliding copy construction while giving each variable a unique name.
177
178\section{Plan-9 Struct Inheritance}
179
180Plan-9 Inheritance is a non-standard C feature originally introduced by the C dialect used in Bell Labs' Plan-9 research operating system, and is supported by mainstream C compilers such as GCC and Clang. This feature allows an aggregate type (struct or union) to be embedded into another one with implicit access semantics similar to anonymous substructures.
181
182In standard C, it is possible to define a field with an anonymous struct or union type within another. This is often utilized to implement a tagged union:
183
184\begin{cfa}
185
186struct T {
187  unsigned int tag;
188  union {
189    int x;
190    double y;
191    char z;
192  };
193} t;
194\end{cfa}
195
196The union fields can be directly accessed using their names, such as \texttt{T.x}. With Plan-9 extensions enabled, the same can be applied to a struct or union type defined elsewhere. \CFA uses the inline specifier to denote the anonymously embedded field.
197
198In GCC it is possible to simply use\texttt{struct B \{struct A;\}}
199for the Plan-9 feature; since \CFA no longer requires the struct and union keywords in variable declarations, having a keyword to denote Plan-9 inheritance is preferable.
200
201\begin{cfa}
202  struct A {int x;};
203
204  struct B {
205    inline A;
206    int y;
207  };
208
209  B b;
210  b.x; // accesses the embedded struct A's field
211\end{cfa}
212
213As the \CFA translator simply just reduce the source code to C, usually the non-standard C features do not need any special treatment, and are directly passed down to the C compiler. However, the Plan-9 semantics allow implicit conversions from the outer type to the inner type, which means the type checking algorithm must take that information into account. Therefore, the \CFA translator must implement the Plan-9 features and insert the type conversions into the translated code output. In the current version of \CFA, this is the only kind of implicit type conversion other than the standard C conversions.
214
215Since variable overloading is possible, \CFA's implementation of Plan-9 inheritance allows duplicate field names. When an outer field and an embedded field have the same name and type, the inner field is shadowed and cannot be accessed directly by name. While such definitions are allowed, using duplicate names is not good practice in general and should be avoided if possible.
216
217Plan-9 fields can be nested, and a struct definition can contain multiple Plan-9 embedded fields. In particular, the "diamond pattern" can occur and result in a nested field to be embedded twice.
218
219\begin{cfa}
220  struct A {int x;};
221  struct B {inline A;};
222  struct C {inline A;};
223
224  struct D {
225    inline B;
226    inline C;
227  };
228
229  D d;
230\end{cfa}
231
232In the above example, the expression \texttt{d.x} becomes ambiguous, since it can refer to the indirectly embedded field either from B or from C. It is still possible to disambiguate the expression by first casting the outer struct to one of the directly embedded type, such as \texttt{((B)d).x} 
233
234
235
236
Note: See TracBrowser for help on using the repository browser.