source: doc/theses/rob_schluntz/ctordtor.tex @ 68e9ace

Last change on this file since 68e9ace was 728df66, checked in by Peter A. Buhr <pabuhr@…>, 7 years ago
more documentation name changes
1	%======================================================================
2	\chapter{Constructors and Destructors}
3	%======================================================================
4
5	% TODO now: as an experiment, implement Andrei Alexandrescu's ScopeGuard http://www.drdobbs.com/cpp/generic-change-the-way-you-write-excepti/184403758?pgno=2
6	% doesn't seem possible to do this without allowing ttype on generic structs?
7
8	Since \CFA is a true systems language, it does not require a garbage collector.
9	As well, \CFA is not an object-oriented programming language, \ie, structures cannot have methods.
10	While structures can have function pointer members, this is different from methods, since methods have implicit access to structure members and methods cannot be reassigned.
11	Nevertheless, one important goal is to reduce programming complexity and increase safety.
12	To that end, \CFA provides support for implicit pre/post-execution of routines for objects, via constructors and destructors.
13
14	This chapter details the design of constructors and destructors in \CFA, along with their current implementation in the translator.
15	Generated code samples have been edited for clarity and brevity.
16
17	\section{Design Criteria}
18	\label{s:Design}
19	In designing constructors and destructors for \CFA, the primary goals were ease of use and maintaining backwards compatibility.
20
21	In C, when a variable is defined, its value is initially undefined unless it is explicitly initialized or allocated in the static area.
22	\begin{cfacode}
23	int main() {
24	int x; // uninitialized
25	int y = 5; // initialized to 5
26	x = y; // assigned 5
27	static int z; // initialized to 0
28	}
29	\end{cfacode}
30	In the example above, @x@ is defined and left uninitialized, while @y@ is defined and initialized to 5.
31	Next, @x@ is assigned the value of @y@.
32	In the last line, @z@ is implicitly initialized to 0 since it is marked @static@.
33	The key difference between assignment and initialization is that assignment occurs on a live object (\ie, an object that contains data), whereas initialization establishes an object's first value.
34	It is important to note that this means @x@ could have been used uninitialized prior to being assigned, while @y@ could not be used uninitialized.
35	Use of uninitialized variables yields undefined behaviour \cite[p.~558]{C11}, which is a common source of errors in C programs.
36
37	Initialization of a declaration is strictly optional, permitting uninitialized variables to exist.
38	Furthermore, declaration initialization is limited to expressions, so there is no way to insert arbitrary code before a variable is live, without delaying the declaration.
39	Many C compilers give good warnings for uninitialized variables most of the time, but they cannot in all cases.
40	\begin{cfacode}
41	int f(int *); // output parameter: never reads, only writes
42	int g(int *); // input parameter: never writes, only reads,
43	// so requires initialized variable
44
45	int x, y;
46	f(&x); // okay - only writes to x
47	g(&y); // uses y uninitialized
48	\end{cfacode}
49	Other languages are able to give errors in the case of uninitialized variable use, but due to backwards compatibility concerns, this is not the case in \CFA.
50
51	In C, constructors and destructors are often mimicked by providing routines that create and tear down objects, where the tear down function is typically only necessary if the type modifies the execution environment.
52	\begin{cfacode}
53	struct array_int {
54	int * x;
55	};
56	struct array_int create_array(int sz) {
57	return (struct array_int) { calloc(sz, sizeof(int)) };
58	}
59	void destroy_array(struct array_int * arr) {
60	free(arr->x);
61	}
62	\end{cfacode}
63	This idiom does not provide any guarantees unless the structure is opaque, which then requires that all objects are heap allocated.
64	\begin{cfacode}
65	struct opaque_array_int;
66	struct opaque_array_int * create_opaque_array(int sz);
67	void destroy_opaque_array(struct opaque_array_int *);
68	int opaque_get(struct opaque_array_int *, int); // subscript
69
70	struct opaque_array_int * x = create_opaque_array(10);
71	int x2 = opaque_get(x, 2);
72	\end{cfacode}
73	This pattern is cumbersome to use since every access becomes a function call, which requires awkward syntax and incurs a performance cost.
74	While useful in some situations, this compromise is too restrictive.
75	Furthermore, even with this idiom it is easy to make mistakes, such as forgetting to destroy an object or destroying it multiple times.
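To make the idiom concrete, the following is a minimal C sketch of how such an opaque interface might be implemented.
The structure layout, the two-argument form of @opaque_get@, and the helper @opaque_set@ are assumptions introduced for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical definition, hidden from clients in a real library. */
struct opaque_array_int {
    int *data;
    int len;
};

/* Allocate the handle and a zero-filled array; returns NULL on failure. */
struct opaque_array_int *create_opaque_array(int sz) {
    assert(sz > 0);
    struct opaque_array_int *arr = malloc(sizeof *arr);
    if (arr == NULL) return NULL;
    arr->data = calloc((size_t)sz, sizeof(int));
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    arr->len = sz;
    return arr;
}

/* Every element access goes through a bounds-checked function call. */
int opaque_get(const struct opaque_array_int *arr, int i) {
    assert(arr != NULL && 0 <= i && i < arr->len);
    return arr->data[i];
}

void opaque_set(struct opaque_array_int *arr, int i, int value) {
    assert(arr != NULL && 0 <= i && i < arr->len);
    arr->data[i] = value;
}

/* Must be called exactly once per created object. */
void destroy_opaque_array(struct opaque_array_int *arr) {
    if (arr == NULL) return;
    free(arr->data);
    free(arr);
}
```

Even in this small sketch, nothing prevents a caller from omitting the call to @destroy_opaque_array@ or from calling it twice on the same pointer.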
76
77	A constructor provides a way of ensuring that the necessary aspects of object initialization are performed, from setting up invariants to providing compile- and run-time checks for appropriate initialization parameters.
78	This goal is achieved through a \emph{guarantee} that a constructor is called \emph{implicitly} after every object is allocated from a type with associated constructors, as part of an object's \emph{definition}.
79	Since a constructor is called on every object of a managed type, it is \emph{impossible} to forget to initialize such objects, as long as all constructors perform some sensible form of initialization.
80
81	In \CFA, a constructor is a function with the name @?{}@.
82	Like other operators in \CFA, the name represents the syntax used to call the constructor, \eg, @S s = { ... };@.
83	Every constructor must have a return type of @void@ and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming-languages (however, a programmer can give it an arbitrary name).
84	The @this@ parameter must have a pointer type, whose base type is the type of object that the function constructs.
85	There is precedent for enforcing the first parameter to be the @this@ parameter in other operators, such as the assignment operator, where in both cases, the left-hand side of the equals is the first parameter.
86	There is currently a proposal to add reference types to \CFA.
87	Once this proposal has been implemented, the @this@ parameter will become a reference type with the same restrictions.
88
89	Consider the definition of a simple type encapsulating a dynamic array of @int@s.
90
91	\begin{cfacode}
92	struct Array {
93	int * data;
94	int len;
95	};
96	\end{cfacode}
97
98	In C, if the user creates an @Array@ object, the fields @data@ and @len@ are uninitialized, unless an explicit initializer list is present.
99	It is the user's responsibility to remember to initialize both of the fields to sensible values, since there are no implicit checks for invalid values or reasonable defaults.
100	In \CFA, the user can define a constructor to handle initialization of @Array@ objects.
101
102	\begin{cfacode}
103	void ?{}(Array * arr){
104	arr->len = 10; // default size
105	arr->data = malloc(sizeof(int)*arr->len);
106	for (int i = 0; i < arr->len; ++i) {
107	arr->data[i] = 0;
108	}
109	}
110	Array x; // allocates storage for Array and calls ?{}(&x)
111	\end{cfacode}
112
113	This constructor initializes @x@ so that its @len@ field has the value 10 and its @data@ field holds a pointer to a block of memory large enough to hold 10 @int@s, and it sets the value of each element of the array to 0.
114	This particular form of constructor is called the \emph{default constructor}, because it is called on an object defined without an initializer.
115	In other words, a default constructor is a constructor that takes a single argument: the @this@ parameter.
116
117	In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! \footnote{Originally, the name @~?{}@ was chosen for destructors, to provide familiarity to \CC programmers. Unfortunately, this name causes parsing conflicts with the bitwise-not operator when used with operator syntax (see section \ref{sub:syntax}).} and it takes only one argument.
118	A destructor for the @Array@ type can be defined as:
119	\begin{cfacode}
120	void ^?{}(Array * arr) {
121	free(arr->data);
122	}
123	\end{cfacode}
124	The destructor is automatically called at deallocation for all objects of type @Array@.
125	Hence, the memory associated with an @Array@ is automatically freed when the object's lifetime ends.
126	The exact guarantees made by \CFA with respect to the calling of destructors are discussed in section \ref{sub:implicit_dtor}.
127
128	As discussed previously, the distinction between initialization and assignment is important.
129	Consider the following example.
130	\begin{cfacode}[numbers=left]
131	Array x, y;
132	Array z = x; // initialization
133	y = x; // assignment
134	\end{cfacode}
135	By the previous definition of the default constructor for @Array@, @x@ and @y@ are initialized to valid arrays of length 10 after their respective definitions.
136	On line 2, @z@ is initialized with the value of @x@, while on line 3, @y@ is assigned the value of @x@.
137	The key distinction between initialization and assignment is that an object to be initialized does not hold any meaningful values, whereas an object to be assigned might.
138	In particular, these cases cannot be handled the same way because in the former case @z@ has no array, while @y@ does.
139	A \emph{copy constructor} is used to perform initialization using another object of the same type.
140
141	\begin{cfacode}[emph={other}, emphstyle=\color{red}]
142	void ?{}(Array * arr, Array other) { // copy constructor
143	arr->len = other.len; // initialization
144	arr->data = malloc(sizeof(int)*arr->len);
145	for (int i = 0; i < arr->len; ++i) {
146	arr->data[i] = other.data[i]; // copy from other object
147	}
148	}
149	Array ?=?(Array * arr, Array other) { // assignment
150	^?{}(arr); // explicitly call destructor
151	?{}(arr, other); // explicitly call constructor
152	return *arr;
153	}
154	\end{cfacode}
155	The two functions above handle the cases of initialization and assignment.
156	The first function is called a copy constructor, because it constructs its argument by copying the values from another object of the same type.
157	The second function is the standard copy-assignment operator.
158	\CFA does not currently have the concept of reference types, so the most appropriate type for the source object in copy constructors and assignment operators is a value type.
159	Appropriate care is taken in the implementation to avoid recursive calls to the copy constructor.
160	The four functions (default constructor, destructor, copy constructor, and assignment operator) are special in that they safely control the state of most objects.
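For comparison, a rough plain-C analogue of these four functions is sketched below, with every call written out by hand.
The names @Array_ctor@, @Array_dtor@, @Array_copy_ctor@, and @Array_assign@ are hypothetical, and the source object is passed by pointer rather than by value (an assumption made so that C does not perform a shallow copy), which is why the assignment needs a self-assignment check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Array {
    int *data;
    int len;
};

/* Default construction: 10 zeroed elements, as in the CFA example. */
void Array_ctor(struct Array *arr) {
    arr->len = 10;
    arr->data = calloc((size_t)arr->len, sizeof(int));
    assert(arr->data != NULL);
}

/* Destruction: release the storage and leave the object empty. */
void Array_dtor(struct Array *arr) {
    free(arr->data);
    arr->data = NULL;
    arr->len = 0;
}

/* Copy construction: the target holds no array yet, so only allocate and copy. */
void Array_copy_ctor(struct Array *arr, const struct Array *other) {
    arr->len = other->len;
    arr->data = malloc((size_t)arr->len * sizeof(int));
    assert(arr->data != NULL || arr->len == 0);
    if (arr->len > 0) {
        memcpy(arr->data, other->data, (size_t)arr->len * sizeof(int));
    }
}

/* Assignment: the target is live, so destroy it before copy constructing. */
void Array_assign(struct Array *arr, const struct Array *other) {
    if (arr == other) return; /* destroying first would lose the source */
    Array_dtor(arr);
    Array_copy_ctor(arr, other);
}
```

The sketch shows why the two operations cannot share one implementation: copy construction must not free anything, while assignment must release the old array first.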
161
162	It is possible to define a constructor that takes any combination of parameters to provide additional initialization options.
163	For example, a reasonable extension to the array type would be a constructor that allocates the array to a given initial capacity and initializes the elements of the array to a given @fill@ value.
164	\begin{cfacode}
165	void ?{}(Array * arr, int capacity, int fill) {
166	arr->len = capacity;
167	arr->data = malloc(sizeof(int)*arr->len);
168	for (int i = 0; i < arr->len; ++i) {
169	arr->data[i] = fill;
170	}
171	}
172	\end{cfacode}
173
174	In \CFA, constructors are called implicitly in initialization contexts.
175	\begin{cfacode}
176	Array x, y = { 20, 0xdeadbeef }, z = y;
177	\end{cfacode}
178	Constructor calls look just like C initializers, which allows them to be inserted into legacy C code with minimal code changes, and also provides a very simple syntax that veteran C programmers are familiar with.
179	One downside of reusing C initialization syntax is that it is not possible to determine whether an object is constructed just by looking at its declaration, since that requires knowledge of whether the type is managed at that point in the program.
180
181	This example generates the following code
182	\begin{cfacode}
183	Array x;
184	?{}(&x); // implicit default construct
185	Array y;
186	?{}(&y, 20, 0xdeadbeef); // explicit fill construct
187	Array z;
188	?{}(&z, y); // copy construct
189	^?{}(&z); // implicit destruct
190	^?{}(&y); // implicit destruct
191	^?{}(&x); // implicit destruct
192	\end{cfacode}
193	Due to the way that constructor calls are interleaved, it is impossible for @y@ to be referenced before it is initialized, except in its own constructor.
194	This loophole is minor and exists in \CC as well.
195	Destructors are implicitly called in reverse declaration-order so that objects with dependencies are destructed before the objects they are dependent on.
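The interleaving and the reverse-order destruction can be imitated in plain C by writing the calls out manually and recording their order.
In the sketch below, @Obj@, @Obj_ctor@, @Obj_dtor@, and the @trace@ buffer are all hypothetical names used only to observe the sequence of calls; the translator's actual output is the \CFA code shown above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Records one character per constructor or destructor call. */
static char trace[16];
static size_t trace_len = 0;

static void record(char c) {
    assert(trace_len + 1 < sizeof trace);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

struct Obj {
    char tag;
};

/* Construction logs the lower-case tag. */
void Obj_ctor(struct Obj *o, char tag) {
    o->tag = tag;
    record(tag);
}

/* Destruction logs the upper-case tag. */
void Obj_dtor(struct Obj *o) {
    record((char)toupper((unsigned char)o->tag));
}

/* Hand-written equivalent of a block declaring x, y, z as managed objects. */
void scope(void) {
    struct Obj x;
    Obj_ctor(&x, 'x');
    struct Obj y;
    Obj_ctor(&y, 'y');
    struct Obj z;
    Obj_ctor(&z, 'z');
    /* ... body of the block ... */
    Obj_dtor(&z); /* reverse declaration order */
    Obj_dtor(&y);
    Obj_dtor(&x);
}
```

After one call to @scope@, the trace reads @xyzZYX@: each object is constructed immediately after its declaration, and the last object constructed is the first one destroyed.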
196
197	\subsection{Calling Syntax}
198	\label{sub:syntax}
199	There are several ways to construct an object in \CFA.
200	As previously introduced, every variable is automatically constructed at its definition, which is the most natural way to construct an object.
201	\begin{cfacode}
202	struct A { ... };
203	void ?{}(A *);
204	void ?{}(A *, A);
205	void ?{}(A *, int, int);
206
207	A a1; // default constructed
208	A a2 = { 0, 0 }; // constructed with 2 ints
209	A a3 = a1; // copy constructed
210	// implicitly destruct a3, a2, a1, in that order
211	\end{cfacode}
212	Since constructors and destructors are just functions, the second way is to call the function directly.
213	\begin{cfacode}
214	struct A { int a; };
215	void ?{}(A *);
216	void ?{}(A *, A);
217	void ^?{}(A *);
218
219	A x; // implicitly default constructed: ?{}(&x)
220	A * y = malloc(); // copy construct: ?{}(&y, malloc())
221
222	^?{}(&x); // explicit destroy x, in different order
223	?{}(&x); // explicit construct x, second construction
224	^?{}(y); // explicit destroy y
225	?{}(y, x); // explicit construct y from x, second construction
226
227	// implicit ^?{}(&y);
228	// implicit ^?{}(&x);
229	\end{cfacode}
230	Calling a constructor or destructor directly is a flexible feature that allows complete control over the management of storage.
231	In particular, constructors double as a placement syntax.
232	\begin{cfacode}
233	struct A { ... };
234	struct memory_pool { ... };
235	void ?{}(memory_pool *, size_t);
236
237	memory_pool pool = { 1024 }; // create an arena of size 1024
238
239	A * a = allocate(&pool); // allocate from memory pool
240	?{}(a); // construct an A in place
241
242	for (int i = 0; i < 10; i++) {
243	// reuse storage rather than reallocating
244	^?{}(a);
245	?{}(a);
246	// use a ...
247	}
248	^?{}(a);
249	deallocate(&pool, a); // return to memory pool
250	\end{cfacode}
251	Finally, constructors and destructors support \emph{operator syntax}.
252	Like other operators in \CFA, the function name mirrors the use-case, in that the question marks are placeholders for the first $N$ arguments.
253	This syntactic form is similar to the new initialization syntax in \CCeleven, except that it is used in expression contexts, rather than declaration contexts.
254	\begin{cfacode}
255	struct A { ... };
256	struct B { A a; };
257
258	A x, y, * z = &x;
259	(&x){}; // default construct
260	(&x){ y }; // copy construct
261	(&x){ 1, 2, 3 }; // construct with 3 arguments
262	z{ y }; // copy construct x through a pointer
263	^(&x){}; // destruct
264
265	void ?{}(B * b) {
266	(&b->a){ 11, 17, 13 }; // construct a member
267	}
268	\end{cfacode}
269	Constructor operator syntax has relatively high precedence, requiring parentheses around an address-of expression.
270	Destructor operator syntax is actually a statement, and requires parentheses for symmetry with constructor syntax.
271
272	One of these three syntactic forms should appeal to either C or \CC programmers using \CFA.
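The placement idiom from the memory-pool example earlier in this section can be approximated in plain C with a simple bump allocator.
This is only a sketch under stated assumptions: the pool layout, the explicit size and alignment parameters of @pool_allocate@, and the functions @A_ctor@ and @A_dtor@ are hypothetical, and individual deallocation is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one block of storage handed out front to back. */
struct memory_pool {
    unsigned char *base;
    size_t size;
    size_t used;
};

/* Returns nonzero on success. */
int pool_ctor(struct memory_pool *p, size_t size) {
    p->base = malloc(size);
    p->size = (p->base != NULL) ? size : 0;
    p->used = 0;
    return p->base != NULL;
}

void pool_dtor(struct memory_pool *p) {
    free(p->base);
    p->base = NULL;
    p->size = 0;
    p->used = 0;
}

/* Hands out n bytes at the requested power-of-two alignment, or NULL if full. */
void *pool_allocate(struct memory_pool *p, size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (p->used + align - 1) & ~(align - 1);
    if (start > p->size || n > p->size - start) return NULL;
    p->used = start + n;
    return p->base + start;
}

struct A {
    int value;
};

/* Construction and destruction are separate from allocation. */
void A_ctor(struct A *a) { a->value = 0; }
void A_dtor(struct A *a) { (void)a; }
```

Because allocation and construction are separate steps, the same storage can be destroyed and reconstructed repeatedly without returning to the allocator.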
273
274	\subsection{Constructor Expressions}
275	In \CFA, it is possible to use a constructor as an expression.
276	Like other operators, the function name @?{}@ matches its operator syntax.
277	For example, @(&x){}@ calls the default constructor on the variable @x@, and produces @&x@ as a result.
278	A key example for this capability is the use of constructor expressions to initialize the result of a call to @malloc@.
279	\begin{cfacode}
280	struct X { ... };
281	void ?{}(X *, double);
282	X * x = malloc(){ 1.5 };
283	\end{cfacode}
284	In this example, @malloc@ dynamically allocates storage and initializes it using a constructor, all before assigning it into the variable @x@.
285	Intuitively, the expression-resolver determines that @malloc@ returns some type @T *@, as does the constructor expression since it returns the type of its argument.
286	This type flows outwards to the declaration site where the expected type is known to be @X *@, thus the first argument to the constructor must be @X *@, narrowing the search space.
287
288	If this extension is not present, constructing dynamically allocated objects is much more cumbersome, requiring separate initialization of the pointer and initialization of the pointed-to memory.
289	\begin{cfacode}
290	X * x = malloc();
291	x{ 1.5 };
292	\end{cfacode}
293	Not only is this verbose, but it is also more error prone, since this form allows maintenance code to easily sneak in between the initialization of @x@ and the initialization of the memory that @x@ points to.
294	This feature is implemented via a transformation producing the value of the first argument of the constructor, since constructors do not themselves have a return value.
295	Since this transformation results in two instances of the subexpression, care is taken to allocate a temporary variable to hold the result of the subexpression in the case where the subexpression may contain side effects.
296	The previous example generates the following code.
297	\begin{cfacode}
298	struct X *_tmp_ctor;
299	struct X *x = ( ?{}( // construct result of malloc
300	_tmp_ctor=malloc_T( // store result of malloc
301	sizeof(struct X),
302	_Alignof(struct X)
303	),
304	1.5
305	), _tmp_ctor ); // produce constructed result of malloc
306	\end{cfacode}
307	It should be noted that this technique is not exclusive to @malloc@, and allows a user to write a custom allocator that can be idiomatically used in much the same way as a constructed @malloc@ call.
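The same shape can be written by hand in plain C, which may help clarify what the transformation does.
The sketch below assumes a hypothetical function @X_ctor@ standing in for the \CFA constructor and a hypothetical wrapper @new_X@; unlike the generated code, it also checks the result of @malloc@ before constructing.

```c
#include <assert.h>
#include <stdlib.h>

struct X {
    double value;
};

/* Hand-written stand-in for the CFA constructor ?{}(X *, double). */
void X_ctor(struct X *x, double value) {
    assert(x != NULL);
    x->value = value;
}

/* Allocate, construct, and produce the pointer in a single expression. */
struct X *new_X(double value) {
    struct X *tmp; /* holds the result of malloc so it is evaluated once */
    return (tmp = malloc(sizeof *tmp)) != NULL
        ? (X_ctor(tmp, value), tmp) /* construct, then yield the pointer */
        : NULL;
}
```

The temporary is needed because the allocation has a side effect: naming the @malloc@ call twice would allocate twice, so its result is stored once and reused as the value of the whole expression.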
308
309	While it is possible to use operator syntax with destructors, destructors invalidate their argument, thus operator syntax with destructors is a void-typed expression.
310
311	\subsection{Function Generation}
312	In \CFA, every type is defined to have the core set of four special functions described previously.
313	Having these functions exist for every type greatly simplifies the semantics of the language, since most operations can simply be defined directly in terms of function calls.
314	In addition to simplifying the definition of the language, it also simplifies the analysis that the translator must perform.
315	If the translator can expect these functions to exist, then it can unconditionally attempt to resolve them.
316	Moreover, the existence of a standard interface allows polymorphic code to interoperate with new types seamlessly.
317	While automatic generation of assignment functions is present in previous versions of \CFA, the implementation has been largely rewritten to accommodate constructors and destructors.
318
319	To mimic the behaviour of standard C, the default constructor and destructor for all of the basic types and for all pointer types are defined to do nothing, while the copy constructor and assignment operator perform a bitwise copy of the source parameter (as in \CC).
320	This default is intended to maintain backwards compatibility and performance, by not imposing unexpected operations for a C programmer, as a zero-default behaviour would.
321	However, it is possible for a user to define such constructors so that variables are safely zeroed by default, if desired.
322	%%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%%
323	\begin{cfacode}
324	void ?{}(int * i) { *i = 0; }
325	forall(dtype T) void ?{}(T ** p) { *p = 0; } // any pointer type
326	void f() {
327	int x; // initialized to 0
328	int * p; // initialized to 0
329	}
330	\end{cfacode}
331	%%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%%
332
333	There are several options for user-defined types: structures, unions, and enumerations.
334	To aid in ease of use, the standard set of four functions is automatically generated for a user-defined type after its definition is completed.
335	By auto-generating these functions, it is ensured that legacy C code continues to work correctly in every context where \CFA expects these functions to exist, since they are generated for every complete type.
336	As well, these functions are always generated, since they may be needed by polymorphic functions.
337	With that said, the generated functions are not called implicitly unless they are non-trivial, and are never exported, making it simple for the optimizer to strip them away when they are not used.
338
339	The generated functions for enumerations are the simplest.
340	Since enumerations in C are essentially just another integral type, the generated functions behave in the same way that the built-in functions for the basic types work.
341	For example, given the enumeration
342	\begin{cfacode}
343	enum Colour {
344	R, G, B
345	};
346	\end{cfacode}
347	The following functions are automatically generated.
348	\begin{cfacode}
349	void ?{}(enum Colour *_dst){
350	// default constructor does nothing
351	}
352	void ?{}(enum Colour *_dst, enum Colour _src){
353	*_dst=_src; // bitwise copy
354	}
355	void ^?{}(enum Colour *_dst){
356	// destructor does nothing
357	}
358	enum Colour ?=?(enum Colour *_dst, enum Colour _src){
359	return *_dst=_src; // bitwise copy
360	}
361	\end{cfacode}
362	In the future, \CFA will introduce strongly-typed enumerations, like those in \CC, wherein enumerations create a new type distinct from @int@ so that integral values require an explicit cast to be stored in an enumeration variable.
363	The existing generated routines are sufficient to express this restriction, since they are currently set up to take in values of that enumeration type.
364	Changes related to this feature only need to affect the expression resolution phase, where more strict rules will be applied to prevent implicit conversions from integral types to enumeration types, but should continue to permit conversions from enumeration types to @int@.
365	In this way, it is still possible to add an @int@ to an enumeration, but the resulting value is an @int@, meaning it cannot be reassigned to an enumeration without a cast.
366
367	For structures, the situation is more complicated.
368	Given a structure @S@ with members @M$_0$@, @M$_1$@, ... @M$_{N-1}$@, each function @f@ in the standard set calls \lstinline{f(s->M$_i$, ...)} for each @$i$@.
369	That is, a default constructor for @S@ default constructs the members of @S@, the copy constructor copy constructs them, and so on.
370	For example, given the structure definition
371	\begin{cfacode}
372	struct A {
373	B b;
374	C c;
375	};
376	\end{cfacode}
377	The following functions are implicitly generated.
378	\begin{cfacode}
379	void ?{}(A * this) {
380	?{}(&this->b); // default construct each field
381	?{}(&this->c);
382	}
383	void ?{}(A * this, A other) {
384	?{}(&this->b, other.b); // copy construct each field
385	?{}(&this->c, other.c);
386	}
387	A ?=?(A * this, A other) {
388	?=?(&this->b, other.b); // assign each field
389	?=?(&this->c, other.c);
	return *this;
390	}
391	void ^?{}(A * this) {
392	^?{}(&this->c); // destruct each field
393	^?{}(&this->b);
394	}
395	\end{cfacode}
396	It is important to note that the destructors are called in reverse declaration order to prevent conflicts in the event there are dependencies among members.
397
398	In addition to the standard set, a set of \emph{field constructors} is also generated for structures.
399	The field constructors are constructors that consume a prefix of the structure's member-list.
400	That is, $N$ constructors are built of the form @void ?{}(S *, T$_{\text{M}_0}$)@, @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)@, ..., @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)@, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise.
401	The addition of field constructors allows structures in \CFA to be used naturally in the same ways as used in C (\ie, to initialize any prefix of the structure), \eg, @A a0 = { b }, a1 = { b, c }@.
402	Extending the previous example, the following constructors are implicitly generated for @A@.
403	\begin{cfacode}
404	void ?{}(A * this, B b) {
405	?{}(&this->b, b);
406	?{}(&this->c);
407	}
408	void ?{}(A * this, B b, C c) {
409	?{}(&this->b, b);
410	?{}(&this->c, c);
411	}
412	\end{cfacode}
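The effect of the prefix rule can be observed with a small plain-C imitation of these field constructors.
All names below (@B_ctor@, @B_copy@, @C_ctor@, @C_copy@, @A_ctor_b@, @A_ctor_b_c@) are hypothetical, and the default constructors store sentinel values purely so that default construction is visible; this is an assumption of the sketch, not the behaviour of the generated \CFA functions.

```c
#include <assert.h>
#include <stddef.h>

struct B { int v; };
struct C { int v; };

/* Hypothetical member constructors with visible sentinel defaults. */
void B_ctor(struct B *b) { b->v = -1; }
void B_copy(struct B *b, struct B src) { b->v = src.v; }
void C_ctor(struct C *c) { c->v = -2; }
void C_copy(struct C *c, struct C src) { c->v = src.v; }

struct A {
    struct B b;
    struct C c;
};

/* Prefix of length 1: copy construct b, default construct c. */
void A_ctor_b(struct A *self, struct B b) {
    assert(self != NULL);
    B_copy(&self->b, b);
    C_ctor(&self->c);
}

/* Prefix of length 2: copy construct both members. */
void A_ctor_b_c(struct A *self, struct B b, struct C c) {
    assert(self != NULL);
    B_copy(&self->b, b);
    C_copy(&self->c, c);
}
```

Members covered by an argument take the supplied value, and the remaining members fall back to default construction, mirroring how a C initializer list may name only a prefix of a structure's members.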
413
414	For unions, the default constructor and destructor do nothing, as it is not obvious which member, if any, should be constructed.
415	For copy constructor and assignment operations, a bitwise @memcpy@ is applied.
416	In standard C, a union can also be initialized using a value of the same type as its first member, and so a corresponding field constructor is generated to perform a bitwise @memcpy@ of the object.
417	An alternative to this design is to always construct and destruct the first member of a union, to match with the C semantics of initializing the first member of the union.
418	This approach ultimately feels subtle and unsafe.
419	Another option is to, like \CC, disallow unions from containing members that are themselves managed types.
420	This restriction is a reasonable approach from a safety standpoint, but is not very C-like.
421	Since the primary purpose of a union is to provide low-level memory optimization, it is assumed that the user has a certain level of maturity.
422	It is therefore the responsibility of the user to define the special functions explicitly if they are appropriate, since it is impossible to accurately predict the ways that a union is intended to be used at compile-time.
423
424	For example, given the union
425	\begin{cfacode}
426	union X {
427	Y y;
428	Z z;
429	};
430	\end{cfacode}
431	The following functions are automatically generated.
432	\begin{cfacode}
433	void ?{}(union X *_dst){ // default constructor
434	}
435	void ?{}(union X *_dst, union X _src){ // copy constructor
436	__builtin_memcpy(_dst, &_src, sizeof(union X ));
437	}
438	void ^?{}(union X *_dst){ // destructor
439	}
440	union X ?=?(union X *_dst, union X _src){ // assignment
441	__builtin_memcpy(_dst, &_src, sizeof(union X));
442	return _src;
443	}
444	void ?{}(union X *_dst, struct Y src){ // construct first field
445	__builtin_memcpy(_dst, &src, sizeof(struct Y));
446	}
447	\end{cfacode}
448
449	% This feature works in the \CFA model, since constructors are simply special functions and can be called explicitly, unlike in \CC. % this sentence isn't really true => placement new
450	In \CCeleven, unions may have managed members, with the caveat that if there are any members with a user-defined operation, then that operation is not implicitly defined, forcing the user to define the operation if necessary.
451	This restriction could easily be added into \CFA once \emph{deleted} functions are added.
452
453	\subsection{Using Constructors and Destructors}
454	Implicitly generated constructor and destructor calls ignore the outermost type qualifiers, \eg, @const@ and @volatile@, on a type by way of a cast on the first argument to the function.
455	For example,
456	\begin{cfacode}
457	struct S { int i; };
458	void ?{}(S *, int);
459	void ?{}(S *, S);
460
461	const S s = { 11 };
462	volatile S s2 = s;
463	\end{cfacode}
464	generates the following code:
465	\begin{cfacode}
466	const struct S s;
467	?{}((struct S *)&s, 11);
468	volatile struct S s2;
469	?{}((struct S *)&s2, s);
470	\end{cfacode}
471	Here, @&s@ and @&s2@ are cast to unqualified pointer types.
472	This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects.
473	This rule applies only to implicitly generated constructor calls.
474	Hence, explicitly re-initializing qualified objects with a constructor requires an explicit cast.
475
476	As discussed in Section \ref{sub:c_background}, compound literals create unnamed objects.
477	This mechanism can continue to be used seamlessly in \CFA with managed types to create temporary objects.
478	The object created by a compound literal is constructed using the provided brace-enclosed initializer-list, and is destructed at the end of the scope it is used in.
479	For example,
480	\begin{cfacode}
481	struct A { int x, y; };
482	void ?{}(A *, int, int);
483	{
484	int x = (A){ 10, 20 }.x;
485	}
486	\end{cfacode}
487	is equivalent to
488	\begin{cfacode}
489	struct A { int x, y; };
490	void ?{}(A *, int, int);
491	{
492	A _tmp;
493	?{}(&_tmp, 10, 20);
494	int x = _tmp.x;
495	^?{}(&_tmp);
496	}
497	\end{cfacode}
498
499	Unlike \CC, \CFA provides an escape hatch that allows a user to decide at an object's definition whether it should be managed or not.
500	An object initialized with \ateq is guaranteed to be initialized like a C object, and has no implicit destructor call.
501	This feature provides all of the freedom that C programmers are used to having to optimize a program, while maintaining safety as a sensible default.
502	\begin{cfacode}
503	struct A { int * x; };
504	// RAII
505	void ?{}(A * a) { a->x = malloc(sizeof(int)); }
506	void ^?{}(A * a) { free(a->x); }
507
508	A a1; // managed
509	A a2 @= { 0 }; // unmanaged
510	\end{cfacode}
511	In this example, @a1@ is a managed object, and thus is default constructed and destructed at the start/end of @a1@'s lifetime, while @a2@ is an unmanaged object and is not implicitly constructed or destructed.
512	Instead, @a2.x@ is initialized to @0@ as if it were a C object, because of the explicit initializer.
513
514	In addition to freedom, \ateq provides a simple path for migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually.
515	It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants are established on an unmanaged object.
516	It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary, such as memory-mapped devices, trigger devices, I/O controllers, etc.
517
518	When a user declares any constructor or destructor, the corresponding intrinsic/generated function and all field constructors for that type are hidden, so that they are not found during expression resolution until the user-defined function goes out of scope.
519	Furthermore, if the user declares any constructor, then the intrinsic/generated default constructor is also hidden, precluding default construction.
520	These semantics closely mirror the rule for implicit declaration of constructors in \CC, wherein the default constructor is implicitly declared if there is no user-declared constructor \cite[p.~186]{ANSI98:C++}.
521	\begin{cfacode}
522	struct S { int x, y; };
523
524	void f() {
525	S s0, s1 = { 0 }, s2 = { 0, 2 }, s3 = s2; // okay
526	{
527	void ?{}(S * s, int i) { s->x = i*2; } // locally hide autogen ctors
528	S s4; // error, no default constructor
529	S s5 = { 3 }; // okay, local constructor
530	S s6 = { 4, 5 }; // error, no field constructor
531	S s7 = s5; // okay
532	}
533	S s8, s9 = { 6 }, s10 = { 7, 8 }, s11 = s10; // okay
534	}
535	\end{cfacode}
536	In this example, the inner scope declares a constructor from @int@ to @S@, which hides the default constructor and field constructors until the end of the scope.
537
538	When defining a constructor or destructor for a structure @S@, any members that are not explicitly constructed or destructed are implicitly constructed or destructed automatically.
539	If an explicit call is present, then that call is taken in preference to any implicitly generated call.
540	A consequence of this rule is that it is possible, unlike \CC, to precisely control the order of construction and destruction of sub-objects on a per-constructor basis, whereas in \CC sub-object initialization and destruction is always performed based on the declaration order.
541	\begin{cfacode}
542	struct A {
543	B w, x, y, z;
544	};
545	void ?{}(A * a, int i) {
546	(&a->x){ i };
547	(&a->z){ a->y };
548	}
549	\end{cfacode}
This generates the following code.
551	\begin{cfacode}
552	void ?{}(A * a, int i) {
553	(&a->w){}; // implicit default ctor
554	(&a->y){}; // implicit default ctor
555	(&a->x){ i };
556	(&a->z){ a->y };
557	}
558	\end{cfacode}
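
The effect of this expansion can be illustrated in plain C.
The following is a minimal sketch, not the translator's actual output: the names @B_ctor_default@, @B_ctor_int@, @B_ctor_copy@, @A_ctor_int@, and the @trace@ buffer are hypothetical stand-ins for the \CFA constructors, introduced only to record the order in which members are initialized.

```c
#include <assert.h>
#include <string.h>

struct B { int v; };
struct A { struct B w, x, y, z; };

static char trace[16];  /* order of member construction */

static void note(char member) {
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace) { trace[n] = member; trace[n + 1] = '\0'; }
}

/* Hypothetical stand-ins for the CFA constructors of B. */
static void B_ctor_default(struct B *b, char member) { b->v = 0; note(member); }
static void B_ctor_int(struct B *b, int i, char member) { b->v = i; note(member); }
static void B_ctor_copy(struct B *b, struct B other, char member) { *b = other; note(member); }

/* Plain-C rendering of the expanded ?{}(A * a, int i). */
void A_ctor_int(struct A *a, int i) {
    B_ctor_default(&a->w, 'w');     /* implicit default ctor */
    B_ctor_default(&a->y, 'y');     /* implicit default ctor */
    B_ctor_int(&a->x, i, 'x');      /* explicit call from the user's body */
    B_ctor_copy(&a->z, a->y, 'z');  /* explicit call, y is already constructed */
}

/* Constructs one A and reports the order in which its members were built. */
const char *construction_order(void) {
    struct A a;
    trace[0] = '\0';
    A_ctor_int(&a, 7);
    assert(a.x.v == 7 && a.z.v == a.y.v);
    return trace;
}
```

Calling @construction_order()@ yields the string @wyxz@: the implicitly constructed members come first, followed by the explicit calls in the order written by the programmer.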
559	Finally, it is illegal for a sub-object to be explicitly constructed for the first time after it is used for the first time.
If the translator cannot be reasonably sure that an object is constructed prior to its first use, but the object is constructed afterward, an error is emitted.
561	More specifically, the translator searches the body of a constructor to ensure that every sub-object is initialized.
562	\begin{cfacode}
563	void ?{}(A * a, double x) {
564	f(a->x);
565	(&a->x){ (int)x }; // error, used uninitialized on previous line
566	}
567	\end{cfacode}
568	However, if the translator sees a sub-object used within the body of a constructor, but does not see a constructor call that uses the sub-object as the target of a constructor, then the translator assumes the object is to be implicitly constructed (copy constructed in a copy constructor and default constructed in any other constructor).
569	To override this rule, \ateq can be used to force the translator to trust the programmer's discretion.
570	This form of \ateq is not yet implemented.
571	\begin{cfacode}
572	void ?{}(A * a) {
573	// default constructs all members
574	f(a->x);
575	}
576
577	void ?{}(A * a, A other) {
578	// copy constructs all members
579	f(a->y);
580	}
581
582	void ?{}(A * a, int x) {
583	// object forwarded to another constructor,
584	// does not implicitly construct any members
585	(&a){};
586	}
587
588	void ^?{}(A * a) {
589	^(&a->x){}; // explicit destructor call
590	} // z, y, w implicitly destructed, in this order
591	\end{cfacode}
592	If at any point, the @this@ parameter is passed directly as the target of another constructor, then it is assumed the other constructor handles the initialization of all of the object's members and no implicit constructor calls are added to the current constructor.
593
594	Despite great effort, some forms of C syntax do not work well with constructors in \CFA.
595	In particular, constructor calls cannot contain designations (see \ref{sub:c_background}), since this is equivalent to allowing designations on the arguments to arbitrary function calls.
596	\begin{cfacode}
597	// all legal forward declarations in C
598	void f(int, int, int);
599	void f(int a, int b, int c);
600	void f(int b, int c, int a);
601	void f(int c, int a, int b);
602	void f(int x, int y, int z);
603
604	f(b:10, a:20, c:30); // which parameter is which?
605	\end{cfacode}
606	In C, function prototypes are permitted to have arbitrary parameter names, including no names at all, which may have no connection to the actual names used at function definition.
607	Furthermore, a function prototype can be repeated an arbitrary number of times, each time using different names.
608	As a result, it was decided that any attempt to resolve designated function calls with C's function prototype rules would be brittle, and thus it is not sensible to allow designations in constructor calls.
609
610	\begin{sloppypar}
611	In addition, constructor calls do not support unnamed nesting.
612	\begin{cfacode}
613	struct B { int x; };
614	struct C { int y; };
615	struct A { B b; C c; };
616	void ?{}(A *, B);
617	void ?{}(A *, C);
618
619	A a = {
620	{ 10 }, // construct B? - invalid
621	};
622	\end{cfacode}
623	In C, nesting initializers means that the programmer intends to initialize sub-objects with the nested initializers.
624	The reason for this omission is to both simplify the mental model for using constructors, and to make initialization simpler for the expression resolver.
If this were allowed, it would be necessary for the expression resolver to decide whether each nested initializer could construct an argument of one of the available constructors, making the problem highly recursive and potentially much more expensive.
That is, in the previous example the line marked as invalid could mean construct using @?{}(A *, B)@ or with @?{}(A *, C)@, since the inner initializer @{ 10 }@ could be taken as an intermediate object of type @B@ or @C@.
627	In practice, however, there could be many objects that can be constructed from a given @int@ (or, indeed, any arbitrary parameter list), and thus a complete solution to this problem would require fully exploring all possibilities.
628	\end{sloppypar}
629
630	More precisely, constructor calls cannot have a nesting depth greater than the number of array dimensions in the type of the initialized object, plus one.
631	For example,
632	\begin{cfacode}
struct A;
void ?{}(A *, int);
void ?{}(A *, A, A);

A a1[3] = { { 3 }, { 4 }, { 5 } };
A a2[2][2] = {
    { { 9 }, { 10 } },   // a2[0]
    { { 14 }, { 15 } }   // a2[1]
};
A a3[4] = { // 1 dimension => max depth 2
    { { 11 }, { 12 } },  // error, three levels deep
    { 80 }, { 90 }, { 100 }
};
646	\end{cfacode}
647	The body of @A@ has been omitted, since only the constructor interfaces are important.
648
It should be noted that unmanaged objects, \ie, objects that have only trivial constructors, can still make use of designations and nested initializers in \CFA.
650	It is simple to overcome this limitation for managed objects by making use of compound literals, so that the arguments to the constructor call are explicitly typed.
651	%%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%%
652	\begin{cfacode}
653	struct B { int x; };
654	struct C { int y; };
655	struct A { B b; C c; };
656	void ?{}(A *, B);
657	void ?{}(A *, C);
658
659	A a = {
660	(C){ 10 } // disambiguate with compound literal
661	};
662	\end{cfacode}
663	%%%%%%%%%%%%%%%%%%%%%%%%%% line width %%%%%%%%%%%%%%%%%%%%%%%%%%
664
665	\subsection{Implicit Destructors}
666	\label{sub:implicit_dtor}
667	Destructors are automatically called at the end of the block in which the object is declared.
668	In addition to this, destructors are automatically called when statements manipulate control flow to leave a block in which the object is declared, \eg, with return, break, continue, and goto statements.
669	The example below demonstrates a simple routine with multiple return statements.
670	\begin{cfacode}
671	struct A;
672	void ^?{}(A *);
673
674	void f(int i) {
675	A x; // construct x
676	{
677	A y; // construct y
678	{
679	A z; // construct z
680	{
681	if (i == 0) return; // destruct x, y, z
682	}
683	if (i == 1) return; // destruct x, y, z
684	} // destruct z
685	if (i == 2) return; // destruct x, y
686	} // destruct y
687	} // destruct x
688	\end{cfacode}
689
690	The next example illustrates the use of simple continue and break statements and the manner that they interact with implicit destructors.
691	\begin{cfacode}
692	for (int i = 0; i < 10; i++) {
693	A x;
694	if (i == 2) {
695	continue; // destruct x
696	} else if (i == 3) {
697	break; // destruct x
698	}
699	} // destruct x
700	\end{cfacode}
701	Since a destructor call is automatically inserted at the end of the block, nothing special needs to happen to destruct @x@ in the case where control reaches the end of the loop.
702	In the case where @i@ is @2@, the continue statement runs the loop update expression and attempts to begin the next iteration of the loop.
703	Since continue is a C statement, which does not understand destructors, it is transformed into a @goto@ statement that branches to the end of the loop, just before the block's destructors, to ensure that @x@ is destructed.
704	When @i@ is @3@, the break statement moves control to just past the end of the loop.
705	Unlike the previous case, the destructor for @x@ cannot be reused, so a destructor call for @x@ is inserted just before the break statement.
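
The lowering described above can be sketched by hand in plain C.
This is only an illustration under stated assumptions: @A_ctor@, @A_dtor@, the label @end_of_body@, and the two counters are hypothetical names, and the translator's real output may be organized differently.

```c
#include <assert.h>

struct A { int id; };
static int constructed, destructed;

/* Hypothetical stand-ins for the CFA constructor and destructor of A. */
static void A_ctor(struct A *a) { a->id = constructed++; }
static void A_dtor(struct A *a) { (void)a; destructed++; }

/* Hand-lowered form of the loop with continue and break. */
int run_loop(void) {
    constructed = destructed = 0;
    for (int i = 0; i < 10; i++) {
        struct A x;
        A_ctor(&x);
        if (i == 2) {
            goto end_of_body;  /* continue: reuse the block-end destructor */
        } else if (i == 3) {
            A_dtor(&x);        /* break: destructor emitted before leaving */
            break;
        }
    end_of_body: ;
        A_dtor(&x);            /* destructor at the end of the block */
    }
    assert(constructed == destructed);
    return destructed;
}
```

The loop body runs for @i@ equal to 0 through 3, so four objects are constructed and each is destructed exactly once, whichever path leaves the block.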
706
707	\CFA also supports labeled break and continue statements, which allow more precise manipulation of control flow.
708	Labeled break and continue allow the programmer to specify which control structure to target by using a label attached to a control structure.
709	\begin{cfacode}[emph={L1,L2}, emphstyle=\color{red}]
L1: for (int i = 0; i < 10; i++) {
    A x;
    for (int j = 0; j < 10; j++) {
        A y;
        if (i == 1) {
            continue L1; // destruct y
        } else if (i == 2) {
            break L1;    // destruct x,y
        }
    } // destruct y
} // destruct x
721	\end{cfacode}
722	The statement @continue L1@ begins the next iteration of the outer for-loop.
723	Since the semantics of continue require the loop update expression to execute, control branches to the end of the outer for loop, meaning that the block destructor for @x@ can be reused, and it is only necessary to generate the destructor for @y@.
724	Break, on the other hand, requires jumping out of both loops, so the destructors for both @x@ and @y@ are generated and inserted before the @break L1@ statement.
725
The final example demonstrates goto.
727	Since goto is a general mechanism for jumping to different locations in the program, a more comprehensive approach is required.
728	For each goto statement $G$ and each target label $L$, let $S_G$ be the set of all managed variables alive at $G$, and let $S_L$ be the set of all managed variables alive at $L$.
If at any $G$, $S_L \setminus S_G \neq \emptyset$, then the translator emits an error, because control flow branches from a point where an object is not yet live to a point where it is live, skipping the object's constructor.
Then, for every $G$, the destructors for each variable in the set $S_G \setminus S_L$ are inserted directly before $G$, which ensures each object that is currently live at $G$, but not at $L$, is destructed before control branches.
731	\begin{cfacode}
int i = 0;
{
  L0: ;                // S_L0 = {}
    A y;
  L1: ;                // S_L1 = { y }
    A x;
  L2: ;                // S_L2 = { y, x }
    if (i == 0) {
        ++i;
        goto L1;       // S_G = { y, x }
        // S_G-S_L1 = { x } => destruct x
    } else if (i == 1) {
        ++i;
        goto L2;       // S_G = { y, x }
        // S_G-S_L2 = {} => destruct nothing
    } else if (i == 2) {
        ++i;
        goto L3;       // S_G = { y, x }
        // S_G-S_L3 = {}
    } else if (false) {
        ++i;
        A z;
        goto L3;       // S_G = { z, y, x }
        // S_G-S_L3 = { z } => destruct z
    } else {
        ++i;
        goto L4;       // S_G = { y, x }
        // S_G-S_L4 = { y, x } => destruct y, x
    }
  L3: ;                // S_L3 = { y, x }
    goto L2;           // S_G = { y, x }
    // S_G-S_L2 = {}
}
L4: ;                  // S_L4 = {}
if (i == 4) {
    goto L0;           // S_G = {}
    // S_G-S_L0 = {}
}
770	\end{cfacode}
771	All break and continue statements are implemented in \CFA in terms of goto statements, so the more constrained forms are precisely governed by these rules.
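
The two set differences used by this analysis are simple to compute once each managed variable is given a position in a bit mask.
The following C sketch assumes at most one machine word of managed variables per function; the names @VAR_Y@, @VAR_X@, @VAR_Z@, @skipped_ctors@, @dtors_before_goto@, and @check_goto@ are hypothetical and do not correspond to the translator's internal data structures.

```c
#include <assert.h>

/* One bit per managed variable; a set of live variables is a mask. */
enum { VAR_Y = 1u << 0, VAR_X = 1u << 1, VAR_Z = 1u << 2 };

/* S_L \ S_G: variables whose constructors the branch would skip. */
unsigned skipped_ctors(unsigned s_g, unsigned s_l) { return s_l & ~s_g; }

/* S_G \ S_L: variables to destruct directly before the goto. */
unsigned dtors_before_goto(unsigned s_g, unsigned s_l) { return s_g & ~s_l; }

/* Returns 0 and stores the destructor set if the branch is legal,
   or returns -1 if the branch would skip a constructor. */
int check_goto(unsigned s_g, unsigned s_l, unsigned *dtors) {
    assert(dtors);
    if (skipped_ctors(s_g, s_l) != 0) return -1;
    *dtors = dtors_before_goto(s_g, s_l);
    return 0;
}
```

For the @goto L1@ statement in the example, the live set at the goto contains @y@ and @x@ while only @y@ is live at the label, so @check_goto(VAR_Y | VAR_X, VAR_Y, &d)@ succeeds and stores @VAR_X@ in @d@.
A branch from an empty live set to a label where @y@ and @x@ are live is rejected.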
772
773	The next example demonstrates the error case.
774	\begin{cfacode}
775	{
776	goto L1; // S_G = {}
777	// S_L1-S_G = { y } => error
778	A y;
779	L1: ; // S_L1 = { y }
780	A x;
781	L2: ; // S_L2 = { y, x }
782	}
783	goto L2; // S_G = {}
784	// S_L2-S_G = { y, x } => error
785	\end{cfacode}
786
787	While \CFA supports the GCC computed-goto extension, the behaviour of managed objects in combination with computed-goto is undefined.
788	\begin{cfacode}
789	void f(int val) {
790	void * l = val == 0 ? &&L1 : &&L2;
791	{
792	A x;
793	L1: ;
794	goto *l; // branches differently depending on argument
795	}
796	L2: ;
797	}
798	\end{cfacode}
799	Likewise, destructors are not executed at scope-exit due to a computed-goto in \CC, as of g++ version 6.2.
800
801	\subsection{Implicit Copy Construction}
802	\label{s:implicit_copy_construction}
803	When a function is called, the arguments supplied to the call are subject to implicit copy construction (and destruction of the generated temporary), and the return value is subject to destruction.
804	When a value is returned from a function, the copy constructor is called to pass the value back to the call site.
805	Exempt from these rules are intrinsic and built-in functions.
It should be noted that unmanaged objects are subject to copy constructor calls when passed as arguments to a function or when returned from a function, since they are not the \emph{target} of the copy constructor call.
That is, \ateq only marks the declared object itself as unmanaged; the parameter or return temporary that receives the copy is not marked with \ateq, so it is copy constructed in the same way regardless of whether the source object is managed, which guarantees consistent behaviour.
808	These semantics are important to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed.
809	\begin{cfacode}
810	struct A { ... };
811	void ?{}(A *);
812	void ?{}(A *, A);
813	void ^?{}(A *);
814
815	A identity(A x) { // pass by value => need local copy
816	return x; // return by value => make call-site copy
817	}
818
819	A y, z @= {};
820	identity(y); // copy construct y into x
821	identity(z); // copy construct z into x
822	\end{cfacode}
823	Note that unmanaged argument @z@ is logically copy constructed into managed parameter @x@; however, the translator must copy construct into a temporary variable to be passed as an argument, which is also destructed after the call.
A compiler could bypass the argument temporaries since it is in control of the calling conventions and knows exactly where the called-function's parameters live.
825
This generates the following code, in which the function @identity@ appears under the shorter name @f@.
827	\begin{cfacode}
struct A f(struct A x){
    struct A _retval_f;     // return value
    ?{}((&_retval_f), x);   // copy construct return value
    return _retval_f;
}

struct A y;
?{}(&y);                    // default construct
struct A z = { 0 };         // C default

struct A _tmp_cp1;          // argument 1
struct A _tmp_cp_ret0;      // return value
_tmp_cp_ret0=f(
    (?{}(&_tmp_cp1, y) , _tmp_cp1)  // argument is a comma expression
), _tmp_cp_ret0;            // return value for cascading
^?{}(&_tmp_cp_ret0);        // destruct return value
^?{}(&_tmp_cp1);            // destruct argument 1

struct A _tmp_cp2;          // argument 1
struct A _tmp_cp_ret1;      // return value
_tmp_cp_ret1=f(
    (?{}(&_tmp_cp2, z), _tmp_cp2)   // argument is a comma expression
), _tmp_cp_ret1;            // return value for cascading
^?{}(&_tmp_cp_ret1);        // destruct return value
^?{}(&_tmp_cp2);            // destruct argument 1
^?{}(&y);
854	\end{cfacode}
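
The comma-expression technique for argument temporaries is ordinary C and can be tried in isolation.
The sketch below is an assumption-laden illustration rather than real translator output: @A_copy_ctor@, @A_dtor@, @call_identity@, and the counters @copies@ and @dtors@ are hypothetical names chosen for the example.

```c
#include <assert.h>

struct A { int v; };
static int copies, dtors;

/* Hypothetical stand-ins for the CFA copy constructor and destructor. */
static void A_copy_ctor(struct A *dst, struct A src) { dst->v = src.v; copies++; }
static void A_dtor(struct A *a) { (void)a; dtors++; }

struct A identity(struct A x) {
    struct A retval;
    A_copy_ctor(&retval, x);  /* copy construct the return value */
    return retval;
}

/* Call identity(y) the way the generated code does: the argument is a
   comma expression that first copy constructs a temporary and then
   yields that temporary as the value passed to the function. */
int call_identity(int value) {
    struct A y = { value };
    struct A tmp_arg, tmp_ret;
    copies = dtors = 0;
    tmp_ret = identity((A_copy_ctor(&tmp_arg, y), tmp_arg));
    A_dtor(&tmp_ret);         /* destruct return value */
    A_dtor(&tmp_arg);         /* destruct argument temporary */
    assert(copies == 2 && dtors == 2);
    return tmp_ret.v;
}
```

One call therefore costs two copy constructions and two destructions, one pair for the argument and one pair for the return value.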
855
856	A special syntactic form, such as a variant of \ateq, can be implemented to specify at the call site that an argument should not be copy constructed, to regain some control for the C programmer.
857	\begin{cfacode}
858	identity(z@); // do not copy construct argument
859	// - will copy construct/destruct return value
860	A@ identity_nocopy(A @ x) { // argument not copy constructed or destructed
861	return x; // not copy constructed
862	// return type marked @ => not destructed
863	}
864	\end{cfacode}
It should be noted that reference types will allow specifying that a value does not need to be copied; however, reference types do not provide a means of preventing implicit copy construction from uses of the reference, so the problem is still present when passing or returning the reference by value.
866
Adding implicit copy construction imposes the additional runtime cost of the copy constructor for every argument and return value in a function call.
This cost is necessary to maintain appropriate value semantics when calling a function.
It is not present for types with trivial copy constructors and destructors.
In the future, return-value-optimization (RVO) can be implemented for \CFA to elide unnecessary copy construction and destruction of temporary objects.
871
872	A known issue with this implementation is that the argument and return value temporaries are not guaranteed to have the same address for their entire lifetimes.
873	In the previous example, since @_retval_f@ is allocated and constructed in @f@, then returned by value, the internal data is bitwise copied into the caller's stack frame.
874	This approach works out most of the time, because typically destructors need to only access the fields of the object and recursively destroy.
875	It is currently the case that constructors and destructors that use the @this@ pointer as a unique identifier to store data externally do not work correctly for return value objects.
876	Thus, it is currently not safe to rely on an object's @this@ pointer to remain constant throughout execution of the program.
877	\begin{cfacode}
struct A;
A * external_data[32];
int ext_count;
void ?{}(A * a) {
    // ...
    external_data[ext_count++] = a;
}
void ^?{}(A * a) {
    for (int i = 0; i < ext_count; i++) {
        if (a == external_data[i]) { // may never be true
            // ...
        }
    }
}

A makeA() {
    A x;  // stores &x in external_data
    return x;
}
makeA();  // return temporary has a different address than x
// equivalent to:
//   A _tmp;
//   _tmp = makeA(), _tmp;
//   ^?{}(&_tmp);
902	\end{cfacode}
903	In the above example, a global array of pointers is used to keep track of all of the allocated @A@ objects.
904	Due to copying on return, the current object being destructed does not exist in the array if an @A@ object is ever returned by value from a function, such as in @makeA@.
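
The address mismatch can be observed directly in plain C.
In this minimal sketch, @A_ctor@, @registered@, and @temporary_is_registered@ are hypothetical names; the address is stored as a @uintptr_t@ so that it can still be compared after the local object is gone.

```c
#include <assert.h>
#include <stdint.h>

struct A { int v; };
static uintptr_t registered;  /* address recorded by the "constructor" */

/* Hypothetical stand-in for a constructor that uses its target's
   address as an identity. */
static void A_ctor(struct A *a) { a->v = 0; registered = (uintptr_t)a; }

struct A makeA(void) {
    struct A x;
    A_ctor(&x);  /* records the address of x */
    return x;    /* contents are copied into the caller's temporary */
}

/* Returns 1 if the caller's temporary has the registered address. */
int temporary_is_registered(void) {
    struct A tmp;
    tmp = makeA();
    assert(tmp.v == 0);
    return (uintptr_t)&tmp == registered;
}
```

Because @x@ and @tmp@ are distinct objects whose lifetimes overlap during the call, the recorded address cannot be that of @tmp@, and a destructor run on @tmp@ would not find it in the external table.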
905
906	This problem could be solved in the translator by changing the function signatures so that the return value is moved into the parameter list.
907	For example, the translator could restructure the code like so
908	\begin{cfacode}
909	void f(struct A x, struct A * _retval_f){
910	?{}(_retval_f, x); // construct directly into caller's stack frame
911	}
912
913	struct A y;
914	?{}(&y);
915	struct A z = { 0 };
916
917	struct A _tmp_cp1; // argument 1
918	struct A _tmp_cp_ret0; // return value
919	f((?{}(&_tmp_cp1, y) , _tmp_cp1), &_tmp_cp_ret0), _tmp_cp_ret0;
920	^?{}(&_tmp_cp_ret0); // return value
921	^?{}(&_tmp_cp1); // argument 1
922	\end{cfacode}
923	This transformation provides @f@ with the address of the return variable so that it can be constructed into directly.
924	It is worth pointing out that this kind of signature rewriting already occurs in polymorphic functions that return by value, as discussed in \cite{Bilson03}.
925	A key difference in this case is that every function would need to be rewritten like this, since types can switch between managed and unmanaged at different scope levels, \eg
926	\begin{cfacode}
927	struct A { int v; };
928	A x; // unmanaged, since only trivial constructors are available
929	{
930	void ?{}(A * a) { ... }
931	void ^?{}(A * a) { ... }
932	A y; // managed
933	}
934	A z; // unmanaged
935	\end{cfacode}
936	Hence there is not enough information to determine at function declaration whether a type is managed or not, and thus it is the case that all signatures have to be rewritten to account for possible copy constructor and destructor calls.
Even with this change, it would still be possible to declare backwards compatible function prototypes with an @extern "C"@ block, which allows for the definition of C-compatible functions within \CFA code; however, this would require changes to the way code inside of an @extern "C"@ function is generated as compared with normal code generation.
938	Furthermore, it is not possible to overload C functions, so using @extern "C"@ to declare functions is of limited use.
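
The benefit of moving the return value into the parameter list can be sketched in plain C.
The names @A_ctor@, @registered@, and @temporary_is_registered@ are hypothetical, and the rewritten signature of @makeA@ is only an assumption about what such a transformation might produce.

```c
#include <assert.h>
#include <stdint.h>

struct A { int v; };
static uintptr_t registered;  /* address recorded by the "constructor" */

/* Hypothetical stand-in for a constructor that uses its target's
   address as an identity. */
static void A_ctor(struct A *a) { a->v = 0; registered = (uintptr_t)a; }

/* Assumed rewritten form of `A makeA()`: the caller supplies the
   storage, so the object is constructed where it will finally live. */
void makeA(struct A *retval) {
    A_ctor(retval);
}

/* Returns 1 if the caller's temporary has the registered address. */
int temporary_is_registered(void) {
    struct A tmp;
    makeA(&tmp);
    assert(tmp.v == 0);
    return (uintptr_t)&tmp == registered;
}
```

With the out-parameter form, the address seen by the constructor is the address of the caller's temporary, so a later destructor call on that temporary sees the same identity.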
939
940	It would be possible to regain some control by adding an attribute to structures that specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type.
941	Ideally, structures should be manageable by default, since otherwise the default case becomes more verbose.
942	This means that in general, function signatures would have to be rewritten, and in a select few cases the signatures would not be rewritten.
943	\begin{cfacode}
944	__attribute__((manageable)) struct A { ... }; // can declare ctors
945	__attribute__((unmanageable)) struct B { ... }; // cannot declare ctors
946	struct C { ... }; // can declare ctors
947
948	A f(); // rewritten void f(A *);
949	B g(); // not rewritten
950	C h(); // rewritten void h(C *);
951	\end{cfacode}
952	An alternative is to make the attribute \emph{identifiable}, which states that objects of this type use the @this@ parameter as an identity.
This addresses the visible problem more directly, in that only types marked as identifiable would need to have the return value moved into the parameter list, and every other type could remain the same.
954	Furthermore, no restrictions would need to be placed on whether objects can be constructed.
955	\begin{cfacode}
956	__attribute__((identifiable)) struct A { ... }; // can declare ctors
957	struct B { ... }; // can declare ctors
958
959	A f(); // rewritten void f(A *);
960	B g(); // not rewritten
961	\end{cfacode}
962
963	Ultimately, both of these are patchwork solutions.
964	Since a real compiler has full control over its calling conventions, it can seamlessly allow passing the return parameter without outwardly changing the signature of a routine.
965	As such, it has been decided that this issue is not currently a priority and will be fixed when a full \CFA compiler is implemented.
966
967	\section{Implementation}
968	\subsection{Array Initialization}
969	Arrays are a special case in the C type-system.
970	Type checking largely ignores size information for C arrays, making it impossible to write a standalone \CFA function that constructs or destructs an array, while maintaining the standard interface for constructors and destructors.
971	Instead, \CFA defines the initialization and destruction of an array recursively.
972	That is, when an array is defined, each of its elements is constructed in order from element 0 up to element $n-1$.
973	When an array is to be implicitly destructed, each of its elements is destructed in reverse order from element $n-1$ down to element 0.
974	As in C, it is possible to explicitly provide different initializers for each element of the array through array initialization syntax.
975	In this case, each of the initializers is taken in turn to construct a subsequent element of the array.
If too many initializers are provided, only the first $n$ initializers are actually used.
977	If too few initializers are provided, then the remaining elements are default constructed.
978
979	For example, given the following code.
980	\begin{cfacode}
981	struct X {
982	int x, y, z;
983	};
984	void f() {
985	X x[10] = { { 1, 2, 3 }, { 4 }, { 7, 8 } };
986	}
987	\end{cfacode}
988	The following code is generated for @f@.
989	\begin{cfacode}
990	void f(){
991	struct X x[((long unsigned int )10)];
992	// construct x
993	{
994	int _index0 = 0;
995	// construct with explicit initializers
996	{
997	if (_index0<10) ?{}(&x[_index0], 1, 2, 3);
998	++_index0;
999	if (_index0<10) ?{}(&x[_index0], 4);
1000	++_index0;
1001	if (_index0<10) ?{}(&x[_index0], 7, 8);
1002	++_index0;
1003	}
1004
1005	// default construct remaining elements
1006	for (;_index0<10;++_index0) {
1007	?{}(&x[_index0]);
1008	}
1009	}
1010	// destruct x
1011	{
1012	int _index1 = 10-1;
1013	for (;_index1>=0;--_index1) {
1014	^?{}(&x[_index1]);
1015	}
1016	}
1017	}
1018	\end{cfacode}
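
The same pattern can be run as plain C to check the construction and destruction order.
This sketch is an emulation under stated assumptions: @X_ctor@ and @X_dtor@ are hypothetical stand-ins that log the index they are applied to, and missing field arguments are zeroed here for simplicity, which is not necessarily what the \CFA field constructors do.

```c
#include <assert.h>

struct X { int x, y, z; };
enum { N = 10 };

static int ctor_log[N], dtor_log[N], n_ctor, n_dtor;

/* Hypothetical stand-ins that record the element index they act on. */
static void X_ctor(struct X *p, int x, int y, int z, int index) {
    p->x = x; p->y = y; p->z = z;
    ctor_log[n_ctor++] = index;
}
static void X_dtor(struct X *p, int index) { (void)p; dtor_log[n_dtor++] = index; }

int construct_and_destruct(void) {
    struct X a[N];
    int i = 0;
    n_ctor = n_dtor = 0;
    /* construct with explicit initializers, guarded by the array bound */
    if (i < N) X_ctor(&a[i], 1, 2, 3, i);
    ++i;
    if (i < N) X_ctor(&a[i], 4, 0, 0, i);
    ++i;
    if (i < N) X_ctor(&a[i], 7, 8, 0, i);
    ++i;
    /* default construct the remaining elements */
    for (; i < N; ++i) X_ctor(&a[i], 0, 0, 0, i);
    /* destruct in reverse order */
    for (int j = N - 1; j >= 0; --j) X_dtor(&a[j], j);
    assert(n_ctor == N && n_dtor == N);
    assert(ctor_log[0] == 0 && ctor_log[N - 1] == N - 1);
    assert(dtor_log[0] == N - 1 && dtor_log[N - 1] == 0);
    return a[1].x + a[2].y;
}
```

The guards on the explicit initializers are what cause surplus initializers to be ignored when the array is shorter than the initializer list.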
1019	Multidimensional arrays require more complexity.
For example, consider the following two-dimensional array.
1021	\begin{cfacode}
void g() {
    X x[10][10] = {
        { { 1, 2, 3 }, { 4 } }, // x[0]
        { { 7, 8 } }            // x[1]
    };
}
\end{cfacode}
This generates the following.
1029	\begin{cfacode}
1030	void g(){
1031	struct X x[10][10];
1032	// construct x
1033	{
1034	int _index0 = 0;
1035	for (;_index0<10;++_index0) {
1036	{
1037	int _index1 = 0;
1038	// construct with explicit initializers
1039	{
1040	switch ( _index0 ) {
1041	case 0:
1042	// construct first array
1043	if ( _index1<10 ) ?{}(&x[_index0][_index1], 1, 2, 3);
1044	++_index1;
1045	if ( _index1<10 ) ?{}(&x[_index0][_index1], 4);
1046	++_index1;
1047	break;
1048	case 1:
1049	// construct second array
1050	if ( _index1<10 ) ?{}(&x[_index0][_index1], 7, 8);
1051	++_index1;
1052	break;
1053	}
1054	}
1055	// default construct remaining elements
1056	for (;_index1<10;++_index1) {
1057	?{}(&x[_index0][_index1]);
1058	}
1059	}
1060	}
1061	}
1062	// destruct x
1063	{
1064	int _index2 = 10-1;
1065	for (;_index2>=0;--_index2) {
1066	{
1067	int _index3 = 10-1;
1068	for (;_index3>=0;--_index3) {
1069	^?{}(&x[_index2][_index3]);
1070	}
1071	}
1072	}
1073	}
1074	}
1075	\end{cfacode}
1076	% It is possible to generate slightly simpler code for the switch cases, since the value of @_index1@ is known at compile-time within each case, however the procedure for generating constructor calls is complicated.
1077	% It is simple to remove the increment statements for @_index1@, but it is not simple to remove the
1078	%% technically, it's not hard either. I could easily downcast and change the second argument to ?[?], but is it really necessary/worth it??
1079
1080	\subsection{Global Initialization}
1081	In standard C, global variables can only be initialized to compile-time constant expressions, which places strict limitations on the programmer's ability to control the default values of objects.
1082	In \CFA, constructors and destructors are guaranteed to be run on global objects, allowing arbitrary code to be run before and after the execution of the main routine.
1083	By default, objects within a translation unit are constructed in declaration order, and destructed in the reverse order.
1084	The default order of construction of objects amongst translation units is unspecified.
1085	It is, however, guaranteed that any global objects in the standard library are initialized prior to the initialization of any object in a user program.
1086
1087	This feature is implemented in the \CFA translator by grouping every global constructor call into a function with the GCC attribute \emph{constructor}, which performs most of the heavy lifting \cite[6.31.1]{GCCExtensions}.
1088	A similar function is generated with the \emph{destructor} attribute, which handles all global destructor calls.
1089	At the time of writing, initialization routines in the library are specified with priority \emph{101}, which is the highest priority level that GCC allows, whereas initialization routines in the user's code are implicitly given the default priority level, which ensures they have a lower priority than any code with a specified priority level.
1090	This mechanism allows arbitrarily complicated initialization to occur before any user code runs, making it possible for library designers to initialize their modules without requiring the user to call specific startup or tear-down routines.
1091
1092	For example, given the following global declarations.
1093	\begin{cfacode}
1094	struct X {
1095	int y, z;
1096	};
1097	void ?{}(X *);
1098	void ?{}(X *, int, int);
1099	void ^?{}(X *);
1100
1101	X a;
1102	X b = { 10, 3 };
1103	\end{cfacode}
1104	The following code is generated.
1105	\begin{cfacode}
1106	__attribute__ ((constructor)) static void _init_global_ctor(void){
1107	?{}(&a);
1108	?{}(&b, 10, 3);
1109	}
1110	__attribute__ ((destructor)) static void _destroy_global_ctor(void){
1111	^?{}(&b);
1112	^?{}(&a);
1113	}
1114	\end{cfacode}
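
The priority mechanism can be exercised with GCC or Clang in plain C.
In this sketch the functions @library_init@, @init_globals@, @destroy_globals@, and @library_initialized_first@ and the counter @init_step@ are hypothetical; only the @constructor@ and @destructor@ attributes themselves are the compiler features discussed above.

```c
#include <assert.h>

struct X { int y, z; };
static struct X a, b;
static int init_step;  /* incremented by each initializer */
static int library_ready_at, user_ready_at;

/* Priority 101 runs before initializers that have no explicit priority. */
__attribute__((constructor(101))) static void library_init(void) {
    library_ready_at = ++init_step;
}

/* Stand-in for the grouped global constructor calls of a user program. */
__attribute__((constructor)) static void init_globals(void) {
    a.y = 0;  a.z = 0;  /* corresponds to default constructing a */
    b.y = 10; b.z = 3;  /* corresponds to constructing b from 10, 3 */
    user_ready_at = ++init_step;
}

/* Stand-in for the grouped global destructor calls, in reverse order. */
__attribute__((destructor)) static void destroy_globals(void) {
    b.y = b.z = 0;
    a.y = a.z = 0;
}

/* Callable from main: reports whether the library initializer ran first. */
int library_initialized_first(void) {
    return library_ready_at == 1 && user_ready_at == 2
        && b.y == 10 && b.z == 3;
}
```

Both initializers have already run by the time @main@ starts, so @library_initialized_first()@ can be called from @main@ to confirm the ordering.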
1115
1116	% https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html#C_002b_002b-Attributes
1117	% suggestion: implement this in CFA by picking objects with a specified priority and pulling them into their own init functions (could even group them by priority level -> map<int, list<ObjectDecl*>>) and pull init_priority forward into constructor and destructor attributes with the same priority level
1118	GCC provides an attribute @init_priority@ in \CC, which allows specifying the relative priority for initialization of global objects on a per-object basis.
1119	A similar attribute can be implemented in \CFA by pulling marked objects into global constructor/destructor-attribute functions with the specified priority.
1120	For example,
1121	\begin{cfacode}
1122	struct A { ... };
1123	void ?{}(A *, int);
1124	void ^?{}(A *);
1125	__attribute__((init_priority(200))) A x = { 123 };
1126	\end{cfacode}
1127	would generate
1128	\begin{cfacode}
A x;
__attribute__((constructor(200))) void __init_x() {
    ?{}(&x, 123); // construct x with priority 200
}
__attribute__((destructor(200))) void __destroy_x() {
    ^?{}(&x); // destruct x with priority 200
}
1136	\end{cfacode}
1137
1138	\subsection{Static Local Variables}
1139	In standard C, it is possible to mark variables that are local to a function with the @static@ storage class.
1140	Unlike normal local variables, a @static@ local variable is defined to live for the entire duration of the program, so that each call to the function has access to the same variable with the same address and value as it had in the previous call to the function.
1141	Much like global variables, @static@ variables can only be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it at compile-time.
1142
1143	Yet again, this rule is too restrictive for a language with constructors and destructors.
Since the initializer expression is not necessarily a compile-time constant and can depend on the current execution state of the function, \CFA modifies the definition of a @static@ local variable so that objects are guaranteed to be live from the time control flow reaches their declaration until the end of the program.
Since standard C does not allow access to a @static@ local variable before the first time control flow reaches the declaration, this change does not preclude any valid C code.
Local objects with @static@ storage class are implicitly constructed and destructed exactly once for the duration of the program.
The object is constructed when its declaration is reached for the first time.
The object is destructed once at the end of the program.

Construction of @static@ local objects is implemented via an accompanying @static bool@ variable, which records whether the variable has already been constructed.
A conditional branch checks the value of the companion @bool@, and if the variable has not yet been constructed, then the object is constructed.
The object's destructor is scheduled to run at program termination using @atexit@\footnote{When using the dynamic linker, it is possible to dynamically load and unload a shared library. Since glibc 2.2.3 \cite{atexit}, functions registered with @atexit@ within the shared library are called when unloading the shared library. As such, static local objects can be destructed using this mechanism even in shared libraries on Linux systems.}, and the companion @bool@'s value is set so that subsequent invocations of the function do not reconstruct the object.
Since @atexit@ accepts only a function that takes no parameters, the destructor cannot be registered directly and some additional work is required.
First, the @static@ variable must be hoisted up to global scope and uniquely renamed to prevent name clashes with other global objects.
If necessary, a local structure may need to be hoisted as well.
Second, a function is built that calls the destructor for the newly hoisted variable.
Finally, the newly generated function is registered with @atexit@, instead of registering the destructor directly.
Since @atexit@ calls functions in the reverse order in which they are registered, @static@ local variables are guaranteed to be destructed in the reverse order of their construction; this order may differ between executions of the same program.
Extending the previous example,
\begin{cfacode}
int f(int x) {
  static X a;
  static X b = { x, x }; // depends on parameter value
  static X c = b; // depends on local variable
}
\end{cfacode}
generates the following.
\begin{cfacode}
static struct X a_static_var0;
static void __a_dtor_atexit0(void){
  ((void)^?{}(((struct X *)(&a_static_var0))));
}
static struct X b_static_var1;
static void __b_dtor_atexit1(void){
  ((void)^?{}(((struct X *)(&b_static_var1))));
}
static struct X c_static_var2;
static void __c_dtor_atexit2(void){
  ((void)^?{}(((struct X *)(&c_static_var2))));
}
int f(int x){
  int _retval_f;
  __attribute__ ((unused)) static void *_dummy0;
  static _Bool __a_uninitialized = 1;
  if ( __a_uninitialized ) {
    ((void)?{}(((struct X *)(&a_static_var0))));
    ((void)(__a_uninitialized=0));
    ((void)atexit(__a_dtor_atexit0));
  }

  __attribute__ ((unused)) static void *_dummy1;
  static _Bool __b_uninitialized = 1;
  if ( __b_uninitialized ) {
    ((void)?{}(((struct X *)(&b_static_var1)), x, x));
    ((void)(__b_uninitialized=0));
    ((void)atexit(__b_dtor_atexit1));
  }

  __attribute__ ((unused)) static void *_dummy2;
  static _Bool __c_uninitialized = 1;
  if ( __c_uninitialized ) {
    ((void)?{}(((struct X *)(&c_static_var2)), b_static_var1));
    ((void)(__c_uninitialized=0));
    ((void)atexit(__c_dtor_atexit2));
  }
}
\end{cfacode}
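
The same guard-flag and @atexit@ pattern can be written by hand in plain C, which makes its behaviour easy to observe.
The following is a minimal sketch under stated assumptions, not translator output: the fields of @struct X@, the helper names @X_ctor@ and @X_dtor@, and the @ctor_calls@ counter are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical managed type standing in for the thesis's struct X. */
struct X { int a, b; };

/* Counts constructor calls so the "construct once" rule can be observed. */
static int ctor_calls = 0;

static void X_ctor(struct X *obj, int a, int b) {
  obj->a = a;
  obj->b = b;
  ctor_calls++;
}

static void X_dtor(struct X *obj) {
  printf("destroy {%d, %d}\n", obj->a, obj->b);
}

/* Hoisted storage plus one parameterless wrapper per static local,
   because atexit only accepts a void(void) function. */
static struct X a_static, b_static;
static void a_dtor_atexit(void) { X_dtor(&a_static); }
static void b_dtor_atexit(void) { X_dtor(&b_static); }

/* Returns the sum of b's fields; b is built from the first x only. */
int f(int x) {
  static _Bool a_uninitialized = 1;
  if (a_uninitialized) {            /* branch taken on the first call only */
    X_ctor(&a_static, 0, 0);
    a_uninitialized = 0;
    atexit(a_dtor_atexit);
  }
  static _Bool b_uninitialized = 1;
  if (b_uninitialized) {
    X_ctor(&b_static, x, x);        /* initializer depends on the argument */
    b_uninitialized = 0;
    atexit(b_dtor_atexit);
  }
  return b_static.a + b_static.b;
}
```

In this sketch, calling @f(3)@ and then @f(10)@ returns 6 both times, because @b_static@ is constructed only on the first call.
At normal program termination, @b_static@ is destroyed before @a_static@, since @atexit@ runs its handlers in the reverse order of registration.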

This implementation comes at the runtime cost of an additional branch for every @static@ local variable, each time the function is called.
Since initializers are not required to be compile-time constant expressions, they can involve global variables, function arguments, function calls, etc.
As a direct consequence, @static@ local variables cannot be initialized with attribute-constructor routines, as global variables are.
However, in the case where the variable is unmanaged and has a compile-time constant initializer, a C-compliant initializer is generated and the additional cost is not present.
\CC shares the same semantics for its @static@ local variables.
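
The difference in cost between the two cases can be seen in plain C.
The sketch below is illustrative only; the function names @next_id@ and @first_arg@ are hypothetical.
The first function uses an ordinary C initializer and needs no guard, while the second must emulate a runtime-dependent initializer with a flag that is tested on every call.

```c
/* Unmanaged object with a compile-time constant initializer:
   ordinary C static initialization, no guard flag and no branch. */
int next_id(void) {
  static int counter = 100;
  return counter++;
}

/* Runtime-dependent initializer: C rejects `static int base = x;`,
   so a guard flag is tested on every call to initialize exactly once. */
int first_arg(int x) {
  static int base;
  static _Bool base_uninitialized = 1;
  if (base_uninitialized) {
    base = x;                       /* uses the first call's argument */
    base_uninitialized = 0;
  }
  return base;
}
```

Here successive calls to @next_id()@ return 100, 101, and so on, and @first_arg@ keeps returning the argument of its first call regardless of later arguments.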

\subsection{Polymorphism}
As mentioned in section \ref{sub:polymorphism}, \CFA currently has three type-classes that are used to designate polymorphic data types: @otype@, @dtype@, and @ftype@.
In previous versions of \CFA, @otype@ was syntactic sugar for @dtype@ with known size/alignment information and an assignment function.
That is,
\begin{cfacode}
forall(otype T)
void f(T);
\end{cfacode}
was equivalent to
\begin{cfacode}
forall(dtype T \| sized(T) \| { T ?=?(T *, T); })
void f(T);
\end{cfacode}
This sugar allows the constraints common to all complete object-types to be specified concisely.
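
One way to picture what these assertions mean for a polymorphic function is to write the extra information as explicit parameters in C.
The sketch below is an assumption made for illustration, not the translator's actual calling convention: the names @assign_fn@, @poly_swap@, and @assign_int@ are hypothetical, and the assignment function takes two pointers rather than matching the \CFA signature exactly.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assignment for an opaque type T: store the value at src into *dst. */
typedef void (*assign_fn)(void *dst, const void *src);

/* Hypothetical lowering of a function over the older otype:
   the size of T and its assignment operator arrive as extra parameters.
   Returns 0 on success, -1 if temporary storage cannot be allocated. */
int poly_swap(size_t size_T, assign_fn assign_T, void *x, void *y) {
  void *tmp = malloc(size_T);       /* storage for one T of unknown type */
  if (tmp == NULL) return -1;
  assign_T(tmp, x);
  assign_T(x, y);
  assign_T(y, tmp);
  free(tmp);
  return 0;
}

/* The assignment operation supplied when T is int. */
void assign_int(void *dst, const void *src) {
  *(int *)dst = *(const int *)src;
}
```

A caller instantiating @T@ with @int@ would pass @sizeof(int)@ and @assign_int@, after which the two integers have exchanged values.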

Now that \CFA has constructors and destructors, more of a complete object's behaviour can be specified than was previously possible.
As such, @otype@ has been augmented to include assertions for a default constructor, copy constructor, and destructor.
That is, the previous example is now equivalent to
\begin{cfacode}
forall(dtype T \| sized(T) \|
  { T ?=?(T *, T); void ?{}(T *); void ?{}(T *, T); void ^?{}(T *); })
void f(T);
\end{cfacode}
These additions allow @f@'s body to create and destroy objects of type @T@, and pass objects of type @T@ as arguments to other functions, following the normal \CFA rules.
A point of note here is that objects can be missing default constructors (and eventually other functions through deleted functions), so it is important for \CFA programmers to think carefully about the operations needed by their function, so as not to over-constrain the acceptable parameter types and prevent potential reuse.

These additional assertion parameters impose a runtime cost on all managed temporary objects created in polymorphic code, even those with trivial constructors and destructors.
This cost is necessary because polymorphic code does not know the actual type at compile-time, due to separate compilation.
Since trivial constructors and destructors either do not perform operations or are simply bit-wise copy operations, the imposed cost is essentially the cost of the function calls.
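
The cost described above can be made visible with a small C model of a polymorphic function.
This is a sketch under assumptions, not the translator's representation: the @otype_ops@ structure, the @int@ operations, and the @indirect_calls@ counter are hypothetical.
It shows that a temporary of an unknown type requires calls through function pointers even when the operations themselves are trivial.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bundle of the operations asserted by the augmented otype. */
struct otype_ops {
  size_t size;
  void (*ctor)(void *obj);
  void (*copy_ctor)(void *obj, const void *src);
  void (*dtor)(void *obj);
};

/* Counts calls made through the function pointers. */
static int indirect_calls = 0;

/* Trivial operations for int: nothing to initialize or release,
   and copy construction is a bit-wise copy. */
static void int_ctor(void *obj) { (void)obj; indirect_calls++; }
static void int_copy_ctor(void *obj, const void *src) {
  memcpy(obj, src, sizeof(int));
  indirect_calls++;
}
static void int_dtor(void *obj) { (void)obj; indirect_calls++; }

const struct otype_ops int_ops = {
  sizeof(int), int_ctor, int_copy_ctor, int_dtor
};

/* Polymorphic body: copy *in into a managed temporary, copy the
   temporary into *out, then destroy the temporary.
   Returns 0 on success, -1 if the temporary cannot be allocated. */
int poly_copy(const struct otype_ops *T, void *out, const void *in) {
  void *tmp = malloc(T->size);
  if (tmp == NULL) return -1;
  T->copy_ctor(tmp, in);            /* construct the temporary */
  T->copy_ctor(out, tmp);           /* construct the result from it */
  T->dtor(tmp);                     /* temporary's lifetime ends */
  free(tmp);
  return 0;
}
```

In this model, copying a single @int@ through @poly_copy@ makes three indirect calls, whereas monomorphic code could use a plain assignment.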

\section{Summary}

When creating a new object of a managed type, it is guaranteed that a constructor is called to initialize the object at its definition point, and that a destructor is called when the object's lifetime ends.
Destructors are called in the reverse order of construction.

Every argument passed to a function is copy constructed into a temporary object that is passed by value to the function and destructed at the end of the statement.
Function return values are copy constructed inside the function at the return statement, passed by value to the call-site, and destructed at the call-site at the end of the statement.

Every complete object type has a default constructor, copy constructor, assignment operator, and destructor.
To accomplish this, these functions are generated as appropriate for new types.
User-defined functions shadow built-in and automatically generated functions, so it is possible to specialize the behaviour of a type.
Furthermore, default constructors and aggregate field constructors are hidden when \emph{any} constructor is defined.

Objects dynamically allocated with @malloc@, \ateq objects, and objects with only trivial constructors and destructors are unmanaged.
Unmanaged objects are never the target of an implicit constructor or destructor call.
