source: doc/rob_thesis/ctordtor.tex@ eaa2f3a1

Last change on this file since eaa2f3a1 was f92aa32, checked in by Rob Schluntz <rschlunt@…>, 9 years ago

thesis conclusions and editting pass

%======================================================================
\chapter{Constructors and Destructors}
%======================================================================

% TODO now: as an experiment, implement Andrei Alexandrescu's ScopeGuard http://www.drdobbs.com/cpp/generic-change-the-way-you-write-excepti/184403758?pgno=2
% doesn't seem possible to do this without allowing ttype on generic structs?

Since \CFA is a true systems language, it does not provide a garbage collector.
As well, \CFA is not an object-oriented programming language, i.e., structures cannot have routine members.
Nevertheless, one important goal is to reduce programming complexity and increase safety.
To that end, \CFA provides support for implicit pre/post-execution of routines for objects, via constructors and destructors.

This chapter details the design of constructors and destructors in \CFA, along with their current implementation in the translator.
Generated code samples have been edited for clarity and brevity.

\section{Design Criteria}
\label{s:Design}
In designing constructors and destructors for \CFA, the primary goals were ease of use and maintaining backwards compatibility.

In C, when a variable is defined, its value is initially undefined unless it is explicitly initialized or allocated in the static area.
\begin{cfacode}
int main() {
  int x;        // uninitialized
  int y = 5;    // initialized to 5
  x = y;        // assigned 5
  static int z; // initialized to 0
}
\end{cfacode}
In the example above, @x@ is defined and left uninitialized, while @y@ is defined and initialized to 5.
Next, @x@ is assigned the value of @y@.
In the last line, @z@ is implicitly initialized to 0 since it is marked @static@.
The key difference between assignment and initialization is that assignment occurs on a live object (i.e., an object that contains data).
It is important to note that this means @x@ could have been used uninitialized prior to being assigned, while @y@ could not be used uninitialized.
Use of uninitialized variables yields undefined behaviour, which is a common source of errors in C programs.

Initialization of a declaration is strictly optional, permitting uninitialized variables to exist.
Furthermore, declaration initialization is limited to expressions, so there is no way to insert arbitrary code before a variable is live, without delaying the declaration.
Many C compilers give good warnings for uninitialized variables most of the time, but they cannot in all cases.
\begin{cfacode}
int f(int *); // output parameter: never reads, only writes
int g(int *); // input parameter: never writes, only reads,
              // so requires initialized variable

int x, y;
f(&x); // okay - only writes to x
g(&y); // uses y uninitialized
\end{cfacode}
Other languages are able to give errors in the case of uninitialized variable use, but due to backwards compatibility concerns, this is not the case in \CFA.

In C, constructors and destructors are often mimicked by providing routines that create and tear down objects, where the tear-down function is typically only necessary if the type modifies the execution environment.
\begin{cfacode}
struct array_int {
  int * x;
};
struct array_int create_array(int sz) {
  return (struct array_int) { calloc(sz, sizeof(int)) };
}
void destroy_array(struct array_int * arr) {
  free(arr->x);
}
\end{cfacode}
This idiom does not provide any guarantees unless the structure is opaque, which then requires that all objects are heap allocated.
\begin{cfacode}
struct opaque_array_int;
struct opaque_array_int * create_opaque_array(int sz);
void destroy_opaque_array(struct opaque_array_int *);
int opaque_get(struct opaque_array_int *, int); // subscript

struct opaque_array_int * x = create_opaque_array(10);
int x2 = opaque_get(x, 2);
\end{cfacode}
This pattern is cumbersome to use since every access becomes a function call.
While useful in some situations, this compromise is too restrictive.
Furthermore, even with this idiom it is easy to make mistakes, such as forgetting to destroy an object or destroying it multiple times.

A constructor provides a way of ensuring that the necessary aspects of object initialization are performed, from setting up invariants to providing compile- and run-time checks for appropriate initialization parameters.
This goal is achieved through a guarantee that a constructor is called implicitly after every object is allocated from a type with associated constructors, as part of an object's definition.
Since a constructor is called on every object of a managed type, it is impossible to forget to initialize such objects, as long as all constructors perform some sensible form of initialization.

In \CFA, a constructor is a function with the name @?{}@.
Like other operators in \CFA, the name represents the syntax used to call the constructor, e.g., @struct S s = { ... };@.
Every constructor must have a return type of @void@ and at least one parameter, the first of which is colloquially referred to as the \emph{this} parameter, as in many object-oriented programming-languages (however, a programmer can give it an arbitrary name).
The @this@ parameter must have a pointer type, whose base type is the type of object that the function constructs.
There is precedent for enforcing the first parameter to be the @this@ parameter in other operators, such as the assignment operator, where in both cases, the left-hand side of the equals is the first parameter.
There is currently a proposal to add reference types to \CFA.
Once this proposal has been implemented, the @this@ parameter will become a reference type with the same restrictions.

Consider the definition of a simple type encapsulating a dynamic array of @int@s.

\begin{cfacode}
struct Array {
  int * data;
  int len;
};
\end{cfacode}

In C, if the user creates an @Array@ object, the fields @data@ and @len@ are uninitialized, unless an explicit initializer list is present.
It is the user's responsibility to remember to initialize both of the fields to sensible values, since there are no implicit checks for invalid values or reasonable defaults.
In \CFA, the user can define a constructor to handle initialization of @Array@ objects.

\begin{cfacode}
void ?{}(Array * arr){
  arr->len = 10;    // default size
  arr->data = malloc(sizeof(int)*arr->len);
  for (int i = 0; i < arr->len; ++i) {
    arr->data[i] = 0;
  }
}
Array x;  // allocates storage for Array and calls ?{}(&x)
\end{cfacode}

This constructor initializes @x@ so that its @len@ field has the value 10, its @data@ field holds a pointer to a block of memory large enough to hold 10 @int@s, and each element of the array is set to 0.
This particular form of constructor is called the \emph{default constructor}, because it is called on an object defined without an initializer.
In other words, a default constructor is a constructor that takes a single argument: the @this@ parameter.

In \CFA, a destructor is a function much like a constructor, except that its name is \lstinline!^?{}! and it takes only one argument.
A destructor for the @Array@ type can be defined as follows.
\begin{cfacode}
void ^?{}(Array * arr) {
  free(arr->data);
}
\end{cfacode}
The destructor is automatically called at deallocation for all objects of type @Array@.
Hence, the memory associated with an @Array@ is automatically freed when the object's lifetime ends.
The exact guarantees made by \CFA with respect to the calling of destructors are discussed in Section \ref{sub:implicit_dtor}.

As discussed previously, the distinction between initialization and assignment is important.
Consider the following example.
\begin{cfacode}[numbers=left]
Array x, y;
Array z = x;  // initialization
y = x;        // assignment
\end{cfacode}
By the previous definition of the default constructor for @Array@, @x@ and @y@ are initialized to valid arrays of length 10 after their respective definitions.
On line 2, @z@ is initialized with the value of @x@, while on line 3, @y@ is assigned the value of @x@.
The key distinction between initialization and assignment is that an object to be initialized does not hold any meaningful values, whereas an object to be assigned might.
In particular, these cases cannot be handled the same way because in the former case @z@ does not currently own an array, while @y@ does.

\begin{cfacode}[emph={other}, emphstyle=\color{red}]
void ?{}(Array * arr, Array other) {  // copy constructor
  arr->len = other.len;               // initialization
  arr->data = malloc(sizeof(int)*arr->len);
  for (int i = 0; i < arr->len; ++i) {
    arr->data[i] = other.data[i];     // copy from other object
  }
}
Array ?=?(Array * arr, Array other) { // assignment
  ^?{}(arr);                          // explicitly call destructor
  ?{}(arr, other);                    // explicitly call constructor
  return *arr;
}
\end{cfacode}
The two functions above handle these cases.
The first function is called a \emph{copy constructor}, because it constructs its argument by copying the values from another object of the same type.
The second function is the standard copy-assignment operator.
The four functions (default constructor, destructor, copy constructor, and assignment operator) are special in that they safely control the state of most objects.
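
For comparison, the same four operations can be written by hand in plain C.
The following is only a sketch: the names @array_init@, @array_destroy@, @array_copy_init@, and @array_assign@ are hypothetical, and every call must be inserted manually, which is exactly the burden the \CFA special functions remove.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C rendering of the Array type; all function names are illustrative. */
struct Array {
  int *data;
  int len;
};

/* "Default constructor": the target holds no meaningful state yet. */
void array_init(struct Array *arr) {
  arr->len = 10;
  arr->data = calloc((size_t)arr->len, sizeof(int));
  assert(arr->data != NULL);
}

/* "Destructor": releases the storage owned by the target. */
void array_destroy(struct Array *arr) {
  free(arr->data);
  arr->data = NULL;
  arr->len = 0;
}

/* "Copy constructor": the target is raw storage, so nothing is released first. */
void array_copy_init(struct Array *arr, const struct Array *other) {
  assert(other->len > 0);
  arr->len = other->len;
  arr->data = malloc(sizeof(int) * (size_t)arr->len);
  assert(arr->data != NULL);
  memcpy(arr->data, other->data, sizeof(int) * (size_t)arr->len);
}

/* "Assignment": the target is live, so its old storage must be released.
   The copy is built first so that self-assignment remains safe. */
void array_assign(struct Array *arr, const struct Array *other) {
  struct Array tmp;
  array_copy_init(&tmp, other);
  array_destroy(arr);
  *arr = tmp;
}
```

Calling @array_copy_init@ on a live object leaks its old storage, while calling @array_assign@ on raw storage frees an indeterminate pointer, which is why the two cases cannot share one routine.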

It is possible to define a constructor that takes any combination of parameters to provide additional initialization options.
For example, a reasonable extension to the array type would be a constructor that allocates the array to a given initial capacity and initializes the elements of the array to a given @fill@ value.
\begin{cfacode}
void ?{}(Array * arr, int capacity, int fill) {
  arr->len = capacity;
  arr->data = malloc(sizeof(int)*arr->len);
  for (int i = 0; i < arr->len; ++i) {
    arr->data[i] = fill;
  }
}
\end{cfacode}
In \CFA, constructors are called implicitly in initialization contexts.
\begin{cfacode}
Array x, y = { 20, 0xdeadbeef }, z = y;
\end{cfacode}

In \CFA, constructor calls look just like C initializers, which allows them to be inserted into legacy C code with minimal code changes, and also provides a very simple syntax that veteran C programmers are familiar with.
One downside of reusing C initialization syntax is that it is not possible to determine whether an object is constructed just by looking at its declaration, since that requires knowledge of whether the type is managed at that point.

The preceding declaration generates the following code.
\begin{cfacode}
Array x;
?{}(&x);                  // implicit default construct
Array y;
?{}(&y, 20, 0xdeadbeef);  // explicit fill construct
Array z;
?{}(&z, y);               // copy construct
^?{}(&z);                 // implicit destruct
^?{}(&y);                 // implicit destruct
^?{}(&x);                 // implicit destruct
\end{cfacode}
Due to the way that constructor calls are interleaved, it is impossible for @y@ to be referenced before it is initialized, except in its own constructor.
This loophole is minor and exists in \CC as well.
Destructors are implicitly called in reverse declaration-order so that objects with dependencies are destructed before the objects they are dependent on.
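
The reason for reverse-order destruction can be illustrated with a hand-written C sketch.
The types @Buffer@ and @View@ and all function names below are hypothetical: a @View@ writes a final marker into its @Buffer@ when torn down, so it must be destroyed while the @Buffer@ is still alive.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types: a View depends on a live Buffer. */
struct Buffer { int *data; int len; };
struct View { struct Buffer *target; };

void buffer_init(struct Buffer *b, int len) {
  b->len = len;
  b->data = calloc((size_t)len, sizeof(int));
  assert(b->data != NULL);
}

void buffer_destroy(struct Buffer *b) {
  free(b->data);
  b->data = NULL;
  b->len = 0;
}

void view_init(struct View *v, struct Buffer *b) { v->target = b; }

/* Tear-down touches the Buffer, so the Buffer must still be constructed. */
void view_destroy(struct View *v) {
  assert(v->target->data != NULL);
  v->target->data[0] = -1;   /* final marker written through the view */
  v->target = NULL;
}

/* Manual emulation of one scope: construct x then y, destroy y then x. */
int run_scope(void) {
  struct Buffer x;
  struct View y;
  int marker;
  buffer_init(&x, 4);
  view_init(&y, &x);
  view_destroy(&y);          /* declared last, destroyed first */
  marker = x.data[0];
  buffer_destroy(&x);        /* declared first, destroyed last */
  return marker;
}
```

Swapping the two tear-down calls would make @view_destroy@ dereference freed storage; the implicit reverse ordering rules out this class of mistake.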

\subsection{Calling Syntax}
\label{sub:syntax}
There are several ways to construct an object in \CFA.
As previously introduced, every variable is automatically constructed at its definition, which is the most natural way to construct an object.
\begin{cfacode}
struct A { ... };
void ?{}(A *);
void ?{}(A *, A);
void ?{}(A *, int, int);

A a1;             // default constructed
A a2 = { 0, 0 };  // constructed with 2 ints
A a3 = a1;        // copy constructed
// implicitly destruct a3, a2, a1, in that order
\end{cfacode}
Since constructors and destructors are just functions, the second way is to call the function directly.
\begin{cfacode}
struct A { int a; };
void ?{}(A *);
void ?{}(A *, A);
void ^?{}(A *);

A x;               // implicitly default constructed: ?{}(&x)
A * y = malloc();  // copy construct: ?{}(&y, malloc())

?{}(&x);    // explicit construct x, second construction
?{}(y, x);  // explicit construct y from x, second construction
^?{}(&x);   // explicit destroy x, in different order
^?{}(y);    // explicit destroy y

// implicit ^?{}(&y);
// implicit ^?{}(&x);
\end{cfacode}
Calling a constructor or destructor directly is a flexible feature that allows complete control over the management of storage.
In particular, constructors double as a placement syntax.
\begin{cfacode}
struct A { ... };
struct memory_pool { ... };
void ?{}(memory_pool *, size_t);

memory_pool pool = { 1024 };  // create an arena of size 1024

A * a = allocate(&pool);      // allocate from memory pool
?{}(a);                       // construct an A in place

for (int i = 0; i < 10; i++) {
  // reuse storage rather than reallocating
  ^?{}(a);
  ?{}(a);
  // use a ...
}
^?{}(a);
deallocate(&pool, a);         // return to memory pool
\end{cfacode}
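
The separation of allocation from construction can also be sketched in plain C.
The pool below is a hypothetical bump allocator written only for illustration (it is not the @memory_pool@ of the example above, whose definition is elided); @A_init@ and @A_destroy@ stand in for the constructor and destructor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct A { int value; };
void A_init(struct A *a) { a->value = 0; }      /* stands in for ?{}  */
void A_destroy(struct A *a) { a->value = -1; }  /* stands in for ^?{} */

/* Hypothetical bump allocator; individual blocks are never returned. */
struct memory_pool { unsigned char *base; size_t size; size_t used; };

/* Its size is used as a conservative alignment unit for every block. */
union pool_align { long long ll; long double ld; void *p; };

void pool_init(struct memory_pool *p, size_t size) {
  p->base = malloc(size);
  assert(p->base != NULL);
  p->size = size;
  p->used = 0;
}

void pool_destroy(struct memory_pool *p) {
  free(p->base);
  p->base = NULL;
  p->size = p->used = 0;
}

/* Returns raw, unconstructed storage, or NULL when the pool is exhausted. */
void *pool_allocate(struct memory_pool *p, size_t bytes) {
  size_t align = sizeof(union pool_align);
  size_t start = (p->used + align - 1) / align * align;
  if (start > p->size || bytes > p->size - start) return NULL;
  p->used = start + bytes;
  return p->base + start;
}
```

A caller obtains storage with @pool_allocate@ and then runs @A_init@ and @A_destroy@ on it as often as needed, mirroring the loop in the \CFA example.
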
Finally, constructors and destructors support \emph{operator syntax}.
Like other operators in \CFA, the function name mirrors the use-case, in that the first $N$ arguments fill in the place of the question mark.
This syntactic form is similar to the new initialization syntax in \CCeleven, except that it is used in expression contexts, rather than declaration contexts.
\begin{cfacode}
struct A { ... };
struct B { A a; };

A x, y, * z = &x;
(&x){};           // default construct
(&x){ y };        // copy construct
(&x){ 1, 2, 3 };  // construct with 3 arguments
z{ y };           // copy construct x through a pointer
^(&x){};          // destruct

void ?{}(B * b) {
  (&b->a){ 11, 17, 13 };  // construct a member
}
\end{cfacode}
Constructor operator syntax has relatively high precedence, requiring parentheses around an address-of expression.
Destructor operator syntax is actually a statement, and requires parentheses for symmetry with constructor syntax.

One of these three syntactic forms should appeal to either C or \CC programmers using \CFA.

\subsection{Constructor Expressions}
In \CFA, it is possible to use a constructor as an expression.
Like other operators, the function name @?{}@ matches its operator syntax.
For example, @(&x){}@ calls the default constructor on the variable @x@, and produces @&x@ as a result.
A key example for this capability is the use of constructor expressions to initialize the result of a call to standard C routine @malloc@.
\begin{cfacode}
struct X { ... };
void ?{}(X *, double);
X * x = malloc(sizeof(X)){ 1.5 };
\end{cfacode}
In this example, @malloc@ dynamically allocates storage and initializes it using a constructor, all before assigning it into the variable @x@.
If this extension is not present, constructing dynamically allocated objects is much more cumbersome, requiring separate initialization of the pointer and initialization of the pointed-to memory.
\begin{cfacode}
X * x = malloc(sizeof(X));
x{ 1.5 };
\end{cfacode}
Not only is this verbose, but it is also more error prone, since this form allows maintenance code to easily sneak in between the initialization of @x@ and the initialization of the memory that @x@ points to.
This feature is implemented via a transformation producing the value of the first argument of the constructor, since constructors do not themselves have a return value.
Since this transformation results in two instances of the subexpression, care is taken to allocate a temporary variable to hold the result of the subexpression in the case where the subexpression may contain side effects.
The previous example generates the following code.
\begin{cfacode}
struct X *_tmp_ctor;
struct X *x = (?{}(  // construct result of malloc
  _tmp_ctor=malloc(sizeof(struct X)),  // store result of malloc
  1.5
), _tmp_ctor);  // produce constructed result of malloc
\end{cfacode}
It should be noted that this technique is not exclusive to @malloc@, and allows a user to write a custom allocator that can be idiomatically used in much the same way as a constructed @malloc@ call.
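
The shape of this transformation can be reproduced in plain C with the comma operator.
The sketch below is an assumption about how such code could look when written by hand, not the translator's actual output; @X_init@ and @X_new@ are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct X { double scale; };

/* Stands in for the constructor ?{}(X *, double); it returns nothing. */
void X_init(struct X *x, double scale) {
  assert(x != NULL);
  x->scale = scale;
}

/* Allocate and construct in a single expression.
   The allocation is evaluated exactly once and stored in tmp, so the
   side-effecting subexpression is not duplicated. */
struct X *X_new(double scale) {
  struct X *tmp;
  return (tmp = malloc(sizeof(struct X)),  /* store result of malloc      */
          X_init(tmp, scale),              /* construct the stored result */
          tmp);                            /* produce constructed pointer */
}
```

Because the comma operator sequences its operands left to right, the pointer is fully constructed before the expression yields it, leaving no window in which unconstructed storage is visible to the caller.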

It is also possible to use operator syntax with destructors.
Unlike constructors, operator syntax with destructors is a statement and thus does not produce a value, since the destructed object is invalidated by the use of a destructor.
For example, \lstinline!^(&x){}! calls the destructor on the variable @x@.

\subsection{Function Generation}
In \CFA, every type is defined to have the core set of four special functions described previously.
Having these functions exist for every type greatly simplifies the semantics of the language, since most operations can simply be defined directly in terms of function calls.
In addition to simplifying the definition of the language, it also simplifies the analysis that the translator must perform.
If the translator can expect these functions to exist, then it can unconditionally attempt to resolve them.
Moreover, the existence of a standard interface allows polymorphic code to interoperate with new types seamlessly.

To mimic the behaviour of standard C, the default constructor and destructor for all of the basic types and for all pointer types are defined to do nothing, while the copy constructor and assignment operator perform a bitwise copy of the source parameter (as in \CC).

There are several options for user-defined types: structures, unions, and enumerations.
To aid in ease of use, the standard set of four functions is automatically generated for a user-defined type after its definition is completed.
By auto-generating these functions, it is ensured that legacy C code continues to work correctly in every context where \CFA expects these functions to exist, since they are generated for every complete type.

The generated functions for enumerations are the simplest.
Since enumerations in C are essentially just another integral type, the generated functions behave in the same way that the built-in functions for the basic types work.
For example, given the enumeration
\begin{cfacode}
enum Colour {
  R, G, B
};
\end{cfacode}
The following functions are automatically generated.
\begin{cfacode}
void ?{}(enum Colour *_dst){
  // default constructor does nothing
}
void ?{}(enum Colour *_dst, enum Colour _src){
  *_dst=_src;  // bitwise copy
}
void ^?{}(enum Colour *_dst){
  // destructor does nothing
}
enum Colour ?=?(enum Colour *_dst, enum Colour _src){
  return *_dst=_src;  // bitwise copy
}
\end{cfacode}
In the future, \CFA will introduce strongly-typed enumerations, like those in \CC.
The existing generated routines are sufficient to express this restriction, since they are currently set up to take in values of that enumeration type.
Changes related to this feature only need to affect the expression resolution phase, where more strict rules will be applied to prevent implicit conversions from integral types to enumeration types, but should continue to permit conversions from enumeration types to @int@.
In this way, it is still possible to add an @int@ to an enumeration, but the resulting value is an @int@, meaning it cannot be reassigned to an enumeration without a cast.

For structures, the situation is more complicated.
Given a structure @S@ with members @M$_0$@, @M$_1$@, ... @M$_{N-1}$@, each function @f@ in the standard set calls \lstinline{f(s->M$_i$, ...)} for each @$i$@.
That is, a default constructor for @S@ default constructs the members of @S@, the copy constructor copy constructs them, and so on.
For example, given the structure definition
\begin{cfacode}
struct A {
  B b;
  C c;
};
\end{cfacode}
The following functions are implicitly generated.
\begin{cfacode}
void ?{}(A * this) {
  ?{}(&this->b);  // default construct each field
  ?{}(&this->c);
}
void ?{}(A * this, A other) {
  ?{}(&this->b, other.b);  // copy construct each field
  ?{}(&this->c, other.c);
}
A ?=?(A * this, A other) {
  ?=?(&this->b, other.b);  // assign each field
  ?=?(&this->c, other.c);
  return *this;
}
void ^?{}(A * this) {
  ^?{}(&this->c);  // destruct each field
  ^?{}(&this->b);
}
\end{cfacode}
It is important to note that the destructors are called in reverse declaration order to prevent conflicts in the event there are dependencies among members.

In addition to the standard set, a set of \emph{field constructors} is also generated for structures.
The field constructors are constructors that consume a prefix of the structure's member-list.
That is, $N$ constructors are built of the form @void ?{}(S *, T$_{\text{M}_0}$)@, @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$)@, ..., @void ?{}(S *, T$_{\text{M}_0}$, T$_{\text{M}_1}$, ..., T$_{\text{M}_{N-1}}$)@, where members are copy constructed if they have a corresponding positional argument and are default constructed otherwise.
The addition of field constructors allows structures in \CFA to be used naturally in the same ways as used in C (i.e., to initialize any prefix of the structure), e.g., @A a0 = { b }, a1 = { b, c }@.
Extending the previous example, the following constructors are implicitly generated for @A@.
\begin{cfacode}
void ?{}(A * this, B b) {
  ?{}(&this->b, b);
  ?{}(&this->c);
}
void ?{}(A * this, B b, C c) {
  ?{}(&this->b, b);
  ?{}(&this->c, c);
}
\end{cfacode}
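
The prefix rule can be mimicked by hand in plain C, which makes the difference from C's own partial initialization visible.
In this sketch all names are hypothetical, and the default for @C@ is deliberately chosen to be non-zero: C zero-fills trailing members, whereas a field constructor default constructs them.

```c
#include <assert.h>
#include <stddef.h>

struct B { int v; };
struct C { int v; };
struct A { struct B b; struct C c; };

/* Hypothetical per-member "constructors"; C's default is intentionally non-zero. */
void B_init(struct B *b) { b->v = 0; }
void B_copy_init(struct B *b, struct B src) { b->v = src.v; }
void C_init(struct C *c) { c->v = 100; }
void C_copy_init(struct C *c, struct C src) { c->v = src.v; }

/* Field constructor consuming the prefix (b): c is default constructed. */
void A_init_b(struct A *a, struct B b) {
  assert(a != NULL);
  B_copy_init(&a->b, b);
  C_init(&a->c);
}

/* Field constructor consuming the prefix (b, c): both members are copied. */
void A_init_bc(struct A *a, struct B b, struct C c) {
  assert(a != NULL);
  B_copy_init(&a->b, b);
  C_copy_init(&a->c, c);
}
```

With these definitions, @A_init_b@ leaves @c.v@ at 100, while the C initializer @struct A a = { { 7 } }@ leaves it at 0.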

For unions, the default constructor and destructor do nothing, as it is not obvious which member, if any, should be constructed.
For copy constructor and assignment operations, a bitwise @memcpy@ is applied.
In standard C, a union can also be initialized using a value of the same type as its first member, and so a corresponding field constructor is generated to perform a bitwise @memcpy@ of the object.
An alternative to this design is to always construct and destruct the first member of a union, to match with the C semantics of initializing the first member of the union.
This approach ultimately feels subtle and unsafe.
Another option, as in \CC, is to disallow unions from containing members that are themselves managed types.
This restriction is a reasonable approach from a safety standpoint, but is not very C-like.
Since the primary purpose of a union is to provide low-level memory optimization, it is assumed that the user has a certain level of maturity.
It is therefore the responsibility of the user to define the special functions explicitly if they are appropriate, since it is impossible to accurately predict the ways that a union is intended to be used at compile-time.

For example, given the union
\begin{cfacode}
union X {
  Y y;
  Z z;
};
\end{cfacode}
The following functions are automatically generated.
\begin{cfacode}
void ?{}(union X *_dst){  // default constructor
}
void ?{}(union X *_dst, union X _src){  // copy constructor
  __builtin_memcpy(_dst, &_src, sizeof(union X));
}
void ^?{}(union X *_dst){  // destructor
}
union X ?=?(union X *_dst, union X _src){  // assignment
  __builtin_memcpy(_dst, &_src, sizeof(union X));
  return _src;
}
void ?{}(union X *_dst, struct Y src){  // construct first field
  __builtin_memcpy(_dst, &src, sizeof(struct Y));
}
\end{cfacode}
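
A plain C rendering shows why a bitwise copy is the only member-agnostic choice: the copy is correct regardless of which member is currently active.
The member types and function names in this sketch are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct Y { int tag; };
struct Z { double weight; };
union X { struct Y y; struct Z z; };

/* Copy every byte of the union without knowing which member is active. */
void X_copy_init(union X *dst, union X src) {
  assert(dst != NULL);
  memcpy(dst, &src, sizeof(union X));
}

/* Assignment performs the same bitwise copy and yields the source value. */
union X X_assign(union X *dst, union X src) {
  assert(dst != NULL);
  memcpy(dst, &src, sizeof(union X));
  return src;
}

/* Field constructor for the first member only, as in C initialization. */
void X_init_first(union X *dst, struct Y src) {
  assert(dst != NULL);
  memcpy(dst, &src, sizeof(struct Y));
}
```

None of these routines inspects the contents, so a member that owns a resource would be duplicated without its copy constructor running, which is the hazard that motivates leaving the decision to the user.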

% This feature works in the \CFA model, since constructors are simply special functions and can be called explicitly, unlike in \CC. % this sentence isn't really true => placement new
In \CCeleven, unions may have managed members, with the caveat that if there are any members with a user-defined operation, then that operation is not implicitly defined, forcing the user to define the operation if necessary.
This restriction could easily be added into \CFA once \emph{deleted} functions are added.

\subsection{Using Constructors and Destructors}
Implicitly generated constructor and destructor calls ignore the outermost type qualifiers, e.g., @const@ and @volatile@, on a type by way of a cast on the first argument to the function.
For example,
\begin{cfacode}
struct S { int i; };
void ?{}(S *, int);
void ?{}(S *, S);

const S s = { 11 };
volatile S s2 = s;
\end{cfacode}
generates the following code.
\begin{cfacode}
const struct S s;
?{}((struct S *)&s, 11);
volatile struct S s2;
?{}((struct S *)&s2, s);
\end{cfacode}
Here, @&s@ and @&s2@ are cast to unqualified pointer types.
This mechanism allows the same constructors and destructors to be used for qualified objects as for unqualified objects.
This applies only to implicitly generated constructor calls.
Hence, explicitly re-initializing qualified objects with a constructor requires an explicit cast.

As discussed in Section \ref{sub:c_background}, compound literals create unnamed objects.
This mechanism can continue to be used seamlessly in \CFA with managed types to create temporary objects.
The object created by a compound literal is constructed using the provided brace-enclosed initializer-list, and is destructed at the end of the scope it is used in.
For example,
\begin{cfacode}
struct A { int x, y; };
void ?{}(A *, int, int);
{
  int x = (A){ 10, 20 }.x;
}
\end{cfacode}
is equivalent to
\begin{cfacode}
struct A { int x, y; };
void ?{}(A *, int, int);
{
  A _tmp;
  ?{}(&_tmp, 10, 20);
  int x = _tmp.x;
  ^?{}(&_tmp);
}
\end{cfacode}
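
For reference, the underlying C behaviour that this mechanism extends can be sketched as follows.
The function names are hypothetical; the point is that in C a block-scope compound literal is an unnamed object whose lifetime lasts until the end of the enclosing block, which is where \CFA places the destructor call.

```c
struct A { int x, y; };

/* Read one member of an unnamed temporary. */
int first_of_pair(void) {
  int x = (struct A){ 10, 20 }.x;
  return x;
}

/* The unnamed object lives until the end of the enclosing block,
   so taking its address and using it within that block is valid C. */
int sum_via_pointer(void) {
  struct A *p = &(struct A){ 3, 4 };
  return p->x + p->y;
}
```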

Unlike \CC, \CFA provides an escape hatch that allows a user to decide at an object's definition whether it should be managed or not.
An object initialized with \ateq is guaranteed to be initialized like a C object, and has no implicit destructor call.
This feature provides all of the freedom that C programmers are used to having to optimize a program, while maintaining safety as a sensible default.
\begin{cfacode}
struct A { int * x; };
// RAII
void ?{}(A * a) { a->x = malloc(sizeof(int)); }
void ^?{}(A * a) { free(a->x); }

A a1;           // managed
A a2 @= { 0 };  // unmanaged
\end{cfacode}
In this example, @a1@ is a managed object, and thus is default constructed and destructed at the start/end of @a1@'s lifetime, while @a2@ is an unmanaged object and is not implicitly constructed or destructed.
Instead, @a2.x@ is initialized to @0@ as if it were a C object, because of the explicit initializer.

In addition to freedom, \ateq provides a simple path to migrating legacy C code to \CFA, in that objects can be moved from C-style initialization to \CFA gradually and individually.
It is worth noting that the use of unmanaged objects can be tricky to get right, since there is no guarantee that the proper invariants are established on an unmanaged object.
It is recommended that most objects be managed by sensible constructors and destructors, except where absolutely necessary.

When a user declares any constructor or destructor, the corresponding intrinsic/generated function and all field constructors for that type are hidden, so that they are not found during expression resolution until the user-defined function goes out of scope.
Furthermore, if the user declares any constructor, then the intrinsic/generated default constructor is also hidden, precluding default construction.
These semantics closely mirror the rule for implicit declaration of constructors in \CC, wherein the default constructor is implicitly declared if there is no user-declared constructor \cite[p.~186]{ANSI98:C++}.
\begin{cfacode}
struct S { int x, y; };

void f() {
  S s0, s1 = { 0 }, s2 = { 0, 2 }, s3 = s2;  // okay
  {
    void ?{}(S * s, int i) { s->x = i*2; }  // locally hide autogen constructors
    S s4;            // error
    S s5 = { 3 };    // okay
    S s6 = { 4, 5 }; // error
    S s7 = s5;       // okay
  }
  S s8, s9 = { 6 }, s10 = { 7, 8 }, s11 = s10;  // okay
}
\end{cfacode}
In this example, the inner scope declares a constructor from @int@ to @S@, which hides the default constructor and field constructors until the end of the scope.

When defining a constructor or destructor for a struct @S@, any members that are not explicitly constructed or destructed are implicitly constructed or destructed automatically.
If an explicit call is present, then that call is taken in preference to any implicitly generated call.
A consequence of this rule is that it is possible, unlike \CC, to precisely control the order of construction and destruction of sub-objects on a per-constructor basis, whereas in \CC sub-object initialization and destruction is always performed based on the declaration order.
For example,
\begin{cfacode}
struct A {
  B w, x, y, z;
};
void ?{}(A * a, int i) {
  (&a->x){ i };
  (&a->z){ a->y };
}
\end{cfacode}
generates the following code.
\begin{cfacode}
void ?{}(A * a, int i) {
  (&a->w){};   // implicit default ctor
  (&a->y){};   // implicit default ctor
  (&a->x){ i };
  (&a->z){ a->y };
}
\end{cfacode}
Finally, it is illegal for a sub-object to be explicitly constructed for the first time after it is used for the first time.
If the translator cannot be reasonably sure that an object is constructed prior to its first use, but is constructed afterward, an error is emitted.
More specifically, the translator searches the body of a constructor to ensure that every sub-object is initialized.
\begin{cfacode}
void ?{}(A * a, double x) {
  f(a->x);
  (&a->x){ (int)x }; // error, used uninitialized on previous line
}
\end{cfacode}
However, if the translator sees a sub-object used within the body of a constructor, but does not see a constructor call that uses the sub-object as the target of a constructor, then the translator assumes the object is to be implicitly constructed (copy constructed in a copy constructor and default constructed in any other constructor).
\begin{cfacode}
void ?{}(A * a) {
  // default constructs all members
  f(a->x);
}

void ?{}(A * a, A other) {
  // copy constructs all members
  f(a->y);
}

void ^?{}(A * a) {
  ^(&a->x){}; // explicit destructor call
} // z, y, w implicitly destructed, in this order
\end{cfacode}
If at any point, the @this@ parameter is passed directly as the target of another constructor, then it is assumed that constructor handles the initialization of all of the object's members and no implicit constructor calls are added.
To override this rule, \ateq can be used to force the translator to trust the programmer's discretion.
This form of \ateq is not yet implemented.

565Despite great effort, some forms of C syntax do not work well with constructors in \CFA.
566In particular, constructor calls cannot contain designations (see \ref{sub:c_background}), since this is equivalent to allowing designations on the arguments to arbitrary function calls.
567\begin{cfacode}
568// all legal forward declarations in C
569void f(int, int, int);
570void f(int a, int b, int c);
571void f(int b, int c, int a);
572void f(int c, int a, int b);
573void f(int x, int y, int z);
574
575f(b:10, a:20, c:30); // which parameter is which?
576\end{cfacode}
[7493339]577In C, function prototypes are permitted to have arbitrary parameter names, including no names at all, which may have no connection to the actual names used at function definition.
578Furthermore, a function prototype can be repeated an arbitrary number of times, each time using different names.
[9c14ae9]579As a result, it was decided that any attempt to resolve designated function calls with C's function prototype rules would be brittle, and thus it is not sensible to allow designations in constructor calls.
580
[7493339]581In addition, constructor calls do not support unnamed nesting.
582\begin{cfacode}
583struct B { int x; };
584struct C { int y; };
585struct A { B b; C c; };
586void ?{}(A *, B);
587void ?{}(A *, C);
588
589A a = {
590 { 10 }, // construct B? - invalid
591};
592\end{cfacode}
[f92aa32]593In C, nesting initializers means that the programmer intends to initialize sub-objects with the nested initializers.
[7493339]594The reason for this omission is to both simplify the mental model for using constructors, and to make initialization simpler for the expression resolver.
If this were allowed, it would be necessary for the expression resolver to decide whether each nested argument to the constructor call could initialize some parameter of one of the available constructors, making the problem highly recursive and potentially much more expensive.
596That is, in the previous example the line marked as an error could mean construct using @?{}(A *, B)@ or with @?{}(A *, C)@, since the inner initializer @{ 10 }@ could be taken as an intermediate object of type @B@ or @C@.
597In practice, however, there could be many objects that can be constructed from a given @int@ (or, indeed, any arbitrary parameter list), and thus a complete solution to this problem would require fully exploring all possibilities.
598
More precisely, constructor calls cannot have a nesting depth greater than the number of array dimensions in the type of the initialized object, plus one.
[9c14ae9]600For example,
601\begin{cfacode}
602struct A;
603void ?{}(A *, int);
604void ?{}(A *, A, A);
605
A a1[3] = { { 3 }, { 4 }, { 5 } };
A a2[2][2] = {
  { { 9 }, { 10 } },  // a2[0]
  { { 14 }, { 15 } }  // a2[1]
};
A a3[4] = {
  { { 11 }, { 12 } },  // error
  { 80 }, { 90 }, { 100 }
};
615\end{cfacode}
616The body of @A@ has been omitted, since only the constructor interfaces are important.
[7493339]617
[9c14ae9]618It should be noted that unmanaged objects can still make use of designations and nested initializers in \CFA.
[7493339]619It is simple to overcome this limitation for managed objects by making use of compound literals, so that the arguments to the constructor call are explicitly typed.
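
A rough analogue of this workaround can be written in C11, where @_Generic@ plays the role of overload resolution.
In the following sketch, the routines @A_ctor_B@ and @A_ctor_C@, the macro @A_ctor@, and the field @built_from@ are hypothetical; the point is only that a compound literal gives the argument an explicit type, so the choice between the two constructors is no longer ambiguous.
\begin{cfacode}
struct B { int x; };
struct C { int y; };
struct A { struct B b; struct C c; int built_from; };  /* 1 = from B, 2 = from C */

/* hypothetical stand-ins for ?{}(A *, B) and ?{}(A *, C) */
static void A_ctor_B(struct A *a, struct B b) { a->b = b; a->c.y = 0; a->built_from = 1; }
static void A_ctor_C(struct A *a, struct C c) { a->b.x = 0; a->c = c; a->built_from = 2; }

/* selects a constructor from the static type of the argument */
#define A_ctor(a, arg) \
  _Generic((arg), struct B: A_ctor_B, struct C: A_ctor_C)((a), (arg))

int built_from_B(void) {
  struct A a;
  A_ctor(&a, (struct B){ 10 });  /* typed argument selects A_ctor_B */
  return a.built_from == 1 && a.b.x == 10;
}

int built_from_C(void) {
  struct A a;
  A_ctor(&a, (struct C){ 10 });  /* same value, different type, other constructor */
  return a.built_from == 2 && a.c.y == 10;
}
\end{cfacode}
An untyped @{ 10 }@ carries no such information, which is why the nested form is rejected for managed objects.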
[9c14ae9]620
621\subsection{Implicit Destructors}
622\label{sub:implicit_dtor}
623Destructors are automatically called at the end of the block in which the object is declared.
624In addition to this, destructors are automatically called when statements manipulate control flow to leave a block in which the object is declared, e.g., with return, break, continue, and goto statements.
625The example below demonstrates a simple routine with multiple return statements.
626\begin{cfacode}
627struct A;
628void ^?{}(A *);
629
630void f(int i) {
631 A x; // construct x
632 {
633 A y; // construct y
634 {
635 A z; // construct z
636 {
637 if (i == 0) return; // destruct x, y, z
638 }
639 if (i == 1) return; // destruct x, y, z
640 } // destruct z
641 if (i == 2) return; // destruct x, y
642 } // destruct y
[f92aa32]643} // destruct x
[9c14ae9]644\end{cfacode}
645
The next example illustrates simple continue and break statements and how they interact with implicit destructors.
647\begin{cfacode}
648for (int i = 0; i < 10; i++) {
649 A x;
650 if (i == 2) {
651 continue; // destruct x
652 } else if (i == 3) {
653 break; // destruct x
654 }
655} // destruct x
656\end{cfacode}
657Since a destructor call is automatically inserted at the end of the block, nothing special needs to happen to destruct @x@ in the case where control reaches the end of the loop.
[7493339]658In the case where @i@ is @2@, the continue statement runs the loop update expression and attempts to begin the next iteration of the loop.
[f92aa32]659Since continue is a C statement, which does not understand destructors, it is transformed into a @goto@ statement that branches to the end of the loop, just before the block's destructors, to ensure that @x@ is destructed.
[9c14ae9]660When @i@ is @3@, the break statement moves control to just past the end of the loop.
[f92aa32]661Unlike the previous case, the destructor for @x@ cannot be reused, so a destructor call for @x@ is inserted just before the break statement.
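
The translation of the loop above can be modelled in plain C, with the destructor calls written out by hand.
This is a sketch rather than actual translator output: @A_ctor@, @A_dtor@, the label @end_of_body@, and the counters are hypothetical names used to make the calls observable.
\begin{cfacode}
struct A { int id; };

static int live;  /* objects constructed but not yet destructed */

static void A_ctor(struct A *a, int id, int *ctors) { a->id = id; ++*ctors; ++live; }
static void A_dtor(struct A *a, int *dtors) { a->id = -1; ++*dtors; --live; }

/* runs the modelled loop; reports call counts and returns the number of leaked objects */
int run_loop(int *ctors, int *dtors) {
  *ctors = *dtors = live = 0;
  for (int i = 0; i < 10; i++) {
    struct A x;
    A_ctor(&x, i, ctors);
    if (i == 2) {
      goto end_of_body;   /* continue: falls into the block-end destructor */
    } else if (i == 3) {
      A_dtor(&x, dtors);  /* break: destructor inserted before the jump */
      break;
    }
  end_of_body: ;
    A_dtor(&x, dtors);    /* destructor at the end of the loop body */
  }
  return live;
}
\end{cfacode}
In this model every constructed @x@ is destructed exactly once, whichever of the three exits from the loop body is taken.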
[9c14ae9]662
[f92aa32]663\CFA also supports labeled break and continue statements, which allow more precise manipulation of control flow.
664Labeled break and continue allow the programmer to specify which control structure to target by using a label attached to a control structure.
[9c14ae9]665\begin{cfacode}[emph={L1,L2}, emphstyle=\color{red}]
L1: for (int i = 0; i < 10; i++) {
  A x;
  for (int j = 0; j < 10; j++) {
    A y;
    if (i == 1) {
      continue L1; // destruct y
    } else if (i == 2) {
      break L1;    // destruct x,y
    }
  } // destruct y
} // destruct x
677\end{cfacode}
678The statement @continue L1@ begins the next iteration of the outer for-loop.
[f92aa32]679Since the semantics of continue require the loop update expression to execute, control branches to the end of the outer for loop, meaning that the block destructor for @x@ can be reused, and it is only necessary to generate the destructor for @y@.
680Break, on the other hand, requires jumping out of both loops, so the destructors for both @x@ and @y@ are generated and inserted before the @break L1@ statement.
[9c14ae9]681
The final example demonstrates goto.
683Since goto is a general mechanism for jumping to different locations in the program, a more comprehensive approach is required.
For each goto statement $G$ and its target label $L$, let $S_G$ be the set of all managed variables alive at $G$, and let $S_L$ be the set of all managed variables alive at $L$.
If at any $G$, $S_L \setminus S_G \neq \emptyset$, then the translator emits an error, because control flow branches from a point where an object is not yet live to a point where it is live, skipping the object's constructor.
Then, for every $G$, the destructors for each variable in the set $S_G \setminus S_L$ are inserted directly before $G$, which ensures each object that is live at $G$, but not at $L$, is destructed before control branches.
687\begin{cfacode}
688int i = 0;
689{
  L0: ;     // S_L0 = {}
  A y;
  L1: ;     // S_L1 = { y }
693 A x;
694 L2: ; // S_L2 = { y, x }
695 if (i == 0) {
696 ++i;
697 goto L1; // S_G = { y, x }
698 // S_G-S_L1 = { x } => destruct x
699 } else if (i == 1) {
700 ++i;
701 goto L2; // S_G = { y, x }
702 // S_G-S_L2 = {} => destruct nothing
703 } else if (i == 2) {
704 ++i;
705 goto L3; // S_G = { y, x }
706 // S_G-S_L3 = {}
707 } else if (false) {
708 ++i;
709 A z;
710 goto L3; // S_G = { z, y, x }
711 // S_G-S_L3 = { z } => destruct z
712 } else {
713 ++i;
714 goto L4; // S_G = { y, x }
715 // S_G-S_L4 = { y, x } => destruct y, x
716 }
717 L3: ; // S_L3 = { y, x }
718 goto L2; // S_G = { y, x }
719 // S_G-S_L2 = {}
720}
721L4: ; // S_L4 = {}
722if (i == 4) {
723 goto L0; // S_G = {}
724 // S_G-S_L0 = {}
725}
726\end{cfacode}
[f92aa32]727All break and continue statements are implemented in \CFA in terms of goto statements, so the more constrained forms are precisely governed by these rules.
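
The two set differences are simple to compute once each live set is represented as a bit mask.
The following C sketch assumes a hypothetical encoding with one bit per managed variable (@VAR_Y@, @VAR_X@, @VAR_Z@); the routine names @to_destruct@ and @skips_constructor@ are likewise illustrative and do not correspond to routines in the translator.
\begin{cfacode}
/* one bit per managed variable of the goto example (hypothetical encoding) */
enum { VAR_Y = 1 << 0, VAR_X = 1 << 1, VAR_Z = 1 << 2 };

/* S_G \ S_L: live at the goto but not at the label, so destructed before the jump */
unsigned to_destruct(unsigned s_g, unsigned s_l) {
  return s_g & ~s_l;
}

/* S_L \ S_G non-empty: the jump would skip a constructor, so it is an error */
int skips_constructor(unsigned s_g, unsigned s_l) {
  return (s_l & ~s_g) != 0;
}
\end{cfacode}
For instance, with $S_G = \{y, x\}$ and a label at which only @y@ is live, @to_destruct@ yields only the bit for @x@, while a jump from an empty live set to a label at which @y@ is live is flagged by @skips_constructor@.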
[9c14ae9]728
729The next example demonstrates the error case.
730\begin{cfacode}
731{
732 goto L1; // S_G = {}
733 // S_L1-S_G = { y } => error
734 A y;
735 L1: ; // S_L1 = { y }
736 A x;
737 L2: ; // S_L2 = { y, x }
738}
739goto L2; // S_G = {}
740// S_L2-S_G = { y, x } => error
741\end{cfacode}
742
743\subsection{Implicit Copy Construction}
[f92aa32]744\label{s:implicit_copy_construction}
[9c14ae9]745When a function is called, the arguments supplied to the call are subject to implicit copy construction (and destruction of the generated temporary), and the return value is subject to destruction.
746When a value is returned from a function, the copy constructor is called to pass the value back to the call site.
[f92aa32]747Exempt from these rules are intrinsic and built-in functions.
[9c14ae9]748It should be noted that unmanaged objects are subject to copy constructor calls when passed as arguments to a function or when returned from a function, since they are not the \emph{target} of the copy constructor call.
That is, the parameter itself is not marked as unmanaged with \ateq, so it is copy constructed if it is returned by value or passed as an argument to another function; to guarantee consistent behaviour, unmanaged objects must therefore also be copy constructed when passed as arguments.
[9c14ae9]750This is an important detail to bear in mind when using unmanaged objects, and could produce unexpected results when mixed with objects that are explicitly constructed.
751\begin{cfacode}
752struct A;
753void ?{}(A *);
754void ?{}(A *, A);
755void ^?{}(A *);
756
[7493339]757A identity(A x) { // pass by value => need local copy
758 return x; // return by value => make call-site copy
[9c14ae9]759}
760
761A y, z @= {};
[7493339]762identity(y); // copy construct y into x
763identity(z); // copy construct z into x
[9c14ae9]764\end{cfacode}
765Note that @z@ is copy constructed into a temporary variable to be passed as an argument, which is also destructed after the call.
766
This generates the following, where the routine @identity@ appears under the name @f@.
768\begin{cfacode}
769struct A f(struct A x){
[7493339]770 struct A _retval_f; // return value
771 ?{}((&_retval_f), x); // copy construct return value
[9c14ae9]772 return _retval_f;
773}
774
775struct A y;
[7493339]776?{}(&y); // default construct
777struct A z = { 0 }; // C default
778
779struct A _tmp_cp1; // argument 1
780struct A _tmp_cp_ret0; // return value
781_tmp_cp_ret0=f(
782 (?{}(&_tmp_cp1, y) , _tmp_cp1) // argument is a comma expression
783), _tmp_cp_ret0; // return value for cascading
784^?{}(&_tmp_cp_ret0); // destruct return value
785^?{}(&_tmp_cp1); // destruct argument 1
786
787struct A _tmp_cp2; // argument 1
788struct A _tmp_cp_ret1; // return value
789_tmp_cp_ret1=f(
  (?{}(&_tmp_cp2, z), _tmp_cp2)  // argument is a comma expression
791), _tmp_cp_ret1; // return value for cascading
792^?{}(&_tmp_cp_ret1); // destruct return value
793^?{}(&_tmp_cp2); // destruct argument 1
[9c14ae9]794^?{}(&y);
795\end{cfacode}
796
[7493339]797A special syntactic form, such as a variant of \ateq, can be implemented to specify at the call site that an argument should not be copy constructed, to regain some control for the C programmer.
798\begin{cfacode}
799identity(z@); // do not copy construct argument
800 // - will copy construct/destruct return value
801A@ identity_nocopy(A @ x) { // argument not copy constructed or destructed
802 return x; // not copy constructed
803 // return type marked @ => not destructed
804}
805\end{cfacode}
It should be noted that reference types will allow specifying that a value does not need to be copied; however, reference types do not provide a means of preventing implicit copy construction from uses of the reference, so the problem is still present when passing or returning the reference by value.
807
[f92aa32]808A known issue with this implementation is that the argument and return value temporaries are not guaranteed to have the same address for their entire lifetimes.
809In the previous example, since @_retval_f@ is allocated and constructed in @f@, then returned by value, the internal data is bitwise copied into the caller's stack frame.
[9c14ae9]810This approach works out most of the time, because typically destructors need to only access the fields of the object and recursively destroy.
[7493339]811It is currently the case that constructors and destructors that use the @this@ pointer as a unique identifier to store data externally do not work correctly for return value objects.
[f92aa32]812Thus, it is currently not safe to rely on an object's @this@ pointer to remain constant throughout execution of the program.
[9c14ae9]813\begin{cfacode}
struct A;
A * external_data[32];
int ext_count;
void ?{}(A * a) {
  // ...
  external_data[ext_count++] = a;
}
void ^?{}(A * a) {
  for (int i = 0; i < ext_count; i++) {
    if (a == external_data[i]) { // may never be true
      // ...
    }
  }
}
[7493339]828
829A makeA() {
830 A x; // stores &x in external_data
831 return x;
832}
833makeA(); // return temporary has a different address than x
834// equivalent to:
835// A _tmp;
836// _tmp = makeA(), _tmp;
837// ^?{}(&_tmp);
[9c14ae9]838\end{cfacode}
839In the above example, a global array of pointers is used to keep track of all of the allocated @A@ objects.
[f92aa32]840Due to copying on return, the current object being destructed does not exist in the array if an @A@ object is ever returned by value from a function, such as in @makeA@.
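
The change of address can be observed directly in C.
The sketch below is a model under stated assumptions, not translator output: @A_ctor@, @recorded@, and @address_changes_on_return@ are hypothetical names, and the address is stored as a @uintptr_t@ so that it can still be compared after the local object has gone out of scope.
\begin{cfacode}
#include <stdint.h>

struct A { int v; };

static uintptr_t recorded;  /* address seen by the constructor, kept as an integer */

/* hypothetical stand-in for ?{}(A *), which remembers the object's address */
static void A_ctor(struct A *a) { a->v = 0; recorded = (uintptr_t)a; }

static struct A makeA(void) {
  struct A x;
  A_ctor(&x);  /* records the address of the local x */
  return x;    /* contents are copied back to the caller */
}

/* returns 1 if the caller's temporary is at a different address than x was */
int address_changes_on_return(void) {
  struct A tmp;
  tmp = makeA();
  return recorded != (uintptr_t)&tmp;
}
\end{cfacode}
Because @tmp@ and @x@ are distinct objects that exist at the same time, their addresses differ, so a destructor that looks up @tmp@ by address would not find the entry recorded for @x@.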
[9c14ae9]841
[7493339]842This problem could be solved in the translator by changing the function signatures so that the return value is moved into the parameter list.
[9c14ae9]843For example, the translator could restructure the code like so
844\begin{cfacode}
845void f(struct A x, struct A * _retval_f){
846 ?{}(_retval_f, x); // construct directly into caller's stack frame
847}
848
849struct A y;
850?{}(&y);
851struct A z = { 0 };
852
853struct A _tmp_cp1; // argument 1
854struct A _tmp_cp_ret0; // return value
855f((?{}(&_tmp_cp1, y) , _tmp_cp1), &_tmp_cp_ret0), _tmp_cp_ret0;
856^?{}(&_tmp_cp_ret0); // return value
857^?{}(&_tmp_cp1); // argument 1
858\end{cfacode}
859This transformation provides @f@ with the address of the return variable so that it can be constructed into directly.
[7493339]860It is worth pointing out that this kind of signature rewriting already occurs in polymorphic functions that return by value, as discussed in \cite{Bilson03}.
[9c14ae9]861A key difference in this case is that every function would need to be rewritten like this, since types can switch between managed and unmanaged at different scope levels, e.g.
862\begin{cfacode}
863struct A { int v; };
[7493339]864A x; // unmanaged, since only trivial constructors are available
[9c14ae9]865{
866 void ?{}(A * a) { ... }
867 void ^?{}(A * a) { ... }
868 A y; // managed
869}
870A z; // unmanaged
871\end{cfacode}
[7493339]872Hence there is not enough information to determine at function declaration whether a type is managed or not, and thus it is the case that all signatures have to be rewritten to account for possible copy constructor and destructor calls.
Even with this change, it would still be possible to declare backwards compatible function prototypes with an @extern "C"@ block, which allows for the definition of C-compatible functions within \CFA code; however, this would require changes to the way code inside of an @extern "C"@ function is generated as compared with normal code generation.
[7493339]874Furthermore, it is not possible to overload C functions, so using @extern "C"@ to declare functions is of limited use.
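
Returning to the rewritten signature itself, its effect on object identity can be modelled in C.
In the following sketch, which assumes the return value is passed as an explicit out-parameter, the names @A_ctor@, @recorded@, @makeA@, and @address_is_stable@ are hypothetical and serve only to compare the address seen by the constructor with the address of the caller's temporary.
\begin{cfacode}
#include <stdint.h>

struct A { int v; };

static uintptr_t recorded;  /* address seen by the constructor, kept as an integer */

/* hypothetical stand-in for ?{}(A *), which remembers the object's address */
static void A_ctor(struct A *a) { a->v = 0; recorded = (uintptr_t)a; }

/* rewritten form: the caller supplies the storage for the return value */
static void makeA(struct A *retval) {
  A_ctor(retval);  /* constructs directly into the caller's stack frame */
}

/* returns 1 if the constructed object and the caller's temporary coincide */
int address_is_stable(void) {
  struct A tmp;
  makeA(&tmp);
  return recorded == (uintptr_t)&tmp;
}
\end{cfacode}
With the return slot passed in, the object is never copied, so an address recorded at construction remains valid when the temporary is later destructed.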
[9c14ae9]875
[7493339]876It would be possible to regain some control by adding an attribute to structs that specifies whether they can be managed or not (perhaps \emph{manageable} or \emph{unmanageable}), and to emit an error in the case that a constructor or destructor is declared for an unmanageable type.
[9c14ae9]877Ideally, structs should be manageable by default, since otherwise the default case becomes more verbose.
878This means that in general, function signatures would have to be rewritten, and in a select few cases the signatures would not be rewritten.
879\begin{cfacode}
880__attribute__((manageable)) struct A { ... }; // can declare constructors
881__attribute__((unmanageable)) struct B { ... }; // cannot declare constructors
882struct C { ... }; // can declare constructors
883
884A f(); // rewritten void f(A *);
885B g(); // not rewritten
886C h(); // rewritten void h(C *);
887\end{cfacode}
888An alternative is to instead make the attribute \emph{identifiable}, which states that objects of this type use the @this@ parameter as an identity.
This approach targets the visible problem more directly, in that only types marked as identifiable would need to have the return value moved into the parameter list, and every other type could remain the same.
[9c14ae9]890Furthermore, no restrictions would need to be placed on whether objects can be constructed.
891\begin{cfacode}
892__attribute__((identifiable)) struct A { ... }; // can declare constructors
893struct B { ... }; // can declare constructors
894
895A f(); // rewritten void f(A *);
896B g(); // not rewritten
897\end{cfacode}
898
[f92aa32]899Ultimately, both of these are patchwork solutions.
900Since a real compiler has full control over its calling conventions, it can seamlessly allow passing the return parameter without outwardly changing the signature of a routine.
901As such, it has been decided that this issue is not currently a priority and will be fixed when a full \CFA compiler is implemented.
[9c14ae9]902
903\section{Implementation}
904\subsection{Array Initialization}
[7493339]905Arrays are a special case in the C type-system.
[9c14ae9]906C arrays do not carry around their size, making it impossible to write a standalone \CFA function that constructs or destructs an array while maintaining the standard interface for constructors and destructors.
907Instead, \CFA defines the initialization and destruction of an array recursively.
908That is, when an array is defined, each of its elements is constructed in order from element 0 up to element $n-1$.
909When an array is to be implicitly destructed, each of its elements is destructed in reverse order from element $n-1$ down to element 0.
910As in C, it is possible to explicitly provide different initializers for each element of the array through array initialization syntax.
911In this case, each of the initializers is taken in turn to construct a subsequent element of the array.
If too many initializers are provided, only the first $n$ initializers are actually used.
913If too few initializers are provided, then the remaining elements are default constructed.
914
915For example, given the following code.
916\begin{cfacode}
917struct X {
918 int x, y, z;
919};
920void f() {
921 X x[10] = { { 1, 2, 3 }, { 4 }, { 7, 8 } };
922}
923\end{cfacode}
924The following code is generated for @f@.
925\begin{cfacode}
926void f(){
927 struct X x[((long unsigned int )10)];
928 // construct x
929 {
930 int _index0 = 0;
931 // construct with explicit initializers
932 {
933 if (_index0<10) ?{}(&x[_index0], 1, 2, 3);
934 ++_index0;
935 if (_index0<10) ?{}(&x[_index0], 4);
936 ++_index0;
937 if (_index0<10) ?{}(&x[_index0], 7, 8);
938 ++_index0;
939 }
940
941 // default construct remaining elements
942 for (;_index0<10;++_index0) {
943 ?{}(&x[_index0]);
944 }
945 }
946 // destruct x
947 {
948 int _index1 = 10-1;
949 for (;_index1>=0;--_index1) {
950 ^?{}(&x[_index1]);
951 }
952 }
953}
954\end{cfacode}
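The guards on each explicit initializer are what implement the rule for surplus initializers.
The following C model parameterizes the array length to make this visible; the routines @X_ctor@, @X_dtor@, @construct_array@, and @destruct_array@ are hypothetical, and zero-filling the unspecified fields is an assumption of the sketch rather than a statement about the generated constructors.
\begin{cfacode}
struct X { int x, y, z; };

/* hypothetical stand-ins; unspecified fields are zeroed by assumption */
static void X_ctor(struct X *p, int x, int y, int z) { p->x = x; p->y = y; p->z = z; }
static void X_dtor(struct X *p) { p->x = p->y = p->z = 0; }

/* Constructs n elements from up to three explicit initializers, then defaults.
   Returns the number of explicit initializers actually used. */
int construct_array(struct X *arr, int n) {
  int index = 0, used = 0;
  if (index < n) { X_ctor(&arr[index], 1, 2, 3); ++used; }
  ++index;
  if (index < n) { X_ctor(&arr[index], 4, 0, 0); ++used; }
  ++index;
  if (index < n) { X_ctor(&arr[index], 7, 8, 0); ++used; }
  ++index;
  for (; index < n; ++index) {
    X_ctor(&arr[index], 0, 0, 0);  /* default construct remaining elements */
  }
  return used;
}

/* Destructs in reverse order; order[k] receives the k-th index destructed. */
void destruct_array(struct X *arr, int n, int *order) {
  int k = 0;
  for (int index = n - 1; index >= 0; --index) {
    X_dtor(&arr[index]);
    order[k++] = index;
  }
}
\end{cfacode}
With a length of 2 only the first two initializers are applied, whereas with a length of 5 all three are applied and the last two elements are default constructed.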
955Multidimensional arrays require more complexity.
956For example, a two dimensional array
957\begin{cfacode}
958void g() {
959 X x[10][10] = {
960 { { 1, 2, 3 }, { 4 } }, // x[0]
961 { { 7, 8 } } // x[1]
962 };
}
\end{cfacode}
964Generates the following
965\begin{cfacode}
966void g(){
967 struct X x[10][10];
968 // construct x
969 {
970 int _index0 = 0;
971 for (;_index0<10;++_index0) {
972 {
973 int _index1 = 0;
974 // construct with explicit initializers
975 {
976 switch ( _index0 ) {
977 case 0:
978 // construct first array
979 if ( _index1<10 ) ?{}(&x[_index0][_index1], 1, 2, 3);
980 ++_index1;
981 if ( _index1<10 ) ?{}(&x[_index0][_index1], 4);
982 ++_index1;
983 break;
984 case 1:
985 // construct second array
986 if ( _index1<10 ) ?{}(&x[_index0][_index1], 7, 8);
987 ++_index1;
988 break;
989 }
990 }
991 // default construct remaining elements
992 for (;_index1<10;++_index1) {
993 ?{}(&x[_index0][_index1]);
994 }
995 }
996 }
997 }
998 // destruct x
999 {
1000 int _index2 = 10-1;
1001 for (;_index2>=0;--_index2) {
1002 {
1003 int _index3 = 10-1;
1004 for (;_index3>=0;--_index3) {
1005 ^?{}(&x[_index2][_index3]);
1006 }
1007 }
1008 }
1009 }
1010}
1011\end{cfacode}
1012% It is possible to generate slightly simpler code for the switch cases, since the value of @_index1@ is known at compile-time within each case, however the procedure for generating constructor calls is complicated.
1013% It is simple to remove the increment statements for @_index1@, but it is not simple to remove the
1014%% technically, it's not hard either. I could easily downcast and change the second argument to ?[?], but is it really necessary/worth it??
1015
1016\subsection{Global Initialization}
1017In standard C, global variables can only be initialized to compile-time constant expressions.
1018This places strict limitations on the programmer's ability to control the default values of objects.
1019In \CFA, constructors and destructors are guaranteed to be run on global objects, allowing arbitrary code to be run before and after the execution of the main routine.
1020By default, objects within a translation unit are constructed in declaration order, and destructed in the reverse order.
1021The default order of construction of objects amongst translation units is unspecified.
1022It is, however, guaranteed that any global objects in the standard library are initialized prior to the initialization of any object in the user program.
1023
[f92aa32]1024This feature is implemented in the \CFA translator by grouping every global constructor call into a function with the GCC attribute \emph{constructor}, which performs most of the heavy lifting \cite[6.31.1]{GCCExtensions}.
[9c14ae9]1025A similar function is generated with the \emph{destructor} attribute, which handles all global destructor calls.
1026At the time of writing, initialization routines in the library are specified with priority \emph{101}, which is the highest priority level that GCC allows, whereas initialization routines in the user's code are implicitly given the default priority level, which ensures they have a lower priority than any code with a specified priority level.
[f92aa32]1027This mechanism allows arbitrarily complicated initialization to occur before any user code runs, making it possible for library designers to initialize their modules without requiring the user to call specific startup or tear-down routines.
[9c14ae9]1028
1029For example, given the following global declarations.
1030\begin{cfacode}
1031struct X {
1032 int y, z;
1033};
1034void ?{}(X *);
1035void ?{}(X *, int, int);
1036void ^?{}(X *);
1037
1038X a;
1039X b = { 10, 3 };
1040\end{cfacode}
1041The following code is generated.
1042\begin{cfacode}
1043__attribute__ ((constructor)) static void _init_global_ctor(void){
1044 ?{}(&a);
1045 ?{}(&b, 10, 3);
1046}
1047__attribute__ ((destructor)) static void _destroy_global_ctor(void){
1048 ^?{}(&b);
1049 ^?{}(&a);
1050}
1051\end{cfacode}
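
The interaction between prioritized and unprioritized initialization routines can be tried out in isolation.
The following sketch relies on the GCC @constructor@ attribute (also accepted by Clang) and is not taken from the \CFA library: @user_init@, @library_init@, @init_trace@, and @initialization_order@ are hypothetical names.
\begin{cfacode}
static char init_trace[8];  /* zero-initialized before any constructor runs */
static int init_len;

/* declared first, but has no explicit priority (models user code) */
__attribute__((constructor)) static void user_init(void) {
  init_trace[init_len++] = 'U';
}

/* priority 101, as used for initialization routines in the library */
__attribute__((constructor(101))) static void library_init(void) {
  init_trace[init_len++] = 'L';
}

/* reports the order in which the two routines ran before main */
const char *initialization_order(void) {
  return init_trace;
}
\end{cfacode}
With GCC on an ELF platform, the routine with priority 101 is expected to run before the unprioritized one regardless of declaration order, so the trace reads @LU@.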
1052
[7493339]1053% https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html#C_002b_002b-Attributes
1054% suggestion: implement this in CFA by picking objects with a specified priority and pulling them into their own init functions (could even group them by priority level -> map<int, list<ObjectDecl*>>) and pull init_priority forward into constructor and destructor attributes with the same priority level
[f92aa32]1055GCC provides an attribute @init_priority@, which allows specifying the relative priority for initialization of global objects on a per-object basis in \CC.
[7493339]1056A similar attribute can be implemented in \CFA by pulling marked objects into global constructor/destructor-attribute functions with the specified priority.
1057For example,
1058\begin{cfacode}
1059struct A { ... };
1060void ?{}(A *, int);
1061void ^?{}(A *);
1062__attribute__((init_priority(200))) A x = { 123 };
1063\end{cfacode}
1064would generate
1065\begin{cfacode}
A x;
__attribute__((constructor(200))) void __init_x() {
  ?{}(&x, 123); // construct x with priority 200
}
__attribute__((destructor(200))) void __destroy_x() {
  ^?{}(&x); // destruct x with priority 200
}
1073\end{cfacode}
1074
[9c14ae9]1075\subsection{Static Local Variables}
1076In standard C, it is possible to mark variables that are local to a function with the @static@ storage class.
[f92aa32]1077Unlike normal local variables, a @static@ local variable is defined to live for the entire duration of the program, so that each call to the function has access to the same variable with the same address and value as it had in the previous call to the function.
[7493339]1078Much like global variables, in C @static@ variables can only be initialized to a \emph{compile-time constant value} so that a compiler is able to create storage for the variable and initialize it at compile-time.
[9c14ae9]1079
1080Yet again, this rule is too restrictive for a language with constructors and destructors.
Instead, \CFA modifies the definition of a @static@ local variable so that objects are guaranteed to be live from the time control flow reaches their declaration until the end of the program.
This change is necessary because the initializer expression is not necessarily a compile-time constant, but can depend on the current execution state of the function.
1082Since standard C does not allow access to a @static@ local variable before the first time control flow reaches the declaration, this restriction does not preclude any valid C code.
1083Local objects with @static@ storage class are only implicitly constructed and destructed once for the duration of the program.
1084The object is constructed when its declaration is reached for the first time.
1085The object is destructed once at the end of the program.
1086
1087Construction of @static@ local objects is implemented via an accompanying @static bool@ variable, which records whether the variable has already been constructed.
1088A conditional branch checks the value of the companion @bool@, and if the variable has not yet been constructed then the object is constructed.
The object's destructor is scheduled to be run when the program terminates using @atexit@\footnote{When using the dynamic linker, it is possible to dynamically load and unload a shared library. Since glibc 2.2.3 \cite{atexit}, functions registered with @atexit@ within the shared library are called when unloading the shared library. As such, static local objects can be destructed using this mechanism even in shared libraries on Linux systems.}, and the companion @bool@'s value is set so that subsequent invocations of the function do not reconstruct the object.
[9c14ae9]1090Since the parameter to @atexit@ is a parameter-less function, some additional tweaking is required.
1091First, the @static@ variable must be hoisted up to global scope and uniquely renamed to prevent name clashes with other global objects.
1092Second, a function is built which calls the destructor for the newly hoisted variable.
1093Finally, the newly generated function is registered with @atexit@, instead of registering the destructor directly.
1094Since @atexit@ calls functions in the reverse order in which they are registered, @static@ local variables are guaranteed to be destructed in the reverse order that they are constructed, which may differ between multiple executions of the same program.
1095Extending the previous example
1096\begin{cfacode}
1097int f(int x) {
1098 static X a;
1099 static X b = { x, x }; // depends on parameter value
1100 static X c = b; // depends on local variable
1101}
1102\end{cfacode}
1103Generates the following.
1104\begin{cfacode}
1105static struct X a_static_var0;
1106static void __a_dtor_atexit0(void){
1107 ((void)^?{}(((struct X *)(&a_static_var0))));
1108}
1109static struct X b_static_var1;
1110static void __b_dtor_atexit1(void){
1111 ((void)^?{}(((struct X *)(&b_static_var1))));
1112}
1113static struct X c_static_var2;
1114static void __c_dtor_atexit2(void){
1115 ((void)^?{}(((struct X *)(&c_static_var2))));
1116}
1117int f(int x){
1118 int _retval_f;
1119 __attribute__ ((unused)) static void *_dummy0;
1120 static _Bool __a_uninitialized = 1;
1121 if ( __a_uninitialized ) {
1122 ((void)?{}(((struct X *)(&a_static_var0))));
1123 ((void)(__a_uninitialized=0));
1124 ((void)atexit(__a_dtor_atexit0));
1125 }
1126
1127 __attribute__ ((unused)) static void *_dummy1;
1128 static _Bool __b_uninitialized = 1;
1129 if ( __b_uninitialized ) {
1130 ((void)?{}(((struct X *)(&b_static_var1)), x, x));
1131 ((void)(__b_uninitialized=0));
1132 ((void)atexit(__b_dtor_atexit1));
1133 }
1134
1135 __attribute__ ((unused)) static void *_dummy2;
1136 static _Bool __c_uninitialized = 1;
1137 if ( __c_uninitialized ) {
1138 ((void)?{}(((struct X *)(&c_static_var2)), b_static_var1));
1139 ((void)(__c_uninitialized=0));
1140 ((void)atexit(__c_dtor_atexit2));
1141 }
1142}
1143\end{cfacode}
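
The guard pattern can also be exercised as a small stand-alone C model.
In this sketch, which keeps only the variable @b@ from the example, the routines @X_ctor@ and @X_dtor@ and the counter @ctor_calls@ are hypothetical stand-ins used to observe how often construction happens, and @f@ returns a field of the object so that its state is visible to the caller.
\begin{cfacode}
#include <stdlib.h>

struct X { int y, z; };

static int ctor_calls;  /* counts constructor invocations */

/* hypothetical stand-ins for ?{}(X *, int, int) and ^?{}(X *) */
static void X_ctor(struct X *p, int y, int z) { p->y = y; p->z = z; ++ctor_calls; }
static void X_dtor(struct X *p) { p->y = p->z = 0; }

static struct X b_static;  /* static local hoisted to global scope */
static void b_dtor_atexit(void) { X_dtor(&b_static); }

/* models: int f(int x) { static X b = { x, x }; return b.y; } */
int f(int x) {
  static _Bool b_uninitialized = 1;
  if (b_uninitialized) {
    X_ctor(&b_static, x, x);  /* runs only on the first call */
    b_uninitialized = 0;
    atexit(b_dtor_atexit);    /* destructor deferred to program exit */
  }
  return b_static.y;
}

int constructor_calls(void) { return ctor_calls; }
\end{cfacode}
Calling @f(5)@ and then @f(9)@ returns 5 both times in this model, since the object is constructed from the argument of the first call only.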
1144
[f92aa32]1145\subsection{Polymorphism}
1146As mentioned in section \ref{sub:polymorphism}, \CFA currently has 3 type-classes that are used to designate polymorphic data types: @otype@, @dtype@, and @ftype@.
1147In previous versions of \CFA, @otype@ was syntactic sugar for @dtype@ with known size/alignment information and an assignment function.
1148That is,
[9c14ae9]1149\begin{cfacode}
[f92aa32]1150forall(otype T)
1151void f(T);
[9c14ae9]1152\end{cfacode}
[f92aa32]1153was equivalent to
[9c14ae9]1154\begin{cfacode}
[f92aa32]1155forall(dtype T | sized(T) | { T ?=?(T *, T); })
1156void f(T);
[9c14ae9]1157\end{cfacode}
[f92aa32]1158This allows easily specifying constraints that are common to all complete object types very simply.
1159
Now that \CFA has constructors and destructors, more of a complete object's behaviour can be specified than was previously possible.
1161As such, @otype@ has been augmented to include assertions for a default constructor, copy constructor, and destructor.
1162That is, the previous example is now equivalent to
[9c14ae9]1163\begin{cfacode}
[f92aa32]1164forall(dtype T | sized(T) | { T ?=?(T *, T); void ?{}(T *); void ?{}(T *, T); void ^?{}(T *); })
1165void f(T);
[9c14ae9]1166\end{cfacode}
[f92aa32]1167This allows @f@'s body to create and destroy objects of type @T@, and pass objects of type @T@ as arguments to other functions, following the normal \CFA rules.
A point of note here is that objects can be missing default constructors (and eventually other functions through deleted functions), so it is important for \CFA programmers to think carefully about the operations needed by their function, so as not to over-constrain the acceptable parameter types.